Subj : Dupeloops
To : Rob Swindell
From : mark lewis
Date : Wed Jun 20 2018 08:08 am
On 2018 Jun 19 22:43:24, you wrote to me:
>> AFAIK, seenbys and paths are not included in most dupe detection
>> schemes... other non-changing control lines are fine to be included...
>> one of the problems comes when some systems sort those control lines on
>> messages they are passing along... we don't see so much of that like we
>> did at one time ;)
RS> So some metadata is included in the data that is hashed for dupe
RS> detection and some is not?
yes...
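a minimal sketch of that idea in python, not any particular tosser's actual method... it assumes CRC-32 as the checksum and assumes the whole body is hashed with only the SEEN-BY and PATH lines left out... dupe_key is a hypothetical name...

```python
import zlib

def dupe_key(body: str) -> int:
    """CRC-32 of a message body, skipping SEEN-BY and PATH lines.

    Assumption: lines are separated by CR and control lines start
    with ^A (0x01), as in FTN packed messages.
    """
    kept = []
    for line in body.split("\r"):
        bare = line.lstrip("\x01")  # drop the ^A marker, if any
        if bare.startswith(("SEEN-BY:", "PATH:")):
            continue  # these grow at every hop, so leave them out
        kept.append(line)
    return zlib.crc32("\r".join(kept).encode("latin-1", "replace"))
```

the same message seen after one more hop (longer SEEN-BY and PATH) gets the same key... a system that re-sorts the other control lines changes the hashed bytes, so the key will almost certainly differ and the dupe slips through...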
RS> Are you sure about that?
yes... in fact, and i don't recall who pointed this out to me back in the '90s,
dbridge does exactly this in a manner of speaking... it takes the whole message
header plus X bytes immediately following the message header and uses all of
that as at least part of the checksum calculation... this was pointed out to me
when i was working on my posting tool and was adding MSGID support to it...
i was using a library and just letting it do its thing... some of my test posts
were reported as dupes when they clearly weren't... IIRC, they were detected as
dupes because they were posted within the same second... it turned out that my
MSGID was somewhere in the middle of the control lines at the beginning of the
message body and only my dbridge-using testers were seeing this... someone
pointed out this thing about dbridge also using X bytes from the beginning of
the message body in addition to the message header so i moved my posting tool's
MSGID to the top of the list and no more dupes were detected by those dbridge
systems...
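a rough sketch of why moving the MSGID mattered... this is not dbridge's real algorithm... CRC-32, the value of X (32 here), the header bytes and the names header_plus_prefix_crc and build_body are all assumptions made up for illustration...

```python
import zlib

X = 32  # hypothetical; the real number of body bytes used is not known here

def header_plus_prefix_crc(header: bytes, body: bytes, x: int = X) -> int:
    """Checksum over the whole header plus the first x bytes of the body."""
    return zlib.crc32(header + body[:x])

def build_body(msgid: str, msgid_first: bool) -> bytes:
    """Build a test body with MSGID either first or last in the control lines."""
    others = [b"\x01CHRS: CP437 2", b"\x01TZUTC: -0400", b"\x01PID: posttool 0.1"]
    msgid_line = b"\x01MSGID: " + msgid.encode("ascii")
    kludges = [msgid_line] + others if msgid_first else others + [msgid_line]
    return b"\r".join(kludges) + b"\rtest post\r"

# two posts made in the same second share an identical header
hdr = b"poster\0All\0test\0" + b"20 Jun 18  08:08:00\0"

late_a = header_plus_prefix_crc(hdr, build_body("1:234/5 0000a001", False))
late_b = header_plus_prefix_crc(hdr, build_body("1:234/5 0000a002", False))
first_a = header_plus_prefix_crc(hdr, build_body("1:234/5 0000a001", True))
first_b = header_plus_prefix_crc(hdr, build_body("1:234/5 0000a002", True))

print(late_a == late_b)    # MSGID falls outside the first X bytes
print(first_a == first_b)  # MSGID sits inside the first X bytes
```

with the MSGID behind the other control lines, the first X bytes of both posts are identical, so the checksums match and the second post looks like a dupe... with the MSGID first, the differing serial number lands inside the hashed bytes and the checksums differ...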
i don't know what other systems do... there's only a very few that provide this
information... SBBS is one of them... when i was testing Mystic, there was some
discussion about dupe detection as james worked to try to figure out the best
method he liked... i have used fastecho here for decades but i don't know what
data it uses for its checksums... i do know it uses two checksums, though... i
know this because i was being nosy one day and looking at FE's dupe database
file (one for all message areas) with a hex viewer and noticed that groups of
bytes were repeated all throughout the file... i asked about this and was told
i found a bug... basically, FE has two checksums that it uses for each message
and both are supposed to be stored in the database... what i found was that
only one was being used and written to both fields... toby fixed that problem
right quick... i just don't know what data is used to calculate them...
back in the day, dupe detection formulas were not really shared around... maybe
a couple of developers talking amongst themselves would tell each other what
they were doing but this information was not published where everyone could
find it... it was more or less black majik to a point...
RS> Anyway, duplicate Message-IDs *should* be caught by any FTN software
RS> written or updated in the past 20 years.
true... and we still have some systems that don't do MSGID at all so other
methods must be used on them...
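one way those two cases could fit together, as a sketch only... it assumes MSGID is matched per area and that a CRC-32 over the header plus the body (minus SEEN-BY and PATH) is the fallback... DupeChecker, find_msgid and content_crc are hypothetical names, not any real tosser's API...

```python
import zlib

MSGID_PREFIX = b"\x01MSGID: "

def find_msgid(body: bytes):
    """Return the MSGID value from the control lines, or None if absent."""
    for line in body.split(b"\r"):
        if line.startswith(MSGID_PREFIX):
            return line[len(MSGID_PREFIX):].strip()
    return None

def content_crc(header: bytes, body: bytes) -> int:
    """Fallback key: CRC-32 of header plus body without SEEN-BY/PATH lines."""
    kept = [ln for ln in body.split(b"\r")
            if not ln.lstrip(b"\x01").startswith((b"SEEN-BY:", b"PATH:"))]
    return zlib.crc32(header + b"\r".join(kept))

class DupeChecker:
    def __init__(self):
        self.seen = set()  # in-memory only; a real tosser would persist this

    def is_dupe(self, area: str, header: bytes, body: bytes) -> bool:
        msgid = find_msgid(body)
        if msgid is not None:
            key = (area, "msgid", msgid)  # preferred: exact MSGID match
        else:
            key = (area, "crc", content_crc(header, body))  # no MSGID at all
        if key in self.seen:
            return True
        self.seen.add(key)
        return False
```

a message carrying a MSGID is caught on its second arrival no matter how its SEEN-BY lines have grown... one without a MSGID is only caught if the header and the remaining body bytes are unchanged...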
)\/(ark
---
* Origin: (1:3634/12.73)