Subj : ET phone home
To : Nick Andre
From : mark lewis
Date : Sun Jan 08 2017 11:44 am
On 2017 Jan 08 10:30:12, you wrote to me:
NA> On 08 Jan 17 09:16:24, Mark Lewis said the following to Nick Andre:
ML>> exactly sure on the details but i know that they greatly influenced me
ML>> to have my software place the MSGID as close to the beginning of the
ML>> control lines as possible so that db would not detect messages posted
ML>> within one second as dupes... this was especially important when
ML>> testing at 100+ posts per second... are you willing to share
ML>> information about how db does its dupe detection so others can
ML>> understand more? please?
NA> It's not that hard to understand. A CRC is computed from the header and date
NA> of the tossed message. I would have to dig into the code and I'm not sure
NA> how many bytes are being included from the start of the message.
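A minimal sketch of the kind of key Nick describes, assuming CRC-32 (Python's zlib.crc32) rather than whatever CRC variant D'Bridge actually uses, and assuming "the header" means the from, to and subject fields. BODY_PREFIX_LEN = 256 is a hypothetical stand-in for the unknown byte count. It also shows why Mark's MSGID placement matters: kludge lines such as ^AMSGID sit at the start of the message text, so an MSGID placed early lands inside the hashed prefix and makes two otherwise identical posts from the same second hash differently.

```python
import zlib

# Hypothetical: D'Bridge's real body byte count is unknown (Nick isn't sure).
BODY_PREFIX_LEN = 256


def dupe_crc(from_name, to_name, subject, date, body,
             prefix_len=BODY_PREFIX_LEN):
    """CRC-32 over the header fields, the date, and the start of the body.

    Which header fields are used, their order, and the cp437 encoding are
    all assumptions for illustration.
    """
    crc = 0
    for field in (from_name, to_name, subject, date):
        # Running CRC: the second argument continues from the previous value.
        crc = zlib.crc32(field.encode("cp437"), crc)
    # Only the first prefix_len bytes of the text (kludges included) count.
    return zlib.crc32(body[:prefix_len], crc)
```

With a key like this, two posts that share the header fields, the same one-second timestamp, and the same first BODY_PREFIX_LEN bytes collide, even when their text differs further down.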
yeah, these little details are what is/was being sought... whether the header and
X bytes are read into a buffer and the CRC is calculated on that entire raw
buffer, or each field is read individually and then fed to the CRC calculator...
knowing this may help others who are trying to work out how to do dupe checking
that doesn't rely on MSGID alone... that's because messages without an MSGID
can't be checked that way, so an alternative or three is desired/needed...
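Whether the CRC is taken over one raw buffer or fed field by field makes no difference by itself, since CRC-32 can be updated incrementally. What does matter is exactly which bytes go in, for example whether each field's NUL terminator from the FTS-0001 packed message is included. A small sketch, using zlib.crc32 as an assumed stand-in for D'Bridge's CRC and made-up function names:

```python
import zlib


def crc_whole_buffer(fields):
    # Read everything into one raw buffer, then CRC it in a single call.
    return zlib.crc32(b"".join(fields))


def crc_field_by_field(fields):
    # Feed each field to a running CRC; zlib.crc32's second argument
    # continues from the previous value.
    crc = 0
    for field in fields:
        crc = zlib.crc32(field, crc)
    return crc


def crc_with_nuls(fields):
    # Same fields plus each one's NUL terminator, as stored in a packed message.
    return zlib.crc32(b"".join(field + b"\x00" for field in fields))


fields = [b"mark lewis", b"Nick Andre", b"ET phone home"]
```

The first two agree; the third does not. So a compatible reimplementation needs the exact byte sequence, separators included, not just the list of fields.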
NA> Each Echomail area has a cache database file. In the case of *.MSG,
NA> this is called DBRIDGE.DUP and resides in each area and for
NA> Hudson/QBBS there is one database segmented slightly differently. The
NA> CRCs are kept in there. I believe the code sets the cache database
NA> size at 1,024 entries.
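A minimal sketch of a per-area cache holding a fixed number of CRCs, using the 1,024 figure Nick recalls. The class name, the oldest-first eviction, and keeping one cache per area tag in a dict are assumptions for illustration, not D'Bridge's actual code or file layout.

```python
from collections import deque

CACHE_SIZE = 1024  # the entry count Nick recalls; not verified against D'Bridge


class DupeCache:
    """Fixed-size dupe cache for one echo area (hypothetical sketch)."""

    def __init__(self, size=CACHE_SIZE):
        # A deque with maxlen silently drops its oldest entry when full.
        self.crcs = deque(maxlen=size)

    def check_and_add(self, crc):
        """Return True if crc is already cached (a dupe), else remember it."""
        if crc in self.crcs:  # linear scan over at most `size` entries
            return True
        self.crcs.append(crc)
        return False


# One cache per area, keyed by area tag (hypothetical layout).
caches = {}


def is_dupe(area_tag, crc):
    if area_tag not in caches:
        caches[area_tag] = DupeCache()
    return caches[area_tag].check_and_add(crc)
```

Once more than CACHE_SIZE newer messages have passed through an area, an older message's CRC has been evicted, so a dupe of it arriving later is no longer caught.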
i remember the different dup cache files... i didn't know they were limited to
a paltry 1024 entries, though... i never dug that deep ;)
NA> Interestingly it appears that there is a "reputation" method for the
NA> cache database. It appears as it is loaded into RAM during a toss, any
NA> time a CRC match is encountered, that CRC is pushed up the cache
NA> table, while CRCs of legitimate messages end up being pushed down.
NA> The CRC table is saved into that cache file every time the Echomail
NA> area changes in the toss cycle, or when there are no more packets to toss.
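The "reputation" behavior can be sketched as a move-to-front list. A CRC that catches a dupe jumps to the top, new CRCs enter at the top, and entries that never match drift to the bottom and eventually fall off. Move-to-front and insert-at-top are assumptions (D'Bridge might only bump a hit up one slot). RankedDupeCache, toss, to_bytes and the little-endian uint32 layout are hypothetical, not the real DBRIDGE.DUP format. The toss loop saves a table when the area changes and once more when input runs out, as Nick describes.

```python
import struct


class RankedDupeCache:
    """Dupe cache whose matching CRCs move toward the top (hypothetical)."""

    def __init__(self, size=1024):
        self.size = size
        self.table = []  # index 0 is the "top" of the table

    def check(self, crc):
        for i, c in enumerate(self.table):  # linear scan from the top
            if c == crc:
                self.table.insert(0, self.table.pop(i))  # promote the hit
                return True
        # New message: add at the top; everything else slides down one slot.
        self.table.insert(0, crc)
        del self.table[self.size:]  # the bottom entry falls off when full
        return False

    def to_bytes(self):
        # Hypothetical on-disk layout: little-endian uint32 CRCs, top first.
        return struct.pack(f"<{len(self.table)}I", *self.table)


def toss(messages, caches, save):
    """Check (area_tag, crc) pairs in packet order; return the dupes found.

    caches maps area tag -> RankedDupeCache; save(area, cache) stands in
    for writing that area's cache file.
    """
    current = None
    dupes = []
    for area, crc in messages:
        if current is not None and area != current:
            save(current, caches[current])  # area changed: flush the old one
        current = area
        if area not in caches:
            caches[area] = RankedDupeCache()
        if caches[area].check(crc):
            dupes.append((area, crc))
    if current is not None:
        save(current, caches[current])  # no more packets: final flush
    return dupes
```

Because hits sit near the top, a scan from the top finds frequently duped CRCs after only a few comparisons.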
that's pretty interesting... i guess that's so that messages with more dupes
can be detected faster with their CRCs at the top of the queue... interesting
idea and i'm sure one that was important back in the day of slower machines :)
)\/(ark
Always Mount a Scratch Monkey
Do you manage your own servers? If you are not running an IDS/IPS yer doin' it
wrong...
... Actually, if they leak, you've pumped them too many times.
---
* Origin: (1:3634/12.73)