Subj : Double postings
To   : Joe Martin
From : mark lewis
Date : Sun Sep 15 2019 11:26 am

On 2019 Sep 15 08:31:20, you wrote to me:

JM> -> MSGID is the main way but older software doesn't generate MSGID so
JM> -> other methods need to be used...

JM> My mailer/tosser uses a combined approach.  If the message contains a
JM> MSGID then use its value, otherwise CRC the header and message body
JM> including control lines but never the SEEN-BY/PATH lines (considering
JM> they change all the time).

this is good but for one small thing... there is a package that is known to be
reformatting messages in transit which is going to throw the message body CRC
out the door... there is no estimate on when this will be fixed as the
developer is apparently quite busy with RL outside of FTNs...
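
for anyone following along, here is a rough python sketch of the combined
approach quoted above: use the MSGID when one exists, otherwise CRC the header
plus the body while skipping SEEN-BY and PATH lines... this is only an
illustration, not the code from any real tosser... the function name, the
"msgid:"/"crc:" key prefixes and the idea of passing the header in as one
bytes blob are all assumptions made up for the example...

```python
import zlib

MSGID_TAG = b"\x01MSGID:"                 # control lines start with SOH (0x01)
VOLATILE = (b"SEEN-BY:", b"\x01PATH:")    # rewritten by every system in transit


def dupe_key(header: bytes, body: bytes) -> str:
    """Return a dupe-detection key for one message (hypothetical helper).

    header: the header fields joined into one bytes blob (assumed layout).
    body:   the message text, lines separated by CR.
    """
    lines = body.split(b"\r")

    # Preferred: the MSGID value, when the originating software wrote one.
    for line in lines:
        if line.startswith(MSGID_TAG):
            value = line[len(MSGID_TAG):].strip()
            return "msgid:" + value.decode("latin-1")

    # Fallback: CRC32 over header and body, minus the SEEN-BY/PATH lines.
    kept = [ln for ln in lines if not ln.startswith(VOLATILE)]
    crc = zlib.crc32(header + b"\r".join(kept)) & 0xFFFFFFFF
    return "crc:%08x" % crc
```

note that the fallback still covers the whole body, so a system that
reformats the text in transit changes the key and the dupe gets through...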

JM> The tosser never duplicates an MSGID either as it maintains a file
JM> with the last used value seeded upon creation by the current
JM> date/time.  This prevents issues should that file get deleted.

sounds similar to what my MSGID code does... i've shared that information with
several folks... not sure if you were one of those or not... i still have the
original 1994 (i think) post that described it, too :)
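
a minimal python sketch of the "last used value in a file" idea quoted above,
just to make it concrete... it is not joe's code or mine... the file name, the
8 hex digit format and the function name are assumptions for the example...

```python
import time


def next_serial(path: str) -> str:
    """Return the next MSGID serial as 8 hex digits (hypothetical helper).

    The last used value lives in a small state file. If the file is missing
    or unreadable, the counter is re-seeded from the current date/time.
    """
    try:
        with open(path) as f:
            last = int(f.read().strip(), 16)
    except (FileNotFoundError, ValueError):
        # Seed from the clock so a deleted file does not restart at zero.
        last = int(time.time()) & 0xFFFFFFFF

    serial = (last + 1) & 0xFFFFFFFF      # wrap around at 32 bits
    with open(path, "w") as f:
        f.write("%08x" % serial)
    return "%08x" % serial
```

re-seeding from the clock only avoids repeats if the system has averaged less
than one new message per second since the original seed, and the sketch does
no file locking, so treat both as assumptions...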

JM> To provide speed and limit disk space, I also have an expiration
JM> mechanism (user configurable) that will purge CRC entries after a given
JM> amount of time (ie: 2 weeks but not more than 30 days).  So while it's
JM> efficient catching dupes in that time period, if someone does a rescan
JM> and dumps everything back into the echo a month later, it won't catch
JM> them. It's a trade off, but back in the day when we had 40mb drives and
JM> 8088/80286 processors, it was extremely important.

yeah and that's gonna likely be a problem since the spec states three years...
in this day and age, retaining three years' worth of dupe detection data should
be a small drop in the bucket of available drive space and processing power
needed to perform a lookup...
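
to make the trade off concrete, here is a small python sketch of a dupe table
with a configurable retention period... the class and method names are made up
for illustration, and a real tosser would keep this on disk rather than in a
dict...

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60
THREE_YEARS = 3 * 365 * SECONDS_PER_DAY


class DupeTable:
    """In-memory dupe table with expiry (hypothetical, for illustration)."""

    def __init__(self, retention: float = THREE_YEARS):
        self.retention = retention
        self.seen = {}                    # key -> time first recorded

    def check_and_add(self, key, now=None) -> bool:
        """Return True if key is a dupe inside the retention window."""
        now = time.time() if now is None else now
        first = self.seen.get(key)
        if first is not None and now - first <= self.retention:
            return True
        self.seen[key] = now              # new, or expired and seen again
        return False

    def purge(self, now=None) -> int:
        """Drop entries older than the retention period; return the count."""
        now = time.time() if now is None else now
        old = [k for k, t in self.seen.items() if now - t > self.retention]
        for k in old:
            del self.seen[k]
        return len(old)
```

with `DupeTable(retention=14 * SECONDS_PER_DAY)` a rescan a month later is
treated as new mail, while the three year default still flags it...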

JM> -> instead of CRC... the problem then comes from those systems that
JM> -> mistakenly reformat the messages as they process them and write the
JM> -> reformatted messages to new PKTs... now the message body is

JM> Yeah this is and always will be an issue.

not if the message body is not CRC'd ;)

i really like (IIRC) the d'bridge method of taking the header and first 40
bytes (i think) of the message body to get those few initial control lines and
using that... that'll take care of the different dates as well as the MSGID but
i would also grab the MSGID if it exists and store it in the database as
well... basically i'm thinking of at least two or three fields in each
record...
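
a rough python sketch of that record idea follows... it is based only on my
recollection above, so it should not be read as how d'bridge actually works...
the 40 byte prefix, the record layout, the names and the matching rule
(compare MSGIDs when both records have one, otherwise compare the CRC) are all
assumptions for the example...

```python
import zlib
from typing import Iterable, NamedTuple, Optional

MSGID_TAG = b"\x01MSGID:"                 # control lines start with SOH (0x01)


class DupeRecord(NamedTuple):
    head_crc: int                         # CRC32 of header + start of body
    msgid: Optional[str]                  # MSGID value, if the message had one


def make_record(header: bytes, body: bytes, prefix_len: int = 40) -> DupeRecord:
    """Build a dupe record from the header and the first bytes of the body."""
    head_crc = zlib.crc32(header + body[:prefix_len]) & 0xFFFFFFFF
    msgid = None
    for line in body.split(b"\r"):
        if line.startswith(MSGID_TAG):
            msgid = line[len(MSGID_TAG):].strip().decode("latin-1")
            break
    return DupeRecord(head_crc, msgid)


def is_dupe(rec: DupeRecord, known: Iterable[DupeRecord]) -> bool:
    """Compare MSGIDs when both sides have one, else compare the CRC."""
    for old in known:
        if rec.msgid and old.msgid:
            if rec.msgid == old.msgid:
                return True
        elif rec.head_crc == old.head_crc:
            return True
    return False
```

since only the start of the body goes into the CRC, a message that gets
rewrapped further down still produces the same record...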

JM> -> is apparent on systems that only get, for example, one posting of an
JM> -> echos rules each month and only accept new postings of those rules

JM> It would seem to me, (me mind you) that if you're moderating an echo,
JM> your software "should" be able to generate a MSGID to prevent this issue
JM> entirely.  But hey...

that depends on the software used... some text file posting tools are really
old and do not have any concept of MSGID... i'm thinking of the old Harvey's
Robot in at least one case...

JM> -> what i would do would be to ask other tosser devs what they use in
JM> -> their code...
JM> ->
JM> -> listed in no particular order:
JM> ->
JM> -> tobias burchhardt  - fastecho
JM> -> rob swindell       - sbbsecho
JM> -> nick andre         - d'bridge
JM> -> vince coen         - mbse's tosser
JM> -> kim heino          - bbbs' tosser
JM> -> wilfred van velzen - fmail
JM> -> james coyle        - mystic

JM> Thanks Mark...

you're welcome... i hope that you've also seen the other two posts about HPT
and intermail which should also be added to the above list...

)\/(ark

Once men turned their thinking over to machines in the hope that this would set
them free. But that only permitted other men with machines to enslave them.
... You may never know who's right but you always know who is in charge!
---
* Origin:  (1:3634/12.73)