Subj : Re: Dupe checking
To   : Dale Shipp
From : Maurice Kinal
Date : Sat Apr 10 2004 09:02 am

Hey Dale!

Apr 09 23:35 04, Dale Shipp wrote to Bo Simonsen:

DS>   Not so.   Squish has the ability to use two different sorts of
DS>   duplicate checking.   One is based on MSGID.

A MSGID can be duplicated without the actual message being a dupe.

DS>   The other is based on the header info (TO, FROM, SUBJ and DATE).
DS>   If *either* of these two things is a match, then Squish calls the
DS>   message a dupe.
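For illustration, here is a minimal sketch of the two checks Dale describes, where a message counts as a dupe if its MSGID *or* its (TO, FROM, SUBJ, DATE) header has been seen before. This is not Squish's actual code; the `Msg` record, `DupeChecker` class and the SHA-1 header key are hypothetical choices made only for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    # Hypothetical minimal message record; field names are illustrative only.
    msgid: str
    to: str
    frm: str
    subj: str
    date: str
    body: str


class DupeChecker:
    """Sketch: dupe if the MSGID *or* the TO/FROM/SUBJ/DATE header was seen before."""

    def __init__(self) -> None:
        self.seen_msgids: set[str] = set()
        self.seen_headers: set[str] = set()

    @staticmethod
    def header_key(m: Msg) -> str:
        # Join the four header fields with a NUL separator and hash them
        # (the hashing scheme is an assumption, not Squish's format).
        raw = "\x00".join((m.to, m.frm, m.subj, m.date))
        return hashlib.sha1(raw.encode("utf-8")).hexdigest()

    def is_dupe(self, m: Msg) -> bool:
        hkey = self.header_key(m)
        by_msgid = bool(m.msgid) and m.msgid in self.seen_msgids
        by_header = hkey in self.seen_headers
        # Remember this message for later comparisons.
        if m.msgid:
            self.seen_msgids.add(m.msgid)
        self.seen_headers.add(hkey)
        return by_msgid or by_header
```

Either test alone is enough to reject a message, which is exactly why a reused MSGID or a coincidental header match produces a false positive.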

The problem with this is that some of these fields are often altered; the SUBJ
field in particular is changed quite commonly, I've noticed.  Thus an actual
dupe can slip by, or non-dupes end up flagged as dupes.  The DATE, which
includes the time, comes from the originating computer's clock, which can
malfunction without the computer itself malfunctioning (i.e., it can create
unique messages all stamped with exactly the same time and date).  The TO and
FROM can be the same without the messages being dupes.

The only true way is to check the actual message text, but I wouldn't do that
for every message coming through.  Instead, only those that fail any of the
preliminary tests would get a more rigorous test ... maybe.  :-)
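The two-stage idea might look roughly like the sketch below, assuming the cheap MSGID and header tests are used only to collect *candidate* earlier messages, and the message text itself is compared only when a candidate turns up. The `Msg` record, `TwoStageChecker` class and the whitespace normalisation are hypothetical choices for illustration, not an existing tosser's behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    # Hypothetical minimal message record; field names are illustrative only.
    msgid: str
    to: str
    frm: str
    subj: str
    date: str
    body: str


def normalise(body: str) -> str:
    # Collapse runs of whitespace so re-wrapped text still compares equal
    # (an assumption; a stricter check could compare byte-for-byte).
    return " ".join(body.split())


class TwoStageChecker:
    def __init__(self) -> None:
        # Previously seen bodies, indexed by the cheap preliminary keys.
        self.by_msgid: dict[str, list[str]] = {}
        self.by_header: dict[tuple[str, str, str, str], list[str]] = {}

    def is_dupe(self, m: Msg) -> bool:
        hkey = (m.to, m.frm, m.subj, m.date)
        # Stage 1: cheap tests collect candidate earlier messages.
        candidates = list(self.by_header.get(hkey, []))
        if m.msgid:
            candidates += self.by_msgid.get(m.msgid, [])
        # Stage 2: only when stage 1 flagged something, compare the text itself.
        dupe = False
        if candidates:
            mine = normalise(m.body)
            dupe = any(normalise(b) == mine for b in candidates)
        # Record this message for future checks.
        if m.msgid:
            self.by_msgid.setdefault(m.msgid, []).append(m.body)
        self.by_header.setdefault(hkey, []).append(m.body)
        return dupe
```

With this arrangement a reused MSGID or a coincidental TO/FROM/SUBJ/DATE match is no longer enough on its own; the text has to match too, while most messages never reach the text comparison at all.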

The thing is that there is too much broken, unsupported stuff out there and far
too many kludges being tacked on along the way by everyone and their dog.
Personally, I have reached the stage where I strip everything out of messages
except the TO, FROM and DATE (<- ignoring it though), and take the originating
node number from the Origin line.  Everything else is known without jumping
through hoops once they are stripped.  Without compression, I find that I
average a 50% reduction in archive size compared with normal Fido archiving
methods.
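A rough sketch of that stripping step might look like this. It assumes the message text itself is kept, that kludge lines start with the ^A (0x01) character, and that the tear line, Origin line and SEEN-BY lines are what gets dropped; the function names and the returned dict layout are hypothetical, and the address regex only handles plain zone:net/node[.point] addresses.

```python
import re
from typing import Optional

# FTN address in parentheses at the end of an Origin line, e.g. "(1:153/401.1)".
ORIGIN_ADDR = re.compile(r"\((\d+:\d+/\d+(?:\.\d+)?)\)\s*$")


def origin_address(body: str) -> Optional[str]:
    # Scan from the bottom, since the Origin line sits near the end of a message.
    for line in reversed(body.splitlines()):
        if line.lstrip().startswith("* Origin:"):
            m = ORIGIN_ADDR.search(line)
            return m.group(1) if m else None
    return None


def strip_message(to: str, frm: str, date: str, body: str) -> dict:
    origin = origin_address(body)
    kept = []
    for line in body.splitlines():
        if line.startswith("\x01"):  # kludge lines (^AMSGID, ^APATH, ...)
            continue
        if line.startswith("SEEN-BY:"):
            continue
        if line == "---" or line.startswith("--- "):  # tear line
            continue
        if line.lstrip().startswith("* Origin:"):
            continue
        kept.append(line)
    return {
        "to": to,
        "from": frm,
        "date": date,  # kept, though not relied on for dupe checking
        "origin": origin,
        "text": "\n".join(kept).rstrip("\n"),
    }
```

Everything dropped here can be regenerated or is redundant once the origin address is known, which is where the archive-size saving would come from.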

Life is good,
Maurice

--- Msged/LNX 6.1.1
* Origin: Little Mikey's Brain BBS - A work in progress (1:153/401.1)