Subj : *.MSG vs. packed help?
To : All
From : Mike Luther
Date : Sun Jul 08 2001 07:54 am
Looking for suggestions here.
Long ago and far away I gave a small amount of money for a DOS database
analysis tool called askSam. The askSam, at least the professional version, is
written totally in assembler and is VERY fast and good. Later on the askSam
utility was re-written for Win 3.1 and has, I think, also been ported to WIN-9x,
but not OS/2. As well, it was merged into a full CGI script driven Web page
creation and hosting tool that uses the core capabilities of the askSam engine
as well.
What's neat about askSam? Well, for one thing, you just read the whole mess
of any old text into the database. Then *AFTER* you have any database
created, you can create fields, manipulate all the fields, and whatever, without
ever messing with the actual database! Intriguing indeed. The tool has, so
said, a fair niche market in law enforcement, for munching in huge piles of
information. Then you go in and research patterns in the information. It may
not, perhaps, be apparent how you can prove that this or that record is
related to this or that other record until askSam goes through all the
needles in the haystack and shows you.
The US DEA broke the Manuel Noriega case with it, for example. They crunched
in all the surveillance traffic and used it to ferret out who talked to whom and
about what over megabytes of text .. ;)
In essence, for about $65 back then on special, I got, all those years back, a
full text search engine for all the Fido traffic as well. Preposterous? Not at
all! For a smaller FidoNet net like Net 117 here, all the local traffic for
the last ten years isn't any more than about 36MB or so! That's because the
engine is a full relational database engine!
What I've been doing all these years is simple -- as long as I use a *.MSG
format. All I do is take the inbound messages, and export them into a pure
ASCII text base dupe file! Then I call askSam as a command line input function.
I then punch in the entire message .. seen by's .. origin lines, the whole 9
yards into the master database. From that point on I can tell you, for
example, the identity of every node that ever used this or that word of
profanity .. or whatever. E.v.e.r.y W.o.r.d.
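A minimal sketch of that export step, written in Python rather than in whatever the original utility uses. It assumes the 190-byte stored-message header layout described in FTS-0001 (four fixed-width strings followed by thirteen 16-bit words); the names HEADER and msg_to_text are hypothetical and not part of any existing tool.

```python
import struct

# Assumed FTS-0001 stored-message header: four fixed-width, NUL-padded
# strings followed by thirteen little-endian 16-bit words (190 bytes).
HEADER = struct.Struct("<36s36s72s20s13H")


def _text(raw):
    """Cut at the first NUL and decode, assuming CP437 text."""
    return raw.split(b"\x00", 1)[0].decode("cp437", errors="replace")


def msg_to_text(data):
    """Turn the raw bytes of one *.MSG file into a plain-text record."""
    if len(data) < HEADER.size:
        raise ValueError("too short to be a *.MSG file")
    fields = HEADER.unpack_from(data)
    from_name, to_name, subject, date = (_text(f) for f in fields[:4])
    words = fields[4:]
    # Assumed word order: timesRead, destNode, origNode, cost, origNet, destNet, ...
    dest_node, orig_node = words[1], words[2]
    orig_net, dest_net = words[4], words[5]
    body = _text(data[HEADER.size:])
    # Keep everything, SEEN-BYs and origin line included. Lines end in CR;
    # kludge lines start with ^A (0x01), written here as '@'.
    body = body.replace("\r\n", "\n").replace("\r", "\n").replace("\x01", "@")
    return (
        f"From: {from_name} ({orig_net}/{orig_node})\n"
        f"To: {to_name} ({dest_net}/{dest_node})\n"
        f"Subj: {subject}\n"
        f"Date: {date}\n\n"
        f"{body}"
    )
```

Reading each file in binary mode and appending the result of msg_to_text to one export file gives the kind of plain ASCII dump described above.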
Interesting. Blows the Hades out of Fido elections and complaints at times.
Anyway.
Problem is that as I and a few folks have gotten more interested in other
things and higher traffic volume echos, changes are necessary. It matters not
what BBS system is involved, the principle is the same. One of the reasons we
all moved away from *.MSG format is because it takes too long to scan and
manipulate the message base in that format as the number of message areas and
messages in them grows!
I've reached that point now where the system still works fine, but the
maintenance operations for the utilities take too long in the *.MSG format for
the system.
What do I do now?
I need a *.MSG format for the inbound traffic, keyboard or toss produced, it
doesn't matter. I need that, or at least I think I do, to export that into the
askSam ghoul and utilities I hand wrote to convert the *.MSG format to what I
want for Uncle Sambo!
It doesn't really matter whether the results are done in synchronization with
the BBS system or not. Inasmuch as OS/2 is threaded, if I just have the pile
of files to smunch into the ghoul, that part of the process can go on
irrespective of whether or not the BBS has been returned to service.
But what is missing, if I go to, for example SQUISH and a SQUISH message base,
is the fact that in the toss process, we don't get a new cap and a new stack of
*.MSG's in a directory to clue the askSam thread task, to "Aha! traffic! Munch
in all from the old cap to the new one!" That's how I wrote the interface code,
a hard-coded utility called MSG2ASK.EXE, which I wrote all these years ago and
which is called now to do the task.
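The "old cap to new cap" idea can be sketched as below, again in Python and not as a description of how MSG2ASK.EXE actually works. It assumes a *.MSG area holds files named 1.MSG, 2.MSG and so on, and that the "cap" is the highest message number already imported; new_messages is a hypothetical helper name.

```python
import os
import re

# Assumed naming convention: <number>.MSG, in either case.
MSG_NAME = re.compile(r"^(\d+)\.msg$", re.IGNORECASE)


def new_messages(area_dir, old_cap):
    """Return (new_cap, paths) for every N.MSG whose N is above old_cap."""
    numbered = []
    for name in os.listdir(area_dir):
        match = MSG_NAME.match(name)
        if match and int(match.group(1)) > old_cap:
            numbered.append((int(match.group(1)), os.path.join(area_dir, name)))
    numbered.sort()  # import in message-number order
    new_cap = numbered[-1][0] if numbered else old_cap
    return new_cap, [path for _, path in numbered]
```

Whatever calls this would store new_cap after a successful import and pass it back in as old_cap on the next run. That is exactly the signal that goes missing when the tosser writes into a single packed message base instead of dropping numbered files into a directory.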
At 36 MB of file askSam isn't even cooking yet. It can handle a 2 GB full
relational database and 4 GB under some file systems! Split across an OS/2 LVM
oriented drive, who knows where the limit is! I suspect it's actually larger
still, but at the time it was written, getting to 4GB was a feat!
Advice?
Suggestions?
Mike @ 1:117/3001
--- Maximus/2 3.01
* Origin: Ziplog Public Port (1:117/3001)