Subj : Committing file changes
To   : David Noon
From : Coridon Henshaw
Date : Mon Aug 21 2000 05:09 pm

On Sunday August 20 2000 at 08:00, David Noon wrote to Coridon Henshaw:

DN> Since 4-to-256 bytes does not constitute a typical Usenet article, those
DN> would not be your logical syncpoints. You should be physically writing the
DN> data to disk at your syncpoints and only at your syncpoints.

I break articles into 251-byte chunks and write the chunks as a linked list.
Since the database will reuse article chunks which have been freed as a result
of article expiry, an article's linked list need not be sequential.  As such,
when the DB engine writes an article, it writes 251 bytes, reads and updates
the five-byte control structure, then seeks to the next block.  This process
continues until the entire article is written.  It isn't really possible to
break up these writes without giving up the linked-list structure, and with it
either the ability to rapidly grow the database, or the ability of the DB to
reuse existing space as articles are expired.
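
In rough C terms, the write loop looks something like the sketch below.  The
layout of the five-byte control structure is simplified for illustration (a
little-endian 32-bit next-block link plus a one-byte used count); it's the
chaining that matters, not the exact field layout:

#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE   256
#define DATA_SIZE    251            /* article payload per block   */
#define END_OF_CHAIN 0xFFFFFFFFUL   /* link value ending the chain */

/* Write one article as a chain of 256-byte blocks.  'blocks' holds
   the (not necessarily sequential) block numbers allocated for the
   article, e.g. reclaimed from expired articles. */
int write_article(FILE *db, const unsigned char *text, size_t len,
                  const unsigned long *blocks, size_t nblocks)
{
    size_t i;

    for (i = 0; i < nblocks && len > 0; i++)
    {
        unsigned char buf[BLOCK_SIZE];
        size_t chunk = len > DATA_SIZE ? DATA_SIZE : len;
        unsigned long next = (i + 1 < nblocks) ? blocks[i + 1]
                                               : END_OF_CHAIN;

        memset(buf, 0, sizeof buf);
        memcpy(buf, text, chunk);

        /* Five-byte control structure: 32-bit little-endian link to
           the next block, then a one-byte used count (simplified). */
        buf[251] = (unsigned char)( next        & 0xFF);
        buf[252] = (unsigned char)((next >>  8) & 0xFF);
        buf[253] = (unsigned char)((next >> 16) & 0xFF);
        buf[254] = (unsigned char)((next >> 24) & 0xFF);
        buf[255] = (unsigned char)chunk;

        if (fseek(db, (long)(blocks[i] * BLOCK_SIZE), SEEK_SET) != 0)
            return -1;
        if (fwrite(buf, 1, sizeof buf, db) != sizeof buf)
            return -1;

        text += chunk;
        len  -= chunk;
    }
    return 0;
}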

CH>> My database format and engine implementations are robust enough to
CH>> cope with applications dying unexpectedly without finishing write
CH>> operations; they're not robust enough to handle boot-up CHKDSK
CH>> removing 80Kb of data from the end of a 100Kb file.

DN> So you do have a syncpoint architecture, then?

<snip>

While I appreciate your comments, what you suggest is vast overkill for my
application.  The NewsDB engine isn't sophisticated enough to support
syncpoints or rollback.  Think along the lines of Squish MSGAPI rather than
DB2: NewsDB is basically Squish-for-Usenet.  My intention is to produce a
lightweight multiuser offline news system so small groups of users (1-25) can
read Usenet offline without needing to install a full news server.  As a
lightweight alternative to a local news server, NewsDB doesn't need the
overhead of a fully-fledged SQL engine.

NewsDB is decidedly not intended for mission-critical environments; surviving
Anything and Everything isn't part of the design requirements.  Rather, my
intention is to contain common errors to the extent that they can be repaired
by automated repair tools.

DN> This seems to me to be the type of activity you really want to perform.
DN> One of your problems is that your input stream is not persistent, as it
DN> would be a socket connected to a NNTP server [if I read your design
DN> correctly, and assume you are coding from the ground up]. This means
DN> that you need to be able to restart a failed instance of the
DN> application, resuming from its most recent successful syncpoint. The
DN> usual method to deal with this is to use a log or journal file that
DN> keeps track of "in flight" transactions; the journal is where your I/O
DN> remains unbuffered. If your NNTP server allows you to re-fetch articles
DN> -- and most do -- you can keep your journal in RAM or on a RAMDISK;
DN> this prevents performance hits for doing short I/O's.

Just to clarify things: NewsDB isn't a single application.  It's an RFC-based
message base format similar in purpose to Squish and JAM.  I'm writing an
access library (NewsDBLib) to work with the NewsDB format.  I'm also writing
two applications (an importer and a reader) which use NewsDBLib.

At the moment, none of these applications download news.  The importer reads
SOUP packets from disk just so I can avoid messing with NNTP.  Reading from
disk also gives me the flexibility to, at a later date, import other packet
formats such as UUCP news and FTN PKT.
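
For reference, pulling the area list out of a SOUP packet is about this
involved.  As I read the SOUP spec, each line of the AREAS index is a
tab-separated triple of message-file prefix, area name, and format code
(check the spec before relying on that):

#include <stdio.h>
#include <string.h>

/* List the areas named in a SOUP AREAS index file.  Each line is
   assumed to be a tab-separated triple: message-file prefix, area
   name, format code. */
void list_areas(const char *path)
{
    char line[512];
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return;

    while (fgets(line, sizeof line, fp) != NULL)
    {
        char *prefix = strtok(line, "\t\r\n");
        char *area   = strtok(NULL, "\t\r\n");
        char *fmt    = strtok(NULL, "\t\r\n");

        if (prefix != NULL && area != NULL && fmt != NULL)
            printf("%s.MSG -> %s (format %s)\n", prefix, area, fmt);
    }

    fclose(fp);
}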

--- GoldED/2 3.0.1
* Origin: Life sucks and then you croak. (1:250/820)