Subj : Committing file changes
To   : Coridon Henshaw
From : David Noon
Date : Sun Aug 20 2000 01:00 am

Hi Coridon,

Replying to a message of Coridon Henshaw to David Noon:

CH> I'm building an open-source databasing offline Usenet news system,
CH> basically along the lines of standard Fidonet message tossers and
CH> readers, except designed from the ground up for Usenet news.  As I
CH> intend the system to be portable, I'd like to keep the number of
CH> platform-specific API calls to an absolute minimum.

That poses some difficulties. A combination of safety, performance and
platform-independence is a big ask. I would tend to compromise that last one
before I compromised the first two.

DN>> Firstly, you should not be doing buffered I/O if your updates must be
DN>> committed immediately, so you should not use fopen() and fwrite()
DN>> without a setbuf() call to suppress buffer allocation.  Better yet,
DN>> you should consider using open() and write() instead, and use the
DN>> UNIX-like unbuffered I/O routines.

CH> I'm concerned that disabling buffering entirely is going to hurt
CH> performance very badly as my application does lots of short (4 to 256
CH> byte) IO calls.  Relying on the disk cache to handle this kind of
CH> load seems a bit wasteful.

Since 4 to 256 bytes does not constitute a typical Usenet article, those short
writes would not be your logical syncpoints. You should be physically writing
the data to disk at your syncpoints, and only at your syncpoints.
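
To make that concrete, something along these lines is what I have in mind.
This is only a rough sketch, assuming a POSIX-style fsync() is available [I
believe DosResetBuffer() is the rough OS/2 equivalent]; append_record() and
syncpoint() are just illustrative names:

  /* Buffered stdio writes between syncpoints; a real flush to disk only
     at the syncpoint itself. */
  #include <stdio.h>
  #include <unistd.h>                     /* fileno(), fsync() */

  /* The many short writes that make up one article can stay buffered;
     the C library coalesces them for you. */
  static int append_record(FILE *db, const void *buf, size_t len)
  {
      return fwrite(buf, 1, len, db) == len ? 0 : -1;
  }

  /* Called once per complete article, i.e. per logical unit of work:
     push the stdio buffer to the kernel, then force the kernel to push
     it to the platter. */
  static int syncpoint(FILE *db)
  {
      if (fflush(db) != 0)
          return -1;
      return fsync(fileno(db));
  }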

DN>> Moreover, if your data resources are critically important then you
DN>> should be handling any traps that occur in your program and cleaning
DN>> up the critical data resources in an orderly manner. This is far and
DN>> away the most professional approach to the situation. About the only
DN>> things you can't handle are kernel level traps and power outages.

CH> The problem I ran into was that the kernel trapped (for reasons
CH> unrelated to this project) a few hours after I wrote an article into
CH> the article database.  Since the database was still open (I leave the
CH> article reader running 24x7), the file system structures were
CH> inconsistent enough that CHKDSK truncated the database well before
CH> its proper end point.  As you say, catching exceptions wouldn't help
CH> much here.

The flip side is that kernel traps are far less frequent than application
traps, especially during development of the application. If your data integrity
is critical, you should not only be handling any exceptions that arise, but
also be rolling back to your most recent syncpoint when an error does occur.
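
If the article database is an append-only file, the rollback can be as simple
as remembering the committed length at each syncpoint and truncating back to
it when a unit of work fails. Again only a sketch, assuming POSIX ftruncate(),
with illustrative names:

  /* Roll back to the most recent syncpoint for an append-only file. */
  #include <stdio.h>
  #include <unistd.h>                     /* fsync(), ftruncate() */

  static long committed_len;              /* file length at last syncpoint */

  static int syncpoint(FILE *db)
  {
      if (fflush(db) != 0 || fsync(fileno(db)) != 0)
          return -1;
      committed_len = ftell(db);
      return committed_len < 0 ? -1 : 0;
  }

  static int rollback(FILE *db)
  {
      /* Discard everything written since the last syncpoint. */
      if (fflush(db) != 0)
          return -1;
      if (ftruncate(fileno(db), committed_len) != 0)
          return -1;
      return fseek(db, committed_len, SEEK_SET);
  }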

CH> My database format and engine implementations are robust enough to
CH> cope with applications dying unexpectedly without finishing write
CH> operations; they're not robust enough to handle boot-up CHKDSK
CH> removing 80Kb of data from the end of a 100Kb file.

So you do have a syncpoint architecture, then?

DN>> In your situation, I would have used the second facility before
DN>> considering any intermediate commits.

CH> It's not intermediate commits I need: what I need is some way to flush
CH> out write operations made to files which might be open for days or
CH> weeks at a time.

That's what an intermediate commit is.

The way industrial-strength database management systems work [since at least
the days of IMS/360, over 30 years ago] is that an application has defined
within it points in its execution where a logical unit of work is complete and
the state of the data on disk should be synchronized with the state of the
data in memory; this is how the term "syncpoint" arose, and the processing
between syncpoints became known as a transaction. The process of writing the
changes in data to disk became known as committing the changes. The SQL
statement that performs this operation under DB2, Oracle, Sybase and other
RDBMS's is COMMIT.

These RDBMS's also have another statement, coded as ROLLBACK. This backs out a
partially complete unit of work when an error condition has arisen. The upshot
is that the content of the database on disk can be assured to conform to the
data model the application is supposed to support. It does not mean that every
byte of input has been captured; it means, instead, that the data structures on
disk are consistent with some design.
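
Inside an application, one article per unit of work looks something like this.
This is purely a sketch: db_exec() and struct article are made-up stand-ins
for whatever interface the DBMS actually provides [embedded SQL, CLI, ODBC and
so on]:

  struct article;                                  /* parsed article */
  extern int db_exec(void *db, const char *sql);   /* hypothetical helper */

  static int store_article(void *db, const struct article *art)
  {
      (void)art;    /* the real code would bind fields from *art */
      if (db_exec(db, "INSERT INTO headers ...") != 0 ||
          db_exec(db, "INSERT INTO bodies ...") != 0) {
          db_exec(db, "ROLLBACK");    /* unit of work failed: back it out */
          return -1;
      }
      return db_exec(db, "COMMIT");   /* syncpoint: changes are on disk */
  }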

This seems to me to be the type of activity you really want to perform. One of
your problems is that your input stream is not persistent, as it would be a
socket connected to an NNTP server [if I read your design correctly, and assume
you are coding from the ground up]. This means that you need to be able to
restart a failed instance of the application, resuming from its most recent
successful syncpoint. The usual method to deal with this is to use a log or
journal file that keeps track of "in flight" transactions; the journal is where
your I/O remains unbuffered. If your NNTP server allows you to re-fetch
articles -- and most do -- you can keep your journal in RAM or on a RAMDISK;
this prevents performance hits for doing short I/O's.
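
In outline, the journal side might look like the following. Again this is only
a sketch, assuming POSIX open()/write() and the O_SYNC flag so that each
record really reaches the disk before the article is counted as received;
journal_open() and journal_append() are illustrative names:

  /* Unbuffered journal of in-flight articles.  On restart, anything
     journalled but not yet committed is replayed or re-fetched. */
  #include <fcntl.h>
  #include <string.h>
  #include <unistd.h>

  static int journal_fd = -1;

  static int journal_open(const char *path)
  {
      /* O_SYNC makes each write() synchronous; calling fsync() after
         every record would do as well. */
      journal_fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0600);
      return journal_fd < 0 ? -1 : 0;
  }

  static int journal_append(const char *msgid)
  {
      /* One line per in-flight article: its Message-ID. */
      size_t len = strlen(msgid);

      if (write(journal_fd, msgid, len) != (ssize_t)len)
          return -1;
      return write(journal_fd, "\n", 1) == 1 ? 0 : -1;
  }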

This design and implementation seem like a lot of work, and I suppose they are.
But some old-timers were doing this on machines with only 128KiB of RAM when I
was in high school, so a modern PC should handle it easily. To save yourself a
lot of coding, you might care to use a commercial DBMS; a copy of DB2 UDB
Personal Developer Edition can be had free for the download, or on CD for the
price of the medium and shipping. Start at:
   http://www.software.ibm.com/data/db2/
and follow the links to the download areas, or ask Indelible Blue about CD
prices.

Using a multi-platform commercial product will provide you with platform
independence, as well as safety. It is the simplest and most robust approach
unless you are prepared either to do a lot of coding or to compromise on the
safety of your application.

Regards

Dave
<Team PL/I>

--- FleetStreet 1.25.1
* Origin: My other computer is an IBM S/390 (2:257/609.5)