Subj : Random line from a file
To   : All
From : Amcleod
Date : Tue Aug 21 2001 05:39 am

The question arises periodically -- how to select a line randomly from a text
file?  If the file were filled with fixed-length records you could compute the
offset of a random line and FSEEK/FREAD the apropriate line.  Alas, text files
are annoyingly variable in line-length.  Or you could read all the lines into
an array (simulated on disk if needed) and select randomly from the array.
Ugh!  So here is my method-of-choice, which reads through the entire file from
top to bottom, and randomly picks one line out of the file:

 #----------------------------------------------------#
 # randomly select one random-length line from a file #
 #----------------------------------------------------#

 !INCLUDE FILE_IO.INC

 STR randomfile inbuffer winner
 INT handle line_count rval cval

 #--------------------#
 # set up some values #
 #--------------------#
 SET randomfile "c:\\sbbs\\exec\\oneliner.txt"
 SET line_count 0
 SET winner "Default Value -- in case routine fails"

 #---------------#
 # open the file #
 #---------------#
 FOPEN handle O_RDONLY randomfile
 IF_FALSE
   PRINTF "Error opening %s\n" randomfile
   RETURN
 END_IF

 #----------------------------------------#
 # loop through entire file, line by line #
 #----------------------------------------#
 :Loop
   # check for End-of-File
   FEOF handle
   IF_TRUE
       # we're done!
       GOTO Were_Done
   END_IF

   # read line -- watch for read failure
   FREAD_LINE handle inbuffer
   IF_FALSE
       # we're done!
       GOTO Were_Done
   END_IF
         ADD line_count 1

   # compute random value and comparison value
         RANDOM rval 1000000
         SET cval 1000000
         DIV cval line_count

         # is this line a winner, replacing previous winner?
         COMPARE rval cval
         IF_LESS
           # we have a winner
               COPY winner inbuffer
         END_IF

   # check remaining lines
   GOTO Loop

 #------------------------------------------------------------------------#
 # we should have a winner by this point, and can do with it what we want #
 #------------------------------------------------------------------------#
 :Were_Done
 PRINT winner

It's quite clever how it works (no, I didn't think of it first!).  It reads
each line and determines whether that line would have been selected _assuming_
there are no more lines in the file.  If it _does_ find another line, it
determines whether it would have been chosen over what ever line had _already_
been chosen.  Like this:

It reads the first line.  Possibly the ONLY line.  So this line must be
selected 100% of the time.  Hence Line #1 is selected randomly 1/1 of the time.
Then it reads the second (and possibly last) line.  If there are only two lines
in the file, then there is a 1/2 chance that line #2 will be selected.  Should
a random comparison show that the 1/2 chance is met, line #2 replaces what was
previoulsy selected (IOW line #1).  Now it reads line #3.  This line is 1/3
likely to be selected.  If it is, it replaces the previously selected line
(either #1 or #2).  Line #4 replaces the previously selected line 1/4 of the
time.  Line #5 replaces the previous winner 1/5 of the time... and so on.

The only thing tricky is that since we don't have FP values, we have to
generate integer random numbers and scale the count accordingly.  (You can't
have values of 1/4 = 0.25 to compare with randiom numbers between 0 and 1, so
you have to use a scaling factor -- 1,000,000 in this case -- and compare
1000000/4 with a random number between 0 and 1000000.)

Caution -- this routine will select blank lines as well as any other, so watch
out for blank lines PARTICULARLY at the end of the file.  Or check for blank
lines immediately after reading and "GOTO Loop" immediately, without adjusting
the counter, if you find one.  Skip Comments the same way.

You could also select "cookies" which consist of more than one line using a
method similar to this.  You'd have to separate groups of lines somehow (say a
blank line), accumulate all non-blank lines as a possible winner, and when you
discover a blank, THEN you incriment your counter, make your random check, and
if successful replace the previous winning lines with the new set of lines you
have accumulated.

I've run this algorythm in a loop 250,000 times with a file containing 25
different lines, and each line is selected 10,000 times +/- 1% which is good
enough for government work.

---
� Synchronet � Vertrauen � Home of Synchronet � telnet://vert.synchro.net