Subj : Random line from a file
To : All
From : Amcleod
Date : Tue Aug 21 2001 05:39 am
The question arises periodically -- how to select a line randomly from a text
file? If the file were filled with fixed-length records you could compute the
offset of a random line and FSEEK/FREAD the apropriate line. Alas, text files
are annoyingly variable in line-length. Or you could read all the lines into
an array (simulated on disk if needed) and select randomly from the array.
Ugh! So here is my method-of-choice, which reads through the entire file from
top to bottom, and randomly picks one line out of the file:
#----------------------------------------------------#
# randomly select one random-length line from a file #
#----------------------------------------------------#
!INCLUDE FILE_IO.INC
STR randomfile inbuffer winner
INT handle line_count rval cval
#--------------------#
# set up some values #
#--------------------#
SET randomfile "c:\\sbbs\\exec\\oneliner.txt"
SET line_count 0
SET winner "Default Value -- in case routine fails"
#---------------#
# open the file #
#---------------#
FOPEN handle O_RDONLY randomfile
IF_FALSE
PRINTF "Error opening %s\n" randomfile
RETURN
END_IF
#----------------------------------------#
# loop through entire file, line by line #
#----------------------------------------#
:Loop
# check for End-of-File
FEOF handle
IF_TRUE
# we're done!
GOTO Were_Done
END_IF
# compute random value and comparison value
RANDOM rval 1000000
SET cval 1000000
DIV cval line_count
# is this line a winner, replacing previous winner?
COMPARE rval cval
IF_LESS
# we have a winner
COPY winner inbuffer
END_IF
# check remaining lines
GOTO Loop
#------------------------------------------------------------------------#
# we should have a winner by this point, and can do with it what we want #
#------------------------------------------------------------------------#
:Were_Done
PRINT winner
It's quite clever how it works (no, I didn't think of it first!). It reads
each line and determines whether that line would have been selected _assuming_
there are no more lines in the file. If it _does_ find another line, it
determines whether it would have been chosen over what ever line had _already_
been chosen. Like this:
It reads the first line. Possibly the ONLY line. So this line must be
selected 100% of the time. Hence Line #1 is selected randomly 1/1 of the time.
Then it reads the second (and possibly last) line. If there are only two lines
in the file, then there is a 1/2 chance that line #2 will be selected. Should
a random comparison show that the 1/2 chance is met, line #2 replaces what was
previoulsy selected (IOW line #1). Now it reads line #3. This line is 1/3
likely to be selected. If it is, it replaces the previously selected line
(either #1 or #2). Line #4 replaces the previously selected line 1/4 of the
time. Line #5 replaces the previous winner 1/5 of the time... and so on.
The only thing tricky is that since we don't have FP values, we have to
generate integer random numbers and scale the count accordingly. (You can't
have values of 1/4 = 0.25 to compare with randiom numbers between 0 and 1, so
you have to use a scaling factor -- 1,000,000 in this case -- and compare
1000000/4 with a random number between 0 and 1000000.)
Caution -- this routine will select blank lines as well as any other, so watch
out for blank lines PARTICULARLY at the end of the file. Or check for blank
lines immediately after reading and "GOTO Loop" immediately, without adjusting
the counter, if you find one. Skip Comments the same way.
You could also select "cookies" which consist of more than one line using a
method similar to this. You'd have to separate groups of lines somehow (say a
blank line), accumulate all non-blank lines as a possible winner, and when you
discover a blank, THEN you incriment your counter, make your random check, and
if successful replace the previous winning lines with the new set of lines you
have accumulated.
I've run this algorythm in a loop 250,000 times with a file containing 25
different lines, and each line is selected 10,000 times +/- 1% which is good
enough for government work.
---
� Synchronet � Vertrauen � Home of Synchronet � telnet://vert.synchro.net