2024-07-02 - Dinosaur Hunting Part 2
====================================

Recently i scoured my recipe collection for recipes with ALL
UPPERCASE TEXT.  I wrote a quick algorithm in AWK to sort recipes by
percentage of uppercase letters versus lowercase letters.  I called
this "dinosaur hunting."

My algorithm has a weakness.  Suppose there is an otherwise normal
recipe file that has only one paragraph with all uppercase letters.
This can fall below the 30% uppercase letter threshold, so it would
not be reported.
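
To make the weakness concrete, here is a minimal sketch.  The sample
recipe is made up, and the sketch assumes the percentage is uppercase
letters divided by all letters, which may not match the original
script exactly.

```shell
# Hypothetical sample: a normal recipe ending in one uppercase paragraph.
sample=$(mktemp)
cat >"$sample" <<'__EOF__'
Title: Apple Crisp

Peel and slice six apples and put them in a buttered dish.
Sprinkle with cinnamon and a little lemon juice before baking.
Mix the oats, flour, sugar and butter and spread over the top.
Let it cool for a few minutes so the topping can set.

BAKE AT 350 FOR 45 MINUTES.
SERVE WARM WITH CREAM.
DO NOT OVERBAKE.
__EOF__

# gsub() returns the number of matches, so it doubles as a counter.
# Replacing each letter with itself ("&") leaves the line unchanged.
pct=$(LC_ALL=C awk '
    { ucase += gsub(/[A-Z]/, "&"); lcase += gsub(/[a-z]/, "&") }
    END { printf "%d", 100 * ucase / (ucase + lcase) }
' "$sample")
echo "uppercase share: ${pct}%"
rm -f "$sample"
```

The three uppercase lines are outweighed by the lowercase text around
them, so the share stays under 30% and the file is not reported.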

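I wrote a new algorithm to find recipes with at least 3 consecutive
lines that are all uppercase.  A line counts as all uppercase when it
has at least one uppercase letter and no lowercase letters.  Lines
starting with MMMMM are skipped.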

   $ cat >caps2.awk <<'__EOF__'
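   BEGIN {
       # An empty FS makes every character its own field.  This is a
       # common extension (gawk, mawk), not specified by POSIX.
       FS=""
   }
   # Skip lines that start with MMMMM.
   !/^MMMMM/ {
       # Reset the streak when a new file begins.
       if (lfn != FILENAME) {
           wasallcaps = 0
       }
       # Count the lowercase and uppercase letters on this line.
       lcase = 0
       ucase = 0
       for (i = 1; i <= NF; i++) {
           if (match($i, /[a-z]/)) {
               lcase++
           } else if (match($i, /[A-Z]/)) {
               ucase++
           }
       }
       # A line with no uppercase letters, or with any lowercase
       # letter, breaks the streak.
       if (ucase == 0 || lcase > 0) {
           wasallcaps = 0
       } else {
           # Flag the file at 3 all-uppercase lines in a row.
           wasallcaps++
           if (wasallcaps > 2) {
               dinosaurs[FILENAME] = 1
           }
       }
       lfn = FILENAME
   }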
   END {
       for (fn in dinosaurs) {
           print fn
       }
   }
   __EOF__
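
For comparison, here is a sketch of the same streak check written
without the empty FS, for an awk that does not split lines into
characters.  It is a variant, not the script above: it tests the whole
line with two regular expressions and resets the streak on FNR == 1.
The two sample files are hypothetical.

```shell
dir=$(mktemp -d)
printf '%s\n' 'Title: Soup' 'Chop the onions.' 'SIMMER ONE HOUR.' \
    'Serve hot.' >"$dir/normal.txt"
printf '%s\n' 'Title: Stew' 'Brown the beef.' 'ADD WATER.' \
    'SIMMER TWO HOURS.' 'SERVE HOT.' >"$dir/dino.txt"

found=$(LC_ALL=C awk '
    FNR == 1 { run = 0 }           # new file: reset the streak
    /^MMMMM/ { next }              # skipped lines do not touch the streak
    /[A-Z]/ && !/[a-z]/ {          # some uppercase, no lowercase
        if (++run > 2) dinosaurs[FILENAME] = 1
        next
    }
    { run = 0 }                    # any other line breaks the streak
    END { for (fn in dinosaurs) print fn }
' "$dir/normal.txt" "$dir/dino.txt")
echo "$found"
rm -rf "$dir"
```

Only dino.txt has three all-uppercase lines in a row, so it is the
only file name printed.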

Then i ran this script against all recipe files:

   $ find moar/ascii -type f | xargs awk -f caps2.awk >clis
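
One caveat: xargs splits its input on whitespace, so a file name with
a space in it would reach awk as two arguments.  The sketch below
shows one way around that, using find's own -exec with a trailing +,
which batches names like xargs but passes each one intact.  The
directory and file name are made up for the demonstration.

```shell
dir=$(mktemp -d)
printf 'SOUP\n' >"$dir/old recipe.txt"    # hypothetical name with a space

# Print each file name awk receives, to show it arrives in one piece.
seen=$(find "$dir" -type f -exec awk 'FNR == 1 { print FILENAME }' {} +)
echo "$seen"
rm -rf "$dir"
```

For the real run, the equivalent command would be
find moar/ascii -type f -exec awk -f caps2.awk {} + >clis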

This revealed around 20 dinosaurs missed by my original algorithm.

tags: bencollver,retrocomputing,technical
