# 2024-07-02 - Dinosaur Hunting Part 2

Recently i scoured my recipe collection for recipes with ALL
UPPERCASE TEXT. I wrote a quick algorithm in AWK to sort recipes by
percentage of uppercase letters versus lowercase letters. I called
this "dinosaur hunting."

My algorithm has a weakness. Suppose there is an otherwise normal
recipe file that has only one paragraph with all uppercase letters.
This can fall below the 30% uppercase letter threshold, so it would
not be reported.
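
To make the weakness concrete, here is a small sketch with a made-up recipe: four ordinary lines followed by a three-line paragraph in all caps. It assumes the percentage is the share of uppercase letters among all letters, which may differ from how the first script computed it. The file contents and the variable names are invented for the example.

```shell
# Made-up recipe text: mostly normal, with one all-caps paragraph at the end.
tmp=$(mktemp -d)
cat >"$tmp/sample.txt" <<'EOF'
Combine the flour, sugar, and oats in a large bowl.
Cut in the butter until the mixture is crumbly.
Spread the apples in a greased baking dish.
Sprinkle the topping over the apples and bake.
NOTE: DO NOT
OVERBAKE THE
TOPPING.
EOF

# gsub() returns how many characters it replaced, so it doubles as a counter.
# Assumption: percentage = uppercase letters / all letters.
pct=$(awk '
{
    line = $0
    ucase += gsub(/[A-Z]/, "", line)
    lcase += gsub(/[a-z]/, "", line)
}
END {
    printf "%d", 100 * ucase / (ucase + lcase)
}' "$tmp/sample.txt")

echo "uppercase letters: $pct%"
```

The all-caps paragraph is three lines long, yet the file as a whole stays well under the 30% threshold, so a percentage-based check would pass it by.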

I wrote a new algorithm to find recipes with at least 3 consecutive
lines that are all uppercase.

$ cat >caps2.awk <<'__EOF__'
BEGIN {
    # An empty FS makes every character its own field. gawk supports
    # this; POSIX leaves the behavior unspecified.
    FS=""
}
!/^MMMMM/ {
    # Reset the run counter when a new file starts.
    if (lfn != FILENAME) {
        wasallcaps = 0
    }
    lcase = 0
    ucase = 0
    for (i = 1; i <= NF; i++) {
        if (match($i, /[a-z]/)) {
            lcase++
        } else if (match($i, /[A-Z]/)) {
            ucase++
        }
    }
    # A line is all caps if it has uppercase letters and no lowercase
    # ones. Any other line, including a blank one, breaks the run.
    if (ucase == 0 || lcase > 0) {
        wasallcaps = 0
    } else {
        wasallcaps++
        if (wasallcaps > 2) {
            dinosaurs[FILENAME] = 1
        }
    }
    lfn = FILENAME
}
END {
    for (fn in dinosaurs) {
        print fn
    }
}
__EOF__
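
The same rule can also be sketched without FS="", by testing each whole line against two patterns. This is an alternative formulation for illustration, not the script that produced the results in this post. The sample files and the names run, seen, and found are made up.

```shell
# Two made-up files: one with three all-caps lines in a row, one without.
tmp=$(mktemp -d)
printf '%s\n' 'Mix well.' 'BAKE AT 350' 'FOR 40 MINUTES' 'UNTIL GOLDEN.' \
    'Serve warm.' >"$tmp/dino.txt"
printf '%s\n' 'Mix well.' 'NOTE:' 'SERVES 4' 'Serve warm.' 'ENJOY' \
    >"$tmp/plain.txt"

found=$(awk '
FNR == 1 { run = 0 }              # new file: reset the run counter
/^MMMMM/ { next }                 # skip MMMMM lines, as caps2.awk does
/[A-Z]/ && !/[a-z]/ {             # some uppercase, no lowercase
    if (++run == 3 && !seen[FILENAME]++) {
        print FILENAME            # report each file at most once
    }
    next
}
{ run = 0 }                       # any other line breaks the run
' "$tmp/dino.txt" "$tmp/plain.txt")

echo "$found"
```

Only dino.txt is reported. plain.txt has three all-caps lines in total, but never three in a row. Unlike the END loop in caps2.awk, this variant prints file names in the order they were read.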

Then i ran this script against all recipe files:

$ find moar/ascii -type f | xargs awk -f caps2.awk >clis

This revealed around 20 dinosaurs missed by my original algorithm.
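
One way to see what a new list contains that an old one does not is to compare the two with comm. This is only a sketch: the post does not say how the comparison was done, and the old list's file name and the sample paths here are invented.

```shell
# Made-up result lists. "old" stands in for the first script's output,
# "clis" for the output of caps2.awk.
tmp=$(mktemp -d)
printf '%s\n' moar/ascii/b.txt moar/ascii/a.txt >"$tmp/old"
printf '%s\n' moar/ascii/c.txt moar/ascii/a.txt moar/ascii/d.txt >"$tmp/clis"

# comm needs sorted input.
sort "$tmp/old" >"$tmp/old.sorted"
sort "$tmp/clis" >"$tmp/clis.sorted"

# -13 hides lines unique to the first file and lines common to both,
# leaving only the lines unique to the second file.
missed=$(comm -13 "$tmp/old.sorted" "$tmp/clis.sorted")

echo "$missed"
```

Piping the comm output through wc -l would give the number of newly found files.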

tags: bencollver,retrocomputing,technical