# 2024-07-02 - Dinosaur Hunting With AWK

The 80,000 or so recipes in MOAR came from a dump of the web site
formerly at soar.berkeley.edu. Most of the recipes were collected
from the BBS scene going back into the days of yore when dinosaurs
roamed cyberspace and real programmers wrote code using bits of
shells and strings. Some hardware and software DID NOT SUPPORT
LOWERCASE LETTERS AT ALL. Consequently, some of the recipes used
ALL CAPITAL LETTERS. Some recipes were normal except either the
ingredients were all uppercase, or the instructions were all
uppercase.

I PERSONALLY FIND DINOSAUR LANGUAGE DIFFICULT TO READ, SO I RESOLVED
TO FIND THESE RECIPES AND FIX THEM ONCE AND FOR ALL.

I wrote a quick awk script to report the percentage of capital
letters in each recipe file.

    $ cat >caps.awk <<'__EOF__'
    BEGIN {
        # Empty FS: each character becomes its own field (gawk, mawk).
        FS=""
    }
    {
        # Count lowercase and uppercase letters per file.
        for (i = 1; i <= NF; i++) {
            if (match($i, /[a-z]/)) {
                lcase[FILENAME]++
            } else if (match($i, /[A-Z]/)) {
                ucase[FILENAME]++
            }
        }
    }
    END {
        for (fn in ucase) {
            lnum = lcase[fn]
            unum = ucase[fn]
            if (unum > 0) {
                # Divide by all letters; an all-caps file has lnum == 0.
                pct = int(100 * unum / (lnum + unum))
                printf "%d\t%s\n", pct, fn
            }
        }
    }
    __EOF__

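The script depends on an empty FS splitting each line into single characters. gawk and mawk do this, but as far as I know POSIX does not define it, and some awks treat the whole line as one field. The sketch below is an alternative, not the script used for MOAR: it counts letters with gsub(), which returns the number of substitutions it made. The sample file contents and the LC_ALL=C setting are assumptions added for illustration.

```shell
# Made-up sample recipe text, only for demonstration.
tmp=$(mktemp)
printf 'MIX WELL.\nBake at 350 degrees.\n' >"$tmp"

# Replacing each match with itself ("&") leaves the text unchanged,
# so the return value of gsub() is simply the number of letters matched.
# LC_ALL=C keeps [A-Z] and [a-z] limited to ASCII letters.
caps_report=$(LC_ALL=C awk '
{
    line = $0
    ucase[FILENAME] += gsub(/[A-Z]/, "&", line)
    lcase[FILENAME] += gsub(/[a-z]/, "&", line)
}
END {
    for (fn in ucase) {
        total = ucase[fn] + lcase[fn]
        if (total > 0)
            printf "%d\t%s\n", int(100 * ucase[fn] / total), fn
    }
}' "$tmp")
echo "$caps_report"
rm -f "$tmp"
```

The sample has 8 uppercase letters out of 20 letters in total, so the first column of the report is 40.
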
Then I ran this script against all recipe files:

    $ find moar/ascii -type f | xargs awk -f caps.awk | sort -n >clis

Using trial and error I found that files with more than 30% uppercase
letters were good candidates to be fixed. This identified
659 dinosaurs. It took some doing, but now these recipes are fixed
to be more readable on MOAR.

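The 30% cutoff can be applied to the report with one more awk filter. This is a sketch, assuming the tab-separated "percent, filename" format that caps.awk prints into clis. The sample rows and file names below are made up for illustration.

```shell
# Made-up sample report in the same "percent<TAB>filename" format.
report=$(mktemp)
printf '12\tmoar/ascii/a.txt\n31\tmoar/ascii/b.txt\n100\tmoar/ascii/c.txt\n' >"$report"

# Keep only the files above the 30% threshold and print their names.
candidates=$(awk -F '\t' '$1 > 30 { print $2 }' "$report")
echo "$candidates"
rm -f "$report"
```

Against the real report, the same filter would read clis instead of the temporary file.
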
gopher://tilde.pink/1/~bencollver/recipes/

tags: bencollver,retrocomputing,technical