| # 2024-07-02 - Dinosaur Hunting Part 2 | |
| Recently i scoured my recipe collection for recipes with ALL | |
| UPPERCASE TEXT. I wrote a quick algorithm in AWK to sort recipes by | |
| percentage of uppercase letters versus lowercase letters. I called | |
| this "dinosaur hunting." | |
| My algorithm has a weakness. Suppose there is an otherwise normal | |
| recipe file that has only one paragraph with all uppercase letters. | |
| Such a file can fall below the 30% uppercase letter threshold, so it | |
| would not be reported. | |
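To make the weakness concrete, here is a small sketch. The sample recipe is made up, and the measure (uppercase letters as a share of all letters) is only an assumption about how the earlier script scored files, since that script is not shown here.

```shell
# Made-up sample: a mostly lowercase recipe with one shouted paragraph.
tmp=$(mktemp -d)
cat >"$tmp/sample.txt" <<'EOF'
Title: Example Stew

Brown the onions in a little oil, then add the stock and simmer
gently for about an hour until everything is tender.

SERVE HOT WITH BREAD.
DO NOT BOIL.
KEEP LEFTOVERS COLD.

Season to taste with salt and pepper before serving.
EOF

# Assumed measure: uppercase letters as a percentage of all letters.
# gsub() returns the number of substitutions, which counts the matches.
pct=$(LC_ALL=C awk '{
    line = $0
    upper += gsub(/[A-Z]/, "", line)
    lower += gsub(/[a-z]/, "", line)
} END {
    total = upper + lower
    printf "%d", (total ? 100 * upper / total : 0)
}' "$tmp/sample.txt")

echo "uppercase share: ${pct}%"
rm -r "$tmp"
```

The sample has three consecutive all-uppercase lines, yet its overall share stays under the 30% threshold, so a percentage-based sort would rank it with the ordinary recipes.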
| I wrote a new algorithm to find recipes with at least 3 consecutive | |
| lines that are all uppercase. | |
| $ cat >caps2.awk <<'__EOF__' | |
| BEGIN { | |
|     # Empty FS makes each character its own field (gawk, mawk) | |
|     FS="" | |
| } | |
| # Skip lines that start with MMMMM | |
| !/^MMMMM/ { | |
|     # Reset the run counter at the start of each file | |
|     if (lfn != FILENAME) { | |
|         wasallcaps = 0 | |
|     } | |
|     lcase = 0 | |
|     ucase = 0 | |
|     for (i = 1; i <= NF; i++) { | |
|         if (match($i, /[a-z]/)) { | |
|             lcase++ | |
|         } else if (match($i, /[A-Z]/)) { | |
|             ucase++ | |
|         } | |
|     } | |
|     if (ucase == 0 || lcase > 0) { | |
|         # Not an all-uppercase line: the run is broken | |
|         wasallcaps = 0 | |
|     } else { | |
|         wasallcaps++ | |
|         if (wasallcaps > 2) { | |
|             dinosaurs[FILENAME] = 1 | |
|         } | |
|     } | |
|     lfn = FILENAME | |
| } | |
| END { | |
|     for (fn in dinosaurs) { | |
|         print fn | |
|     } | |
| } | |
| __EOF__ | |
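The per-character loop in caps2.awk boils down to two questions about each line: does it contain an uppercase letter, and does it contain no lowercase letter? Below is a sketch of the same rule written as whole-line regex tests. It is an alternative, not the script used for this post. The helper name find_dinosaurs and the sample files are hypothetical, and LC_ALL=C is added as a precaution so that the bracket ranges are not affected by locale collation.

```shell
# Sketch: same rule as caps2.awk, tested per line instead of per character.
# find_dinosaurs is a hypothetical helper name, not part of the original.
find_dinosaurs() {
    LC_ALL=C awk '
    FNR == 1 { run = 0 }        # new file: reset the run counter
    /^MMMMM/ { next }           # skip MMMMM lines, as the original does
    /[A-Z]/ && !/[a-z]/ {       # has uppercase and no lowercase letters
        if (++run >= 3 && !(FILENAME in seen)) {
            seen[FILENAME] = 1
            print FILENAME      # report each file once, in input order
        }
        next
    }
    { run = 0 }                 # any other line breaks the run
    ' "$@"
}

# Made-up sample files: one dinosaur, one ordinary recipe.
demo=$(mktemp -d)
printf 'Mix well.\nBAKE AT 350.\nCOOL ON RACK.\nSERVE WARM.\n' >"$demo/dino.txt"
printf 'Mix well.\nBAKE AT 350.\nThen cool.\nSERVE WARM.\nENJOY.\n' >"$demo/ok.txt"
find_dinosaurs "$demo/dino.txt" "$demo/ok.txt"
```

This variant does not rely on an empty FS, whose behavior POSIX leaves unspecified, and it prints each match as soon as it is found instead of collecting names for the END block. The run described next uses the original caps2.awk.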
| Then i ran this script against all recipe files: | |
| $ find moar/ascii -type f | xargs awk -f caps2.awk >clis | |
| This revealed around 20 dinosaurs missed by my original algorithm. | |
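One caveat about the pipeline above: xargs splits its input on blanks and treats quotes specially, so a recipe file whose name contains a space would reach awk as two broken arguments. A hedged sketch of one way around that is to let find pass the names itself with -exec ... {} +. The demo directory and file name here are made up for illustration.

```shell
# Demo directory with a made-up file name that contains a space.
dir=$(mktemp -d)
printf 'BAKE.\nCOOL.\nSERVE.\n' >"$dir/shouted recipe.txt"

# find hands the names to awk as arguments, so no word splitting occurs.
# The inline program only echoes each FILENAME to show what awk received.
names=$(find "$dir" -type f -exec awk 'FNR == 1 { print FILENAME }' {} +)
echo "$names"

# The same idea applied to the recipe collection would be:
#   find moar/ascii -type f -exec awk -f caps2.awk {} + >clis
rm -r "$dir"
```

Like xargs, find may split a very long file list across several awk invocations. That is harmless here because each file is still read whole by exactly one invocation.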
| tags: bencollver,retrocomputing,technical | |