# 2024-07-02 - Dinosaur Hunting Part 2
Recently i scoured my recipe collection for recipes with ALL
UPPERCASE TEXT. I wrote a quick algorithm in AWK to sort recipes by
percentage of uppercase letters versus lowercase letters. I called
this "dinosaur hunting."
My algorithm has a weakness. Suppose there is an otherwise normal
recipe file that has only one paragraph with all uppercase letters.
This can fall below the 30% uppercase letter threshold, so it would
not be reported.
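
As a rough illustration of this weakness, here is a minimal sketch. The original percentage script is not shown in this post, so the metric below (uppercase letters as a share of all letters) and the sample recipe text are assumptions made for the example.

```shell
# Hypothetical sample recipe: mostly mixed case, with one all-caps paragraph.
sample=$(mktemp)
cat >"$sample" <<'EOF'
Mix the flour and sugar in a large bowl.
Add the eggs and milk, then stir until smooth.
Pour the batter into a greased pan and bake.
DO NOT OPEN
THE OVEN DOOR
WHILE BAKING
Cool on a rack before slicing and serving.
EOF

# Assumed metric: uppercase letters as a percentage of all letters.
# gsub() returns the number of matches it replaced; "&" puts each one back.
pct=$(LC_ALL=C awk '
{ upper += gsub(/[A-Z]/, "&"); lower += gsub(/[a-z]/, "&") }
END { if (upper + lower > 0) print int(100 * upper / (upper + lower)) }
' "$sample")

echo "uppercase: ${pct}%"
rm -f "$sample"
```

The sample contains three consecutive all-caps lines, yet the file as a whole stays well under the 30% threshold, so a percentage check alone would not report it.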
I wrote a new algorithm to find recipes with at least 3 consecutive
lines that are all uppercase.
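$ cat >caps2.awk <<'__EOF__'
# Split each line into one field per character. An empty FS does this
# in gawk and most other awks, but POSIX leaves it unspecified.
BEGIN {
    FS = ""
}

# Skip lines starting with MMMMM; check every other line.
!/^MMMMM/ {
    # Reset the run of all-caps lines at the start of each file.
    if (lfn != FILENAME) {
        wasallcaps = 0
    }
    lcase = 0
    ucase = 0
    for (i = 1; i <= NF; i++) {
        if (match($i, /[a-z]/)) {
            lcase++
        } else if (match($i, /[A-Z]/)) {
            ucase++
        }
    }
    # A line counts as all caps if it has uppercase letters and no
    # lowercase ones.
    if (ucase == 0 || lcase > 0) {
        wasallcaps = 0
    } else {
        wasallcaps++
        if (wasallcaps > 2) {
            dinosaurs[FILENAME] = 1
        }
    }
    lfn = FILENAME
}

END {
    for (fn in dinosaurs) {
        print fn
    }
}
__EOF__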
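
The same rule can also be sketched more compactly by testing each whole line with regular expressions instead of counting characters, which avoids relying on an empty FS. This is an alternative sketch, not the script above; the sample files and the names `dir`, `run`, `seen`, and `found` are made up for the example.

```shell
# Two hypothetical sample files: one with a three-line all-caps run, one without.
dir=$(mktemp -d)
cat >"$dir/dino.txt" <<'EOF'
Mix the flour and sugar.
DO NOT OPEN
THE OVEN DOOR
WHILE BAKING
Cool before slicing.
EOF
cat >"$dir/plain.txt" <<'EOF'
Mix the flour and sugar.
SERVE WARM
Cool before slicing.
EOF

found=$(LC_ALL=C awk '
FNR == 1 { run = 0 }     # new file: reset the run length
/^MMMMM/ { next }        # skip MMMMM marker lines, as caps2.awk does
/[A-Z]/ && !/[a-z]/ {    # has uppercase letters and no lowercase ones
    if (++run >= 3 && !seen[FILENAME]++) print FILENAME
    next
}
{ run = 0 }              # any other line breaks the run
' "$dir/dino.txt" "$dir/plain.txt")

echo "$found"
rm -rf "$dir"
```

Only dino.txt should be listed: plain.txt has a single all-caps line, which never reaches a run of three.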
Then i ran this script against all recipe files:
$ find moar/ascii -type f | xargs awk -f caps2.awk >clis
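
By default xargs splits its input on whitespace, so a file name containing a space would reach awk as two separate arguments. If that matters for a collection, find's `-exec ... {} +` form is one alternative. The sketch below uses a made-up directory and file name to show that the name arrives intact.

```shell
# Hypothetical directory holding one file whose name contains a space.
dir=$(mktemp -d)
printf 'A\nB\nC\n' >"$dir/with space.txt"

# -exec ... {} + hands each path to awk as a single argument.
names=$(find "$dir" -type f -exec awk 'FNR == 1 { print FILENAME }' {} +)

echo "$names"
rm -rf "$dir"
```

Applied to the command above, this would read: find moar/ascii -type f -exec awk -f caps2.awk {} + >clis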
This revealed around 20 dinosaurs missed by my original algorithm.
tags: bencollver,retrocomputing,technical