/~bencollver/log/2024-07-07-dinosaur-hunting-part-2 on tilde.pink

	# 2024-07-02 - Dinosaur Hunting Part 2

	Recently i scoured my recipe collection for recipes with ALL
	UPPERCASE TEXT. I wrote a quick algorithm in AWK to sort recipes by
	percentage of uppercase letters versus lowercase letters. I called
	this "dinosaur hunting."

	My algorithm has a weakness. Suppose there is an otherwise normal
	recipe file that has only one paragraph with all uppercase letters.
	This can fall below the 30% uppercase letter threshold, so it would
	not be reported.
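
	To make the weakness concrete, here is a minimal sketch. The sample
	text is made up, and the ratio (uppercase letters divided by all
	letters) is an assumption about how the percentage is computed, not
	the original script.

```shell
# Hypothetical sample: six lowercase lines plus one three-line
# uppercase paragraph in the middle.
sample=$(mktemp)
cat >"$sample" <<'EOF'
grandma's apple cake
cream the butter and sugar until light and fluffy.
beat in the eggs one at a time.
fold in the flour and the chopped apples.
NOTE FROM THE BBS
DO NOT OVERBAKE
SERVE WARM
pour the batter into a greased pan and bake for forty minutes.
cool on a rack before slicing into squares.
EOF

# Count uppercase and lowercase letters over the whole file.
# gsub() returns the number of replacements, so it doubles as a counter.
# Assumed formula: 100 * uppercase / (uppercase + lowercase).
pct=$(LC_ALL=C awk '
{
    line = $0
    ucase += gsub(/[A-Z]/, "", line)
    lcase += gsub(/[a-z]/, "", line)
}
END {
    printf "%d", 100 * ucase / (ucase + lcase)
}' "$sample")

echo "uppercase letters: ${pct}%"
rm -f "$sample"
```

	Under these assumptions the sample lands well below 30%, even
	though it contains a three-line uppercase paragraph.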

	I wrote a new algorithm to find recipes with at least 3 consecutive
	lines that are all uppercase.

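	$ cat >caps2.awk <<'__EOF__'
	BEGIN {
	    # Empty FS: each character becomes its own field. This is a
	    # common extension (gawk, mawk); POSIX leaves it unspecified.
	    FS = ""
	}
	!/^MMMMM/ {
	    # Reset the streak when a new file starts.
	    if (lfn != FILENAME) {
	        wasallcaps = 0
	    }
	    lcase = 0
	    ucase = 0
	    for (i = 1; i <= NF; i++) {
	        if (match($i, /[a-z]/)) {
	            lcase++
	        } else if (match($i, /[A-Z]/)) {
	            ucase++
	        }
	    }
	    # A line is all caps when it has letters and none are lowercase.
	    if (ucase == 0 || lcase > 0) {
	        wasallcaps = 0
	    } else {
	        wasallcaps++
	        if (wasallcaps > 2) {
	            dinosaurs[FILENAME] = 1
	        }
	    }
	    lfn = FILENAME
	}
	END {
	    for (fn in dinosaurs) {
	        print fn
	    }
	}
	__EOF__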

	Then i ran this script against all recipe files:

	$ find moar/ascii -type f | xargs awk -f caps2.awk >clis

	This revealed around 20 dinosaurs missed by my original algorithm.
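
	A quick way to sanity-check the three-line rule is to run it on
	two tiny files. This is only a sketch: the file names and contents
	are hypothetical, and the AWK program is a condensed variant that
	tests whole lines with regular expressions instead of looping over
	characters. It also uses find -exec instead of xargs, because plain
	xargs splits file names on whitespace.

```shell
# Hypothetical test directory; file names and contents are made up.
dir=$(mktemp -d)
printf '%s\n' 'mix well.' 'BAKE AT 350' 'FOR ONE HOUR' 'THEN COOL' \
    >"$dir/dino file.txt"
printf '%s\n' 'BAKE AT 350' 'for one hour' 'THEN COOL' 'SERVE WARM' \
    >"$dir/normal.txt"

# Condensed variant of the rule: a line counts as all caps when it
# has at least one uppercase letter and no lowercase letter.
found=$(find "$dir" -type f -exec env LC_ALL=C awk '
FNR == 1 { run = 0 }     # new file: reset the streak
/^MMMMM/ { next }        # skip these lines without touching the streak
/[A-Z]/ && !/[a-z]/ {
    if (++run > 2) hit[FILENAME] = 1
    next
}
{ run = 0 }              # any other line breaks the streak
END { for (fn in hit) print fn }
' {} +)

echo "$found"
rm -rf "$dir"
```

	Only the first file has three all-uppercase lines in a row, so it
	should be the only one reported. The second file never gets past
	two in a row.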

	tags: bencollver,retrocomputing,technical
