----------------------------------------
Using ptx to generate one-time pads
March 15th, 2018
----------------------------------------
I have been working my way through coreutils [0] recently when
I came across ptx.
$ apropos ptx
ptx (1) - produce a permuted index of file contents
What the hell does that mean? I know...
$ man ptx
PTX(1) User Commands PTX(1)
NAME
ptx - produce a permuted index of file contents
SYNOPSIS
ptx [OPTION]... [INPUT]... (without -G)
ptx -G [OPTION]... [INPUT [OUTPUT]]
DESCRIPTION
Output a permuted index, including context, of the words
in the input files.
With no FILE, or when FILE is -, read standard input.
Mandatory arguments to long options are mandatory for
short options too.
...
Oh that totally clears it... nope. Still no clue.
So I asked on Mastodon and a few people had some suggestions. In
particular, someone was able to shoot me over to a blog post [1]
which tries to clear up what a 'permuted index' even is. And
that's the key. So check this out:
A while back, before we had badass search engines and hyperlinked
doom shenanigans, manually finding the reference to a word in
a document SUUUUUUUUCKED. So they made this index in the back that
listed all the key terms alphabetically in the middle column of
a page. To the left of that word it would list whatever sentence
fragment led up to it. To the right they'd list the sentence
fragment that followed the term. Finally, the page number. With
that you could jump to the page and eyeball search it yourself.
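If that layout is still fuzzy, here's a rough toy version of the
idea in awk. It's only my sketch of the three-column layout (no
page numbers, no stop words), not how ptx itself works, and the
name kwic is made up for this example.

```shell
#!/bin/sh
# kwic: hypothetical helper, NOT part of coreutils.
# Every word of every input line becomes a keyword in the middle
# column, with the text before it on the left and after it on the right.
kwic() {
    awk '{
        for (i = 1; i <= NF; i++) {
            left = ""; right = ""
            for (j = 1; j < i; j++)
                left = left (j > 1 ? " " : "") $j
            for (j = i + 1; j <= NF; j++)
                right = right (j > i + 1 ? " " : "") $j
            # keyword first so sort can order the index by it
            printf "%s\t%s\t%s\n", tolower($i), left, right
        }
    }' |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 |
    awk -F '\t' '{ printf "%20s  %s  %s\n", $2, $1, $3 }'
}

printf 'once upon a midnight dreary\n' | kwic
```

Each input word gets its own row, sorted by the middle column, so
you can run your eye down the keywords and read the context on
either side.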
ptx has been around since System V and it's pretty much useless,
right? Well, foxy, I think I came up with a fun hobby use-case.
Pick a book with a publicly available canonical plain-text
source. Oh, I dunno, head over to Project Gutenberg [2] or
something and wrestle yourself up some Joyce (or ILLEGAL GERMAN
NOVELS!!!!! [3]). We're gonna shove that badboy into ptx like
a champ. Here we go...
$ curl https://www.gutenberg.org/files/4300/4300-0.txt > ulysses.txt
$ ptx ulysses.txt
SCREEN EXPLODES WITH TEXT FOR SEVERAL MINUTES!!!!!
That's not how that works. Back to manpage!
Hmmm...
...assumes latin-1 charset...
...ignore case, perhaps...
...[.?!][]\"')}]*\\($\\|\t\\| \\)[ \t\n]*...
...Emacs next-error, grumble...
...-w, width, ahha...
ROFF! NO FUCKING WAY!
One of the output formats for ptx is freaking roff! Synchronicity,
baby! [4] Let's try something a little smaller.
$ curl http://www.gutenberg.org/cache/epub/1065/pg1065.txt > theraven.txt
$ ptx -O -f -w 66 theraven.txt > theraven-index.txt
That sorta works (-O is roff output, -f ignores case, -w sets the
width). Ugh, but I'm getting tired. Here's the plan for what's
next:
- Figure out how to format this stuff so I can awk it
- awk so that the text key and one more word to the right are
  the output. Two words with a space between, that's it.
- sort unique that bad-boy by each column in turn so no word
  shows up twice in either column.
- Use whatever words are in your primary list to write a plain
  text message. If your source document is large enough that's
  virtually any word you'd like to use.
- Use awk to replace your words with the one to the right via
  a lookup file
- Send secret message to a friend. The knowledge of which book
  is your cypher is all that's necessary to repeat the process
  in reverse.
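Here's a rough sketch of how that plan could hang together. Big
caveats: it skips ptx entirely and pulls the "key plus one word to
the right" pairs straight out of the text with tr and awk, and it
keeps the first pair it sees for each word instead of doing a sort
unique. The function names (mktable, swap, encode, decode) and the
tiny sample "book" are made up for illustration.

```shell
#!/bin/sh
# Sketch only: mktable, swap, encode and decode are hypothetical names.

# mktable BOOK: print "word replacement" pairs built from neighbouring
# words in BOOK. A pair is kept only if neither word has already been
# used in its column, so the table can be read in both directions.
mktable() {
    LC_ALL=C tr -cs '[:alpha:]' '\n' < "$1" |
    LC_ALL=C tr '[:upper:]' '[:lower:]' |
    awk 'NF {
        if (prev != "" && !(prev in left) && !($0 in right)) {
            left[prev] = 1; right[$0] = 1
            print prev, $0
        }
        prev = $0
    }'
}

# swap FROM TO TABLE: replace every word on stdin that appears in
# column FROM of TABLE with the word in column TO. Unknown words are
# an error, since passing them through would break decoding.
swap() {
    awk -v from="$1" -v to="$2" '
        NR == FNR { map[$from] = $to; next }   # first file: the table
        {
            n = split(tolower($0), w, /[^a-z]+/)
            line = ""
            for (i = 1; i <= n; i++) {
                if (w[i] == "") continue
                if (!(w[i] in map)) {
                    print "not in table: " w[i] > "/dev/stderr"
                    exit 1
                }
                line = line (line == "" ? "" : " ") map[w[i]]
            }
            print line
        }' "$3" -
}

encode() { swap 1 2 "$1"; }
decode() { swap 2 1 "$1"; }

# Tiny stand-in for a real book so the example runs on its own.
tmp=$(mktemp -d)
printf 'once upon a midnight dreary while i pondered weak and weary\n' \
    > "$tmp/book.txt"
mktable "$tmp/book.txt" > "$tmp/pairs.txt"

echo 'a midnight dreary' | encode "$tmp/pairs.txt"
echo 'a midnight dreary' | encode "$tmp/pairs.txt" | decode "$tmp/pairs.txt"
```

Both sides have to build the table from the exact same file, since
the pairs depend on the order words first appear. Also, strictly
speaking this is a fixed word-for-word substitution keyed on the
book rather than a true one-time pad: the same word always turns
into the same replacement.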
Huzzah for secret codes.
If I get some time this weekend I'll look at writing a script to
automate this for you. Provide a book and a message and indicate
whether to encode or decode. Oh what fun that would be for some
private crypto. Thinking you could do this in perl? Wanna show me
up? Put your illogical collection of special characters where your
mouth is, buddy!
[0] GNU Core Utilities | |
[1] Reading a Permuted Index | |
[2] Project Gutenberg on Gopher | |
[3] Project Gutenberg Blocks Access to Germany | |
[4] dbucklin - Formatting for Gopher with GNU troff |