(2023-12-25) Pen and paper cryptography: breaking down what's still worth
-------------------------------------------------------------------------
A long time ago (a _very_ long time ago) I created a protocol for SMS text
encryption. It was based on a well-known Bifid cipher (we'll get to this a
bit later) but applied a little bit differently, and, of course, all
encryption/decryption must be done manually. Back then, I thought of it as
of something worth publishing because almost no one had been touching "hand
crypto" in the age of computers anymore. But my protocol also had several
issues I only realized much later and abandoned it in the depth of my GitHub
Gist. The topic is still alive though, and may see another resurgence as we
are drowning deeper and deeper into the modern dystopia. The only problem is
what is still worth studying within this topic, because most hand ciphers
existing in the world really are obsolete and can't be used nowadays.

In this post, I'm only going to focus on _symmetrical_ cryptography.
Asymmetrical cryptography still is somewhat possible to do by hand but it's
much less practical as of now.

Before I get into the details of what we can build, let me share my own
criteria of what I consider a usable pen and paper crypto algorithm today:

1. It must meet both Shannon's requirements for confusion and diffusion.
2. It must support (or be able to extend to support) encrypting alphanumeric
messages (so, at least 36 characters).
3. It must not involve any calculations beyond some modulo addition and
subtraction that can be done mentally, and preferably should not make the
operators to do even those.
4. Guessing the key length by analyzing the ciphertext must be impossible.
5. Average encryption/decryption speed must not be lower than 10 characters
per minute by a trained operator.
6. All printed or handwritten material that doesn't involve key information
must be as reusable as possible.

Based on these criteria, here's what I'm NOT considering viable pen and paper
ciphers or their building blocks:

1. Straddling checkerboards. Yes, they are useful but not fast enough and
restrict you to modulo-10 arithmetic. And for keystream generation, their
output is not uniform as they will generate the letters that map to single
digits much more often.
2. Any playing card based system such as Solitaire/Pontifex. Too slow to
encrypt even a single character.
3. Columnar transpositions, except when preparing the key material. When used
on the plaintext, they can expose the key length. Also, only the relative
order of the letters in the key matters for a columnar transposition, not
their values, so different keys with the same relative order (like, for
example, words "key" and "law") will produce the same results.
4. Bigram substitutions (like Playfair), except when preparing the key
material. When used on the plaintext, they expose too many irregularities
and can't provide any protection against statistical analysis.

Now, let's finally list the building blocks for a pen and paper cipher that
meets the above criteria:

1. Tabula recta. For an alphabet of size N, this is a square table of NxN
characters consisting of all possible rotations of this alphabet. For the
user's convenience, the table might be indexed with the duplicates of the
first row and the first column respectively. This is what's used to "add"
and "subtract" letters of the target alphabet without having to convert them
to digits and performing any modular arithmetics. For any chosen character
set, a tabula recta can be generated, typed or even handwritten from memory
with virtually no effort. In fact, for a "one-time pad" cryptosystem, which
is proven to be unbreakable if properly used, all one needs is a tabula
recta and securely distributed sheets of key material.
2. Polybius square. For an alphabet of size N (and N should be a full square
in this case), this is a square consisting of all N characters of this
alphabet, and all rows and columns are usually numbered from 1 to sqrt(N).
Such a square can be used for different purposes in the cryptosystem, but
the most promising uses are based on achieving the diffusion goal by
splitting the letters by their row and column numbers and recombining them
from those numbers in a different way. This is how ADFGVX and Bifid ciphers
worked.
3. Autokey. Although mistakenly attributed to only one (and pretty insecure)
method of just putting the beginning of the plaintext into the keystream
after the key itself has ended, the autokey concept in fact means multiple
ways of extending the key material based on the previous values, and it
doesn't (and shouldn't) necessarily involve plaintext at all.
4. Polyalphabetic substitutions. The way the alphabet gets modified doesn't
matter much, what matters is that it's not static. However, not every way of
doing this meets our initial criteria, so we must be careful about what we
choose here. For instance, just using a plain tabula recta as the source of
a polyalphabetic substitution and calling it a day won't work.

So, these are the main ideas that can be utilized for our paper crypto
purposes. Now, let's see them in action by designing a pen and paper stream
cipher!

Part 1. The tabula recta

This is the definite tool that will help us with all our "calculations".
Print out (or write down) the tabula recta for the alphabet of your choice,
in our case it's 36 characters: first A to Z, then 0 to 9. Writing down
36x36 squares can be tedious, so I recommend preprinting them. The columns
and rows are indexed from both sides for ease of use.

Here's an AWK script to generate and display the full 36x36 tabula recta:

BEGIN {
 alpha = "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z " \
         "0 1 2 3 4 5 6 7 8 9 "
 l = length(alpha)
 printf(" | %s|\n%s", alpha, "-|")
 for(i=0;i<l;i++) printf("%s", "-")
 print("-|-")
 beta = alpha
 do {
   ind = substr(beta, 1, 1)
   printf("%s| %s |%s\n", ind, substr(beta, 1, l - 1), ind)
   beta = substr(beta, 3) substr(beta, 1, 2)
 } while(beta != alpha)
 printf("%s", "-|")
 for(i=0;i<l;i++) printf("%s", "-")
 print("-|-")
 printf(" | %s|\n", alpha)
}

It creates the following table that you can print out:

| A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 |
-|-------------------------------------------------------------------------|-
A| A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 |A
B| B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 A |B
C| C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 A B |C
D| D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 A B C |D
E| E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 A B C D |E
F| F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 A B C D E |F
G| G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 A B C D E F |G
H| H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 A B C D E F G |H
I| I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 A B C D E F G H |I
J| J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I |J
K| K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J |K
L| L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K |L
M| M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L |M
N| N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M |N
O| O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N |O
P| P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O |P
Q| Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P |Q
R| R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q |R
S| S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R |S
T| T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S |T
U| U V W X Y Z 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T |U
V| V W X Y Z 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U |V
W| W X Y Z 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V |W
X| X Y Z 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W |X
Y| Y Z 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X |Y
Z| Z 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y |Z
0| 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z |0
1| 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 |1
2| 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 |2
3| 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 |3
4| 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 |4
5| 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 |5
6| 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 |6
7| 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 |7
8| 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 |8
9| 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 |9
-|-------------------------------------------------------------------------|-
| A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 |

Note: use a monospace font to display and print it correctly.

Part 2. Alphabet keying and converting your passkey into the seed

Now we need to scramble our alphabet to use in the keystream. Just like the
seed that we'll get to in a moment, this is going to depend on the passkey
we've chosen. The process of keying the alphabet is as follows:

1. Uppercase your passkey and remove all characters not belonging to the
target alphabet (in our case, [A-Z0-9] range).
2. Start writing the remaining passkey characters into your new alphabet. If
the character repeats, keep going down the alphabet (wrapping around the end
if necessary) until you find the one that hasn't been used. Note that this
modification is only used for substitution keying, not the seed!
3. Write down all the remaining alphabet characters in the reverse order.
4. Write the resulting alphabet as the first row in the substitution box, and
the unscrambled alphabet as the second row.

Suppose we have the alphabet represented by the above 36x36 tabula recta, and
our passkey is SiVisPacemParaBellum-2024. Let's key the alphabet using the
above rules:

1. Prepare the passkey: SIVISPACEMPARABELLUM2024
2. Start the alphabet by writing the passkey characters: SIV... Ok, we
already have I so we go down 1 position and write H. If we continue like
this, we'll end up with the following modified passkey:
SIVHRPACEMO9Q8BDLKUJ2014. This is the start of our keyed alphabet.
3. Finish the alphabet by writing all remaining characters in the reverse
order. In our case, we should be left with this (exactly 36 different
characters):

SIVHRPACEMO9Q8BDLKUJ20147653ZYXWTNGF

This is the first substitution box row. Complete it with the plain alphabet
in the second row:

SIVHRPACEMO9Q8BDLKUJ20147653ZYXWTNGF
ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789

Now, to create a seed for your keystream, there are several methods. The
laziest one is to just reuse the adapted passkey (in our case,
SIVISPACEMPARABELLUM2024) as the seed directly. This works but can possibly
reveal some correlation information. To break this correlation a little, we
can just "add" the passkey to the _reversed_ keyed alphabet string using the
tabula recta.

In our example:

SIVISPACEMPARABELLUM2024
FGNTWXYZ35674102JUKLDB8Q
------------------------
XO81ECY17HL7L116U54X510K = our seed for the autokey part.

Part 3. The autokey

For keystream generation from the seed (the autokey element), let's use
lagged Fibonacci generators. The main idea is to add or even subtract
(numerically or directly via the tabula recta) modulo N (the alphabet size)
not the previous two characters like in a normal Fibonacci sequence, but the
previous one and the character placed at some larger fixed distance before
it. In our case, the obvious choice for this distance would be the seed
length minus 1. So, for instance, if we use addition and a standard 26x26
tabula recta, then for the seed CRYPTO, we'd add O and C and get Q, then add
Q and R and get H, then add H and Y and get E and so on. As shown by
Francisco Ruiz ([1]), this particular autokey technique allows to generate
keystreams that exhibit nice statistical properties using nothing but a
single tabula recta, which might be additionally keyed (the substitution box
we've generated in the previous step performs exactly the same function). In
our case, we use a 36x36 tabula recta but the idea is absolutely the same.

Let's demonstrate how this works. Start with the seed we generated earlier:

XO81ECY17HL7L116U54X510K

To get the next keystream character, we just "add" the current last character
to the first one using the tabula recta, and then use our keyed alphabet
substitution:

X + K = 7, sub(7) = Y

To get yet another keystream character, we "add" the new last character to
the second one and then do the substitution:

Y + O = C, sub(C) = H

And so on. Whenever we're going to "add" a new character at (pos), we're
"adding" the character at (pos - 1) and the character at (pos - slen), where
slen is our seed length.

Since we use keyed substitution as the middle step, this process is nonlinear.

Important note: the initial seed must be dropped from the keystream after
enough characters are generated to cover your plaintext messages. So, for
the plaintext sized M, you have to generate at least M new keystream
characters from the seed. I recommend preparing keystream sheets by both
parties before the transmission, just like they would do for one-time pad
communication. And, just like for one-time pad communication, the keystream
characters already used for encryption must be securely discarded and never
be reused.

Part 4. The protocol

Once both parties have the keystream sheets ready, the protocol is extremely
simple:

1. Alice wants to transmit a message M1 of len L1, so she uses L1 bytes of
the keystream to encrypt M1 (by "adding" each its character to the
corresponding keystream character with the tabula recta, which takes very
little time) and transmits it as the cryptogram C1.
2. Bob receives the cryptogram C1 and determines its length L1. He then uses
L1 bytes of the keystream to decrypt C1 (by "subtracting" each keystream
character from the corresponding C1 character using the tabula recta) and
recovers the initial message L1.
3. Both parties cross out L1 keystream characters. The next round starts with
the (L1+1)-th character.
4. When the parties are about to run out of the keystream characters (50-60
remaining), they agree on a new key phrase to generate the next keystream.
The old key phrase must not be reused. For instance, chen trying to set up a
new key phrase like "Opus Magnum, Veni Vidi Vici", Alice might send
"NKXOPUSMAGNUMVENIVIDIVICIXCNF" to Bob, and Bob might send the confirmation
"NKXOPUSMAGNUMVENIVIDIVICIXACK", both encrypted with the remaining old
keystream characters. The new transmissions afterwards would already use the
new keystream.

Verdict

36x36 tabula recta is noice. You can estimate how many bits of key
information is contained in N base-36 characters by multiplying N by
log2(36), which is approximately 5.17. And it might surprise you that you
can already exceed the current symmetric cipher standard key size (128 bits)
with just 25 (significant) passphrase characters. Even 11 passphrase
characters give you DES grade of security (over 56 bits), 13 characters give
64 bits, 16 characters give 80 bits. So, while 36x36 may seem large in
comparison to the traditional 26x26, in fact it's much more functional and
allows to keep your passkeys at reasonable sizes while still maintaining a
decent keyspace.

But the tabula, after all, is only a "calculator". The heart is in the
algorithm itself. Keep in mind that the above system isn't yet complete and
polished and is more of a mind excercise of mine for now, it might be
simplified even more if something is proven unnecessary. I'm also probably
going to come up with something _much_ simpler that (unlike e.g. Hutton
cipher) doesn't require you to write any intermediate results at all in case
you can't have anything else besides your keypad phone where you read or
write encrypted messages. Whenever I deem either of them finished, I'll
publish this algo and the "mental" one in the LuxDocs section, alongside the
computer-aided simulations for both. Overall, I hope this longread was
interesting and useful for you.

--- Luxferre ---

[1]: https://arxiv.org/pdf/2312.02869.pdf