PGP word list
Words for conveying data bytes in speech
The PGP Word List ("Pretty Good Privacy word list", also called a
biometric word list for reasons explained below) is a list of words
for conveying data bytes in a clear unambiguous way via a voice
channel. The words are analogous in purpose to the NATO phonetic
alphabet, except that a longer list of words is used, each word
corresponding to one of the 256 distinct numeric byte values.
History and structure
The PGP Word List was designed in 1995 by Patrick Juola, a
computational linguist, and Philip Zimmermann, creator of
PGP.[1][2] The words were carefully chosen for their phonetic
distinctiveness, using genetic algorithms to select lists of words
that had optimum separations in phoneme space. The candidate word
lists were randomly drawn from Grady Ward's Moby Pronunciator list
as raw material for the search, successively refined by the genetic
algorithms. The automated search converged to an optimized solution
in about 40 hours on a DEC Alpha, a particularly fast machine in
that era.
The Zimmermann–Juola list was originally designed to be used in
PGPfone, a secure VoIP application, to allow the two parties to
verbally compare a short authentication string to detect a
man-in-the-middle (MITM) attack. It was called a biometric word list
because the authentication depended on the two human users
recognizing each other's distinct voices as they read and compared
the words over the voice channel, binding the identity of the
speaker with the words, which helped protect against the MITM
attack. The list can be used in many other situations where a
biometric binding of identity is not needed, so calling it a
biometric word list may be imprecise. Later, it was used in PGP to
compare and verify PGP public key fingerprints over a voice
channel. This is known in PGP applications as the "biometric"
representation. When it was applied to PGP, the list of words was
further refined, with contributions by Jon Callas. More recently,
it has been used in Zfone and the ZRTP protocol, the successor to
PGPfone.
The list is actually composed of two lists, each containing 256
phonetically distinct words, in which each word represents a
different byte value between 0 and 255. Two lists are used because
reading aloud a long random sequence of words risks three kinds of
errors: 1) transposition of two consecutive words, 2) duplicated
words, or 3) omitted words. To detect all three kinds of errors,
the two lists are used alternately for the even-offset bytes and
the odd-offset bytes in the byte sequence. Because consecutive
words always come from different lists, each of these errors
usually leaves a word from one list in a position where a word from
the other list is expected. Each byte value is thus represented by
two different words, depending on whether that byte appears at an
even or an odd offset from the beginning of the byte sequence. The
two lists are readily
distinguished by the number of syllables; the even list has words
of two syllables, the odd list has three. The two lists have a
maximum word length of 9 and 11 letters, respectively. Using a two-
list scheme was suggested by Zhahai Stewart.
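Decoding makes the alternation check concrete. The following Python sketch assumes the two lists are disjoint and uses a hypothetical four-row excerpt of the table below (rows 82, 94, E5 and F2) in place of all 256 rows; decode_words is an illustrative name, not part of any PGP software. Every word must come from the list that matches its offset, so a transposition, duplication or omission is reported at the first misplaced word.

```python
# Hypothetical four-row excerpt of the PGP word list:
# byte value -> (even word, odd word). A real decoder needs all 256 rows.
PGP_WORDS = {
    0x82: ("miser", "Istanbul"),
    0x94: ("Pluto", "molecule"),
    0xE5: ("topmost", "travesty"),
    0xF2: ("uproot", "vagabond"),
}

# One reverse lookup per list, keyed by lowercased word.
EVEN_LOOKUP = {even.lower(): value for value, (even, _) in PGP_WORDS.items()}
ODD_LOOKUP = {odd.lower(): value for value, (_, odd) in PGP_WORDS.items()}


def decode_words(words: list[str]) -> bytes:
    """Decode spoken PGP words back to bytes, checking list alternation.

    A word at an even offset must come from the even list and a word at an
    odd offset from the odd list. Transposing two neighbors, repeating a
    word, or dropping any word except the last breaks this alternation.
    """
    out = bytearray()
    for offset, word in enumerate(words):
        if offset % 2 == 0:
            expected, other = EVEN_LOOKUP, ODD_LOOKUP
        else:
            expected, other = ODD_LOOKUP, EVEN_LOOKUP
        key = word.lower()
        if key in expected:
            out.append(expected[key])
        elif key in other:
            raise ValueError(f"word {offset} ({word!r}) is from the wrong list")
        else:
            raise ValueError(f"word {offset} ({word!r}) is not in the word list")
    return bytes(out)


print(decode_words("topmost Istanbul Pluto vagabond".split()).hex())  # e58294f2
```

Reading "topmost Pluto Istanbul vagabond" (a transposition), "topmost Istanbul Istanbul Pluto vagabond" (a duplicate) or "topmost Pluto vagabond" (an omission) raises ValueError at word 1 or 2. Parity alone cannot notice that the final word was dropped; for a fixed-length value such as a fingerprint, comparing the decoded length with the expected length covers that case.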
Word lists
Here are the two lists of words as presented in the PGPfone Owner's
Manual.[3]
Hex Even Word Odd Word
--- --------- ----------
00 aardvark adroitness
01 absurd adviser
02 accrue aftermath
03 acme aggregate
04 adrift alkali
05 adult almighty
06 afflict amulet
07 ahead amusement
08 aimless antenna
09 Algol applicant
0A allow Apollo
0B alone armistice
0C ammo article
0D ancient asteroid
0E apple Atlantic
0F artist atmosphere
10 assume autopsy
11 Athens Babylon
12 atlas backwater
13 Aztec barbecue
14 baboon belowground
15 backfield bifocals
16 backward bodyguard
17 banjo bookseller
18 beaming borderline
19 bedlamp bottomless
1A beehive Bradbury
1B beeswax bravado
1C befriend Brazilian
1D Belfast breakaway
1E berserk Burlington
1F billiard businessman
20 bison butterfat
21 blackjack Camelot
22 blockade candidate
23 blowtorch cannonball
24 bluebird Capricorn
25 bombast caravan
26 bookshelf caretaker
27 brackish celebrate
28 breadline cellulose
29 breakup certify
2A brickyard chambermaid
2B briefcase Cherokee
2C Burbank Chicago
2D button clergyman
2E buzzard coherence
2F cement combustion
30 chairlift commando
31 chatter company
32 checkup component
33 chisel concurrent
34 choking confidence
35 chopper conformist
36 Christmas congregate
37 clamshell consensus
38 classic consulting
39 classroom corporate
3A cleanup corrosion
3B clockwork councilman
3C cobra crossover
3D commence crucifix
3E concert cumbersome
3F cowbell customer
40 crackdown Dakota
41 cranky decadence
42 crowfoot December
43 crucial decimal
44 crumpled designing
45 crusade detector
46 cubic detergent
47 dashboard determine
48 deadbolt dictator
49 deckhand dinosaur
4A dogsled direction
4B dragnet disable
4C drainage disbelief
4D dreadful disruptive
4E drifter distortion
4F dropper document
50 drumbeat embezzle
51 drunken enchanting
52 Dupont enrollment
53 dwelling enterprise
54 eating equation
55 edict equipment
56 egghead escapade
57 eightball Eskimo
58 endorse everyday
59 endow examine
5A enlist existence
5B erase exodus
5C escape fascinate
5D exceed filament
5E eyeglass finicky
5F eyetooth forever
60 facial fortitude
61 fallout frequency
62 flagpole gadgetry
63 flatfoot Galveston
64 flytrap getaway
65 fracture glossary
66 framework gossamer
67 freedom graduate
68 frighten gravity
69 gazelle guitarist
6A Geiger hamburger
6B glitter Hamilton
6C glucose handiwork
6D goggles hazardous
6E goldfish headwaters
6F gremlin hemisphere
70 guidance hesitate
71 hamlet hideaway
72 highchair holiness
73 hockey hurricane
74 indoors hydraulic
75 indulge impartial
76 inverse impetus
77 involve inception
78 island indigo
79 jawbone inertia
7A keyboard infancy
7B kickoff inferno
7C kiwi informant
7D klaxon insincere
7E locale insurgent
7F lockup integrate
80 merit intention
81 minnow inventive
82 miser Istanbul
83 Mohawk Jamaica
84 mural Jupiter
85 music leprosy
86 necklace letterhead
87 Neptune liberty
88 newborn maritime
89 nightbird matchmaker
8A Oakland maverick
8B obtuse Medusa
8C offload megaton
8D optic microscope
8E orca microwave
8F payday midsummer
90 peachy millionaire
91 pheasant miracle
92 physique misnomer
93 playhouse molasses
94 Pluto molecule
95 preclude Montana
96 prefer monument
97 preshrunk mosquito
98 printer narrative
99 prowler nebula
9A pupil newsletter
9B puppy Norwegian
9C python October
9D quadrant Ohio
9E quiver onlooker
9F quota opulent
A0 ragtime Orlando
A1 ratchet outfielder
A2 rebirth Pacific
A3 reform pandemic
A4 regain Pandora
A5 reindeer paperweight
A6 rematch paragon
A7 repay paragraph
A8 retouch paramount
A9 revenge passenger
AA reward pedigree
AB rhythm Pegasus
AC ribcage penetrate
AD ringbolt perceptive
AE robust performance
AF rocker pharmacy
B0 ruffled phonetic
B1 sailboat photograph
B2 sawdust pioneer
B3 scallion pocketful
B4 scenic politeness
B5 scorecard positive
B6 Scotland potato
B7 seabird processor
B8 select provincial
B9 sentence proximate
BA shadow puberty
BB shamrock publisher
BC showgirl pyramid
BD skullcap quantity
BE skydive racketeer
BF slingshot rebellion
C0 slowdown recipe
C1 snapline recover
C2 snapshot repellent
C3 snowcap replica
C4 snowslide reproduce
C5 solo resistor
C6 southward responsive
C7 soybean retraction
C8 spaniel retrieval
C9 spearhead retrospect
CA spellbind revenue
CB spheroid revival
CC spigot revolver
CD spindle sandalwood
CE spyglass sardonic
CF stagehand Saturday
D0 stagnate savagery
D1 stairway scavenger
D2 standard sensation
D3 stapler sociable
D4 steamship souvenir
D5 sterling specialist
D6 stockman speculate
D7 stopwatch stethoscope
D8 stormy stupendous
D9 sugar supportive
DA surmount surrender
DB suspense suspicious
DC sweatband sympathy
DD swelter tambourine
DE tactics telephone
DF talon therapist
E0 tapeworm tobacco
E1 tempest tolerance
E2 tiger tomorrow
E3 tissue torpedo
E4 tonic tradition
E5 topmost travesty
E6 tracker trombonist
E7 transit truncated
E8 trauma typewriter
E9 treadmill ultimate
EA Trojan undaunted
EB trouble underfoot
EC tumor unicorn
ED tunnel unify
EE tycoon universe
EF uncut unravel
F0 unearth upcoming
F1 unwind vacancy
F2 uproot vagabond
F3 upset vertigo
F4 upshot Virginia
F5 vapor visitor
F6 village vocalist
F7 virus voyager
F8 Vulcan warranty
F9 waffle Waterloo
FA wallet whimsical
FB watchword Wichita
FC wayside Wilmington
FD willow Wyoming
FE woodlark yesteryear
FF Zulu Yucatan
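To use the table in software, its rows can be loaded into a lookup keyed by byte value. A minimal Python sketch, assuming the table is available as plain text in the three-column layout above (hex byte, even word, odd word, separated by whitespace); parse_word_table and check_word_table are hypothetical helper names.

```python
def parse_word_table(text: str) -> dict[int, tuple[str, str]]:
    """Parse 'HEX EVEN ODD' rows into {byte value: (even word, odd word)}.

    Lines that do not have exactly three fields, or whose first field is
    not hexadecimal (such as the column header and the dashed separator),
    are skipped.
    """
    table: dict[int, tuple[str, str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        try:
            value = int(fields[0], 16)
        except ValueError:
            continue
        if not 0 <= value <= 0xFF:
            raise ValueError(f"byte value out of range: {fields[0]}")
        if value in table:
            raise ValueError(f"duplicate row for byte {fields[0]}")
        table[value] = (fields[1], fields[2])
    return table


def check_word_table(table: dict[int, tuple[str, str]]) -> None:
    """Raise ValueError unless the table is complete and no word repeats."""
    if sorted(table) != list(range(256)):
        raise ValueError("expected exactly one row for each byte value 00 to FF")
    words = [word.lower() for pair in table.values() for word in pair]
    if len(set(words)) != len(words):
        raise ValueError("a word appears more than once across the two lists")
```

Applied to the full table, check_word_table confirms that every byte value 00 to FF has a row and that no word is shared between rows or between the two lists, which the error-detection scheme relies on.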
Examples
Each byte in a bytestring is encoded as a single word. A sequence
of bytes is rendered in network byte order, from left to right. For
example, the leftmost byte (i.e. byte 0) is considered "even" and is
encoded using the PGP Even Word table. The next byte to the right
(i.e. byte 1) is considered "odd" and is encoded using the PGP Odd
Word table. This process repeats until all bytes are encoded. Thus,
"E582" produces "topmost Istanbul", whereas "82E5" produces "miser
travesty".
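The encoding rule above fits in a few lines. This Python sketch uses a hypothetical two-row excerpt of the table (rows 82 and E5) rather than all 256 rows; to_pgp_words is an illustrative name.

```python
# Hypothetical two-row excerpt of the PGP word list:
# byte value -> (even word, odd word). A full encoder needs all 256 rows.
PGP_WORDS = {
    0x82: ("miser", "Istanbul"),
    0xE5: ("topmost", "travesty"),
}


def to_pgp_words(data: bytes) -> list[str]:
    """Encode each byte as one word, alternating lists by offset.

    Offsets 0, 2, 4, ... ("even") use the even-word column and
    offsets 1, 3, 5, ... ("odd") use the odd-word column.
    """
    return [PGP_WORDS[byte][offset % 2] for offset, byte in enumerate(data)]


print(" ".join(to_pgp_words(bytes.fromhex("E582"))))  # topmost Istanbul
print(" ".join(to_pgp_words(bytes.fromhex("82E5"))))  # miser travesty
```

Because the column is chosen by offset rather than by value, the same byte (here E5 or 82) yields a different word depending on where it appears.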
A PGP public key fingerprint that is displayed in hexadecimal as
E582 94F2 E9A2 2748 6E8B
061B 31CC 528F D7FA 3F19
would display in PGP Words (the "biometric" fingerprint) as
topmost Istanbul Pluto vagabond treadmill Pacific brackish dictator goldfish Medusa
afflict bravado chatter revolver Dupont midsummer stopwatch whimsical cowbell bottomless
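The order of bytes in a bytestring depends on endianness: the same
multi-byte number serialized in little-endian order yields a
different word sequence than in big-endian (network) byte order.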
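The fingerprint example can be reproduced the same way. In this Python sketch the hex string is decoded left to right, exactly as displayed (network byte order); the excerpt holds only the 20 rows the example needs, and fingerprint_words is a hypothetical helper. The last lines reinterpret the same 160-bit value in little-endian order to show how a byte-order mistake changes the words.

```python
# Hypothetical excerpt of the PGP word list: only the 20 rows needed for the
# fingerprint example (byte value -> (even word, odd word)).
PGP_WORDS = {
    0x06: ("afflict", "amulet"),      0x19: ("bedlamp", "bottomless"),
    0x1B: ("beeswax", "bravado"),     0x27: ("brackish", "celebrate"),
    0x31: ("chatter", "company"),     0x3F: ("cowbell", "customer"),
    0x48: ("deadbolt", "dictator"),   0x52: ("Dupont", "enrollment"),
    0x6E: ("goldfish", "headwaters"), 0x82: ("miser", "Istanbul"),
    0x8B: ("obtuse", "Medusa"),       0x8F: ("payday", "midsummer"),
    0x94: ("Pluto", "molecule"),      0xA2: ("rebirth", "Pacific"),
    0xCC: ("spigot", "revolver"),     0xD7: ("stopwatch", "stethoscope"),
    0xE5: ("topmost", "travesty"),    0xE9: ("treadmill", "ultimate"),
    0xF2: ("uproot", "vagabond"),     0xFA: ("wallet", "whimsical"),
}


def fingerprint_words(hex_fingerprint: str) -> list[str]:
    """Encode a hex fingerprint (whitespace ignored) byte by byte, left to right."""
    data = bytes.fromhex("".join(hex_fingerprint.split()))
    return [PGP_WORDS[byte][offset % 2] for offset, byte in enumerate(data)]


FINGERPRINT = "E582 94F2 E9A2 2748 6E8B 061B 31CC 528F D7FA 3F19"

words = fingerprint_words(FINGERPRINT)
# Print ten words per line, one line per line of the hex display.
for start in range(0, len(words), 10):
    print(" ".join(words[start:start + 10]))

# Serializing the same 160-bit value little-endian reverses the bytes,
# so a different word sequence comes out.
value = int("".join(FINGERPRINT.split()), 16)
reversed_hex = value.to_bytes(20, "little").hex()
print(" ".join(fingerprint_words(reversed_hex)))
```

The loop prints the two lines of words shown above. The little-endian variant begins "bedlamp customer wallet", so two parties who serialized the fingerprint differently would see a mismatch on the first word.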
Other word lists for data
There are several other word lists for conveying data in a clear
unambiguous way via a voice channel:
* the NATO phonetic alphabet maps individual letters and digits
to individual words.
* the S/KEY system maps 64-bit numbers to 6 short words of 1 to 4
characters each from a publicly accessible 2048-word
dictionary. The same dictionary is used in RFC 1760 and RFC
2289.
* the Diceware system maps five base-6 random digits (almost 13
bits of entropy) to a word from a dictionary of 7,776 distinct
words.
* the Electronic Frontier Foundation has published a set of
improved word lists based on the same concept.[4]
* FIPS 181: Automated Password Generator converts random numbers
into somewhat pronounceable "words".
* mnemonic encoding converts 32 bits of data into 3 words from a
vocabulary of 1626 words.[5]
* what3words encodes geographic coordinates in 3 dictionary words.
* the BIP39 standard permits encoding a cryptographic key of
128 to 256 bits (in steps of 32 bits, most often 128 or 256;
usually the unencrypted master key of a cryptocurrency wallet)
into a short sequence of readable words known as the seed
phrase, for the purpose of storing the key offline. Seed phrases
are used in cryptocurrencies such as Bitcoin and Monero, although
Monero uses its own seed-phrase format rather than BIP39.
* Like the PGP word list, the Bytewords standard maps each
possible byte to a word. There is only one list, rather than
two. The words are uniformly four letters long and can be
uniquely identified by their first and last letters.
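The schemes above trade vocabulary size against the number of words needed. The following Python sketch computes a few of the figures quoted in this list. The BIP39 checksum rule (ENT/32 checksum bits, 11 bits per word) and the 2-bit S/KEY checksum come from the BIP39 specification and RFC 2289 rather than from this article, and the function names are illustrative.

```python
import math


def bits_per_word(vocabulary_size: int) -> float:
    """Bits of information in one word chosen uniformly from a vocabulary."""
    return math.log2(vocabulary_size)


def bip39_word_count(entropy_bits: int) -> int:
    """BIP39 seed-phrase length: ENT/32 checksum bits are appended,
    then every 11 bits (2048-word list) become one word."""
    if entropy_bits % 32 != 0 or not 128 <= entropy_bits <= 256:
        raise ValueError("BIP39 entropy must be 128 to 256 bits, in steps of 32")
    return (entropy_bits + entropy_bits // 32) // 11


# PGP word list and Bytewords: 256 words per position, i.e. one word per byte.
print("PGP / Bytewords:", bits_per_word(256), "bits per word")

# Diceware: five base-6 dice digits index one of 6**5 = 7776 words.
print("Diceware:", round(bits_per_word(6 ** 5), 2), "bits per word")

# S/KEY: a 2048-word dictionary gives 11 bits per word, so six words hold
# 66 bits: the 64-bit value plus a 2-bit checksum.
print("S/KEY:", 6 * round(bits_per_word(2048)), "bits in six words")

# Mnemonic encoding: 1626**3 is just above 2**32, so three words cover 32 bits.
print("mnemonic encoding fits 32 bits in 3 words:", 1626 ** 3 >= 2 ** 32)

# BIP39: 128-bit and 256-bit keys become 12- and 24-word seed phrases.
print("BIP39:", bip39_word_count(128), "and", bip39_word_count(256), "words")
```

The PGP word list spends more words per bit than the larger dictionaries (8 bits per word against 11 for S/KEY and BIP39), in exchange for words chosen to be hard to confuse when spoken.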
References
This article incorporates material that is copyrighted by PGP
Corporation and has been licensed under the GNU Free
Documentation License. (per Jon Callas, CTO, CSO PGP
Corporation, 4-Jan-2007)
1. Juola, Patrick; Zimmermann, Philip (1996). "Whole-word
phonetic distances and the PGPfone alphabet" (PDF).
Proceeding of Fourth International Conference on Spoken
Language Processing. ICSLP '96. Vol. 1. pp. 98–101.
doi:10.1109/ICSLP.1996.607046. ISBN 0-7803-3555-4.
S2CID 10385500. Archived from the original (PDF) on 7
September 2006.
2. Juola, Patrick (1996). "Isolated Word Confusion Metrics and
the PGPfone Alphabet". Proceedings of New Methods in Language
Processing 2. Ankara, Turkey: Oxford University, Dept. of
Experimental Psychology. arXiv:cmp-lg/9608021.
Bibcode:1996cmp.lg....8021J.
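3. "PGPfone Owner's Manual". web.mit.edu. Archived from the
original on 26 March 2010. Retrieved 12 January 2022.
https://web.archive.org/web/20100326141145/http://web.mit.edu/network/pgpfone/manual/index.html#PGP000062
4. "EFF's New Wordlists for Random Passphrases". Electronic
Frontier Foundation. 19 July 2016.
https://www.eff.org/deeplinks/2016/07/new-wordlists-random-passphrases
5. Mnemonic encoding, http://www.tothink.com/mnemonic/ (archived
2 March 2008 at the Wayback Machine), and updated code:
https://github.com/singpolyma/mnemonicode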
From: <https://en.wikipedia.org/wiki/PGP_word_list>