Finding Anagrams from a list of words in Python
Publishing date: 2023-05-12 10:28 +0200
I'm kind of obsessed with historical cryptography and puzzles.
A week ago or so I had to find anagrams for a given word.
Although you could use your favorite search engine to look
up an existing list for a given language (or, even fancier,
ask ChatGPT), I decided to cook up my own.
First, an anagram isn't just a random permutation of the
letters; it must also be a proper word that exists in the
language. While one could simply do something like
```
In [20]: from random import shuffle
In [21]: iword = list("hello")
In [22]: shuffle(iword)
In [23]: ''.join(iword)
Out[23]: 'ollhe'
```
this isn't exactly helpful: 'ollhe' is not an English word.
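For contrast, the most direct way to enforce the "must be a real word" rule is brute force: generate every permutation and keep only those that appear in a set of known words. This is a minimal sketch, not the approach used below; `anagrams_by_permutation` is a hypothetical name, and the tiny `sample` set stands in for a real wordlist. An n-letter word has n! permutations, so this only stays practical for short words.

```python
from itertools import permutations


def anagrams_by_permutation(word, words):
    """Return every permutation of `word` that appears in the set `words`.

    Brute force: an n-letter word yields n! permutations, so keep words short.
    """
    word = word.lower()
    # Collecting into a set removes duplicates caused by repeated letters ("ll" in "hello").
    candidates = {"".join(p) for p in permutations(word)}
    # Intersect with the known words and return them in alphabetical order.
    return sorted(candidates & words)


# Tiny stand-in wordlist, for illustration only.
sample = {"hello", "listen", "silent", "enlist", "tinsel"}
print(anagrams_by_permutation("listen", sample))
# ['enlist', 'listen', 'silent', 'tinsel']
```

The sorted-characters idea described next avoids enumerating permutations entirely.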
So what I'm doing instead is reading a list of words into a
list of strings, sorting the characters of the word I want
to find anagrams for by their character codes (ASCII values
for plain English letters), and then looking for words in
the list whose characters produce the same sorted sequence.
Example:
```
In [25]: [ord(c) for c in 'hello']
Out[25]: [104, 101, 108, 108, 111]
In [29]: o = [ord(c) for c in 'hello']
In [30]: o.sort(); o
Out[30]: [101, 104, 108, 108, 111]
```
If an anagram exists (and the word itself is in the list),
then at least two words in the list share the same sorted
sequence. To account for upper- and lowercase characters,
I convert all characters to lowercase first.
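The sorted, lowercased character codes act as a signature that all anagrams of a word share. Here is a small sketch of just that step; `signature()` is a hypothetical helper name, not something from the code below. Since Python compares characters by code point, `sorted(word.lower())` would give an equivalent key.

```python
def signature(word):
    """Sorted character codes of the lowercased word; anagrams share it."""
    codes = [ord(c) for c in word.lower()]
    codes.sort()
    return codes


print(signature("hello"))
# [101, 104, 108, 108, 111]
print(signature("Listen") == signature("silent"))
# True
print(signature("hello") == signature("hallo"))
# False
```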
So, first, read a list of words (one word per line) into a
Python list:
```
en = []
with open("/home/alex/share/wordlists/english.txt") as f:
    # Iterate over the file line by line and drop the
    # trailing newline of each entry.
    for line in f:
        en.append(line.strip('\n'))

en[0:5]
['W', 'w', 'WW', 'WWW', 'WY']
```
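The docstring of `findAnagram()` below mentions a helper `gen_wordlist_list_from_file()` whose code isn't shown. One hypothetical way it could look, assuming a single path argument and one word per line; the UTF-8 default and skipping blank lines are my assumptions (the loop above keeps blank lines as empty strings).

```python
import os
import tempfile


def gen_wordlist_list_from_file(path, encoding="utf-8"):
    """Hypothetical helper: read a flat text file with one word per line.

    Surrounding whitespace (including the newline) is stripped and blank
    lines are skipped; both are assumptions, not the post's own code.
    """
    with open(path, encoding=encoding) as f:
        return [line.strip() for line in f if line.strip()]


# Usage with a throwaway file instead of a real wordlist.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("W\nw\n\nhello\n")
    tmp_path = tmp.name

loaded = gen_wordlist_list_from_file(tmp_path)
os.remove(tmp_path)
print(loaded)
# ['W', 'w', 'hello']
```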
Alrighty... now for the fun part:
```
def findAnagram(word, wl):
    """Find anagrams for word in wordlist wl.

    wl must be a Python list of words (as strings).
    A wordlist can be generated by reading a flat
    text file containing words,
    e.g. by using the helper function
    gen_wordlist_list_from_file().
    """
    # The idea is to grab all words of the same
    # length, then sort the characters and get an
    # ASCII representation; then find all words
    # which have the same representation.
    word = word.lower()
    tmp_wl = [i for i in wl if len(i) == len(word)]
    enc_word = [ord(i) for i in word]
    enc_word.sort()
    out = []
    for i in tmp_wl:
        i = i.lower()
        t = [ord(x) for x in i]
        t.sort()
        if enc_word == t:
            out.append(i)
    return out
```
Let's try this!
```
[findAnagram(word, en) for word in "How does this even work".split(" ")]
[['how', 'who', 'who'],
 ['odes', 'does', 'dose'],
 ['this', 'hist', 'hits', 'shit'],
 ['even'],
 ['work']]
```
Fun!
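`findAnagram()` filters and re-encodes the whole wordlist on every call, and the output above lists 'who' twice, presumably because the wordlist contains both a capitalized and a lowercase spelling. A different technique, not the one used above, is to build a dictionary from signature to words once and store the words in a set, which drops such case duplicates; each lookup is then a single dictionary access. `build_anagram_index` and `lookup` are hypothetical names, and keying on `tuple(sorted(...))` is my substitution for the list of `ord()` values, since lists can't be dictionary keys.

```python
from collections import defaultdict


def build_anagram_index(wordlist):
    """Map each signature (sorted lowercase characters) to a set of words."""
    index = defaultdict(set)
    for w in wordlist:
        w = w.lower()
        # Tuples are hashable, unlike the list of ord() values used above.
        index[tuple(sorted(w))].add(w)
    return index


def lookup(word, index):
    """Return sorted, de-duplicated anagrams of `word` (including itself if listed)."""
    # .get() does not insert missing keys into the defaultdict.
    return sorted(index.get(tuple(sorted(word.lower())), set()))


# Tiny stand-in wordlist; note the case-variant duplicate 'Who'/'who'.
sample = ["how", "Who", "who", "odes", "does", "dose", "even"]
idx = build_anagram_index(sample)
print([lookup(w, idx) for w in "How does this even work".split()])
# [['how', 'who'], ['does', 'dose', 'odes'], [], ['even'], []]
```

Building the index costs one pass over the wordlist; after that, checking many words is cheap.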