Finding Anagrams from a list of words in Python
Publishing date: 2023-05-12 10:28 +0200
I'm kind of obsessed with historic cryptography and puzzles.
A week or so ago I had to find anagrams for a given word, and
although you could use your favorite search engine to look
up an existing list for a given language - or, even fancier,
use ChatGPT - I decided to cook up my own.
First, an anagram isn't just a random permutation of
letters; it must also be a proper word in the language.
While one could simply do something like
```
In [20]: from random import shuffle
In [21]: iword = list("hello")
In [22]: shuffle(iword)
In [23]: ''.join(iword)
Out[23]: 'ollhe'
```
this isn't exactly helpful.
So what I'm doing instead is reading a list of words into
a list of strings, then sorting the characters of the word I
want to find anagrams for by their ASCII values, and then
looking for words in the list that match the same pattern.
Example:
```
In [25]: [ord(c) for c in 'hello']
Out[25]: [104, 101, 108, 108, 111]
In [29]: o = [ord(c) for c in 'hello']
In [30]: o.sort(); o
Out[30]: [101, 104, 108, 108, 111]
```
If an anagram exists, then there should be at least two
words in the list that follow the same pattern. To
account for upper- and lowercase characters, I make all
characters lowercase first.
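The comparison described above can be condensed into a small helper. This is only a minimal sketch; the function name `signature` is made up for illustration and is not part of the original code. Sorting the characters directly gives the same order as sorting their ord() values:

```python
def signature(word):
    """Return the lowercased characters of word in sorted order."""
    # Sorting the characters themselves orders them by code point,
    # which is equivalent to sorting the list of ord() values.
    return sorted(word.lower())


# Two words are anagrams of each other if their signatures match.
print(signature("hello"))                        # ['e', 'h', 'l', 'l', 'o']
print(signature("Dose") == signature("does"))    # True
print(signature("hello") == signature("world"))  # False
```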
So, first, read in a list of words - with one word per
line - and put it into a list:
```
en = []
with open("/home/alex/share/wordlists/english.txt") as f:
    while True:
        line=f.readline()
        if not line:
            break
        else:
            en += [ line.strip('\n') ]
en[0:5]
['W', 'w', 'WW', 'WWW', 'WY']
```
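The same loading step can be written more compactly by iterating over the file object. This is a sketch under a few assumptions: the helper name `load_wordlist` is hypothetical, the file is taken to be UTF-8 encoded with one word per line, and, unlike the loop above, blank lines are skipped:

```python
def load_wordlist(path, encoding="utf-8"):
    """Read a one-word-per-line text file into a list of strings."""
    words = []
    with open(path, encoding=encoding) as f:
        for line in f:              # a file object yields one line at a time
            word = line.rstrip("\n")
            if word:                # assumption: blank lines are not words
                words.append(word)
    return words
```

With this in place, the list could be built with en = load_wordlist("/home/alex/share/wordlists/english.txt").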
Alrighty... Now the fun:
```
def findAnagram(word, wl):
    """Find an anagram for word in wordlist wl.
    wl must be python list of words (as strings).
    A wordlist can be generated by reading a flat
    text file containing words,
    e.g. by using the helper function
    gen_wordlist_list_from_file().
    """
    # The idea is to grab all words of the same
    # length, then sort the characters and get an
    # ascii representation; then find all
    # which have the same representation.
    word = word.lower()
    tmp_wl = [i for i in wl if len(i) == len(word)]
    enc_word = [ord(i) for i in word]
    enc_word.sort()
    out = []
    for i in tmp_wl:
        i = i.lower()
        t = [ord(x) for x in i]
        t.sort()
        if enc_word == t:
            out += [ i ]
    return out
```
Let's try this!
```
[findAnagram(word, en) for word in "How does this \
even work".split(" ")]
[['how', 'who', 'who'],
['odes', 'does', 'dose'],
['this', 'hist', 'hits', 'shit'],
['even'],
['work']]
```
Fun!
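The doubled 'who' in the output presumably comes from the wordlist containing the word in more than one capitalisation (an assumption; the wordlist itself is not shown here). The search also rescans the whole list for every query. One possible variant, sketched below, builds a lookup table once and drops such duplicates. The names `build_index` and `find_anagrams` are hypothetical and not part of the code above, and the sample list is a tiny stand-in for a real wordlist:

```python
from collections import defaultdict


def build_index(wl):
    """Map each sorted-letter signature to the lowercased words sharing it."""
    index = defaultdict(list)
    for w in wl:
        w = w.lower()
        key = "".join(sorted(w))
        if w not in index[key]:   # skip case-only duplicates such as Who/who
            index[key].append(w)
    return index


def find_anagrams(word, index):
    """Return all indexed words with the same letters as word."""
    key = "".join(sorted(word.lower()))
    return list(index.get(key, []))


# Tiny stand-in wordlist for illustration only.
sample = ["Who", "who", "how", "does", "dose", "odes", "even"]
idx = build_index(sample)
print(find_anagrams("How", idx))   # ['who', 'how']
print(find_anagrams("does", idx))  # ['does', 'dose', 'odes']
print(find_anagrams("work", idx))  # []
```

Building the index costs one pass over the wordlist; after that, each lookup is a single dictionary access instead of a full scan.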