* * * * *
THE QUANTUM SUPPOSITION OF OZ
> More and more Dorothy wondered how and why the great giants had ever
> submitted to become slaves of such skinny, languid masters …
>
One of the better turns of phrase from “The Quantum Supposition of Oz [1].”
I'm done. I finished NaNoGenMo (National Novel Generation Month) [2] in only
a few hours of total work. I decided against Racter vs. ELIZA [3] because of
the technical challenges. It's easy enough to find source code I can
understand for some version of ELIZA [4]; the same can't be said for Racter
[5]. The code I do have is nearly incomprehensible, with no documentation
other than the output of the program itself.
That in and of itself wouldn't be a show-stopper—I do have a running copy of
Racter, but it's an MS-DOS (Microsoft Disk Operating System) executable that
I have to run under an emulator, so piping the output from ELIZA to Racter
and back again is not a trivial problem that can be solved in the few days
left of NaNoGenMo (National Novel Generation Month). Pity, really, as the
output would be most amusing to read.
So I fell back to the old stand-by—Markov chains [6]. The input I used for
the Markov chaining process (more on that below) was the entire works of Oz
[7] by L. Frank Baum [8]. I can't say why I picked those, other than I had
already downloaded them from Project Gutenberg [9] some years ago and had
them handy. And they are in the public domain, so anybody can butcher them.
Now, a Markov chain is pretty straightforward—I used an order-3 Markov chain.
So you start with three words, say “the Wicked Witch.” That's your start, and
you output that. Then you find each word that follows that phrase and count
the number of times they occur:
Table: Frequency of words following “the Wicked Witch”

word          count
-------------------
of               22
                 10
,                 9
was               7
and               6
had               5
has               2
really            1
discovered        1
conquered         1
a                 1
who               1
merely            1
said              1
dies              1
put               1
or                1
before            1
died              1
enchanted         1
surrounded        1
ruled             1
is                1
took              1
looked            1
laughed           1
¶                 1
realized          1
came              1
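Building that frequency table is mechanical: for every run of three tokens,
count the token that follows. Here is a sketch in Python (an illustration
only, not the author's actual code, which is in the linked repository; the
toy input below stands in for the tokenized Oz books):

```python
from collections import Counter, defaultdict

def build_chain(tokens, order=3):
    """Map each run of `order` tokens to a Counter of the
    tokens that follow that run in the input."""
    chain = defaultdict(Counter)
    for i in range(len(tokens) - order):
        key = tuple(tokens[i:i + order])
        chain[key][tokens[i + order]] += 1
    return chain

# Toy input standing in for the tokenized Oz books.
tokens = ("the Wicked Witch of the East was "
          "the Wicked Witch of the West and "
          "the Wicked Witch was wicked").split()
chain = build_chain(tokens)
print(chain[("the", "Wicked", "Witch")])
# → Counter({'of': 2, 'was': 1})
```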
And from there, you can calculate the percentage chance of a given word
following “the Wicked Witch:”
Table: Percentage chance of a given word following “the Wicked Witch”

word    chance of following
---------------------------
of                    25.29
                      11.49
,                     10.34
was                    8.05
and                    6.90
had                    5.75
has                    2.30
really                 1.15
discovered             1.15
conquered              1.15
a                      1.15
who                    1.15
merely                 1.15
said                   1.15
dies                   1.15
put                    1.15
or                     1.15
before                 1.15
died                   1.15
enchanted              1.15
surrounded             1.15
ruled                  1.15
is                     1.15
took                   1.15
looked                 1.15
laughed                1.15
¶                      1.15
realized               1.15
came                   1.15
You then pick a word randomly, but weighted by the percentage chance (“of” is
more likely than “came”)—say the choice is “of.” That's the next word you
output. Now your three words are “Wicked Witch of,” and you repeat the
process again and again until you've printed the desired number of words.
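That pick-and-slide loop can be sketched like so (again an illustration, not
the author's actual code; `build_chain` is the counting function from above,
and `random.choices` does the frequency-weighted draw):

```python
import random
from collections import Counter, defaultdict

def build_chain(tokens, order=3):
    """Map each run of `order` tokens to a Counter of followers."""
    chain = defaultdict(Counter)
    for i in range(len(tokens) - order):
        chain[tuple(tokens[i:i + order])][tokens[i + order]] += 1
    return chain

def generate(chain, start, n_words):
    """Emit `start`, then repeatedly draw the next word weighted by
    how often it followed the current three-word window."""
    out = list(start)
    window = tuple(start)
    for _ in range(n_words):
        followers = chain.get(window)
        if not followers:               # the window never occurs: stop
            break
        words = list(followers)
        counts = list(followers.values())
        # Weighted draw: the probability of each word is count / total,
        # exactly the percentage table above.
        nxt = random.choices(words, weights=counts, k=1)[0]
        out.append(nxt)
        window = window[1:] + (nxt,)    # slide the window forward
    return " ".join(out)

# The novel itself was seeded with three paragraph markers, i.e.
# generate(chain, ("¶", "¶", "¶"), ...).
```

Since punctuation and ¶ are ordinary words in the chain, the output
naturally contains paragraph breaks—and the oddly spaced punctuation shown
above.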
In my case, the initial words were three paragraph markers (¶) and the
initial opening paragraphs that came out were:
> THE WONDERFUL WIZARD OF OZ
>
> CHAP . 17
>
> The Shaggy Man laughed merrily .
>
> " A prisoner is a captive , " replied Ozma , promptly .
>
> " Just wonderful ! " declared the Lion , in a voice of horror .
>
> " Oh , indeed ! " exclaimed the Pumpkinhead .
>
> " I'd kick out with those long legs and large knees and feet . Below in the
> streets of the conquered city and the gardens and Rinkitink thought the
> best part of me then remaining . Moreover , there was little pleasure in
> talking with the goat they kept away from the others .
>
> They now entered the great hall , his shaggy hat in his hands , was a big
> house , round , red cap held in place by means of its strings , securely
> around the Ork's neck , just where his belt was buckled . He rose into the
> air , for I can stand it if the others can . "
>
> So Dorothy , who had gone on ahead , came bounding back to say that Dorothy
> and the Scarecrow and Ozma alone ; but Dorothy had been listening with
> interest to this conversation . Now she turned to her captives and said :
>
> " Are you certain this is snow ? " she asked .
>
“The Quantum Supposition of Oz [10]”
Yes, the spacing of the punctuation is a bit odd, and I'll get to that in a
bit.
And the fact that I start with chapter 17 is a quirk of the Markov chaining
process, as is the initial line of the novel, “THE WONDERFUL WIZARD OF OZ,”
due to the initial three words selected (three paragraph markers).
Now, most of the time on this project was spent in two phases:
1. An initial editing of the Oz books [11] from Project Gutenberg. I had to
remove all the verbiage that didn't directly relate to the story. This
included not only the text Project Gutenberg added, but also the Table of
Contents and Introduction in each book, as well as page numbers and
references to illustrations.
This was perhaps an hour or two of time—only one book had page numbers
(thankfully, the other thirteen did not) and the text editor made light work
of removing the image references. Most of the verbiage removed was located at
the start and end of each book, so that was easy to cut.
2. Defining what a “word” was for the Markov chaining.
Seriously.
I spent more time on this than I did on the initial editing.
So, what is a word?
A quick answer is “letters surrounded by space.”
And that's good for about 95% of the words. But then you get stuff like
“I'll” or “Dorothy's”. Then you expand the definition to “letters, with an
embedded apostrophe, surrounded by space.” Then you come across “goin'” and
you redefine yet again. Then you come across “Tik-tok” (a character in the
story) or “Coo-ee-oh” and you redefine your definition yet again. Then you
find “how-d” and “ye-do” and realize you need to handle “how-d'ye-do” and by
now you realize you also missed “Dr.” and “Mr.” and “P. S.” and …
Yes, the definition of a “word” isn't quite so simple (oh, and then you come
across entries like “No. 17”—sigh).
In the end, I defined a word as such (and in this order):
1. A series of blank lines denotes a paragraph marker—¶.
2. Punctuation (this rule and the previous one help avoid the dreaded
“wall-of-text [12]” you often get in generative text, but punctuation
marks are printed as words, hence the odd spacing you see)
3. “--”—this designates an em-dash, a typographical punctuation mark
4. Digits (but see below)
5. “Mr.”
6. “MR.”
7. “Mrs.”
8. “MRS.”
9. “Dr.”
10. “DR.”
11. “P. S.” (and the variation “P.S.”)
12. “T. E.” (and the variation “T.E.”—stands for “Thoroughly Educated”)
13. “Gen.” (short for “General”)
14. “No. ” followed by digits (no real reason for that—I just did it that
way)
15. “N. B.” (and the variation “N.B.”)
16. “H.” (an initial)
17. “M.” (an initial)
18. “O.” (an initial)
19. “Z.” (an initial)
20. A few really complicated rules to catch “how-d'ye-do” but avoid making a
word out of “me--please” (some context: “don't strike me--please
don't”).
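Rules like these can be approximated with a single regular expression whose
alternatives are tried in order. The sketch below is my own reconstruction
covering only a subset of the rules above; the exact patterns, and the
author's real tokenizer, live in the linked code:

```python
import re

# Alternatives are tried left to right, so the special cases
# (em-dash, honorifics, initials) come before the generic word rule.
TOKEN = re.compile(r"""
    --                                   # em-dash, kept as its own token
  | (?:Mr|MR|Mrs|MRS|Dr|DR|Gen)\.        # honorifics and abbreviations
  | (?:P\.\ ?S\.|T\.\ ?E\.|N\.\ ?B\.)    # "P. S.", "T.E.", "N. B." variants
  | No\.\ ?\d+                           # "No. 17"
  | [HMOZ]\.                             # single-letter initials
  | \d+                                  # bare digits
  | [A-Za-z]+(?:['-][A-Za-z]+)*'?        # words: goin', Tik-tok, how-d'ye-do
  | [^\w\s]                              # any other punctuation mark
""", re.VERBOSE)

def tokenize(text):
    """Yield a ¶ for each blank-line paragraph break, then the
    tokens of that paragraph."""
    for paragraph in re.split(r"\n\s*\n", text):
        yield "¶"
        yield from TOKEN.findall(paragraph)

print(TOKEN.findall("don't strike me--please don't"))
```

Alternation order matters here: if the generic word rule came first, “Mr.”
would split into “Mr” and “.”.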
Then all that was left was to generate a few novels (about a minute or two)
and pick one that at least starts off strong and there you have it [13], a
novel.
Oh, and the code [14] that generated this awful dreck, should you be
interested.
[1]  https://github.com/spc476/NaNoGenMo-
[2]  https://github.com/dariusk/NaNoGenMo
[3]  gopher://gopher.conman.org/0Phlog:2014/11/28.1
[4]  http://en.wikipedia.org/wiki/ELIZA
[5]  gopher://gopher.conman.org/0Phlog:2008/06/18.2
[6]  http://blog.codinghorror.com/markov-and-you/
[7]  http://en.wikipedia.org/wiki/List_of_Oz_books
[8]  http://en.wikipedia.org/wiki/L._Frank_Baum
[9]  https://www.gutenberg.org/
[10] https://github.com/spc476/NaNoGenMo-
[11] https://www.gutenberg.org/ebooks/search/?query=Oz
[12] http://www.cigaretteboy.com/
[13] https://github.com/spc476/NaNoGenMo-
[14] https://github.com/spc476/NaNoGenMo-