* * * * *
Perhaps an 80M script is a bit excessive …
Every so often I'll do a bit of work on an unimportant project, just to keep
myself sane from working in PHP and Drupal [1].
About a month ago I decided to save the data from my email indexer program [2]
as a Lua program, something like:
> emails =
> {
> filelist =
> {
> {
> file = "/home/spc/Mail/sent" ,
> size = 902273,
> time = "Tue, 10 Nov 2009 09:35:10 GMT",
> },
> {
> file = "/home/spc/LINUS/Archive.mail/20060607/cctalk",
> size = 230140,
> time = "Tue, 02 May 2006 18:22:28 GMT",
> },
> -- and so on ...
> },
>
> mbox =
> {
> {
> info = { mboxfile = 1, oh={45, 322}, ob={368, 15}},
> ['Message-ID'] = "<[email protected]>",
> ['From'] = { "Sean Conner <[email protected]>",},
> ['To'] = { "[email protected]",},
> ['Subject'] = "This is a test",
> ['Date'] = "Tue, 21 Oct 2008 01:13:31 -0400",
> ['MIME-Version'] = "1.0",
> ['Content-Type'] =
> {
> "text/plain",
> "charset=us-ascii",
> },
> mimeheaders =
> {
> ['Content-Disposition'] = "inline",
> },
> ['Lines'] = 1,
> extraheaders =
> {
> ['User-Agent'] = "Mutt/1.4.1i",
> ['Status'] = "RO",
> },
> },
> -- and so on ...
> }
> }
>
That way, I could load it into the Lua interpreter and work with the data in
Lua, instead of writing a bunch of C code. I debugged the output to make sure
it was valid Lua and everything was fine.
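As a rough illustration of what "working with the data in Lua" can look like, here is a minimal sketch that tallies messages and lines per mailbox file. The inline table is a cut-down stand-in for the generated file, and its second message is invented sample data. Treating info.mboxfile as an index into filelist is an assumption, and so is the idea of loading the real data with something like dofile("emails.lua") (a hypothetical file name).

```lua
-- Stand-in for the generated file. In practice something like
-- dofile("emails.lua") (hypothetical file name) would define `emails`.
emails = {
  filelist = {
    { file = "/home/spc/Mail/sent", size = 902273 },
  },
  mbox = {
    { info = { mboxfile = 1 }, ['Subject'] = "This is a test", ['Lines'] = 1 },
    -- Invented sample message, only here to make the tally non-trivial.
    { info = { mboxfile = 1 }, ['Subject'] = "Another test", ['Lines'] = 12 },
  },
}

-- Count messages and total lines per mailbox file.
-- ASSUMPTION: info.mboxfile is an index into emails.filelist.
local stats = {}
for _, msg in ipairs(emails.mbox) do
  local name = emails.filelist[msg.info.mboxfile].file
  local s = stats[name] or { messages = 0, lines = 0 }
  s.messages = s.messages + 1
  s.lines = s.lines + (msg.Lines or 0)
  stats[name] = s
end

for name, s in pairs(stats) do
  print(name, s.messages, s.lines)
end
```

The same loop in C would need a parser for the data file first, which is the work the Lua format is meant to avoid.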
Until I threw 80,919 messages from 2,360 email files I had lying around
(going back to 1991). Then all I got from Lua was:
> lua: constant table overflow
>
Hmmm … okay, maybe throwing an 80MB (Megabyte) file into the Lua interpreter
wasn't such a good idea.
But then tonight I decided to give it one more try. The source code to Lua
didn't reveal any immediate settings to tweak, so I did a bit of searching.
And yes, I'm not the only one with that problem [3]. Reading further, I
learned that while there isn't a limit to the size a Lua table can get, there
is a limit to the number of constants in a single Lua function [4].
But the code isn't in a Lua function.
Or is it?
It is. When you load Lua code from an external source, it gets compiled into
an anonymous function that needs to be run. So, the solution is to break the
initialization into several functions, and from some experimenting, I found
that things would work (with this particular data set) if I only initialized
16,384 items per function.
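The splitting idea can be sketched roughly as follows. This is not the indexer's actual generator (which isn't shown here), just a minimal illustration under some assumptions: records are flat tables of strings and numbers, and the names serialize_record, generate_chunked and per_chunk are made up for the example. Each group of records is initialized inside its own function, so the constants are spread over many functions instead of piling up in the one anonymous function the loader creates.

```lua
-- Serialize one flat record. ASSUMPTION: fields are strings or numbers only,
-- which keeps the sketch short.
local function serialize_record(rec)
  local keys = {}
  for k in pairs(rec) do keys[#keys + 1] = k end
  table.sort(keys)                        -- stable, repeatable output
  local parts = {}
  for _, k in ipairs(keys) do
    local v = rec[k]
    local rhs = type(v) == "number" and tostring(v) or string.format("%q", v)
    parts[#parts + 1] = string.format("[%q] = %s", k, rhs)
  end
  return "{ " .. table.concat(parts, ", ") .. " }"
end

-- Emit Lua source where every `per_chunk` records live in their own function.
-- Each chunk sits in a do ... end block so the local name is reused rather
-- than adding one more local to the main chunk per group.
local function generate_chunked(records, per_chunk)
  local out = { "local data = {}" }
  for first = 1, #records, per_chunk do
    out[#out + 1] = "do"
    out[#out + 1] = "  local function chunk(t)"
    for i = first, math.min(first + per_chunk - 1, #records) do
      out[#out + 1] = "    t[#t + 1] = " .. serialize_record(records[i])
    end
    out[#out + 1] = "  end"
    out[#out + 1] = "  chunk(data)"
    out[#out + 1] = "end"
  end
  out[#out + 1] = "return data"
  return table.concat(out, "\n")
end

-- Tiny demonstration with made-up records and a chunk size of 2.
local records = {}
for i = 1, 5 do
  records[i] = { file = "/tmp/mbox" .. i, size = i }
end
local source = generate_chunked(records, 2)

-- loadstring in Lua 5.1, load in later versions. Either way the result is
-- an anonymous function that still has to be called.
local compile = loadstring or load
local data = assert(compile(source))()
print(#data, data[3].file)
```

The right value for per_chunk depends on the data, since what counts against the limit is the number of distinct constants in each function, not the number of records.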
But there's a difference between “it worked” and “this is a usable solution.”
Generating the Lua code? 30 seconds.
Loading the Lua code into the interpreter? Six minutes and an overheated CPU
(Central Processing Unit).
Interesting …
Update Tuesday, November 10th, 2009
I managed to hit the worst case run-time [5] with the code. Change the order
of things, and it runs in about 15 seconds. Go figure …
Update Wednesday, February 3rd, 2010
It was a bug in Lua that has since been fixed [6].
[1]
gopher://gopher.conman.org/0Phlog:2009/09/15.1
[2]
gopher://gopher.conman.org/0Phlog:2009/06/01.1
[3]
http://lua-users.org/lists/lua-l/2008-02/msg00255.html
[4]
http://lua-users.org/lists/lua-l/2007-06/msg00231.html
[5]
gopher://gopher.conman.org/0Phlog:2009/11/10.1
[6]
gopher://gopher.conman.org/0Phlog:2010/02/03.1
Email author at [email protected]