* * * * *
Profiles of Lua code
I started “Project: Sippy-Cup [1]” six years ago as a “proof-of-concept” and
it ended up in production (without my knowledge and to my dismay at the time)
about a year or two afterwards. The code was written to be correct, not fast.
And for the past four or five years it's been running, performance has never
been a real issue. But that's changing, as the projected traffic levels shoot
past the “oh my” and into the “oh my God” territory.
“Project: Sippy-Cup” processes a lot of SIP (Session Initiation Protocol)
messages, which are text based, so there's a lot of text processing. I use
LPEG [2] for ease of writing parsers, but it's not necessarily as fast as it
could be.
There are two issues with LPEG—it has infinite look-ahead, and ordered
choices. So the code that checks for the SIP method:
-----[ Lua ]-----
method = lpeg.P"ACK"
       + lpeg.P"BYE"
       + lpeg.P"CANCEL"
       + lpeg.P"INFO"
       + lpeg.P"INVITE"
       + lpeg.P"MESSAGE"
       + lpeg.P"NOTIFY"
       + lpeg.P"OPTIONS"
       + lpeg.P"PRACK"
       + lpeg.P"PUBLISH"
       + lpeg.P"REFER"
       + lpeg.P"REGISTER"
       + lpeg.P"SUBSCRIBE"
       + lpeg.P"UPDATE"
       + (lpeg.R("AZ","az","09") + lpeg.S"-.!%*_+`'~")^1
-----[ END OF LINE ]-----
will first compare the input to “ACK”; if it doesn't match, it then
backtracks and tries comparing the input to “BYE”, and so on down the list
until it gets to the last rule, which is a “catch-all” rule. It would be easy to
reorder the list so that the checks are “most-likely” to “least-likely,” but
really the entire list could be removed leaving just the catch-all:
-----[ Lua ]-----
method = (lpeg.R("AZ","az","09") + lpeg.S"-.!%*_+`'~")^1
-----[ END OF LINE ]-----
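To see why dropping the named alternatives is safe for well-formed request lines, here is a small sketch comparing a shortened ordered-choice list against the catch-all alone. It assumes LPEG is installed and loadable as lpeg; the request lines and the example.com addresses are made up for illustration and are not taken from the real code.

```lua
local lpeg = require "lpeg"

-- Catch-all rule, copied from the text above.
local catch_all = (lpeg.R("AZ","az","09") + lpeg.S"-.!%*_+`'~")^1

-- Shortened ordered-choice version: three named methods, then the catch-all.
local ordered = lpeg.P"ACK" + lpeg.P"BYE" + lpeg.P"INVITE" + catch_all

-- Made-up request lines; UPDATE is deliberately not in the shortened list.
local lines = {
  "ACK sip:bob@example.com SIP/2.0",
  "INVITE sip:bob@example.com SIP/2.0",
  "UPDATE sip:bob@example.com SIP/2.0",
}

-- match() returns the position just past the matched text.
local results = {}
for i, line in ipairs(lines) do
  local a = ordered:match(line)
  local b = catch_all:match(line)
  results[i] = { ordered = a, catch_all = b }
  print(line:sub(1, b - 1), a, b)
end
```

For each of these lines both patterns stop at the space after the method, so the only thing the named alternatives add here is extra comparisons before the match succeeds.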
I have the same issue with SIP headers—there are 100 headers that are
“parsed” (for various values of “parsed”) but I only really look at a dozen
headers—the rest just slow things down and can be passed by a generic parsing
rule. The full headers were added during the “proof-of-concept” stage since I
wasn't sure at the time which headers would be critical and which ones
wouldn't, and I've never gone back and cleaned up the code.
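As a rough sketch of what “a dozen specific headers plus a generic rule” could look like, the following keeps one dedicated rule and lets everything else fall through to a generic name/value rule. This is not the actual “Project: Sippy-Cup” grammar: the rule names are hypothetical, Content-Length stands in for the headers that matter, and header folding and case-insensitive names are ignored to keep it short. It assumes LPEG is loadable as lpeg.

```lua
local lpeg = require "lpeg"

local name  = (lpeg.R("AZ","az","09") + lpeg.S"-.!%*_+`'~")^1
local ws    = lpeg.S" \t"^0
local crlf  = lpeg.P"\r\n"
local value = (lpeg.P(1) - crlf)^0

-- Hypothetical dedicated rule: the value is converted to a number.
local content_length =
  (lpeg.P"Content-Length" * ws * ":" * ws * (lpeg.R"09"^1 / tonumber) * crlf)
  / function(n) return "Content-Length", n end

-- Generic rule: capture the name and the raw value, nothing more.
local generic = lpeg.C(name) * ws * ":" * ws * lpeg.C(value) * crlf

-- Ordered choice: try the dedicated rule first, then fall back.
local header = content_length + generic

local n1, v1 = header:match("Content-Length: 42\r\n")
local n2, v2 = header:match("X-Custom: hello\r\n")
print(n1, v1, type(v1))
print(n2, v2, type(v2))
```

With this shape, the cost of an uninteresting header is one failed comparison per dedicated rule followed by a single generic scan, instead of a fully parsed value.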
Another aspect is the sheer number of validity checks the code makes on the
incoming SIP message. Many of the checks don't really have any effect on the
processing due to managerial mandate at the time, so they could go (I wanted
strict checking that bailed on any error; my manager at the time did not want
such strictness—no need to guess who won, but I still track parsing
irregularities).
So while I feel these are two areas where the code could be made faster, I
don't know if that's where the time is spent, and so it's time to profile the
code.
The issue now is that the system profiler will profile the code as C, not as
Lua. I don't need to profile the code to know the Lua VM (Virtual Machine)
gets called all the time. What I need to know is what Lua code is called all
the time. But it can't hurt to try the system profiler, right? And given that
the regression test has over 12,000 test cases, we should get some good
profiling information, right?
Table: Original Profile—Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
------------------------------------------------------------------
13.32      3.47      3.47                             match
12.74      6.79      3.32                             luaV_execute
 9.31      9.22      2.43                             luaS_newlstr
 6.83     11.00      1.78                             luaD_precall
 5.31     12.38      1.39                             luaH_getstr
 3.38     13.26      0.88                             luaD_poscall
 2.57     13.93      0.67                             index2adr
 2.19     14.50      0.57                             luaV_gettable
Not bad at all. The function match() is the LPEG execution engine, which
matches my initial thoughts on the code. It wasn't much work to remove
extraneous SIP headers I don't bother with, and to simplify the method
parsing (see above). Re-profile the code and:
Table: Modified LPEG—Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
------------------------------------------------------------------
14.25      3.67      3.67                             luaV_execute
11.22      6.56      2.89                             luaS_newlstr
10.49      9.26      2.70                             match
 6.33     10.89      1.63                             luaD_precall
 5.20     12.23      1.34                             luaH_getstr
 2.76     12.94      0.71                             index2adr
 2.58     13.61      0.67                             luaD_poscall
 2.41     14.23      0.62                             luaV_gettable
match() drops from first to third place, so that's good. And a load test done
by the QA (Quality Assurance) engineer showed an easy 25% increase in message
processing.
But that's really as far as I can go with profiling. I did a run where I
removed most of the validation checks (but only after I saw none of them were
triggered over the past 30 days) and didn't see much of a speed improvement.
So I really need to profile the Lua code as Lua code and not as C.
That's going to take some work.
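One possible starting point, sketched here as an assumption and not as what “Project: Sippy-Cup” ended up using, is the count hook from the standard debug library: every so many VM instructions, record which Lua function is running. The helper name profile, the sampling interval, and the busy() workload are all made up for illustration.

```lua
-- Hypothetical helper: run fn() while sampling which Lua function is
-- executing every `interval` VM instructions.
local function profile(fn, interval)
  local counts = {}

  local function hook()
    -- Level 2 is the function that was running when the hook fired.
    local info = debug.getinfo(2, "S")
    if info then
      local key = info.short_src .. ":" .. info.linedefined
      counts[key] = (counts[key] or 0) + 1
    end
  end

  debug.sethook(hook, "", interval or 1000)
  local ok, err = pcall(fn)
  debug.sethook() -- remove the hook, even if fn() failed
  if not ok then error(err, 0) end
  return counts
end

-- Made-up workload standing in for the regression test.
local function busy()
  local total = 0
  for i = 1, 200000 do
    total = total + #tostring(i)
  end
  return total
end

local counts = profile(busy, 1000)

-- Sort by sample count, highest first, keyed by source file and the
-- line where each function is defined.
local report = {}
for where, samples in pairs(counts) do
  report[#report + 1] = { where = where, samples = samples }
end
table.sort(report, function(a, b) return a.samples > b.samples end)
for _, entry in ipairs(report) do
  print(entry.samples, entry.where)
end
```

The count hook only fires while Lua code is executing, so time spent inside C functions such as LPEG's match() is not sampled directly; a sketch like this would complement the system profiler, not replace it.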
[1] gopher://gopher.conman.org/0Phlog:2014/03/05.1
[2] http://www.inf.puc-rio.br/~roberto/lpeg/