* * * * *

          A good example of building parsing expressions on the fly

As part of the regression test, one of the ways I check the results is to
scan through the log files, checking the output matches what we expect to
happen. The components (and there are quite a few) log KPIs (Key Performance
Indicators) in the format:

> stats|F0021: fooreq=123 fooabrts=0 fooerrs=1 foo-qs=2/14 foo-latency=1.2/4.2/15.2ms foo-bar-xlats=3
>

(Each log line from each component contains a unique identifier which makes
it easy to grep for particular lines of interest in the logs.)

In the past week, a major change was made in how the KPIs are reported (they
used to be cumulative over the run of the program; now they're reset as
they're logged). That required a change in how my testing program processes
the logs, and I decided this was as good a time as any to make a dramatic
change in how that's done.

Previously, I used the built-in Lua string patterns [1] (think regular
expressions) for parsing, but as happens, such code tends to be hard to read,
and additional processing is required to massage the resulting values from
strings to numbers. So I thought, why not use LPeg [2] for my parsing needs?

> -- The rest of the code assumes the following has been declared
> local lpeg = require "lpeg"
>
> local Ct = lpeg.Ct -- capture results into a table
> local Cc = lpeg.Cc -- matches "", return given value as capture
> local Cg = lpeg.Cg -- capture results and assign to given name
> local R  = lpeg.R  -- match any character in a range
> local P  = lpeg.P  -- match a literal string
> local S  = lpeg.S  -- match any character in a set
>

To parse a number, we look for at least one character in the range of “0” to
“9” (inclusive), and pass the match to a function that will convert the
string to a numeric value, which is returned as the capture:

> local num = R"09"^1 / function(c) return tonumber(c) end
>

For non-integers, we can use the following bit of code. It won't parse all
floating point numbers, but we don't need much more than this for what's
logged: an optional minus sign, followed by digits, and an optional decimal
point plus more digits, all of which is also translated to a numeric value:

> -- uses R"09"^1 rather than num, so the function gets the whole matched text
> local fnum = (P"-"^-1 * R"09"^1 * (P"." * R"09"^1)^-1) / function(c) return tonumber(c) end
>
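
A quick way to check number patterns like these is to run them against a few
strings. What follows is only a sketch, not code from the test program. It
assumes LPeg is installed, and the local `digits` is a name made up for the
example: it matches a run of digits without producing a capture, so that
`tonumber` is handed the whole matched text, sign and fraction included.

```lua
local lpeg = require "lpeg"
local R, P = lpeg.R, lpeg.P

-- hypothetical helper: one or more digits, no capture of its own
local digits = R"09"^1

-- with no captures inside, the whole match is passed to tonumber
local num  = digits / tonumber
local fnum = (P"-"^-1 * digits * (P"." * digits)^-1) / tonumber

print(num:match("123"))    -- an integer
print(fnum:match("15.2"))  -- digits, decimal point, more digits
print(fnum:match("-0.5"))  -- optional leading minus sign
print(fnum:match("42"))    -- the fraction is optional
print(fnum:match("abc"))   -- no match, so match() returns nil
```

A function capture receives the captures made by the pattern it is attached
to, and only falls back to the whole match when there are none, which is why
the sketch keeps `digits` capture-free.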

We also have a few multi-value KPIs, some with two values, and some with
three values. These are parsed with the values returned in a table with the
values in conveniently named fields:

> local lat  = Ct(                        -- latency
>                   P"no-data"      *
>                   Cg(Cc(0),'min') *
>                   Cg(Cc(0),'avg') *
>                   Cg(Cc(0),'max') *
>                   Cg(Cc(false),'valid')
>                )
>              +
>              Ct(
>                  Cg(fnum,'min') * P'/'  *
>                  Cg(fnum,'avg') * P'/'  *
>                  Cg(fnum,'max') * P'ms' *
>                  Cg(Cc(true),'valid')
>                )
>
> local cnt  = Ct(                        -- counts
>                  P"no-data" *
>                  Cg(Cc(0),'avg') *
>                  Cg(Cc(0),'max') *
>                  Cg(Cc(false),'valid')
>                )
>              +
>              Ct(
>                  Cg(fnum,'avg') * P'/' *
>                  Cg(fnum,'max') *
>                  Cg(Cc(true),'valid')
>                )
>

The only wrinkle here (and it's not much of a wrinkle) is that

> foo-qs=no-data foo-latency=no-data
>

can be logged and we need to take that into account. If no-data is seen, the
code returns the following table (if we're parsing the three value KPI; the
two value KPI returns something similar):

> {
>   min = 0.0,
>   avg = 0.0,
>   max = 0.0,
>   valid = false
> }
>

otherwise, the following will be returned:

> {
>   min = 1.2,
>   avg = 4.2,
>   max = 15.2,
>   valid = true
> }
>
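
The two cases can be exercised directly. This is a minimal sketch under the
assumption that LPeg is installed. The latency pattern has the same shape as
the one above, while `digits` and the condensed layout are only there to keep
the example short and self-contained.

```lua
local lpeg = require "lpeg"
local Ct, Cc, Cg, R, P = lpeg.Ct, lpeg.Cc, lpeg.Cg, lpeg.R, lpeg.P

-- hypothetical helper: a run of digits with no capture of its own
local digits = R"09"^1
local fnum   = (P"-"^-1 * digits * (P"." * digits)^-1) / tonumber

-- three-value KPI: either the literal "no-data" or min/avg/max in ms
local lat = Ct(
                P"no-data"
              * Cg(Cc(0),'min') * Cg(Cc(0),'avg') * Cg(Cc(0),'max')
              * Cg(Cc(false),'valid')
            )
          + Ct(
                Cg(fnum,'min') * P'/'
              * Cg(fnum,'avg') * P'/'
              * Cg(fnum,'max') * P'ms'
              * Cg(Cc(true),'valid')
            )

for _,text in ipairs { "no-data" , "1.2/4.2/15.2ms" } do
  local r = lat:match(text)
  print(text,r.min,r.avg,r.max,r.valid)
end
```

The `valid` field is what lets later checks tell a real reading of zero apart
from a component that had nothing to report.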

To parse the actual KPIs, I could have done something like:

> F0021 = Ct(
>             P": " -- I have a reason for starting the parse at the ':'
>           * P"fooreq="        * Cg(num,'foofeq')      * P" "
>           * P"fooabrts="      * Cg(num,'fooabrts')    * P" "
>           * P"fooerrs="       * Cg(num,'fooerrs')     * P" "
>           * P"foo-qs="        * Cg(cnt,'foo-qs')      * P" "
>           * P"foo-latency="   * Cg(lat,'foo-latency') * P" "
>           * P"foo-bar-xlats=" * Cg(num,'foo-bar-xlats')
>       )
>

But I didn't. Not because it's a lot of typing, but because the repetition
invites errors: every field name is typed twice (look closely and you can see
where I messed up). I'd rather not repeat myself [3], especially since I have
multiple lines of KPIs to parse. Ideally, I'd like to specify the name of the
field once, something like:

> {
>   { "fooreq"          , num } , -- might as well include
>   { "fooabrts"        , num } , -- the type of the KPI
>   { "fooerrs"         , num } , -- while I'm at it
>   { "foo-qs"          , cnt } ,
>   { "foo-latency"     , lat } ,
>   { "foo-bar-xlats"   , num }
> }
>

And, because LPeg is composable [4], I can get away with that:

> local function mkmatch(list)
>   local pattern = P": "
>
>   for i = 1 , #list do
>     pattern = pattern
>             * P(list[i][1])           -- look for name of field
>             * P"="
>             * Cg(list[i][2],list[i][1]) -- capture value as name
>             * P" "^0                  -- optional space at the end
>   end
>
>   return Ct(pattern) -- return all captures in a table
> end
>
> F0021 = mkmatch {
>   { "fooreq"          , num } ,
>   { "fooabrts"        , num } ,
>   { "fooerrs"         , num } ,
>   { "foo-qs"          , cnt } ,
>   { "foo-latency"     , lat } ,
>   { "foo-bar-xlats"   , num }
> }
>
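
Putting the pieces together, here is a sketch of the generated pattern run
against the KPI portion of the sample line from the top of this entry. It
assumes LPeg is installed. The `digits` helper and the condensed `lat` and
`cnt` definitions are only there so the example stands on its own.

```lua
local lpeg = require "lpeg"
local Ct, Cc, Cg, R, P = lpeg.Ct, lpeg.Cc, lpeg.Cg, lpeg.R, lpeg.P

local digits = R"09"^1  -- hypothetical helper, no capture of its own
local num    = digits / tonumber
local fnum   = (P"-"^-1 * digits * (P"." * digits)^-1) / tonumber

local lat = Ct(P"no-data" * Cg(Cc(0),'min') * Cg(Cc(0),'avg')
               * Cg(Cc(0),'max') * Cg(Cc(false),'valid'))
          + Ct(Cg(fnum,'min') * P'/' * Cg(fnum,'avg') * P'/'
               * Cg(fnum,'max') * P'ms' * Cg(Cc(true),'valid'))

local cnt = Ct(P"no-data" * Cg(Cc(0),'avg') * Cg(Cc(0),'max')
               * Cg(Cc(false),'valid'))
          + Ct(Cg(fnum,'avg') * P'/' * Cg(fnum,'max')
               * Cg(Cc(true),'valid'))

local function mkmatch(list)
  local pattern = P": "
  for i = 1 , #list do
    pattern = pattern
            * P(list[i][1]) * P"="        -- field name, then '='
            * Cg(list[i][2],list[i][1])   -- capture value under that name
            * P" "^0                      -- optional space at the end
  end
  return Ct(pattern)
end

local F0021 = mkmatch {
  { "fooreq"        , num } ,
  { "fooabrts"      , num } ,
  { "fooerrs"       , num } ,
  { "foo-qs"        , cnt } ,
  { "foo-latency"   , lat } ,
  { "foo-bar-xlats" , num }
}

local kpi = F0021:match(": fooreq=123 fooabrts=0 fooerrs=1 foo-qs=2/14"
                     .. " foo-latency=1.2/4.2/15.2ms foo-bar-xlats=3")

print(kpi.fooreq,kpi['foo-qs'].max,kpi['foo-latency'].avg,kpi['foo-bar-xlats'])
```

Each field name appears exactly once in the list, so a typo now causes the
match to fail outright instead of quietly storing a value under the wrong key.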

And there goes the repetition.

Now, why do I start parsing at the colon? Simple: I use Lua string patterns
to quickly check which log line I'm looking at (for that much, they're still
easy to read), and from there I can call the appropriate LPeg pattern to
parse the KPIs.
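
One way that hand-off might look is sketched below. This is an illustration
only: the `parsers` table, the `parse` function, and the cut-down field list
are all made up for the example, and it assumes every line of interest starts
with "stats|" followed by the identifier, as in the sample above. The Lua
string pattern picks out the identifier and the position of the colon, and
that position is passed to LPeg as the place to start matching.

```lua
local lpeg = require "lpeg"
local Ct, Cg, R, P = lpeg.Ct, lpeg.Cg, lpeg.R, lpeg.P

local num = R"09"^1 / tonumber

local function mkmatch(list)
  local pattern = P": "
  for i = 1 , #list do
    pattern = pattern * P(list[i][1]) * P"="
            * Cg(list[i][2],list[i][1]) * P" "^0
  end
  return Ct(pattern)
end

-- hypothetical table of LPeg patterns, keyed by log line identifier
local parsers =
{
  F0021 = mkmatch {
    { "fooreq"   , num } ,
    { "fooabrts" , num } ,
    { "fooerrs"  , num }
  } ,
}

-- hypothetical dispatcher: returns the identifier and the parsed KPIs,
-- or nothing if the line isn't one we know about
local function parse(line)
  -- "()" is a position capture; here it yields the index of the colon
  local id,pos = line:match("^stats|(%w+)():")
  local patt   = id and parsers[id]
  if patt then
    return id,patt:match(line,pos)
  end
end

local id,kpi = parse "stats|F0021: fooreq=123 fooabrts=0 fooerrs=1"
print(id,kpi.fooreq,kpi.fooabrts,kpi.fooerrs)
```

Handling a new kind of log line then comes down to adding one more entry to
the table.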

[1] http://www.lua.org/manual/5.1/manual.html#5.4.1
[2] http://www.inf.puc-rio.br/~roberto/lpeg/
[3] http://c2.com/cgi/wiki?DontRepeatYourself
[4] gopher://gopher.conman.org/0Phlog:2013/01/14.1
