* * * * *
I solved the issue, but I'm not sure what the issue was
It's a bug whose solution I can't explain.
So I have a Lua module that enables event-driven programming [1]. I also have
a few modules that drive TCP (Transmission Control Protocol) and TLS
(Transport Layer Security) connections [2]. To make it even easier, I have a
module that presents a Lua file-like interface to network connections [3]—
functions like obj:read() and obj:write().
The previous version of this interface module, org.conman.net.ios, used LPEG
(Lua Parsing Expression Grammar) [4] to handle line-based IO (Input/Output)
requests, as well as an extension to read headers from an Internet message
[5]. Given the overhead of LPEG [6] I thought I might try using the built-in
pattern matching of Lua. I reworked the code, and a benchmark did show a
decent and measurable improvement in speed and memory usage.
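For readers who haven't used Lua's built-in patterns for this kind of work, here is a minimal sketch of pulling one line off the front of a receive buffer. It is not the code from org.conman.net.ios; the name next_line and the buffer-in, line-and-remainder-out shape are assumptions made purely for illustration.

```lua
-- Hypothetical helper (not from org.conman.net.ios): split one line,
-- terminated by LF or CRLF, off the front of a buffer of received data.
local function next_line(buffer)
  -- "\r?\n" matches an optional CR followed by LF; find() returns the
  -- start and end positions of the leftmost match.
  local s, e = buffer:find("\r?\n")
  if not s then
    -- No complete line yet; hand the buffer back untouched so the
    -- caller can append more data and try again.
    return nil, buffer
  end
  -- Return the line without its terminator, plus whatever follows it.
  return buffer:sub(1, s - 1), buffer:sub(e + 1)
end

local line, rest = next_line("GET /index\r\nleftover")
assert(line == "GET /index")
assert(rest == "leftover")

local none, kept = next_line("partial")
assert(none == nil and kept == "partial")
```

The appeal of this approach is that string.find() is part of the standard library, so no grammar has to be compiled first; the trade-off is that Lua patterns are much less expressive than LPEG.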
But the new code failed when transferring a sizable amount of data (about
6.7M (Megabytes)) over TLS. It took about two days to track down the problem,
and I still don't have a root cause. The code works, but I don't know why it
works. And that bugs me.
To further complicate matters, the code did work when I downloaded the data
from a server I wrote (using the same Lua code as the client) but it would
fail when I tried downloading the data from another server (different TLS
implementation, different language, etc.).
I was able to eventually isolate the issue down to one function in
org.conman.net.ios. Here was the original code:
-----[ Lua ]-----
local function write(ios,...)
  if ios._eof then
    return false,"stream closed",-2
  end
  local output = ""
  for i = 1 , select('#',...) do
    local data = select(i,...)
    if type(data) ~= 'string' and type(data) ~= 'number' then
      error("string or number expected, got " .. type(data))
    end
    output = output .. data
  end
  return ios:_drain(output)
end
-----[ END OF LINE ]-----
It works, but I didn't like that it accumulated all the output first
before writing it. So when I rewrote org.conman.net.ios, I modified the
function thusly:
-----[ Lua ]-----
local function write(ios,...)
  if ios._eof then
    return false,"stream closed",-2
  end
  for i = 1 , select('#',...) do
    local data = select(i,...)
    if type(data) ~= 'string' and type(data) ~= 'number' then
      error(string.format("bad argument #%d to 'write' (string expected, got %s)",i,type(data)))
    end
    data = tostring(data)
    local okay,err,ev = ios:_drain(data)
    if not okay then
      syslog('error',"ios:_drain() = %s",err)
      return okay,err,ev
    end
    ios._wbytes = ios._wbytes + #data
  end
  return true
end
-----[ END OF LINE ]-----
Instead of accumulating the data into one large buffer, it outputs it
piecemeal. To further confound things, this doesn't appear to have anything
to do with reading, which is what I was having issues with.
The client only did one call to this function:
-----[ Lua ]-----
local okay,err = ios:write(location,"\r\n")
-----[ END OF LINE ]-----
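To make the one observable difference between the two versions concrete, here is a small sketch that runs stripped-down copies of both against a stand-in object. The mock_ios() function, the names write_accumulate and write_piecemeal, and the sample location string are all made up for the demonstration, and the mock _drain() only records its argument rather than sending anything. How the real _drain() maps a call onto TLS records or TCP segments isn't shown here, so this illustrates the call pattern only, not a root cause.

```lua
-- Stand-in for the real connection object (an assumption for this demo):
-- _drain() records what it was asked to send instead of using a socket.
local function mock_ios()
  return {
    _eof    = false,
    _wbytes = 0,
    calls   = {},
    _drain  = function(self, data)
      self.calls[#self.calls + 1] = data
      return true
    end,
  }
end

-- Old behavior: concatenate every argument, then drain once.
local function write_accumulate(ios, ...)
  local output = ""
  for i = 1, select('#', ...) do
    output = output .. (select(i, ...))
  end
  return ios:_drain(output)
end

-- New behavior: drain each argument as it is encountered.
local function write_piecemeal(ios, ...)
  for i = 1, select('#', ...) do
    local data = tostring((select(i, ...)))
    local okay, err, ev = ios:_drain(data)
    if not okay then
      return okay, err, ev
    end
    ios._wbytes = ios._wbytes + #data
  end
  return true
end

local old, new = mock_ios(), mock_ios()
write_accumulate(old, "/some/location", "\r\n")
write_piecemeal(new, "/some/location", "\r\n")

assert(#old.calls == 1 and old.calls[1] == "/some/location\r\n")
assert(#new.calls == 2 and new.calls[1] == "/some/location")
assert(new.calls[2] == "\r\n")
```

So for the client's single call, the old write() hands _drain() one string containing the whole request, while the new one hands it the location and the line terminator in two separate calls.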
The request went out, I would start receiving data, but for some odd reason,
the connection would just drop about 200K (Kilobytes) short of the full file
(it was never a consistent amount either).
While the reading side was a different implementation, the writing side
didn't have to be different; I just felt the second version to be a bit
better, and it shouldn't make a difference, right? [There's that word! –
Editor] [What word? –Sean] [“Should.” –Editor] But regardless of my feelings
about how that certainly can't be at fault, I put the previous version of
write() back and lo'! It worked!
…
I'm flummoxed!
I don't understand why the new version of write() would cause the TLS
connection to eventually fail, but it did, for whatever reason. Weird.
[1]
https://github.com/spc476/lua-conmanorg/blob/master/lua/nfl.lua
[2]
https://github.com/spc476/lua-conmanorg/tree/master/lua/nfl
[3]
https://github.com/spc476/lua-conmanorg/blob/master/lua/net/ios.lua
[4]
http://www.inf.puc-rio.br/~roberto/lpeg/
[5]
https://www.ietf.org/rfc/rfc5322.txt
[6]
gopher://gopher.conman.org/0Phlog:2020/12/19.2