* * * * *

                    Are you tired of network tarpits yet?

> You would not believe how hard it was to write a binary search that
> returned the correct index for a missing record in an array.
>

“Some notes on a binary search implementation [1]”

A week later, and I finally have it working.

One technique used to debug a program is to have another program that does
the same thing, but implemented using a different method or language or both.
And I did. I ran the Perl program I had over the 1.1G (Gigabyte) log file
[2], then ran ltpstat over the same log file and got two different results.

Not good.

ltpstat returned 2% more connections than the Perl script. Getting a dump
from the currently running version on the LaBrea [3] system and cleaning the
output showed a 2% difference again.

So I spent the past week trying to track down the problem. It was obvious
that ltpstat was storing duplicate records, but why was a different matter.
My testing sample of about 1,100 connections is apparently too small to
completely test the program, so I had to test using the 1.1G log file which
has approximately 230,000 connections.
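
The comparison between the two programs can be sketched roughly as follows.
This is not the Perl script or ltpstat itself, just a minimal Python
illustration. The record format (an address and port pair) and the name
compare_outputs are assumptions made for the example.

```python
from collections import Counter


def compare_outputs(reference, candidate):
    """Compare connection records produced by two implementations.

    Both arguments are lists of hashable records; the (address, port)
    tuples used below are a hypothetical format, not ltpstat's.
    """
    ref_counts = Counter(reference)
    cand_counts = Counter(candidate)
    # Records the candidate stored more than once.
    duplicates = {rec: n for rec, n in cand_counts.items() if n > 1}
    # Records seen by only one of the two programs.
    missing = sorted(set(ref_counts) - set(cand_counts))
    extra = sorted(set(cand_counts) - set(ref_counts))
    # How much larger the candidate's total is, as a percentage.
    surplus_pct = 0.0
    if reference:
        surplus_pct = 100.0 * (len(candidate) - len(reference)) / len(reference)
    return duplicates, missing, extra, surplus_pct


if __name__ == "__main__":
    perl_records = [("192.0.2.1", 139), ("192.0.2.2", 445)]
    ltpstat_records = perl_records + [("192.0.2.1", 139)]
    print(compare_outputs(perl_records, ltpstat_records))
```

A surplus with nothing missing and nothing extra points at duplicates, which
is the pattern described above.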

To help debug this problem, I wrote a linear search and called it alongside
the binary search. If both agreed, I would return the information;
otherwise, I would log the discrepancy, do the search again, then exit. The
reason for doing the search a second time? So I could set a breakpoint there,
and let the program run for a couple of hours until it triggered. Then I
could step through both searches to see where the problem was.
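
A minimal sketch of that cross-check, in Python rather than the language
ltpstat is written in. The function names are made up for the example, and
this binary search is one common formulation, not necessarily the version
that finally worked. It returns the index where a missing key would have to
be inserted, which is the case that caused the trouble. Where the real
program logged the discrepancy and repeated the search, this sketch simply
raises an error.

```python
def binary_search(records, key):
    """Return (found, index) for key in the sorted list records.

    When key is absent, index is the position where it would have to be
    inserted to keep the list sorted.
    """
    lo, hi = 0, len(records)      # search the half-open range [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if records[mid] < key:
            lo = mid + 1          # key can only be to the right of mid
        else:
            hi = mid              # key is at mid or to the left of it
    found = lo < len(records) and records[lo] == key
    return found, lo


def linear_search(records, key):
    """Reference version: scan for the first element >= key."""
    for i, rec in enumerate(records):
        if rec >= key:
            return rec == key, i
    return False, len(records)


def checked_search(records, key):
    """Run both searches and complain loudly if they disagree."""
    fast = binary_search(records, key)
    slow = linear_search(records, key)
    if fast != slow:
        # A convenient place for a breakpoint.
        raise AssertionError(
            f"search mismatch for {key!r}: binary={fast} linear={slow}")
    return fast


if __name__ == "__main__":
    data = [10, 20, 30]
    for key in (5, 10, 25, 30, 35):
        print(key, checked_search(data, key))
```

The corner cases worth trying by hand are the empty list, a key smaller than
every record, and a key larger than every record.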

Yup, each run took several hours to trigger the bug.

I ended up testing four different binary search routines (including the
original one I thought worked, plus one I modified from _The Standard C
Library_ [4], plus two other versions I wrote) before sitting down and
working through things on paper.

And I still missed corner cases.

But finally, I tested my final version against the Perl script and only had
122 discrepancies out of some 230,000 records (about 0.05%, or 5% of 1%; too
small for me to worry about after spending a week on this).

* * * * *

I took a snapshot of the currently running version (which had been running
for a bit over three days now), cleansed the output of duplicates, and the
final tally was 416,230 connections from 12,911 unique IPs. Again, nothing
surprising about the ports being attacked:

Table: Top 10 ports captured by LaBrea in the past 3 days

Port #  Port description                                        # connections
------  ------------------------------------------------------  -------------
139     NetBIOS (Basic Input/Output System) Session Service           160,799
135     Microsoft-RPC (Remote Procedure Call) service                 108,958
445     Microsoft-DS (Directory Service?) Service                      67,506
80      Hypertext Transfer Protocol                                    23,921
4899    Remote Administration [5]                                       9,225
22      Secure Shell Login                                              7,253
1433    Microsoft SQL (Structured Query Language) Server                6,503
8080    Hypertext Transfer Protocol, typical alternative port           3,717
3128    Squid HTTP (Hypertext Transfer Protocol) Proxy                  3,329
1080    W32.Mydoom.F@mm worm [6]                                        3,150

And again, the Microsoft-specific ports account for 81% of the scans. I'll
need to talk with Smirk about blocking those ports in the core router. If
nothing else, LaBrea is giving me an indication of which ports to block.
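
The 81% figure can be checked from the table. The sketch below assumes that
"Microsoft-specific" means ports 139, 135 and 445; the post does not list
them, but those three reproduce the number. Counting port 1433 as well would
give about 82.6%. The names are made up for the example.

```python
# Total connections in the three-day snapshot, from the text above.
TOTAL_CONNECTIONS = 416_230

# Assumed set of Microsoft-specific ports, with counts from the table.
MICROSOFT_PORTS = {139: 160_799, 135: 108_958, 445: 67_506}


def share_percent(counts, total):
    """Percentage of all connections covered by the given ports."""
    return 100.0 * sum(counts.values()) / total


if __name__ == "__main__":
    share = share_percent(MICROSOFT_PORTS, TOTAL_CONNECTIONS)
    print(f"{share:.1f}% of connections hit ports {sorted(MICROSOFT_PORTS)}")
```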

[1] gopher://gopher.conman.org/0Phlog:2006/01/15.3
[2] gopher://gopher.conman.org/0Phlog:2006/01/18.2
[3] http://sourceforge.net/projects/labrea
[4] http://www.amazon.com/exec/obidos/ASIN/0131315099/conmanlaborat-20
[5] http://www.famatech.com/
[6] http://securityresponse.symantec.com/avcenter/venc/data/w32.mydoom.f@mm.html
