Floodgap Gopher Statistics Project methodology
UPDATED 1 December 2007

The aim of the GStats project is not to catalogue a complete usage and
activity count for the whole of modern Gopherspace, which would be neither
accurate nor reasonably possible.

Instead, the GStats project uses access data from the Floodgap Public Gopher
Proxy to generate usage totals. Because the Proxy can access any host -- just
like a Gopher client, since it *is* one -- its logs offer a useful sample of
a large, anonymous user base's activity in Gopherspace.

In the statistics, a successful connection to a valid gopher host is one hit.
This means that accesses to hosts that were once valid but are not now will
not be counted, which may underreport interest in former sites, but also
allows us to screen out people probing the Proxy for security issues (such
as typing in www.myspace.com and expecting it to act as an HTTP proxy instead).
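
As a rough illustration of the counting rule above, here is a minimal sketch
in Python. It is not the Proxy's actual code: the record layout (host, port,
and a flag saying whether the connection to a valid gopher host succeeded)
and the function name count_hits are assumptions made for this example.

```python
from collections import Counter

def count_hits(records):
    """Count one hit per successful connection to a valid gopher host.

    records is an iterable of (host, port, connected_ok) tuples; this
    layout is hypothetical and chosen only for illustration.
    """
    hits = Counter()
    for host, port, connected_ok in records:
        # Failed connections (dead hosts, HTTP probing attempts) are skipped.
        if connected_ok:
            hits[(host, port)] += 1
    return hits

# Made-up sample data: two good requests and one probing attempt.
sample = [
    ("gopher.floodgap.com", 70, True),
    ("gopher.floodgap.com", 70, True),
    ("www.myspace.com", 70, False),
]
hits = count_hits(sample)
```

With this sample, only the two successful requests are counted; the probing
attempt leaves no trace in the totals.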

Each month, these hits are aggregated into a total count and plotted on a
rolling seven-month history for trend analysis, along with monthly pie charts
(using GNUplot and ascii_chart). A count of the number of distinct IP/port
pairs accessed is also generated. This is only indirectly comparable to the
Veronica-2 count, as Veronica-2 counts host names instead and is usually
behind on indexing new hosts due to its data massage cycles.
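
The monthly aggregation described above might look something like the
following Python sketch. It assumes each successful hit has already been
reduced to a (month, ip, port) tuple with the month written as "YYYY-MM";
that layout, the function name monthly_summary, and the sample addresses are
hypothetical, and the plotting step is left out.

```python
from collections import defaultdict

def monthly_summary(hits, window=7):
    """Return (month, total_hits, distinct_ip_port_pairs) for recent months.

    hits is an iterable of (month, ip, port) tuples for successful
    connections, with month as a "YYYY-MM" string (an assumed format).
    Only the last `window` months are returned, oldest first.
    """
    totals = defaultdict(int)
    pairs = defaultdict(set)
    for month, ip, port in hits:
        totals[month] += 1
        pairs[month].add((ip, port))
    # "YYYY-MM" strings sort chronologically, so a plain sort is enough.
    recent = sorted(totals)[-window:]
    return [(m, totals[m], len(pairs[m])) for m in recent]

# Made-up sample data using documentation-range IP addresses.
sample_hits = [
    ("2007-10", "192.0.2.1", 70),
    ("2007-10", "192.0.2.1", 70),
    ("2007-11", "192.0.2.1", 70),
    ("2007-11", "192.0.2.2", 7070),
]
summary = monthly_summary(sample_hits)
```

Keeping a set of (ip, port) pairs per month is one simple way to get the
count of pairs accessed alongside the hit total in a single pass.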

From May to September 2007 inclusive, the Proxy was not configured to do
traffic analysis, so the figures for those months were generated
retrospectively from the webserver log. They are likely to be slightly higher
than usual because it was not possible in all cases to screen out proxy abuse,
although an effort was made to eliminate common probing attempts from the data
set. For the same reason, the host statistics for those months are also likely
to be slightly inflated. However, I believe the difference is not large, so I
have included these data sets in the rolling history and made them available.

Again, the numbers should not be interpreted as:
- A total number of hosts in Gopherspace: merely a total number of IP/ports
that were accessed through the Proxy. Veronica-2 is likely to have a more
accurate count.
- A total assessment of all traffic in Gopherspace: most Gopherspace
traffic actually occurs directly (at least at Floodgap) from clients or
web browsers with Gopher support. Proxy traffic is at most a minority of
access here, and the same is probably true for most other Gopher sites.

I would appreciate your comments.

       [email protected]