* * * * *

                The Ins and Outs of Calculating Browser Usage

I spent the past few hours writing a program to parse the browser string from
the web server log files. Why didn't I use an existing web analyzer package?
I wanted the browser strings rewritten to carry correct information, in a
more consistent style. This meant changing a string from, say:

Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; Q312461)

to

MSIE/6.0 Windows/98
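
The program itself isn't shown here, so what follows is only a minimal
sketch of that kind of rewrite, in Python. The function name
normalize_agent and the three patterns are assumptions made for
illustration; they cover the MSIE example above plus plain
"Product/Version" strings, and nothing more.

```python
import re

# Hypothetical rules: the real program's rules are not shown in the post.
MSIE_RE = re.compile(r"MSIE (\d+(?:\.\d+)*)")
WINDOWS_RE = re.compile(r"Windows (NT )?([\w.]+)")
PRODUCT_RE = re.compile(r"^([^/\s]+)/(\S+)")


def normalize_agent(agent):
    """Return 'Browser/Version OS/Version', with '-' for unknown parts."""
    browser, os_name = "-/-", "-/-"

    msie = MSIE_RE.search(agent)
    if msie:
        # IE announces itself as "Mozilla/4.0 (compatible; MSIE x.y; ...)",
        # so the MSIE token takes priority over the leading product.
        browser = "MSIE/" + msie.group(1)
    else:
        product = PRODUCT_RE.match(agent)
        if product:
            browser = product.group(1) + "/" + product.group(2)

    windows = WINDOWS_RE.search(agent)
    if windows:
        family = "WindowsNT" if windows.group(1) else "Windows"
        os_name = family + "/" + windows.group(2)

    return browser + " " + os_name


print(normalize_agent(
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; Q312461)"))
```

A real version would need many more rules (Gecko, Opera, the Mac and Linux
platform tokens, and the assorted robots) to cover a whole log file.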

This also means I can generate decent stats about the popularity of certain
browsers on the fly (using the Unix command line, I can pull out the browser
string, feed that through the newly written program, then count unique
browsers more easily). An initial run through last month's log file for my
blog:

Table: Browser Statistics for The Boston Diaries
# Hits  Browser/Version OS/Version
1,228   Googlebot/2.1   -/-
748     MSIE/6.0        WindowsNT/5.1
712     MSIE/6.0        Windows/98
641     MSIE/6.0        WindowsNT/5.0
476     Mercator/2.0    -/-
371     MSIE/5.5        Windows/98
303     MSIE/5.0        Windows/98
302     MSIE/5.5        WindowsNT/5.0
238     -/-             -/-
216     MSIE/5.01       WindowsNT/5.0
137     ia_archiver/-   -/-
113     Syndic8/1.0     -/-
101     NCSA/-          -/-
101     MSIE/5.01       Windows/98
100     MSIE/6.0        WindowsNT/4.0
99      Mozilla/3.01    -/-
89      Gecko/20020529  Linux/i686
88      Gecko/20020523  WindowsNT/5.0
81      MSIE/5.14       Mac_PowerPC/-
79      Mozilla/5.0     -/-
68      SlySearch/1.2   -/-
66      MSIE/5.5        Windows/95
62      MSIE/5.5        WindowsNT/4.0
62      Gecko/20020529  PPC/Mac
61      Openfind/-      -/-
55      MSIE/5.0        Mac_PowerPC/-
49      Indy-Library/-  -/-
48      Gecko/20020510  Linux/i686
42      Mozilla/3.0     -/-
41      sitecheck.internetseer.com/-    -/-
40      Gecko/20020311  WindowsNT/5.1
38      MSIE/5.01       Windows/95
36      [email protected]/-        -/-
33      Gecko/20020530  WindowsNT/5.0
28      bumblebee/1.0   -/-
28      Gecko/20020510  WinNT4.0/-
27      Opera/6.02      Windows/2000
27      MSIE/5.0        WindowsNT/4.0
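
The counting step that produces a table like the one above was done with
Unix command line tools. As a stand-in, here is a rough Python sketch of
the same idea; count_browsers and format_table are hypothetical names, and
the input is assumed to be one already-rewritten browser string per line.

```python
from collections import Counter


def count_browsers(normalized_lines):
    """Tally identical 'Browser/Version OS/Version' lines, busiest first."""
    counts = Counter(
        line.strip() for line in normalized_lines if line.strip())
    # Sort by descending hit count, then by name so ties come out stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def format_table(rows):
    """Render (entry, hits) rows as 'hits<TAB>browser<TAB>os' lines."""
    lines = []
    for entry, hits in rows:
        browser, _, os_name = entry.partition(" ")
        lines.append(f"{hits:,}\t{browser}\t{os_name or '-/-'}")
    return lines


sample = [
    "MSIE/6.0 Windows/98",
    "Googlebot/2.1 -/-",
    "MSIE/6.0 Windows/98",
]
for line in format_table(count_browsers(sample)):
    print(line)
```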

This gives a decent flavor of what's being used to view my site (out of the
7,943 hits last month, about 16% were from the Google spider [1]), but one of
the primary reasons I did this was to see just how many people are still
using older browsers like Netscape 4x or Internet Explorer 4x (which would
show up as Mozilla/4.x and MSIE/4.x respectively). So, stripping out the
operating system column and looking at only the major version numbers, we
get:

Table: More Specific Browser Statistics for The Boston Diaries
# Hits  Browser/major Version
2,210   MSIE/6
1,671   MSIE/5
1,228   Googlebot/2
543     Gecko/-
476     Mercator/2
238     -/-
142     Opera/6
141     Mozilla/3
137     ia_archiver/-
134     Mozilla/4
113     Syndic8/1
101     NCSA/-
79      Mozilla/5
68      SlySearch/1
61      Openfind/-
49      Indy-Library/-
45      MSIE/4
41      sitecheck.internetseer.com/-
37      Netscape6/6.2
36      [email protected]/-
28      bumblebee/1
26      linkhype.com/1
26      Netscape/7
24      BlogBot/1
22      Win32/-
22      Konqueror/3.0
20      Frontier/8.0
16      Internet/-
16      Ask-Jeeves/-
15      Mozilla/-
14      Microsoft/-
14      Konqueror/2.2
12      w3m/0.2
12      obidos/bot
12      Mozilla/4.7C-CCK-MCD
11      myownhomeblogindexingservicecrawler/-
11      htdig/3.1
10      Mozilla/3.x
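
The collapse from the first table to this one (drop the operating system
column, keep only the major version) might look something like the sketch
below. The function name major_only is hypothetical, and the rules are a
guess from the table: everything before the first dot is kept, and Gecko
build dates are lumped together as "Gecko/-". A few rows above keep their
full version (Konqueror/3.0, for instance), which this sketch does not try
to reproduce.

```python
def major_only(entry):
    """Reduce 'Browser/Version OS/Version' to 'Browser/MajorVersion'."""
    parts = entry.split()
    if not parts:
        return "-/-"
    browser = parts[0]                     # drop the OS column
    name, _, version = browser.partition("/")
    if name == "Gecko":
        # Gecko versions are build dates (e.g. 20020529), so group them.
        return "Gecko/-"
    major = version.split(".")[0] if version else "-"
    return f"{name}/{major or '-'}"


for entry in ("MSIE/6.0 WindowsNT/5.1",
              "Gecko/20020529 Linux/i686",
              "Googlebot/2.1 -/-"):
    print(major_only(entry))
```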

The bad news: 48% of the hits came from Internet Explorer 5x or 6x (although,
surprisingly enough, I did get five hits from a Mozilla [2] based browser
under OS/2). The good news, though, is that 58% of the hits were from
browsers capable of viewing CSS (Cascading Style Sheets) without crashing.
And speaking of horrible browsers that can't support CSS, about 2.5% of the
hits were from Netscape 4x or IE 4x (they can see the site, only it doesn't
look that great).
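
For anyone checking the arithmetic, the shares can be recomputed from the
counts in the second table. This is only a sketch: which rows count toward
each figure is an assumption, and the table lists just the busiest
entries, so the results only approximate the percentages quoted above.

```python
def share(hits, total):
    """Return hits as a percentage of total."""
    return 100.0 * hits / total


TOTAL_HITS = 7943                  # total hits quoted in the post

# Counts copied from the major-version table; the grouping is assumed.
ie_5_and_6 = 2210 + 1671           # MSIE/6 + MSIE/5
old_browsers = 134 + 45 + 12       # Mozilla/4 + MSIE/4 + Mozilla/4.7C-CCK-MCD

print(f"IE 5x/6x:       {share(ie_5_and_6, TOTAL_HITS):.1f}%")
print(f"Netscape/IE 4x: {share(old_browsers, TOTAL_HITS):.1f}%")
```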

I also checked the log file for Spring's [3] site (Hi honey!). 53% of her
visitors are using Internet Explorer 5 or higher, or Mozilla (or Netscape 6
and higher). Only about 3% are using Netscape 4x or Internet Explorer 4x,
which is pretty much on par with my site (the rest are mostly robots or
experimental browsers).

[1] http://www.googlebot.com/bot.html
[2] http://www.mozilla.org/
[3] http://www.springdew.com/

Email author at [email protected]