* * * * *
The Ins and Outs of Calculating Browser Usage
I spent the past few hours writing a program to parse the browser string from
the web server log files. Why didn't I use an existing web analyzer package?
I wanted the browser strings rewritten to carry correct information and to
follow a more consistent style. This meant changing a string from, say:
Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; Q312461)
to
MSIE/6.0 Windows/98
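
To make that rewrite concrete, here is a minimal Python sketch of the idea. It is not the program described in this entry: the function name `normalize` and the matching rules are assumptions, chosen only so that the example string above comes out as shown. Real browser strings are far messier than these few patterns allow for.

```python
import re

def normalize(agent: str) -> str:
    """Reduce a raw browser string to 'Browser/Version OS/Version'.

    Hypothetical rules, for illustration only.
    """
    browser, system = "-/-", "-/-"

    # IE hides behind a "Mozilla/4.0 (compatible; ...)" prefix, so look for
    # the MSIE token first and only then fall back to the leading product.
    msie = re.search(r"MSIE (\d+(?:\.\d+)*)", agent)
    lead = re.match(r"([^\s/;()]+)/([\w.]+)", agent)
    if msie:
        browser = "MSIE/" + msie.group(1)
    elif lead:
        browser = lead.group(1) + "/" + lead.group(2)

    # "Windows NT 5.1" becomes WindowsNT/5.1; "Windows 98" becomes Windows/98.
    win = re.search(r"Windows (NT )?(\d+(?:\.\d+)*)", agent)
    if win:
        family = "WindowsNT" if win.group(1) else "Windows"
        system = family + "/" + win.group(2)

    return browser + " " + system

raw = "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; Q312461)"
print(normalize(raw))  # prints: MSIE/6.0 Windows/98
```

Anything the patterns don't recognize falls through as `-/-`, which mirrors the placeholder used in the tables that follow.
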
This also means I can generate decent stats about the popularity of certain
browsers on the fly (using the Unix command line, I can pull out the browser
string, feed that through the newly written program, then count unique
browsers more easily). An initial run through last month's log file for my
blog:
Table: Browser Statistics for The Boston Diaries
# Hits  Browser/Version               OS/Version
 1,228  Googlebot/2.1                 -/-
   748  MSIE/6.0                      WindowsNT/5.1
   712  MSIE/6.0                      Windows/98
   641  MSIE/6.0                      WindowsNT/5.0
   476  Mercator/2.0                  -/-
   371  MSIE/5.5                      Windows/98
   303  MSIE/5.0                      Windows/98
   302  MSIE/5.5                      WindowsNT/5.0
   238  -/-                           -/-
   216  MSIE/5.01                     WindowsNT/5.0
   137  ia_archiver/-                 -/-
   113  Syndic8/1.0                   -/-
   101  NCSA/-                        -/-
   101  MSIE/5.01                     Windows/98
   100  MSIE/6.0                      WindowsNT/4.0
    99  Mozilla/3.01                  -/-
    89  Gecko/20020529                Linux/i686
    88  Gecko/20020523                WindowsNT/5.0
    81  MSIE/5.14                     Mac_PowerPC/-
    79  Mozilla/5.0                   -/-
    68  SlySearch/1.2                 -/-
    66  MSIE/5.5                      Windows/95
    62  MSIE/5.5                      WindowsNT/4.0
    62  Gecko/20020529                PPC/Mac
    61  Openfind/-                    -/-
    55  MSIE/5.0                      Mac_PowerPC/-
    49  Indy-Library/-                -/-
    48  Gecko/20020510                Linux/i686
    42  Mozilla/3.0                   -/-
    41  sitecheck.internetseer.com/-  -/-
    40  Gecko/20020311                WindowsNT/5.1
    38  MSIE/5.01                     Windows/95
    36  [email protected]/-           -/-
    33  Gecko/20020530                WindowsNT/5.0
    28  bumblebee/1.0                 -/-
    28  Gecko/20020510                WinNT4.0/-
    27  Opera/6.02                    Windows/2000
    27  MSIE/5.0                      WindowsNT/4.0
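
The pull-out-and-count step described before the table can be sketched in Python as well. This is a stand-in for the Unix pipeline, not a copy of it, and it rests on an assumption the entry never states: that the log uses Apache's "combined" format, where the browser string is the last double-quoted field on each line. The names `count_agents` and `rewrite`, and the sample log lines, are invented for the example.

```python
import re
from collections import Counter

# Assumption: the browser string is the last double-quoted field on the line,
# as in Apache's "combined" log format.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')

def count_agents(lines, rewrite=lambda agent: agent):
    """Tally hits per (optionally rewritten) browser string, busiest first."""
    tally = Counter()
    for line in lines:
        found = LAST_QUOTED.search(line)
        if found:
            tally[rewrite(found.group(1))] += 1
    return tally.most_common()

# Made-up log lines, only to show the shape of the input.
sample = [
    '192.0.2.1 - - [01/Jan/2000:00:00:01 +0000] "GET / HTTP/1.0" 200 512 "-" "Googlebot/2.1"',
    '192.0.2.2 - - [01/Jan/2000:00:00:02 +0000] "GET / HTTP/1.0" 200 512 "-" "Mercator/2.0"',
    '192.0.2.1 - - [01/Jan/2000:00:00:03 +0000] "GET /a HTTP/1.0" 200 99 "-" "Googlebot/2.1"',
]

for agent, hits in count_agents(sample):
    print(f"{hits:>6}  {agent}")
```

Passing a rewriting function as `rewrite` is what turns the raw strings into consistent buckets; without it, every small variation in a browser string would be counted as a separate browser.
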
The table above gives a decent flavor of what's being used to view my site
(out of the 7,943 hits last month, about 16% were from the Google spider
[1]), but one of the primary reasons I did this was to see just how many
people are still using older browsers like Netscape 4x or Internet Explorer
4x (which would show up as Mozilla/4.x and MSIE/4.x respectively). Strip out
the operating system column, look at only the major version numbers, and we
get:
Table: More Specific Browser Statistics for The Boston Diaries
# Hits  Browser/Major Version
 2,210  MSIE/6
 1,671  MSIE/5
 1,228  Googlebot/2
   543  Gecko/-
   476  Mercator/2
   238  -/-
   142  Opera/6
   141  Mozilla/3
   137  ia_archiver/-
   134  Mozilla/4
   113  Syndic8/1
   101  NCSA/-
    79  Mozilla/5
    68  SlySearch/1
    61  Openfind/-
    49  Indy-Library/-
    45  MSIE/4
    41  sitecheck.internetseer.com/-
    37  Netscape6/6.2
    36  [email protected]/-
    28  bumblebee/1
    26  linkhype.com/1
    26  Netscape/7
    24  BlogBot/1
    22  Win32/-
    22  Konqueror/3.0
    20  Frontier/8.0
    16  Internet/-
    16  Ask-Jeeves/-
    15  Mozilla/-
    14  Microsoft/-
    14  Konqueror/2.2
    12  w3m/0.2
    12  obidos/bot
    12  Mozilla/4.7C-CCK-MCD
    11  myownhomeblogindexingservicecrawler/-
    11  htdig/3.1
    10  Mozilla/3.x
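
Collapsing the first table into the second amounts to dropping the OS column, trimming each version to its major number, and re-adding the hit counts. A rough Python sketch of that regrouping follows. The helper names `major_only` and `regroup` are hypothetical, and the rules are a guess: treating a Gecko build date as "no version" matches the `Gecko/-` row, but several rows above (`Konqueror/3.0`, `Netscape6/6.2`) keep their minor version, so this will not reproduce the table exactly.

```python
from collections import Counter

def major_only(entry: str) -> str:
    """Turn 'MSIE/6.0 WindowsNT/5.1' into 'MSIE/6' (hypothetical rules)."""
    fields = entry.split()
    if not fields:
        return "-/-"
    name, _, version = fields[0].partition("/")   # the OS column is ignored
    if name == "Gecko":
        return "Gecko/-"          # build dates are not meaningful versions
    major = version.split(".")[0] or "-"
    return name + "/" + major

def regroup(rows):
    """Sum hit counts over rows that share a browser and major version."""
    totals = Counter()
    for hits, entry in rows:
        totals[major_only(entry)] += hits
    return totals

# A few rows taken from the first table.
rows = [
    (748, "MSIE/6.0 WindowsNT/5.1"),
    (712, "MSIE/6.0 Windows/98"),
    (89, "Gecko/20020529 Linux/i686"),
    (88, "Gecko/20020523 WindowsNT/5.0"),
]
print(regroup(rows).most_common())
```

Only a handful of rows are fed in here, so the sums are smaller than the totals in the table, which was built from the whole month's log.
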
The bad news: 48% of the hits were from Internet Explorer 5x or 6x (although,
surprisingly enough, I did get five hits from a Mozilla [2] based browser
under OS/2). The good news, though, is that 58% of the hits were from browsers
capable of viewing CSS (Cascading Style Sheets) without crashing. And
speaking of horrible browsers that can't support CSS, about 2.5% were running
Netscape 4x or IE 4x (they can see the site, only it doesn't look that
great).
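
For anyone who wants to check the arithmetic, the shares can be recomputed from the rows of the second table and the 7,943 total quoted earlier. The sketch below is only that, a recomputation from the visible rows; which rows count toward each group is an assumption (spelled out in the comments), and since the table stops at ten hits the results land near, rather than exactly on, the rounded figures quoted above.

```python
TOTAL_HITS = 7943  # total hits for the month, as stated in the text

def share(*hits: int) -> float:
    """Percentage of the month's hits represented by the given table rows."""
    return 100 * sum(hits) / TOTAL_HITS

# Assumed groupings, using rows from the second table:
ie_5_and_6 = share(2210, 1671)    # MSIE/6 and MSIE/5
four_x_browsers = share(134, 45)  # Mozilla/4 and MSIE/4
googlebot = share(1228)           # Googlebot/2

print(f"IE 5x/6x:  {ie_5_and_6:.1f}%")
print(f"4x family: {four_x_browsers:.1f}%")
print(f"Googlebot: {googlebot:.1f}%")
```
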
I also checked the log file for Spring's [3] site (Hi honey!). 53% of her
visitors are using Internet Explorer 5 or higher, or Mozilla (or Netscape 6
and higher). Only about 3% are using Netscape 4x or Internet Explorer 4x,
which is pretty much on par with my site (the rest are mostly robots or
experimental browsers).
[1] http://www.googlebot.com/bot.html
[2] http://www.mozilla.org/
[3] http://www.springdew.com/