hglogstat (1)

Name

hglogstat - create statistics about WWW-access to Hyper-G servers

Synopsis

hglogstat parameters
For a short description of the parameters try hglogstat -h. For more details
see below.

Description

Based on the logfiles produced by the WWW-gateway, hglogstat produces
access-statistics by collecting various information (see Modes). Depending
on the user's selection the tool may put the results into an HTML-document
which is immediately inserted into a defined Hyper-G collection (along with
some graphic representation), or it writes detailed information about
requested objects, searches and failed searches to a file, or, as a third
possibility, the tool may produce overall statistics, based on information
gathered during either of the previous modes. (The first two actions may be
combined in a single run, the third one needs an extra run.)

Modes

hglogstat may execute in three different modes, the first two of which can
be combined into a single run.

  * Produce Exhaustive Statistics

    In this mode, information is collected from the logfiles and presented
    in the following categories:

       o General Information lists the total number of sessions, user
         requests, robot requests, redirected requests, failed requests,
         successful search requests, failed search requests, requesting
         hosts, bytes transferred etc.

       o Details about Requested Objects include how many distinct objects
         have been requested how many times within the relevant period of
         time, and which objects have been requested but could not be
         delivered, for whatever reason. (The reasons are listed in an
         extra domain.)

       o Details about Searches list the objects that have successfully
         been searched for, and also those for which a search did not
         yield any result.

       o Details about Actions show the most frequent successful and failed
         actions.

       o Details about User Access show the most frequent referring pages,
         entry points, user agents and domain names of the requesting
         hosts.

       o Time Information finally shows requests and sessions per hour and
         day. Since this information does not say too much when presented
         in a tabular form, it is also presented in a graphic form,
         accessible via hyperlinks. (Gnuplot is used to create these
         graphics.)

    The parameter -top specifies how many of these items shall be listed in
    the report; the default value is 20.
    There is also a possibility to make sure certain items do not appear in
    the report. This is achieved by adding them to a list in the file
    hglogstat.rc (for more details, see below.)

  * Save Detailed Information

    In this mode, all requested objects, searches and failed searches (each
    along with its number of occurrences) are written to a file. From there,
    this information may further be processed by other tools.
    The script collstat, for example, uses these files to produce
    statistics about single collections instead of the whole server.

  * Produce Overall Statistics

    When hglogstat executes in the first mode, it outputs the number of
    sessions to the file sessions.log and the number of requests to the file
    requests.log. (These files are in the same directory as the script.)
    Taking this information, daily and monthly summaries may be generated
    in this mode.

Parameters

-html
    Mode 1. The script will produce detailed statistics, output an HTML
    document and insert it into the specified collection.

-details
    Mode 2. Output all requested objects, searches and failed searches in
    plain ASCII (for further use).

-overall
    Mode 3. Produce overall statistics using results from previous runs.

-dir
    Defines the directory the logfiles are stored in. Only logfiles in this
    directory will be examined.

-file
    Gives the name of the current logfile. The default name is wwwlog, so
    this parameter may be omitted.
    The names of old logfiles are expected to consist of the given filename
    followed by a timestamp (e.g. wwwlog.30703723). Optionally, these files
    may also be gzipped; in this case, the tool temporarily expands them
    (using gzip -c). So, giving 'wwwlog' as filename actually means all
    files matching wwwlog[.timestamp[.gz]].

-hghost
    Name of Hyper-G host ...

-pname
    ... and name of collection to put the HTML document into.

-imgcoll
    Name of collection to put images into (by default, equals the
    collection defined by -pname).

-hname
    Hostname that shall appear in the summary's title. This option may be
    used in Mode 1, when an alias name shall be used instead of the host's
    domain name within the report.
    When the script is executed in Mode 2 only, there is no need to define
    -hghost, -pname and -imgcoll, since an ASCII-file is the only thing
    that will be output. So -hname may be used to still give the host a
    name.

-from
    First day to analyze. Should be in the form yy/mm/dd.

-to
    Last day to analyze. By default, yesterday's date is assumed. Format as
    above.

-lastseven
    Analyzes the last seven days (may be used instead of -from and -to).

-lastmonth
    Analyzes the last month (may be used instead of -from and -to).

-top
    Specifies the top n items to be listed (20 by default).

-v
    Verbose mode.
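
The file-matching rule described under -file can be illustrated with a short
Python sketch. This is not the script's own (Perl) code. The function names
are hypothetical, the timestamp is assumed to consist of digits only, and
Python's gzip module stands in for the call to gzip -c.

```python
import gzip
import os
import re


def logfile_pattern(basename="wwwlog"):
    """Regex for basename[.timestamp[.gz]]; a digits-only timestamp is assumed."""
    return re.compile(r"^" + re.escape(basename) + r"(\.\d+(\.gz)?)?$")


def find_logfiles(directory, basename="wwwlog"):
    """Return matching logfile paths in `directory` only (no recursion)."""
    pattern = logfile_pattern(basename)
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if pattern.match(name)
    )


def open_logfile(path):
    """Open a logfile as text, expanding gzipped files on the fly."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="latin-1")
    return open(path, "r", encoding="latin-1")
```

With these assumptions, wwwlog, wwwlog.30703723 and wwwlog.30703723.gz are
accepted, while a name such as wwwlog.gz (no timestamp) is not.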

hglogstat.rc

It has been mentioned above that unwanted items may be excluded from the
summaries by adding their titles to the hglogstat.rc file. This file must be
located in the same directory as hglogstat.
The list of items in this file may be divided into several categories, each
headed by a line identifying the type of objects to follow. So far,
requested objects, entry pages and user agents may be skipped; the
corresponding heading lines are _SKIP_OBJECTS_, _SKIP_ENTRIES_ and
_SKIP_AGENTS_.
Lines starting with # are considered to be comments.
It should be emphasized, however, that the items that appear in hglogstat.rc
are excluded from the top-n lists only; they still count as requested
objects or entry pages!

An example of an hglogstat.rc file:

# unwanted objects
_SKIP_OBJECTS_
options.gif
home.gif
search.gif
info.gif
coll_open.gif
coll_clos.gif

# unwanted entry pages
_SKIP_ENTRIES_
/
identify.gif
options.gif
search.gif
help.gif
home.gif
coll_open.gif
coll_clos.gif
text.gif
info.gif
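
How such a file might be read and applied can be sketched in Python. This is
an illustration only, not the script's actual (Perl) implementation: the
function names are hypothetical, blank lines are assumed to be ignored, and
items are assumed to be matched by exact title. The sketch also shows that
skipped items vanish from the top-n list but still count towards the totals.

```python
from collections import Counter

SECTION_HEADERS = {"_SKIP_OBJECTS_", "_SKIP_ENTRIES_", "_SKIP_AGENTS_"}


def parse_rc(text):
    """Parse hglogstat.rc-style text into {heading line: set of items}."""
    skip = {header: set() for header in SECTION_HEADERS}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comments (and, by assumption, blank lines) are ignored
        if line in SECTION_HEADERS:
            current = line
        elif current is not None:
            skip[current].add(line)
    return skip


def top_n(counts, skipped, n=20):
    """Return the n most frequent (item, count) pairs, omitting skipped items."""
    visible = [(item, c) for item, c in counts.most_common() if item not in skipped]
    return visible[:n]


if __name__ == "__main__":
    rc = parse_rc("# unwanted objects\n_SKIP_OBJECTS_\nhome.gif\n")
    requests = Counter({"home.gif": 50, "index.html": 30, "paper.ps": 5})
    print(top_n(requests, rc["_SKIP_OBJECTS_"], n=2))
    # prints [('index.html', 30), ('paper.ps', 5)]
    print(sum(requests.values()))
    # prints 85: home.gif is hidden from the list but still counted
```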

What is necessary to run hglogstat?

hglogstat is a Perl script and takes advantage of features new in Perl 5.
So, the first prerequisite is that Perl 5 is installed on your system.
The graphics are produced by Gnuplot, which is called by the script, so
this has to be installed, too. Since Gnuplot does not produce GIF output
(at least my version 3.5 (pre 3.6) does not), ppmtogif is called to do the
conversion, so this, too, should be installed.
Finally, insertion of the HTML document into the database is done by
hginstext. If you have this installed on your system, too, nothing can keep
you from working with hglogstat.

Known Bugs

Of course, there are some minor bugs, but none of them is really serious.

  * POST Requests

    These requests sometimes start a new session and sometimes do not. In
    the logfile, however, they are simply declared as POST Requests. As a
    consequence, the exact number of sessions cannot be determined, and the
    result diverges slightly from the one obtained by analyzing the
    dbserver's logfiles.
    In numbers, the deviation within a month is a few hundred, which is
    less than 0.5% and usually may be neglected.
    To eliminate this bug, the logfile's format must be changed, which will
    happen soon anyway.

  * Gnuplot

    It is a great graphics tool, but sometimes it behaves a bit strangely.
    I place two plots on one screen, and although they start at the same
    x-position, the second plot is moved one unit to the right - but only
    on some architectures. There is a simple remedy to this - forcing a
    plot at (0,0) which is invisible - but this produces faulty behaviour
    on other architectures.
    So far, I have not found an elegant solution.

Author

Alfons Schmid - April 2, 1996