news.utdallas.edu!tamsun.tamu.edu!cs.utexas.edu!zaphod.mps.ohio-state.edu!uwm.edu!ux1.cso.uiuc.edu!news.cso.uiuc.edu!ekatz Mon Mar 22 15:42:22 CST 1993
Article: 1519 of comp.infosystems.gopher
Xref: feenix.metronet.com comp.infosystems.gopher:1519 alt.gopher:113
Newsgroups: comp.infosystems.gopher,alt.gopher
Path: feenix.metronet.com!news.utdallas.edu!tamsun.tamu.edu!cs.utexas.edu!zaphod.mps.ohio-state.edu!uwm.edu!ux1.cso.uiuc.edu!news.cso.uiuc.edu!ekatz
From: [email protected] (Eric Katz)
#Subject: GopherReport v.2.0
Date: Mon, 22 Mar 1993 17:14:40 GMT
Message-ID: <[email protected]>
Distribution: usa
Sender: [email protected] (Eric Katz
Originator: [email protected]
Organization: Nat'l Ctr for Supercomp App (NCSA) @ University of Illinois
Lines: 536

Following is the new release of GopherReport version 2.0.  Thanks to all
who have contributed comments and bug reports.  There are 2 important
items to remember with this new release.

1. invoking with now arguements now displays a help page.
2. This version corrects a bug that incorrectly calculated file/directory
  accesses.  I apologize for any inconvenience this may have caused.

See the revision history in the program text for a summary of features
and bug fixes.

<<<<<  CUT HERE >>>>

#!/usr/local/bin/perl
sub USAGE {
print "USAGE: GopherReport -a|d|t|f|h|e [-s] [-vN] [-l logname] [keys]
       a       All
       d       Daily
       t       Time
       f       Files
       h       Host
       e       Errors
       s       Summary
       l       Name of the log file
       vN      Verbosity
       D       Current Day only\n";
print " keys    To limit the report to specific days, times, files, and/or
               hosts.  Any combination of keys may be specified.  See
               examples below.
               Try to use the following conventions for the best results:
               Search for a date:   'nn'       ie: 10
               Search for a time:   'nn:'      ie: 10:
               Search for a Day:    'Xxx'      ie: Sun or Sunday
               Search for a file:   'xxx'      ie: software
               or                   'xx/'          you may need a leading
               or                   '/xx'          or trailing '/'
               Search for a domain: '.xxx'     ie: .edu or ncsa\n\n";
               exit;
}

# Written by: Eric Katz
# email:      [email protected]
# If you use this program regularly please send e-mail to the above
# address and I will try to provide you with updated versions as they
# come available.  I am also interested in knowing how often you rotate
# your gopher log and how long you keep old logs.
#########################################################################
#  Permission is granted to anyone to use this software for any purpose on
#  any computer, and to alter it for your personal use.  Please do not
#  distribute any alterations.
#
#  0. This software is provided "as is" and without any express or implied
#     warranties, including, without limitation, the implied warranties of
#     merchantability and fitness for a particular purpose.
#
#  1. The author is not responsible for the consequences of the use of this
#     software, no matter how deleterious, even if they arise from flaws in it.
#
#  2. The origin of this software must not be misrepresented, either by
#     explicit claim or by omission.
#
#
#  3. This software may not be sold.
#
#  4. This notice may not be removed or altered.
#
#########################################################################
# To install - Please remove everthing above the #! line.
# Change the perl path to reflect the location of your perl installation.
# Then answer the following question.

# Whereis your gopher log?
$GopherLOG = "/var/log/gopher_log";

######################################################################
# DO NOT CHANGE ANTHING BELOW THIS LINE FOR DISTRIBUTION.
# You make may changes for your personal use but DO NO DISTRIBUTE YOUR
# CHANGES.
#
# REVISION HISTORY
# V1.0 Released 2/9/93
# V1.1 Released 2/15/93
#       Bug Fix: File display now correctly displays file names with spaces
#       Bug Fix: Now handles file names with regexp in them.
#       Bug Fix: Illegal division by zero if no connections for the report
#                period fixed.
#       Bug Fix: Cleaned up output of large file names
#       Feature: Added number of files accessed.
#       Feature: Added varying level of verbosity to the output of file
#                accesses or host accesses.
#                In the case of file accesses the number represents the depth
#                of the directory tree that will be printed.
#                In other words -v1 will only report on root menu accesses,
#                while -v2 will go one directory deeper and so on.
#                In the case of host accesses, the number represents the
#                top percentile.
#                In other words, -v10 will show the top 10 percent of
#                connections.
#       Feature: Added -s, summary output.  This prints only the total
#                number of connections and files accessed.
#       Feature: Added search key capability.  Any arguments at the end of
#                the command line are assumed to be search key limiters.
#                It will try to 'do the right thing' but it is safer
#                to limit the type of search by using one of the -d, -t,
#                -f, or -h flags.
#
#
#        EXAMPLES:
#                       GopherReport Thu Friday 10 11 14 15 Software
#                               Will report on the number of people who
#                               accessed the Software files on Thursday
#                               and Friday during the 10am, 11am, 2pm,
#                               and 4pm hours.
#                       GopherReport -f edu
#                               Will report on file accesses that have
#                               the edu substring in them.
#                       GopherReport -h edu
#                               Will report on hosts with edu in the name.
#                       GopherReport -fh edu
#                               Will report on both hosts and files with
#                               edu in them.
# V2.0 Released 3/22/93
#
#       Bug Fix: Previous version incorrectly tallied bogus entries.
#       Bug Fix: Broke out directory accesses from file retrievals.
#       Feature: Added -l logname capability to examine logfiles other
#                than the default
#       Feature: Added -D to make it easier to examine the logfile for
#                today's information.
#       Feature: Added -e to scan for error messages
##########################################################################

require 'getopts.pl';
require 'ctime.pl';

# For now the only date information that is needed is the third field
# regardless of whether it is a BSD system or not but we never know
# what the future holds so I included the following subroutines.
if (-e "/vmunix") {
       $BSD = "true";
       }
else {
       $BSD = "false";
}

# Get the current time and format it:
$DATE=&ctime(time),"\n";
if ($BSD eq "true") {
       ($day, $month, $date, $time, $year) = split (" ",$DATE);
}
else {
       ($day, $month, $date, $time, $TZ, $year) = split (" ",$DATE);
}

&Getopts('adDtfhesl:v:');

# If no options are passed on the command line then display help page.
&USAGE if (!($opt_d) &! ($opt_t) &! ($opt_f) &! ($opt_e) &! ($opt_h) &! ($opt_s));

# If an alternate logfile has been requested
if (defined $opt_l) {
       $GopherLOG = $opt_l;
}

# If you want a printout for today only
#$SearchDate = $date if ($opt_D);
if ($opt_D) {
$SearchDate = "\\b" . $date . "\\b" . "|";
}

# If there are no search key limiters.
if (($#ARGV < 0) &! ($opt_D)) {
       $NO_KEY = 1;
       $NO_DAY_KEY = 1;
       $NO_TIME_KEY = 1;
       $NO_OTHERS_KEY = 1;
}

# Else create the 'or' pattern matching string
else {
       for ($i=0;$i<=$#ARGV;$i++) {
               if ($ARGV[$i] =~ /Mon|Tue|Wed|Thu|Fri|Sat|Sun/) {
                       $SearchDay = $SearchDay . $ARGV[$i] . "|";
               }
               elsif ($ARGV[$i] =~ /\d+:/) {
                       chop $ARGV[$i];
                       $SearchTime = $SearchTime . $ARGV[$i] . "|";
               }
               elsif ($ARGV[$i] =~ /\d+/) {
               die "This does not make sense with the -D option\n" if ($opt_D);
               $SearchDate = $SearchDate . "\\b" . $ARGV[$i] . "\\b" . "|";
               }
               else {
                       $SearchOthers = $SearchOthers . $ARGV[$i] . "|";
               }
       }
}
if (defined $SearchDay) {
       chop ($SearchDay);
}
else {
       $NO_DAY_KEY = 1;
}
if (defined $SearchDate) {
       chop ($SearchDate);
}
else {
       $NO_DATE_KEY = 1;
}
if (defined $SearchTime) {
       chop ($SearchTime);
}
else {
       $NO_TIME_KEY = 1;
}
if (defined $SearchOthers) {
       chop ($SearchOthers);

# We need to escape possible wildcard characters in this string, then
# unescape the 'or' character
       $SearchOthers =~ s/\W/\\$&/g;
       $SearchOthers =~ s/\\\|/\|/g;
}
else {
       $NO_OTHERS_KEY = 1;
}


open (LOG,$GopherLOG) || die "Couldn't open log file\n";

# Process the information in the log.
# First, split the line into 2 parts separated by the colon.  This tells
#   you who logged on and what they did.
# Second, split the 'who' variable into the different components.
# Third, split the 'what' variable into it's different components.
# Fourth, If this is the first record in the log then set the start date
#  for the output to the date this entry was made.
# Fifth, set the array information and determine the maximum value for
#  each array.  The maximum value will be used to set the scale for the
#  the graph.
#       If the entry is that of a connection then increment the number
#       of connections, the number of connections for that weekday,
#       the number of connections for that hour of the day, and the
#       number of connections for that host.
#       If the entry is that of an action vs. a connection then increment
#       the number of retrievals for that directory.
#       Sixth, increment the record number.

$RECORD_NUMBER = 1;
while (<LOG>) {
       ($who, $what) = split (/ : /,$_,2);
       ($day, $mon, $date, $time, $year, $pid, $host) = split (/\s+/,$who);
       ($action, $file_type, $path) = split (/\s+/,$what,3);
       chop $path;
       ($hour,$minutes,$seconds) = split (/:/,$time);
       if ($RECORD_NUMBER eq 1) {
               $start_date = "$mon $date, $year @ $time";
       }
# This ridiculously long conditional just tries to do the right thing with
# possible search key limiters
if (($NO_KEY) || ($NO_DAY_KEY || ($day =~ /$SearchDay/)) && ($NO_DATE_KEY || ($date =~ /$SearchDate/)) && ($NO_TIME_KEY || ($hour =~ /$SearchTime/)) && ($NO_OTHERS_KEY || ($path =~ (/$SearchOthers/) || ($host =~ /$SearchOthers/)))) {
       if ( $file_type =~ /Connection/ ) {
               $connections ++;
               $Connect{$day} ++;
                 $DayMax = $Connect{$day} if ($Connect{$day} > $DayMax);
               $Connect{$hour} ++;
                 $HourMax = $Connect{$hour} if ($Connect{$hour} > $HourMax);
               $Machines{$host} ++;
                 $HostMax = $Machines{$host} if ($Machines{$host} > $HostMax);
       }
       elsif (($action =~ /retrieved/) &! ($host =~ /^0\./)) {
               if ($file_type =~ /file/) {
                       $FILE_ACCESSES ++;
                       $Directory{$path} ++;
                       $FileMax = $Directory{$path} if ($Directory{$path} > $FileMax);
               }
               if ($file_type =~ /directory/) {
                       $DIRECTORY_ACCESSES ++;
                       $Directory{$path} ++;
                       $FileMax = $Directory{$path} if ($Directory{$path} > $FileMax);
               }

       }
       else {
               if ($path =~ /not exist!!/) {
                       $EXIST_ERRORS_PRINT = $EXIST_ERRORS_PRINT . "\n$mon $date       $hour:$minutes $action";
                       $EXIST_ERRORS_PRINT = $EXIST_ERRORS_PRINT . " $file_type" if ($file_type !~ /does/);
                       $EXIST_ERRORS ++;
               }
               else {
                       if ($file_type =~ /directory/) {
                               $ACCESS_ERRORS_PRINT = $ACCESS_ERRORS_PRINT . "$mon $date       $hour:$minutes $path\n";
                               $ACCESS_ERRORS ++;
                       }
               }
       }


$RECORD_NUMBER ++;
}
}
close LOG;

if ($opt_e) {
print "==================================================================\n";
print "                 Gopher Error History\n";
print "         Beginning $start_date\n";
print "         Ending    $end_date\n";
print "==================================================================\n";
print "\n";
print "NON-EXISTING FILES: $EXIST_ERRORS Occurences\nDATE       TIME    FILE\n____________________________________________\n";
if ($EXIST_ERRORS) {
print $EXIST_ERRORS_PRINT;
}
print "\n\nNON-ACCESSIBLE DIRECTORIES: $ACCESS_ERRORS Occurences\nDATE  TIME    FILE\n____________________________________________\n";
if ($ACCESS_ERRORS) {
print $ACCESS_ERRORS_PRINT;
}
}

# Set the end date variable to that of the last entry in the log.
$end_date = "$mon $date, $year @ $time";


if ($opt_a) {
       $opt_d = 1;
       $opt_h = 1;
       $opt_t = 1;
       $opt_f = 1;
}
# This number represents either the number of subdirectory levels to report
# or the top x percentage of host connections depending on the context.
if (defined $opt_v){
       $verbosity_limit = $opt_v;
}
else {
       $verbosity_limit = 100;
}

# Set an array to convert 3 letter weekday abbreviations to full names.
@weekdays = ('Sunday','Monday','Tuesday','Wednesday','Thursday','Friday','Saturday');

#  Print the header
print "\n\n";
print "==================================================================\n";
print "                 Gopher Usage Report\n";
print "         Beginning $start_date\n";
print "         Ending    $end_date\n";
print "==================================================================\n";
print "\n";
print "                                  Successful     Failed\n";
print "                                 ------------   ---------\n";
print  "Total Number of Connections:             $connections\n";
print  "Total Number of Files Retrieved:         $FILE_ACCESSES         $EXIST_ERRORS\n";
print  "Total Number of Directories Accessed:    $DIRECTORY_ACCESSES            $ACCESS_ERRORS\n";
print "\n";
exit if ($opt_s);

if ($opt_d) {
$SubTotal=0;
# Print the connections/weekday data
print "\n";
print "Connections Per Week Day\n";

       printf ("%-8s%-30s\n%s\n"," Day","Connections","---------------------------------------------------------------------");

# Print the output for each record.

foreach $heading (@weekdays) {
# Convert the heading for each day (long form) into the short form for
# accessing the array information.
$index = substr($heading,0,3);
# Calculate the number of stars to print.
       if ($DayMax > 0 ){
       $graph = '*' x ($Connect{$index} / $DayMax * 50);
       }
       else {
       $graph = "";
       }
# Print them.
       if (($index =~ /$SearchDay/) || $NO_DAY_KEY) {
        printf ("%-13s%3s %s\n",$heading,$Connect{$index},$graph);
       $SubTotal = $SubTotal + $Connect{$index};
       }
}
print "\n";
print "Total for this Section: $SubTotal\n\n" if !($NO_KEY);
}

if ($opt_t) {
$SubTotal=0;
# Iterate through the 'Hours' array to print the hourly information.  The
# first 10 numbers are quoted to preserve the leading 0.
@Hours = ('00','01','02','03','04','05','06','07','08','09',10 .. 23);
print "\n";
print "Connections Per Hour\n";
       printf ("%-8s%-30s\n%s\n"," Hour","Connections","---------------------------------------------------------------------");
foreach $heading (@Hours) {
       if ($HourMax > 0 ){
       $graph = '*' x ($Connect{$heading} / $HourMax * 50);
       }
       else {
       $graph = "";
       }
       if (($heading =~ /$SearchTime/) || $NO_TIME_KEY) {
       $SubTotal = $SubTotal + $Connect{$heading};
# Change the heading to read 'Midnight' if appropriate
       if ($heading == '00') {
               printf ("%-12s%4s %s\n","Midnight",$Connect{$heading},$graph);
       }
# Change the heading to read 'Noon' if appropriate
      elsif ($heading == '12') {
               printf ("%-12s%4s %s\n","Noon",$Connect{$heading},$graph);
       }
# Else just print the hour by number.
       else {
               printf ("%-12s%4s %s\n",$heading,$Connect{$heading},$graph);
       }
}
}
print "\n";
print "Total for this Section: $SubTotal\n\n" if !($NO_TIME_KEY);
}
if ($opt_f) {
$SubTotal=0;
print "\n";
print "Access Report for Directories and Files\n\n";
printf ("%s\n%s\n","Directory or File Accessed","-------------------------------------------------------------------------");
# Set an array of full pathnames
@DIRS = keys(%Directory);

# Sort that array.
@sortedDIRS = sort @DIRS;

foreach $heading (@sortedDIRS) {
# Calculate the number of stars for each entry (I add 1 just so that
# each line gets at least 1 star for visual reasons.  This actually sets
# the scale off a bit but it is just meant to be a quick reference any
# way.  If you want more accurate graphing remove the '+ 1' in the following
# line.
# Calculate the number of stars to print.

# StringLength is used in order to split the filename into multiple
# lines if it exceeds 30 characters.
$StringLength = 0;
       if ($FileMax > 0 ){
       $graph = '*' x (($Directory {$heading} / $FileMax * 30) + 1);
       }
       else {
       $graph = "";
       }
# Create an array consisting of the names in the full path without
# the '/'
       @heading_format = split (/\//,$heading);


$PRINT_ME = 1 if ($heading =~ /$SearchOthers/);

# Iterate through this list of file names that defines the path.  For each
# name in this path that is the same as the path of the previously printed
# file name prepend a leading '.' to the name.
# This is all to prevent the duplication of full pathnames in the output
# to make it more readable.

       for ($i=1;$i < $#heading_format;$i++) {
               if ($heading_format[$i] eq $previous_heading[$i]) {
                       $heading_format[$i] = ".";
               }
               else {
                       $heading_format[$i] = $heading_format[$i] . "/";
               }

               $StringLength = $StringLength + length($heading_format[$i]);
               if ($StringLength > 30){
                       $heading_format[$i] = $heading_format[$i] . "\n   ";
                       $StringLength = 3;
               }
       }

# Pad the output to create nice columns.  printf doesn't handle the
# multiple line filenames very well.
$Spaces = " " x (37 - ($StringLength + length ($heading_format[$i])));
$heading_format[$i] = "$heading_format[$i]$Spaces";

# If this is the root menu then label it as such.
       if ($heading eq "\/") {
               @heading_format = "Main_Menu                            ";
       }

# Set the printed array element delimiter to null
$" = "";

# Print it.
if ($i <= $verbosity_limit) {
       if ($PRINT_ME || $NO_OTHERS_KEY) {
       $SubTotal = $SubTotal + $Directory{$heading};
       printf ("%s%5s %s\n","@heading_format",$Directory{$heading},$graph);
       #printf ("%-35s%5s %s\n","@heading_format",$Directory{$heading},$graph);
       }
}

# Set this to be the previously printed filename.
       @previous_heading = split (/\//,$heading);
       $PRINT_ME = 0;
}
print "\n";
print "Total for this Section: $SubTotal\n\n" if !($NO_KEY);
}

if ($opt_h) {
$SubTotal=0;

# Print the name of each host and the number of time they connected.
print "Connections per Host\n";
sub by_connections {
       $Machines{$b} <=> $Machines{$a};
}
@HOSTS = keys(%Machines);
@sortedHOSTS = sort by_connections @HOSTS;
       printf ("%-30s%s\n%s\n","Host ","Number of Connections","---------------------------------------------------------------------");
foreach $heading (@sortedHOSTS) {
       if (($heading =~ /$SearchOthers/) || $NO_OTHERS_KEY) {
               if ($SubTotal < ($connections * ($verbosity_limit)/100)){
               $SubTotal = $SubTotal + $Machines{$heading};
               printf ("%-50s%s\n",$heading,$Machines{$heading});
               }
       }
}
print "\n";
print "Total for this Section: $SubTotal\n\n" if !($NO_KEY);
}
--

===============================================
Eric Katz ([email protected])
Computer Utilization Analyst