Path: usenet.cis.ufl.edu!usenet.eel.ufl.edu!psgrain!nntp.teleport.com!usenet
From: [email protected] (James B. Crocker)
Newsgroups: comp.lang.perl.announce,comp.lang.perl.misc
Subject: SOFTWARE: WebCubeCensus 1.1.0 (Freeware)
Followup-To: comp.lang.perl.misc
Date: 17 Sep 1995 23:13:09 GMT
Organization: University of North Dakota; Grand Forks, ND
Lines: 340
Approved: [email protected] (comp.lang.perl.announce)
Message-ID: <[email protected]>
NNTP-Posting-Host: linda.teleport.com
Keywords: UNIX Web Server Administration Analysis Perl
X-Disclaimer: The "Approved" header verifies header information for article transmission and does not imply approval of content.
Xref: usenet.cis.ufl.edu comp.lang.perl.announce:125 comp.lang.perl.misc:6505

I'm placing this announcement in this group because the
program mentioned is a Perl application. If this is
an improper post, please remove it. Thank you.

Attention: UNIX Web Administrators.

For Web Administrators who maintain numerous files and allow others to administer
web documents, keeping track of each and every one can be a chore. This Perl kit is
designed for UNIX Web Server Administrators who need a tool to monitor AND report
potential errors to individual file owners/maintainers. This kit will examine
documents and check for defined tags, regular expressions, etc. WCC is also a log analysis
tool. Rather than generate only a summary of information, it gives each and every file
referenced in the access log its own log history. There are too many options
to discuss here.

If you think this package would be of benefit to you please take a detour to

       http://www.aero.und.nodak.edu/~crocker


README File -----------------------------------------------------------------------------

   WebCubeCensus Kit, Version 1.1.0

   Copyright (c) 1995 University of North Dakota and James B. Crocker, All rights reserved.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.


I. PURPOSE

       CAVEAT: ################################################################
               This SCRIPT is designed to work on UNIX systems. If you are not
               maintaining a WWW Server on a UNIX system WebCubeCensus may not benefit you.
               Also, PERL is required to run this script.
               ################################################################

       WebCubeCensus (wcc) is a Perl script which will scan a WWW server and validate file
       and directory attributes, HTML links and the timeliness of WWW server documents.

       1. File ATTRIBUTES
               a. Privileges
               b. Owner
               c. Group
               d. Modification AGE
               e. Access AGE
               f. Creation AGE

       2. Directory ATTRIBUTES
               a. Privileges
               b. Owner
               c. Group

       3. HTML/TEXT/BINARY Documents
               a. HREF (Including # references), SRC, FILE, VIRTUAL, BACKGROUND and ACTION links
               b. Notification for use of path ALIASES
               c. LOCAL SYSTEM file references
               d. Special TAGS
               e. Case Sensitive/Insensitive matches
               f. MIME header/ Server Side Include ERROR notification.

       4. Log History
               a. History of EACH FILE
               b. Log filtration
               c. Top FILES/CLIENTS/GROUPS/HOURS Hit
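
The file attribute checks listed under item 1 can be pictured with a short
sketch. This is NOT code from wcc (which is a Perl script); it is a minimal
Python illustration, and the function name check_file, its parameters and
the wording of the messages are all assumptions made for the example.

```python
import os
import stat
import time

def check_file(path, want_mode=0o664, max_age_days=45, now=None):
    """Return a list of problem strings for one file (empty if none found).

    Hypothetical helper: want_mode and max_age_days play the role of the
    fpriv and days settings, but the names and messages are illustrative.
    """
    st = os.stat(path)
    now = time.time() if now is None else now
    problems = []

    # Compare only the permission bits, not the file-type bits.
    mode = stat.S_IMODE(st.st_mode)
    if mode != want_mode:
        problems.append("IMPROPER PRIVILEGES: %04o (wanted %04o)" % (mode, want_mode))

    # Modification age in days, measured against the supplied clock.
    age_days = (now - st.st_mtime) / 86400.0
    if age_days > max_age_days:
        problems.append("OUT-OF-DATE: modified %d days ago" % age_days)

    return problems
```

Owner and group could be compared the same way through st.st_uid and
st.st_gid of the same stat result.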

II. FEATURES

       WebCubeCensus (wcc) will provide this service with mailings of Errors and
       STATISTICS SUMMARY to file MAINTAINERS/OWNERS and the System Administrator.
       Also, through use of a FORMAT FILE, wcc will check specific paths within
       the WWW server root and apply group-specific options.

       1. Contacting MAINTAINER/OWNERS
               File MAINTAINERS/OWNERS are e-mailed a listing of
               documents which are/contain:
                       OUT-OF-DATE
                       IMPROPER PRIVILEGES
                       IMPROPER GROUP
                       MALFORMED or INVALID LINKS
                       NO LOCAL SYSTEM file references
                       CONTAIN IMPROPER/NONEXISTENT TAGS

       2. Mailing of STATISTICS SUMMARY
               a. System Administrators are mailed the STATISTICS SUMMARY
                  of ALL WWW server information.
               b. At the user's option the STATISTICS SUMMARIES may be marked up in
                  HTML format for browsing with a web client (HTML 3.0 Compliant).

       3. Users may define a FORMAT FILE which will allow for exclusive path
          checking inclusive of the WWW server root. Thus, users may define
          differing options for various paths under the WWW server root.
          For information on creating a FORMAT FILE use wcc -help.
          This is a beneficial feature for file/directory maintenance.

       4. If the STATISTICS SUMMARY is dumped to an output file, then WebCubeCensus will
          provide an ESTIMATED time of completion and the percentage complete.
          The percentage may be defined from the command line.

       5. Users may select MARKING of files (which are OUT-OF-DATE or
          contain NO LOCAL SYSTEM file references) for deletion with the
          -mark option of WebCubeCensus (wcc).

       6. Files which are MARKED for deletion may be deleted after
          the System Administrator has reviewed the STATISTICS SUMMARY. They may
          be deleted using the STATISTICS SUMMARY in conjunction with
          the -delete option.

       7. COPIOUS information may be provided which will indicate the exact line on which
          an option or error was found.

       8. Will work with NCSA/Other servers and the Netscape server.

       9. Will allow for additional paths to search OUTSIDE of the WWW Server Document ROOT.

      10. A slew of options is available for allowing unique WWW Server
          maintenance. Please refer to these with wcc -help.

III. INSTALLATION/USE

       CAVEAT: ################################################################
               This SCRIPT is designed to work on UNIX systems. If you are not
               maintaining a WWW Server on a UNIX system WebCubeCensus may not benefit you.
               Also, PERL is required to run this script.
               ################################################################

       1. To use WebCubeCensus obtain a version of WebCubeCensus in an archived tar file.
          A copy may be obtained from ftp://www.aero.und.nodak.edu/pub/wcc-1.1.0.tar

       2. Untar the file: tar xvf wcc-1.1.0.tar

       3. Enter the directory wcc-1.1.0

       4. Make sure to read the README and Copying files.

       5. Make sure the attributes of the script wcc, URL.pl, url_get.pl and ftplib.pl
          have AT LEAST -r-x------ privileges.

       6. Running a WWW Server Check:

              *The BEST source of help is provided by wcc -help

              *If your HTTPD server's configuration files are in a location OTHER
               than the DEFAULT: /usr/local/lib/httpd/conf (or) /usr/local/lib/netscape-server/httpd-*/config
               be sure to provide the correct location with the -conf option.

               WebCubeCensus is called as wcc with ANY of the following options.
               Order of the options is NOT dependent.

               wcc  [-awarn -cflag -conf -copious -days -dgroup -downer -dpriv -ffile -ffsin
                     -fgroup -fowner -fpriv -hash -host -html -index -linksv -local -log -logh -mail
                     -mails -mailm -mailma -mailmf -mailml -mailmn -mailms -mark -mime -ncsa -ofile -path
                     -pdir -port -public -root -stags -topc -topf -topg -toph]

       7. Deleting MARKED files:

              *The BEST source of help is provided by wcc -help

              *Any file which has a + prepended to its PATH in the
               SUMMARY STATISTICS will be deleted. A message of success/failure will
               be included in the SUMMARY STATISTICS for later reference if necessary.

               wcc -delete [-ifile -ofile]
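
To make the marking convention concrete, here is a small Python sketch that
lists which entries of a summary would be selected. It is not taken from wcc.
The exact layout of the SUMMARY STATISTICS is not described here, so the
sketch ASSUMES one entry per line with the + as the first non-blank
character and the path as the rest of the line; marked_paths is a
hypothetical name. It only reports paths and never deletes anything.

```python
def marked_paths(summary_lines):
    """Collect paths from lines that start with '+'.

    Assumed layout: one entry per line, '+' first, then the path.
    """
    paths = []
    for line in summary_lines:
        line = line.strip()
        if line.startswith("+"):
            path = line[1:].strip()
            if path:  # ignore a bare '+' with nothing after it
                paths.append(path)
    return paths
```

Reviewing such a list before running wcc -delete is a cheap way to confirm
that only the intended files carry the mark.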


       8. Defining a FORMAT FILE:

    The benefits of defining a FORMAT FILE are twofold. First, setting options
    for groupings allows for exclusive wcc checking parameters inclusive of the
    files listed in the root path. Second, use of a FORMAT FILE
    allows for greater flexibility in file/directory checking options.

    A FORMAT FILE consists of named groupings and the settings for that group.
    Here is an example: group1:path     /var/spool/www/docs
    The GROUP may be ANY name of ANY case. For example: Fred, BaRnEy, whatever.
    Each GROUP is then UNIQUE. Fred does not belong to the group FrEd and vice
    versa. Having created a group name the following options may be given:
    awarn, copious, days, downer, dgroup, dpriv, fowner, fgroup, fpriv, host,
    log, logf, logl, links, linksv, mailm, mailma, mailmf, mailml, mailmn,
    mailms, mark, mime, path, port, stags, tagci, tagii, tagcx, tagix.
    Each of these has been previously defined with the exception of tagc[ix]
    and tagi[ix]. I'll get to those momentarily. In the interim, here is how
    a sample FORMAT FILE might be configured:

   FORMAT FILE for www.aero.und.nodak.edu WWW Server.

   aviation:path           /var/spool/www/Academic_Departments/Aviation
   aviation:awarn
   aviation:copious
   aviation:days           45
   aviation:downer         1000
   aviation:dgroup         www
   aviation:dpriv          2775
   aviation:fowner         crocker
   aviation:fgroup         www
   aviation:fpriv          0664
   aviation:local
   aviation:links
   aviation:log
   aviation:logf           *cas.und.nodak.edu,agassiz*
   aviation:logh
   aviation:mailm
   aviation:mailma         aero.und.nodak.edu
   aviation:mailmf         WWW Aviation Administrator
   aviation:mailmn         Questions/Comments call (701)777-2964
   aviation:mark
   aviation:stags
   aviation:tagii           <!--#include\s*(file|virtual)\s*=\s*\"?\S+\"?\s*>
   aviation:tagci           Last Modified
   aviation:tagii           DATE:\w+\s*\d{4}

   CSci:path               /var/spool/www/Academic_Departments/Computer_Science
   CSci:copious
   CSci:days               25
   CSci:downer             1000
   CSci:dgroup             csci
   CSci:dpriv              2775
   CSci:fowner             crocker
   CSci:fgroup             csci
   CSci:fpriv              0644
   CSci:mailm
   CSci:mailma             cs.und.nodak.edu
   CSci:mailmf             WWW CSci Administrator
   CSci:mailmn             Questions/Comments call (701)777-2964
   CSci:stags
   CSci:tagii               <!--#include\s*(file|virtual)\s*=\s*\"?\S+\"?\s*>
   CSci:tagci               Last Modified

   END-OF-FORMATS
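
The layout of a FORMAT FILE is regular enough to sketch a reader for it. The
Python below is an illustration only, not wcc's own parser. It assumes that
END-OF-FORMATS ends the file, that lines beginning with # and lines without
a group:option are skipped, that group names are case-sensitive, and that an
option such as tagii may repeat. parse_format and LINE_RE are hypothetical
names.

```python
import re

# "group:option" at the start of the line, then optional whitespace and a value.
# A leading '#' can never match the group part, so commented lines are skipped.
LINE_RE = re.compile(r"^\s*([^\s:#]+):([A-Za-z]+)(?:[ \t]+(.*))?$")

def parse_format(text):
    """Return {group: {option: [values...]}} for one FORMAT FILE."""
    groups = {}
    for line in text.splitlines():
        if line.strip() == "END-OF-FORMATS":
            break  # assumed terminator: nothing after it is read
        m = LINE_RE.match(line)
        if not m:
            continue  # blank lines, comments and free text are passed over
        group, option, value = m.group(1), m.group(2), m.group(3)
        # Options without a value (e.g. "mark") are stored as an empty string.
        # Trailing blanks in a value are kept on purpose.
        groups.setdefault(group, {}).setdefault(option, []).append(value or "")
    return groups
```

Keeping a list per option means that several tagii lines for one group all
survive, and keeping the group name exactly as written means that Fred and
FrEd stay separate groups.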

    The (tagc[ix], tagi[ix]) options are ONLY available in the FORMAT FILE!
    These options check for expressions which are INCLUDED and/or EXCLUDED from
    files UNDER the group path. The tagc[ix], tagi[ix] options accept VALID
    Perl Regular Expressions. If they are incorrect you will be notified.
    The option tagci checks CASE Sensitive expressions which SHOULD be in the
    file. Option tagcx checks CASE Sensitive expressions which SHOULD NOT be in
    the file. Option tagii checks CASE INSensitive expressions which SHOULD be
    in the file. And tagix checks CASE INSensitive expressions which SHOULD NOT
    be in the file. You may have an UNLIMITED number of tag[ci][ix] selections.
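
The four tag options differ only in case sensitivity and in whether a match
is wanted or unwanted, which a few lines of Python can show. This is a sketch
and not wcc's implementation: check_tags and the message texts are invented
for the example, and Python regular expressions are close to but not the
same as Perl's.

```python
import re

def check_tags(text, rules):
    """rules: list of (option, pattern) pairs, option in tagci/tagcx/tagii/tagix.

    Fourth letter: c = case sensitive, i = case insensitive.
    Fifth letter:  i = must be included, x = must be excluded.
    """
    errors = []
    for option, pattern in rules:
        flags = re.IGNORECASE if option[3] == "i" else 0
        found = re.search(pattern, text, flags) is not None
        must_include = option[4] == "i"
        if must_include and not found:
            errors.append("MISSING (%s): %s" % (option, pattern))
        elif not must_include and found:
            errors.append("FORBIDDEN (%s): %s" % (option, pattern))
    return errors
```

For a document containing "last modified" in lower case, a tagci rule for
"Last Modified" reports a missing expression while the same pattern under
tagii passes.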

   NOTES/WARNINGS:

            # WARNING: Files and Directories BENEATH the root path will inherit
              settings and options for checking from paths ABOVE it. That is to
              say that /var/spool/www/DIR0/DIR1 will inherit the options for
              checking from the /var/spool/www/DIR0 path options. So, if you
              request that /var/spool/www and ALL paths beneath it have certain
              check options then the only way to displace those settings is to
              use the FORMAT FILE option and define the group PATH and the
              check options in the FORMAT FILE (OR) define the command line
              options -path -fpriv -fowner -fgroup etc...

            # WARNING: The STATISTICS SUMMARY file may grow to 600+KB. If the
              file is marked for HTML browsing, the MOST efficient and
              effective means to VIEW the file is to load it in UNIX Netscape.
              The memory restrictions are nearly inconsequential. However, if
              you intend to VIEW the HTML STATISTICS SUMMARY on another
              platform you will need to max out the client's memory
              partitions/allocations. For a 650KB HTML STATISTICS SUMMARY,
              Netscape 1.1N for MacOS required 17MB of memory to load the page
              and display it.
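
The inheritance rule in the first warning above can be read as "the deepest
group path above a file supplies its options". The Python sketch below shows
that reading. It is an ASSUMPTION about how the lookup behaves, not a
description of wcc's internals, and group_for is a hypothetical name.

```python
import posixpath

def group_for(file_path, group_paths):
    """Return the group whose path is the deepest ancestor of file_path, or None.

    group_paths maps a group name to its path setting.
    """
    best, best_len = None, -1
    target = posixpath.normpath(file_path)
    for group, root in group_paths.items():
        root = posixpath.normpath(root)
        # Match whole path components only, so /www/DIR0 does not claim /www/DIR00.
        if target == root or target.startswith(root.rstrip("/") + "/"):
            if len(root) > best_len:
                best, best_len = group, len(root)
    return best
```

With one group on /var/spool/www and another on /var/spool/www/DIR0, a file
under /var/spool/www/DIR0/DIR1 resolves to the second group and everything
else under the root falls back to the first.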

           1. Groups and their options in the FORMAT FILE do NOT have to be
              in any particular order.

           2. The spacing between the group:option and its value is irrelevant.
              Spacing AFTER the value of certain options is not, e.g. group:path.
              If there is additional trailing spacing, the path searched will
              include the blank spaces.


           3. Any group:option with a # prepended will be skipped as a comment.

           4. Blank lines, and lines WITHOUT a group:option, are passed over
              as comments.

           5. To search/check from the root down AND use a FORMAT FILE either
              define the root path in the FORMAT FILE and give the options
              for files/directories beneath it (Excluding the settings for
              paths you've singled out.) OR use the FORMAT FILE and define
              the root path with the -path and/or -root options and the
              other appropriate options.

           6. It is pointless to define a group of options if you neglect to
              give it a path option.

           7. To use the COLOR ERROR FLAGS in the STATISTICS SUMMARY the image
              files MUST be in the SAME directory as the STATISTICS SUMMARY.

           8. For paths with 500+ files the time to check will be
              40 minutes or more.

           9. Using the -local option will incur a LARGE process execution
              time. It is a recursive call to scan ALL of the server documents.

          10. If you set the STATISTICS SUMMARY to be marked with HTML tags
              for later browsing note that the file will be large. You may
              need to split it OR give your web client more memory to read it.
              The HTML tags are HTML 3.0 compliant (TABLES).

          11. The url_get.pl package was MODIFIED to work in this script!

          12. Credit for the url_get goes to the authors Jack Lund et al.
              It is really a SHARP Perl kit w/o which I would not have
              this one.

          13. WebCubeCensus (wcc) runs under PERL 4.0 or later.

          14. If you have any trouble, concerns, comments or questions please
              contact me at:

                   James B. Crocker
                   crocker@aero.und.nodak.edu

                   University of North Dakota
                   UND Aerospace
                   Scientific Computing Center (SCC)
                   Internet Services

                   PO Box 9022
                   Grand Forks, ND 58202-9022

                   VOICE:(701)777-2964
                   FAX:(701)777-2940

   # Copyright (c) 1995 University of North Dakota and James B. Crocker, All rights reserved.
   ==================================================