Path: usenet.cis.ufl.edu!usenet.eel.ufl.edu!psgrain!nntp.teleport.com!usenet
From: [email protected] (James B. Crocker)
Newsgroups: comp.lang.perl.announce,comp.lang.perl.misc
Subject: SOFTWARE: WebCubeCensus 1.1.0 (Freeware)
Followup-To: comp.lang.perl.misc
Date: 17 Sep 1995 23:13:09 GMT
Organization: University of North Dakota; Grand Forks, ND
Lines: 340
Approved: [email protected] (comp.lang.perl.announce)
Message-ID: <[email protected]>
NNTP-Posting-Host: linda.teleport.com
Keywords: UNIX Web Server Administration Analysis Perl
X-Disclaimer: The "Approved" header verifies header information for article transmission and does not imply approval of content.
Xref: usenet.cis.ufl.edu comp.lang.perl.announce:125 comp.lang.perl.misc:6505
I'm placing this announcement in this group as the
program mentioned is a Perl application. If this is
an improper post please remove it. Thank you.
Attention: UNIX Web Administrators.
For Web Administrators who maintain numerous files and allow others to administer
web documents, keeping track of each and every one can be a chore. This Perl kit is
designed for UNIX Web Server Administrators who need a tool to monitor AND report
potential errors to individual file owners/maintainers. This kit will examine
documents and check for defined tags, regular expressions, etc. WCC is also a log analysis
tool. Rather than generate a summary of information, WCC gives each and every file
referenced in the access log its own log history. There are too many options
to discuss here.
If you think this package would be of benefit to you please take a detour to
http://www.aero.und.nodak.edu/~crocker
README File -----------------------------------------------------------------------------
WebCubeCensus Kit, Version 1.1.0
Copyright (c) 1995 University of North Dakota and James B. Crocker, All rights reserved.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
I. PURPOSE
CAVEAT: ################################################################
This SCRIPT is designed to work on UNIX systems. If you are not
maintaining a WWW Server on a UNIX system WebCubeCensus may not benefit you.
Also, PERL is required to run this script.
################################################################
WebCubeCensus (wcc) is a Perl script which will scan a WWW server and validate file
and directory attributes, HTML links and the timeliness of WWW server documents.
1. File ATTRIBUTES
a. Privileges
b. Owner
c. Group
d. Modification AGE
e. Access AGE
f. Creation AGE
2. Directory ATTRIBUTES
a. Privileges
b. Owner
c. Group
3. HTML/TEXT/BINARY Documents
a. HREF (Including # references), SRC, FILE, VIRTUAL, BACKGROUND and ACTION links
b. Notification for use of path ALIASES
c. LOCAL SYSTEM file references
d. Special TAGS
e. Case Sensitive/Insensitive matches
f. MIME header/ Server Side Include ERROR notification.
4. Log History
a. History of EACH FILE
b. Log filtration
c. Top FILES/CLIENTS/GROUPS/HOURS Hit
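To make the file ATTRIBUTES checks in item 1 more concrete, here is a minimal
sketch in Python (wcc itself is a Perl script; this is NOT its code). The
function name check_file and its parameters are hypothetical, and only the
modification AGE is shown. The message labels are borrowed from the error
categories this README lists.

```python
import os
import stat
import time

def check_file(path, want_mode=None, want_uid=None, want_gid=None,
               max_days=None, now=None):
    """Return a list of problem strings for one file (empty list means OK).

    Hypothetical helper for illustration; not part of wcc.
    """
    st = os.stat(path)
    now = time.time() if now is None else now
    problems = []

    # Privileges: compare only the permission bits, e.g. 0o664.
    mode = stat.S_IMODE(st.st_mode)
    if want_mode is not None and mode != want_mode:
        problems.append("IMPROPER PRIVILEGES: %04o (wanted %04o)"
                        % (mode, want_mode))

    # Owner and group, compared as numeric ids.
    if want_uid is not None and st.st_uid != want_uid:
        problems.append("IMPROPER OWNER: uid %d" % st.st_uid)
    if want_gid is not None and st.st_gid != want_gid:
        problems.append("IMPROPER GROUP: gid %d" % st.st_gid)

    # Modification AGE in days, measured from st_mtime.
    if max_days is not None:
        age_days = (now - st.st_mtime) / 86400.0
        if age_days > max_days:
            problems.append("OUT-OF-DATE: modified %d days ago" % age_days)

    return problems
```

A call such as check_file("/var/spool/www/index.html", want_mode=0o664,
max_days=45) would return one string per failed check. The access AGE could
be measured the same way from st_atime.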
II. FEATURES
WebCubeCensus (wcc) will provide this service with mailings of Errors and
STATISTICS SUMMARY to file MAINTAINERS/OWNERS and the System Administrator.
Also, through use of a FORMAT FILE, wcc will check specific paths within
the WWW server root and apply group-specific options.
1. Contacting MAINTAINER/OWNERS
File MAINTAINERS/OWNERS are e-mailed a listing of
documents which are/contain:
OUT-OF-DATE
IMPROPER PRIVILEGES
IMPROPER GROUP
MALFORMED or INVALID LINKS
NO LOCAL SYSTEM file references
IMPROPER/NONEXISTENT TAGS
2. Mailing of STATISTICS SUMMARY
a. System Administrators are mailed the STATISTICS SUMMARY
of ALL WWW server information.
b. At the user's option the STATISTICS SUMMARY may be marked up in
HTML format for browsing with a web client (HTML 3.0 Compliant).
3. Users may define a FORMAT FILE which will allow for exclusive path
checking inclusive of the WWW server root. Thus, users may define
differing options for various paths under the WWW server root.
For information on creating a FORMAT FILE use wcc -help.
This is a beneficial feature for file/directory maintenance.
4. If the STATISTICS SUMMARY is dumped to an output file, then WebCubeCensus will
provide an ESTIMATED time of completion and the percentage complete.
The percentage may be defined from the command line.
5. Users may select MARKING of files (which are OUT-OF-DATE or
contain NO LOCAL SYSTEM file references) for deletion with the
-mark option of WebCubeCensus (wcc).
6. Files which are MARKED for deletion may be deleted after
the System Administrator has reviewed the STATISTICS SUMMARY. They may
be deleted using the STATISTICS SUMMARY in conjunction with
the -delete option.
7. COPIOUS information may be provided which will indicate the exact line on which
an option or error was found.
8. Will work with NCSA/Other servers and the Netscape server.
9. Will allow for additional paths to search OUTSIDE of the WWW Server Document ROOT.
10. A slew of options is available for allowing unique WWW Server
maintenance. Please refer to these with wcc -help.
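The mark-then-delete workflow of features 5 and 6 can be pictured with the
following Python sketch. It is NOT wcc's implementation. It ASSUMES the
STATISTICS SUMMARY is plain text with one file per line and that a marked
file has a + directly before its path; the real summary layout may differ.
The names marked_paths and delete_marked are hypothetical.

```python
import os

def marked_paths(summary_lines):
    """Yield paths from lines whose first non-blank character is '+'."""
    for line in summary_lines:
        line = line.strip()
        if line.startswith("+") and len(line) > 1:
            yield line[1:].strip()

def delete_marked(summary_lines, dry_run=True):
    """Delete every marked file; return (path, status) pairs for a report."""
    results = []
    for path in marked_paths(summary_lines):
        if dry_run:
            # Default is to report only, so a review can happen first.
            results.append((path, "would delete"))
            continue
        try:
            os.remove(path)
            results.append((path, "deleted"))
        except OSError as err:
            results.append((path, "failed: %s" % err.strerror))
    return results
```

The dry_run default mirrors the idea that the System Administrator reviews
the summary before anything is removed.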
III. INSTALLATION/USE
CAVEAT: ################################################################
This SCRIPT is designed to work on UNIX systems. If you are not
maintaining a WWW Server on a UNIX system WebCubeCensus may not benefit you.
Also, PERL is required to run this script.
################################################################
1. To use WebCubeCensus obtain a version of WebCubeCensus in an archived tar file.
A copy may be obtained from
ftp://www.aero.und.nodak.edu/pub/wcc-1.1.0.tar
2. Untar the file: tar xvf wcc-1.1.0.tar
3. Enter the directory wcc-1.1.0
4. Make sure to read the README and Copying files.
5. Make sure the scripts wcc, URL.pl, url_get.pl and ftplib.pl
have AT LEAST -r-x------ privileges.
6. Running a WWW Server Check:
*The BEST source of help is provided by wcc -help
*If your HTTPD server's configuration files are in a location OTHER
than the DEFAULT: /usr/local/lib/httpd/conf (or) /usr/local/lib/netscape-server/httpd-*/config
be sure to provide the correct location with the -conf option.
WebCubeCensus is called as wcc with ANY of the following options.
Order of the options is NOT dependent.
wcc [-awarn -cflag -conf -copious -days -dgroup -downer -dpriv -ffile -ffsin
-fgroup -fowner -fpriv -hash -host -html -index -linksv -local -log -logh -mail
-mails -mailm -mailma -mailmf -mailml -mailmn -mailms -mark -mime -ncsa -ofile -path
-pdir -port -public -root -stags -topc -topf -topg -toph]
7. Deleting MARKED files:
*The BEST source of help is provided by wcc -help
*Any file which has a + prepended to its PATH in the
SUMMARY STATISTICS will be deleted. A message of success/failure will
be included in the SUMMARY STATISTICS for later reference if necessary.
wcc -delete [-ifile -ofile]
8. Defining a FORMAT FILE:
The benefits of defining a FORMAT FILE are twofold. First, setting options
for groupings allow for exclusive wcc checking parameters inclusive of the
files listed in the root path. The second is that use of a FORMAT FILE
allows for greater flexibility in file/directory checking options.
A FORMAT FILE consists of named groupings and the settings for that group.
Here is an example: group1:path /var/spool/www/docs
The GROUP may be ANY name of ANY case. For example: Fred, BaRnEy, whatever.
Each GROUP is then UNIQUE. Fred does not belong to the group FrEd and vice
versa. Having created a group name the following options may be given:
awarn, copious, days, downer, dgroup, dpriv, fowner, fgroup, fpriv, host,
log, logf, logl, links, linksv, mailm, mailma, mailmf, mailml, mailmn,
mailms, mark, mime, path, port, stags, tagci, tagii, tagcx, tagix.
Each of these has been previously defined with the exception of tagc[ix]
and tagi[ix]. I'll get to those momentarily. For the interim here is how
a sample FORMAT FILE might be configured:
FORMAT FILE for www.aero.und.nodak.edu WWW Server.
aviation:path /var/spool/www/Academic_Departments/Aviation
aviation:awarn
aviation:copious
aviation:days 45
aviation:downer 1000
aviation:dgroup www
aviation:dpriv 2775
aviation:fowner crocker
aviation:fgroup www
aviation:fpriv 0664
aviation:local
aviation:links
aviation:log
aviation:logf *cas.und.nodak.edu,agassiz*
aviation:logh
aviation:mailm
aviation:mailma aero.und.nodak.edu
aviation:mailmf WWW Aviation Administrator
aviation:mailmn Questions/Comments call (701)777-2964
aviation:mark
aviation:stags
aviation:tagii <!--#include\s*(file|virtual)\s*=\s*\"?\S+\"?\s*>
aviation:tagci Last Modified
aviation:tagii DATE:\w+\s*\d{4}
CSci:path /var/spool/www/Academic_Departments/Computer_Science
CSci:copious
CSci:days 25
CSci:downer 1000
CSci:dgroup csci
CSci:dpriv 2775
CSci:fowner crocker
CSci:fgroup csci
CSci:fpriv 0644
CSci:mailm
CSci:mailma cs.und.nodak.edu
CSci:mailmf WWW CSci Administrator
CSci:mailmn Questions/Comments call (701)777-2964
CSci:stags
CSci:tagii <!--#include\s*(file|virtual)\s*=\s*\"?\S+\"?\s*>
CSci:tagci Last Modified
END-OF-FORMATS
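For readers who want to see how a file laid out like the sample above could
be read, here is a minimal Python sketch. It is NOT the parser wcc uses. It
follows the rules this README states: group names are case sensitive, lines
starting with # and lines without a group:option are skipped, and reading
stops at END-OF-FORMATS. The names LINE_RE and parse_format are hypothetical,
and storing every option as a list of values is an assumption made so that
repeated tag options can be kept.

```python
import re

# group:option, optionally followed by spacing and a value.
# A leading '#' cannot match the group name, so comment lines are skipped.
LINE_RE = re.compile(r"([^\s:#]+):([A-Za-z]+)(?:[ \t]+(.*))?$")

def parse_format(lines):
    """Return {group: {option: [values...]}} for a FORMAT FILE.

    Flag options such as 'awarn' get an empty list of values.
    """
    groups = {}
    for raw in lines:
        line = raw.rstrip("\r\n")          # keep trailing blanks in values
        if line.strip() == "END-OF-FORMATS":
            break
        m = LINE_RE.match(line)
        if not m:
            continue                       # blank or non group:option line
        group, option, value = m.groups()
        values = groups.setdefault(group, {}).setdefault(option, [])
        if value:
            values.append(value)
    return groups
```

With the sample above, parse_format(open("wcc.fmt")) would give the aviation
group a days entry of ["45"] and two tagii patterns; the file name wcc.fmt
is only a placeholder.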
The (tagc[ix], tagi[ix]) options are ONLY available in the FORMAT FILE!
These options check for expressions which are INCLUDED and/or EXCLUDED from
files UNDER the group path. The tagc[ix], tagi[ix] options accept VALID
Perl Regular Expressions. If they are incorrect you will be notified.
The option tagci checks CASE Sensitive expressions which SHOULD be in the
file. Option tagcx checks CASE Sensitive expressions which SHOULD NOT be in
the file. Option tagii checks CASE INSensitive expressions which SHOULD be
in the file. And tagix checks CASE INSensitive expressions which SHOULD NOT
be in the file. You may have UNLIMITED number of tag[ci][ix] selections.
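The four tag options reduce to two switches: case sensitivity (c or i) and
whether the expression SHOULD or SHOULD NOT be present (i or x). The Python
sketch below illustrates that logic only; it is NOT wcc's code. The function
name check_tags and the message wording are hypothetical. Note also that
Python's regular expression syntax is close to Perl's but not identical, so
a pattern written for wcc may need adjusting.

```python
import re

def check_tags(text, rules):
    """Check text against (option, pattern) pairs such as
    ("tagci", "Last Modified"). Return a list of problem strings."""
    problems = []
    for option, pattern in rules:
        # tagi? options are case INSensitive, tagc? are case sensitive.
        flags = re.IGNORECASE if option.startswith("tagi") else 0
        try:
            found = re.search(pattern, text, flags) is not None
        except re.error as err:
            # Report a bad expression instead of aborting the whole run.
            problems.append("INVALID %s: %s (%s)" % (option, pattern, err))
            continue
        # tag?i means SHOULD be in the file, tag?x means SHOULD NOT.
        must_have = option.endswith("i")
        if must_have and not found:
            problems.append("MISSING %s: %s" % (option, pattern))
        elif not must_have and found:
            problems.append("FORBIDDEN %s: %s" % (option, pattern))
    return problems
```

For example, a document containing "last modified" in lower case fails the
rule ("tagci", "Last Modified") but passes ("tagii", "Last Modified").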
NOTES/WARNINGS:
# WARNING: Files and Directories BENEATH the root path will inherit
settings and options for checking from paths ABOVE it. That is to
say that /var/spool/www/DIR0/DIR1 will inherit the options for
checking from the /var/spool/www/DIR0 path options. So, if you
request that /var/spool/www and ALL paths beneath it have certain
check options then the only way to displace those settings is to
use the FORMAT FILE option and define the group PATH and the
check options in the FORMAT FILE (OR) define the command line
options -path -fpriv -fowner -fgroup etc...
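The inheritance described in this warning can be pictured as follows. This
Python sketch is NOT how wcc resolves options. It ASSUMES that options are
merged one by one and that the deepest matching path wins; the README only
says that paths beneath inherit from paths above and that a more specific
group definition displaces those settings. The name effective_options is
hypothetical.

```python
def effective_options(file_path, path_options):
    """Merge the options of every configured path at or above file_path.

    path_options maps a directory path to a dict of options.
    """
    merged = {}
    # Visit shorter (higher) paths first so deeper paths override them.
    for base in sorted(path_options, key=len):
        root = base.rstrip("/")
        # Match whole path components only: /www/DIR0 must not match
        # /www/DIR00.
        if file_path == root or file_path.startswith(root + "/"):
            merged.update(path_options[base])
    return merged
```

So with options set on /var/spool/www and a days option set on
/var/spool/www/DIR0, a file under /var/spool/www/DIR0/DIR1 would get the
DIR0 value for days and everything else from /var/spool/www.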
# WARNING: The STATISTICS SUMMARY file may grow to 600+KB. If the
file is marked for HTML browsing, the MOST efficient and
effective means to VIEW the file is to load it in UNIX Netscape.
The memory restrictions are nearly inconsequential. However, if
you intend on VIEWING the HTML STATISTICS SUMMARY on another
platform you will need to max out the client's memory
partitions/allocations. For a 650KB HTML STATISTICS SUMMARY,
Netscape 1.1N for MacOS required 17MB of memory to load the page
and display it.
1. Groups and their options in the FORMAT FILE do NOT have to be
in any particular order.
2. The amount of spacing after the group:option is irrelevant. Spacing
AFTER the value of certain options is not, e.g. group:path. If there is
additional spacing after the path, the path searched will include the blank spaces.
3. Any group:option with a # prepended will be skipped as a comment.
4. Blank lines, and lines WITHOUT a group:option, are passed over
as comments.
5. To search/check from the root down AND use a FORMAT FILE either
define the root path in the FORMAT FILE and give the options
for files/directories beneath it (Excluding the settings for
paths you've singled out.) OR use the FORMAT FILE and define
the root path with the -path and/or -root options and the
other appropriate options.
6. It is pointless to define a group of options if you neglect to
give it a path option.
7. To use the COLOR ERROR FLAGS in the STATISTICS SUMMARY the image
files MUST be in the SAME directory as the STATISTICS SUMMARY.
8. For paths with 500+ files the time to check will be
40 minutes or more.
9. Using the -local option will incur a LARGE process execution
time. It is a recursive call to scan ALL of the server documents.
10. If you set the STATISTICS SUMMARY to be marked with HTML tags
for later browsing note that the file will be large. You may
need to split it OR give your web client more memory to read it.
The HTML tags are HTML 3.0 compliant (TABLES).
11. The url_get.pl package was MODIFIED to work in this script!
12. Credit for the url_get goes to the authors Jack Lund et al.
It is really a SHARP Perl kit w/o which I would not have
this one.
13. WebCubeCensus (wcc) executes from PERL 4.0 or greater.
14. If you have any trouble, concerns, comments or questions please
contact me at:
James B. Crocker
crocker@aero.und.nodak.edu
University of North Dakota
UND Aerospace
Scientific Computing Center (SCC)
Internet Services
PO Box 9022
Grand Forks, ND 58202-9022
VOICE:(701)777-2964
FAX:(701)777-2940
# Copyright (c) 1995 University of North Dakota and James B. Crocker, All rights reserved.
==================================================