Article 11300 of comp.infosystems.www:
Newsgroups: comp.infosystems.www
Path: feenix.metronet.com!news.ecn.bgu.edu!mp.cs.niu.edu!vixen.cso.uiuc.edu!howland.reston.ans.net!EU.net!Germany.EU.net!netmbx.de!zrz.TU-Berlin.DE!zib-berlin.de!uni-paderborn.de!urmel.informatik.rwth-aachen.de!news.dfn.de!scsing.switch.ch!swidir.switch.ch!news.unige.ch!usenet
From: [email protected] (Oscar Nierstrasz)
Subject: htget -- script to MIRROR WWW files and directories
Message-ID: <[email protected]>
Sender: [email protected]
Reply-To: [email protected]
Organization: University of Geneva, Switzerland
Date: Fri, 25 Mar 1994 08:51:07 GMT
Lines: 62
Something that's been on my "to do" list for some time now ...
This is a first announcement for `htget', a perl script for
mirroring WWW files and directories (using HTTP only).
It is an evolution of `hget', an earlier script that can retrieve
only individual files.
`htget URL'       makes a copy of the remote file in the current directory
`htget -s URL'    copies the file to stdout (as hget does by default)
`htget -abs URL'  converts all relative URLs to absolute URLs
                  (so that the local file will contain correct links)
`htget -r URL'    will *recursively* retrieve the file and all other
                  files reachable from that URL, *provided* they have
                  the same prefix (i.e., reside in the same directory
                  hierarchy)
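The two URL rules behind `-abs' and `-r' can be illustrated with a small
sketch. htget itself is a Perl script (using url.pl); the Python below is
not its code. It assumes that "same prefix" means the directory part of
the starting URL, and `links_of' is a hypothetical stand-in for "fetch
the page and extract its links", so no network access happens.

```python
from collections import deque
from typing import Callable, Iterable, List
from urllib.parse import urldefrag, urljoin


def absolutize(page_url: str, href: str) -> str:
    """Resolve a possibly relative href against the URL of the page that
    contains it (the effect described for -abs). Any #fragment is kept."""
    return urljoin(page_url, href)


def directory_prefix(start_url: str) -> str:
    """Directory part of the starting URL, up to and including its last '/'.
    Assumes start_url has a path (at least the '/' after the host)."""
    return start_url[: start_url.rfind("/") + 1]


def mirror_order(start_url: str,
                 links_of: Callable[[str], Iterable[str]]) -> List[str]:
    """Breadth-first walk of every page reachable from start_url whose
    absolute URL starts with the starting directory prefix (the -r rule).

    links_of(url) is hypothetical: it returns the raw hrefs on that page."""
    prefix = directory_prefix(start_url)
    seen = {start_url}
    queue = deque([start_url])
    order: List[str] = []
    while queue:
        url = queue.popleft()
        order.append(url)  # a real mirror would fetch and save url here
        for href in links_of(url):
            # Drop the fragment: page.html#a and page.html#b are one file.
            target, _fragment = urldefrag(absolutize(url, href))
            if target.startswith(prefix) and target not in seen:
                seen.add(target)
                queue.append(target)
    return order
```

For example, starting at http://www.example.org/WWW94/CERN/ (a made-up
URL), a link to papers/p1.html is followed, while ../../OSG/home.html
resolves to http://www.example.org/OSG/home.html, which lies outside the
starting directory and is left alone.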
The interesting case is the last one. Htget will try to re-create the
required directory hierarchy and will convert all relative URLs to
absolute ones, *except* links to pages that were retrieved, which will
be made relative (so all links will point to the mirrored pages, not
the original ones). Htget also tries to make intelligent decisions
about which files should be called "index.html" (and tries to recover
if trailing slashes are left off directory URLs).
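The bookkeeping described above (local file names, "index.html" for
directory URLs, and relative links between mirrored pages) might look
roughly like the following Python sketch. It is not htget's algorithm:
the layout (the URL path reproduced under the current directory, host
omitted) is an assumption, and `add_missing_slash', `local_path' and
`relative_link' are hypothetical names. How htget detects a missing
slash is not stated, so here the caller is assumed to know. The slash
matters because relative links resolve differently: relative to
http://www.example.org/WWW94/CERN, "papers/p1.html" means
http://www.example.org/WWW94/papers/p1.html.

```python
import posixpath
from urllib.parse import urldefrag, urlsplit, urlunsplit


def add_missing_slash(url: str) -> str:
    """Append '/' to a URL that is known to name a directory.

    Detecting that case (e.g. the server redirecting to the slashed
    form) is assumed to be done by the caller."""
    parts = urlsplit(url)
    if parts.path.endswith("/"):
        return url
    return urlunsplit(parts._replace(path=parts.path + "/"))


def local_path(url: str) -> str:
    """Map a URL to a file path under the mirror root, reproducing the
    URL's directory hierarchy. Directory URLs become .../index.html."""
    path = urlsplit(url).path or "/"
    if path.endswith("/"):
        path += "index.html"
    return path.lstrip("/")


def relative_link(from_url: str, to_url: str) -> str:
    """Relative href to write into from_url's local copy so that it points
    at the local copy of to_url (a page that was also retrieved)."""
    target, fragment = urldefrag(to_url)
    from_dir = posixpath.dirname(local_path(from_url))
    # Both paths get a leading '/', so relpath never consults the
    # current working directory.
    rel = posixpath.relpath("/" + local_path(target), "/" + from_dir)
    return rel + ("#" + fragment if fragment else "")
```

Writing "../index.html" rather than "../" keeps the mirrored copy usable
when it is browsed through file: URLs instead of through a server.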
htget can be found at:
http://cui_www.unige.ch/ftp/PUBLIC/oscar/scripts/README.html
You will also need url.pl and ftplib.pl (at the same location).
htget has been used experimentally to mirror the WWW 94 and
OOPSLA 94 conference servers:
http://cui_www.unige.ch/WWW94/CERN/
http://cui_www.unige.ch/OSG/OOinfo/Conf/OOPSLA/
Naturally you should only use this when you are sure you really
want to mirror a whole directory! htget does give you feedback
about files it is retrieving and their sizes.
If you use this script and discover any problems, please let me know.
Oscar Nierstrasz
World Wide Web 94 Programme Chair
__________________________________________________________________________
Attend WWW 94 in Geneva! Contribute a hypertalk!
Being held at CERN, May 25-27, 1994.
See:
http://www1.cern.ch/WWW94/Welcome.html
__________________________________________________________________________
Oscar Nierstrasz -- M.E.R. (Assistant Professor) | Prefix: +41 22
Centre Universitaire d'Informatique, University of Geneva | Tel: 705.7664
24, rue General-Dufour -- CH-1211 Geneva 4 -- SWITZERLAND | Sec: 705.7770
E-mail: [email protected] | Fax: 320.2927
Ftp: cui.unige.ch:/OO-articles |
WWW: http://cui_www.unige.ch/OSG/Oscar/home.html | Home: 733.9568
__________________________________________________________________________