From: [email protected] (Jari Aalto+mail.emacs)
Newsgroups: comp.lang.perl.announce,comp.lang.perl.modules
Subject: Announce: mywebget.pl v1999.0210 - Batch get updates from Http ftp dirs.
Followup-To: comp.lang.perl.modules
Date: 24 Feb 1999 16:50:03 GMT
Organization: University of Tampere
Lines: 224
Approved: [email protected] (comp.lang.perl.announce)
Message-ID: <[email protected]>
NNTP-Posting-Host: halfdome.holdit.com
X-Disclaimer: The "Approved" header verifies header information for article transmission and does not imply approval of content.
Download: http://www.perl.com/CPAN-local//scripts/
This is the first public release.
jari
NAME
@(#) mywebget.pl - Perl Web URL retrieve program
SYNOPSIS
mywebget.pl http://example.com/ [URL] ...
mywebget.pl --file file-with-urls.txt
mywebget.pl --verbose --overwrite http://example.com/
mywebget.pl --verbose --overwrite --Output ~/dir/ http://example.com/
OPTIONS
General options
--create-paths
Create paths that do not exist in `lcd:' directives. Normally an `lcd:'
directive whose path does not exist terminates the program. With this
option the local directory is created as needed.
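The following minimal Python sketch shows how an `lcd:' path might be resolved with and without --create-paths. It is not the script's own Perl code, and the function name resolve_local_dir is hypothetical. It expands environment variables, as the `lcd:' keyword description states. Where the script terminates, the sketch raises an exception instead.

```python
import os


def resolve_local_dir(spec: str, create_paths: bool = False) -> str:
    """Hypothetical helper: expand an lcd: path and make sure it is a directory."""
    # Environment variables such as $HOME are substituted in the path name.
    path = os.path.expandvars(spec)
    if not os.path.isdir(path):
        if not create_paths:
            # Default behaviour: a missing lcd: directory is a fatal error.
            raise FileNotFoundError(f"lcd: directory does not exist: {path}")
        # With --create-paths, create the whole path, including parents.
        # If the path exists but is a regular file, this raises FileExistsError.
        os.makedirs(path, exist_ok=True)
    return path
```

Raising an exception rather than exiting keeps the helper testable. A real batch run would catch the error and stop.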
--Firewall FIREWALL
Use FIREWALL when accessing files via ftp:// protocol.
--file FILE
Read URLs from FILE. The file may contain comments starting with #.
The syntax is:
# @(#) $HOME/.mywebget.default - Perl configuration file
#
# This is comment
# Another comment
file://absolute/dir/file-1.23.tar.gz
lcd:$HOME/updates # chdir here
http://www.example.com/page.html
http://www.example.com/page.html save:/dir/dir/page.html
ftp://ftp.com/dir/file.txt save:xx-file.txt login:foo pass:passwd
lcd:$HOME/download-kit
ftp://ftp.com/dir/kit-1.1.tar.gz new:
Possible keywords in the ftp:// line are
`lcd:DIRECTORY'
Set the local download directory to DIRECTORY. Environment
variables in the path name are expanded. If this tag is found, it
overrides the --Output setting. If the path is not a directory, the
program terminates with an error. See also --create-paths.
`login:LOGIN-NAME'
FTP login name. The default is "ftp".
`new:'
If this keyword appears on the current line, the newest file is
retrieved. The setting is reset to the value of `--new' after the
line has been processed.
`pass:PASSWORD'
FTP password. The default is a generic mail@some.com email address.
`regexp:REGEXP'
Get all files in the FTP directory matching REGEXP. The `save:'
keyword is ignored.
`save:LOCAL-FILE-NAME'
Save the file to local disk under this name.
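A Python sketch of a reader for the file format above follows. It is not the script's Perl parser. UrlEntry and parse_config are hypothetical names, and the sketch makes four assumptions: `#' always starts a comment, `lcd:' stands on a line of its own as in the sample, keywords are whitespace-separated `key:value' tokens, and `new:' applies only to its own line.

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UrlEntry:
    """Hypothetical record for one URL line of the configuration file."""
    url: str
    local_dir: Optional[str] = None  # latest lcd: value; None means --Output applies
    save: Optional[str] = None       # save:LOCAL-FILE-NAME
    login: str = "ftp"               # documented default login
    password: str = "mail@some.com"  # documented default password
    regexp: Optional[str] = None     # regexp:REGEXP (save: is then ignored at download time)
    new: bool = False                # new: on this line, or the global --new


def parse_config(text: str, global_new: bool = False) -> List[UrlEntry]:
    entries: List[UrlEntry] = []
    local_dir: Optional[str] = None
    for raw in text.splitlines():
        # Assumption: "#" always starts a comment, so URLs with #fragments are unsupported.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        tokens = line.split()
        if tokens[0].startswith("lcd:"):
            # lcd: applies to all following URL lines; expand $HOME and similar.
            local_dir = os.path.expandvars(tokens[0][len("lcd:"):])
            continue
        # Each URL line starts from the global --new value, which models the
        # "reset after the line has been processed" rule.
        entry = UrlEntry(url=tokens[0], local_dir=local_dir, new=global_new)
        for token in tokens[1:]:
            key, _, value = token.partition(":")
            if key == "save":
                entry.save = value
            elif key == "login":
                entry.login = value
            elif key == "pass":
                entry.password = value
            elif key == "regexp":
                entry.regexp = value
            elif key == "new":
                entry.new = True
            else:
                raise ValueError(f"unknown keyword {token!r} in line {raw!r}")
        entries.append(entry)
    return entries
```

Feeding it the sample file above yields one entry per URL line. The `ftp://ftp.com/dir/file.txt' entry has save="xx-file.txt", login="foo" and password="passwd". Entries that have no `login:' keyword keep the default "ftp".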
--new
Get the newest file. If the filename does not end in .asp, .html or
.htm, the URL is assumed to point to a program or data file.
When new releases are announced, the version number in the filename
usually tells which one is current, so retrieving a hardcoded file
name with:
mywebget.pl -o -v \
http://example.com/dir/program-1.3.tar.gz
is usually not a good choice. Adding the --new option to the command
line causes two passes: (a) the whole directory
http://example.com/dir/ is listed; (b) files approximately matching
the filename program-1.3.tar.gz are examined and sorted, and the one
with the latest version number is retrieved.
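A Python sketch of the second pass follows. It is not the script's own heuristic, and version_key and newest_file are hypothetical names. "Approximately matching" is assumed here to mean: same text before the version number, same text after it, and any dotted numeric version in between. Versions are compared numerically, so 1.10 sorts after 1.9.

```python
import re
from typing import List, Optional, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Compare versions numerically: "1.10" -> (1, 10), which sorts after (1, 9)."""
    return tuple(int(part) for part in version.split("."))


def newest_file(template: str, listing: List[str]) -> Optional[str]:
    """Return the listing entry shaped like template that has the highest version."""
    # Split e.g. "program-1.3.tar.gz" into "program-", "1.3" and ".tar.gz".
    m = re.match(r"^(.*?[-_])(\d+(?:\.\d+)*)(\..*)?$", template)
    if m is None:
        return None  # no version number in the name, so nothing to compare
    prefix, suffix = m.group(1), m.group(3) or ""
    pattern = re.compile(re.escape(prefix) + r"(\d+(?:\.\d+)*)" + re.escape(suffix))
    candidates = []
    for name in listing:
        hit = pattern.fullmatch(name)
        if hit:
            candidates.append((version_key(hit.group(1)), name))
    # Highest version wins; the name only breaks ties such as 1.0 vs 1.00.
    return max(candidates)[1] if candidates else None
```

With a listing that contains kit-1.00.tar.gz, kit-2.10.tar.gz and kit-3.08.tar.gz, the template kit-1.00.tar.gz selects kit-3.08.tar.gz. A kit-3.08.zip in the same listing is skipped because its suffix differs.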
--Output DIR
Before retrieving any files, chdir to DIR.
--overwrite
Allow overwriting existing files when retrieving URLs.
--prefix PREFIX
Add PREFIX to all retrieved files.
--Postfix POSTFIX -P POSTFIX
Add POSTFIX to all retrieved files.
--prefix-date -D
Add an ISO 8601 "YYYY-MM-DD::" prefix to all retrieved files. This is
added before any --prefix-www or --prefix.
--prefix-www -W
Normally files are stored under the same name as in the URL. If you
retrieve files that have identical names from different sites, this
option keeps them apart by prefixing each file name with the site
name.
http://example.com/page.html --> example.com::page.html
http://example2.com/page.html --> example2.com::page.html
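A minimal Python sketch of how the local file name could be built from --prefix-date and --prefix-www follows. It reproduces the documented results, for example YYYY-MM-DD::www.example.com::file.pl. The name local_name is hypothetical. --prefix and --Postfix are left out because their exact placement is not documented, and the `today' parameter exists only so the date can be fixed in tests.

```python
import datetime
from typing import Optional
from urllib.parse import urlsplit


def local_name(url: str, prefix_date: bool = False, prefix_www: bool = False,
               today: Optional[datetime.date] = None) -> str:
    parts = urlsplit(url)
    # Last path component, e.g. "page.html" (empty for directory URLs).
    name = parts.path.rsplit("/", 1)[-1]
    if prefix_www:
        # --prefix-www: "example.com::page.html"
        name = f"{parts.hostname}::{name}"
    if prefix_date:
        # --prefix-date goes in front of everything else.
        day = today or datetime.date.today()
        name = f"{day.isoformat()}::{name}"
    return name
```

Directory URLs that end in a slash are stored under a different naming rule, which this sketch does not handle.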
Miscellaneous options
--debug -d LEVEL
Turn on debug with positive LEVEL number. Zero means no debug.
--help -h
Print help page.
--Version -V
Print program's version information.
README
This small utility lets you keep a list of URLs in a file and
periodically retrieve those pages or files with a simple command. It is
best suited for small batch jobs, e.g. downloading the most recent
versions of software kits. If you pass a URL whose file is already on
disk, be sure to supply the option --overwrite to allow overwriting the
old file.
If the URL ends in a slash, the directory listing of the remote
machine is stored in a file named:
!path!000root-file
Depending on whether the http or ftp protocol is used, this file
contains either the index.html page or the directory listing.
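A Python sketch of the naming rule for directory URLs follows. It is inferred from the example in EXAMPLES, where http://www.example.com/dir/ is stored as !dir!000root-file: each "/" in the URL path becomes "!", and "000root-file" is appended. The name listing_name is hypothetical. Paths with several components are assumed to follow the same pattern.

```python
from urllib.parse import urlsplit


def listing_name(url: str) -> str:
    """Hypothetical: file name used to store the listing of a directory URL."""
    path = urlsplit(url).path or "/"
    if not path.endswith("/"):
        raise ValueError("directory URLs must end in a slash")
    # "/dir/" -> "!dir!" -> "!dir!000root-file"
    return path.replace("/", "!") + "000root-file"
```

Combined with --prefix-date, the date prefix goes in front, giving YYYY-MM-DD::!dir!000root-file.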
While you can run this program from the command line to retrieve
individual files, it has been designed to use a separate configuration
file via the --file option. In that file you can control downloading
with per-line directives like `save:', which saves the file under a
different name.
The simplest way to retrieve the latest version of a kit from a site is:
mywebget.pl --new --overwrite --verbose \
http://www.example.com/kit-1.00.tar.gz
Don't worry about the filename "kit-1.00.tar.gz". If the site has
kit-3.08.tar.gz, that one is retrieved instead. The option --new
instructs the program to look for newer versions.
DESCRIPTION
See README.
EXAMPLES
To read a directory, give the http directory rather than a file name.
The listing is stored in YYYY-MM-DD::!dir!000root-file: `-D -o -v'
mywebget.pl --prefix-date --overwrite --verbose \
http://www.example.com/dir/
To overwrite an existing file and add a date prefix to its name: `-D -o -v'
mywebget.pl --prefix-date --overwrite --verbose \
http://www.example.com/file.pl
--> YYYY-MM-DD::file.pl
To add both a date prefix and a WWW site prefix to the file name: `-D -W -o -v'
mywebget.pl --prefix-date --prefix-www --overwrite --verbose \
http://www.example.com/file.pl
--> YYYY-MM-DD::www.example.com::file.pl
ENVIRONMENT
No environment settings.
SEE ALSO
C program wget(1), http://www.ccp14.ac.uk/mirror/wget.htm
Old Perl 4 program webget(1), http://www.wg.omron.co.jp/~jfriedl/perl/
AVAILABILITY
CPAN entry is http://www.perl.com/CPAN-local//scripts/
Reach the author at [email protected] or
http://www.netforward.com/poboxes/?jari.aalto
SCRIPT CATEGORIES
CPAN/Administrative
PREREQUISITES
Modules `LWP::UserAgent' and `Net::FTP' are required.
COREQUISITES
No optional CPAN modules needed.
OSNAMES
`any'
VERSION
$Id: mywebget.pl,v 1.12 1999/02/10 20:40:23 jaalto Exp $
AUTHOR
Copyright (C) 1996-1999 Jari Aalto. All rights reserved. This program is
free software; you can redistribute it and/or modify it under the same
terms as Perl itself or under the terms of the GNU General Public
License v2 or later.