From: [email protected] (Jari Aalto+mail.emacs)
Newsgroups: comp.lang.perl.announce,comp.lang.perl.modules
Subject: Announce: mywebget.pl v1999.0210 - Batch get updates from Http ftp
        dirs.
Followup-To: comp.lang.perl.modules
Date: 24 Feb 1999 16:50:03 GMT
Organization: University of Tampere
Lines: 224
Approved: [email protected] (comp.lang.perl.announce)
Message-ID: <[email protected]>



       Download

       http://www.perl.com/CPAN-local//scripts/

       This is the first public release.
       jari


NAME
   @(#) mywebget.pl - Perl Web URL retrieve program

SYNOPSIS
       mywebget.pl http://example.com/ [URL] ...
       mywebget.pl --file file-with-urls.txt
       mywebget.pl --verbose --overwrite http://example.com/
       mywebget.pl --verbose --overwrite --Output ~/dir/ http://example.com/

OPTIONS
 General options

   --create-paths
       Create paths that do not exist in `lcd:' directives. Normally any
       `lcd:' directive that fails to find the path would interrupt the
       program. With this option the local directory is created as needed.
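
       The behaviour described above can be sketched in Python. This is
       only an illustration under stated assumptions, not the program's
       own code: the helper name `change_local_dir' and its `create_paths'
       argument are hypothetical stand-ins for the `lcd:' directive and
       the --create-paths option.

```python
import os

def change_local_dir(path, create_paths=False):
    """Expand environment variables in PATH and chdir there.

    Hypothetical helper: without create_paths a missing directory is an
    error; with it the directory is created first.
    """
    expanded = os.path.expandvars(path)
    if not os.path.isdir(expanded):
        if not create_paths:
            raise NotADirectoryError(f"lcd: not a directory: {expanded}")
        os.makedirs(expanded, exist_ok=True)
    os.chdir(expanded)
    return expanded
```

       Calling change_local_dir("$HOME/updates") fails when the directory
       is missing, while passing create_paths=True creates it first.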

   --Firewall FIREWALL
       Use FIREWALL when accessing files via ftp:// protocol.

   --file FILE
       Read URLs from FILE. The file can contain comments starting with #,
       and the syntax is:

           #   @(#) $HOME/.mywebget.default - Perl configuration file
           #
           #   This is comment
           #   Another comment

           file://absolute/dir/file-1.23.tar.gz

               lcd:$HOME/updates       # chdir here

           http://www.example.com/page.html
           http://www.example.com/page.html save:/dir/dir/page.html
           ftp://ftp.com/dir/file.txt save:xx-file.txt login:foo pass:passwd

               lcd:$HOME/download-kit

           ftp://ftp.com/dir/kit-1.1.tar.gz new:

       Possible keywords in the ftp:// line are

           `lcd:DIRECTORY'

           Set the local download directory to DIRECTORY. Any environment
           variables in the path name are substituted. If this tag is
           found, it replaces the setting of --Output. If the path is not a
           directory, the program terminates with an error. See also
           --create-paths.

           `login:LOGIN-NAME'

           FTP login name. The default value is "ftp".

           `new:'

           If this keyword is found on the current line, the newest file
           will be retrieved. The setting is reset to the value of `--new'
           after the line has been processed.

           `pass:PASSWORD'

           FTP password. The default value is a generic mail@some.com
           email address.

           `regexp:REGEXP'

           Get all files in the ftp directory that match REGEXP. The
           `save:' keyword is ignored.

           `save:LOCAL-FILE-NAME'

           Save file under this name to local disk.
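
       To make the line syntax above concrete, here is a minimal Python
       sketch that splits one configuration line into a URL and its
       keywords. It is not the program's own parser. The function name
       `parse_line' is hypothetical, and the sketch assumes that a comment
       starts at a # that begins the line or follows whitespace, and that
       keyword values contain no spaces.

```python
import re

# Keywords documented for the configuration file.
KEYWORDS = ("lcd", "login", "new", "pass", "regexp", "save")
_TAG = re.compile(r"^(%s):(.*)$" % "|".join(KEYWORDS))

def parse_line(line):
    """Split one configuration line into (url, keywords)."""
    # Assumption: "#" starts a comment only at line start or after whitespace.
    line = re.sub(r"(^|\s)#.*$", "", line).strip()
    url = None
    tags = {}
    for token in line.split():
        match = _TAG.match(token)
        if match:
            tags[match.group(1)] = match.group(2)
        elif url is None:
            url = token
    return url, tags
```

       A line holding only `lcd:$HOME/updates' yields no URL and a single
       `lcd' keyword; a bare `new:' yields the keyword with an empty value.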

   --new
       Get newest file. If the filename does not end in .asp, .html or
       .htm, the URL is considered to point to some program or data file.
       When new releases are announced, the version number in the filename
       usually tells which is the current one, so getting a hardcoded file
       with:

           mywebget.pl -o -v http://example.com/dir/program-1.3.tar.gz

       is not usually a good choice. Adding the --new option to the command
       line causes a double pass: a) the whole http://example.com/dir/ is
       examined for all files; b) files approximately matching the filename
       program-1.3.tar.gz are examined and sorted, and the file with the
       latest version number is retrieved.
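
       The second pass can be illustrated with a short Python sketch. The
       exact matching rule the program uses is not documented here, so the
       approach below is an assumption: every run of digits in the wanted
       filename is treated as a wildcard, and the candidates are compared
       by their numeric parts. The names `version_key' and `pick_newest'
       are hypothetical.

```python
import re

def version_key(name):
    """Return the numeric parts of NAME as a tuple of integers."""
    return tuple(int(n) for n in re.findall(r"\d+", name))

def pick_newest(wanted, listing):
    """Pick the file in LISTING that looks like WANTED with the highest version."""
    # Assumption: "approximately matching" means the same name with
    # different numbers, so each digit run becomes a \d+ wildcard.
    literal_parts = re.split(r"\d+", wanted)
    pattern = r"\d+".join(re.escape(part) for part in literal_parts)
    candidates = [name for name in listing if re.fullmatch(pattern, name)]
    if not candidates:
        return None
    return max(candidates, key=version_key)
```

       Comparing integer tuples rather than strings matters: it ranks
       program-1.10.tar.gz above program-1.9.tar.gz.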

   --Output DIR
       Before retrieving any files, chdir to DIR.

   --overwrite
       Allow overwriting existing files when retrieving URLs.

   --prefix PREFIX
       Add PREFIX to all retrieved files.

   --Postfix POSTFIX -P POSTFIX
       Add POSTFIX to all retrieved files.

   --prefix-date -D
       Add an ISO 8601 "YYYY-MM-DD::" prefix to all retrieved files. This
       is added before a possible --prefix-www or --prefix.

   --prefix-www -W
       Usually the files are stored with the same names as the URL page,
       but if you retrieve files that have identical names you can store
       each page separately so that the file name is prefixed by the site
       name.

           http://example.com/page.html    --> example.com::page.html
           http://example2.com/page.html   --> example2.com::page.html
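
       The naming scheme of --prefix-date and --prefix-www can be sketched
       in Python as follows. This is an illustration only: the function
       name `local_name' and its arguments are hypothetical, and the
       handling of --prefix and --Postfix is left out because their exact
       placement is not specified here.

```python
import posixpath
from datetime import date
from urllib.parse import urlsplit

def local_name(url, prefix_date=False, prefix_www=False, today=None):
    """Build the local file name for URL with the optional prefixes."""
    parts = urlsplit(url)
    name = posixpath.basename(parts.path)
    if prefix_www:
        # Site name first, so the date prefix ends up in front of it.
        name = f"{parts.netloc}::{name}"
    if prefix_date:
        stamp = (today or date.today()).isoformat()  # YYYY-MM-DD
        name = f"{stamp}::{name}"
    return name
```

       With both flags set, http://www.example.com/file.pl becomes
       YYYY-MM-DD::www.example.com::file.pl, which matches the order shown
       in the EXAMPLES section.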

 Miscellaneous options

   --debug -d LEVEL
       Turn on debug with positive LEVEL number. Zero means no debug.

   --help -h
       Print help page.

   --Version -V
       Print program's version information.

README
   This small utility makes it possible to keep a list of URLs in a file
   and periodically retrieve those pages or files with a simple command.
   The utility is best suited for small batch jobs that download, e.g., the
   most recent versions of software files. If you pass a URL whose file is
   already on disk, be sure to supply the option --overwrite to allow
   overwriting old files.

   If the URL ends with a slash, the directory listing on the remote
   machine is stored in a file named:

       !path!000root-file

   The content of this file can be either index.html or the directory
   listing, depending on whether the http or ftp protocol was used.
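
   As an illustration, the file name could be derived from the URL as in
   the Python sketch below. It assumes that "!path!" stands for the URL
   path with each slash replaced by "!", which agrees with the
   "!dir!000root-file" name in the EXAMPLES section; the function name
   `listing_file_name' is hypothetical.

```python
from urllib.parse import urlsplit

def listing_file_name(url):
    """Return the local file name for a URL that ends with a slash."""
    path = urlsplit(url).path
    if not path.endswith("/"):
        raise ValueError("URL does not end with a slash: " + url)
    # Assumption: every "/" in the remote path becomes "!".
    return path.replace("/", "!") + "000root-file"
```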

   While you can run this program from the command line to retrieve
   individual files, it has been designed to use a separate configuration
   file via the --file option. In that configuration file you can control
   the downloading with separate directives like `save:', which tells the
   program to save the file under a different name.

   The simplest way to retrieve the latest version of a kit from a site is:

       mywebget.pl --new --overwrite --verbose \
          http://www.example.com/kit-1.00.tar.gz

   Don't worry about the filename "kit-1.00.tar.gz". If there were a
   kit-3.08.tar.gz on the site, that one would be retrieved. The option
   --new instructs the program to find newer versions.

DESCRIPTION
   See README.

EXAMPLES
   Read a directory. It will be stored to YYYY-MM-DD::!dir!000root-file.
   Notice that you give the http directory and not the file name:
   `-D -o -v'

       mywebget.pl --prefix-date --overwrite --verbose http://www.example.com/dir/

   To overwrite file and add a date prefix to the file name: `-D -o -v'

       mywebget.pl --prefix-date --overwrite --verbose \
          http://www.example.com/file.pl

       --> YYYY-MM-DD::file.pl

   To add date and WWW site prefix to the filenames: `-D -W -o -v'

       mywebget.pl --prefix-date --prefix-www --overwrite --verbose \
          http://www.example.com/file.pl

       --> YYYY-MM-DD::www.example.com::file.pl

ENVIRONMENT
   No environment settings.

SEE ALSO
   C program wget(1) http://www.ccp14.ac.uk/mirror/wget.htm and old Perl 4
   program webget(1) http://www.wg.omron.co.jp/~jfriedl/perl/

AVAILABILITY
   CPAN entry is http://www.perl.com/CPAN-local//scripts/
   Reach author at [email protected] or
   http://www.netforward.com/poboxes/?jari.aalto

SCRIPT CATEGORIES
   CPAN/Administrative

PREREQUISITES
   Modules `LWP::UserAgent' and `Net::FTP' are required.

COREQUISITES
   No optional CPAN modules needed.

OSNAMES
   `any'

VERSION
   $Id: mywebget.pl,v 1.12 1999/02/10 20:40:23 jaalto Exp $

AUTHOR
   Copyright (C) 1996-1999 Jari Aalto. All rights reserved. This program is
   free software; you can redistribute it and/or modify it under the same
   terms as Perl itself or under the terms of the GNU General Public
   Licence v2 or later.