NAME
   URI::Fetch - Smart URI fetching/caching

SYNOPSIS
       use URI::Fetch;

       ## Simple fetch.
       my $res = URI::Fetch->fetch('http://example.com/atom.xml')
           or die URI::Fetch->errstr;

       ## Fetch using specified ETag and Last-Modified headers.
       $res = URI::Fetch->fetch('http://example.com/atom.xml',
               ETag => '123-ABC',
               LastModified => time - 3600,
       )
           or die URI::Fetch->errstr;

       ## Fetch using an on-disk cache that URI::Fetch manages for you.
       my $cache = Cache::File->new( cache_root => '/tmp/cache' );
       $res = URI::Fetch->fetch('http://example.com/atom.xml',
               Cache => $cache
       )
           or die URI::Fetch->errstr;

DESCRIPTION
   *URI::Fetch* is a smart client for fetching HTTP pages, notably
   syndication feeds (RSS, Atom, and others), in an intelligent, bandwidth-
   and time-saving way. That means:

   *   GZIP support

       If you have *Compress::Zlib* installed, *URI::Fetch* will
       automatically try to download a compressed version of the content,
       saving bandwidth (and time).

   *   *Last-Modified* and *ETag* support

       If you use a local cache (see the *Cache* parameter to *fetch*),
       *URI::Fetch* will keep track of the *Last-Modified* and *ETag*
       headers from the server, allowing you to only download pages that
       have been modified since the last time you checked.

   *   Proper understanding of HTTP error codes

       Certain HTTP error codes are special, particularly when fetching
       syndication feeds, and well-written clients should pay special
       attention to them. *URI::Fetch* can only do so much for you in this
       regard, but it gives you the tools to be a well-written client.

       The response from *fetch* gives you the raw HTTP response code,
       along with special handling of 4 codes:

       *   200 (OK)

           Signals that the content of a page/feed was retrieved
           successfully.

       *   301 (Moved Permanently)

           Signals that a page/feed has moved permanently, and that your
           database of feeds should be updated to reflect the new URI.

       *   304 (Not Modified)

           Signals that a page/feed has not changed since it was last
           fetched.

       *   410 (Gone)

           Signals that a page/feed is gone and will never be coming back,
           so you should stop trying to fetch it.

USAGE
 URI::Fetch->fetch($uri, %param)
   Fetches a page identified by the URI *$uri*.

   On success, returns a *URI::Fetch::Response* object; on failure, returns
   "undef".

   *%param* can contain:

   *   LastModified

   *   ETag

       *LastModified* and *ETag* can be supplied to force the server to
       only return the full page if it's changed since the last request. If
       you're writing your own feed client, this is recommended practice,
       because it limits both your bandwidth use and the server's.

       If you'd rather not have to store the *LastModified* time and *ETag*
       yourself, see the *Cache* parameter below (and the SYNOPSIS above).

   *   Cache

       If you'd like *URI::Fetch* to cache responses between requests,
       provide the *Cache* parameter with an object supporting the Cache
       API (e.g. *Cache::File*, *Cache::Memory*). Specifically, an object
       that supports "$cache->get($key)" and "$cache->set($key, $value,
       $expires)".

       If supplied, *URI::Fetch* will store the page content, ETag, and
       last-modified time of the response in the cache, and will pull the
       content from the cache on subsequent requests if the page returns a
       Not-Modified response.

   *   UserAgent

       Optional. You may provide your own LWP::UserAgent instance. Look
       into LWPx::ParanoidUserAgent if you're fetching URLs given to you by
       possibly malicious parties.

   *   NoNetwork

       Optional. Controls the interaction between the cache and HTTP
       requests with If-Modified-Since/If-None-Match headers. Possible
       behaviors are:

       false (default)
           If a page is in the cache, the origin HTTP server is always
           checked for a fresher copy with an If-Modified-Since and/or
           If-None-Match header.

       1   If set to 1, the origin HTTP is never contacted, regardless of
           the page being in cache or not. If the page is missing from
           cache, the fetch method will return undef. If the page is in
           cache, that page will be returned, no matter how old it is. Note
           that setting this option means the URI::Fetch::Response object
           will never have the http_response member set.

       "N", where N > 1
           The origin HTTP server is not contacted if the page is in cache
           and the cached page was inserted in the last N seconds. If the
           cached copy is older than N seconds, a normal HTTP request (full
           or cache check) is done.

   *   ContentAlterHook

       Optional. A subref that gets called with a scalar reference to your
       content so you can modify the content before it's returned and
       before it's put in cache.

       For instance, you may want to only cache the <head> section of an
       HTML document, or you may want to take a feed URL and cache only a
       pre-parsed version of it. If you modify the scalarref given to your
       hook and change it into a hashref, scalarref, or some blessed
       object, that same value will be returned to you later on
       not-modified responses.

   *   CacheEntryGrep

       Optional. A subref that gets called with the *URI::Fetch::Response*
       object about to be cached (with the contents already possibly
       transformed by your "ContentAlterHook"). If your subref returns
       true, the page goes into the cache. If false, it doesn't.

   *   Freeze

   *   Thaw

       Optional. Subrefs that get called to serialize and deserialize,
       respectively, the data that will be cached. The cached data should
       be assumed to be an arbitrary Perl data structure, containing
       (potentially) references to arrays, hashes, etc.

       Freeze should serialize the structure into a scalar; Thaw should
       deserialize the scalar into a data structure.

       By default, *Storable* will be used for freezing and thawing the
       cached data structure.

   *   ForceResponse

       Optional. A boolean that indicates a *URI::Fetch::Response* should
       be returned regardless of the HTTP status. By default "undef" is
       returned when a response is not a "success" (200 codes) or one of
       the recognized HTTP status codes listed above. The HTTP status
       message can then be retreived using the "errstr" method on the
       class.

LICENSE
   *URI::Fetch* is free software; you may redistribute it and/or modify it
   under the same terms as Perl itself.

AUTHOR & COPYRIGHT
   Except where otherwise noted, *URI::Fetch* is Copyright 2004 Benjamin
   Trott, [email protected]. All rights reserved.