NAME
   URL::Transform - perform URL transformations in various document types

SYNOPSIS
       use URL::Transform;
       use FindBin qw($Bin);

       my $output;
       my $urlt = URL::Transform->new(
           'document_type'      => 'text/html;charset=utf-8',
           'content_encoding'   => 'gzip',
           'output_function'    => sub { $output .= "@_" },
           'transform_function' => sub { return (join '|', @_) },
       );
       $urlt->parse_file($Bin.'/data/URL-Transform-01.html');

       print "and this is the output: ", $output;

DESCRIPTION
   URL::Transform is a generic module for performing URL transformations in
   documents. It accepts a callback function through which each URL can be
   changed.

   There are different modules to handle different document types, elements
   or attributes:

   `text/html', `text/vnd.wap.wml', `application/xhtml+xml',
   `application/vnd.wap.xhtml+xml'
       URL::Transform::using::HTML::Parser, URL::Transform::using::XML::SAX
       (incomplete, was used only for benchmarking)

   `text/css'
       URL::Transform::using::CSS::RegExp

   `text/html/meta-content'
       URL::Transform::using::HTML::Meta

   `application/x-javascript'
       URL::Transform::using::Remove

   By passing the `parser' option to the `URL::Transform->new()' constructor
   you can choose which library is used to parse the document and to call
   the output and transform functions. Note that elements of a different
   type embedded in a document, for example inside `text/html', are
   transformed via the default_for($document_type) modules.

   `transform_function' is called with the following arguments:

       transform_function->(
           'tag_name'       => 'img',
           'attribute_name' => 'src',
           'url'            => 'http://search.cpan.org/s/img/cpan_banner.png',
       );

   and must return the (un)modified url as its return value.
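
   As an illustration of the calling convention above, here is a minimal
   sketch of a callback. The name `upgrade_img_urls' and the http-to-https
   rule are assumptions made for this example only; they are not part of
   URL::Transform. The sub is called directly here, so the sketch runs
   without the module installed.

```perl
use strict;
use warnings;

# Hypothetical callback: upgrade http:// image URLs to https://
# and leave every other URL unchanged.
sub upgrade_img_urls {
    my (%args) = @_;    # tag_name, attribute_name, url

    my $url = $args{'url'};
    if ($args{'tag_name'} eq 'img' and $args{'attribute_name'} eq 'src') {
        $url =~ s{^http://}{https://};
    }

    # The callback must always return a url, modified or not.
    return $url;
}

# Call it with the same named arguments the documentation lists.
my $new_url = upgrade_img_urls(
    'tag_name'       => 'img',
    'attribute_name' => 'src',
    'url'            => 'http://search.cpan.org/s/img/cpan_banner.png',
);
print $new_url, "\n";
```

   A code reference to such a sub (`\&upgrade_img_urls') is what would be
   passed as `transform_function'.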

   `output_function' is called with each (already modified) document chunk
   for outputting.
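
   Because the output arrives in chunks, a common pattern is to collect
   them in a buffer. The following is a minimal sketch of such a callback;
   the helper name `make_collector' is hypothetical, and the chunks are
   fed in by hand to simulate what a parser might pass.

```perl
use strict;
use warnings;

# Build an output callback that appends every chunk to a buffer.
sub make_collector {
    my ($buffer_ref) = @_;
    return sub { ${$buffer_ref} .= join('', @_) };
}

my $output          = '';
my $output_function = make_collector(\$output);

# Simulate the chunks a parser might hand over, one call per chunk.
$output_function->('<a href="');
$output_function->('http://example.com/');
$output_function->('">link</a>');

print $output, "\n";
```

   Joining with the empty string keeps the chunks byte-for-byte intact,
   whereas interpolating "@_" would insert a space between list elements.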

PROPERTIES
       content_encoding
       document_type
       parser
       transform_function
       output_function

   parser
       For HTML/XML this can be HTML::Parser or XML::SAX.

   document_type
           text/html - default

   transform_function
       Function that will be called to make the transformation. The
       function receives the named arguments `tag_name', `attribute_name'
       and `url' (see DESCRIPTION) and must return the url text.

   output_function
       Reference to a function that will receive the resulting output. The
       default is to use print.

   content_encoding
       Can be set to `gzip' or `deflate'. By default it is `undef', so
       there is no content encoding.

METHODS
 new
   Object constructor.

   Requires `transform_function', a CODE ref argument.

   The rest of the arguments are optional. Here is the list with defaults:

       document_type       => 'text/html;charset=utf-8',
       output_function     => sub { print @_ },
       parser              => 'HTML::Parser',
       content_encoding    => undef,

 default_for($document_type)
   Returns the default parser for the supplied $document_type.

   Can also be used as a setter by passing an additional argument, the
   parser name.

   If called as an object method, sets the default parser for that object.
   If called as a module function, sets the default parser for the whole
   module.

 parse_string($string)
   Submit a document as a string for parsing.

   This function must be implemented by helper parsing classes.

 parse_chunk($chunk)
   Submit a chunk of a document for parsing.

   This function should be implemented by helper parsing classes.

 can_parse_chunks
   Returns true or false depending on whether the parser can parse in
   chunks.

 parse_file($file_name)
   Submit a file for parsing.

   This function should be implemented by helper parsing classes.

 link_tags
   Constructs a hash of tag names that may have links. To simplify things,
   the %HTML::Tagset::linkElements hash is reformatted so that it is always
   a hash of hashes.
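
   The reformatting idea can be sketched as follows. This is not the
   module's own code: `%link_elements' is a small hand-written excerpt in
   the shape of %HTML::Tagset::linkElements (tag name mapped to an array
   reference of attribute names), used so that the example runs without
   HTML::Tagset installed.

```perl
use strict;
use warnings;

# Hypothetical excerpt in the shape of %HTML::Tagset::linkElements:
# tag name => array ref of attribute names that may hold URLs.
my %link_elements = (
    'a'      => ['href'],
    'img'    => ['src', 'lowsrc', 'usemap'],
    'script' => ['src', 'for'],
);

# Reformat into tag name => { attribute name => 1 } so that a
# tag/attribute pair can be checked with a plain hash lookup.
my %link_tags;
while (my ($tag, $attributes) = each %link_elements) {
    $link_tags{$tag} = { map { ($_ => 1) } @{$attributes} };
}

print "img/src may hold a link\n" if $link_tags{'img'}{'src'};
```

   With a hash of hashes, checking whether an attribute of a known tag
   may carry a URL is a single lookup instead of a scan over a list.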

 js_attributes
   Constructs a hash of all possible JavaScript attribute names.

 decode_string($string)
   Will return decoded string suitable for parsing. Decoding is chosen
   according to the $self->content_encoding.

   Decoding is run automatically for every chunk/string/file.

 encode_string($string)
   Will return encoded string. Encoding is chosen according to the
   $self->content_encoding.

   NOTE: if you want your content encoded back to the
   $self->content_encoding you will have to call this method in your own
   code. The arguments to `output_function()' are always plain text.
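
   To show what re-encoding collected output involves, here is a minimal
   sketch for the `gzip' case. It is an assumption of this example that
   the core modules IO::Compress::Gzip and IO::Uncompress::Gunzip are an
   acceptable stand-in; it does not call encode_string() itself, and the
   sample markup is made up.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Plain-text output as it might have been collected by an output_function.
my $plain = '<a href="http://example.com/">link</a>';

# Compress it back to gzip before passing it on.
my $encoded;
gzip(\$plain => \$encoded)
    or die "gzip failed: $GzipError";

# Round trip to confirm the encoded data decodes to the same text.
my $decoded;
gunzip(\$encoded => \$decoded)
    or die "gunzip failed: $GunzipError";

print "round trip ok\n" if $decoded eq $plain;
```

   In code that uses URL::Transform, the compression step would be
   replaced by a call to encode_string() on the collected output.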

 get_supported_content_encodings()
   Returns a hash reference of supported content encodings.

BENCHMARKS
       Benchmark: timing 10000 iterations of HTML::Parser    , XML::LibXML::SAX, XML::SAX::PurePerl...
       HTML::Parser      :  3 wallclock secs ( 2.41 usr +  0.04 sys =  2.45 CPU) @ 4081.63/s (n=10000)
       XML::LibXML::SAX  : 29 wallclock secs (27.22 usr +  0.11 sys = 27.33 CPU) @ 365.90/s (n=10000)
       XML::SAX::PurePerl: 192 wallclock secs (180.62 usr +  0.50 sys = 181.12 CPU) @ 55.21/s (n=10000)

TODO
   There are urls in the `pics' meta tag: `<meta http-equiv="pics-label"
   content=" ...'. See http://www.w3.org/PICS/.

SEE ALSO
   HTML::Parser, URL::Transform::using::HTML::Parser

AUTHOR
   Jozef Kutej `<jkutej at cpan.org>'

LICENSE AND COPYRIGHT
   This program is free software; you can redistribute it and/or modify it
   under the terms of either: the GNU General Public License as published
   by the Free Software Foundation; or the Artistic License.

   See http://dev.perl.org/licenses/ for more information.