DESCRIPTION
   Apache2::ModProxyPerlHtml is the most advanced Apache output filter to
   rewrite HTTP headers and HTML links for reverse proxy usage. It is
   written in Perl and exceeds all mod_proxy_html.c limitations without
   performance lost.

   Apache2::ModProxyPerlHtml is very simple and has far better
   parsing/replacement of URL than the original C code. It also support
   meta tag, CSS, and javascript URL rewriting and can be use with
   compressed HTTP. You can now replace any code by other, like changing
   images name or anything else. mod_proxy_html can't do all of that. Since
   release 3.x ModProxyPerlHtml is also able to rewrite HTTP headers with
   Refresh url redirection and Referer.

   The replacement capability concern only the following HTTP content type:

           text/javascript
           text/html
           text/css
           text/xml
           application/.*javascript
           application/.*xml

   other kind of file, will be left untouched (or see ProxyHTMLContentType
   and ProxyHTMLExcludeContentType).

AVAIBILITY
   You can get the latest version of Apache2::ModProxyPerlHtml from CPAN
   (http://search.cpan.org/).

PREREQUISITES
   You must have Apache2, mod_perl2 and IO::Compress::Zlib perl modules
   installed on your system.

   You also need to install the mod_proxy Apache modules. See documentation
   at http://httpd.apache.org/docs/2.0/mod/mod_proxy.html

INSTALLATION
           % perl Makefile.PL
           % make && make install

APACHE CONFIGURATION
   In your Apache configuration file you have to load the following DSO
   modules:

       LoadModule deflate_module modules/mod_deflate.so
       LoadModule headers_module modules/mod_headers.so
       LoadModule proxy_module modules/mod_proxy.so
       LoadModule proxy_connect_module modules/mod_proxy_connect.so
       LoadModule proxy_ftp_module modules/mod_proxy_ftp.so
       LoadModule proxy_http_module modules/mod_proxy_http.so
       LoadModule ssl_module modules/mod_ssl.so
       LoadModule perl_module  modules/mod_perl.so

   Then in your Apache site or virtualhost configuration file use
   ModProxyPerlHtml* as follow:

       ProxyRequests Off
       ProxyPreserveHost Off
       ProxyPass       /webcal/  http://webcal.domain.com/

       PerlInputFilterHandler Apache2::ModProxyPerlHtml
       PerlOutputFilterHandler Apache2::ModProxyPerlHtml
       SetHandler perl-script
       # Use line below iand comment line above if you experience error:
       # "Attempt to serve directory". The reason is that with SetHandler
       # DirectoryIndex is not working
       # AddHandler perl-script *
       PerlSetVar ProxyHTMLVerbose "On"
       LogLevel Info


       <Location /webcal/>
           ProxyPassReverse /
           PerlAddVar ProxyHTMLURLMap "/ /webcal/"
           PerlAddVar ProxyHTMLURLMap "http://webcal.domain.com /webcal"
       </Location>

   Note that here FilterHandlers are set globally, you can also set them in
   any <Location> part to set it locally and avoid calling this Apache
   module globally.

   If you want to rewrite some code on the fly, like changing images
   filename you can use the perl variable ProxyHTMLRewrite under the
   location directive as follow:

       <Location /webcal/>
           ...
           PerlAddVar ProxyHTMLRewrite "/logo/image1.png /images/logo1.png"
           # Or more complicated to handle space in the code as space is the
           # pattern / substitution separator character internally in ModProxyPerlHtml
           PerlAddVar ProxyHTMLRewrite "ajaxurl[\s\t]*=[\s\t]*'/blog' ajaxurl = '/www2.mydom.org/blog'"
           ...
       </Location>

   this will replace each occurence of '/logo/image1.png' by
   '/images/logo1.png' in the entire stream (html, javascript or css). Note
   that this kind of substitution is done after all other proxy related
   replacements.

   In some conditions javascript code can be replaced by error, for
   example:

           imgUp.src = '/images/' + varPath + '/' + 'up.png';

   will be rewritten like this:

           imgUp.src = '/URL/images/' + varPath + '/URL/' + 'up.png';

   To avoid the second replacement, write your JS code like that:

           imgUp.src = '/images/' + varPath + unescape('%2F') + 'up.png';

   ModProxyPerlHTML replacement is activated on certain HTTP Content Type.
   If you experienced that replacement is not activated for your file type,
   you can use the ProxyHTMLContentType configuration directive to
   redefined the HTTP Content Type that should be parsed by
   ModProxyPerlHTML. The default value is the following Perl regular
   expresssion:

           PerlAddVar ProxyHTMLContentType    (text\/javascript|text\/html|text\/css|text\/xml|application\/.*javascript|application\/.*xml)

   If you know exactly what you are doing by editing this regexp fill free
   to add the missing Content-Type that must be parsed by ModProxyPerlHTML.
   Otherwise drop me a line with the content type, I will give you the
   rigth expression. If you don't know about the content type, with FireFox
   simply type Ctrl+i on the web page.

   Some MS Office files may conflict with the above ProxyHTMLContentType
   regex like .docx or .xlsx files. The result is that there could suffer
   of replacement inside and the file will be corrupted. to prevent this
   you have the ProxyHTMLExcludeContentType configuration directive to
   exclude certain content-type. Here is the default value:

           PerlAddVar ProxyHTMLExcludeContentType  (application\/vnd\.openxml)

   If you have problem with other content-type, use this directive. For
   example, as follow:

           PerlAddVar ProxyHTMLExcludeContentType  (application\/vnd\.openxml|application\/vnd\..*text)

   this regex will prevent any MS Office XML or text document to be parsed.

   Some javascript libraries like JQuery are wrongly rewritten by
   ModProxyPerlHtml. The problem is that those javascript code include some
   code and regex that are detected as links and rewritten. The only way to
   fix that is to exclude those files from the URL rewritter by using the
   "ProxyHTMLExcludeUri" configuration directive. For example:

           PerlAddVar ProxyHTMLExcludeUri  jquery.min.js$
           PerlAddVar ProxyHTMLExcludeUri  ^.*\/jquery-lib\/.*$

   Any downloaded URI that contains the given regex will be returned asis
   without rewritting. You can use this directive multiple time like above
   to match different cases.

LIVE EXAMPLE
   Here is the reverse proxy configuration I use to give access to Internet
   users to internal applications:

       ProxyRequests Off
       ProxyPreserveHost Off
       ProxyPass       /webmail/  http://webmail.domain.com/
       ProxyPass       /webcal/  http://webcal.domain.com/
       ProxyPass       /intranet/  http://intranet.domain.com/


       PerlInputFilterHandler Apache2::ModProxyPerlHtml
       PerlOutputFilterHandler Apache2::ModProxyPerlHtml
       SetHandler perl-script
       # Use line below iand comment line above if you experience error:
       # "Attempt to serve directory". The reason is that with SetHandler
       # DirectoryIndex is not working
       # AddHandler perl-script *
       PerlSetVar ProxyHTMLVerbose "On"
       LogLevel Info


       # URL rewriting
       RewriteEngine   On
       #RewriteLog      "/var/log/apache/rewrite.log"
       #RewriteLogLevel 9
       # Add ending '/' if not provided
       RewriteCond     %{REQUEST_URI}  ^/mail$
       RewriteRule     ^/(.*)$ /$1/    [R]
       RewriteCond     %{REQUEST_URI}  ^/planet$
       RewriteRule     ^/(.*)$ /$1/    [R]
       # Add full path to the CGI to bypass the index.html redirect that may fail
       RewriteCond     %{REQUEST_URI}  ^/calendar/$
       RewriteRule     ^/(.*)/$ /$1/cgi-bin/wcal.pl    [R]
       RewriteCond     %{REQUEST_URI}  ^/calendar$
       RewriteRule     ^/(.*)$ /$1/cgi-bin/wcal.pl     [R]


       <Location /webmail/>
           ProxyPassReverse /
           PerlAddVar ProxyHTMLURLMap "/ /webmail/"
           PerlAddVar ProxyHTMLURLMap "http://webmail.domain.com /webmail"
           # Use this to disable compressed HTTP
           #RequestHeader   unset   Accept-Encoding
       </Location>


       <Location /webcal/>
           ProxyPassReverse /
           PerlAddVar ProxyHTMLURLMap "/ /webcal/"
           PerlAddVar ProxyHTMLURLMap "http://webcal.domain.com /webcal"
       </Location>


       <Location /intranet/>
           ProxyPassReverse /
           PerlAddVar ProxyHTMLURLMap "/ /intranet/"
           PerlAddVar ProxyHTMLURLMap "http://intranet.domain.com /intranet"
           # Rewrite links that give access to the two previous location
           PerlAddVar ProxyHTMLURLMap "/intranet/webmail /webmail"
           PerlAddVar ProxyHTMLURLMap "/intranet/webcal /webcal"
       </Location>

   This gives access two a webmail and webcal application hosted internally
   to all authentified users through their own Internet acces. There's also
   one acces to an Intranet portal that have links to the webcal and
   webmail application. Those links must be rewritten twice to works.

BUGS
   Apache2::ModProxyPerlHtml is still under development and is pretty
   stable. Please send me email to submit bug reports or feature requests.

COPYRIGHT
   Copyright (c) 2005-2013 - Gilles Darold

   All rights reserved. This program is free software; you may redistribute
   it and/or modify it under the same terms as Perl itself.

AUTHOR
   Apache2::ModProxyPerlHtml was created by :

           Gilles Darold
           <gilles at darold dot net>

   and is currently maintain by me.