NAME
   Perl::LibExtractor - determine perl library subsets for building
   distributions

SYNOPSIS
      use Perl::LibExtractor;

DESCRIPTION
   The purpose of this module is to determine subsets of your perl library,
   that is, a set of files needed to satisfy certain dependencies (e.g. of
   a program).

   The goal is to extract a part of your perl installation including
   dependencies. A typical use case for this module would be to find out
   which files are needed to build a PAR distribution, to link into an
   App::Staticperl binary, or to pack with Urlader, to create stand-alone
   distributions tailor-made to run your app.

METHODS
   To use this module, first call the "new"-constructor and then as many
   other methods as you want, to generate a set of files. Then query the
   set of files and do whatever you want with them.

   The command-line utility perl-libextract can be a convenient alternative
   to using this module directly, and offers a few extra options, such as
   to copy out the files into a new directory, strip them and/or manipulate
   them in other ways.

 CREATION
   $extractor = new Perl::LibExtractor [key => value...]
       Creates a new extractor object. Each extractor object stores some
       configuration options and a subset of files that can be queried at
       any time.

       Binary executables (such as the perl interpreter) are stored inside
       bin/, perl scripts are stored under script/, perl library files are
       stored under lib/ and shared libraries are stored under dll/.

       The following key-value pairs exist, with default values as
       specified.

       inc => \@INC without "."
           An arrayref with paths to perl library directories. The default
           is "\@INC", with . removed.

           To prepend custom dirs just do this:

              inc => ["mydir", @INC],

       use_packlist => 1
           Enable (if true) or disable the use of ".packlist" files. If
           enabled, then each time a file is traced, the complete
           distribution that contains it is included (but not traced).

           If disabled, only shared objects and autoload files will be
           added.

           Debian GNU/Linux doesn't completely package perl or any perl
           modules, so this option will fail. Other perls should be fine.

       extra_deps => { file => [files...] }
           Some dependencies (mainly runtime dependencies in the perl core
           library) cannot be detected automatically by this module,
           especially if you don't use packlists and "add_core".

           This module comes with a set of default dependencies (such as
           Carp requiring Carp::Heavy), which you can override with this
           parameter.

           To see the default set of dependencies that come with this
           module, use this:

              perl -MPerl::LibExtractor -MData::Dumper -e 'print Dumper $Perl::LibExtractor::EXTRA_DEPS'

 TRACE/PACKLIST BASED ADDING
   The following methods add various things to the set of files.

   Each time a perl file is added, it is scanned by tracing either loading,
   execution or compiling it, and seeing which other perl modules and
   libraries have been loaded.
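
   The idea behind tracing can be illustrated by comparing %INC before and
   after loading a file. This is only a minimal in-process sketch under
   that assumption; the helper name "newly_loaded" is made up for this
   example and the module's actual tracing mechanism may well differ.

   ```perl
   use strict;
   use warnings;

   # Hypothetical helper, not part of Perl::LibExtractor: load $file with
   # "require" and return the %INC entries that appeared while doing so.
   sub newly_loaded {
      my ($file) = @_;

      my %before = %INC;
      require $file;

      my %new = map { ($_ => $INC{$_}) }
                grep { !exists $before{$_} } keys %INC;
      return \%new;
   }

   # Text::ParseWords is just a convenient core module for demonstration.
   my $new = newly_loaded ("Text/ParseWords.pm");
   print "$_ => $new->{$_}\n" for sort keys %$new;
   ```

   Files that were already loaded before the call do not show up in the
   result, which is one reason why a real tracer would likely want to
   start from a clean interpreter.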

   For each library file found this way, additional dependencies are added:
   if packlists are enabled, then all files of the distribution that
   contains the file will be added. If packlists are disabled, then only
   shared objects and autoload files for modules will be added.

   Only files from perl library directories will be added automatically.
   Any other files (such as manpages or scripts installed in the bin
   directory) are skipped.

   If there is an error, such as a module not being found, then this module
   croaks (as opposed to silently skipping). If you want to add something
   but are not sure whether it exists, then you can wrap the call into
   "eval {}". In some cases, you can avoid this by executing the code you
   want to work later using "add_eval" - see "add_core_support" for an
   actual example of this technique.
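
   The "eval {}" wrapping mentioned above could be packaged into a small
   helper, as sketched here. The name "try_add" and the module name in the
   usage comment are placeholders chosen for illustration, not part of
   this module.

   ```perl
   use strict;
   use warnings;

   # Hypothetical helper: run $cb, which is expected to call one of the
   # add_* methods, and downgrade a croak to a warning.
   sub try_add {
      my ($what, $cb) = @_;

      return 1 if eval { $cb->(); 1 };

      warn "skipping $what: $@";
      return 0;
   }

   # Usage, assuming $extractor is a Perl::LibExtractor object and
   # My::Optional::Module is a placeholder name:
   #
   #    try_add "My::Optional::Module",
   #       sub { $extractor->add_mod ("My::Optional::Module") };
   ```

   The trailing "1" inside the eval block makes success detectable even
   when the callback itself returns a false value.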

   Note that packlists are meant to add files not covered by other
   mechanisms, such as resource files and other data files loaded directly
   by a module - they are not meant to add dependencies that are missed
   because they only happen at runtime.

   For example, with packlists, when using AnyEvent, then all event loop
   backends are automatically added as well, but *not* any event loops
   (i.e. AnyEvent::Impl::POE is added, but POE itself is not). Without
   packlists, only the backend that is being used is added (i.e. normally
   none, as loading AnyEvent does not instantly load any backend).

   To catch the extra event loop dependencies, you can either initialise
   AnyEvent so it picks a suitable backend:

      $extractor->add_eval ("use AnyEvent; AnyEvent::detect");

   Or you can directly load the backend modules you plan to use:

      $extractor->add_mod ("AnyEvent::Impl::EV", "AnyEvent::Impl::Perl");

   An example of a program (or module) that has extra resource files is
   Deliantra::Client - the normal tracing (without packlist usage) will
   correctly add all submodules, but miss the fonts and textures. By using
   the packlist, those files are added correctly.

   $extractor->add_mod ($module[, $module...])
       Adds the given module(s) to the file set - the module name must be
       specified as in "use", i.e. with "::" as separators and without .pm.

       The module will be loaded with the default import list. Any
       dependent files, such as the shared object implementing xs
       functions, or autoload files, will also be added.

       If you want to use a different import list (for those rare modules
       where import lists trigger different backend modules to be loaded,
       for example), you can use "add_eval" instead:

         $extractor->add_eval ("use Module qw(a b c)");

       Example: add Coro.pm and AnyEvent/AIO.pm, and all relevant files
       from the distribution they are part of.

         $extractor->add_mod ("Coro", "AnyEvent::AIO");

   $extractor->add_require ($name[, $name...])
       Works like "add_mod", but uses "require $name" to load the module,
       i.e. the name must be a filename.

       Example: load Coro and AnyEvent::AIO, but using "add_require"
       instead of "add_mod".

          $extractor->add_require ("Coro.pm", "AnyEvent/AIO.pm");
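
   Since "add_mod" takes module names and "add_require" takes file names,
   converting between the two can be handy. The following sketch shows the
   usual conversion; "mod_to_file" is a hypothetical helper, not a method
   of this module.

   ```perl
   use strict;
   use warnings;

   # Hypothetical helper: convert a "use"-style module name to the file
   # name that "require" (and thus add_require) expects.
   sub mod_to_file {
      my ($module) = @_;

      (my $file = $module) =~ s{::}{/}g;
      return "$file.pm";
   }

   my $file = mod_to_file ("AnyEvent::AIO");
   print "$file\n";   # prints AnyEvent/AIO.pm
   ```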

   $extractor->add_bin ($name[, $name...])
       Adds the given (perl) program(s) to the file set, that is, a program
       installed by some perl module, written in perl (an example would be
       the perl-libextract program that is part of the "Perl::LibExtractor"
       distribution).

       Example: add the deliantra client program installed by the
       Deliantra::Client module and put it under bin/deliantra.

          $extractor->add_bin ("deliantra");

   $extractor->add_eval ($string)
       Evaluates the string as perl code and adds all modules that are
       loaded by it. For example, this would add AnyEvent and the default
       backend implementation module and event loop module:

          $extractor->add_eval ("use AnyEvent; AnyEvent::detect");

       Each code snippet will be executed in its own package and under "use
       strict".

 OTHER METHODS FOR ADDING FILES
   The following methods add commonly used files that are either not
   covered by other methods or add commonly-used dependencies.

   $extractor->add_perl
       Adds the perl binary itself to the file set, including the libperl
       dll, if needed.

       For example, on UNIX systems, this usually adds an exe/perl and
       possibly some dll/libperl.so.XXX.

   $extractor->add_core_support
       Try to add modules and files needed to support commonly-used builtin
       language features. For example to open a scalar for I/O you need the
       PerlIO::scalar module:

          open $fh, "<", \$scalar

       A number of regex and string features (e.g. "ucfirst") need some
       unicore files, e.g.:

          'my $x = chr 1234; "\u$x\U$x\l$x\L$x"; $x =~ /\d|\w|\s|\b|$x/i';

       This call adds these files (simply by executing code similar to the
       above code fragments).

       Notable things that are missing are other PerlIO layers, such as
       PerlIO::encoding, and named character and character class matches.

   $extractor->add_unicore
       Adds (hopefully) all files from the unicore database that will ever
       be needed.

       If you are not sure which unicode character classes and similar
       unicore databases you need, and you do not care about an extra one
       thousand(!) files comprising 4MB of data, then you can just call
       this method, which adds basically all files from perl's unicode
       database.

       Note that "add_core_support" also adds some unicore files, but it's
       not a subset of "add_unicore" - the former adds all files necessary
       to support core builtins (which includes some unicore files and
       other things), while the latter adds all unicore files (but nothing
       else).

       When in doubt, use both.

   $extractor->add_core
       This adds all files from the perl core distribution, that is, all
       library files that come with perl.

       This is a superset of "add_core_support" and "add_unicore".

       This is quite a lot, but on the plus side, you can be sure nothing
       is missing.

       This requires a full perl installation - Debian GNU/Linux doesn't
       package the full perl library, so this function will not work there.

 GLOB-BASED ADDING AND FILTERING
   These methods add or manipulate files by using glob-based patterns.

   These glob patterns work similarly to glob patterns in the shell:

   /   A / at the start of the pattern interprets the pattern as a file
       path inside the file set, almost the same as in the shell. For
       example, /bin/perl* would match all files whose names start with
       perl inside the bin directory in the set.

       If the / is missing, then the pattern is interpreted as a module
       name (a .pm file). For example, Coro matches the file lib/Coro.pm,
       while Coro::* would match lib/Coro/*.pm.

   *   A single star matches anything inside a single directory component.
       For example, /lib/Coro/*.pm would match all .pm files inside the
       lib/Coro/ directory, but not any files deeper in the hierarchy.

       Another way to look at it is that a single star matches anything but
       a slash (/).

   **  A double star matches any number of characters in the path,
       including /.

       For example, AnyEvent::** would match all modules whose names start
       with "AnyEvent::", no matter how deep in the hierarchy they are.
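
   The pattern rules above can be modelled by translating a glob into a
   regular expression over set paths. This is a sketch of the described
   semantics only, assuming that module patterns map to lib/....pm as in
   the Coro example; "glob_to_regex" is a hypothetical name and this is
   not the module's own implementation.

   ```perl
   use strict;
   use warnings;

   # Hypothetical helper: turn a glob pattern into a regex over set paths
   # (which have no leading slash).
   sub glob_to_regex {
      my ($glob) = @_;

      # A leading "/" means a path inside the set, anything else is a
      # module name that maps to a .pm file below lib/.
      unless ($glob =~ s{^/}{}) {
         $glob =~ s{::}{/}g;
         $glob = "lib/$glob.pm";
      }

      # Translate "**", "*" and literal runs piece by piece.
      my $re = "";
      for my $piece (split /(\*\*|\*)/, $glob) {
         $re .= $piece eq "**" ? ".*"
              : $piece eq "*"  ? "[^/]*"
              :                  quotemeta $piece;
      }

      return qr/\A$re\z/s;
   }

   my $re = glob_to_regex ("AnyEvent::**");
   print "matches\n" if "lib/AnyEvent/Impl/EV.pm" =~ $re;
   ```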

   $extractor->add_glob ($modglob[, $modglob...])
       Adds all files from the perl library that match the given glob
       pattern.

       For example, you could implement "add_unicore" yourself like this:

          $extractor->add_glob ("/unicore/**.pl");

   $extractor->filter ($pattern[, $pattern...])
       Applies a series of include/exclude filters. Each filter must start
       with either "+" or "-", to designate the pattern as *include* or
       *exclude* pattern. The rest of the pattern is a normal glob pattern.

       An exclude pattern ("-") instantly removes all matching files from
       the set. An include pattern ("+") protects matching files from later
       removals.

       That is, if you have an include pattern then all files that were
       matched by it will be included in the set, regardless of any further
       exclude patterns matching the same files.

       Likewise, any file excluded by a pattern will not be included in the
       set, even if matched by later include patterns.

       Any files not matched by any expression will simply stay in the set.

       For example, to remove most of the useless autoload functions of the
       POSIX module (they either do the same thing as a builtin or always
       raise an error), you would use this:

          $extractor->filter ("-/lib/auto/POSIX/*.al");

       This does not remove all autoload files, only the ones not defined
       by a subclass (e.g. it leaves "POSIX::SigRt::xxx" alone).
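
   The ordering rules of "filter" (an include protects files from later
   excludes, an exclude removes files immediately) can be modelled on a
   plain hash. The sketch below uses regexes instead of glob patterns to
   stay short; "apply_filters" and the file names are made up for
   illustration and this is not the module's implementation.

   ```perl
   use strict;
   use warnings;

   # Hypothetical model: each filter is [sign, regex], applied in order to
   # a hash reference keyed by set path.
   sub apply_filters {
      my ($set, @filters) = @_;
      my %protected;

      for my $filter (@filters) {
         my ($sign, $re) = @$filter;

         for my $path (keys %$set) {
            next unless $path =~ $re;

            if ($sign eq "+") {
               $protected{$path} = 1;    # shield from later excludes
            } elsif (!$protected{$path}) {
               delete $set->{$path};     # exclude takes effect immediately
            }
         }
      }

      return $set;
   }

   my %set = map { ($_ => [$_]) } qw(
      lib/POSIX.pm
      lib/auto/POSIX/abs.al
      lib/auto/POSIX/load_imports.al
   );

   apply_filters (\%set,
      ["+", qr{/load_imports\.al\z}],
      ["-", qr{\Alib/auto/POSIX/[^/]*\.al\z}],
   );

   print "$_\n" for sort keys %set;
   ```

   Here the include filter comes first, so load_imports.al survives the
   exclude that follows, while lib/POSIX.pm is matched by neither filter
   and simply stays in the set.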

   $extractor->runtime_only
       This removes all files that are not needed at runtime, such as
       static archives, header and other files needed only for compilation
       of modules, and pod and html files (which are unlikely to be needed
       at runtime).

       This is quite useful when you want to have only files actually
       needed to execute a program.

 RESULT SET
   $set = $extractor->set
       Returns a hash reference that represents the result set. The hash is
       the actual internal storage hash and can only be modified as
       described below.

       Each key in the hash is the path inside the set, without a leading
       slash, e.g.:

          bin/perl
          lib/unicore/lib/Blk/Superscr.pl
          lib/AnyEvent/Impl/EV.pm

       The value is an array reference with mostly unspecified contents,
       except the first element, which is the file system path where the
       actual file can be found.

       This code snippet lists all files inside the set:

          print "$_\n"
             for sort keys %{ $extractor->set };

       This code fragment prints "filesystem_path => set_path" pairs for
       all files in the set:

          my $set = $extractor->set;
          while (my ($setpath, $info) = each %$set) {
             my $fspath = $info->[0]; # first element is the file system path
             print "$fspath => $setpath\n";
          }

       You can implement your own filtering by asking for the result set
       with "$extractor->set", and then deleting keys from the referenced
       hash - since you can ask for the result set at any time you can add
       things, filter them out this way, and add additional things.
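
   Custom filtering by deleting keys, as described for "set" above, might
   look like the following sketch. "remove_matching" is a hypothetical
   helper; it works on any hash reference with the layout described for
   the result set.

   ```perl
   use strict;
   use warnings;

   # Hypothetical helper: drop every entry whose set path matches $re and
   # return the number of entries removed.
   sub remove_matching {
      my ($set, $re) = @_;

      my @doomed = grep { $_ =~ $re } keys %$set;
      delete @$set{@doomed};

      return scalar @doomed;
   }

   # Usage, assuming $extractor is a Perl::LibExtractor object:
   #
   #    remove_matching ($extractor->set, qr/\.pod\z/);
   ```

   Because "set" returns the actual internal hash, the deletion takes
   effect on the extractor itself, and later add_* calls continue from the
   reduced set.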

EXAMPLE
   To package the deliantra client (Deliantra::Client), finding all (perl)
   files needed to run it is a first step. This can be done by using
   something like the following code snippet:

      my $ex = new Perl::LibExtractor;

      $ex->add_perl;
      $ex->add_core_support;
      $ex->add_bin ("deliantra");
      $ex->add_mod ("AnyEvent::Impl::EV");
      $ex->add_mod ("AnyEvent::Impl::Perl");
      $ex->add_mod ("Urlader");
      $ex->filter ("-/*/auto/POSIX/**.al");
      $ex->runtime_only;

   The wrapper script that starts the packaged client (not shown here)
   first sets the perl library directory to pm and . (the latter to work
   around some AutoLoader bugs), so perl uses only the perl library files
   that came with the binary package.

   Then it sets some environment variable to override the system default
   (which might be incompatible).

   Then it runs the client itself, using "require". Since "require" only
   looks in the perl library directory, this is the reason why the scripts
   were put there (of course, since . is also included it doesn't matter,
   but I refuse to yield to bugs).

   Finally it exits with a clean status to signal "ok" to Urlader.

   Back to the original "Perl::LibExtractor" script: after initialising a
   new set, the script simply adds the perl interpreter and core support
   files (just in case, not all are needed, but some are, and I am too lazy
   to find out which ones exactly).

   Then it adds the deliantra executable itself, which in turn adds most of
   the required modules. After that, the AnyEvent implementation modules
   are added because these dependencies are not picked up automatically.

   The Urlader module is added because the client itself does not depend on
   it at all, but the wrapper does.

   At this point, all required files are present, and it's time to slim
   down: most of the useless POSIX autoloaded functions are removed, not
   because they are so big, but because creating files is a costly
   operation in itself, so even small files have considerable overhead when
   unpacking. Then files not required for running the client are removed.

   And that concludes it, the set is now ready.
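
   What happens with the finished set is up to you. As one possibility,
   the files could be copied into a staging directory that mirrors the set
   layout. The sketch below assumes only the documented structure of the
   result set (first array element is the file system path); "copy_set"
   and the directory name are hypothetical.

   ```perl
   use strict;
   use warnings;
   use File::Basename qw(dirname);
   use File::Copy     qw(copy);
   use File::Path     qw(make_path);

   # Hypothetical helper: copy every file in a result set below $destdir,
   # keeping the set paths as relative paths. Returns the file count.
   sub copy_set {
      my ($set, $destdir) = @_;
      my $count = 0;

      for my $setpath (sort keys %$set) {
         my $fspath = $set->{$setpath}[0];   # where the file actually is
         my $target = "$destdir/$setpath";

         make_path (dirname ($target));
         copy ($fspath, $target)
            or die "copying $fspath to $target failed: $!";
         ++$count;
      }

      return $count;
   }

   # Usage, assuming $ex is the extractor from the example above:
   #
   #    copy_set ($ex->set, "staging");
   ```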

SEE ALSO
   The utility program that comes with this module: perl-libextract.

   App::Staticperl, Urlader, Perl::Squish.

LICENSE
   This software package is licensed under the GPL version 3 or any later
   version, see COPYING for details.

   This license does not, of course, apply to any output generated by this
   software.

AUTHOR
      Marc Lehmann <[email protected]>
      http://home.schmorp.de/