CAM::PDFTaxforms - CAM::PDF wrapper to also allow editing of checkboxes (i.e. for IRS Tax forms).

INSTALLATION

Install via one of the following:

 perl Makefile.PL
 make
 make test
 make install

or

 perl Build.PL
 perl Build
 perl Build test
 perl Build install

DEPENDENCIES
   Perl 5, CAM::PDF, Text::PDF, Crypt::RC4, Digest::MD5

NAME
   CAM::PDFTaxforms - CAM::PDF wrapper to also allow editing of checkboxes
   (i.e. for IRS Tax forms).

AUTHOR
   Jim Turner "<https://metacpan.org/author/TURNERJW>".

   This module is a wrapper around and a drop-in replacement for CAM::PDF,
   by Chris Dolan.

ACKNOWLEDGMENTS
   Thanks to Chris Dolan and everyone involved in developing and supporting
   CAM::PDF, on which this module is based and relies.

LICENSE AND COPYRIGHT
   Copyright (c) 2010-2019 Jim Turner

   This library is free software; you can redistribute it and/or modify it
   under the same terms as CAM::PDF and Perl itself.

   CAM::PDF:

   Copyright (c) 2002-2006 Clotho Advanced Media, Inc.,
   <http://www.clotho.com/>

   Copyright (c) 2007-2008 Chris Dolan

   This library is free software; you can redistribute it and/or modify it
   under the same terms as Perl itself.

SYNOPSIS
       #!/usr/bin/perl -w

       use strict;
       use CAM::PDFTaxforms;
       my $pdf = CAM::PDFTaxforms->new('f1040.pdf') or die "Could not open PDF ($!)!";
       my $page1 = $pdf->getPageContent(1);

       #DISPLAY THE LIST OF NAMES OF EDITABLE FIELDS:
       my @fieldnames = $pdf->getFormFieldList();
       print "--fields=".join('|',@fieldnames)."=\n";

       #UPDATE THE VALUES OF A COUPLE OF THE FIELDS (TEXT FIELDS OR CHECKBOXES):
       $pdf->fillFormFields('fieldname1' => 'value1', 'fieldname2' => 'value2');

       #WRITE THE UPDATED PDF FORM TO A NEW FILE NAME:
       $pdf->cleanoutput('f1040_completed.pdf');

   Many example programs are included in this distribution to do useful
   tasks. See the "bin" subdirectory.

DESCRIPTION
   This package is a wrapper around CAM::PDF and creates a CAM::PDF object.
   The difference is that some methods are overridden to fix some issues
   and add some new features, namely to better handle IRS tax forms, many
   of which have checkboxes in addition to numeric and text fields.
   Several other patches have also been applied, particularly those
   provided by CAM::PDF bugs #58144, #122890 and #125299. Otherwise, it
   should work well as a full drop-in replacement for CAM::PDF in the API.

   CAM::PDF description:

   This package reads and writes any document that conforms to the PDF
   specification generously provided by Adobe at
   <http://partners.adobe.com/public/developer/pdf/index_reference.html>
   (link last checked Oct 2005).

   The file format through PDF 1.5 is well-supported, with the exception of
   the "linearized" or "optimized" output format, which this module can
   read but not write. Many specific aspects of the document model are not
   manipulable with this package (like fonts), but if the input document is
   correctly written, then this module will preserve the model integrity.

   The PDF writing feature saves as PDF 1.4-compatible. That means that we
   cannot write compressed object streams. The consequence is that reading
   and then writing a PDF 1.5+ document may enlarge the resulting file by a
   fair margin.

   This library grants you some power over the PDF security model. Note
   that applications editing PDF documents via this library MUST respect
   the security preferences of the document. Any violation of this respect
   is contrary to Adobe's intellectual property position, as stated in the
   reference manual at the above URL.

   Technical detail regarding corrupt PDFs: This library adheres strictly
   to the PDF specification. Adobe's Acrobat Reader is more lenient,
   allowing some corrupted PDFs to be viewable. Therefore, it is possible
   that some PDFs may be readable by Acrobat that are illegible to this
   library. In particular, files which have had line endings converted to
   or from DOS/Windows style (i.e. CR-LF) may be rendered unusable even
   though Acrobat does not complain. Future library versions may relax the
   parser, but not yet.

   This version is HACKED by Jim Turner 09/2010 to enable the
   fillFormFields() function to also modify checkboxes (primarily on IRS
   Tax forms).

EXAMPLE
   See the examples subdirectory in the source tree. It contains a sample
   blank official 2018 IRS Schedule B tax form and two programs. The first,
   *dof1040sb.pl*, fills in the form using the sample input data text file
   *f1040sb_inputs.txt* and creates a filled-in version of the form called
   *f1040sb_out.pdf*. The second, *test1040sb.pl*, reads the data back from
   the filled-in form created by the first program and displays it as
   output.

   To run the programs, switch to the examples subdirectory in the source
   tree and run them without arguments (e.g. ./dof1040sb.pl).

   To see the names of the fields and their current values in a PDF form,
   such as the aforementioned tax form, run the included program
   listpdffields2.pl, e.g.: *listpdffields2.pl -d f1040sb_out.pdf*.

API
 Functions intended to be used externally
    $self = CAM::PDFTaxforms->new(content | filename | '-')
    $self->toPDF()
    $self->needsSave()
    $self->save()
    $self->cleansave()
    $self->output(filename | '-')
    $self->cleanoutput(filename | '-')
    $self->previousRevision()
    $self->allRevisions()
    $self->preserveOrder()
    $self->appendObject(olddoc, oldnum, [follow=(1|0)])
    $self->replaceObject(newnum, olddoc, oldnum, [follow=(1|0)])
       (olddoc can be undef in the above for adding new objects)
    $self->numPages()
    $self->getPageText(pagenum)
    $self->getPageDimensions(pagenum)
    $self->getPageContent(pagenum)
    $self->setPageContent(pagenum, content)
    $self->appendPageContent(pagenum, content)
    $self->deletePage(pagenum)
    $self->deletePages(pagenum, pagenum, ...)
    $self->extractPages(pagenum, pagenum, ...)
    $self->appendPDF(CAM::PDF object)
    $self->prependPDF(CAM::PDF object)
    $self->wrapString(string, width, fontsize, page, fontlabel)
    $self->getFontNames(pagenum)
    $self->addFont(page, fontname, fontlabel, [fontmetrics])
    $self->deEmbedFont(page, fontname, [newfontname])
    $self->deEmbedFontByBaseName(page, basename, [newfont])
    $self->getPrefs()
    $self->setPrefs()
    $self->canPrint()
    $self->canModify()
    $self->canCopy()
    $self->canAdd()
    $self->getFormFieldList()
    $self->fillFormFields(fieldname, value, [fieldname, value, ...])
      or $self->fillFormFields(%values)
    $self->clearFormFieldTriggers(fieldname, fieldname, ...)

   Note: 'clean' as in cleansave() and cleanoutput() means write a fresh
   PDF document. The alternative (e.g. save()) reuses the existing doc and
   just appends to it. Also note that 'clean' functions sort the objects
   numerically. If you prefer that the new PDF docs more closely resemble
   the old ones, call preserveOrder() before cleansave() or cleanoutput().
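
   The note above can be illustrated with a minimal sketch. It assumes
   the input and output file names are passed on the command line; the
   variable names are illustrative only, and the calls used (new(),
   preserveOrder(), cleanoutput()) are the ones listed in the API above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CAM::PDFTaxforms;

# Assumption: input and output paths come from the command line.
my ($in, $out) = @ARGV;
die "usage: $0 input.pdf output.pdf\n"
    unless defined $in && defined $out;

my $pdf = CAM::PDFTaxforms->new($in)
    or die "Could not open $in\n";

# Ask the 'clean' writer to keep objects in their original order
# instead of sorting them numerically.
$pdf->preserveOrder();

# Write a fresh PDF document to a new file.
$pdf->cleanoutput($out);
```

   Omitting the preserveOrder() call should give the default behavior
   described above, where the objects are sorted numerically.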

 For additional methods and functions, see the CAM::PDF documentation.

METHODS
   $doc = CAM::PDFTaxforms->new($content)
   $doc = CAM::PDFTaxforms->new($content, $ownerpass, $userpass)
   $doc = CAM::PDFTaxforms->new($content, $ownerpass, $userpass, $prompt)
   $doc = CAM::PDFTaxforms->new($content, $ownerpass, $userpass, $options)
       Instantiate a new CAM::PDFTaxforms object. $content can be a
       document in a string, a filename, or '-'. The latter indicates that
       the document should be read from standard input. If the document is
       password protected, the passwords should be passed as additional
       arguments. If they are not known, a boolean $prompt argument allows
       the programmer to suggest that the constructor prompt the user for a
       password. This is rudimentary prompting: passwords are in the clear
       on the console.

       This constructor takes an optional final argument which is a hash
       reference. This hash can contain any of the following optional
       parameters:

       prompt_for_password => $boolean
           This is the same as the $prompt argument described above.

       fault_tolerant => $boolean
           This flag causes the instance to be more lenient when reading
           the input PDF. Currently, this only affects PDFs which cannot be
           successfully decrypted.
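
       As a rough sketch of the options-hash form of the constructor,
       assuming a password-protected file whose name and passwords are
       given on the command line (the variable names here are
       illustrative, not part of the module):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CAM::PDFTaxforms;

# Assumption: file name and both passwords come from the command line.
my ($file, $ownerpass, $userpass) = @ARGV;
die "usage: $0 file.pdf ownerpass userpass\n"
    unless defined $userpass;

# Optional final argument: a hash reference of constructor options.
my %options = (
    prompt_for_password => 0,   # do not prompt on the console
    fault_tolerant      => 1,   # be more lenient when reading the input
);

my $doc = CAM::PDFTaxforms->new($file, $ownerpass, $userpass, \%options)
    or die "Could not open $file\n";

printf "%s has %d page(s)\n", $file, $doc->numPages();
```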

   $hashref = $doc->getFieldValue('fieldname1' [, fieldname2, ...
   fieldnameN ])
       (CAM::PDFTaxforms only, not available in CAM::PDF)

       Fetches the corresponding current values for each field name in the
       argument list. In scalar context, returns a reference to a hash
       containing the field names as keys and the corresponding values. If
       a field does not exist or does not contain a value, an empty string
       is returned in the hash as its value. If called in list context
       (e.g. assigned to an array or hash), a list of field names and
       values in the order (fieldname1, value1, fieldname2, value2, ...
       fieldnameN, valueN) is returned.
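
       The two calling contexts might be used as in the following sketch.
       It assumes the PDF form file name is passed on the command line
       and simply asks for every field reported by getFormFieldList();
       the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CAM::PDFTaxforms;

# Assumption: the PDF form file name comes from the command line.
my $file = shift @ARGV or die "usage: $0 form.pdf\n";
my $doc  = CAM::PDFTaxforms->new($file)
    or die "Could not open $file\n";

my @names = $doc->getFormFieldList();
die "No form fields found in $file\n" unless @names;

# Scalar context: a reference to a hash of field name => value.
my $values = $doc->getFieldValue(@names);
for my $name (sort keys %{$values}) {
    print "$name = $values->{$name}\n";
}

# List context: a flat (name, value, ...) list, here stored in a hash.
my %copy = $doc->getFieldValue(@names);
printf "%d field(s) read\n", scalar keys %copy;
```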

   $doc->fillFormFields($name => $value, ...)
   $doc->fillFormFields($opts_hash, $name => $value, ...)
       Set the default values of PDF form fields. The name should be the
       full hierarchical name of the field as output by the
       getFormFieldList() function. The argument list can be a hash if you
       like. A simple way to use this function is something like this:

           my %fields = (fname => 'John', lname => 'Smith', state => 'WI');
           $fields{zip} = 53703;
           $doc->fillFormFields(%fields);

       NOTE: For checkbox fields, specify any value that is *false* in Perl
       (i.e. 0, '', or *undef*), or any of the strings 'Off', 'No', or
       'Unchecked' (case insensitive), to un-check a checkbox, or any other
       value that is *true* in Perl to check it. Checkbox support is only
       provided by CAM::PDFTaxforms and was the original reason for
       creating it.
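
       The checkbox rule above can be sketched in plain Perl. The helper
       checkbox_is_checked() below is hypothetical and is not part of
       CAM::PDFTaxforms; it only mirrors the rule as documented here and
       is not the module's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CAM::PDFTaxforms): returns 1 if the
# given value would check a checkbox under the documented rule, else 0.
sub checkbox_is_checked {
    my ($value) = @_;
    return 0 unless $value;                              # undef, 0, ''
    return 0 if $value =~ /\A(?:off|no|unchecked)\z/i;   # documented strings
    return 1;                                            # any other true value
}

for my $v (1, 'Yes', 'X', 0, '', 'off', 'No', 'UNCHECKED') {
    printf "%-12s => %s\n", "'$v'",
        checkbox_is_checked($v) ? 'checked' : 'unchecked';
}
```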

       If the first argument is a hash reference, it is interpreted as
       options for how to render the filled data:

       background_color => 'none' | $gray | [$r, $g, $b]
           Specify the background color for the text field.
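
       A minimal sketch of passing the options hash together with field
       values follows. To avoid inventing field names, it assumes the
       names of one text field and one checkbox field (as printed by
       getFormFieldList()) are supplied on the command line along with
       the input and output file names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CAM::PDFTaxforms;

# Assumption: file names and field names come from the command line.
my ($in, $out, $text_field, $checkbox_field) = @ARGV;
die "usage: $0 in.pdf out.pdf textfield checkboxfield\n"
    unless defined $checkbox_field;

my $doc = CAM::PDFTaxforms->new($in)
    or die "Could not open $in\n";

$doc->fillFormFields(
    { background_color => 'none' },   # rendering options come first
    $text_field     => 'John Smith',  # ordinary text value
    $checkbox_field => 'Yes',         # any true value checks the box
);

$doc->cleanoutput($out);
```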

   $doc->getFormFieldList()
       Return an array of the names of all of the PDF form fields. The
       names are the full hierarchical names constructed as explained in
       the PDF reference manual. These names are useful for the
       fillFormFields() function.

   $doc->getFormField($name)
       *For INTERNAL use*

       Return the object containing the form field definition for the
       specified field name. $name can be either the full name or the
       "short/alternate" name.

SCRIPTS
   CAM::PDF includes a number of handy utility scripts, installed in the
   user's local/bin path. This distribution adds a modified version of its
   *listpdffields.pl* utility, called *listpdffields2.pl*, which adds a
   -d (--data) option for displaying the names of all the fields found in a
   PDF form, along with their corresponding current values (if any).

   listpdffields2.pl [-dhsvV] *pdfformfile.pdf*
       The general format is:

       listpdffields2.pl -d *pdfformfile.pdf*

COMPATIBILITY
   This library was primarily developed against the 3rd edition of the
   reference (PDF v1.4) with several important updates from 4th edition
   (PDF v1.5). This library focuses most deeply on PDF v1.2 features.
   Nonetheless, it should be forward and backward compatible in the
   majority of cases.

PERFORMANCE
   This module is written with good speed and flexibility in mind, often at
   the expense of memory consumption. Entire PDF documents are typically
   slurped into RAM. As an example, simply calling
   "new('PDFReference15_v15.pdf')" (the 13.5 MB Adobe PDF Reference V1.5
   document) pushes Perl to consume 89 MB of RAM on my development machine.

DEPENDS
   CAM::PDF, Text::PDF, Crypt::RC4, Digest::MD5

KEYWORDS
   pdf taxforms

SEE ALSO
   CAM::PDF (Obviously) as this module is a wrapper around it (and requires
   it as a prerequisite). Also see the docs there for all the other methods
   and features available to CAM::PDFTaxforms (it's NOT just for IRS tax
   forms)!

   There are several other PDF modules on CPAN. Below is a brief
   description of a few of them. If these comments are out of date, please
   inform me.

   PDF::API2
       As of v0.46.003, LGPL license.

       This is the leading PDF library, in my opinion.

       Excellent text and font support. This is the highest level library
       of the bunch, and is the most complete implementation of the Adobe
       PDF spec. The author is amazingly responsive and patient.

   Text::PDF
       As of v0.25, Artistic license.

       Excellent compression support (CAM::PDF cribs off this Text::PDF
       feature). This has not been developed since 2003.

   PDF::Reuse
       As of v0.32, Artistic/GPL license, like Perl itself.

       This library is not object oriented, so it can only process one PDF
       at a time, while storing all data in global variables. I'm not fond
       of it, but it's quite popular, so don't take my word for it!

   Additionally, PDFLib is a commercial package not on CPAN
   (www.pdflib.com). It is a C-based library with a Perl interface. It is
   designed for PDF creation, not for reuse.