CAM::PDFTaxforms - CAM::PDF wrapper to also allow editing of checkboxes (i.e. for IRS Tax forms).
INSTALLATION
Install via one of the following:
perl Makefile.PL
make
make test
make install
or
perl Build.PL
perl Build
perl Build test
perl Build install
DEPENDENCIES
Perl 5, CAM::PDF, Text::PDF, Crypt::RC4, Digest::MD5
NAME
CAM::PDFTaxforms - CAM::PDF wrapper to also allow editing of checkboxes
(i.e. for IRS Tax forms).
AUTHOR
Jim Turner <https://metacpan.org/author/TURNERJW>.
This module is a wrapper around and a drop-in replacement for CAM::PDF,
by Chris Dolan.
ACKNOWLEDGMENTS
Thanks to Chris Dolan and everyone involved in developing and supporting
CAM::PDF, on which this module is based and relies.
LICENSE AND COPYRIGHT
Copyright (c) 2010-2019 Jim Turner
This library is free software; you can redistribute it and/or modify it
under the same terms as CAM::PDF and Perl itself.
CAM::PDF:
Copyright (c) 2002-2006 Clotho Advanced Media, Inc.,
<http://www.clotho.com/>
Copyright (c) 2007-2008 Chris Dolan
This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
SYNOPSIS
#!/usr/bin/perl -w
use strict;
use CAM::PDFTaxforms;
my $pdf = CAM::PDFTaxforms->new('f1040.pdf') or die "Could not open PDF ($!)!";
my $page1 = $pdf->getPageContent(1);
#DISPLAY THE LIST OF NAMES OF EDITABLE FIELDS:
my @fieldnames = $pdf->getFormFieldList();
print "--fields=".join('|',@fieldnames)."=\n";
#UPDATE THE VALUES OF ONE OF THE FIELDS AND A COUPLE OF THE CHECKBOXES:
$pdf->fillFormFields('fieldname1' => 'value1', 'fieldname2' => 'value2');
#WRITE THE UPDATED PDF FORM TO A NEW FILE NAME:
$pdf->cleanoutput('f1040_completed.pdf');
Many example programs are included in this distribution to do useful
tasks. See the "bin" subdirectory.
DESCRIPTION
This package is a wrapper for and creates a CAM::PDF object. The
difference is that some method functions are overridden to fix some
issues and add some new features, namely to better handle IRS tax forms,
many of which have checkboxes, in addition to numeric and text fields.
Several other patches have also been applied, particularly those
provided by CAM::PDF bugs #58144, #122890 and #125299. Otherwise, it
should work well as a full drop-in replacement for CAM::PDF in the API.
CAM::PDF description:
This package reads and writes any document that conforms to the PDF
specification generously provided by Adobe at
<http://partners.adobe.com/public/developer/pdf/index_reference.html>
(link last checked Oct 2005).
The file format through PDF 1.5 is well-supported, with the exception of
the "linearized" or "optimized" output format, which this module can
read but not write. Many specific aspects of the document model are not
manipulable with this package (like fonts), but if the input document is
correctly written, then this module will preserve the model integrity.
The PDF writing feature saves as PDF 1.4-compatible. That means that we
cannot write compressed object streams. The consequence is that reading
and then writing a PDF 1.5+ document may enlarge the resulting file by a
fair margin.
This library grants you some power over the PDF security model. Note
that applications editing PDF documents via this library MUST respect
the security preferences of the document. Any violation of this respect
is contrary to Adobe's intellectual property position, as stated in the
reference manual at the above URL.
Technical detail regarding corrupt PDFs: This library adheres strictly
to the PDF specification. Adobe's Acrobat Reader is more lenient,
allowing some corrupted PDFs to be viewable. Therefore, it is possible
that some PDFs may be readable by Acrobat that are illegible to this
library. In particular, files which have had line endings converted to
or from DOS/Windows style (i.e. CR-LF) may be rendered unusable even
though Acrobat does not complain. Future library versions may relax the
parser, but not yet.
This version is HACKED by Jim Turner 09/2010 to enable the
fillFormFields() function to also modify checkboxes (primarily on IRS
Tax forms).
EXAMPLE
See the example subdirectory in the source tree. It contains a blank
official 2018 IRS Schedule B tax form and two programs. The first,
*dof1040sb.pl*, fills in the form using the sample input data text file
*f1040sb_inputs.txt* and creates a filled-in version of the form called
*f1040sb_out.pdf*. The second, *test1040sb.pl*, reads the data from the
filled-in form created by the first program and displays it as output.
To run the programs, switch to the examples subdirectory in the source
tree and run them without arguments (e.g. ./dof1040sb.pl).
To see the names of the fields and their current values in a PDF form,
such as the aforementioned tax form, run the included program, e.g.:
*listpdffields2.pl -d f1040sb_out.pdf*.
API
Functions intended to be used externally
$self = CAM::PDFTaxforms->new(content | filename | '-')
$self->toPDF()
$self->needsSave()
$self->save()
$self->cleansave()
$self->output(filename | '-')
$self->cleanoutput(filename | '-')
$self->previousRevision()
$self->allRevisions()
$self->preserveOrder()
$self->appendObject(olddoc, oldnum, [follow=(1|0)])
$self->replaceObject(newnum, olddoc, oldnum, [follow=(1|0)])
(olddoc can be undef in the above for adding new objects)
$self->numPages()
$self->getPageText(pagenum)
$self->getPageDimensions(pagenum)
$self->getPageContent(pagenum)
$self->setPageContent(pagenum, content)
$self->appendPageContent(pagenum, content)
$self->deletePage(pagenum)
$self->deletePages(pagenum, pagenum, ...)
$self->extractPages(pagenum, pagenum, ...)
$self->appendPDF(CAM::PDF object)
$self->prependPDF(CAM::PDF object)
$self->wrapString(string, width, fontsize, page, fontlabel)
$self->getFontNames(pagenum)
$self->addFont(page, fontname, fontlabel, [fontmetrics])
$self->deEmbedFont(page, fontname, [newfontname])
$self->deEmbedFontByBaseName(page, basename, [newfont])
$self->getPrefs()
$self->setPrefs()
$self->canPrint()
$self->canModify()
$self->canCopy()
$self->canAdd()
$self->getFormFieldList()
$self->fillFormFields(fieldname, value, [fieldname, value, ...])
or $self->fillFormFields(%values)
$self->clearFormFieldTriggers(fieldname, fieldname, ...)
Note: 'clean' as in cleansave() and cleanoutput() means write a fresh
PDF document. The alternative (e.g. save()) reuses the existing doc and
just appends to it. Also note that 'clean' functions sort the objects
numerically. If you prefer that the new PDF docs more closely resemble
the old ones, call preserveOrder() before cleansave() or cleanoutput().
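As an illustration of the note above, here is a minimal sketch that writes a fresh copy of a document while keeping the original object order. It uses only preserveOrder() and cleanoutput() from the list above; the helper name write_clean_copy and its $keep_order flag are assumptions made for this example, not part of the library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper for this sketch (not a library function).
# $pdf is an already-opened CAM::PDFTaxforms (or CAM::PDF) object.
sub write_clean_copy {
    my ($pdf, $outfile, $keep_order) = @_;

    # Without this call, the 'clean' functions sort objects numerically.
    $pdf->preserveOrder() if $keep_order;

    # Write a fresh PDF document rather than appending to the old one.
    $pdf->cleanoutput($outfile);
    return $outfile;
}

# Run as a script: perl cleancopy.pl in.pdf out.pdf
if (!caller && @ARGV == 2) {
    require CAM::PDFTaxforms;
    my ($in, $out) = @ARGV;
    my $pdf = CAM::PDFTaxforms->new($in) or die "Could not open $in\n";
    write_clean_copy($pdf, $out, 1);
}
```

Passing a false third argument gives the default numerically sorted output.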
For additional methods and functions, see the CAM::PDF documentation.
METHODS
$doc = CAM::PDFTaxforms->new($content)
$doc = CAM::PDFTaxforms->new($content, $ownerpass, $userpass)
$doc = CAM::PDFTaxforms->new($content, $ownerpass, $userpass, $prompt)
$doc = CAM::PDFTaxforms->new($content, $ownerpass, $userpass, $options)
Instantiate a new CAM::PDFTaxforms object. $content can be a
document in a string, a filename, or '-'. The latter indicates that
the document should be read from standard input. If the document is
password protected, the passwords should be passed as additional
arguments. If they are not known, a boolean $prompt argument allows
the programmer to suggest that the constructor prompt the user for a
password. This is rudimentary prompting: passwords are in the clear
on the console.
This constructor takes an optional final argument which is a hash
reference. This hash can contain any of the following optional
parameters:
prompt_for_password => $boolean
This is the same as the $prompt argument described above.
fault_tolerant => $boolean
This flag causes the instance to be more lenient when reading
the input PDF. Currently, this only affects PDFs which cannot be
successfully decrypted.
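A minimal sketch of the options-hash form of the constructor described above. The helper open_args and the way the script takes passwords from the command line are assumptions made for this example; only the argument order and the two option names come from the description above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: builds the argument list for new() in the order
# (content, owner password, user password, options hash reference).
sub open_args {
    my ($file, $ownerpass, $userpass, %flags) = @_;
    return (
        $file, $ownerpass, $userpass,
        {
            prompt_for_password => $flags{prompt_for_password} ? 1 : 0,
            fault_tolerant      => $flags{fault_tolerant}      ? 1 : 0,
        },
    );
}

# Run as a script: perl openpdf.pl file.pdf ownerpass userpass
if (!caller && @ARGV == 3) {
    require CAM::PDFTaxforms;
    my $doc = CAM::PDFTaxforms->new(open_args(@ARGV, fault_tolerant => 1))
        or die "Could not open $ARGV[0]\n";
    print "Pages: ", $doc->numPages(), "\n";
}
```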
$hashref = $doc->getFieldValue('fieldname1' [, fieldname2, ...
fieldnameN ])
(CAM::PDFTaxforms only, not available in CAM::PDF)
Fetches the corresponding current values for each field name in the
argument list. Returns a reference to a hash containing the field
names as keys and the corresponding values. If a field does not
exist or does not contain a value, an empty string is returned in
the hash as its value. If called in list context (e.g. assigned to
an array or hash), then a list of field names and values in the
order (fieldname1, value1, fieldname2, value2, ... fieldnameN,
valueN) is returned.
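A short sketch of reading values back with getFieldValue() in scalar context, as described above. The helper name field_report and its "name=value" output format are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns "name=value" strings for the requested
# fields, sorted by field name.
sub field_report {
    my ($pdf, @names) = @_;

    # Scalar context: getFieldValue() returns a hash reference.
    my $values = $pdf->getFieldValue(@names);

    return map { "$_=" . ($values->{$_} // '') } sort keys %{$values};
}

# Run as a script: perl fieldreport.pl form.pdf fieldname1 fieldname2 ...
if (!caller && @ARGV >= 2) {
    require CAM::PDFTaxforms;
    my ($file, @names) = @ARGV;
    my $pdf = CAM::PDFTaxforms->new($file) or die "Could not open $file\n";
    print "$_\n" for field_report($pdf, @names);
}
```

Assigning the call to a hash instead (my %values = $pdf->getFieldValue(@names);) would use the list-context form.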
$doc->fillFormFields($name => $value, ...)
$doc->fillFormFields($opts_hash, $name => $value, ...)
Set the default values of PDF form fields. The name should be the
full hierarchical name of the field as output by the
getFormFieldList() function. The argument list can be a hash if you
like. A simple way to use this function is something like this:
my %fields = (fname => 'John', lname => 'Smith', state => 'WI');
$fields{zip} = 53703;
$self->fillFormFields(%fields);
NOTE: For checkbox fields, specify any value that is *false* in Perl
(i.e. 0, '', or *undef*), or any of the strings 'Off', 'No', or
'Unchecked' (case insensitive) to un-check a checkbox, or any other
value that is *true* in Perl to check it. Checkbox fields are only
supported by CAM::PDFTaxforms and were the original reason for
creating it.
If the first argument is a hash reference, it is interpreted as
options for how to render the filled data:
background_color => 'none' | $gray | [$r, $g, $b]
Specify the background color for the text field.
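The following sketch restates the checkbox rule from the NOTE above as a small standalone function, then shows fillFormFields() called with an options hash reference as its first argument. The function checkbox_state is a hypothetical illustration of the documented rule, not the module's own code, and the name=value command-line format is an assumption of this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: predicts how a value passed for a checkbox field
# is treated, following the rule stated in the NOTE above.
sub checkbox_state {
    my ($value) = @_;
    return 'unchecked' if !$value;    # 0, '' or undef
    return 'unchecked' if $value =~ /^(?:off|no|unchecked)$/i;
    return 'checked';                 # any other true value
}

# Run as a script: perl fill.pl in.pdf out.pdf name=value [name=value ...]
if (!caller && @ARGV >= 3) {
    require CAM::PDFTaxforms;
    my ($in, $out, @pairs) = @ARGV;
    my %values = map { my ($k, $v) = split /=/, $_, 2; ($k => $v) } @pairs;

    my $pdf = CAM::PDFTaxforms->new($in) or die "Could not open $in\n";

    # First argument is the options hash; the field values follow.
    $pdf->fillFormFields({ background_color => 'none' }, %values);
    $pdf->cleanoutput($out);
}
```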
$doc->getFormFieldList()
Return an array of the names of all of the PDF form fields. The
names are the full hierarchical names constructed as explained in
the PDF reference manual. These names are useful for the
fillFormFields() function.
$doc->getFormField($name)
*For INTERNAL use*
Return the object containing the form field definition for the
specified field name. $name can be either the full name or the
"short/alternate" name.
SCRIPTS
CAM::PDF includes a number of handy utility scripts, installed in the
user's local/bin path. This distribution adds *listpdffields2.pl*, a
modified version of CAM::PDF's *listpdffields.pl* with an added
-d (--data) option that displays the names of all the fields found in a
PDF form along with their corresponding current values (if any).
listpdffields2.pl [-dhsvV] *pdfformfile.pdf*
The general format is:
listpdffields2.pl -d *pdfformfile.pdf*
COMPATIBILITY
This library was primarily developed against the 3rd edition of the
reference (PDF v1.4) with several important updates from 4th edition
(PDF v1.5). This library focuses most deeply on PDF v1.2 features.
Nonetheless, it should be forward and backward compatible in the
majority of cases.
PERFORMANCE
This module is written with good speed and flexibility in mind, often at
the expense of memory consumption. Entire PDF documents are typically
slurped into RAM. As an example, simply calling
"new('PDFReference15_v15.pdf')" (the 13.5 MB Adobe PDF Reference V1.5
document) pushes Perl to consume 89 MB of RAM on my development machine.
DEPENDS
CAM::PDF, Text::PDF, Crypt::RC4, Digest::MD5
KEYWORDS
pdf taxforms
SEE ALSO
CAM::PDF (Obviously) as this module is a wrapper around it (and requires
it as a prerequisite). Also see the docs there for all the other methods
and features available to CAM::PDFTaxforms (it's NOT just for IRS tax
forms)!
There are several other PDF modules on CPAN. Below is a brief
description of a few of them. If these comments are out of date, please
inform me.
PDF::API2
As of v0.46.003, LGPL license.
This is the leading PDF library, in my opinion.
Excellent text and font support. This is the highest level library
of the bunch, and is the most complete implementation of the Adobe
PDF spec. The author is amazingly responsive and patient.
Text::PDF
As of v0.25, Artistic license.
Excellent compression support (CAM::PDF cribs off this Text::PDF
feature). This has not been developed since 2003.
PDF::Reuse
As of v0.32, Artistic/GPL license, like Perl itself.
This library is not object oriented, so it can only process one PDF
at a time, while storing all data in global variables. I'm not fond
of it, but it's quite popular, so don't take my word for it!
Additionally, PDFLib is a commercial package not on CPAN
(www.pdflib.com). It is a C-based library with a Perl interface. It is
designed for PDF creation, not for reuse.