NAME
   DateTime::Format::Builder - Create DateTime parser classes and objects.

SYNOPSIS
       package DateTime::Format::Brief;
       our $VERSION = '0.07';
       use DateTime::Format::Builder
       (
           parsers => {
               parse_datetime => [
               {
                   regex => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                   params => [qw( year month day hour minute second )],
               },
               {
                   regex => qr/^(\d{4})(\d\d)(\d\d)$/,
                   params => [qw( year month day )],
               },
               ],
           }
       );

DESCRIPTION
   DateTime::Format::Builder creates DateTime parsers. Many string formats
   of dates and times are simple and just require a basic regular
   expression to extract the relevant information. Builder provides a
   simple way to do this without writing reams of structural code.

   Builder provides a number of methods, most of which you'll never need,
   or at least rarely need. They're provided more for exposing of the
   module's innards to any subclasses, or for when you need to do something
   slightly beyond what I expected.

TUTORIAL
   See DateTime::Format::Builder::Tutorial.

ERROR HANDLING AND BAD PARSES
   Often, I will speak of `undef' being returned, however that's not
   strictly true.

   When a simple single specification is given for a method, the method
   isn't given a single parser directly. It's given a wrapper that will
   call `on_fail()' if the single parser returns `undef'. The single parser
   must return `undef' so that a multiple parser can work nicely and actual
   errors can be thrown from any of the callbacks.

   Similarly, any multiple parsers will only call `on_fail()' right at the
   end when it's tried all it could.

   `on_fail()' (see later) is defined, by default, to throw an error.

   Multiple parser specifications can also specify `on_fail' with a coderef
   as an argument in the options block. This will take precedence over the
   inheritable and over-ridable method.

   That said, don't throw real errors from callbacks in multiple parser
   specifications unless you really want parsing to stop right there and
   not try any other parsers.

   In summary: calling a method will result in either a `DateTime' object
   being returned or an error being thrown (unless you've overridden
   `on_fail()' or `create_method()', or you've specified a `on_fail' key to
   a multiple parser specification).

   Individual parsers (be they multiple parsers or single parsers) will
   return either the `DateTime' object or `undef'.

SINGLE SPECIFICATIONS
   A single specification is a hash ref of instructions on how to create a
   parser.

   The precise set of keys and values varies according to parser type.
   There are some common ones though:

   *   length is an optional parameter that can be used to specify that
       this particular *regex* is only applicable to strings of a certain
       fixed length. This can be used to make parsers more efficient. It's
       strongly recommended that any parser that can use this parameter
       does.

       You may happily specify the same length twice. The parsers will be
       tried in order of specification.

       You can also specify multiple lengths by giving it an arrayref of
       numbers rather than just a single scalar. If doing so, please keep
       the number of lengths to a minimum.

       If any specifications without *length*s are given and the particular
       *length* parser fails, then the non-*length* parsers are tried.

       This parameter is ignored unless the specification is part of a
       multiple parser specification.

   *   label provides a name for the specification and is passed to some of
       the callbacks about to mentioned.

   *   on_match and on_fail are callbacks. Both routines will be called
       with parameters of:

       *   input, being the input to the parser (after any preprocessing
           callbacks).

       *   label, being the label of the parser, if there is one.

       *   self, being the object on which the method has been invoked
           (which may just be a class name). Naturally, you can then invoke
           your own methods on it do get information you want.

       *   args, being an arrayref of any passed arguments, if any. If
           there were no arguments, then this parameter is not given.

       These routines will be called depending on whether the regex match
       succeeded or failed.

   *   preprocess is a callback provided for cleaning up input prior to
       parsing. It's given a hash as arguments with the following keys:

       *   input being the datetime string the parser was given (if using
           multiple specifications and an overall *preprocess* then this is
           the date after it's been through that preprocessor).

       *   parsed being the state of parsing so far. Usually empty at this
           point unless an overall *preprocess* was given. Items may be
           placed in it and will be given to any postprocessor and
           `DateTime->new' (unless the postprocessor deletes it).

       *   self, args, label as per *on_match* and *on_fail*.

       The return value from the routine is what is given to the *regex*.
       Note that this is last code stop before the match.

       Note: mixing *length* and a *preprocess* that modifies the length of
       the input string is probably not what you meant to do. You probably
       meant to use the *multiple parser* variant of *preprocess* which is
       done before any length calculations. This `single parser' variant of
       *preprocess* is performed after any length calculations.

   *   postprocess is the last code stop before `DateTime->new()' is
       called. It's given the same arguments as *preprocess*. This allows
       it to modify the parsed parameters after the parse and before the
       creation of the object. For example, you might use:

           {
               regex  => qr/^(\d\d) (\d\d) (\d\d)$/,
               params => [qw( year  month  day   )],
               postprocess => \&_fix_year,
           }

       where `_fix_year' is defined as:

           sub _fix_year
           {
               my %args = @_;
               my ($date, $p) = @args{qw( input parsed )};
               $p->{year} += $p->{year} > 69 ? 1900 : 2000;
               return 1;
           }

       This will cause the two digit years to be corrected according to the
       cut off. If the year was '69' or lower, then it is made into 2069
       (or 2045, or whatever the year was parsed as). Otherwise it is
       assumed to be 19xx. The DateTime::Format::Mail module uses code
       similar to this (only it allows the cut off to be configured and it
       doesn't use Builder).

       Note: It is very important to return an explicit value from the
       *postprocess* callback. If the return value is false then the parse
       is taken to have failed. If the return value is true, then the parse
       is taken to have succeeded and `DateTime->new()' is called.

   See the documentation for the individual parsers for their valid keys.

   Parsers at the time of writing are:

   *   DateTime::Format::Builder::Parser::Regex - provides regular
       expression based parsing.

   *   DateTime::Format::Builder::Parser::Strptime - provides strptime
       based parsing.

 Subroutines / coderefs as specifications.
   A single parser specification can be a coderef. This was added mostly
   because it could be and because I knew someone, somewhere, would want to
   use it.

   If the specification is a reference to a piece of code, be it a
   subroutine, anonymous, or whatever, then it's passed more or less
   straight through. The code should return `undef' in event of failure (or
   any false value, but `undef' is strongly preferred), or a true value in
   the event of success (ideally a `DateTime' object or some object that
   has the same interface).

   This all said, I generally wouldn't recommend using this feature unless
   you have to.

 Callbacks
   I mention a number of callbacks in this document.

   Any time you see a callback being mentioned, you can, if you like,
   substitute an arrayref of coderefs rather than having the straight
   coderef.

MULTIPLE SPECIFICATIONS
   These are very easily described as an array of single specifications.

   Note that if the first element of the array is an arrayref, then you're
   specifying options.

   *   preprocess lets you specify a preprocessor that is called before any
       of the parsers are tried. This lets you do things like strip off
       timezones or any unnecessary data. The most common use people have
       for it at present is to get the input date to a particular length so
       that the *length* is usable (DateTime::Format::ICal would use it to
       strip off the variable length timezone).

       Arguments are as for the *single parser* *preprocess* variant with
       the exception that *label* is never given.

   *   on_fail should be a reference to a subroutine that is called if the
       parser fails. If this is not provided, the default action is to call
       `DateTime::Format::Builder::on_fail', or the `on_fail' method of the
       subclass of DTFB that was used to create the parser.

EXECUTION FLOW
   Builder allows you to plug in a fair few callbacks, which can make
   following how a parse failed (or succeeded unexpectedly) somewhat
   tricky.

 For Single Specifications
   A single specification will do the following:

   User calls parser:

          my $dt = $class->parse_datetime( $string );

   1   *preprocess* is called. It's given `$string' and a reference to the
       parsing workspace hash, which we'll call `$p'. At this point, `$p'
       is empty. The return value is used as `$date' for the rest of this
       single parser. Anything put in `$p' is also used for the rest of
       this single parser.

   2   *regex* is applied.

   3   If *regex* did not match, then *on_fail* is called (and is given
       `$date' and also *label* if it was defined). Any return value is
       ignored and the next thing is for the single parser to return
       `undef'.

       If *regex* did match, then *on_match* is called with the same
       arguments as would be given to *on_fail*. The return value is
       similarly ignored, but we then move to step 4 rather than exiting
       the parser.

   4   *postprocess* is called with `$date' and a filled out `$p'. The
       return value is taken as a indication of whether the parse was a
       success or not. If it wasn't a success then the single parser will
       exit at this point, returning undef.

   5   `DateTime->new()' is called and the user is given the resultant
       `DateTime' object.

   See the section on error handling regarding the `undef's mentioned
   above.

 For Multiple Specifications
   With multiple specifications:

   User calls parser:

         my $dt = $class->complex_parse( $string );

   1   The overall *preprocess*or is called and is given `$string' and the
       hashref `$p' (identically to the per parser *preprocess* mentioned
       in the previous flow).

       If the callback modifies `$p' then a copy of `$p' is given to each
       of the individual parsers. This is so parsers won't accidentally
       pollute each other's workspace.

   2   If an appropriate length specific parser is found, then it is called
       and the single parser flow (see the previous section) is followed,
       and the parser is given a copy of `$p' and the return value of the
       overall *preprocess*or as `$date'.

       If a `DateTime' object was returned so we go straight back to the
       user.

       If no appropriate parser was found, or the parser returned `undef',
       then we progress to step 3!

   3   Any non-*length* based parsers are tried in the order they were
       specified.

       For each of those the single specification flow above is performed,
       and is given a copy of the output from the overall preprocessor.

       If a real `DateTime' object is returned then we exit back to the
       user.

       If no parser could parse, then an error is thrown.

   See the section on error handling regarding the `undef's mentioned
   above.

METHODS
   In the general course of things you won't need any of the methods. Life
   often throws unexpected things at us so the methods are all available
   for use.

 import
   `import()' is a wrapper for `create_class()'. If you specify the *class*
   option (see documentation for `create_class()') it will be ignored.

 create_class
   This method can be used as the runtime equivalent of `import()'. That
   is, it takes the exact same parameters as when one does:

      use DateTime::Format::Builder ( blah blah blah )

   That can be (almost) equivalently written as:

      use DateTime::Format::Builder;
      DateTime::Format::Builder->create_class( blah blah blah );

   The difference being that the first is done at compile time while the
   second is done at run time.

   In the tutorial I said there were only two parameters at present. I
   lied. There are actually three of them.

   *   parsers takes a hashref of methods and their parser specifications.
       See the tutorial above for details.

       Note that if you define a subroutine of the same name as one of the
       methods you define here, an error will be thrown.

   *   constructor determines whether and how to create a `new()' function
       in the new class. If given a true value, a constructor is created.
       If given a false value, one isn't.

       If given an anonymous sub or a reference to a sub then that is used
       as `new()'.

       The default is `1' (that is, create a constructor using our default
       code which simply creates a hashref and blesses it).

       If your class defines its own `new()' method it will not be
       overwritten. If you define your own `new()' and also tell Builder to
       define one an error will be thrown.

   *   verbose takes a value. If the value is undef, then logging is
       disabled. If the value is a filehandle then that's where logging
       will go. If it's a true value, then output will go to `STDERR'.

       Alternatively, call `$DateTime::Format::Builder::verbose()' with the
       relevant value. Whichever value is given more recently is adhered
       to.

       Be aware that verbosity is a global wide setting.

   *   class is optional and specifies the name of the class in which to
       create the specified methods.

       If using this method in the guise of `import()' then this field will
       cause an error so it is only of use when calling as
       `create_class()'.

   *   version is also optional and specifies the value to give `$VERSION'
       in the class. It's generally not recommended unless you're combining
       with the *class* option. A `ExtUtils::MakeMaker' / `CPAN' compliant
       version specification is much better.

   In addition to creating any of the methods it also creates a `new()'
   method that can instantiate (or clone) objects.

SUBCLASSING
   In the rest of the documentation I've often lied in order to get some of
   the ideas across more easily. The thing is, this module's very flexible.
   You can get markedly different behaviour from simply subclassing it and
   overriding some methods.

 create_method
   Given a parser coderef, returns a coderef that is suitable to be a
   method.

   The default action is to call `on_fail()' in the event of a non-parse,
   but you can make it do whatever you want.

 on_fail
   This is called in the event of a non-parse (unless you've overridden
   `create_method()' to do something else.

   The single argument is the input string. The default action is to call
   `croak()'. Above, where I've said parsers or methods throw errors, this
   is the method that is doing the error throwing.

   You could conceivably override this method to, say, return `undef'.

USING BUILDER OBJECTS aka USERS USING BUILDER
   The methods listed in the METHODS section are all you generally need
   when creating your own class. Sometimes you may not want a full blown
   class to parse something just for this one program. Some methods are
   provided to make that task easier.

 new
   The basic constructor. It takes no arguments, merely returns a new
   `DateTime::Format::Builder' object.

       my $parser = DateTime::Format::Builder->new();

   If called as a method on an object (rather than as a class method), then
   it clones the object.

       my $clone = $parser->new();

 clone
   Provided for those who prefer an explicit `clone()' method rather than
   using `new()' as an object method.

       my $clone_of_clone = $clone->clone();

 parser
   Given either a single or multiple parser specification, sets the object
   to have a parser based on that specification.

       $parser->parser(
           regex  => qr/^ (\d{4}) (\d\d) (\d\d) $/x;
           params => [qw( year    month  day    )],
       );

   The arguments given to `parser()' are handed directly to
   `create_parser()'. The resultant parser is passed to `set_parser()'.

   If called as an object method, it returns the object.

   If called as a class method, it creates a new object, sets its parser
   and returns that object.

 set_parser
   Sets the parser of the object to the given parser.

      $parser->set_parser( $coderef );

   Note: this method does not take specifications. It also does not take
   anything except coderefs. Luckily, coderefs are what most of the other
   methods produce.

   The method return value is the object itself.

 get_parser
   Returns the parser the object is using.

      my $code = $parser->get_parser();

 parse_datetime
   Given a string, it calls the parser and returns the `DateTime' object
   that results.

      my $dt = $parser->parse_datetime( "1979 07 16" );

   The return value, if not a `DateTime' object, is whatever the parser
   wants to return. Generally this means that if the parse failed an error
   will be thrown.

 format_datetime
   If you call this function, it will throw an errror.

LONGER EXAMPLES
   Some longer examples are provided in the distribution. These implement
   some of the common parsing DateTime modules using Builder. Each of them
   are, or were, drop in replacements for the modules at the time of
   writing them.

THANKS
   Dave Rolsky (DROLSKY) for kickstarting the DateTime project, writing
   DateTime::Format::ICal and DateTime::Format::MySQL, and some much needed
   review.

   Joshua Hoblitt (JHOBLITT) for the concept, some of the API, impetus for
   writing the multilength code (both one length with multiple parsers and
   single parser with multiple lengths), blame for the Regex custom
   constructor code, spotting a bug in Dispatch, and more much needed
   review.

   Kellan Elliott-McCrea (KELLAN) for even more review, suggestions,
   DateTime::Format::W3CDTF and the encouragement to rewrite these docs
   almost 100%!

   Claus F�rber (CFAERBER) for having me get around to fixing the
   auto-constructor writing, providing the 'args'/'self' patch, and
   suggesting the multi-callbacks.

   Rick Measham (RICKM) for DateTime::Format::Strptime which Builder now
   supports.

   Matthew McGillis for pointing out that `on_fail' overriding should be
   simpler.

   Simon Cozens (SIMON) for saying it was cool.

SUPPORT
   Support for this module is provided via the [email protected] email
   list. See http://lists.perl.org/ for more details.

   Alternatively, log them via the CPAN RT system via the web or email:

       http://rt.cpan.org/NoAuth/ReportBug.html?Queue=DateTime%3A%3AFormat%3A%3ABuilder
       [email protected]

   This makes it much easier for me to track things and thus means your
   problem is less likely to be neglected.

LICENCE AND COPYRIGHT
   Copyright E<copy> Iain Truskett, 2003. All rights reserved.

   This library is free software; you can redistribute it and/or modify it
   under the same terms as Perl itself, either Perl version 5.000 or, at
   your option, any later version of Perl 5 you may have available.

   The full text of the licences can be found in the Artistic and COPYING
   files included with this module, or in perlartistic and perlgpl as
   supplied with Perl 5.8.1 and later.

AUTHOR
   Originally written by Iain Truskett <[email protected]>, who died on
   December 29, 2003.

   Maintained by Dave Rolsky <[email protected]>.

SEE ALSO
   `[email protected]' mailing list.

   http://datetime.perl.org/

   perl, DateTime, DateTime::Format::Builder::Tutorial,
   DateTime::Format::Builder::Parser