=encoding utf8

=head1 NAME

Geo::libpostal - Perl bindings for libpostal

=head1 SYNOPSIS

 use Geo::libpostal ':all';

 # normalize an address
 my @addresses = expand_address('120 E 96th St New York');

 # parse addresses into their components
 my %address = parse_address('The Book Club 100-106 Leonard St Shoreditch London EC2A 4RH, United Kingdom');

 # %address contains:
 # (
 #   road         => 'leonard st',
 #   postcode     => 'ec2a 4rh',
 #   house        => 'the book club',
 #   house_number => '100-106',
 #   suburb       => 'shoreditch',
 #   country      => 'united kingdom',
 #   city         => 'london'
 # );

=head1 DESCRIPTION

libpostal is a C library for parsing and normalizing international street addresses. C<expand_address> normalizes an address string and returns a list of valid variations, which you can use to check for duplicates in your dataset. It supports normalization in over L<60 languages|https://github.com/openvenues/libpostal/tree/master/resources/dictionaries>. C<parse_address> splits an address string into its constituent parts, such as house name, number, city and postcode.
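
As a rough sketch of the duplicate check described above, two address strings can be treated as candidates for the same address when their expansions overlap. The helper C<same_address> is hypothetical and not part of this module; it only relies on C<expand_address> returning a list of strings.

 use strict;
 use warnings;
 use Geo::libpostal 'expand_address';

 # Hypothetical helper (not provided by Geo::libpostal): returns the
 # number of expansions the two address strings have in common.
 sub same_address {
   my ($first, $second) = @_;
   my %seen = map { $_ => 1 } expand_address($first);
   return scalar grep { $seen{$_} } expand_address($second);
 }

 print "possible duplicate\n"
   if same_address('120 E 96th St New York', '120 East 96th Street New York');

A shared expansion is only a hint that two records match, so it is probably best used to shortlist candidates rather than to merge records automatically.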

=head1 FUNCTIONS

=head2 expand_address

 use Geo::libpostal 'expand_address';

 my @ny_addresses = expand_address('120 E 96th St New York');
 my @fr_addresses = expand_address('Quatre vingt douze R. de l\'Église');

Takes an address string and returns a list of known variants. Useful for normalization. Accepts many boolean options:

 expand_address('120 E 96th St New York',
     latin_ascii => 1,
     transliterate => 1,
     strip_accents => 1,
     decompose => 1,
     lowercase => 1,
     trim_string => 1,
     drop_parentheticals => 1,
     replace_numeric_hyphens => 1,
     delete_numeric_hyphens => 1,
     split_alpha_from_numeric => 1,
     replace_word_hyphens => 1,
     delete_word_hyphens => 1,
     delete_final_periods => 1,
     delete_acronym_periods => 1,
     drop_english_possessives => 1,
     delete_apostrophes => 1,
     expand_numex => 1,
     roman_numerals => 1,
 );

B<Warning>: old versions of libpostal L<segfault|https://github.com/openvenues/libpostal/issues/79> if all options are set to false. C<Geo::libpostal> includes a unit test for this.

Also accepts an arrayref of language codes per L<ISO 639-1|https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes>:

 expand_address('120 E 96th St New York', languages => [qw(en fr)]);

This is useful if you are normalizing addresses in multiple languages.

Finally, C<expand_address> accepts a C<components> option that selects which address components to expand. This is a 16-bit integer bitmask. These constants are exported with the C<:all> tag:

 $ADDRESS_NONE
 $ADDRESS_ANY
 $ADDRESS_NAME
 $ADDRESS_HOUSE_NUMBER
 $ADDRESS_STREET
 $ADDRESS_UNIT
 $ADDRESS_LOCALITY
 $ADDRESS_ADMIN1
 $ADDRESS_ADMIN2
 $ADDRESS_ADMIN3
 $ADDRESS_ADMIN4
 $ADDRESS_ADMIN_OTHER
 $ADDRESS_COUNTRY
 $ADDRESS_POSTAL_CODE
 $ADDRESS_NEIGHBORHOOD
 $ADDRESS_ALL

These are the default components used by libpostal:

 use Geo::libpostal ':all';

 expand_address('120 E 96th St New York',
   components => $ADDRESS_NAME | $ADDRESS_HOUSE_NUMBER | $ADDRESS_STREET | $ADDRESS_UNIT
 );

The constant C<$ADDRESS_ALL> uses all components:

 expand_address('120 E 96th St New York',
   components => $ADDRESS_ALL
 );

C<expand_address> will C<die> on C<undef> and empty addresses, odd numbers of options and unrecognized options. Exported on request.
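
Because C<expand_address> dies on bad input, batch jobs may want to trap the exception instead of aborting. The following is a minimal sketch using a plain C<eval> block; the wrapper name C<safe_expand> is hypothetical and not part of this module.

 use strict;
 use warnings;
 use Geo::libpostal 'expand_address';

 # Hypothetical wrapper (not provided by Geo::libpostal): returns an
 # empty list and warns instead of dying on bad input.
 sub safe_expand {
   my ($address, @options) = @_;
   # In list context, eval returns an empty list if the block dies.
   my @expansions = eval { expand_address($address, @options) };
   warn "expand_address failed: $@" if $@;
   return @expansions;
 }

 my @none = safe_expand('');                        # empty address is trapped
 my @some = safe_expand('120 E 96th St New York');  # normal call

Options are passed through as a list rather than a hash, so an odd number of options still reaches C<expand_address> unchanged and is reported by it.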

=head2 parse_address

 use Geo::libpostal 'parse_address';

 my %ny_address = parse_address('120 E 96th St New York');
 my %fr_address = parse_address('Quatre vingt douze R. de l\'Église');


Takes an address string and returns a hash of its components. Will C<die> on C<undef> and empty addresses. Exported on request.

C<parse_address> may return L<duplicate labels|https://github.com/openvenues/libpostal/issues/27> for invalid address strings.
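
When duplicate labels matter, assigning the result to a hash silently keeps only one value per label. The sketch below assumes that C<parse_address> returns a flat list of label/value pairs (which is what the hash assignment in the synopsis suggests) and groups the values by label instead.

 use strict;
 use warnings;
 use Geo::libpostal 'parse_address';

 # Assumption: parse_address returns a flat list of label/value pairs.
 my @pairs = parse_address('120 E 96th St New York');

 # Collect every value seen for each label.
 my %by_label;
 while (my ($label, $value) = splice @pairs, 0, 2) {
   push @{ $by_label{$label} }, $value;
 }

 for my $label (sort keys %by_label) {
   my @values = @{ $by_label{$label} };
   warn "duplicate label '$label'\n" if @values > 1;
   print "$label: @values\n";
 }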

=head1 WARNING

libpostal uses C<setup> and C<teardown> functions. C<Geo::libpostal> runs setup lazily and runs teardown automatically in an C<END> block.

=over 4

=item *

Old versions of libpostal will L<segfault|https://github.com/openvenues/libpostal/issues/82> if C<_teardown()> is called twice (this module includes a unit test for this).

=item *

If C<expand_address> or C<parse_address> is called after teardown, old versions of libpostal will L<error|https://github.com/openvenues/libpostal/pull/86> (this module includes a unit test for this too).

=item *

libpostal is not L<thread-safe|https://github.com/openvenues/libpostal/issues/34>.

=back

=head1 EXTERNAL DEPENDENCIES

L<libpostal|https://github.com/openvenues/libpostal> is required. C<Geo::libpostal> uses pkg-config to support custom install locations. Sometimes pkg-config can't find the library by default. In that case, locate C<libpostal.pc> and add its directory to C<PKG_CONFIG_PATH>:

 $ find / -name libpostal.pc 2>/dev/null
 /usr/local/lib/pkgconfig/libpostal.pc

 $ export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

And then do a manual installation of C<Geo::libpostal>.

=head1 INSTALLATION

You can install this module with CPAN:

 $ cpan Geo::libpostal

Or clone it from GitHub and install it manually:

 $ git clone https://github.com/dnmfarrell/Geo-libpostal
 $ cd Geo-libpostal
 $ perl Makefile.PL
 $ make
 $ make test
 $ make install

=head1 AUTHOR

E<copy> 2016 David Farrell

=head1 LICENSE

See LICENSE


=cut