NAME
XML::Fast - Simple and very fast XML to hash conversion

SYNOPSIS
use XML::Fast;

my $hash = xml2hash $xml;
my $hash2 = xml2hash $xml, attr => '.', text => '~';
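
As a minimal sketch of the second call above, assuming XML::Fast is installed: the input string and the variable names are made up for illustration, and the shape of the result follows the attr and text options described under OPTIONS.

```perl
use strict;
use warnings;
use XML::Fast;

# Hypothetical input document, used only for illustration.
my $xml = '<node attr="test">text</node>';

# Store attributes under a '.' prefix and text under the '~' key.
my $hash = xml2hash($xml, attr => '.', text => '~');

my $attr = $hash->{node}{'.attr'};   # value of the attr attribute
my $text = $hash->{node}{'~'};       # text content of <node>

print "attr=$attr text=$text\n";
```

Picking short keys such as '.' and '~' keeps lookups compact, at the cost of differing from the default '-' and '#text' keys.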

DESCRIPTION
This module implements a simple, state-machine-based XML parser written
in C.

It can parse, and recover from, some kinds of broken XML. If you need a
validating XML parser, use XML::LibXML.

RATIONALE
A similar module is XML::Bare. I used it for some time, but it has
several problems:

* If your XML has a node named 'value', you get a segfault

* If a node contains a text node, then a CDATA node, then another
text node, you get a broken value

* It doesn't support charsets

* It doesn't support any kind of entities.

So, after a number of attempts to fix XML::Bare, I decided to write a
parser from scratch.

It is about 40% faster than XML::Bare and about 120% faster than
XML::LibXML.

I got these results using the following test on a 35 KB XML document:

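use Benchmark qw(cmpthese timethese);
use XML::LibXML;
use XML::Bare;
use XML::Fast;
# $doc holds the XML document as a string
cmpthese(timethese(-10, {
    libxml  => sub { XML::LibXML->new->parse_string($doc) },
    xmlfast => sub { XML::Fast::xml2hash($doc) },
    xmlbare => sub { XML::Bare->new(text => $doc)->parse },
}));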

          Rate  libxml xmlbare xmlfast
libxml  1107/s      --    -38%    -56%
xmlbare 1782/s     61%      --    -28%
xmlfast 2490/s    125%     40%      --

Of course, the results could be different for different XML files. With
non-UTF encodings and with many entities it could be slower. This test
used a sample RSS feed in UTF-8 with a small number of XML entities.

Here are some features and principles:

* It uses a minimal number of memory allocations.

* The whole XML document is parsed in a single pass.

* All values are copied from source XML only once (to destination
keys/values)

* If some types of nodes (for example, comments) are ignored, nothing
is allocated or copied for them.

EXPORT
xml2hash $xml, [ %options ]
OPTIONS
order [ = 0 ]
Not implemented yet. Strictly keep the output order. When enabled,
structures become more complex, but the original XML can be completely
reconstructed.

attr [ = '-' ]
Attribute prefix

<node attr="test" /> => { node => { -attr => "test" } }

text [ = '#text' ]
Key name for storing text

When undef, text nodes will be ignored

<node>text<sub/></node> => { node => { sub => '', '#text' => "text" } }

join [ = '' ]
Join separator for text nodes that are split by subnodes

Ignored when "order" is in effect

# default:
xml2hash( '<item>Test1<sub/>Test2</item>' )
: { item => { sub => '', '#text' => 'Test1Test2' } };

xml2hash( '<item>Test1<sub/>Test2</item>', join => '+' )
: { item => { sub => '', '#text' => 'Test1+Test2' } };

trim [ = 1 ]
Trim leading and trailing whitespace from text nodes

cdata [ = undef ]
When defined, CDATA sections will be stored under this key

# cdata = undef
<node><![CDATA[ test ]]></node> => { node => 'test' }

# cdata = '#'
<node><![CDATA[ test ]]></node> => { node => { '#' => 'test' } }

comm [ = undef ]
When defined, comments will be stored under this key

When undef, comments will be ignored

# comm = undef
<node><!-- comm --><sub/></node> => { node => { sub => '' } }

# comm = '/'
<node><!-- comm --><sub/></node> => { node => { sub => '', '/' => 'comm' } }

array => 1
Force all nodes to be kept as arrays.

# no array
<node><sub/></node> => { node => { sub => '' } }

# array = 1
<node><sub/></node> => { node => [ { sub => [ '' ] } ] }

array => [ 'node', 'names']
Force nodes with the listed names to be stored as arrays

# no array
<node><sub/></node> => { node => { sub => '' } }

# array => ['sub']
<node><sub/></node> => { node => { sub => [ '' ] } }
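
One practical use of the array option is to keep traversal code uniform. The sketch below assumes XML::Fast is installed and that repeated sibling nodes with the same name are collected into a single array reference (an assumption, not stated above). The element names list and item, and the helper items(), are hypothetical.

```perl
use strict;
use warnings;
use XML::Fast;

# Hypothetical documents: one with a single <item>, one with two.
my $one = '<list><item>a</item></list>';
my $two = '<list><item>a</item><item>b</item></list>';

# Return the <item> values as a flat Perl list. Forcing 'item' to be an
# array means the same dereference works for both documents.
sub items {
    my ($xml) = @_;
    my $hash = xml2hash($xml, array => ['item']);
    return @{ $hash->{list}{item} };
}

my @from_one = items($one);
my @from_two = items($two);

print scalar(@from_one), " item(s), then ", scalar(@from_two), " item(s)\n";
```

Without the option, the caller would have to check whether the item value is a plain value or an array reference before looping over it.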

SEE ALSO
* XML::Bare

Another fast parser, but it has problems

* XML::LibXML

The most powerful XML parser for Perl, as long as you don't need to
parse gigabytes of XML ;)

* XML::Hash::LX

An XML parser that uses XML::LibXML for parsing and then constructs
a hash structure identical to the one generated by this module (at
least, it should ;)). But of course it is much slower than
XML::Fast.

TODO
* Ordered mode (as implemented in XML::Hash::LX)

* Create hash2xml, identical to the one in XML::Hash::LX

* Partial content event-based parsing (I need this for reading XML
streams)

Patches, suggestions and bug reports are welcome ;)

AUTHOR
Mons Anderson, <[email protected]>

COPYRIGHT AND LICENSE
Copyright (C) 2010 Mons Anderson

This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.