README for DTA::CAB
ABSTRACT
DTA::CAB - "Cascaded Analysis Broker" for error-tolerant linguistic
analysis
REQUIREMENTS
Perl Modules
See "Makefile.PL", "META.json", and/or "META.yml" in the
distribution directory. Perl dependencies should be available on
CPAN <
http://metacpan.org>.
Additional Perl modules may be required by particular
DTA::CAB::Analyzer subclasses. If you see errors like
Can't locate foo.pm in @INC (you may need to install the foo module)
... then you should probably first try looking for the "foo" module
on on CPAN <
http://metacpan.org>.
External Web-Service
If you just want to use the client libraries to query an external
"DTA::CAB" web-service, you'll need only the URL for that service
and an active internet connection. See the DTA::CAB Web-Service
HOWTO
<
http://odo.dwds.de/~jurish/software/DTA-CAB/doc/html/DTA.CAB.WebSer
viceHowto.html> for an introduction.
Language Resources
If you want to do anything other than querying an external
"DTA::CAB" web-service, you'll need a small menagerie of "gfsm"
transducers and various assorted other language(-variant)-specific
resources which are not included in this distribution, and for which
(presumably) there exists no "one-size-fits-all" solution. Look at
the documentation and code of the individual DTA::CAB::Analyzer
subclasses you're interested in for more details.
DESCRIPTION
The DTA::CAB package provides an object-oriented compiler/interpreter
for error-tolerant heuristic morphological analysis of tokenized text.
INSTALLATION
Issue the following commands to the shell:
bash$ cd DTA-CAB-0.01 # (or wherever you unpacked this distribution)
bash$ perl Makefile.PL # check requirements, etc.
bash$ make # build the module
bash$ make test # (optional): test module before installing
bash$ make install # install the module on your system
REFERENCES
If you use this service in an academic context, please include the
following citation in any related publications:
* Jurish, Bryan. *Finite-state Canonicalization Techniques for
Historical German.* PhD thesis, Universität Potsdam, 2012 (defended
2011). URN urn:nbn:de:kobv:517-opus-55789, [online
<
http://opus.kobv.de/ubp/volltexte/2012/5578/>, PDF
<
http://kaskade.dwds.de/~jurish/pubs/jurish2012diss.pdf>, BibTeX
<
http://kaskade.dwds.de/~jurish/pubs/jurish2012diss.bib>]
See here <
http://odo.dwds.de/~jurish/software/dta-cab/#pubs> for a list
of other CAB-related publications.
SEE ALSO
* The CAB software page <
http://odo.dwds.de/~jurish/software/dta-cab/>
is the top-level repository for CAB documentation, news, etc.
* The DTA::CAB manual page contains a basic introduction to the the
CAB architecture.
* The DTA::CAB::Format manual page describes the abstract CAB I/O
Format API, and includes a list of supported format classes.
* The DTA::CAB::HttpProtocol manual page describes the conventions
used by the CAB web-service API.
* The DTA 'Base Format' Guidelines (DTABf)
<
http://www.deutschestextarchiv.de/doku/basisformat> describes the
subset of the TEI <
http://www.tei-c.org/> encoding guidelines which
can reasonably be expected to be handled gracefully by the CAB TEI
and/or TEIws formatters.
AUTHOR
Bryan Jurish <
[email protected]>