NAME
   Wais - access to freeWAIS-sf libraries

SYNOPSIS
   `require Wais;'

DESCRIPTION
   The interface is divided into four major parts.

   SFgate 4.0
             For backward compatibility the functions used in
             SFgate up to version 4 are still present. Their use is
             deprecated and they are not documented here. These
             functions may not be supported in future versions of
             this module.

   Protocol  XS functions that provide low-level access to the WAIS
             protocol. E.g. `generate_search_apdu()' constructs a
             request message.

   SFgate 5  Perl functions that implement high-level access to WAIS
             servers. E.g., parallel searching is supported.

   dictionary
             A bunch of XS functions useful for inspecting local
             databases.


   We will start with the SFgate 5 functions.

USAGE
   The main high-level interface consists of the functions `Wais::Search'
   and `Wais::Retrieve'. Both return a reference to an object of
   the class `Wais::Result'.

 Wais::Search

   Arguments of `Wais::Search' are hash references, one for each
   database to search. The keys of the hashes should be:

   query     The query to submit.

   database  The database which should be searched.

   host      host is optional. It defaults to `'localhost''.

   port      port is optional. It defaults to `210'.

   tag       A tag by which individual results can be associated with a
             database/host/port triple. If omitted, it defaults to the
             database name.

   relevant  If present, this must be a reference to an array
             containing alternating document IDs and types. Document
             IDs must be of type `Wais::Docid'.

             Here is a complete example:

                  $result = Wais::Search({'query'    => 'pfeifer',
                                          'database' => $db1,
                                          'host'     => 'ls6',
                                          'relevant' => [$id, 'TEXT']},
                                         {'query'    => 'pfeifer',
                                          'database' => $db2});


             If *host* is `'localhost'' and the file *database*`.src'
             exists, a local search is performed instead of
             connecting to a server.

             `Wais::Search' will open at most `$Wais::maxnumfd'
             connections in parallel.
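             When the same query should go to several databases, the
             hash references can be built in a loop. The sketch below
             is plain Perl and does not call the module; the helper
             `search_requests' is hypothetical, and it spells out the
             documented defaults (host `localhost', port `210', tag =
             database name) only to make them visible.

```perl
use strict;
use warnings;

# Hypothetical helper: one request hash reference per database,
# with the documented defaults for host, port and tag written out.
sub search_requests {
    my ($query, @databases) = @_;
    my @requests;
    for my $db (@databases) {
        push @requests, {
            query    => $query,
            database => $db,
            host     => 'localhost',   # documented default
            port     => 210,           # documented default
            tag      => $db,           # documented default: database name
        };
    }
    return @requests;
}

my @requests = search_requests('pfeifer', 'db1', 'db2');
printf "%s -> %s:%d\n", $_->{tag}, $_->{host}, $_->{port} for @requests;
# With the module installed, the list could be passed on as:
#   my $result = Wais::Search(@requests);
```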

 Wais::Retrieve

             `Wais::Retrieve' should be called with named
             parameters (i.e. a hash). Valid parameters are
             database, host, port, docid, and type.

                     $result = Wais::Retrieve('database' => $db,
                                              'docid'    => $id,
                                              'host'     => 'ls6',
                                              'type'     => 'TEXT');


             Defaults are the same as for `Wais::Search'. In
             addition, type defaults to `'TEXT''.
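             Named parameters with defaults follow a common Perl
             idiom: list the defaults first and let the caller's pairs
             override them. This is a sketch of the idiom, not the
             module's code; `retrieve_args' is a hypothetical name,
             and treating database and docid as required is an
             assumption, since no defaults are documented for them.

```perl
use strict;
use warnings;

# Hypothetical helper: merge caller arguments over the documented
# defaults for Wais::Retrieve. In a hash list, later pairs win.
sub retrieve_args {
    my %args = (
        host => 'localhost',
        port => 210,
        type => 'TEXT',
        @_,
    );
    # Assumed required: no defaults are documented for these.
    for my $required (qw(database docid)) {
        die "missing '$required'\n" unless exists $args{$required};
    }
    return \%args;
}

my $args = retrieve_args(database => 'db', docid => 'placeholder-id',
                         host     => 'ls6');
print join(', ', map { "$_=$args->{$_}" } sort keys %$args), "\n";
```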

 `Wais::Result'

             The functions `Wais::Search' and `Wais::Retrieve'
             return references to objects blessed into
             `Wais::Result'. The following methods are available:

   diagnostics         Returns an array of diagnostic messages. Each
                       element (if any) is a reference to an array
                       consisting of

                  tag                 The tag of the corresponding
                                      search request or
                                      `'document'' if the request
                                      was a retrieve request.

                  code                The WAIS diagnostic code.

                  message             A textual diagnostic message.


   header              Returns an array of WAIS document headers. Each
                       element (if any) is a reference to an array
                       consisting of

                  tag                 The tag of the corresponding
                                      search request or
                                      `'document'' if the request
                                      was a retrieve request.

                  score               The relevance score of the
                                      document.

                  lines               Length of the corresponding
                                      document in lines.

                  length              Length of the corresponding
                                      document in bytes.

                  headline            The headline of the document.

                  types               A reference to an array of types
                                      valid for docid.

                  docid               A reference to the WAIS
                                      identifier blessed into
                                      `Wais::Docid'.


   text                Returns the text fetched by `Wais::Retrieve'.
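             A typical consumer merges the hits from all databases and
             orders them by score. The sketch below runs without the
             module by using mock entries in the documented `header'
             layout; the scores, headlines and docid strings are
             invented placeholders (real docids are `Wais::Docid'
             objects), and `by_score' is a hypothetical helper.

```perl
use strict;
use warnings;

# Mock entries shaped like the elements of $result->header:
# [tag, score, lines, length, headline, types, docid]
my @header = (
    ['db1', 1000, 12,  480, 'First hit',  ['TEXT'],         'docid-a'],
    ['db2',  350, 40, 2048, 'Second hit', ['TEXT', 'HTML'], 'docid-b'],
    ['db1',  720,  5,  190, 'Third hit',  ['TEXT'],         'docid-c'],
);

# Hypothetical helper: order hits from all databases by descending score.
sub by_score {
    return sort { $b->[1] <=> $a->[1] } @_;
}

for my $hit (by_score(@header)) {
    my ($tag, $score, undef, $length, $headline, $types) = @$hit;
    printf "%-4s %5d %6d bytes  %s [%s]\n",
        $tag, $score, $length, $headline, join('/', @$types);
}
```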


Dictionary
             There are a couple of functions to inspect local
             databases. See the inspect script in the distribution.
             You need the Curses module to run it, and you should
             adapt the directory settings at the top of the script.

 Wais::dictionary

                    %frequency = Wais::dictionary($database);
                    %frequency = Wais::dictionary($database, $field);
                    %frequency = Wais::dictionary($database, 'foo*');
                    %frequency = Wais::dictionary($database,  $field, 'foo*');


             In list context, the function returns a list that
             alternates between each word of the global dictionary
             (or of the field dictionary, if `$field' is given) that
             matches the prefix, if one is given, and the frequency
             of that word. In scalar context, the number of matching
             words is returned.
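             Because the result alternates words and frequencies, it
             can be assigned to a hash and ranked. The sketch uses an
             invented mock return value instead of calling the module;
             `top_words' is a hypothetical helper.

```perl
use strict;
use warnings;

# Mock return value in the documented layout: word, frequency, ...
# With the module one would write:
#   my %frequency = Wais::dictionary($database, 'foo*');
my %frequency = (foo => 12, food => 3, fool => 7, foot => 7);

# Hypothetical helper: the $n most frequent words, ties alphabetical.
sub top_words {
    my ($n, %freq) = @_;
    my @sorted = sort { $freq{$b} <=> $freq{$a} or $a cmp $b } keys %freq;
    $n = @sorted if $n > @sorted;
    return @sorted[0 .. $n - 1];
}

print join(' ', top_words(3, %frequency)), "\n";   # foo fool foot
```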

 Wais::list_offset

             The function takes the same arguments as
             `Wais::dictionary'. It returns the same list (or, in
             scalar context, the same word count), with the word
             frequencies replaced by the offsets of the posting
             lists in the inverted file.

 Wais::postings

                    %postings = Wais::postings($database, 'foo');
                    %postings = Wais::postings($database, $field, 'foo');


             Returns a list that alternates between numeric document
             IDs and references to arrays. The first element of each
             array is the internal weight of the word with respect to
             the document; the remaining elements are the word or
             character positions at which the word occurs in the
             document. If freeWAIS-sf is compiled with `-DPROXIMITY',
             word positions are returned; otherwise, character
             positions.

             In scalar context, the number of occurrences of the word
             is returned.
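             The nested layout can be unpacked as shown below. The
             postings are invented mock data, `occurrences' is a
             hypothetical helper, and the assumption that its total
             matches the scalar-context value is not confirmed here.

```perl
use strict;
use warnings;

# Mock value in the documented layout of Wais::postings:
#   numeric docid => [weight, position, position, ...]
# Positions are word or character offsets depending on -DPROXIMITY.
my %postings = (
    3  => [0.42, 5, 17, 90],
    11 => [0.10, 2],
);

# Hypothetical helper: total occurrences over all documents
# (assumed to match what Wais::postings reports in scalar context).
sub occurrences {
    my (%p) = @_;
    my $total = 0;
    $total += @{ $p{$_} } - 1 for keys %p;   # minus 1: skip the weight
    return $total;
}

for my $doc (sort { $a <=> $b } keys %postings) {
    my ($weight, @positions) = @{ $postings{$doc} };
    print "doc $doc: weight $weight, positions @positions\n";
}
print occurrences(%postings), " occurrences\n";   # 4 occurrences
```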

 Wais::headline

                    $headline = Wais::headline($database, $docid);


             The function retrieves the headline (only the text!)
             of the document numbered `$docid'.

 Wais::document

                    $text = &Wais::document($database, $docid);


             The function retrieves the text of the document
             numbered `$docid'.

Protocol
 Wais::generate_search_apdu

                    $apdu = Wais::generate_search_apdu($query,$database);
                    $relevant = [$id1, 'TEXT', $id2, 'HTML'];
                    $apdu = Wais::generate_search_apdu($query,$database,$relevant);


             Document IDs must be of type `Wais::Docid' as
             returned by `Wais::Result::header' or
             `Wais::Search::header'. `$Wais::maxdoc' may be set to
             change the maximum number of documents returned.
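             For relevance feedback, the `$relevant' list can be
             derived from the best hits of an earlier search. This
             sketch uses mock entries in the `Wais::Search' header
             layout `[$score, $lines, $length, $headline, $types,
             $docid]', with strings standing in for `Wais::Docid'
             objects; `relevant_from_header' is hypothetical, and
             choosing each document's first listed type is an
             assumption.

```perl
use strict;
use warnings;

# Mock entries: [score, lines, length, headline, types, docid]
my @header = (
    [900, 10, 400, 'Hit one',   ['TEXT', 'HTML'], 'docid-1'],
    [650, 22, 900, 'Hit two',   ['HTML'],         'docid-2'],
    [100,  3,  80, 'Hit three', ['TEXT'],         'docid-3'],
);

# Hypothetical helper: the best $k hits as an alternating docid/type
# list, using the first type listed for each document.
sub relevant_from_header {
    my ($k, @hits) = @_;
    my @best = (sort { $b->[0] <=> $a->[0] } @hits)[0 .. $k - 1];
    # Out-of-range slice elements are undef; drop them.
    return [ map { ($_->[5], $_->[4][0]) } grep { defined } @best ];
}

my $relevant = relevant_from_header(2, @header);
print "@$relevant\n";   # docid-1 TEXT docid-2 HTML
```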

 Wais::generate_retrieval_apdu

                    $apdu = Wais::generate_retrieval_apdu($database, $docid, $type);
                    $apdu = Wais::generate_retrieval_apdu($database, $docid,
                                                          $type, $chunk);


             Generates a request for chunk number `$chunk' of the
             document whose ID is `$docid' (which must be of type
             `Wais::Docid'). `$chunk' defaults to `0'. Set
             `$Wais::CHARS_PER_PAGE' to change the chunk size.
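             Assuming chunks are numbered from 0 and each holds
             `$Wais::CHARS_PER_PAGE' bytes, fetching a whole document
             takes one retrieval request per chunk. The arithmetic is
             sketched below; `chunk_plan' is hypothetical, and the
             exact chunk boundaries a server uses are an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: chunk numbers and byte ranges needed to fetch a
# document of $length bytes when each request returns at most
# $chars_per_page bytes (cf. $Wais::CHARS_PER_PAGE, default 4096).
sub chunk_plan {
    my ($length, $chars_per_page) = @_;
    my $chunks = int(($length + $chars_per_page - 1) / $chars_per_page);
    return map {
        my $start = $_ * $chars_per_page;
        my $end   = $start + $chars_per_page;
        $end = $length if $end > $length;
        [$_, $start, $end];    # [chunk, first byte, one past last byte]
    } 0 .. $chunks - 1;
}

for my $c (chunk_plan(10_000, 4096)) {
    printf "chunk %d: bytes %d-%d\n", $c->[0], $c->[1], $c->[2] - 1;
}
```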

 Wais::local_answer

                    $answer = Wais::local_answer($apdu);


             Answers the request by local search or retrieval. For
             convenience, the message header is stripped from the
             result (see the code of `Wais::Search' and the
             documentation of `Wais::Search::new' below).

 Wais::Search::new

                    $result = Wais::Search::new($message);


             Turns the result message into an object of type
             `Wais::Search'. The following methods are available:
             diagnostics, header, and text. They return essentially
             the same data as the methods of `Wais::Result', except
             that the tags are missing.

 Wais::Docid::new

                    $result = new Wais::Docid($distserver, $distdb, $distid,
                                  $copyright,  $origserver, $origdb, $origid);


             Only the first four arguments are mandatory.

 Wais::Docid::split

                    ($distserver, $distdb, $distid, $copyright, $origserver,
                     $origdb, $origid) = Wais::Docid::split($result);
                    ($distserver, $distdb, $distid) = Wais::Docid::split($result);
                    ($distserver, $distdb, $distid) = $result->split;


             The inverse of `Wais::Docid::new'.

             The methods of `Wais::Search' objects (see
             `Wais::Search::new') return the following:

   diagnostics         Returns an array of references to `[$code,
                       $message]'.

   header              Returns an array of references to `[$score,
                       $lines, $length, $headline, $types,
                       $docid]'.

   text                Returns the chunk of the document requested. For
                       documents larger than $Wais::CHARS_PER_PAGE
                       more than one request must be sent.


 Wais::Search::DESTROY

             The objects are destroyed automatically by Perl.

VARIABLES
   $Wais::version      Generated by: `sprintf(buf, "Wais %3.1f%d",
                       VERSION, PATCHLEVEL);'

   $Wais::errmsg       Set to a verbose error message if something
                       went wrong. Most functions return `undef' on
                       failure after setting `$Wais::errmsg'.

   $Wais::maxdoc       Maximum number of hits to return when searching.
                       Defaults to `40'.

   $Wais::CHARS_PER_PAGE
                       Maximum number of bytes to retrieve in a
                       single retrieve request. `Wais::Retrieve'
                       sends multiple requests if necessary to
                       retrieve a document. `CHARS_PER_PAGE'
                       defaults to `4096'.

   $Wais::timeout      Number of seconds to wait for an answer from
                       remote servers. Defaults to 120.

   $Wais::maxnumfd     Maximum number of file descriptors to use
                       simultaneously in `Wais::Search'. Defaults
                       to `10'.


Access to the basic freeWAIS-sf reduction functions
   Wais::Type::stemmer(*word*)
             reduces *word* using the well-known Porter algorithm.

               AU: Porter, M.F.
               TI: An Algorithm for Suffix Stripping
               JT: Program
               VO: 14
               PP: 130-137
               PY: 1980
               PM: JUL


   Wais::Type::soundex(*word*)
             computes the 4 byte Soundex code for *word*.

               AU: Gadd, T.N.
               TI: 'Fisching for Werds'. Phonetic Retrieval of written text in
                   Information Retrieval Systems
               JT: Program
               VO: 22
               NO: 3
               PP: 222-237
               PY: 1988
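
             For orientation, here is a sketch of the classic
             American Soundex rules in plain Perl. It is not the
             freeWAIS-sf code and may produce different codes from
             `Wais::Type::soundex' for some words; the name
             `soundex_sketch' is hypothetical.

```perl
use strict;
use warnings;

# Classic American Soundex: first letter plus three digits.
sub soundex_sketch {
    my ($word) = @_;
    my %code;
    @code{qw(b f p v)}         = (1) x 4;
    @code{qw(c g j k q s x z)} = (2) x 8;
    @code{qw(d t)}             = (3) x 2;
    $code{l}                   = 4;
    @code{qw(m n)}             = (5) x 2;
    $code{r}                   = 6;

    my @letters = grep { /[a-z]/ } split //, lc $word;
    return '' unless @letters;
    my $first  = shift @letters;
    my $result = uc $first;
    my $prev   = $code{$first} // 0;    # first letter's code blocks repeats
    for my $ch (@letters) {
        next if $ch eq 'h' || $ch eq 'w';   # h/w do not separate equal codes
        my $c = $code{$ch} // 0;            # vowels and y reset to 0
        $result .= $c if $c && $c != $prev;
        $prev = $c;
        last if length($result) == 4;
    }
    return substr($result . '000', 0, 4);   # pad with zeros
}

print "$_ => ", soundex_sketch($_), "\n" for qw(Robert Rupert Ashcraft);
# Robert => R163, Rupert => R163, Ashcraft => A261
```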


   Wais::Type::phonix(*word*)
             computes the 8 byte Phonix code for *word*.

               AU: Gadd, T.N.
               TI: PHONIX: The Algorithm
               JT: Program
               VO: 24
               NO: 4
               PP: 363-366
               PY: 1990
               PM: OCT


BUGS
             `Wais::Search' currently splits the requests into groups
             of `$Wais::maxnumfd' requests. Since some requests in
             a group might be local and/or some might refer to the
             same host/port, a group may not use all
             `$Wais::maxnumfd' possible file descriptors. Therefore
             some performance may be lost when more than
             `$Wais::maxnumfd' requests are processed.
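
             The effect can be pictured with the following plain-Perl
             sketch, which mirrors the description above rather than
             the module's actual code. `group_requests' and the
             `local' flag in the mock requests are hypothetical, and
             the sketch assumes local requests need no connection.

```perl
use strict;
use warnings;

# Hypothetical illustration: requests are cut into consecutive groups
# of at most $maxnumfd, however many of them need a connection.
sub group_requests {
    my ($maxnumfd, @requests) = @_;
    my @groups;
    push @groups, [ splice(@requests, 0, $maxnumfd) ] while @requests;
    return @groups;
}

# Mock requests; 'local' marks those that would be answered locally.
my @requests = map { +{ database => "db$_", local => ($_ % 3 == 0) } } 1 .. 7;

for my $group (group_requests(3, @requests)) {
    my $remote = grep { !$_->{local} } @$group;
    printf "group of %d, %d remote connection(s)\n", scalar @$group, $remote;
}
```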

AUTHORS
             Ulrich Pfeifer <[email protected]>,
             Norbert Goevert <[email protected]>