NAME
Wais - access to freeWAIS-sf libraries
SYNOPSIS
`require Wais;'
DESCRIPTION
The interface is divided in four major parts.
SFgate 4.0
For backward compatibility the functions used in
SFgate up to version 4 are still present. Their use is
deprecated and they are not documented here. These
functions may no be supported in following versions of
this module.
Protocol XS functions which provide a low-level access to the WAIS
protocol. E.g. `generate_search_apdu()' constructs a
request message.
SFgate 5 Perl functions that implement high-level access to WAIS
servers. E. g. parallel searching is supported.
dictionary
A bunch of XS functions useful for inspecting local
databases.
We will start with the SFgate 5 functions.
USAGE
The main high-level interface are the functions `Wais::Search'
and `Wais::Retrieve'. Both return a reference to an object of
the class `Wais::Result'.
Wais::Search
Arguments of `Wais::Search' are hash references, one for each
database to search. The keys of the hashes should be:
query The query to submit.
database The database which should be searched.
host host is optional. It defaults to `'localhost''.
port port is optional. It defaults to `210'.
tag A tag by which individual results can be associated to a
database/host/port triple. If omitted defaults to the
database name.
relevant If present must be a reference to an array containing
alternating document id's and types. Document id's
must be of type `Wais:Docid'.
Here is a complete example:
$result = Wais::Search({'query' => 'pfeifer',
'database' => $db1,
'host' => 'ls6',
'relevant' => [$id, 'TEXT']},
{'query' => 'pfeifer',
'database' => $db2});
If *host* is `'localhost'' and *database*`.src'
exists, local search is performed instead of
connecting a server.
`Wais::Search' will open `$Wais::maxnumfd' connections
in parallel at most.
Wais::Retrieve
`Wais::Retrieve' should be called with named
parameters (i.e. a hash). Valid parameters are
database, host, port, docid, and type.
$result = Wais::Retrieve('database' => $db,
'docid' => $id,
'host' => 'ls6',
'type' => 'TEXT');
Defaults are the same as for `Wais::Search'. In
addition type defaults to `'TEXT''.
`Wais:Result'
The functions `Wais::Search' and `Wais::Retrieve'
return references to objects blessed into
`Wais:Result'. The following methods are available:
diagnostics Returns and array of diagnostic messages. Each
element (if any) is a reference to an array
consisting of
tag The tag of the corresponding
search request or
`'document'' if the request
was a retrieve request.
code The WAIS diagnostic code.
message A textual diagnostic message.
header Returns and array of WAIS document headers. Each
element (if any) is a reference to an array
consisting of
tag The tag of the corresponding
search request or
`'document'' if the request
was a retrieve request.
score
lines Length of the corresponding
dcoument in lines.
length Length of the corresponding
document in bytes.
headline
types A reference to an array of types
valid for docid.
docid A reference to the WAIS
identifier blessed into
`Wais::Docid'.
text Returns the text fetched by `Wais::Retrieve'.
Dictionary
There are a couple of functions to inspect local
databases. See the inspect script in the distribution.
You need the Curses module to run it. Also adapt the
directory settings in the top part.
Wais::dictionary
%frequency = Wais::dictionary($database);
%frequency = Wais::dictionary($database, $field);
%frequency = Wais::dictionary($database, 'foo*');
%frequency = Wais::dictionary($database, $field, 'foo*');
The function returns an array containing alternating
the matching words in the global or field dictionary
matching the prefix if given and the freqence of the
preceding word. In a sclar context, the number of
matching word is returned.
Wais::list_offset
The function takes the same arguments as
Wais::dictionary. It returns the same array rsp.
wordcount with the word frequencies replaced by the
offset of the postinglist in the inverted file.
Wais::postings
%postings = Wais::postings($database, 'foo');
%postings = Wais::postings($database, $field, 'foo');
Returns and an array containing alternating numeric
document id's and a reference to an array whichs first
element is the internal weight if the word with
respect to the document. The other elements are the
word/character positions of the occurances of the word
in the document. If freeWAIS-sf is compiled with `-
DPROXIMITY', word positions are returned otherwise
character postitions.
In an scalar context the number of occurances of the
word is returned.
Wais::headline
$headline = Wais::headline($database, $docid);
The function retrieves the headline (only the text!)
of the document numbered `$docid'.
Wais::document
$text = &Wais::document($database, $docid);
The function retrieves the text of the document
numbered `$docid'.
Protocol
Wais::generate_search_apdu
$apdu = Wais::generate_search_apdu($query,$database);
$relevant = [$id1, 'TEXT', $id2, 'HTML'];
$apdu = Wais::generate_search_apdu($query,$database,$relevant);
Document id's must be of type `WAIS::Docid' as
returned by `Wais::Result::header' or
Wais::Search::header. $WAIS::maxdoc may be set to
modify the number of documents to retrieve.
Wais::generate_retrieval_apdu
$apdu = Wais::generate_retrieval_apdu($database, $docid, $type);
$apdu = Wais::generate_retrieval_apdu($database, $docid,
$type, $chunk);
Request to send the `$chunk''s chunk of the document
whichs id is `$docid' (must be of type `WAIS::Docid').
$chunk defaults to `0'. $Wais::CHARS_PER_PAGE may be
set to influence the chunk size.
Wais::local_answer
$answer = Wais::local_answer($apdu);
Answer the request by local search/retrieval. The
message header is stripped from the result for
convenience (see the code of `Wais::Search' rsp.
documentaion of Wais::Search::new below).
Wais::Search::new
$result = Wais::Search::new($message);
Turn the result message in an object of type
`Wais::Search'. The following methods are available:
diagnostics, header, and text. Result of the message
is pretty the same as for `Wais::Result'. Just the
tags are missing.
Wais::Docid::new
$result = new Wais::Docid($distserver, $distdb, $distid,
$copyright, $origserver, $origdb, $origid);
Only the first four arguments are manatory.
Wais::Docid::split
($distserver, $distdb, $distid, $copyright, $origserver,
$origdb, $origid) = Wais::Docid::split($result);
($distserver, $distdb, $distid) = Wais::Docid::split($result);
($distserver, $distdb, $distid) = $result->split;
The inverse of `Wais::Docid::new'
diagnostics Return an array of references to `[$code,
$message]'
header Return an array of references to `[$score,
$lines, $length, $headline, $types,
$docid]'.
text Returns the chunk of the document requested. For
documents larger than $Wais::CHARS_PER_PAGE
more than one request must be send.
Wais::Search::DESTROY
The objects will be destroyed by Perl.
VARIABLES
$Wais::version Generated by: `sprintf(buf, "Wais %3.1f%d",
VERSION, PATCHLEVEL);'
$Wais:errmsg Set to an verbose error message if something
went wrong. Most functions return `undef' on
failure after setting `$Wais:errmsg'.
$Wais::maxdoc Maximum number of hits to return when searching.
Defaults to `40'.
$Wais::CHARS_PER_PAGE
Maximum number of bytes to retrieve in a
single retrieve request. `Wais:Retrieve'
sends multiple requests if necessary to
retrieve a document. `CHARS_PER_PAGE'
defaults to `4096'.
$Wais::timeout Number of seconds to wait for an answer from
remote servers. Defaults to 120.
$Wais::maxnumfd Maximum number of file descriptors to use
simultaneously in `Wais::Search'. Defaults
to `10'.
Access to the basic freeWAIS-sf reduction functions
Wais::Type::stemmer(*word*)
reduces *word* using the well know Porter algorithm.
AU: Porter, M.F.
TI: An Algorithm for Suffix Stripping
JT: Program
VO: 14
PP: 130-137
PY: 1980
PM: JUL
Wais::Type::soundex(*word*)
computes the 4 byte Soundex code for *word*.
AU: Gadd, T.N.
TI: 'Fisching for Werds'. Phonetic Retrieval of written text in
Information Retrieval Systems
JT: Program
VO: 22
NO: 3
PP: 222-237
PY: 1988
Wais::Type::phonix(*word*)
computes the 8 byte Phonix code for *word*.
AU: Gadd, T.N.
TI: PHONIX: The Algorithm
JT: Program
VO: 24
NO: 4
PP: 363-366
PY: 1990
PM: OCT
BUGS
`Wais::Search' currently splits the request in groups
of `$Wais::maxnumfd' requests. Since some requests of
the group might be local and/or some might refer to
the same host/port, groups may not use all
`$Wais::maxnumfd' possible file descriptors. Therefore
some performance my be lost when more than
`$Wais::maxnumfd' requests are processed.
AUTHORS
Ulrich Pfeifer <
[email protected]>,
Norbert Goevert <
[email protected]>