NAME
Wais - access to freeWAIS-sf libraries
SYNOPSIS
`use Wais;'
DESCRIPTION
The interface is divided into four major parts.
SFgate 4.0
For backward compatibility the functions used in SFgate up to
version 4 are still present. Their use is deprecated and they
are not documented here. These functions may not be supported
in future versions of this module.
Protocol
XS functions that provide low-level access to the WAIS
protocol. E.g. `generate_search_apdu()' constructs a request
message.
SFgate 5.0
Perl functions that implement high-level access to WAIS
servers. E.g. parallel searching is supported.
dictionary
A bunch of XS functions useful for inspecting local databases.
We will start with the SFgate 5.0 functions.
USAGE
The main high-level interface consists of the functions `Wais::Search'
and `Wais::Retrieve'. Both return a reference to an object of the class
`Wais::Result'.
Wais::Search
Arguments of `Wais::Search' are hash references, one for each database
to search. The keys of the hashes should be:
query The query to submit.
database The database which should be searched.
host host is optional. It defaults to `'localhost''.
port port is optional. It defaults to `210'.
tag A tag by which individual results can be associated with a
database/host/port triple. If omitted, it defaults to the database
name.
relevant If present, must be a reference to an array containing alternating
document ids and types. Document ids must be of type
`Wais::Docid'.
Here is a complete example:
$result = Wais::Search({'query' => 'pfeifer',
'database' => $db1,
'host' => 'ls6',
'relevant' => [$id, 'TEXT']},
{'query' => 'pfeifer',
'database' => $db2});
If *host* is `'localhost'' and *database*`.src' exists, a local
search is performed instead of connecting to a server.
`Wais::Search' opens at most `$Wais::maxnumfd' connections in
parallel.
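When the same query goes to several databases, the per-database hash references can be built programmatically. The sketch below is illustrative only: `build_requests' is a hypothetical helper, not part of the module, and the database names are made up. It sets only the keys documented above and leaves host and port to their defaults when they are not given.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Wais.pm): build one request hash per
# database for a common query. Each element of @targets is either a plain
# database name or a hash ref with 'database' and optional 'host'/'port'.
sub build_requests {
    my ($query, @targets) = @_;
    my @requests;
    for my $target (@targets) {
        my %req = ref($target) ? %$target : (database => $target);
        $req{query} = $query;
        # 'tag' is documented to default to the database name; setting it
        # explicitly makes the association visible in the request itself.
        $req{tag} = $req{database} unless defined $req{tag};
        push @requests, \%req;
    }
    return @requests;
}

my @requests = build_requests('pfeifer',
                              'bibdb',                                # made-up name
                              { database => 'cacm', host => 'ls6' }); # made-up name

# With the module installed, the list would be passed straight through:
#   my $result = Wais::Search(@requests);
```

Keeping the request construction separate from the call makes it easy to inspect or log the requests before any connection is opened.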
Wais::Retrieve
`Wais::Retrieve' should be called with named parameters (i.e.
a hash). Valid parameters are database, host, port, docid, and
type.
$result = Wais::Retrieve('database' => $db,
'docid' => $id,
'host' => 'ls6',
'type' => 'TEXT');
Defaults are the same as for `Wais::Search'. In addition, type
defaults to `'TEXT''.
`Wais::Result'
The functions `Wais::Search' and `Wais::Retrieve' return
references to objects blessed into `Wais::Result'. The
following methods are available:
diagnostics Returns an array of diagnostic messages. Each element
(if any) is a reference to an array consisting of
tag The tag of the corresponding search
request or `'document'' if the
request was a retrieve request.
code The WAIS diagnostic code.
message A textual diagnostic message.
header Returns an array of WAIS document headers. Each element
(if any) is a reference to an array consisting of
tag The tag of the corresponding search
request or `'document'' if the
request was a retrieve request.
score
lines Length of the corresponding document in
lines.
length Length of the corresponding document in
bytes.
headline
types A reference to an array of types valid
for docid.
docid A reference to the WAIS identifier
blessed into `Wais::Docid'.
text Returns the text fetched by `Wais::Retrieve'.
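As an illustration of how the `header' records might be consumed, the sketch below formats a hit list. It assumes that the fields appear in each array in the order listed above (tag, score, lines, length, headline, types, docid). `format_hits' is a hypothetical helper, and the sample records are invented stand-ins for what `$result->header' would return.

```perl
use strict;
use warnings;

# Hypothetical helper: turn header records into printable lines.
# Assumes each record is [tag, score, lines, length, headline, types, docid].
sub format_hits {
    my @lines;
    for my $hit (@_) {
        my ($tag, $score, $nlines, $length, $headline, $types, $docid) = @$hit;
        push @lines, sprintf('%-8s %4d %6d bytes  %s [%s]',
                             $tag, $score, $length, $headline,
                             join(',', @$types));
    }
    return @lines;
}

# Invented sample data in place of: my @headers = $result->header;
my @headers = (
    ['db1', 1000, 12,  345, 'First document',  ['TEXT'],         undef],
    ['db2',  500, 40, 2048, 'Second document', ['TEXT', 'HTML'], undef],
);
print "$_\n" for format_hits(@headers);
```

The docid element is left untouched here; in real use it would be handed on to `Wais::Retrieve' together with one of the listed types.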
Dictionary
There are a couple of functions to inspect local databases.
See the inspect script in the distribution. You need the
Curses module to run it. You also need to adapt the directory
settings at the top of the script.
Wais::dictionary
%frequency = Wais::dictionary($database);
%frequency = Wais::dictionary($database, $field);
%frequency = Wais::dictionary($database, 'foo*');
%frequency = Wais::dictionary($database, $field, 'foo*');
The function returns a list that alternates between the words
in the global or field dictionary that match the prefix (if one
is given) and the frequency of the preceding word. In
scalar context, the number of matching words is returned.
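Because the returned list alternates words and frequencies, it can be assigned directly to a hash, as the calls above do. The following sketch shows one way to post-process such a list. The sample data is invented and takes the place of a real call to `Wais::dictionary'.

```perl
use strict;
use warnings;

# Invented sample in place of: my @pairs = Wais::dictionary($database, 'foo*');
my @pairs = (foo => 12, food => 3, fool => 7);

# Alternating word/frequency pairs map naturally onto a hash.
my %frequency = @pairs;

# Words ordered by descending frequency, ties broken alphabetically.
my @by_freq = sort { $frequency{$b} <=> $frequency{$a} || $a cmp $b }
              keys %frequency;

printf "%-10s %d\n", $_, $frequency{$_} for @by_freq;
```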
Wais::list_offset
The function takes the same arguments as Wais::dictionary. It
returns the same list (or, in scalar context, the word count)
with the word frequencies replaced by the offset of the posting
list in the inverted file.
Wais::postings
%postings = Wais::postings($database, 'foo');
%postings = Wais::postings($database, $field, 'foo');
Returns a list that alternates between numeric document
ids and references to arrays whose first element is the
internal weight of the word with respect to the document. The
other elements are the word or character positions of the
occurrences of the word in the document. If freeWAIS-sf is
compiled with `-DPROXIMITY', word positions are returned,
otherwise character positions.
In scalar context, the number of occurrences of the word is
returned.
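A short sketch of walking such a postings structure follows. The data is invented and stands in for the result of `Wais::postings'; the only assumption about layout is the one described above, namely that each value is an array reference holding the weight followed by the positions.

```perl
use strict;
use warnings;

# Invented sample in place of: my %postings = Wais::postings($database, 'foo');
# Each value is [weight, position, position, ...].
my %postings = (
    17 => [0.5,  4, 90, 133],
    42 => [0.25, 12],
);

my %count;   # occurrences per document
for my $doc (sort { $a <=> $b } keys %postings) {
    # The first element is the weight; the rest are positions.
    my ($weight, @positions) = @{ $postings{$doc} };
    $count{$doc} = scalar @positions;
    printf "doc %d: weight %.2f, %d hit(s) at %s\n",
           $doc, $weight, scalar(@positions), join(' ', @positions);
}
```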
Wais::headline
$headline = Wais::headline($database, $docid);
The function retrieves the headline (only the text!) of the
document numbered `$docid'.
Wais::document
$text = &Wais::document($database, $docid);
The function retrieves the text of the document numbered
`$docid'.
Protocol
Wais::generate_search_apdu
$apdu = Wais::generate_search_apdu($query,$database);
$relevant = [$id1, 'TEXT', $id2, 'HTML'];
$apdu = Wais::generate_search_apdu($query,$database,$relevant);
Document ids must be of type `Wais::Docid' as returned by
`Wais::Result::header' or Wais::Search::header. $Wais::maxdoc
may be set to modify the number of documents to retrieve.
Wais::generate_retrieval_apdu
$apdu = Wais::generate_retrieval_apdu($database, $docid, $type);
$apdu = Wais::generate_retrieval_apdu($database, $docid,
$type, $chunk);
Requests chunk number `$chunk' of the document whose id
is `$docid' (which must be of type `Wais::Docid'). $chunk defaults
to `0'. $Wais::CHARS_PER_PAGE may be set to influence the
chunk size.
Wais::local_answer
$answer = Wais::local_answer($apdu);
Answers the request by local search or retrieval. The message
header is stripped from the result for convenience (see the
code of `Wais::Search' or the documentation of Wais::Search::new
below).
Wais::Search::new
$result = Wais::Search::new($message);
Turns the result message into an object of type `Wais::Search'.
The following methods are available: diagnostics, header, and
text. They return much the same as the corresponding methods of
`Wais::Result', except that the tags are missing.
Wais::Docid::new
$result = new Wais::Docid($distserver, $distdb, $distid,
$copyright, $origserver, $origdb, $origid);
Only the first four arguments are mandatory.
Wais::Docid::split
($distserver, $distdb, $distid, $copyright, $origserver,
$origdb, $origid) = Wais::Docid::split($result);
($distserver, $distdb, $distid) = Wais::Docid::split($result);
($distserver, $distdb, $distid) = $result->split;
The inverse of `Wais::Docid::new'.
Objects of type `Wais::Search' provide the following methods:
diagnostics
Returns an array of references to `[$code, $message]'.
header Returns an array of references to `[$score, $lines, $length,
$headline, $types, $docid]'.
text Returns the chunk of the document requested. For documents larger
than $Wais::CHARS_PER_PAGE more than one request must be sent.
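At the protocol level, chunking is left to the caller. The sketch below shows one way such a loop might look. It is not the module's implementation: `fetch_all' and the `$fetch_chunk' callback are hypothetical, and the stopping rule (an empty, undefined, or short chunk ends the document) is an assumption. With the module installed, the callback would wrap `Wais::generate_retrieval_apdu', the transport or `Wais::local_answer', and the `text' method.

```perl
use strict;
use warnings;

# Hypothetical driver: call $fetch_chunk->($n) for n = 0, 1, 2, ... and
# concatenate the pieces. Assumes that a chunk shorter than $page_size
# (or an undefined or empty one) marks the end of the document.
sub fetch_all {
    my ($fetch_chunk, $page_size) = @_;
    my $text = '';
    for (my $n = 0; ; $n++) {
        my $chunk = $fetch_chunk->($n);
        last unless defined($chunk) && length($chunk);
        $text .= $chunk;
        last if length($chunk) < $page_size;
    }
    return $text;
}

# Stand-in for a server: serves a 10-character document in pages of 4.
my $document = 'abcdefghij';
my $page     = 4;
my $text = fetch_all(
    sub {
        my $offset = $_[0] * $page;
        return $offset < length($document)
            ? substr($document, $offset, $page)
            : undef;
    },
    $page,
);
print "$text\n";
```

Passing the fetch step in as a code reference keeps the loop independent of whether the chunk comes from a remote server or a local database.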
Wais::Search::DESTROY
The objects will be destroyed by Perl.
VARIABLES
$Wais::version
Generated by: `sprintf(buf, "Wais %3.1f%d", VERSION,
PATCHLEVEL);'
$Wais::errmsg
Set to a verbose error message if something went wrong. Most
functions return `undef' on failure after setting
`$Wais::errmsg'.
$Wais::maxdoc
Maximum number of hits to return when searching. Defaults to
`40'.
$Wais::CHARS_PER_PAGE
Maximum number of bytes to retrieve in a single retrieve
request. `Wais::Retrieve' sends multiple requests if necessary
to retrieve a document. `CHARS_PER_PAGE' defaults to `4096'.
$Wais::timeout
Number of seconds to wait for an answer from remote servers.
Defaults to 120.
$Wais::maxnumfd
Maximum number of file descriptors to use simultaneously in
`Wais::Search'. Defaults to `10'.
Access to the basic freeWAIS-sf reduction functions
Wais::Type::stemmer(*word*)
reduces *word* using the well-known Porter algorithm.
AU: Porter, M.F.
TI: An Algorithm for Suffix Stripping
JT: Program
VO: 14
PP: 130-137
PY: 1980
PM: JUL
Wais::Type::soundex(*word*)
computes the 4-byte Soundex code for *word*.
AU: Gadd, T.N.
TI: 'Fisching for Werds'. Phonetic Retrieval of written text in
Information Retrieval Systems
JT: Program
VO: 22
NO: 3
PP: 222-237
PY: 1988
Wais::Type::phonix(*word*)
computes the 8-byte Phonix code for *word*.
AU: Gadd, T.N.
TI: PHONIX: The Algorithm
JT: Program
VO: 24
NO: 4
PP: 363-366
PY: 1990
PM: OCT
BUGS
`Wais::Search' currently splits the requests into groups of
`$Wais::maxnumfd' requests. Since some requests in a group might be
local and/or some might refer to the same host/port, a group may not use
all `$Wais::maxnumfd' possible file descriptors. Therefore some
performance may be lost when more than `$Wais::maxnumfd' requests are
processed.
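The effect can be pictured with a small sketch. It is not the module's code: `group_requests' and `connections_needed' are hypothetical helpers that mimic fixed-size grouping and count distinct remote host/port pairs. Treating every request for `localhost' as local is a simplifying assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split requests into consecutive groups of at most $max.
sub group_requests {
    my ($max, @requests) = @_;
    my @groups;
    push @groups, [ splice(@requests, 0, $max) ] while @requests;
    return @groups;
}

# Count distinct remote host/port pairs in a group. Requests for
# 'localhost' are treated as local here (an assumption for illustration).
sub connections_needed {
    my %seen;
    for my $req (@_) {
        my $host = $req->{host} // 'localhost';
        next if $host eq 'localhost';
        $seen{ $host . ':' . ($req->{port} // 210) } = 1;
    }
    return scalar keys %seen;
}

my @requests = (
    { database => 'a', host => 'ls6' },
    { database => 'b', host => 'ls6' },    # same host/port as above
    { database => 'c' },                   # local
    { database => 'd', host => 'other' },  # made-up host name
);
for my $group (group_requests(3, @requests)) {
    printf "%d request(s), %d connection(s)\n",
           scalar(@$group), connections_needed(@$group);
}
```

With a group size of 3, the first group in this example holds three requests but needs only one remote connection, which is the kind of unused capacity described above.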
AUTHOR
Ulrich Pfeifer