From: Ave Wrigley <[email protected]>
Newsgroups: comp.lang.perl.announce,comp.lang.perl.modules
Subject: ANNOUNCE: HTML::Summary 0.013
Followup-To: comp.lang.perl.modules
Date: 31 Mar 1999 13:23:45 GMT
Organization: Canon Research Centre Europe Ltd
Lines: 38
Approved: [email protected] (comp.lang.perl.announce)
Message-ID: <[email protected]>

HTML::Summary is a module that extracts a summary from an HTML page,
much like the text that might appear in a <META NAME=DESCRIPTION> tag
in the page head. The interface lets you specify a maximum length for
the generated summary. The summary is built using the location
heuristic, which scores each sentence according to its position and
status within the document. For example, headings, section titles and
opening paragraph sentences may be favoured over other textual content.
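
To make the location heuristic concrete, here is a rough sketch in
Python. It is an illustration only, not the module's Perl code: the
weights, the (kind, text) block format and the function names are all
assumptions made up for this example. It scores sentences by position,
boosts headings and paragraph openers, and then fills the summary up
to a maximum length.

```python
import re

# Hypothetical weights, chosen for illustration; not taken from HTML::Summary.
HEADING_BONUS = 2.0
PARAGRAPH_OPENER_BONUS = 1.0


def score_sentences(blocks):
    """Score every sentence in document order.

    `blocks` is an assumed input format: a list of (kind, text) pairs,
    where kind is "heading" or "paragraph".
    Returns a list of (score, position, sentence) tuples.
    """
    scored = []
    position = 0
    for kind, text in blocks:
        # Naive sentence split on ., ! or ? followed by whitespace.
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
        for index, sentence in enumerate(sentences):
            score = 1.0 / (1 + position)  # earlier sentences score higher
            if kind == "heading":
                score += HEADING_BONUS
            elif index == 0:
                score += PARAGRAPH_OPENER_BONUS
            scored.append((score, position, sentence))
            position += 1
    return scored


def summarize(blocks, max_length):
    """Pick the best-scoring sentences that fit within max_length characters."""
    chosen = []
    used = 0
    ranked = sorted(score_sentences(blocks), key=lambda item: (-item[0], item[1]))
    for _score, position, sentence in ranked:
        cost = len(sentence) + (1 if chosen else 0)  # +1 for the joining space
        if used + cost <= max_length:
            chosen.append((position, sentence))
            used += cost
    # Emit the chosen sentences in their original document order.
    return " ".join(sentence for _position, sentence in sorted(chosen))


blocks = [
    ("heading", "Perl News"),
    ("paragraph", "A new module is out. It has many options. Some are obscure."),
    ("paragraph", "Download it from CPAN. Mirrors may lag."),
]
print(summarize(blocks, 60))
```

With a 60 character limit the heading and the two paragraph openers
are kept, and the lower-scoring sentences are dropped.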

The distribution contains a number of other modules that HTML::Summary
uses; these are bundled with HTML::Summary because I am still open to
suggestions on the interface / namespace of these modules for this early
release. The other modules are:

    Text::Sentence - a module that splits text into constituent
    sentences.

    Lingua::JA::Jcode - a Perl 5 wrapper around Kazumasa Utashiro's
    jcode.pl library for detecting / converting Japanese multibyte
    character encodings.

    Lingua::JA::Jtruncate - a module for truncating Japanese text
    without breaking multibyte character encodings.
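
As an illustration of the truncation problem, here is a small Python
sketch. It is not how Lingua::JA::Jtruncate is implemented, and the
function name and parameters are made up for this example. It assumes
a stateless encoding such as EUC-JP, Shift_JIS or UTF-8, and keeps
whole characters only, so the result never ends in half of a multibyte
character.

```python
def truncate_bytes(text, max_bytes, encoding="euc_jp"):
    """Return the longest prefix of text whose encoded form fits in max_bytes.

    Hypothetical helper: characters are added one at a time, so a
    multibyte character is either kept whole or left out entirely.
    """
    kept = []
    used = 0
    for char in text:
        size = len(char.encode(encoding))  # bytes this character needs
        if used + size > max_bytes:
            break
        kept.append(char)
        used += size
    return "".join(kept)


sample = "日本語のテキスト"
# Each of these characters takes 2 bytes in EUC-JP and 3 bytes in UTF-8.
print(truncate_bytes(sample, 5, "euc_jp"))
print(truncate_bytes(sample, 5, "utf-8"))
```

A plain byte slice at position 5 would cut the third character in half
in EUC-JP; counting per character avoids that at the cost of possibly
returning fewer bytes than the limit allows.
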
The HTML::Summary distribution is available through CPAN:
ftp://ftp.perl.org/pub/CPAN/authors/id/A/AW/AWRIGLEY/HTML-Summary-0.013.readme
ftp://ftp.perl.org/pub/CPAN/authors/id/A/AW/AWRIGLEY/HTML-Summary-0.013.tar.gz
I would be grateful for any comments or suggestions on any of these
modules.
Ave.
--
Ave Wrigley, mailto:[email protected]
Web Group, http://www.cre.canon.co.uk/
Canon Research Europe, tel: +44-1483-448844
Guildford GU2 5YJ, U.K. fax: +44-1483-448845