NAME
Text-Similarity version 0.03
DESCRIPTION
Text-Similarity is a Perl module that allows a user to measure the
similarity between two files. Similarity is based on counting the number
of overlapping words between the two files, and is normalized by the
length of the files.
Please note that the smallest unit considered for matching is a
white space separated string.
'The cat and the hat' and 'These cats and these hats' will only result
in a match on 'and'; matches below the word level are not
measured.
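The word-level matching described above can be sketched as follows. This
is only an illustration, not the module's own code or API: the function
names are hypothetical, lowercasing the input is an assumption, and the
normalization used here (twice the overlap divided by the combined length
of the two texts) is one plausible reading of "normalized by the length
of the files".

```python
from collections import Counter


def tokenize(text):
    # Split on runs of white space. Lowercasing is an assumption of this
    # sketch; the README does not say how case is handled.
    return text.lower().split()


def shared_words(text_a, text_b):
    # Multiset intersection: a word counts once for each time it occurs
    # in both texts. Returned in sorted order.
    common = Counter(tokenize(text_a)) & Counter(tokenize(text_b))
    return sorted(common.elements())


def overlap_similarity(text_a, text_b):
    # Hypothetical normalization: 2 * overlap / (len_a + len_b).
    tokens_a = tokenize(text_a)
    tokens_b = tokenize(text_b)
    if not tokens_a or not tokens_b:
        return 0.0
    overlap = len(shared_words(text_a, text_b))
    return 2.0 * overlap / (len(tokens_a) + len(tokens_b))


if __name__ == "__main__":
    a = "The cat and the hat"
    b = "These cats and these hats"
    print(shared_words(a, b))
    print(overlap_similarity(a, b))
```

With the two example strings, only 'and' is shared: 'cat' and 'cats' are
different white space separated strings, so they do not match.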
CONTENTS
When the distribution is unpacked, several subdirectories are created:
/bin
This directory contains a driver program called text_compare.pl that
can be used to conveniently measure the similarity of two files.
Please see the perldoc for this program for more details.
/lib
This directory contains the Perl modules that do the actual work of
measuring similarity. By default, these files are installed into
/usr/local/lib/perl5/site_perl/PERL_VERSION (where PERL_VERSION is
the version of Perl you are using). See the INSTALL file for more
information.
/doc
This directory contains all of the *.pod files used to document the
system. These are processed via pod2text and the output is placed in
the top level directory, so the top level text files should be
considered read only.
/t
This directory contains test scripts. These scripts are run when you
execute 'make test'.
SEE ALSO
<http://text-similarity.sourceforge.net>
AUTHORS
Ted Pedersen, University of Minnesota, Duluth tpederse at d.umn.edu
Siddharth Patwardhan sidd at cs.utah.edu
Satanjeev Banerjee banerjee at cs.cmu.edu
Jason Michelizzi
Last modified by: $Id: README.pod,v 1.7 2008/03/20 03:07:58 tpederse Exp $
COPYRIGHT AND LICENSE
Copyright (C) 2004-2008 by Jason Michelizzi, Ted Pedersen, Siddharth
Patwardhan, Satanjeev Banerjee
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
Note: a copy of the GNU Free Documentation License is available on the
web at <http://www.gnu.org/copyleft/fdl.html> and is included in this
distribution as FDL.txt.