NAME

NAME
urifind - find URIs in a document and dump them to STDOUT.

SYNOPSIS
$ urifind file

DESCRIPTION
urifind is a simple script that finds URIs in one or more files (using
"URI::Find"), and outputs them to to STDOUT. That's it.

To find all the URIs in file1, use:

$ urifind file1

To find the URIs in multiple files, simply list them as arguments:

$ urifind file1 file2 file3

urifind will read from "STDIN" if no files are given or if a filename of
"-" is specified:

$ wget http://www.boston.com/ -O - | urifind

When multiple files are listed, urifind prefixes each found URI with the
file from which it came:

$ urifind file1 file2
file1: http://www.boston.com/index.html
file2: http://use.perl.org/

This can be turned on for single files with the "-p" ("prefix") switch:

$urifind -p file3
file1: http://fsck.com/rt/

It can also be turned off for multiple files with the "-n" ("no prefix")
switch:

$ urifind file1 file2
http://www.boston.com/index.html
http://use.perl.org/

By default, URIs will be displayed in the order found; to sort them
ascii-betically, use the "-s" ("sort") option. To reverse sort them, use
the "-r" ("reverse") flag ("-r" implies "-s").

$ urifind -s file1 file2
http://use.perl.org/
http://www.boston.com/index.html
mailto:[email protected]

$ urifind -r file1 file2
mailto:[email protected]
http://www.boston.com/index.html
http://use.perl.org/

Finally, urifind supports limiting the returned URIs by scheme or by
arbitrary pattern, using the "-S" option (for schemes) and the "-P"
option. Both "-S" and "-P" can be specified multiple times:

$ urifind -S mailto file1
mailto:[email protected]

$ urifind -S mailto -S http
mailto:[email protected]
http://www.boston.com/index.html

"-P" takes an arbitrary Perl regex. It might need to be protected from
the shell:

$ urifind -P 's?html?' file1
http://www.boston.com/index.html

$ urifind -P '\.org\b' -S http file4
http://www.gnu.org/software/wget/wget.html

Add a "-d" to have urifind dump the refexen generated from "-S" and "-P"
to "STDERR". "-D" does the same but exits immediately:

$ urifind -P '\.org\b' -S http -D
$scheme = '^(\bhttp\b):'
@pats = ('^(\bhttp\b):', '\.org\b')

To remove duplicates from the results, use the "-u" ("unique") switch.

OPTION SUMMARY
-s Sort results.

-r Reverse sort results (implies -s).

-u Return unique results only.

-n Don't include filename in output.

-p Include filename in output (0 by default, but 1 if multiple files
are included on the command line).

-P $re
Print only lines matching regex '$re' (may be specified multiple
times).

-S $scheme
Only this scheme (may be specified multiple times).

-h Help summary.

-v Display version and exit.

-d Dump compiled regexes for "-S" and "-P" to "STDERR".

-D Same as "-d", but exit after dumping.

VERSION
This is urifind, revision $Revision: 1.2 $.

AUTHOR
darren chamberlain <[email protected]>

COPYRIGHT
(C) 2003 darren chamberlain

This library is free software; you may distribute it and/or modify it
under the same terms as Perl itself.

SEE ALSO
the Perl manpage