ARCHIE(1L)        MISC. REFERENCE MANUAL PAGES         ARCHIE(1L)


NAME
    archie - an Internet archive server listing service

SYNOPSIS
    archie

DESCRIPTION
    The archie system is a program which can  query  a  database
    maintained  by  the  Computer  Science  Department of McGill
    University.  The database contains a list of software  which
    is available by means of anonymous ftp(1) to hosts connected
    to the Internet network.

    The system can be accessed in an interactive fashion or  via
    electronic mail (email). In order  to  use  the  interactive
    system:

    1)   Connect to  host  quiche.cs.mcgill.ca  (132.206.2.3  or
         132.206.51.1) with telnet(1).

    2)   Login  as  user  archie  (no  capitals,   no   password
         required).   The  system  prints  a  banner message and
         status report.

    3)   Type ``help'' for further information.

    In order to use the email interface, send requests to

              [email protected]

    Send the word ``help'' in a message for  available  commands
    and features. Please note that  this  is  an  automated  in-
    terface: no human reads the messages.  See the  section "THE
    EMAIL INTERFACE" below.

    Comments and suggestions should be sent to

              [email protected]

    Administrative requests such as adding a site to the database
    or  modifying  the  Software  Description Database should be
    sent to

              [email protected]

THE INTERACTIVE INTERFACE
    Variables

    archie has a number of variables which modify its  behavior.
    The  values  of these variables may be changed using the set
    command.  archie distinguishes between three types of  vari-
    able:

    boolean
         which may be either set or unset.

    numeric
         representing an integer within a pre-determined range.

    string
         whose value is a string of characters (which may or may
         not be restricted).

    The following variables are currently recognized:

    autologout

         By default, archie will exit after  one  hour  of  idle
         time.  This value can be changed through this variable,
         which represents, in minutes, the length of  idle  time
         before you are automatically logged out.

         The  minimum  and  maximum  values  are  1   and   300,
         representing one minute through five hours.

         Example:

            set autologout 45

         will cause you to be automatically logged out after  45
         minutes of idle time.


    mailto

         A string variable whose value is  a  mail  address,  or
         comma-separated list of addresses. Note that there must
         not be any spaces within the list of addresses. If this
         is  set  and  the  mail command is issued with no argu-
         ments, then the output of the last command is mailed to
         that address.

         Example:

            set mailto [email protected]

         Example:

            set mailto [email protected],[email protected]

         All the various Internet addressing styles  are  under-
         stood. BITNET sites should use the convention

            [email protected]

         UUCP addresses can be specified as

             [email protected]

    maxhits

         A numeric variable whose value is the maximum number of
         matches you want the prog command to generate.

         If archie seems to be slow, or you don't want a lot  of
         output  this  can be set to a small value.  ``maxhits''
         must be within the range 0 to 1000.  The default  value
         is 1000.

         Example:

            set maxhits 100

         prog will now stop after 100 matches have been found.

    pager

         A boolean variable which, when  set,  tells  archie  to
         filter  all  output  through  the pager less(1L).  When
         using the pager you may also want to set the term vari-
         able to your terminal type (see term variable).

         Example:

            set pager

    search

         This variable determines the kind of  search  performed
         on  the  database by the prog command, providing flexi-
         bility in search times and types.


         search is a string variable whose value is one  of  the
         following:

         sub

              Substring (case insensitive). A  simple,  everyday
              substring  search.  A  match  occurs  if  the file
              (or directory) name in the database  contains  the
              user-given substring.

              Example:

                   The pattern ``is'' will match  ``islington'',
                   ``this'' and ``poison''.

         subcase

              Substring (case sensitive). As above but the  case
              of the strings involved becomes significant.

              Example:

                   ``TeX'' will match ``LaTeX'' but not ``Latex''
                   or ``TExTroff''.

         exact

              Exact match. The fastest  search  method  of  all.
              The restriction is that the user string (the argu-
              ment to the prog command)  has  to  exactly  match
              (including  case) the string in the database. This
              is provided for those of you who  know  just  what
              you are looking for.

              For example, if you wanted to know where  all  the
              ``xlock.tar.Z''  files  were,  this is the kind of
              search to use.

         regex

              This is the default search method.   Searches  the
              database  with  the  user (search) string which is
              given in the form of an ed(1) regular expression.

              NOTE: Unless specifically anchored to  the  begin-
              ning  (with  ^)  or  end (with $) of a line, ed(1)
              regular  expressions  (effectively)  have   ``.*''
              prepended and appended to them. For example, it is
              not necessary to say

                   prog .*xnlock.*

              since

                   prog xnlock

              will suffice. Thus the regex match becomes a  sim-
              ple substring match.
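
         The four search types can be sketched in Python as fol-
         lows.  This is only an illustration and not archie's own
         code: the helper name matches() is hypothetical, and Py-
         thon's re module stands in for  ed(1)  regular  expres-
         sions, with which it agrees only for simple patterns.

```python
import re


def matches(pattern: str, name: str, search: str = "regex") -> bool:
    """Hypothetical helper: test one database name against a prog pattern."""
    if search == "sub":        # substring, case ignored
        return pattern.lower() in name.lower()
    if search == "subcase":    # substring, case significant
        return pattern in name
    if search == "exact":      # the whole name must be identical
        return pattern == name
    if search == "regex":      # unanchored regular expression
        # Python's re syntax is an assumption standing in for ed(1).
        return re.search(pattern, name) is not None
    raise ValueError(f"unknown search type: {search!r}")


print(matches("is", "poison", "sub"))                    # True
print(matches("TeX", "Latex", "subcase"))                # False
print(matches("xlock.tar.Z", "xlock.tar.Z", "exact"))    # True
print(matches("xnlock", "xnlock.tar.Z", "regex"))        # True
```

         Note that the regex case needs no leading  or  trailing
         ``.*'' because re.search(), like prog, looks for a match
         anywhere in the name.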

    sortby

         This variable describes how the output  from  the  prog
         command  is  to be ordered. It can have one of 5 values
         (and their associated reverse orders). For each method,
         the  ``natural''  sort order (or at least, what we con-
         sider to be the natural order) is the default.

         hostname

              Output is sorted on the archive hostname in  lexi-
              cal order.

              Reverse order rhostname

         time

              Output is sorted with the most recent modification
              times  of  the  found  file/directory names coming
              first (youngest -> oldest).

              Reverse order rtime

         size

              Output  is  sorted  by  the  size  of  the   found
              files/directories, largest first.

              Reverse order rsize

         filename

              Sorted in file/directory name lexical order.

              Reverse order rfilename

         none

              This is the DEFAULT order.

              Unsorted. There is no reverse order although rnone
              is accepted for symmetry.

         Typing the keyboard interrupt  character  (Ctrl-C  for
         most  people  on  UNIX)  during a search will cause the
         search to be aborted. The results found up to that time
         will be sorted (as determined  by  the  value  of  the
         sortby variable) and output.  The output  phase  may it-
         self be aborted by typing the interrupt character a sec-
         ond time.
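
         The orderings described above can be sketched in Python
         as follows.  The Hit record and its  field  names  are
         assumptions made for this illustration; archie's inter-
         nal representation is not documented here.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Illustrative record for one match; the field names are assumptions."""
    host: str
    size: int
    mtime: int   # modification time, larger means more recent
    name: str


# Natural order for each method: (sort key, descending?)
NATURAL = {
    "hostname": (lambda h: h.host, False),
    "time": (lambda h: h.mtime, True),      # youngest -> oldest
    "size": (lambda h: h.size, True),       # largest first
    "filename": (lambda h: h.name, False),
}


def sort_hits(hits, sortby="none"):
    """Return hits ordered as the given sortby value describes."""
    if sortby in ("none", "rnone"):
        return list(hits)                   # unsorted either way
    reverse_requested = sortby not in NATURAL and sortby.startswith("r")
    method = sortby[1:] if reverse_requested else sortby
    if method not in NATURAL:
        raise ValueError(f"unknown sortby value: {sortby!r}")
    key, descending = NATURAL[method]
    # A reverse request flips the natural direction of the method.
    return sorted(hits, key=key, reverse=descending != reverse_requested)


hits = [
    Hit("b.example.edu", 10, 100, "zeta"),
    Hit("a.example.edu", 30, 300, "beta"),
    Hit("c.example.edu", 20, 200, "alpha"),
]
print([h.host for h in sort_hits(hits, "time")])
print([h.size for h in sort_hits(hits, "rsize")])
```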

    status

         This boolean variable  determines  if  the  status-line
         will  be  displayed while the prog command is searching
         through the database. If  set  (which  is  the  default
         value) then the number of matches and percentage of the
         database searched is displayed. Otherwise no output  is
         given until the search is complete.

    term

         This variable tells archie what type of  terminal  you
         are using, and optionally its size in rows and columns.
         This information is used by the pager.

         The usage is:

            set term <terminal-type> [<#rows> [<#columns>]]

         That is, the terminal type is required, but the  number
         of  rows  and  columns  is optional.  You may specify a
         value for rows only, but if  you  want  to  change  the
         number  of  columns you must give a value for both rows
         and columns.  The default values for rows  and  columns
         are 24 and 80.

         Examples:

            set term vt100

            set term xterm 60

            set term xterm 24 100
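
         The limits and defaults given above for the numeric var-
         iables and for term can be sketched in Python as follows.
         The function names are hypothetical and  the  checks  are
         only an illustration of the documented rules, not  the
         way archie itself validates input.

```python
# Ranges taken from the descriptions of autologout and maxhits above.
NUMERIC_RANGES = {"autologout": (1, 300), "maxhits": (0, 1000)}


def check_numeric(name: str, value: int) -> int:
    """Return value if it lies within the documented range for name."""
    low, high = NUMERIC_RANGES[name]
    if not low <= value <= high:
        raise ValueError(f"{name} must be within {low} to {high}")
    return value


def parse_term(words):
    """Parse the words following 'set term' into (type, rows, columns)."""
    if not 1 <= len(words) <= 3:
        raise ValueError("usage: set term <terminal-type> [<#rows> [<#columns>]]")
    rows = int(words[1]) if len(words) > 1 else 24       # default rows
    columns = int(words[2]) if len(words) > 2 else 80    # default columns
    return words[0], rows, columns


print(check_numeric("autologout", 45))
print(parse_term("xterm 60".split()))
print(parse_term("xterm 24 100".split()))
```

         Because the arguments are positional, a column count can
         only be given together with a row count, as stated above.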



    Regular Expressions

         archie uses ed(1) regular expressions in  a  number  of
         commands.

         A regular expression, on the one hand, is a string like
         any  other;  a  sequence  of  characters.  On the other
         hand, special characters within the string have certain
         functions  which  make  regular expressions useful when
         trying to match portions of other strings.  In the fol-
         lowing  discussion  and examples, a string containing a
         regular expression will be called the ``pattern'',  and
         the  string against which it is to be matched is called
         the ``reference string''.

         Regular expressions  allow  one  to  search  for  ``all
         strings ending with the letters ize'' or ``all  strings
         beginning with a number between 1 and 3 and ending in a
         comma''.

         In order to accomplish this, regular expressions co-opt
         the  use  of  some  characters to have special meaning.
         They also provide for these characters  to  lose  their
         special  meaning  if the user so desires. The rules for
         regular expressions are:


    c    Any character c  matches  itself  unless  it  has  been
         assigned  a  special  meaning  as  listed  below.  Most
         special characters can be escaped (made to  lose  their
         special meaning) by placing the character '\' in  front
         of them.  This doesn't apply  to  '{',  which  is  non-
         special until it is escaped.  Thus, although '*' normal-
         ly has special meaning, the pattern '\*' matches a lit-
         eral '*'.

         Example:

         The pattern

              acdef

         matches

              s83acdeffff or acdefsecs

         but not

              accdef or aacde1f

         That is, it matches any reference string that  contains
         ``acdef'' anywhere within it.

         Example:

         Normally the characters '*' and '$'  are  special,  but
         the pattern

              a\*bse\$

         matches them literally. That is, any  reference  string
         containing ``a*bse$'' as a substring will be flagged as
         a match.



    .     A period matches  any  character  except  the  newline
         character. This is known as the wildcard character.

         Example:

              The pattern

               ....

         will match any 4 characters in  the  reference  string,
         except a newline character.


    ^    If `^' appears at the beginning of the pattern then  it
         is said to ``anchor'' the match to the beginning of the
         line. That is, the reference string must start with the
         pattern following the `^'. If  this  character  appears
         anywhere other than at the beginning  of  the  pattern,
         then it is no longer considered  special,  and  matches
         itself as any non-special character would. Similarly if
         it starts the pattern but is escaped, it matches itself.

         Example:

         The pattern

              ^efghi

         Will match

              efghi or efghijlk

         but not

              abcefghi

         That is the pattern will  match  only  those  reference
         strings  starting  with  ``efghi''. Just containing the
         substring is not sufficient.


    $    Occurring at the end of the  pattern,  this  character
         ``anchors'' the pattern to the end of the line (that is,
         of the reference string). A '$' occurring anywhere else
         in the pattern is regarded as non-special. Similarly, if
         it is at the end of the pattern but is escaped,  it  is
         non-special.

         Example:

         The pattern

              efghi$

         Will match

              efghi or abcdefghi

         but not

              efghijkl

         That is the pattern will  match  only  those  reference
         strings ending with ``efghi''. Just containing the sub-
         string is not sufficient.


    \<   This sequence in the pattern causes the  one-character
         regular expression following it only to match something
         at the beginning of a word: at the beginning of a line,
         or just before a letter, digit or underline  character
         and just after a character which is not one of these.

         Example:

              The pattern

              \<abc

         would match the last 'abc' in the reference string

              @hijabc#+abc

         but not the first since the first 'abc' did  not  start
         on a ``word'' boundary.


    \>    Constrains the one-character regular  expression  fol-
         lowing  it  to  be  at the end of a ``word'' as defined
         above.
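
         Python's re module has no \< or \> operators,  but  the
         word-boundary rule described above can be approximated
         with lookaround assertions.  The  sketch  below  is  an
         illustration under that assumption; the names WORD_START
         and WORD_END are hypothetical, and the \> stand-in  is
         placed after the text it constrains.

```python
import re

WORD = "[A-Za-z0-9_]"                      # letter, digit or underline
WORD_START = f"(?<!{WORD})(?={WORD})"      # stands in for \<
WORD_END = f"(?<={WORD})(?!{WORD})"        # stands in for \>

reference = "@hijabc#+abc"

# Only the last 'abc' starts on a word boundary (it follows '+').
start_match = re.search(WORD_START + "abc", reference)
print(start_match.start())                 # 9

# Both occurrences of 'abc' end on a word boundary.
end_starts = [m.start() for m in re.finditer("abc" + WORD_END, reference)]
print(end_starts)                          # [4, 9]
```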


    [string]

         One or more characters within  square  brackets.   This
         pattern  matches any single character within the brack-
         ets. The caret, '^', has a special meaning if it is the
         first  character  in the series: the pattern will match
         any character other than one in the list.

         Example:

              The pattern

              [^abc]

         Will match any character except 'a', 'b' or 'c'.

         To match a right bracket, ']', in the list it  must  be
         put first:

              []ab01]

         For a caret, '^', in the list it  can  appear  anywhere
         but first.

         In

              [ab^01]

         the caret loses its special meaning.


         The '-' character is special within square brackets. It
         is  interpreted  as a range of characters (in the ASCII
         character set) and  will  match  any  single  character
         within  that  range.   '[a-z]'  matches  any lower case
         letter. The '-' can be made non special by  placing  it
         first or last within the square brackets.


         The characters '$', '*' and '.' are not special  within
         square brackets.


         Example:

              The pattern

              [ab01]

         matches a single occurrence of a character from the set
         'a', 'b', '0', '1'.

         Example:

              The pattern

              [^ab01]

         will match any single character other  than  'a',  'b',
         '0', '1'.


         Example:

              The pattern

              [a0-9b]

         matches one of 'a', 'b' or a digit  between  0  and  9
         inclusive.

         Example:

              The pattern

              [^a0-9b.$]

         means any single character other than 'a', 'b', '.', '$'
         or a digit between 0 and 9 inclusive.

    *     An asterisk following a regular expression in the pat-
         tern   has   the   effect  of  matching  zero  or  more
         occurrences of that expression.

         Example:

              The pattern

              a*

         means zero or more occurrences of the character 'a'.


         Example:

              The pattern

              [A-Z]*

         means zero or more upper case letters.




    \{m\}

    \{m,\}

    \{m,n\}

         A one-character regular expression followed by  one  of
         these  three  constructions  causes  a   range   of
         occurrences of that regular expression to  be  matched.
         If  it  is  followed by \{m\} where m is a non-negative
         integer between 0 and 255 (inclusive), then  exactly  m
         occurrences  of that regular expression are matched. If
         followed by \{m,\}, then at  least  m  occurrences  are
         matched.   Finally, if it is followed by \{m,n\} (where
         n is a non-negative integer between 0 and 255 and where
         n > m), then between m and n occurrences of the expres-
         sion are matched.

         Example:

              The pattern

              ab\{3\}

         would match any substring in the reference string of an
         'a' followed by exactly 3 'b's.

         Example:

              The pattern

              ab\{3,\}

         would match any substring in the reference string of an
         'a' followed by at least 3 'b's.


         Example:

              The pattern

              ab\{3,5\}

         would match any substring in the reference string of an
         'a' followed by at least 3 but at most 5 'b's.
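
         The three interval forms can be tried out in Python, with
         one difference in notation that is worth stressing:  Py-
         thon's re module writes the braces without backslashes,
         so the ed(1) pattern ab\{3,5\} becomes ab{3,5}.  The hel-
         per whole() is hypothetical and uses  re.fullmatch()  so
         that the counts are tested against the entire string.

```python
import re


def whole(pattern: str, text: str) -> bool:
    """Hypothetical helper: does pattern match the entire text?"""
    return re.fullmatch(pattern, text) is not None


print(whole("ab{3}", "abbb"))        # True:  exactly three b's
print(whole("ab{3}", "abbbb"))       # False: four b's
print(whole("ab{3,}", "abbbbbbb"))   # True:  at least three b's
print(whole("ab{3,5}", "abbbbbb"))   # False: six b's is more than five

# In an unanchored search the pattern matches a substring, so a longer
# run of b's still contains a match for ab{3}.
print(re.search("ab{3}", "xabbbbx").group())   # abbb
```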


         Common Problems with Regular Expressions


    (1)  When matching a substring it is not  necessary  to  use
         the wildcard character to match the parts of the refer-
         ence string preceding and following the substring.

         Example:

              The pattern

              abcd

         will match any reference string  containing  this  pat-
         tern. It is not necessary to use

               .*abcd.*

         as the pattern.


    (2)  In order to constrain a pattern  to  match  the  entire
         reference string, use the construction:

              ^pattern$


    (3)  The easiest way to obtain case insensitivity in a regu-
         lar  expression  is to use the '[]' operator. For exam-
         ple, a pattern to match the word  ``hello''  regardless
         of the case of the letters would be:

              [Hh][Ee][Ll][Ll][Oo]
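
         The three points above can be checked with a short  Py-
         thon sketch.  It assumes that Python's re module behaves
         like ed(1) for these simple patterns; the helper select()
         and the sample names are made up for the illustration.

```python
import re


def select(pattern, names):
    """Hypothetical helper: names in which the pattern matches anywhere."""
    return [n for n in names if re.search(pattern, n)]


names = ["abcd", "xxabcdxx", "abc", "Hello.txt", "HELLO"]

# (1) An unanchored pattern already matches anywhere in the name,
#     so wrapping it in .* changes nothing.
print(select("abcd", names))        # ['abcd', 'xxabcdxx']
print(select(".*abcd.*", names))    # ['abcd', 'xxabcdxx']

# (2) Anchoring both ends restricts the match to the entire name.
print(select("^abcd$", names))      # ['abcd']

# (3) Bracket expressions give case insensitivity letter by letter.
print(select("[Hh][Ee][Ll][Ll][Oo]", names))   # ['Hello.txt', 'HELLO']
```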


    Commands

         Arguments to commands shown  here  in  square  brackets
         '[]' are optional. All others are mandatory.

    help List the valid archie commands.

    list [pattern]
         This command provides a list  of  the  sites  currently
         stored  in the database and the time at which they were
         last updated.  There is an optional regular  expression
         argument to limit the list to specific sites.

         Note that the numerical (IP) address associated with  a
         site  name  is valid at the listed time, but since they
         do  occasionally  change,  it  is   possible   that   a
         discrepancy may occur until that site is updated in our
         database. Furthermore, the listed  IP  address  is  the
         primary,  as  listed  in  the  DNS  database: secondary
         addresses are not stored.

         Example:

              list

         will list all sites in the database, while

              list \.de$

         lists all German sites.

    mail [address1[,address2...]]
         With an argument (or arguments) the output of the  last
         command  is  mailed  to the specified address or comma-
         separated list of addresses.  No spaces may appear any-
         where in the address list.

         Example:

              mail [email protected],[email protected]

         Without an argument the output of the last  command  is
         sent to the address specified in the mailto variable.

         Example:

              mail

         All the various Internet addressing styles  are  under-
         stood. BITNET sites should use the convention

              [email protected]

         UUCP addresses can be specified as

              [email protected]

    prog pattern
         Find all occurrences of programs  with  names  matching
         pattern.  How  pattern  is  interpreted  depends on the
         value of the search variable.   The  output  lists  the
         names  of  hosts with matching entries, the size of the
         matching program, its last modification  date  and  its
         path.

         The results are sorted according to the  value  of  the
         sortby variable, and are limited in number by the  max-
         hits variable.

    set variable-name [value]
         This command allows you to set one  of  archie's  vari-
         ables.   Their  values affect how archie interacts with
         the user.

         boolean variables are either set or unset

         Example:

              set pager

         numeric variables take a number within a certain range

         Example:

              set maxhits 500

         string variables take a  (possibly  restricted)  string
         value

         Example:

              set sortby time


         See also the entries for unset and show.



    show [variable-name]
         This command is  used to display the value of a partic-
         ular  variable,  or  all variables. With an argument it
         will display the value of  that  variable,  without  an
         argument it will display the value of all variables.

         Example:

            show maxhits

    site sitename
         This command allows you to get a  full  listing  of  an
         ftp(1)  site in the archie database.  The output format
         is similar to that of UNIX ls(1) long  recursive  (-lR)
         listing.

         Example:

            site col.hp.com

    unset variable
         This causes the specified variable to  have  no  value.
         This  means that it will not be used by archie until it
         has been given a value with the set command.

         Note: this may cause ``counter-intuitive'' behaviour in
         some cases (e.g. in the case  of  maxhits).   Although
         one might expect prog to print matches  without  regard
         for  any  limit, this is not the case.  If the value of
         maxhits is not available it will merely  fall  back  to
         some internal default.

    whatis substring
         This  command searches the archie Software  Description
         Database  for  the  given  substring,  with  case being
         ignored. This database  consists  of  names  and  short
         descriptions  of  many  of the software packages, docu-
         ments (like RFCs and  educational  material)  and  data
         files that are stored on the Internet.

         Example:

            whatis uucp

         in part gives as a result:

              findpath.sh             UUCP Pathfinder
              logfile-stats           UUCP LOGFILE analyzer
              mapstats                UUCP map statistics program

         We welcome and encourage additions and  corrections  to
         this  database  and depend on the archie user community
         to keep it up to date. To make a contribution  to  this
         database, mail to


                   [email protected]

         For new additions, please keep the  description  to  25
         words or less.


THE EMAIL INTERFACE
    The archie email interface currently accepts a limited  sub-
    set of the interactive interface commands, plus a few of its
    own. Currently variables are  not  supported  in  the  email
    interface.


    Requests to this server should be addressed to

                   [email protected]

    Note that the ``Subject:'' line in  incoming  mail  is  pro-
    cessed  as if it were part of the main message body. No spe-
    cial keywords are required.

    Note that the help command is exclusive. All other  commands
    in the same message are ignored.

    The server recognizes the following commands. If  a  message
    not  containing  any  valid  requests or an empty message is
    received, it will be considered to be a help request.


    path path
         This lets the requestor override the address that would
         normally  be  extracted from the header.  If you do not
         hear from the archive server within a couple of  hours,
         you might consider adding a path command to your request.
         The  path  describes  how  to  mail  a   message   from
         cs.mcgill.ca  to  your  address.  cs.mcgill.ca is fully
         connected to the Internet.


         BITNET users can use the convention

              [email protected]

         UUCP users can use the convention

              [email protected]


    help Will send you a message describing how to use the email
         interface (basically this section).


    prog <reg exp1> [<reg exp2> ...]

         A search of the archie database is performed with  each
         <reg exp> (a regular expression as defined by ed(1)) in
         turn, and any matches found are returned to the reques-
         tor.  Note that multiple <reg exp> may be placed on one
         line, in which case the results will be mailed back  to
         you  in  one message.  If you have multiple prog lines,
         then multiple messages will be returned, one  for  each
         line  [This  doesn't  work as expected at the moment...
         stay tuned].

         Any regular expression containing spaces must be quoted
         with  single  (') or double (") quotes. ALL OTHER ed(1)
         rules must be followed.

         NOTE: The searches are CASE SENSITIVE. The  ability  to
         change this will hopefully be added soon.

         The prog command is currently executed as if the search
         variable were set to regex.


    site <site name> | <site IP address>

         A listing of the given <site name>  will  be  returned.
         The  fully  qualified  domain name or IP address may be
         used.


    compress

         ALL of the output requested in the current mail message
         will be run through compress(1) and  uuencode(1).  When
         you receive the reply,  remove  everything  before  the
         ``begin'' line and run it  through  uudecode(1).   This
         will produce a .Z file. You can then run  uncompress(1)
         on this file and get the results of your request.



    quit Nothing past this point is interpreted.  This  is  pro-
         vided  so that the occasional lost soul whose signature
         contains a line that looks like a command can still use
         the server without getting a bogus response.
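
    A request using the commands above could be put together  as
    in the following Python sketch.  It only builds the  message
    and does not send it.  The function names and the  addresses
    shown are placeholders invented for  the  illustration;  the
    real server address is the one given at the  start  of  this
    section.

```python
from email.message import EmailMessage


def request_lines(patterns, reply_path=None, compress=False):
    """Hypothetical helper: the command lines of one archie email request."""
    lines = []
    if reply_path:
        lines.append(f"path {reply_path}")
    # Patterns containing spaces must be quoted.  This sketch does not
    # handle patterns that themselves contain a double quote.
    quoted = [f'"{p}"' if " " in p else p for p in patterns]
    lines.append("prog " + " ".join(quoted))
    if compress:
        lines.append("compress")
    lines.append("quit")          # protects against signature lines
    return lines


def build_request(server_address, patterns, reply_path=None, compress=False):
    """Build (but do not send) a mail message carrying the request."""
    message = EmailMessage()
    message["To"] = server_address
    # No Subject header is set, since the server would process it as
    # if it were part of the message body.
    body = "\n".join(request_lines(patterns, reply_path, compress)) + "\n"
    message.set_content(body)
    return message


msg = build_request(
    "archie-server@example.invalid",        # placeholder address
    ["xnlock", "read me"],
    reply_path="user@example.invalid",      # placeholder address
    compress=True,
)
print(msg.get_content(), end="")
```

    All patterns are placed on a single prog line so  that  the
    results come back in one message.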



THE ARCHIE DATABASE
    The archie database subsystem maintains a list of about  600
    Internet  ftp(1)  archive  sites.   Each night, the database
    subsystem executes an anonymous ftp(1) to a subset of  these
    sites  and  fetches a recursive directory listing (or a file
    containing the recursive directory listing if this  exists).

    Currently,  each  site  gets  updated  approximately  once a
    month.    The   directory    listings    are    stored    on
    quiche.cs.mcgill.ca  (132.206.2.3), where they are available
    to the Internet community via anonymous ftp(1).  They appear
    in the directory ~ftp/archie/listings in compressed form.

BUGS
    1)   Only UNIX sites are included in the database.

    2)   The user cannot limit searches to specific sites.

    3)   There is no graphical user interface.

    4)   There is no way to abort the help facility completely.

    It is hoped that all these will change in coming versions.


LONG TERM PLANS
    The archie system is regarded as  being  ``in  development''
    and  is not being released to outside sites at present.  The
    current database requires about 70 MB of disk  storage,  and
    the  updates  and  searches put a noticeable load on the Sun
    4/280 on which it is operating.  Eventually, we hope to dis-
    tribute archie to several sites around the world.

    We welcome comments and suggestions;  please  send  them  to
    [email protected].

SEE ALSO
    ftp(1), telnet(1)

AUTHORS
    Alan Emtage ([email protected]), McGill University.

    Bill Heelan ([email protected]), McGill University.


    Manual page by R. P. C. Rodgers, UCSF  School  of  Pharmacy,
    San Francisco, California 94143 ([email protected]) and  Alan
    Emtage.