Path: senator-bedfellow.mit.edu!bloom-beacon.mit.edu!newsfeed.sgi.net!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!newsfeed.mathworks.com!newsfeed1.earthlink.net!nntp.earthlink.net!posted-from-earthlink!not-for-mail
From: [email protected]
Newsgroups: comp.infosystems.search
Subject: Web and Internet Search Engine FAQ
Date: Tue, 04 Jan 2000 17:49:13 GMT
X-ELN-Insert-Date: Tue Jan  4 09:55:09 2000
X-Newsreader: Forte Free Agent 1.21/32.243
Organization: Infobasic Inc
X-Posted-Path-Was: not-for-mail
Lines: 653
NNTP-Posting-Host: ip68.pittsburgh5.pa.pub-ip.psi.net
X-ELN-Date: 4 Jan 2000 17:49:41 GMT
Message-ID: <[email protected]>
Xref: senator-bedfellow.mit.edu comp.infosystems.search:2244

Web and Internet Search Engine FAQ
(WISE FAQ (copyright) 1997-1998-1999-2000)
Copyright 1997-1998-1999-2000  Ken Bogucki

[email protected]

WISE FAQ (c) Ver. 4.1 Jan. 2000

==============================

A Windows 95/98 Search Engine Help File and Tutorial is available at
http://infobasic.com

Infobasic has set up a new and greatly expanded list of search and
search engine resources at http://infobasic.com/sengine/index.shtml
Please feel free to submit appropriate URLs to the Infobasic Directory

For a list of General Search Engines go to:
http://infobasic.com/se-gen.html

For a list of Geo-Specific Search Engines go to:
http://infobasic.com/se-geo.html

For a list of Meta Search Engines go to:
http://infobasic.com/se-meta.html

==============================

All email queries, complaints or corrections should be addressed to
[email protected]


COPYRIGHT
This FAQ is copyrighted material. The copyright is owned by the
author of this FAQ, Ken Bogucki [email protected]  This FAQ may
not be reproduced or distributed, in whole or in part, for
commercial purposes without the express written permission of the
author. This FAQ may be used for non-commercial purposes as long
as the author is notified in advance, the entire FAQ is used
without alterations (except for formatting purposes) and the
copyright notice & warranty notice remain intact.


WARRANTY.
This FAQ is an AS-IS document.

When necessary, double brackets [] are used in this FAQ for
clarity.  These brackets are not part of any search expression.
Their only purpose is to separate the search words,
expressions and results from the surrounding text.


CONTENTS***


1A   Alta Vista  http://www.altavista.digital.com
    1A.1  Alta Vista Simple Searches
    1A.2  Alta Vista Complex Searches
    1A.3  Restricting A Simple and Complex Search
    1A.4  Sorting Results by Ranking
     1A.4.1  Simple Search Ranking
     1A.4.2  Complex Search Ranking
    1A.5  Misc. Information about Alta Vista

1B   Excite  http://www.excite.com
    1B.1  Excite Concept Based Queries
    1B.2  Excite Advanced Queries
    1B.3  Excite Exact Match Queries

1C   Lycos  http://www.lycos.com
    1C.1  Lycos Simple Searches
    1C.2  Lycos Complex Searches

1D   Infoseek   http://www.infoseek.com
    1D.1  Infoseek Simple Searches
    1D.2  Infoseek Complex Searches

1E   Web Crawler  http://www.webcrawler.com
    1E.1  Basic Searches
    1E.2  Using Logical Word Operators

1F   Yahoo  http://www.yahoo.com
    1F.1  Yahoo Menu/Simple Searches
    1F.2  Yahoo Complex Searches


2.0  Quick Reference Card
    2.1  Alta Vista
    2.2  Excite
    2.3  Lycos
    2.4  Web Crawler
    2.5  Yahoo
    2.6  Infoseek

****

1A.0  ALTA VISTA SEARCH ENGINE  http://www.altavista.digital.com
Alta Vista is one of the more complex search engines.  It may
seem intimidating; however, for those with a serious interest or
pressing need to find information, Alta Vista may be the place to
go.

Like other search engines, Alta Vista has simple and complex
searches.  It also contains several other options that allow the
user to optimize their time and efforts.  These include ordering
your search results based on ranking (not necessarily confined to
the original search criteria) and restricting the search to
certain types and locations of Web pages.

1A.1  ALTA VISTA SIMPLE SEARCHES

apples peaches "orange juice" : documents in which "apples",
"peaches" or the phrase "orange juice" appears.

+apples +pears -"orange juice" : documents in which both "apples"
and "pears" appear but the phrase "orange juice" does not.

Wildcard Operator "*"

app* : all documents that contain words such as "apples",
"applets", "appraise", etc.  It will not find "applications" or
"applicable", because the "*" notation can only represent a
maximum of 5 characters.

The above Operators can be used in any combination.  For example:


+oranges -app* : documents that contain the word "oranges" but
not the words "apples", "apply" and "applets", etc.
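The rules above can be sketched in a few lines of Python.  This is
only an illustration of how "+", "-", quoted phrases and the "*"
wildcard combine when testing a single document; it is not Alta
Vista's own code.  The function names (term_matches, simple_match)
are made up for this example, and the sketch assumes that a query
without any "+" terms matches when at least one of its words is
found.

```python
import re
import shlex

def term_matches(term, text):
    """True if a word, "*" wildcard or multi-word phrase occurs in text."""
    if term.endswith("*"):
        # "*" stands for at most 5 further characters (limit from the FAQ).
        pattern = r"\b" + re.escape(term[:-1]) + r"\w{0,5}\b"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

def simple_match(query, text):
    """Apply +required, -excluded and plain (optional) terms to one document."""
    required, excluded, optional = [], [], []
    for token in shlex.split(query):   # shlex keeps "quoted phrases" together
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            optional.append(token)
    if any(term_matches(t, text) for t in excluded):
        return False
    if not all(term_matches(t, text) for t in required):
        return False
    # With no "+" terms, at least one plain term has to be present.
    return bool(required) or any(term_matches(t, text) for t in optional)
```

For example, simple_match('+oranges -app*', 'oranges and pears') is
true, while the same query is false for 'oranges and apples'.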

1A.2  ALTA VISTA COMPLEX SEARCHES

There are two ways to construct an Alta Vista complex search.
You can use either Logical Word Expressions or Logical Symbol
Expressions in the search request.  Alta Vista will interpret
both types of logical expressions the same way.

WORD EXPRESSION   is the same as  SYMBOL EXPRESSION
----------------------------------------------------
a AND b           is the same as       a & b
a OR b            is the same as       a | b
a NOT b           is the same as       a ! b
a NEAR b          is the same as       a ~ b


SPECIAL NOTE: Logical word and symbol expressions are precise
search tools.  The search expression... apple AND peach...will
find "apple" and "peach" but not "apples" and "peaches".

In Alta Vista, the complex search page contains an editing window
3 lines by 70 characters.  This window allows you to view and edit
the entire complex search expression at one glance.

AND
apple AND orange : sites that contain the word "apple" as well as
the word "orange".  This expression will not display sites that
have only "apples" and "oranges" in the same document. (See
Special Note above)

OR
apple OR orange : sites that contain either the word "apple" or
the word "orange".

NOT
apples NOT oranges : sites that contain the word "apples" but not
the word "oranges"

NEAR
apple NEAR juice :  will generate a list of pages where the word
"juice" is within ten words of the word "apple". Note, the Alta
Vista NEAR operator uses a default 10 word range.
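
A proximity test like NEAR can be pictured as a comparison of word
positions.  The Python sketch below shows the general idea only; it
is not Alta Vista's implementation.  The function name and the way
the text is split into words are choices made for this example,
and the 10 word default is taken from the text above.

```python
import re

def near(text, first, second, distance=10):
    """True if `first` and `second` occur within `distance` words of each other."""
    words = re.findall(r"\w+", text.lower())
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    second_pos = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= distance for i in first_pos for j in second_pos)
```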

1A.3  RESTRICTING A SIMPLE AND COMPLEX SEARCH
This is a method of confining the Web search to certain pages or
sites that meet specific criteria. [partial list]

anchor:click-here : only search pages that contain the phrase
"click-here" in the text of a hyperlink.

applet:<java class> : only search pages that have the specified
Java class applet in the applet tag of the Web page.

domain:ie : only search pages that originate in the domain .ie
(Ireland), or any of the other country codes and the
miscellaneous standard codes, .com, .org, .mil, etc.

host:xyz.com : only search those pages that reside at the host
name xyz.com.

image:apples.jpg : search those sites that contain the image tag,
"apples.jpg".

link:xyz.com : search those sites with a link to xyz.com.  If you
have a Web page and are curious about how many other pages carry
a link to your page then run this search:
link:www.yourhomepage.com.

title:"Apples and Oranges" : search those pages that have "Apples
and Oranges" in the title of the Web page.
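
Each of the restrictions above has the same shape: a keyword, a
colon and a value.  A program that reads such terms could split
them as in the Python sketch below.  The function name
parse_restriction is hypothetical, and the keyword set holds only
the partial list given in this section.

```python
# Restriction keywords listed above (the FAQ notes the list is partial).
FIELDS = {"anchor", "applet", "domain", "host", "image", "link", "title"}

def parse_restriction(term):
    """Split 'field:value' into (field, value); plain words get field None."""
    field, sep, value = term.partition(":")
    if sep and field.lower() in FIELDS:
        return field.lower(), value.strip('"')
    return None, term
```

For example, parse_restriction('host:xyz.com') gives
('host', 'xyz.com'), and a plain word such as 'apples' comes back
as (None, 'apples').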


1A.4  SORTING RESULTS BY RANKING

Ranking results, simply, is a way to sort the results of
your search. For example, if you use a complex search for "apples"
and "oranges", you can instruct Alta Vista to sort the results so
that those sites with the most references to "apples" appear
first in the result list.  Simple searches are sorted
automatically by Alta Vista.

1A.4.1  Simple Search Ranking

Alta Vista automatically uses a formula to sort the results of a
simple query.  Results are ranked according to the following
criteria:
 1.  results score highest if the search criteria are met in
the first few words of a document
 2.  query words and phrases are found close to each other in a
document
 3.  query words or phrases appear more than once in a Web
document.

1A.4.2  Complex Search Ranking

On the complex search page, there is a separate window for
ranking.  After establishing the search expression, go to the
ranking window and insert those words (these words need not be
the same words you used in the search expression) that will be
used to sort the result list.  For example, if your search
expression is "apples & oranges", you may then use the ranking
window and include the word "California".  The search will
produce all those documents that contain the word "apples" and
the word "oranges" in the same document.  With the ranking
example above, Alta Vista will then sort the result list so that
all documents that have a reference to "California" will appear
first in the list.  More than one word or phrase may be used in
the ranking window.
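
The effect of the ranking window can be imitated with an ordinary
sort.  The Python sketch below is a guess at the general idea and
not Alta Vista's actual formula, which this FAQ does not describe.
It simply counts how often the ranking words occur in each
document; the function name rank_results is made up for the
example.

```python
def rank_results(documents, ranking_words):
    """Order documents so those mentioning the ranking words most come first."""
    def score(doc):
        text = doc.lower()
        return sum(text.count(word.lower()) for word in ranking_words)
    # sorted() is stable, so documents with equal scores keep their order.
    return sorted(documents, key=score, reverse=True)
```

Given three documents that all match "apples & oranges", calling
rank_results(documents, ["California"]) moves any document that
mentions "California" to the front of the list.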


****
1B  EXCITE SEARCH ENGINE  http://www.excite.com

Excite uses several methods for finding the requested
information.  One is a concept based query, another is an advanced
based query and the last is an exact match query.

NOTE: Excite provides its own relevancy rating.  The user cannot
directly change or alter this rating.

Excite uses " " marks to indicate a phrase search, for example,
"apple butter" will find those sites where the phrase --apple
butter-- can be found but not those sites that list only the word
apple.

1B.1  CONCEPT BASED QUERIES

A concept based query utilizes the relationship between words and
ideas to find matches.  For example, in a concept based search
the keyword "fruit" will yield "fruit", but also, "apples",
"oranges", etc. Concept based queries rely on the user requesting
information in the form of one or more keywords.

1B.2  ADVANCED BASED QUERIES

In an advanced query the operators "+" and "-" are used.

+apples +oranges : documents that have the word "apples" and the
word "oranges" on the same page.

-apples +oranges : documents that have the word "oranges" but not
the word  "apples".

+apple -pears -tarts : documents that have the word "apple" but
not the words "pears" or "tarts".  This query will not return
"apple tarts" but will return "apple turnovers".

1B.3  EXACT MATCH QUERIES

Exact match queries use Logical Word Expressions to find
documents.  The logical word operators are: AND, OR, AND NOT plus
().  Using the logical word operators will turn off Excite's
concept based search.  A keyword search for "fruit" will then
instruct Excite to search only for those sites that contain the
word "fruit".  Excite will not display sites that contain only
related words like "apples", "oranges", etc.
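
The logical word operators behave like ordinary set operations,
which the Python sketch below tries to show.  The index and the
page numbers in it are invented for the example and say nothing
about how Excite stores its data.

```python
# Hypothetical index: word -> set of page ids containing that word.
INDEX = {
    "apples":  {1, 2, 3, 4},
    "oranges": {2, 5},
    "peaches": {3, 6},
}

def pages(word):
    """Pages containing the word (empty set if the word is unknown)."""
    return INDEX.get(word, set())

# apples AND oranges
both = pages("apples") & pages("oranges")
# apples OR oranges
either = pages("apples") | pages("oranges")
# apples AND NOT (oranges OR peaches)
apples_only = pages("apples") - (pages("oranges") | pages("peaches"))
```

With this made-up index, "apples AND oranges" leaves page 2 only,
and "apples AND NOT (oranges OR peaches)" leaves pages 1 and 4.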

apples AND oranges : sites that contain both the words "apples" &
"oranges" in the same document.

apples OR oranges : sites that contain either the word "apples"
or the word "oranges".

apples AND NOT oranges : sites that contain the word "apples",
excluding any site that also contains the word "oranges".

() is an organizational operator. For example, "apples AND NOT
(oranges OR peaches)" will produce sites that contain the word
"apples" but not the words "oranges" or "peaches".

****
1C  LYCOS SEARCH ENGINE   http://www.lycos.com

Lycos has two search levels, simple and complex.  The complex
search function is menu driven and not difficult to use; however,
because of its menu interface, this Lycos search is somewhat more
restrictive than other search engines.

1C.1  STANDARD SEARCH (Simple)
Standard searches do not use Logical Word Operators.

apples oranges peaches  :  will yield sites in which all three
words appear

[ - ] This is a restrictive operator.
apples oranges -berries :  all documents in which "apples" and
"oranges" appear but not those pages where "berries" appears.  If
"apples", "oranges" and "berries" appear in the same document,
this document will not appear in the search results.

[ $ ] This is a wildcard operator.
app$ : will yield all pages in which the words, "apples",
"applications", "applets" appear.

[ . ] This is a delimiting tag.  Searching for "apple" will yield
"apples" and "apple"; however, if the search were "apple." then
only those documents with the word "apple" will be returned and
not those pages with the word "apples".
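
One way to picture the "$" and "." operators is as regular
expressions.  The Python sketch below is only an approximation
written for this FAQ, not Lycos code.  The function name
lycos_pattern is hypothetical, and the handling of a plain term
(the word with an optional trailing "s") is an assumption based on
the "apple" / "apples" example above.

```python
import re

def lycos_pattern(term):
    """Translate a Lycos-style term into a regular expression (a sketch)."""
    if term.endswith("$"):
        # "$" wildcard: the stem followed by any further letters.
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    if term.endswith("."):
        # "." delimiter: the exact word only, no longer forms.
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\b", re.IGNORECASE)
    # Plain term (assumption): the word itself or the word plus "s".
    return re.compile(r"\b" + re.escape(term) + r"s?\b", re.IGNORECASE)
```

Under these assumptions, the pattern for "apple." finds "apple" but
not "apples", while the pattern for "app$" also finds
"applications".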

1C.2  CUSTOM SEARCHES (Pro Search)

Complex searches are done through a menu interface, and all of
this is fairly intuitive.

Only a very brief explanation is required here: everything that
appears on the complex search page has a corresponding on screen
example and explanation.

****
1D  INFOSEEK  http://www.infoseek.com
Infoseek has two search options, simple and complex.  Both search
options provide only limited query syntax.  Infoseek has no way
to rank search results.  However, Infoseek is fast and is more
than suitable for those quick search needs.  The site is low
graphics and works well with text browsers.

1D.1  INFOSEEK SIMPLE SEARCHES

Infoseek's simple searches use a combination of commas, plus and
minus signs, quotes (to make phrase searches) and caps.

apples oranges : will find pages with either "apples" or
"oranges".

+apples oranges :  normally will return pages with just "apples",
however, pages that contain "oranges" as well are acceptable.
Those pages, however, will receive a lower ranking.

"apple juice" : will display those pages where the words "apple"
and "juice" appear next to each other.

Caps are used to indicate proper names and a case sensitive
search:
Johnny Appleseed  : will find only pages with the name "Johnny
Appleseed".

Johnny,Appleseed  : will find pages with either name.  Note:
commas are only used to separate names.

apples -grapes : will find pages with "apples" but not with the
word "grapes".


1D.2  INFOSEEK COMPLEX SEARCHES

There are only a few additional symbols that distinguish a complex
query from a simple query.

The pipe symbol [ | ] is used to construct a search within a set
of search results.

fruit | apple | juice :  will find pages that refer to "fruit"
then search out those pages within that result that contain the
word "apple". Finally, the last group of results will be searched
for any pages that contain the word "juice".
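
The pipe works like a chain of filters, each one applied to the
output of the one before it.  The Python sketch below imitates
that behaviour on a list of page texts.  The function name
pipe_search is invented here, and the plain substring test is a
simplifying assumption rather than Infoseek's real matching rule.

```python
def pipe_search(query, documents):
    """Filter documents step by step, one pipe-separated term at a time."""
    results = list(documents)
    for term in query.split("|"):
        term = term.strip().lower()
        # Each stage searches only what the previous stage returned.
        results = [doc for doc in results if term in doc.lower()]
    return results
```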

title:fruit : will find any pages where the word "fruit" appears
in the title of the web page.

url:www.orange.com : will find those sites that contain the
address "www.orange.com".  The search expression [ url:fruit ]
will find those sites that have the word "fruit" in the URL, for
example, "www.fruit.com".

link:www.juice.com : will find those sites that are linked to the
specified URL

site:xyz.com : will bring up all the sites located at the
specified address.

****
1E  WEBCRAWLER   http://www.webcrawler.com

One of the better Web search engines is WebCrawler, simply because
of its flexibility.

1E.1  BASIC SEARCHES

apples oranges pineapples  :  will provide information on those
documents that contain any of the words: "apples", "oranges",
"pineapples".  A simple search expression.

1E.2  USING LOGICAL WORD OPERATORS

AND
apples AND oranges  :  will provide information on documents
where both the words "apples" and "oranges" appear.

OR
apples OR oranges  :  will display information on pages that
contain either of the two search words.  This is similar to the
Basic Search example in 1E.1 except that this search employs
specific logical word operators.  The first search could also be
run as: apples OR oranges OR pineapples.

NOT
fruit NOT apples  :  displays information about "fruit" but not
those pages that reference "apples".

NEAR
cheese NEAR/15 wine  :  will display those pages in which the
word "cheese" is within 15 words of the word "wine".  Note, you
can specify any number of words in the NEAR operator: NEAR/20,
NEAR/5, etc.

ADJ
world ADJ war  :  will display Web pages that contain the word
"world" immediately followed by the word "war"

"  "
Quotes have the same effect as the ADJ command above: "world war"
will provide the same results as:  world ADJ war.

()
Parentheses are used to organize complex search expressions. For
example:
(wine NEAR/10 cheese) AND apples
or
"California wine" AND prices NOT (white OR rose)
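
The difference between NEAR/n and ADJ comes down to word positions:
NEAR/n allows a gap in either direction, while ADJ requires the
second word to follow the first directly.  The Python sketch below
evaluates the two-word forms only.  It is an illustration written
for this FAQ, not WebCrawler's code, and the function name
webcrawler_match is hypothetical.

```python
import re

def _positions(words, target):
    """Indexes at which the target word occurs in the word list."""
    return [i for i, w in enumerate(words) if w == target.lower()]

def webcrawler_match(expression, text):
    """Evaluate 'a NEAR/n b' or 'a ADJ b' against text (two-word forms only)."""
    words = re.findall(r"\w+", text.lower())
    m = re.fullmatch(r"(\w+)\s+(NEAR/(\d+)|ADJ)\s+(\w+)", expression.strip())
    if m is None:
        raise ValueError("unsupported expression: " + expression)
    left, right = _positions(words, m.group(1)), _positions(words, m.group(4))
    if m.group(2) == "ADJ":
        # ADJ: the right word directly follows the left word.
        return any(j - i == 1 for i in left for j in right)
    limit = int(m.group(3))
    # NEAR/n: either order, at most n words apart.
    return any(abs(i - j) <= limit for i in left for j in right)
```

For example, "world ADJ war" matches the text "the second world
war" but not "war of the world".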

****
1F   YAHOO  http://www.yahoo.com

Yahoo is one of the most intuitive search engines to use. There
are two ways to search Yahoo, one is a very simple, menu driven
search and the second is by use of logical word operators.
However, this second search option is also a menu driven search.

1F.1  MENU/BASIC SEARCHES
The Menu interface is easy to use and understand.  Simply select
the type of material you want to search (WEB, Usenet, etc.) and
how the search should be conducted. Select how the results should
be displayed, 20, 10, 40 per page and click the search button.

1F.2  MENU/ADVANCED SEARCHES

[ + ]
apples +oranges : those sites that have "apples" as well as
"oranges" in the same document.

[ - ]
apples -oranges : those sites that have "apples" but not those
sites that have "oranges".

[ t: ]
A restriction operator that will confine the search to Web page
titles. For example, t:apples will restrict the search to pages
with the word "apples" in the title of the page.  It will not
search a page if the page title is "Oranges".  The correct usage
of the "t:" operator in a search expression is [ +t:oranges
+apples ].  This expression will yield documents that have the
word "apples" in the Web page and the word "oranges" in the Web
page title.  The expression "+apples t:oranges" is incorrect.
The "t:" operator must immediately precede the search word.

[ u: ]
A restrictive operator. Confines the search for the keywords to
certain URLs.
For example, [ u:xyz ] will restrict the search to URLs that have
an "xyz" in the url address.  The "u:" operator follows the same
rules listed for the "t:" operator.
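
The "+", "-", "t:" and "u:" operators described above can be
pictured as checks against different parts of a page.  The Python
sketch below is an assumption about that behaviour, not Yahoo's
implementation.  The function name yahoo_match and its four
arguments are invented for the example, every term without a "-"
is treated as required, and phrases and wildcards are left out.

```python
def yahoo_match(query, title, url, body):
    """Check one page against +word, -word, t:word and u:word terms."""
    for token in query.split():
        excluded = token.startswith("-")
        word = token.lstrip("+-").lower()
        if word.startswith("t:"):
            found = word[2:] in title.lower()      # title only
        elif word.startswith("u:"):
            found = word[2:] in url.lower()        # URL only
        else:
            found = word in (title + " " + body).lower()
        # A "-" term must be absent; every other term must be present.
        if found == excluded:
            return False
    return True
```

With this sketch, the query "+t:oranges +apples" accepts a page
titled "Oranges" whose text mentions apples, and rejects a page
titled "Apples" even if its text mentions oranges.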

[" "]
Phrase combining operator: "orange juice", "apple juice", etc.

[ * ]
Wildcard search.  For example, "pea*" will return "pears",
"peas", etc.


2.0  REFERENCE CARD

NOTE: This reference card is designed on the assumption that you
have a basic understanding of the search expressions and criteria
covered in prior sections of this FAQ.

The double brackets [] in the reference card are not part of the
query syntax.

****
2.1  ALTA VISTA  http://www.altavista.digital.com

[apples "orange juice"]     "apples" or the phrase "orange juice"
[+apples -"orange juice"]   "apples" & not the phrase
                            "orange juice"
[app* (wildcard)]           "apples", "applets", "appraise"
(wildcard in Alta Vista requires Min. of three letters before the
wildcard and will return from 0-5 characters Max.)

Complex Searches (Can use either logical word or symbol
expressions)
AND or &, OR or |, NOT or !, NEAR or ~
[apple AND orange]          "apple" & the word "orange"
[apple OR orange]           "apple" or the word "orange"
[apples NOT oranges]        "apples" but not the word "oranges"
[apple NEAR juice]          "juice" within ten words of "apple"

RESTRICTING A SIMPLE AND COMPLEX SEARCH
[anchor:click-here]         pages with "click-here" in the
                            hyperlink.
[applet:<java class>]       pages with the Java class in the
                            applet tag
[domain:xyz]                pages in the domain "xyz"
[host:xyz.com]              sites at the host name xyz.com.
[image:a.jpg]               sites with an image tag, "a.jpg".
[link:xyz.com]              sites with a link to xyz.com.
[text:orange]               sites with "orange" in the visible
                            text
[title:"A, B and C"]        sites with "A, B and C" in the title.

RANKING
Simple searches: The ranking is automatic.

Complex searches: Enter any word or groups of words in the
ranking window. Alta Vista will sort the results based on these
words.


****
2.2  EXCITE  http://www.excite.com

Concept Based Search
[+apples +pears]            "apples" and "pears"
[-apples +peach]            "peach" but not "apples"
[+apples -pears -berries]   "apples" but not "pears" or
                            "berries"

Exact match queries use Logical Word Expressions to find Web
documents.  The Logical Word Operators are: AND, OR, AND NOT.
Using logical word expressions will turn off Excite's concept
based option.  Precise searches require the use of Logical Word
Operators.

[apples AND peaches]         pages with "apples" and "peaches"
[apples OR peaches]          pages with either "apples" or
                             "peaches"
[apples AND NOT peaches]     pages with "apples" but not
                             with "peaches"

****
2.3  LYCOS  http://www.lycos.com

STANDARD SEARCH
Standard searches do not use logical word operators.
[apples oranges peaches]    pages where any of the words appear
[apples +berries]           "apples" and "berries"
[apples -berries]           "apples" but not "berries"
[app$ (wildcard)]           "apples", "applets" etc.
[apple.]                    "apple" but not the word "apples"

CUSTOM SEARCHES

Complex searches are done through an intuitive menu interface.

****
2.4  WEBCRAWLER  http://www.webcrawler.com

[apples oranges]            pages that contain any of the words
[apples OR oranges]         same as above
[apples AND oranges]        "apples" and "oranges"
[fruit NOT apples]          "fruit" but not "apples"
[cheese NEAR/(x) wine]      "wine" is within "x" words of
                            "cheese"
[world ADJ war]             "world" & "war" are next to each
                            other
[".. " (phrase searches)]   "us army", "jack and jill went up the
                            hill"
[(..)]                      used to organize search expressions

****
2.5  YAHOO  http://www.yahoo.com

Advanced Options:

[apples +oranges]           "apples" as well as "oranges"
[apples -oranges]           "apples" but not with "oranges".
[t:]                        confines the search to certain Web
                            titles.
[u:]                        confines the search to certain URLs.
[" "] phrase operator       "orange juice", "apple juice", etc.
[pea*  (wildcard)]          "pears", "peas", "peaches" etc.

****
2.6  INFOSEEK  http://www.infoseek.com

Simple Searches

[apples oranges]           either "apples" or "oranges".
[+apples oranges]          "apples", pages with "oranges" are
                           ranked lower.
["apple juice"]            "apple" and "juice" appear next to
                           each other.

Caps are used to indicate proper names and a case sensitive
search:
[Johnny Appleseed]         will find the name "Johnny Appleseed".
[Johnny,Appleseed]         will find either name.
Note: commas are only used to separate names.

[apples -grapes]           "apples" but not "grapes".

Complex Searches

[fruit | apple | juice]    will find "fruit" then search results
                           for "apple" then search those results
                           for "juice".
[title:fruit]              "fruit" in the title of the page.
[url:www.orange.com]       sites with address "www.orange.com".
[url:fruit]                sites with "fruit" in the URL,
                           "www.fruit.com" or
                           "www.fruitandnuts.com".
[link:www.juice.com]       will find sites linked to the
                           specified URL
[site:xyz.com]             will find all sites at the specified
                           address.

****
Contact Information

Corrections, additions or comments can be sent to:

Ken Bogucki
[email protected]

http://www.infobasic.com/

END WISE FAQ (c)
=========================