Search Engines 1

About search engines

Search engines let you run keyword searches against databases of links to Web pages. Whereas Web directories are assembled by human editors, search engine databases are collected automatically by Web-crawling programs that trawl the Web, indiscriminately grabbing new data wherever they find it. Most of the major search engines catalog entire Web pages so that page content can be searched along with the titles and other meta-information supplied by Web authors.
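To make the idea of cataloging entire pages concrete, here is a minimal sketch of an inverted index, a structure that maps each word to the pages containing it. This is an illustration only: the text above does not say how any particular search engine stores its catalog, and the page data, URLs, and function names below are all made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages (URLs and text are invented for illustration).
PAGES = {
    "http://example.com/lovell": {
        "title": "Jim Lovell",
        "body": "Commander of the Apollo 13 mission.",
    },
    "http://example.com/apollo": {
        "title": "Greek gods",
        "body": "Apollo is the god of music.",
    },
}

def words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map every word found in a title or body to the set of URLs using it."""
    index = defaultdict(set)
    for url, page in pages.items():
        for word in words(page["title"] + " " + page["body"]):
            index[word].add(url)
    return index

index = build_index(PAGES)
print(sorted(index["apollo"]))  # both pages mention "Apollo" somewhere
```

Because the body text is indexed as well as the title, a search for "apollo" finds the Lovell page even though the word never appears in its title.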

Unfortunately, no single search engine can find everything on the Web. The amount of Web information available grows every second as new pages are added and old pages are replaced or expanded. Cataloging all of this information reliably, despite many companies' claims, is effectively impossible. Even if it were possible to find and fully catalog every Web site in the world, pages are moved, updated, and deleted so often that by the time such a database could be completed, much of the information collected would be obsolete. Even so, the major search engines have more pages cataloged than any other resource on the Web.

Crafting a search

Getting results quickly from a search engine is never a problem; the challenge is to create searches that exclude irrelevant pages.

To search for pages about Apollo 13 astronaut Jim Lovell, for example, I would start by entering keywords to see what comes up. Look at the results of simple keyword searches on AltaVista for "Apollo 13" and for "Jim Lovell." I see that the words "Apollo" and "13" might appear separately in many documents: "13 shows at the Apollo," "13 reasons why Apollo is the best Greek god," "The Apollo Vacuum Cleaner Co site, last updated on Sept 13, 1997." "Jim" and "Lovell" are a fairly safe combination in this case, because "Lovell" is not a very common term.
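The problem with a simple keyword search can be sketched in a few lines of Python. This is a toy model, not a description of how AltaVista works: it assumes a page matches when every keyword appears somewhere in it, and the page list (the three snippets quoted above plus one invented relevant page) is hypothetical.

```python
import re

# The three irrelevant snippets from the text, plus one invented relevant page.
PAGES = [
    "13 shows at the Apollo",
    "13 reasons why Apollo is the best Greek god",
    "The Apollo Vacuum Cleaner Co site, last updated on Sept 13, 1997",
    "Jim Lovell commanded Apollo 13 in April 1970",
]

def words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_search(pages, query):
    """Return pages containing every query word, in any order or position."""
    wanted = set(words(query))
    return [page for page in pages if wanted <= set(words(page))]

hits = keyword_search(PAGES, "Apollo 13")
for page in hits:
    print(page)
```

All four pages are returned for "Apollo 13" because the two words are matched independently, while a search for "Jim Lovell" returns only the last page.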

After seeing the results of a simple keyword search, I will want to find a way to zero in on information about Jim Lovell, the astronaut, and exclude pages about other Jims, Lovells, or Jim Lovells in the world. I will also want to exclude commercial information and reviews of the movie and books about him.

Search engines have made it possible to refine searches by recognizing special symbols, called "operators." Operators are sets of symbols or special words that sit alongside the keywords in a query and tell the search engine how to process them. Operators let users specify whole phrases, words to exclude, words that should appear, and words that should appear close together.

Phrase searching ("")

In the example, we have seen that putting the words "Apollo" and "13" into AltaVista will produce too many irrelevant hits. We are not, after all, looking for all of the documents that contain the terms "Apollo" and "13," or "Jim" and "Lovell"; we are trying to find only those documents that contain the phrase "Apollo 13" and/or the name "Jim Lovell."

With AltaVista, Infoseek, Excite, and Yahoo! Search, it is possible to specify phrases and proper names by putting them in quotation marks ("").

Try searching AltaVista for the whole phrase "Apollo 13." Compare the results with the "Apollo" and "13" search.
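The difference between the two searches can be sketched as follows. This is a minimal illustration that assumes a phrase matches only when its words appear next to each other and in order; real engines may treat punctuation and capitalization differently, and the function names are invented for the example.

```python
import re

def words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(page, phrase):
    """True if the words of `phrase` occur consecutively, in order, in `page`."""
    page_words = words(page)
    phrase_words = words(phrase)
    n = len(phrase_words)
    if n == 0:
        return False
    # Slide a window of n words across the page and compare it to the phrase.
    return any(page_words[i:i + n] == phrase_words
               for i in range(len(page_words) - n + 1))

print(contains_phrase("13 shows at the Apollo", "Apollo 13"))
print(contains_phrase("The flight of Apollo 13 began in 1970", "Apollo 13"))
```

The first page contains both words but not the phrase, so only the second page counts as a hit.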

Includes and Excludes (+/-)

To make our search for Jim Lovell (and not his movie or book) even more precise, we can include and exclude certain terms. If we add a plus sign (+) in front of a term in the search box, the search engine will know that the term must appear in the pages. If we add a minus sign (-) in front of a term, the search engine knows that the term must not appear in the pages.

We can also mix includes and excludes with phrase searching. With the Jim Lovell example, we can run two searches:

    +"Jim Lovell" astronaut -movie -book
    "Apollo 13" -book -movie -film

Of course, you will notice that this is not a perfect approach for this search. Pages that contain even passing references to the "movie" or "book" are excluded, even if the rest of the page has information about Jim Lovell or the flight of Apollo 13.

Includes and excludes are available on Yahoo! Search, AltaVista, Infoseek, and Excite.
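Here is a rough sketch of how a query with includes, excludes, and quoted phrases might be applied to a page. It makes two assumptions that the text above does not confirm: terms with no sign are treated as optional and do not filter anything, and matching is done on whole words. The function names are hypothetical.

```python
import re

def words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(page_words, phrase_words):
    """True if phrase_words occur consecutively, in order, in page_words."""
    n = len(phrase_words)
    return n > 0 and any(page_words[i:i + n] == phrase_words
                         for i in range(len(page_words) - n + 1))

def parse_query(query):
    """Split a query into (sign, term) pairs; sign is '+', '-', or ''."""
    pattern = r'([+-]?)(?:"([^"]*)"|(\S+))'
    return [(sign, phrase or word)
            for sign, phrase, word in re.findall(pattern, query)]

def matches(page, query):
    """Apply includes (+) and excludes (-); unsigned terms are optional here."""
    page_words = words(page)
    for sign, term in parse_query(query):
        found = contains_phrase(page_words, words(term))
        if sign == "+" and not found:
            return False
        if sign == "-" and found:
            return False
    return True

QUERY = '+"Jim Lovell" astronaut -movie -book'
print(parse_query(QUERY))
print(matches("Jim Lovell, astronaut, flew on Apollo 13", QUERY))
print(matches("Jim Lovell biography (also a movie)", QUERY))
```

The last call shows the weakness described above: a page about Jim Lovell is rejected only because it mentions the word "movie" in passing.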

Boolean logic (AND, OR, NOT, NEAR)

AltaVista Advanced Search and Excite use Boolean and proximity operators (always in uppercase) instead of includes and excludes (+/-). These operators can add even more specificity to a search. By placing one of them between terms, you can include and exclude terms, find cases where terms occur close together, and account for synonyms for words in a phrase. Here are some basic operators:

Operator | Example | Read as...
AND | Jim AND Lovell | Find all pages containing both the terms "Jim" and "Lovell" anywhere on the page.
OR | Jim OR Lovell | Find all pages containing the term "Jim," the term "Lovell," or both.
AND NOT | Lovell AND NOT Jim | Find all pages that contain the term "Lovell." Of those pages, exclude ones that contain the term "Jim."
NEAR | Apollo NEAR moon | (AltaVista:) Find all pages that contain the terms "Apollo" and "moon" within 10 words of each other.
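The operators above can be modeled with ordinary Python logic. This is a sketch under stated assumptions, not any search engine's actual implementation: terms are matched as whole words, and the NEAR distance is counted in words, with the window size passed in as a parameter. The helper names and sample pages are invented.

```python
import re

def words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def has(page, term):
    """True if `term` appears as a whole word anywhere on the page."""
    return term.lower() in words(page)

def near(page, a, b, window=10):
    """True if `a` and `b` occur within `window` words of each other."""
    tokens = words(page)
    pos_a = [i for i, w in enumerate(tokens) if w == a.lower()]
    pos_b = [i for i, w in enumerate(tokens) if w == b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

page1 = "Jim Lovell flew on Apollo 13"
page2 = "The Lovell Telescope is at Jodrell Bank"
page3 = "Apollo 11 landed on the moon"

print(has(page1, "Jim") and has(page1, "Lovell"))      # Jim AND Lovell
print(has(page2, "Jim") or has(page2, "Lovell"))       # Jim OR Lovell
print(has(page2, "Lovell") and not has(page2, "Jim"))  # Lovell AND NOT Jim
print(near(page3, "Apollo", "moon"))                   # Apollo NEAR moon
```

Each of the four printed values is True for the sample pages chosen here; shrinking the window (for example, `near(page3, "Apollo", "moon", window=3)`) makes the NEAR test fail because the two words are five words apart.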

Several operators can be used at once with parentheses to create even more complex searches. But be careful where you put the parentheses:

Example | Read as...
(Jim OR James) AND Lovell | Find pages that contain the terms "Jim" and "Lovell." Find pages that contain "James" and "Lovell."
Jim OR (James AND Lovell) | Find pages that contain the term "Jim." Find pages that contain the terms "James" and "Lovell."
("Jim Lovell" NEAR astronaut) AND Apollo | Find pages that contain the phrase "Jim Lovell" within 10 words of the term "astronaut," and contain the term "Apollo" somewhere on the page.
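To see why the position of the parentheses matters, the first two queries can be sketched with Python sets, where `|` stands in for OR and `&` for AND. The three pages and their one-letter labels are hypothetical, and whole-word matching is an assumption of the sketch.

```python
import re

# Hypothetical pages, keyed by a short label.
PAGES = {
    "a": "Jim Lovell flew on Apollo 13",
    "b": "James Lovell was born in Cleveland",
    "c": "Jim Henson created the Muppets",
}

def words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def pages_with(term):
    """Return the labels of all pages that contain `term` as a whole word."""
    return {label for label, text in PAGES.items()
            if term.lower() in words(text)}

# (Jim OR James) AND Lovell
first = (pages_with("Jim") | pages_with("James")) & pages_with("Lovell")

# Jim OR (James AND Lovell)
second = pages_with("Jim") | (pages_with("James") & pages_with("Lovell"))

print(sorted(first))
print(sorted(second))
```

The first query returns only pages "a" and "b," while the second also lets in page "c," which mentions a different Jim and has nothing to do with Lovell.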

More about search engines

Here are four selected resources on search engines.


Version 1.2
© Copr. 1997 Craig Branham
BRANHACC@SLU.EDU
Saint Louis University
Created: 26-Sept-97
Last Modified: 02-Oct-98

URL for this Document: http://www.slu.edu/departments/english/research/page8.html