A Student's Guide to WWW Research

Part 3 of 3

by
Craig Branham
Saint Louis University

Student Guide Home

Anatomy of a page

Evaluating for relevance
Authority of a Web page
Evaluating for accuracy

Page types:

Informational pages
News sources
Advocacy web pages
Personal home pages
Glossary: part 1

Web search strategies:

Getting started
Web directories
Search engines 1
Search engines 2
Citing online sources

Glossary: part 2

The Search process

[ImageMap of Search Process]


A Web Search Process: One way to research a topic on the Web. The tools you employ may vary. This diagram is an image map; to hear more about a particular step in the process, click on the appropriate box or word in the diagram.

Now that you have some practice evaluating Web pages, we will consider some Web search strategies. The course that your Web research takes depends largely on the development of your argument and on your research needs at the current stage of your project.

Although I have tried to reduce the research process to a basic procedure here, it is important to improvise with it a bit. Get to know the tools outlined on the pages that follow and come up with your own variants of the suggested process to suit your particular project and preferences.

The research-writing process itself is highly personal and recursive, but there are some activities which are fairly consistent in every project. Authors start by doing very general research just to define the topic and formulate a sensible question about it. Having found a general topic, an author might feel confident enough to propose a tentative thesis and return to the stacks to explore the topic in more detail. Then, as the research evolves, the author may return to the library several more times to follow up leads that emerge in the work itself.

The chart above shows the different components of the search process and their relationships. Each decision block (question mark box) makes a fork in the path, and each action (exclamation mark) box is a new step in the process:

1. Do you have a topic?

At this point "topic" can be interpreted loosely. You just need a general, sketchy idea to get started. But before you begin your research, you may want to sit down and write for a half hour about what you know about the topic so far, and especially about what you think you may need to know. Alternatively, you might use the half hour to make a list of ideas and questions you have about your general topic. Think of all the different ways you might approach a discussion of the topic.

1a. Check lists of suggested topics

If you do not have a topic, there are a few sites that can help you select one. At this stage you might also browse encyclopedias and other reference books for ideas (though it is not a good idea to cite reference books heavily in a serious research project). For more information, see Rose Adams' guide, "Beginning Research on Any Topic," and the section "How to Find and Develop a Viable Research Topic" in Michael Engle's "Library Research at Cornell: A Hypertext Guide."

2. Are you looking for general information or specific details?

In other words, are you browsing for all kinds of information about your topic ("carry laws"), or are you trying to find some specific information (e.g., which states currently let people carry guns)? If you are looking for general information, have a topic which may be too broad, or know that the topic contains keywords that would be too common on the Web (like "Web usage statistics"), you can begin by searching the Web directories, sites which catalog Web pages by subject. If you know exactly what you are looking for, and can come up with precise search words, conduct keyword searches with search engines.

2a. Browse Web directories

Web directories store and present links to Web sites under subject and topic categories. Because of the subjectivity involved in categorization, Web directories have to be assembled by people, whereas search engines, which make only weak subject classifications at best, compile their databases with Web crawling software. For this reason, Web directories are advantageous for certain tasks. For instance, if you have only a subject ("taxes," "animal rights") that you would like to narrow to a specific topic, you can just follow the subject links in the directory down to the editors' topic and sub-topic menus for inspiration.

Web directories can also help you read "around" a narrow topic. If you are not hunting for specific facts, but are looking for a range of related ideas, a Web directory can point you to sites representing all of the issues surrounding the topic.

2b. Narrow the topic

Narrow your topic down to something that can be thoroughly researched in the time allowed and can receive detailed attention within the assigned length of the paper. For example, you might narrow a topic from "Medieval memory theory" to "Chaucer's use of the Medieval 'arts of memory' in The House of Fame." Or, from "Drug Abuse in Sports" to "the ethics of mandatory drug testing for high school athletes."

You should be able to express your topic as a kind of discussion question, one that starts with the word "why." For example, "Why does Chaucer make use of the images and backgrounds of the Medieval art of memory in Book I of The House of Fame?" Or, "Why is mandatory drug testing for high school athletes wrong?" If you have trouble narrowing the topic, or cannot express your topic as a "why"-question, you may need to do a little more research or exploratory writing.

3. Create a list of keywords

Before using the search engines, you should create a list of keywords and phrases that will produce the longest list of relevant pages from the search engines.

After you have produced a list of phrases, enter these words into a few selected search engines. As you see the results of your searches, add new terms to your list.

Keywords have to be chosen carefully to keep irrelevant results at bay. A search with just the word "ghost" might produce a list of thousands of pages, including sites devoted to ghost folklore; class syllabi on horror fiction; ghost notes; Casper, the friendly ghost; details about Bill Bruford's CD, "If Summer Had its Ghosts"; ghost writers; and so on.

4. Keyword search

Special commercial sites on the Web employ "search engine" software to give users the ability to keyword search large databases of links to Web sites. These databases are useful for generating long lists of results quickly. Search engines generally produce a lower percentage of useful results than the subject guides; but because search engines catalog larger percentages of the Web, they are bound to produce links to more relevant pages than a simple Web directory search.

The End

Ha! Did you really think this was the end? Though we tend to think of research as something we finish before we start writing, writing and research are actually parallel activities: one always grows from questions raised in the other. As you research, write exploratory drafts, or keep a journal to record ideas about what you find in your research. Occasionally you will discover questions that have to be accounted for, or raise points that need further substantiation. This will send you back to the stacks, or back to the computer, to repeat some of the steps we have discussed above.

Getting Started

If you have a research paper assignment and cannot think of a topic, there are a few Web sites that might inspire you:

  • The Best Information on the Net site offers a great page called "Hot Topics."
  • The Library of Congress Learning Page is a good pointer for various topics.
  • The 'Lectronic Law Library Newsroom provides links to recent legislation and e-text versions of documents related to current events. Take the tour of the library to find out how it works.
  • StudyWeb is a commercial Web directory designed for student research. It is organized by subject, and within the subject, by topic. Read about StudyWeb.

Web Directories

Other resources:

Web directories are designed for users seeking information on a wide topic or subject area: "rock music industry," "Winston Churchill," "The Boer War." Web directories present users with a menu of broad subjects. Users select the subject category that covers their particular area of interest, then select links to sub-menus closer to their topics until they find links to relevant resources.
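
The menu-by-menu browsing described above can be pictured as walking down a tree of categories. The following is only a rough sketch: the category names, the nested-dictionary layout, the `browse` helper, and the example.org addresses are all invented for illustration and do not reflect how any real directory is organized.

```python
# A toy directory: each menu is a dict of sub-menus, and the
# bottom level is a list of links (hypothetical addresses).
directory = {
    "Entertainment": {
        "Music": {
            "Rock music industry": ["http://example.org/rock-industry"],
        },
    },
    "History": {
        "Military history": {
            "The Boer War": ["http://example.org/boer-war"],
        },
    },
}

def browse(menu, path):
    """Follow a sequence of menu selections and return what is found there."""
    node = menu
    for choice in path:
        node = node[choice]  # raises KeyError if the category does not exist
    return node

# Selecting "History" shows the next menu of sub-topics.
sub_menu = browse(directory, ["History"])
# Selecting all the way down returns the links themselves.
links = browse(directory, ["History", "Military history", "The Boer War"])
```

Each selection narrows the subject, which is why a few menu choices can stand in for a carefully worded keyword query.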

Menu-driven Web directories, like Yahoo! and The Argus Clearinghouse, have the advantage over search engines that they are assembled by teams of editors who specialize in the available subjects. Where keyword searches can sometimes make the user sort through thousands of links, with a very low ratio of results close to what the user is actually looking for, Web directories can often point users to relevant information with just a few menu selections. Though there is no guarantee that every site linked to a Web directory will be accurate and relevant, the odds of finding irrelevant information in a Web directory are much lower than in the results of a Web search engine.

The distinction between Web directories and search engines is becoming weaker all the time, however. Yahoo!, one of the first directories on the Web, has always offered users the option of searching by keyword. Web search engines like Lycos and Infoseek have now retro-fitted their sites to give users the choice to browse links by subject. Excite blurs the categories even further by offering topical searches and links to "related" pages with its search results.

More detailed information about the three recommended Web directories follows. Go directly to these sites by selecting the appropriate link in the bottom frame of the window.


Whenever I have ventured out into the world of search engines and directories, I always find myself drawn back to Yahoo!. Yahoo! is still the easiest directory to use. Although it has a smaller database than the major search engines, I can usually count on Yahoo! to produce a list of the essential sites.

The real advantage of Yahoo! is that it mixes hierarchical subject menus and keyword search capabilities. The most effective way to use Yahoo! is to start by selecting a subject link from the front page, then select links until you find a sub-topic menu close to your research topic. Then, enter a word or phrase in the "Search" box, hit the "search within" radio button, then hit the search button to focus in on a particular topic.

Another way to use Yahoo! is to begin with a keyword search, then browse the subject categories that appear at the top of the results page until you find pertinent information.

One advantage that "true" search engines have over Yahoo! is that many of them can search for keywords within the content of a Web page, whereas Yahoo! just searches by title and subject. This is why it is important to combine the results of a Yahoo! search with results from other search engines.

Before searching Yahoo!, read their help file.

The Argus Clearinghouse is a directory of Web directories. By choosing a subject area from the main menu, then navigating the subject, topic, and sub-topic menus, you can find links to independent directories that pertain to your topic.

The Clearinghouse has an editorial staff to make sure that the site contains links only to the best available Web directories. Though it may not have as many links as Yahoo!, the sites available through the Clearinghouse will be more likely to lead to reliable sources because they are all hand-picked by specialists in the field.

Before browsing The Argus Clearinghouse, read their FAQ list and read about their rating system.

Excite is actually a search engine, but it is one that can do sophisticated "concept" searches, and it can search for pages "related" to ones that come up in the results of a keyword search. These features make it worthwhile to check Excite before moving on to other search tools. If you have a vague topic, Excite may be able to point you toward a list of pages that will help focus the topic. If your topic is too focused, Excite may be able to provide a list of related pages that will widen your view.

To search for a concept, just put in the main keywords. Using one of our example topics, "the ethics of mandatory drug testing of high school athletes," we would type "drug testing high school athletes" into the search box. The results of an Excite search are presented by statistical relevance: the documents that have the most occurrences of your terms or concept will appear at the top of the list.
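
To make "statistical relevance" a little more concrete, here is a minimal sketch that ranks pages by how many times the query terms occur in them. It is an illustration only: the guide does not document how Excite actually scores pages, so the simple counting rule, the function names, and the sample page texts are all assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank_by_term_count(pages, query):
    """Order pages by the total number of query-term occurrences, highest first."""
    terms = tokenize(query)
    scored = []
    for page in pages:
        counts = Counter(tokenize(page))
        score = sum(counts[term] for term in terms)
        if score > 0:  # pages with no query terms are left out entirely
            scored.append((score, page))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [page for _, page in scored]

# Invented page texts for illustration.
pages = [
    "School board news",
    "Drug testing of high school athletes: should schools test athletes for drug use?",
    "Gardening tips",
]
ranked = rank_by_term_count(pages, "drug testing high school athletes")
```

Under this rule the second page comes first because the query words appear in it most often, the first page follows with a single match, and the gardening page is dropped.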

Read these search tips before performing a search with Excite.

Search Engines 1

About search engines

Search engines are available for keyword searches of databases of links to Web pages. Where Web directories are assembled by human editors, search databases are collected automatically by Web crawling software programs that troll the Web, indiscriminately grabbing new data wherever they find it. Most of the major search engines catalog entire Web pages so that page content can be searched along with the titles and other meta-information supplied by Web authors.
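
One common way to picture a database that lets page content be searched along with titles is a word-to-page index. The sketch below is a toy version under stated assumptions: the page texts, the example.org addresses, and the `build_index` and `lookup` helpers are invented for illustration, and real search engines are far more elaborate.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose title or body contains it."""
    index = defaultdict(set)
    for url, (title, body) in pages.items():
        for word in tokenize(title + " " + body):
            index[word].add(url)
    return index

def lookup(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages: URL -> (title, body text).
pages = {
    "http://example.org/lovell": ("Astronaut biographies", "Jim Lovell commanded Apollo 13."),
    "http://example.org/theater": ("Apollo Theater", "13 shows this month."),
}
index = build_index(pages)
by_body = lookup(index, "Lovell")      # found only because the body text was indexed
both_words = lookup(index, "Apollo 13")
```

Note that the second lookup returns both pages, since each contains the two words somewhere; that is the kind of irrelevant hit a bare keyword search tends to produce.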

Unfortunately, there is no one search engine available that can find everything on the Web. The amount of Web information available grows every second as new pages are added and old pages are replaced or expanded. Cataloging all of this information reliably, despite many companies' claims, is effectively impossible. Even if it were possible to find and fully catalog every Web site in the world, pages are moved, updated, and deleted so often that by the time such a database could be completed, much of the information collected would be obsolete. However, the major search engines have more pages cataloged than any other resources on the Web.

Crafting a search

Finding fast results with a search engine is never a problem; the challenge is to create searches that exclude irrelevant pages.

To search for pages about Apollo 13 astronaut Jim Lovell, for example, I would start by entering keywords to see what comes up. Look at the results of a simple keyword search on AltaVista for "Apollo 13" and "Jim Lovell." I see that the words "Apollo" and "13" might appear separately in many documents: "13 shows at the Apollo," "13 reasons why Apollo is the best Greek god," "The Apollo Vacuum Cleaner Co site, last updated on Sept 13, 1997." "Jim" and "Lovell" is a fairly safe combination, in this case, because "Lovell" is not a very common term.

After seeing the results of a simple keyword search, I will want to find a way to zero in on information about Jim Lovell, the astronaut, and exclude pages about other Jims, Lovells, or Jim Lovells in the world. I will also want it to exclude commercial information or reviews of the movie and books about him.

Search engines have made it possible to refine searches by recognizing special symbols, called "operators." Operators are sets of symbols or special words that sit alongside the keywords in a query and tell the search engine how to process them. Operators let users specify whole phrases, words to exclude, words that should appear, and words that should appear close together.

Phrase searching ("")

In the example we have seen that putting the words "Apollo" and "13" into AltaVista will produce too many irrelevant hits. We are not, after all, looking for all of the documents that contain the terms "Apollo" and "13," or "Jim" and "Lovell"; we are trying to find only those documents that contain the phrase "Apollo 13," and/or the name "Jim Lovell."

With AltaVista, Infoseek, Excite, and Yahoo! Search, it is possible to specify phrases and proper names by putting them in quotation marks ("").

Try searching AltaVista for the whole phrase "Apollo 13." Compare the results with the "Apollo" and "13" search.
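
The difference between the two searches can be sketched in a few lines. This is an illustration of the idea only, not of how AltaVista is implemented; the helper names and the two sample page texts are invented.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_all_words(page, words):
    """Keyword match: every word appears somewhere on the page."""
    tokens = set(tokenize(page))
    return all(word.lower() in tokens for word in words)

def contains_phrase(page, phrase):
    """Phrase match: the words appear side by side, in the given order."""
    tokens = tokenize(page)
    target = tokenize(phrase)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))

# Invented page texts for illustration.
pages = [
    "13 shows at the Apollo",
    "The flight of Apollo 13, commanded by Jim Lovell",
]
keyword_hits = [p for p in pages if contains_all_words(p, ["Apollo", "13"])]
phrase_hits = [p for p in pages if contains_phrase(p, "Apollo 13")]
```

Both pages pass the keyword test, but only the second passes the phrase test, because only there do "Apollo" and "13" sit next to each other.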

Includes and Excludes (+/-)

To make our search for Jim Lovell (and not his movie or book) even more precise, we can include and exclude certain terms. If we add a plus sign (+) in front of a term in the search box, the search engine will know that the term must appear in the pages. If we add a minus sign (-) in front of a term, the search engine knows that the term must not appear in the pages.

We can also mix includes and excludes with phrase searching. With the Jim Lovell example, we can do a search using:

    +"Jim Lovell" astronaut -movie -book

and

    "Apollo 13" -book -movie -film

Of course, you will notice that this is not a perfect approach for this search. Pages that contain even passing references to "movie" or "book" are excluded, even if the rest of the page has information about Jim Lovell or the flight of Apollo 13.

Includes and excludes are available on Yahoo! Search, AltaVista, Infoseek, and Excite.
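
As a rough sketch of how a query with includes, excludes, and a quoted phrase might be applied, consider the code below. It rests on several assumptions that do not come from any search engine's documentation: terms without a sign only affect the ordering of results, the helper names are invented, and the sample pages are made up. The standard `shlex` module is used simply because it keeps quoted phrases together; it would stumble on a query containing an apostrophe.

```python
import re
import shlex

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def has_term(page_tokens, term):
    """True if the term (one word or a multi-word phrase) occurs in the page."""
    target = tokenize(term)
    n = len(target)
    return any(page_tokens[i:i + n] == target
               for i in range(len(page_tokens) - n + 1))

def parse_query(query):
    """Split a query into required (+), excluded (-), and unsigned terms."""
    required, excluded, optional = [], [], []
    for part in shlex.split(query):  # "Jim Lovell" stays together as one part
        if part.startswith("+"):
            required.append(part[1:])
        elif part.startswith("-"):
            excluded.append(part[1:])
        else:
            optional.append(part)
    return required, excluded, optional

def search(pages, query):
    """Filter by the +/- terms; order the survivors by unsigned terms matched."""
    required, excluded, optional = parse_query(query)
    hits = []
    for page in pages:
        tokens = tokenize(page)
        if not all(has_term(tokens, t) for t in required):
            continue
        if any(has_term(tokens, t) for t in excluded):
            continue
        score = sum(has_term(tokens, t) for t in optional)
        hits.append((score, page))
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [page for _, page in hits]

# Invented page texts for illustration.
pages = [
    "Jim Lovell, astronaut, commanded Apollo 13",
    "Review of the movie about Jim Lovell",
    "Jim Smith and Ann Lovell sell vacuum cleaners",
]
results = search(pages, '+"Jim Lovell" astronaut -movie -book')
```

Only the first page survives. The second is thrown out for its passing mention of the movie, and the third never contains the name as a phrase.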

Boolean logic (AND, OR, NOT, NEAR)

AltaVista Advanced Search and Excite use Boolean, or proximity, operators (always in uppercase) instead of includes and excludes (+/-). Boolean operators can add even more specificity to a search. By placing one of the Boolean operators between terms you can include and exclude terms, find cases where terms occur close together, and account for synonyms for words in a phrase. Here are some basic Boolean operators:

Operator   Example              Read as...
AND        Jim AND Lovell       Find all pages containing both the terms "Jim" and "Lovell" anywhere on the page.
OR         Jim OR Lovell        Find all pages containing the term "Jim." Find all the pages containing the term "Lovell."
AND NOT    Lovell AND NOT Jim   Find all pages that contain the term "Lovell." Of those pages, exclude ones that contain the term "Jim."
NEAR       Apollo NEAR moon     (AltaVista:) Find all pages that contain the terms "Apollo" and "moon" within 10 words of each other.

Several operators can be used at once with parentheses to create even more complex searches. But be careful where you put the parentheses:

Example                                    Read as...
(Jim OR James) AND Lovell                  Find pages that contain the terms "Jim" and "Lovell." Find pages that contain "James" and "Lovell."
Jim OR (James AND Lovell)                  Find pages that contain the term "Jim." Find pages that contain the terms "James" and "Lovell."
("Jim Lovell" NEAR astronaut) AND Apollo   Find pages that contain the phrase "Jim Lovell" within 10 words of the term "astronaut," and contain the term "Apollo" somewhere on the page.
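
To see why the placement of parentheses matters, the operators can be modeled as small functions that are combined in the same order as the parentheses. This is only a sketch: the function names are invented, the sample pages are made up, and the default NEAR window of 10 words is an assumption about AltaVista's behavior rather than something this code can confirm.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term(word):
    """Match pages that contain the word anywhere."""
    w = word.lower()
    return lambda tokens: w in tokens

def and_(a, b):
    return lambda tokens: a(tokens) and b(tokens)

def or_(a, b):
    return lambda tokens: a(tokens) or b(tokens)

def and_not(a, b):
    return lambda tokens: a(tokens) and not b(tokens)

def near(word_a, word_b, window=10):
    """Match pages where the two words occur within `window` words of each other."""
    a, b = word_a.lower(), word_b.lower()
    def check(tokens):
        pos_a = [i for i, t in enumerate(tokens) if t == a]
        pos_b = [i for i, t in enumerate(tokens) if t == b]
        return any(abs(i - j) <= window for i in pos_a for j in pos_b)
    return check

# (Jim OR James) AND Lovell
query_1 = and_(or_(term("Jim"), term("James")), term("Lovell"))
# Jim OR (James AND Lovell)
query_2 = or_(term("Jim"), and_(term("James"), term("Lovell")))

# Invented page texts for illustration.
henson = tokenize("Jim Henson created the Muppets")
lovell = tokenize("James Lovell flew on Apollo 13")
moon = tokenize("Apollo 11 landed on the moon")
```

The page about Jim Henson fails the first query (no "Lovell") but passes the second, because there "Jim" alone is enough. The page about James Lovell passes both.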

More about search engines

Here are four selected resources on search engines.


Search Engines 2

Suggested general search engines

Follow links for more information:

Specialized search engines

Follow links to go directly to these sites:

Here is some more information about the recommended general-purpose search engines:


Infoseek

Phrase searching:    yes: "" or - (hyphen)
Includes/Excludes:   yes: + only
Boolean logic:       no
Meta-search:         no
Web directory:       yes

Features: Infoseek is smart, easy to use, and forgiving; thus it is a good search engine for newbies, and a good place to start any search. Infoseek can search for Web pages, USENET postings, news stories, phone numbers, addresses, and e-mail addresses. Infoseek also has a Web directory of rated sites. Here are some of Infoseek's unique features and quirks:

  • Names and Titles: If you type "Max Roach" into the search box, Infoseek will look for pages that contain the name Max Roach. If you type "max roach," it will look for pages containing "max" or "Max," and/or "roach" or "Roach" somewhere on the page. Infoseek also recognizes other kinds of names: "Saint Louis University," "Led Zeppelin," etc. Separate names in a sequence with commas:
    Max Roach, Lester Young

  • Phrase Searching: Phrases can be expressed with quotation marks or hyphens:

    "runs batted in" "stolen bases"

    or

    runs-batted-in stolen-bases

  • Sub-searching: Sub-searching gives users the ability to search for keywords, then search among the results to narrow the list further. On the results page from any search, Infoseek provides a button to perform sub-searches. Queries can be written to include sub-searches with the pipe (|) character to save a step:

    Charlie Parker | discography
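
The idea behind a piped sub-search can be sketched as a chain of filters, where each stage searches only the results of the stage before it. The exact matching rules Infoseek applies are not described here, so this sketch assumes that every word in a stage must appear on the page; the function names and sample pages are invented.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def sub_search(pages, query):
    """Apply each pipe-separated stage to the results of the previous stage."""
    results = list(pages)
    for stage in query.split("|"):
        words = tokenize(stage)
        results = [page for page in results
                   if all(word in tokenize(page) for word in words)]
    return results

# Invented page texts for illustration.
pages = [
    "Charlie Parker biography",
    "Charlie Parker discography 1945-1955",
    "Miles Davis discography",
]
narrowed = sub_search(pages, "Charlie Parker | discography")
```

The first stage keeps the two Charlie Parker pages; the second stage keeps only the one that also mentions a discography.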

Read Infoseek's help file for more information.

Dogpile

Phrase searching:    yes: ""
Includes/Excludes:   yes: +/-
Boolean logic:       yes: AND OR NOT NEAR
Meta-search:         yes
Web directory:       no

Features:

Multi-search: Dogpile is a multi-search engine. Instead of searching its own database, Dogpile will search twenty-five different commercial search databases, including Yahoo!, Lycos, HotBot, Excite, and others, and present the results on the same results page. These features can save you an enormous amount of time for simple searches.

Meta-search: Dogpile also has Metafind, a program that will search several databases at once and create a single list of all of the results. See the Metafind Help page for more details.

Operators: You can use any of the possible operators with a query. Arfie, the "fetch" software, will translate them into the appropriate format for each of the databases it checks.

Why search anywhere else? The first reason is that Dogpile only gives the top 10 results from each search engine. Metafind also has limits on the number of hits it will present. Secondly, all of the search engines have their own unique features to help you find information. In some cases, it is easier to go directly to the source to customize a search than to send the query through Dogpile. Finally, Dogpile is designed for users who are searching for as much information about a topic as possible. Researchers looking for a particular page, or some fairly common information, would have better luck using a single search engine.
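
The trade-off described above (many databases at once, but only the top few results from each) can be sketched as follows. Nothing here reflects Dogpile's actual software: the two "engines" are stand-in functions returning canned lists of hypothetical addresses, and the merge rule (keep the first appearance of each address) is an assumption.

```python
def meta_search(engines, query, per_engine=10):
    """Collect the top results from each engine into one list without duplicates."""
    merged = []
    seen = set()
    for name, search in engines.items():
        for url in search(query)[:per_engine]:  # only the top few from each engine
            if url not in seen:
                seen.add(url)
                merged.append((url, name))
    return merged

# Stand-in engines with canned results (hypothetical addresses).
def engine_a(query):
    return ["http://example.org/1", "http://example.org/2", "http://example.org/3"]

def engine_b(query):
    return ["http://example.org/2", "http://example.org/4"]

engines = {"engine_a": engine_a, "engine_b": engine_b}
combined = meta_search(engines, "Apollo 13", per_engine=2)
```

With a limit of two results per engine, the third result from the first engine never appears in the combined list, which is one reason to go directly to a single search engine when a thorough list matters.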

AltaVista

Phrase searching:    yes: ""
Includes/Excludes:   yes: + or -
Boolean logic:       yes (Advanced Search): AND OR NOT NEAR
Meta-search:         no
Web directory:       no

AltaVista is known for its breadth and speed. It is among those search engines that catalog whole pages, so a query may hit information in the title, description, or content of the pages.

Features:

Simple Search: Simple Search just accepts keywords, phrases (""), and includes and excludes (+/-).

Advanced Search: Selecting Advanced Search brings up a unique search form that offers the following features:

  • Pull-down menus let users specify whether they want to search for Web pages or USENET posts, and whether they want to search for pages written in a particular language.
  • The "Ranking" box allows users to conduct sub-searches. A user could enter "Charles Mingus" in the "Search" box and "discography" in the "Ranking" box to find all of the documents that contain "Charles Mingus," and display links to all of those pages that also contain the word "discography" at the top of the list of results.
  • "From:" and "To:" boxes let users search for pages created during a particular period of time.
  • The Refine button will take users to a page that will provide a list of related keywords to refine the search results further. This is an excellent feature.
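
The behavior of the "Ranking" box described in the list above, where one term decides which pages are found and another decides which of them are listed first, can be sketched like this. The simple substring matching, the function name, and the sample pages are assumptions made for illustration; AltaVista's real scoring is not described in this guide.

```python
def search_with_ranking(pages, search_phrase, ranking_word):
    """Keep pages containing the phrase; list those with the ranking word first."""
    phrase = search_phrase.lower()
    word = ranking_word.lower()
    hits = [page for page in pages if phrase in page.lower()]
    # sorted() is stable, so pages keep their original order within each group.
    # The key is False (sorts first) when the ranking word is present.
    return sorted(hits, key=lambda page: word not in page.lower())

# Invented page texts for illustration.
pages = [
    "Charles Mingus biography",
    "Charles Mingus discography and sessions",
    "Duke Ellington discography",
]
ordered = search_with_ranking(pages, "Charles Mingus", "discography")
```

Both Mingus pages are returned, but the one with the discography moves to the top; the Ellington page is not returned at all, even though it contains the ranking word.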

Read AltaVista's help file.

Citing online sources

Click the links below to see Janice Walker's guide to MLA style and the WEAPAS Web extension for APA style.

MLA Style: Berkeley: http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/MLAStyleSheet.html
APA style: WEAPAS page: http://www.beadsland.com/weapas/

Glossary: part 2


gopher mailing lists telnet thesis USENET

Gopher

Gopher, named for the rodent mascot of the University of Minnesota, where it was devised, is a menu-driven system for browsing text resources on the Internet. Gopher was wildly popular between 1992 and 1994, but has now been almost completely swallowed up by the World Wide Web. There may be some documents still only available in this form, but they may be quite old.

Web browsers can still view gopher documents. Gopher URLs begin with "gopher://".

Mailing Lists

An e-mailing list, or mail reflector, is an e-mail address that takes any note sent to it and forwards the note to everyone on its list of "subscribers." Users can subscribe to a mailing list by sending a command by e-mail to the mail server (which always has a different address from the one used to send mail to the group). The most common mail servers are LISTSERV, listproc, and majordomo.

Like news groups, mailing lists are used for mostly informal discussion. Most mailing lists are not moderated or fact checked, so while they may be good resources for conversation, and leads to other resources, they are not entirely free of disinformation and rumor.

Telnet

Telnet is a program used to connect interactively with another computer on the Internet. A Web browser usually controls all of the transactions between a user's computer and other computers on the Net, and it automatically translates data it finds into formatted text, sounds, and pictures. Telnet only lets you view data "as is," but it lets users communicate directly with another computer: every time a key is pressed in a telnet session, the signal goes directly to the other computer.

Opportunities to use telnet have greatly diminished during the last few years, but some services are still only available by telnet.

Thesis

The thesis of a research paper is its main point or basic argument. The final, definite, thesis for a paper should not be set in stone before the research begins. An argument has to grow out of the research, and the author's evolving thinking about it. It is a good idea to come up with a "working thesis," or hypothesis, before beginning the research, then refine this idea as the project develops.

USENET

USENET is an organization of global network news systems on the Internet. In this case, "news" can be taken to have its more archaic meaning, "rumor." A news group is really an electronic bulletin board where anything related to the subject can be posted by anyone. Like Internet mailing lists, news groups are used mostly for informal discussion, not for official press releases; so a news group might be a good place to look for leads, but it is not a good place to look for information you can cite in a paper (unless the notes come from well-known authorities in the field).

Copyright Notice


All materials on this website are copyrighted by Craig Branham.

The materials available on this site were designed to be read on the World Wide Web. They may not be reprinted and redistributed in any form. I extend permission to individuals and non-profit institutions to create links from a WWW page to any document on the site, as long as it is clear from the link context that these materials are not owned or affiliated with any project or organization other than the Department of English at Saint Louis University.

Editors and authors of magazines, trade journals, books, and other print or electronic media must obtain permission to reproduce any part of this site. These materials were not intended for print publication or distribution.


Version 1.1
Copr. 1997 Craig Branham
BRANHACC@SLU.EDU
Saint Louis University
Created: 27-Sept-97
Last Modified: 02-Oct-98

URL for this Document: http://www.slu.edu/departments/english/research/research2.html