Date of Award
12-1997
Document Type
Thesis
Degree Name
Master of Science
Abstract
We are overwhelmed with electronic information, and the situation is unlikely to improve. More and more people work with information daily, and searches take longer as the volume of available information grows. This thesis examines how to provide faster access to the information a user wants to find. Today's requirements center on searching for information with queries. At the heart of the query process is the removal of search terms that have little or no significance to the search being performed. Such words, called stopwords because of their low searching power, are compiled in a stoplist. Stoplists are usually constructed from commonly occurring words in the English language, an approach that is acceptable for systems handling broad categories of information. This thesis instead builds a stoplist for a specific area of interest from a specific body of linguistic data, or corpus. A stoplist developed from an Air Force corpus is tested to determine whether it is more effective than a stoplist created from a general-use corpus.
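As an illustration only, the idea of a corpus-derived stoplist can be sketched in a few lines of Python. This is not the method used in the thesis: the abstract does not describe the quantitative comparison, so plain frequency ranking is assumed here, and the function names (`tokenize`, `build_stoplist`, `strip_stopwords`) and the three-sentence corpus are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_stoplist(documents, size):
    """Return the `size` most frequent words across all documents.

    Assumption: raw frequency stands in for "little searching power".
    """
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return {word for word, _ in counts.most_common(size)}


def strip_stopwords(query, stoplist):
    """Remove stoplisted terms from a query, keeping the original order."""
    return [term for term in tokenize(query) if term not in stoplist]


# Hypothetical toy corpus, standing in for a domain-specific collection.
corpus = [
    "the aircraft returned to the base",
    "the aircraft maintenance crew inspected the aircraft",
    "the base commander reviewed the maintenance report",
]

stoplist = build_stoplist(corpus, size=2)
print(sorted(stoplist))
print(strip_stopwords("the aircraft maintenance report", stoplist))
```

In this toy corpus the word "aircraft" ranks alongside "the", which suggests why a domain corpus may yield stopwords that a general English stoplist would not contain.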
AFIT Designator
AFIT-GIR-LAL-97D-02
DTIC Accession Number
ADA334570
Recommended Citation
Berg, Craig N., "Developing a Corpus Specific Stoplist Using Quantitative Comparison" (1997). Theses and Dissertations. 5576.
https://scholar.afit.edu/etd/5576
Comments
Thesis presented to the Faculty of the Graduate School of Logistics and Acquisition Management.