Technical Report TR-2010-06

Hoffmann, Benjamin: Comparison of Standard and Zipf-Based Document Retrieval Heuristics.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering and Information Technology, Technical Report Computer Science No. 2010/06.
17 pages, English.
CR classification: H.3.1 (Content Analysis and Indexing)
H.3.3 (Information Search and Retrieval)
H.3.4 (Information Storage and Retrieval Systems and Software)
Keywords: Information Retrieval; Zipf Model

Document retrieval is the task of retrieving, from a possibly huge collection of documents, those that are most similar to a given query document. In this paper, we present a new heuristic for inexact top-K retrieval. It is similar to the well-known index elimination heuristic and is based on Zipf's law, a statistical law observable in natural language texts. We compare the two heuristics with regard to retrieval performance and execution time. For this purpose, we use a text collection consisting of scientific articles from various computer science conferences and journals. It turns out that our new approach is not better than index elimination. Interestingly, a combination of both heuristics yields the best results.

Full text and other links: PDF (190297 bytes)
Department(s): University of Stuttgart, Institute of Formal Methods in Computer Science, Theoretical Computer Science
Entry date: September 15, 2010