Technical Report TR-2010-06

Bibliographic data
Hoffmann, Benjamin: Comparison of Standard and Zipf-Based Document Retrieval Heuristics.
Universität Stuttgart, Fakultät Informatik, Elektrotechnik und Informationstechnik, Technischer Bericht Informatik Nr. 2010/06.
17 pages, English.
CR classification: H.3.1 (Content Analysis and Indexing)
H.3.3 (Information Search and Retrieval)
H.3.4 (Information Storage and Retrieval Systems and Software)
Keywords: Information Retrieval; Zipf Model
Abstract

Document retrieval is the task of retrieving, from a possibly huge collection of documents, those that are most similar to a given query document. In this paper, we present a new heuristic for inexact top-K retrieval. It is similar to the well-known index elimination heuristic and is based on Zipf's law, a statistical law observable in natural language texts. We compare the two heuristics with regard to retrieval performance and execution time. To this end, we use a text collection consisting of scientific articles from various computer science conferences and journals. It turns out that our new approach is not better than index elimination. Interestingly, a combination of both heuristics yields the best results.
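
For orientation, index elimination can be sketched as follows. This is a minimal illustration of the textbook idf-threshold variant, not the implementation evaluated in the report. The function names, the unnormalized tf-idf scoring, and the `idf_threshold` parameter are assumptions made for this example. The Zipf-based heuristic is not shown, since the abstract does not describe it.

```python
import math
from collections import Counter


def build_index(docs):
    """Build postings lists and idf values from tokenized documents.

    docs: list of token lists; the position in the list is the document id.
    """
    postings = {}
    for doc_id, tokens in enumerate(docs):
        for term, tf in Counter(tokens).items():
            postings.setdefault(term, []).append((doc_id, tf))
    n = len(docs)
    # idf = log(N / document frequency); terms found in every document get 0.
    idf = {term: math.log(n / len(plist)) for term, plist in postings.items()}
    return postings, idf


def top_k_index_elimination(query, postings, idf, k, idf_threshold):
    """Inexact top-K retrieval: skip query terms whose idf is below the threshold.

    Scores are unnormalized tf-idf dot products (an assumption of this sketch;
    no document-length normalization is applied).
    """
    scores = Counter()
    for term, q_tf in Counter(query).items():
        # Index elimination: frequent (low-idf) terms have long postings
        # lists and contribute little, so their lists are never traversed.
        if term not in postings or idf[term] < idf_threshold:
            continue
        for doc_id, tf in postings[term]:
            scores[doc_id] += (q_tf * idf[term]) * (tf * idf[term])
    return [doc_id for doc_id, _ in scores.most_common(k)]


docs = [
    ["zipf", "law", "text"],
    ["index", "elimination", "text"],
    ["zipf", "index", "text"],
]
postings, idf = build_index(docs)
result = top_k_index_elimination(["zipf", "law", "text"], postings, idf, 2, 0.1)
```

Raising `idf_threshold` shortens execution time because fewer postings lists are traversed, at the cost of possibly missing documents that match only on common terms.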

Full text and other links: PDF (190297 bytes)
Department(s): Universität Stuttgart, Institut für Formale Methoden der Informatik, Theoretische Informatik
Entry date: 15 September 2010