Bibliographic data | Hoffmann, Benjamin: Comparison of Standard and Zipf-Based Document Retrieval Heuristics. Universität Stuttgart, Fakultät Informatik, Elektrotechnik und Informationstechnik, Technischer Bericht Informatik Nr. 2010/06. 17 pages, English.
|
CR classification | H.3.1 (Content Analysis and Indexing) H.3.3 (Information Search and Retrieval) H.3.4 (Information Storage and Retrieval Systems and Software)
|
Keywords | Information Retrieval; Zipf Model |
Abstract | Document retrieval is the task of retrieving, from a possibly huge collection of documents, those that are most similar to a given query document. In this paper, we present a new heuristic for inexact top-K retrieval. It is similar to the well-known index elimination heuristic and is based on Zipf's law, a statistical law observable in natural language texts. We compare the two heuristics with regard to retrieval performance and execution time. For this comparison, we use a text collection consisting of scientific articles from various computer science conferences and journals. It turns out that our new approach is not better than index elimination. Interestingly, a combination of both heuristics yields the best results.
|
Full text and other links | PDF (190297 bytes)
|
Department(s) | Universität Stuttgart, Institut für Formale Methoden der Informatik, Theoretische Informatik
|
Entry date | 15 September 2010 |
---|