Bibliography | Hoffmann, Benjamin: Comparison of Standard and Zipf-Based Document Retrieval Heuristics. University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Technical Report Computer Science No. 2010/06. 17 pages, English.
|
CR-Schema | H.3.1 (Content Analysis and Indexing) H.3.3 (Information Search and Retrieval) H.3.4 (Information Storage and Retrieval Systems and Software)
|
Keywords | Information Retrieval; Zipf Model |
Abstract | Document retrieval is the task of retrieving, from a possibly huge collection of documents, those that are most similar to a given query document. In this paper, we present a new heuristic for inexact top-K retrieval. It is similar to the well-known index elimination heuristic and is based on Zipf's law, a statistical law observable in natural language texts. We compare the two heuristics with regard to retrieval performance and execution time. For this comparison, we use a text collection consisting of scientific articles from various computer science conferences and journals. It turns out that our new approach is not better than index elimination. Interestingly, a combination of both heuristics yields the best results.
|
Full text and other links | PDF (190297 Bytes)
|
Department(s) | University of Stuttgart, Institute of Formal Methods in Computer Science, Theoretical Computer Science
|
Entry date | September 15, 2010 |
---|