Student Thesis STUD-2300

BibliographyRiegger, Philipp: Literature Survey on Nearest Neighbor Search and Search in Graphs.
University of Stuttgart, Faculty of Computer Science, Student Thesis No. 2300 (2010).
43 pages, english.
CR-SchemaA.1 (General Literature, Introductory and Survey)
E.1 (Data Structures)
F.2.2 (Nonnumerical Algorithms and Problems)
G.3 (Probability and Statistics)
H.2.8 (Database Applications)
H.3.1 (Content Analysis and Indexing)
H.3.3 (Information Search and Retrieval)
Abstract

Search algorithms are one of the most common problems in computer science. Their most basic form is string search which ranges from search in small documents, where speed is the most important feature of a given algorithm and indexing is usually not worth the effort to text retrieval in huge document collections, where memory and hard disk usage becomes an important factor and indexing is necessary to comply with given time bounds. For many applications we need to search more complex data objects like pictures and other multimedia data, data with special properties e.g. in computational biology or structured data like graphs or networks. Depending on the application, this either makes the search problem more complex since we need to do more work to adapt to the type of data or it makes it easier because we can exploit additional information.

We can map lots of databases to metric spaces or similarity spaces, which are finite sets with either a distance measure (a metric) or a similarity function. We can create a data structure that, given some new data set, enables us to compute its nearest neighbor which is the element in our database with minimal distance or maximal similarity to the new data set. This is called nearest neighbor search, it has its roots in computational geometry but is used in lots of other application areas like data mining.

When we map data objects to some abstract space we usually pay this generalization by losing some information. Therefore and because of noise and other effects in the input data it often suffices to search for approximate matches or approximate near neighbors. This allows us to create algorithms with significantly lower time and space requirements. Another reason for using approximate algorithms is the so called curse of dimensionality. For low dimensional data sets exact nearest neighbor search performs reasonably well, but for extremely high dimensional data sets which arise e.g. in document retrieval these algorithms are not practical since they use time or space exponential in the dimension.

The switch to approximate search enables us to further improve our algorithms and opens a huge area of different methods to solve these problems.

Full text and
other links
PDF (365144 Bytes)
Access to students' publications restricted to the faculty due to current privacy regulations
Department(s)University of Stuttgart, Institute of Computer Science, Software Reliability and Security
Superviser(s)Nowotka, Dirk
Entry dateNovember 18, 2010
   Publ. Computer Science