Benjamin Hoffmann, Yury Lifshits, and Dirk Nowotka
Maximal Intersection Queries in Randomized Graph Models


In V. Diekert, M. Volkov, and A. Voronkov (eds), International Computer Science Symposium in Russia (CSR), Ekaterinburg, 2007, volume 4649 of Lecture Notes in Computer Science, Springer-Verlag, Berlin, 2007.

Abstract

Consider a family of sets and a single set, called the query set. How can one quickly find a member of the family that has a maximal intersection with the query set? Strict time constraints on the query and on a possible preprocessing of the set family make this problem challenging. Such maximal intersection queries arise in a wide range of applications, including web search, recommendation systems, and distributing online advertisements. In general, maximal intersection queries are computationally expensive. Therefore, one needs to add some assumptions about the input in order to get an efficient solution. We investigate two well-motivated distributions over all families of sets and propose an algorithm for each of them. We show that with very high probability an almost optimal solution is found in time logarithmic in the size of the family. In particular, we point out a threshold phenomenon on the probabilities of intersecting sets in each of our two input models, which leads to the efficient algorithms mentioned above.
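To make the problem statement concrete, the following is a minimal Python sketch of the naive baseline: a linear scan that intersects the query set with every member of the family. It is not the authors' algorithm, which exploits the two random input models to answer queries in logarithmic time; the function name `max_intersection_query` and the tie-breaking rule (smallest index wins) are illustrative assumptions.

```python
from typing import Hashable, Optional, Sequence, Set, Tuple


def max_intersection_query(
    family: Sequence[Set[Hashable]], query: Set[Hashable]
) -> Tuple[Optional[int], int]:
    """Return (index, size) of a family member with a maximal intersection with `query`.

    Hypothetical naive baseline: no preprocessing, and every member of the
    family is examined, so each query takes at least linear time in len(family).
    Ties go to the smallest index; an empty family yields (None, 0).
    """
    best_index: Optional[int] = None
    best_size = -1
    for i, member in enumerate(family):
        # Size of the intersection between this member and the query set.
        size = len(member & query)
        if size > best_size:
            best_index, best_size = i, size
    return best_index, max(best_size, 0)


if __name__ == "__main__":
    family = [{1, 2, 3}, {2, 3, 4, 5}, {6}]
    print(max_intersection_query(family, {3, 4, 5, 7}))  # (1, 3)
```

The linear scan is exactly the cost the abstract calls expensive for large families; the point of the randomized models is to avoid touching every member per query.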

Keywords: maximum intersection problem, nearest neighbour problem, randomized graph models, large scale algorithms