Maximal Intersection Queries in Randomized Input Models
Benjamin Hoffmann, Mikhail Lifshits, Yury Lifshits, and Dirk Nowotka
Consider a family of sets and a single set, called the query set. How
can one quickly find a member of the family which has a maximal intersection
with the query set? Time constraints on the query and on any
preprocessing of the set family make this problem challenging.
Such maximal intersection queries arise in a wide range of applications,
including web search, recommendation systems, and distributing on-line
advertisements. In general, maximal intersection queries are computationally
expensive. We investigate two well-motivated distributions over all
families of sets and propose an algorithm for each of them. We show that
with very high probability an almost optimal solution is found in time
which is logarithmic in the size of the family. Moreover, we point out
a threshold phenomenon on the probabilities of intersecting sets in each
of our two input models which leads to the efficient algorithms mentioned
above.
Keywords: Set intersection problem, nearest neighbour problem, randomized input models, large scale algorithms
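
To make the problem statement concrete, the following is a minimal sketch of the naive baseline: a linear scan that intersects the query set with every member of the family. It is not one of the algorithms proposed in the paper; it only illustrates the query being asked and why time linear in the size of the family is the cost one wants to avoid. The function name max_intersection_query and the use of Python frozensets are assumptions made for illustration.

```python
from typing import FrozenSet, Hashable, Sequence, Tuple


def max_intersection_query(
    family: Sequence[FrozenSet[Hashable]],
    query: FrozenSet[Hashable],
) -> Tuple[int, int]:
    """Return (index, size) of a family member with maximal intersection.

    Naive baseline (hypothetical helper, not the paper's algorithm):
    every member is intersected with the query, so the running time
    grows linearly with the number of sets in the family.
    """
    if not family:
        raise ValueError("family must contain at least one set")

    best_index = 0
    best_size = -1
    for index, member in enumerate(family):
        size = len(member & query)  # size of the intersection with the query
        if size > best_size:        # ties keep the earliest member
            best_index, best_size = index, size
    return best_index, best_size


if __name__ == "__main__":
    family = [
        frozenset({1, 2, 3}),
        frozenset({2, 3, 4, 5}),
        frozenset({7}),
    ]
    query = frozenset({3, 4, 5, 9})
    print(max_intersection_query(family, query))
```

In this toy instance the second set shares three elements with the query, so the scan returns its index together with the intersection size.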