|Rantzau, Ralf: Query Processing Concepts and Techniques for Set Containment Tests. |
Universität Stuttgart, Fakultät Informatik, Elektrotechnik und Informationstechnik, Dissertation (2004).
198 Seiten, englisch.
|CR-Klassif.||H.2.4 (Database Management Systems)|
H.2.8 (Database Applications)
|Keywords||relational division, set containment join, frequent itemsets|
Relational division is an operator of the relational algebra that realizes universal quantifications in queries against a relational database. Expressing a universal quantification problem in SQL is cumbersome. If the division operator would have a counterpart in a query language, a more intuitive formulation of universal quantification problems would be possible. Although division is a derived operator--it can be expressed using other basic algebra operators--the performance of queries involving a division problem can be significantly increased if it is implemented as a separate operator in a query processor.
In this work, we study relational division as well as operators related to it like set containment join. We present a classification of division algorithms and define algebraic laws to enable query optimization for queries involving division. A natural generalization of the division operator, called set containment division, is proposed and algorithms that realize it are discussed, including an algorithm that uses a new in-memory index data structure. Furthermore, we study strategies for parallelizing set containment division algorithms.
The most popular data mining sub-task called frequent itemset discovery, where one tries to find the number of transactions that contain a given set of items, is an excellent example of a problem that requires efficient set containment tests. In this work, we focus on frequent itemset discovery algorithms that exploit the query processing capabilities of a relational database system. After reviewing the best algorithms based on SQL-92, we suggest a new approach, called Quiver, that can be executed with the help of set containment division.
We report on performance results that cover experiments with frequent itemset discovery algorithms running on commercial database systems as well as experiments that highlight the most important properties of set containment test algorithms using a query processor prototype implemented in Java.
|PDF (1411021 Bytes)|
|Abteilung(en)||Universität Stuttgart, Institut für Parallele und Verteilte Systeme, Anwendersoftware|
|Eingabedatum||23. April 2004|