Doctoral Thesis DIS-2004-01

BibliographyRantzau, Ralf: Query Processing Concepts and Techniques for Set Containment Tests.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Doctoral Thesis (2004).
198 pages, english.
CR-SchemaH.2.4 (Database Management Systems)
H.2.8 (Database Applications)
Keywordsrelational division, set containment join, frequent itemsets
Abstract

Relational division is an operator of the relational algebra that realizes universal quantifications in queries against a relational database. Expressing a universal quantification problem in SQL is cumbersome. If the division operator would have a counterpart in a query language, a more intuitive formulation of universal quantification problems would be possible. Although division is a derived operator--it can be expressed using other basic algebra operators--the performance of queries involving a division problem can be significantly increased if it is implemented as a separate operator in a query processor.

In this work, we study relational division as well as operators related to it like set containment join. We present a classification of division algorithms and define algebraic laws to enable query optimization for queries involving division. A natural generalization of the division operator, called set containment division, is proposed and algorithms that realize it are discussed, including an algorithm that uses a new in-memory index data structure. Furthermore, we study strategies for parallelizing set containment division algorithms.

The most popular data mining sub-task called frequent itemset discovery, where one tries to find the number of transactions that contain a given set of items, is an excellent example of a problem that requires efficient set containment tests. In this work, we focus on frequent itemset discovery algorithms that exploit the query processing capabilities of a relational database system. After reviewing the best algorithms based on SQL-92, we suggest a new approach, called Quiver, that can be executed with the help of set containment division.

We report on performance results that cover experiments with frequent itemset discovery algorithms running on commercial database systems as well as experiments that highlight the most important properties of set containment test algorithms using a query processor prototype implemented in Java.

Full text and
other links
PDF (1411021 Bytes)
Contactrantzau@informatik.uni-stuttgart.de
Department(s)University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
Entry dateApril 23, 2004
   Publ. Department   Publ. Institute   Publ. Computer Science