Article in Proceedings INPROC-2002-05

Bibliography: Rantzau, Ralf: Frequent Itemset Discovery with SQL Using Universal Quantification.
In: Proceedings of the International Workshop on Database Technologies for Data Mining (DTDM); Prague, Czech Republic, March 2002.
University of Stuttgart, Faculty of Computer Science.
pp. 51-66, English.
Prague, Czech Republic: unknown, March 2002.
Article in Proceedings (Conference Paper).
CR-Schema: H.2.4 (Database Management Systems)
H.2.8 (Database Applications)
Keywords: data mining; association rules; relational division; mining and database integration
Abstract

Algorithms for finding frequent itemsets fall into two broad classes: (1) algorithms that are based on non-trivial SQL statements to query and update a database, and (2) algorithms that employ sophisticated in-memory data structures, with the data stored in and retrieved from flat files. Most performance experiments have shown that SQL-based approaches are inferior to main-memory algorithms. However, the current trend of database vendors to integrate analysis functionality into their query execution and optimization components, i.e., "closer to the data," suggests revisiting these results and searching for new, potentially better solutions.

We investigate approaches based on SQL-92 and present a new approach called Quiver that employs universal and existential quantifications. This approach uses a table layout for itemsets in which a group of multiple records represents a single itemset. Hence, our vertical layout is similar to the popular layout used for the transaction table, which is the input of frequent itemset discovery. Our approach is particularly beneficial if the database system in use provides adequate strategies and techniques for processing universally quantified queries, which current commercial systems do not.
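
To illustrate the idea of a vertical itemset layout combined with universal quantification, the following is a minimal sketch and not the Quiver query from the paper. It assumes two hypothetical tables, `transactions(tid, item)` and `candidates(itemset, item)`, both storing one row per item, and counts for each candidate itemset the transactions that contain all of its items. The "for all" condition is written as the usual nested `NOT EXISTS` formulation of relational division; Python's `sqlite3` module is used only as a convenient way to run the SQL.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical demo tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE transactions (tid INTEGER, item TEXT);
        CREATE TABLE candidates (itemset INTEGER, item TEXT);
        """
    )
    # Vertical layout: one row per (transaction, item) pair.
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?)",
        [
            (1, "a"), (1, "b"), (1, "c"),
            (2, "a"), (2, "b"),
            (3, "a"), (3, "c"),
            (4, "b"), (4, "c"),
            (5, "a"), (5, "b"),
        ],
    )
    # Same layout for itemsets: a group of rows forms one candidate.
    conn.executemany(
        "INSERT INTO candidates VALUES (?, ?)",
        [
            (1, "a"), (1, "b"),
            (2, "a"), (2, "c"),
            (3, "b"), (3, "c"),
        ],
    )
    return conn


# A transaction supports a candidate if there is NO item of the candidate
# that is missing from the transaction (universal quantification expressed
# as a double negation).
SUPPORT_QUERY = """
SELECT c.itemset, COUNT(*) AS support
FROM (SELECT DISTINCT itemset FROM candidates) AS c,
     (SELECT DISTINCT tid FROM transactions) AS t
WHERE NOT EXISTS (
    SELECT * FROM candidates AS c2
    WHERE c2.itemset = c.itemset
      AND NOT EXISTS (
          SELECT * FROM transactions AS t2
          WHERE t2.tid = t.tid AND t2.item = c2.item))
GROUP BY c.itemset
HAVING COUNT(*) >= ?
ORDER BY c.itemset
"""


def frequent_itemsets(conn, min_support):
    """Return (itemset, support) pairs meeting the minimum support."""
    return conn.execute(SUPPORT_QUERY, (min_support,)).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for itemset, support in frequent_itemsets(demo, 2):
        print(itemset, support)
```

Without a dedicated division strategy, a query of this shape tends to be evaluated as nested loops over the cross product of candidates and transactions, which is consistent with the abstract's remark that the benefit depends on how well the database system processes universally quantified queries.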

Contact: rrantzau@acm.org
Department(s): University of Stuttgart, Institute of Parallel and Distributed High-Performance Systems, Applications of Parallel and Distributed Systems
Project(s): ORBIT
Entry date: March 5, 2002