Doctoral Thesis DIS-2009-03

BibliographyKraft, Tobias: Optimization of query sequences.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Doctoral Thesis (2009).
217 pages, english.
CR-SchemaH.2.4 (Database Management Systems)
Keywordsdatabase systems query optimization SQL
Abstract

Query optimization is a well-known topic in database research since the 1970s. This thesis highlights a special area of query optimization that arises from new trends in the usage of databases. Whereas in the beginning databases were primarily used for transaction-oriented processing of operative data, today databases are also used to facilitate reporting and analysis on consolidated, historic data. For the latter, the data is loaded into a large data warehouse and afterwards it is being analyzed by the use of tools. The tools used to model the flows that extract the operative data from the source systems, transform these data and load it into the data warehouse as well as the tools that process the data stored in the data warehouse often generate sequences of SQL statements that break down a complex flow or request into a sequence of computational steps. The optimization of this kind of sequences with respect to runtime is the focus of this thesis. We propose a heuristic as well as a cost-based approach for this optimization problem. The cost-based approach is just an enhancement of the heuristic approach. It results from adding a cost estimation component to the optimizer architecture of the heuristic approach and by replacing the heuristic control strategy by a control strategy that considers cost estimates. Both approaches are rule-based approaches that rewrite a given sequence of SQL statements into a syntactically different but semantically equivalent sequence of SQL statements. Therefore, we specify a set of rewrite rules. For cost estimation, we employ the capabilities of the query optimizer of the underlying database management system (DBMS) which is responsible for the execution of the query sequences. To improve the quality of these cost estimates, we support the query optimizer of the underlying DBMS with statistics that we derive from histogram propagation. For this purpose, we need an interface that allows to access and manipulate statistics in the underlying DBMS. Since there exists no standardized interface for this purpose, we define our own DBMS-independent interface. For the heuristic approach as well as for the cost-based approach, we provide prototypic implementations in JAVA. Furthermore, we have implemented the DBMS-independent interface for the three commercial DBMSs IBM DB2, Oracle, and Microsoft SQL Server. We report on the results of experiments that we conducted with our prototypes and some sample sequences that we derived by using a commercial tool for online analytical processing (OLAP). They show the effectiveness of our optimization approach and they highlight the optimization potential that lies in rewriting sequences of SQL statements. Finally, we draw a conclusion and suggest some interesting points for future research.

Full text and
other links
Tobias Kraft-Dissertation
Department(s)University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
Entry dateApril 12, 2010
   Publ. Department   Publ. Institute   Publ. Computer Science