Knowledge discovery and data mining are concerned with the discovery of valid, novel, potentially useful, and understandable patterns in data. Most data mining algorithms require that the data be represented as a single attribute-value table. In contrast, data mining techniques developed within the framework of Inductive Logic Programming (ILP) can be applied directly to multi-relational databases. The application of ILP to data mining is also termed relational data mining.
This thesis develops and investigates pruning techniques for pattern discovery within a restricted ILP setting in which all patterns describe subgroups of a fixed population of individuals. This variant of pattern discovery is also known as relational subgroup discovery. The main contribution of the thesis is an Apriori-like search algorithm for relational subgroup discovery. Previously, Apriori-like search had been applied only in attribute-value (i.e., non-relational) settings.
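To make the term "Apriori-like search" concrete for the attribute-value setting the thesis starts from, here is a minimal sketch of levelwise search over conjunctions of attribute-value selectors under a minimum-support constraint. It assumes each individual is a set of selector strings such as `"sex=f"`; the function names `support` and `apriori_subgroups` are hypothetical. This is the textbook Apriori join-and-prune scheme, not the thesis' relational algorithm, and it does not cover the multi-relational case.

```python
from itertools import combinations


def support(description, population):
    """Number of individuals covered by every selector in `description`."""
    return sum(1 for ind in population if description <= ind)


def apriori_subgroups(population, min_support):
    """Levelwise (Apriori-style) enumeration of subgroup descriptions.

    `population` is a list of frozensets of selectors (hypothetical encoding).
    A description with k selectors is evaluated only if all of its
    (k-1)-selector sub-descriptions were frequent. Support can only shrink
    when selectors are added, so infrequent candidates are safely pruned.
    """
    selectors = sorted({s for ind in population for s in ind})
    level = [frozenset([s]) for s in selectors
             if support(frozenset([s]), population) >= min_support]
    frequent = list(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: combine two (k-1)-descriptions into a k-description.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself have been frequent.
        candidates = [c for c in candidates
                      if all(frozenset(sub) in prev
                             for sub in combinations(c, k - 1))]
        # Count support only for the surviving candidates.
        level = [c for c in candidates if support(c, population) >= min_support]
        frequent.extend(level)
        k += 1
    return frequent
```

For example, with a toy population of four individuals and `min_support=2`, the candidate `{a, b, c}` is discarded in the prune step without any support counting, because its subset `{b, c}` was already infrequent.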
In particular, the contributions of the thesis are (1) a formal description of subgroup discovery in the framework of ILP, (2) optimum estimates for the interestingness criteria distributional unusualness and implication intensity, (3) an extension of the well-known Apriori algorithm that allows the set of patterns searched by Apriori to be constrained, (4) an ILP language bias that allows an Apriori-like algorithm to be applied to relational subgroup discovery, (5) an SQL-based language bias for relational subgroup discovery via SQL queries to a relational database management system, (6) a novel approach for integrating pruning based on structured attributes into an Apriori-like search algorithm, (7) an approach for integrating pruning based on discretized numerical attributes into an Apriori-like search algorithm, (8) an experimental evaluation of the various pruning methods (namely, optimum estimates, Apriori-like pruning, and the use of structure in attributes for pruning), and (9) the application of the approach to data mining in a real-world financial database, as well as extensive comparisons with related work.
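The idea behind an optimum (optimistic) estimate is that an upper bound on the quality of every refinement of a subgroup lets the search skip that subgroup's refinements whenever the bound cannot beat the best result found so far. The sketch below illustrates this idea with weighted relative accuracy, g · (p − p0), chosen here only because its bound is simple. It is an assumption for illustration and is not the thesis' estimate for distributional unusualness or implication intensity. All function names are hypothetical.

```python
from fractions import Fraction


def wracc(n, n_pos, N, P):
    """Weighted relative accuracy g * (p - p0) of a subgroup that covers n of
    N individuals, n_pos of them positive, when the population has P positives.
    (Stand-in quality function, not the thesis' criteria.)"""
    if n == 0:
        return Fraction(0)
    return Fraction(n, N) * (Fraction(n_pos, n) - Fraction(P, N))


def wracc_optimistic(n_pos, N, P):
    """Upper bound on wracc over all refinements (subsets) of a subgroup with
    n_pos positives: the best refinement covers exactly those positives."""
    return Fraction(n_pos, N) * (1 - Fraction(P, N))


def can_prune(n_pos, N, P, best_so_far):
    """Refinements can be skipped if even the optimistic estimate
    cannot exceed the best quality found so far."""
    return wracc_optimistic(n_pos, N, P) <= best_so_far
```

With N = 100 and P = 40, a subgroup covering 20 individuals, 12 of them positive, has quality 0.2 · (0.6 − 0.4) = 1/25. None of its refinements can exceed 12/100 · 0.6 = 9/125, so they are pruned as soon as a subgroup of quality at least 9/125 has been found.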
The experiments compare different methods for pruning the search space in subgroup discovery within an ILP framework. Optimum estimates and Apriori-like pruning consistently reduced the search space, whereas the effect of pruning based on structured attributes varied across search settings. In particular, the experiments show that Apriori-like pruning can be as effective for search in a multi-relational database as it is for search in a single-relation database.
The application of the approach to data mining in a real-world financial database has shown that the language bias is well suited to relational subgroup discovery and that its expressivity is useful in practice.
A detailed English abstract is given in Appendix B of the thesis.