Article in Proceedings INPROC-2021-03

Bibliography: Tschechlov, Dennis; Fritz, Manuel; Schwarz, Holger: AutoML4Clust: Efficient AutoML for Clustering Analyses.
In: Proceedings of the 24th International Conference on Extending Database Technology (EDBT).
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology.
pp. 1-6, English.
Online, March 2021.
DOI: 10.5441/002/EDBT.2021.32.
Article in Proceedings (Conference Paper).
CR-Schema: H.2.8 (Database Applications)

Data analysis is a highly iterative process. To achieve valuable analysis results, analysts typically execute many configurations, i.e., algorithms and their hyperparameter settings, based on their domain knowledge. While experienced analysts can often restrict the search space to a few promising configurations, novice analysts in particular tend to define large search spaces because they lack domain knowledge. In the worst case, they perform an exhaustive search over the whole search space, resulting in infeasible runtimes. Recent advances in the research area of AutoML address this challenge by supporting novice analysts in the combined algorithm selection and hyperparameter optimization (CASH) problem for supervised learning tasks. However, no such systems exist for unsupervised learning tasks, such as the prevalent task of clustering analysis. In this work, we present our novel AutoML4Clust approach, which efficiently supports novice analysts regarding CASH for clustering analyses. To the best of our knowledge, this is the first thoroughly elaborated approach in this area. Our comprehensive evaluation shows that AutoML4Clust significantly outperforms several existing approaches: it achieves considerable speedups for the CASH problem while still producing valuable clustering results.
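
To make the CASH problem for clustering concrete, the following is a minimal sketch of the search setting the abstract describes. It is not the AutoML4Clust method. Everything in it is an illustrative assumption: the 1-D toy data, the two candidate algorithms (k-means and k-medians), the silhouette score as the internal quality metric, and the helper names `cluster_1d`, `silhouette`, `exhaustive_search`, and `random_search`. It contrasts an exhaustive search over all (algorithm, k) configurations with a budgeted random search over the same space.

```python
import random
from statistics import mean, median

# Hypothetical candidate algorithms: the center function decides the variant.
ALGORITHMS = {"k-means": mean, "k-medians": median}

def cluster_1d(points, k, center_fn, iters=20, seed=0):
    """Lloyd-style 1-D clustering; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda i: abs(p - centers[i])) for p in points]
        for i in range(k):
            members = [p for p, l in zip(points, labels) if l == i]
            if members:  # keep the old center if a cluster runs empty
                centers[i] = center_fn(members)
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient in [-1, 1]; higher is better."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        b = min(mean(abs(p - q) for q in other)
                for key, other in clusters.items() if key != l)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)

def evaluate(points, config):
    """Run one configuration (algorithm name, k) and score its result."""
    name, k = config
    return silhouette(points, cluster_1d(points, k, ALGORITHMS[name]))

def exhaustive_search(points, space):
    """Evaluate every configuration; cost grows with the size of the space."""
    return max((evaluate(points, c), c) for c in space)

def random_search(points, space, budget, seed=0):
    """Evaluate only `budget` randomly sampled configurations."""
    rng = random.Random(seed)
    return max((evaluate(points, c), c) for c in rng.sample(space, budget))

# Toy data: three groups of 20 points around 0, 10, and 20.
data_rng = random.Random(42)
points = [data_rng.gauss(c, 0.5) for c in (0.0, 10.0, 20.0) for _ in range(20)]

# Search space: 2 algorithms x 9 values of k = 18 configurations.
space = [(name, k) for name in ALGORITHMS for k in range(2, 11)]

best_full = exhaustive_search(points, space)          # 18 evaluations
best_budget = random_search(points, space, budget=6)  # 6 evaluations
print("exhaustive:", best_full)
print("budgeted:  ", best_budget)
```

The budgeted search evaluates a third of the configurations and can therefore never score higher than the exhaustive one on the same space. The trade-off between search cost and result quality is what an AutoML system for clustering has to manage.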

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
Entry date: May 27, 2021