Article in Proceedings INPROC-2021-03

Bibliographic data
Tschechlov, Dennis; Fritz, Manuel; Schwarz, Holger: AutoML4Clust: Efficient AutoML for Clustering Analyses.
In: Proceedings of the 24th International Conference on Extending Database Technology (EDBT).
Universität Stuttgart, Fakultät Informatik, Elektrotechnik und Informationstechnik.
pp. 1-6, English.
Online, March 2021.
DOI: 10.5441/002/EDBT.2021.32.
Article in proceedings (conference paper).
CR classification: H.2.8 (Database Applications)
Abstract

Data analysis is a highly iterative process. In order to achieve valuable analysis results, analysts typically execute many configurations, i.e., algorithms and their hyperparameter settings, based on their domain knowledge. While experienced analysts may be able to define small search spaces for promising configurations, especially novice analysts define large search spaces due to their lack of domain knowledge. In the worst case, they perform an exhaustive search throughout the whole search space, resulting in infeasible runtimes. Recent advances in the research area of AutoML address this challenge by supporting novice analysts in the combined algorithm selection and hyperparameter optimization (CASH) problem for supervised learning tasks. However, no such systems exist for unsupervised learning tasks, such as the prevalent task of clustering analysis. In this work, we present our novel AutoML4Clust approach, which efficiently supports novice analysts regarding CASH for clustering analyses. To the best of our knowledge, this is the first thoroughly elaborated approach in this area. Our comprehensive evaluation unveils that AutoML4Clust significantly outperforms several existing approaches, as it achieves considerable speedups for the CASH problem, while still achieving very valuable clustering results.

Department(s): Universität Stuttgart, Institut für Parallele und Verteilte Systeme, Anwendersoftware
Entry date: May 27, 2021