Bachelor's Thesis BCLR-2023-13

Bibliographic data
Reed, Connor: Analysis and integration of data preprocessing steps in AutoML for clustering.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering and Information Technology, Bachelor's Thesis No. 13 (2023).
76 pages, English.
Abstract

This work explores Automated Machine Learning (AutoML) and its application to unsupervised learning, specifically clustering. While AutoML systems for clustering exist, none of them fully integrates and analyzes preprocessing steps. The contributions of this work are therefore the following: (1) Analysis and selection of promising preprocessing methods; the selection is based on supervised AutoML systems with preprocessing and clustering use cases. (2) Extension of the existing package AutoML4Clust with the selected preprocessing methods and their respective hyperparameters. (3) Addition of categorical clustering algorithms to the search space, in an attempt to improve results on data with categorical features. (4) Implementation of multi-objective optimization strategies that aim for a trade-off between accuracy and runtime. (5) Reduction of runtime through sampling, which optimizes on only a subset of the data. (6) Rigorous evaluation of the new implementations. The work concludes that AutoML can benefit significantly from the addition of preprocessing and that the proposed methods show promising results for future development.

Full text and other links
Full text
Department(s): University of Stuttgart, Institute for Parallel and Distributed Systems, Anwendersoftware (Application Software)
Supervisor(s): Mitschang, Prof. Bernhard; Tschechlov, Dennis
Entry date: June 20, 2023