Bachelor Thesis BCLR-2023-13

Bibliography: Reed, Connor: Analysis and integration of data preprocessing steps in AutoML for clustering.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Bachelor Thesis No. 13 (2023).
76 pages, English.
Abstract

This work explores the field of Automated Machine Learning (AutoML) and its application to unsupervised learning, specifically clustering. While AutoML systems for clustering exist, none of them fully integrates and analyzes preprocessing steps. The contributions of this work are therefore the following: (1) analysis and selection of promising preprocessing methods, where the selection is based on supervised AutoML systems with preprocessing and clustering use cases; (2) extension of the existing package AutoML4Clust with the selected preprocessing methods and their respective hyperparameters; (3) addition of categorical clustering algorithms to the search space in an attempt to improve results on data with categorical features; (4) implementation of multi-objective optimization strategies that aim for a trade-off between accuracy and runtime; (5) reduction of runtime through sampling, which optimizes on only a subset of the data; and (6) rigorous evaluation of the new implementations. The work concludes that AutoML can benefit significantly from the addition of preprocessing and that the proposed methods show promising results for future development.

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
Supervisor(s): Mitschang, Prof. Bernhard; Tschechlov, Dennis
Entry date: June 20, 2023