Master's Thesis MSTR-2019-83

Bibliographic data
Tschechlov, Dennis: Analysis and Transfer of AutoML Concepts for Clustering Algorithms.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering and Information Technology, Master's Thesis No. 83 (2019).
91 pages, English.
Abstract

Data analysts face the task of selecting an appropriate algorithm with suitable hyperparameters for the datasets they want to analyze. To do so, they typically execute and evaluate many configurations in a trial-and-error manner. For novice data analysts, however, this is a time-consuming task. Recent advances in the research area of AutoML address this problem by automatically finding a suitable algorithm with appropriate hyperparameters. Yet these systems are only applicable to supervised learning tasks, not to unsupervised learning. In the scope of this work, existing AutoML systems are analyzed in detail. Subsequently, a concept is developed that uses components from existing AutoML systems but modifies them so that they are applicable to unsupervised learning. Although various kinds of unsupervised learning methods exist, this work focuses on the popular unsupervised method of clustering. The concept is also implemented as a proof-of-concept prototype, which is used for the evaluation. The comprehensive evaluation discusses the results of different optimization methods for selecting a suitable clustering algorithm with appropriate hyperparameters. It reveals that the number of clusters predicted by the implemented prototype deviates only slightly from the actual number of clusters. Hence, this work shows that it is possible to successfully transfer the concepts of existing AutoML systems to the unsupervised learning method of clustering while achieving precise results in an acceptable amount of time.

Full text and other links
Full text
Department(s): University of Stuttgart, Institute for Parallel and Distributed Systems, Application Software
Supervisor(s): Schwarz, PD Dr. Holger; Fritz, Manuel
Entry date: March 2, 2020