Master Thesis MSTR-2022-04

Bibliography
Kunze, Ulf: Partitioning training data for complex multi-class problems using constraint-based clustering.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 4 (2022).
58 pages, English.
Abstract

Quality control is one of the most important tools for protecting consumers from low-quality products and is therefore essential. Beyond keeping defective and substandard products off the market, it is equally important to repair them whenever possible, which avoids unnecessary waste and makes sustainable use of resources. In this thesis, constraint-based clustering algorithms are evaluated for the use case of quality control. They are used because they promise more flexibility than rigid partitions: reallocating data instances into new clusters can help reduce the influence of analytic challenges such as a heterogeneous product portfolio, multi-class imbalance, and small sample sizes. The algorithms evaluated are CDBSCAN, COP-Kmeans, and MPCK-Means, using constraint sets based on product group, engine type, and error classes. This work also examines the existing method in more detail to understand the different behaviours. The end result is an average improvement of 5% over the existing approach and an increase of 13% over a random forest classifier. Furthermore, methods for extracting domain knowledge from data sets are investigated; for this purpose, an active learning approach and an algorithmic approach are integrated into the existing pipeline.
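To illustrate what "constraint-based" means for one of the evaluated algorithms, the sketch below shows the core idea of COP-Kmeans: k-means in which each instance is assigned to the nearest centroid that does not break a must-link or cannot-link constraint, with failure reported if no such centroid exists. This is a minimal pure-Python sketch under stated assumptions, not the implementation used in the thesis. The function names (`cop_kmeans`, `_must_link_components`), parameters, and the toy data are hypothetical, and the transitive closure of must-links is a common preprocessing refinement. How the thesis derives constraint pairs from product group, engine type, or error class is not described in this record, so the pairs are passed in explicitly.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _must_link_components(n: int, pairs: Sequence[Tuple[int, int]]) -> List[int]:
    """Union-find: map each instance index to the root of its must-link component."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    return [find(i) for i in range(n)]


def cop_kmeans(points: Sequence[Point], k: int,
               must_link: Sequence[Tuple[int, int]],
               cannot_link: Sequence[Tuple[int, int]],
               max_iter: int = 100, seed: int = 0) -> Optional[List[int]]:
    """Hypothetical COP-Kmeans sketch; returns cluster labels, or None on failure."""
    n = len(points)
    comp = _must_link_components(n, must_link)
    members: Dict[int, List[int]] = {}
    for i, c in enumerate(comp):
        members.setdefault(c, []).append(i)
    # Transitive closure: each instance is must-linked to its whole component.
    ml = [set(members[comp[i]]) - {i} for i in range(n)]
    # Cannot-links are lifted to whole must-link components.
    cl: List[set] = [set() for _ in range(n)]
    for a, b in cannot_link:
        if comp[a] == comp[b]:
            raise ValueError(f"must-link and cannot-link contradict for ({a}, {b})")
        for x in members[comp[a]]:
            for y in members[comp[b]]:
                cl[x].add(y)
                cl[y].add(x)

    rng = random.Random(seed)
    centers = [points[i] for i in rng.sample(range(n), k)]
    assignment: Optional[List[int]] = None
    for _ in range(max_iter):
        new_assignment: List[Optional[int]] = [None] * n
        for i in range(n):
            # Try candidate clusters from the nearest centre to the farthest.
            for c in sorted(range(k), key=lambda cand: _sq_dist(points[i], centers[cand])):
                breaks_ml = any(new_assignment[j] is not None and new_assignment[j] != c
                                for j in ml[i])
                breaks_cl = any(new_assignment[j] == c for j in cl[i])
                if not (breaks_ml or breaks_cl):
                    new_assignment[i] = c
                    break
            else:
                return None  # no admissible cluster for instance i: report failure
        if new_assignment == assignment:
            break  # assignments unchanged, so centres are unchanged: converged
        assignment = new_assignment  # type: ignore[assignment]
        for c in range(k):
            cluster = [points[i] for i in range(n) if assignment[i] == c]
            if cluster:  # keep the previous centre if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(cluster) for col in zip(*cluster))
    return assignment
```

For example, `cop_kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], 2, [(0, 1)], [(1, 2)])` keeps instances 0 and 1 together and separates them from instances 2 and 3, while lowering `k` to 1 makes the cannot-link unsatisfiable and returns `None`. Because assignment is greedy, the result depends on instance order, and the algorithm can report failure even when a feasible partition exists.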

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
Supervisor(s): Schwarz, Prof. Holger; Tschechlov, Dennis
Entry date: April 28, 2022
Publ. Computer Science