Bibliographic data | Hirsch, Vitali; Reimann, Peter; Treder-Tschechlov, Dennis; Schwarz, Holger; Mitschang, Bernhard: Exploiting Domain Knowledge to address Class Imbalance and a Heterogeneous Feature Space in Multi-Class Classification. In: International Journal on Very Large Data Bases (VLDB-Journal). University of Stuttgart, Faculty of Computer Science, Electrical Engineering and Information Technology. English. Springer, February 27, 2023. Journal article.
|
CR classification | H.2.8 (Database Applications)
|
Keywords | Classification; Domain knowledge; Multi-class Imbalance; Heterogeneous feature space |
Abstract | Real-world data of multi-class classification tasks often show complex data characteristics that lead to reduced classification performance. Major analytical challenges are a high degree of multi-class imbalance within the data and a heterogeneous feature space, which increases the number and complexity of class patterns. Existing solutions for classification or data pre-processing address only one of these two challenges in isolation. We propose a novel classification approach that explicitly addresses both challenges, multi-class imbalance and a heterogeneous feature space, together. As its main contribution, this approach exploits domain knowledge in the form of a taxonomy to systematically prepare the training data. Based on an experimental evaluation on both real-world data and several synthetically generated data sets, we show that our approach outperforms any other classification technique in terms of accuracy. Furthermore, it entails considerable practical benefits in real-world use cases, e.g., it reduces the rework required in the area of product quality control.
|
Department(s) | University of Stuttgart, Institute for Parallel and Distributed Systems, Application Software (Anwendersoftware)
|
Project(s) | GSaME-NFG
|
Entry date | March 2, 2023 |
---|