Journal article ART-2023-02

Bibliographic data
Hirsch, Vitali; Reimann, Peter; Treder-Tschechlov, Dennis; Schwarz, Holger; Mitschang, Bernhard: Exploiting Domain Knowledge to address Class Imbalance and a Heterogeneous Feature Space in Multi-Class Classification.
In: International Journal on Very Large Data Bases (VLDB-Journal).
Universität Stuttgart, Fakultät Informatik, Elektrotechnik und Informationstechnik.
English.
Springer, February 27, 2023.
Journal article.
CR classification: H.2.8 (Database Applications)
Keywords: Classification; Domain knowledge; Multi-class Imbalance; Heterogeneous feature space
Abstract

Real-world data of multi-class classification tasks often show complex data characteristics that lead to reduced classification performance. Major analytical challenges are a high degree of multi-class imbalance within the data and a heterogeneous feature space, which increases the number and complexity of class patterns. Existing solutions for classification or data pre-processing address only one of these two challenges in isolation. We propose a novel classification approach that explicitly addresses both challenges, multi-class imbalance and a heterogeneous feature space, together. As its main contribution, this approach exploits domain knowledge in the form of a taxonomy to systematically prepare the training data. Based on an experimental evaluation on both real-world data and several synthetically generated data sets, we show that our approach outperforms any other classification technique in terms of accuracy. Furthermore, it entails considerable practical benefits in real-world use cases, e.g., it reduces the rework required in the area of product quality control.

Department(s): Universität Stuttgart, Institut für Parallele und Verteilte Systeme, Anwendersoftware
Project(s): GSaME-NFG
Entry date: March 2, 2023