Article in Journal ART-2023-02

Bibliography: Hirsch, Vitali; Reimann, Peter; Treder-Tschechlov, Dennis; Schwarz, Holger; Mitschang, Bernhard: Exploiting Domain Knowledge to address Class Imbalance and a Heterogeneous Feature Space in Multi-Class Classification.
In: International Journal on Very Large Data Bases (VLDB-Journal).
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology.
English.
Springer, February 27, 2023.
Article in Journal.
CR-Schema: H.2.8 (Database Applications)
Keywords: Classification; Domain knowledge; Multi-class Imbalance; Heterogeneous feature space
Abstract

Real-world data of multi-class classification tasks often show complex data characteristics that lead to a reduced classification performance. Major analytical challenges are a high degree of multi-class imbalance within data and a heterogeneous feature space, which increases the number and complexity of class patterns. Existing solutions to classification or data pre-processing only address one of these two challenges in isolation. We propose a novel classification approach that explicitly addresses both challenges of multi-class imbalance and heterogeneous feature space together. As main contribution, this approach exploits domain knowledge in terms of a taxonomy to systematically prepare the training data. Based on an experimental evaluation on both real-world data and several synthetically generated data sets, we show that our approach outperforms any other classification technique in terms of accuracy. Furthermore, it entails considerable practical benefits in real-world use cases, e.g., it reduces rework required in the area of product quality control.

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
Project(s): GSaME-NFG
Entry date: March 2, 2023