Article in Proceedings INPROC-2021-02

Bibliography: Fritz, Manuel; Tschechlov, Dennis; Schwarz, Holger: Efficient Exploratory Clustering Analyses with Qualitative Approximations.
In: Proceedings of the 24th International Conference on Extending Database Technology (EDBT).
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology.
pp. 1-6, English.
Online, March 2021.
DOI: 10.5441/002/EDBT.2021.31.
Article in Proceedings (Conference Paper).
CR-Schema: H.2.8 (Database Applications)

Clustering is a fundamental primitive for exploratory data analyses. Yet, finding valuable clustering results for previously unseen datasets is a pivotal challenge. Analysts as well as automated exploration methods often perform an exploratory clustering analysis, i.e., they repeatedly execute a clustering algorithm with varying parameters until valuable results are found. k-center clustering algorithms, such as k-Means, are commonly used in such exploratory processes. However, in the worst case, a single execution of k-Means requires super-polynomial runtime, which makes the overall exploratory process on voluminous datasets infeasible within a reasonable time frame. We propose a novel and efficient approach for approximating the results of k-center clustering algorithms, thus supporting analysts in an ad-hoc exploratory search for valuable clustering results. Our evaluation on an Apache Spark cluster shows that our approach outperforms the regular execution of a k-center clustering algorithm by several orders of magnitude in runtime while satisfying a predefinable quality demand. Hence, our approach is a strong fit for clustering voluminous datasets in exploratory settings.
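To make the cost of the regular (non-approximated) exploratory process concrete, the following minimal sketch reruns plain k-Means for increasing k until a stopping criterion is met. It is not the authors' approximation approach. The function names, the deterministic farthest-first seeding (Gonzalez's greedy heuristic for the k-center problem), the SSE-based elbow rule, and the threshold of 0.5 are all illustrative assumptions. Every extra iteration of the loop is a full k-Means execution, which is the repeated cost the paper's approach targets.

```python
import math

def farthest_first(points, k):
    """Illustrative deterministic seeding: repeatedly pick the point
    farthest from its nearest already-chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(far)
    return centers

def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

def sse(points, centroids, labels):
    """Within-cluster sum of squared distances (the k-Means objective)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

def explore(points, max_k, threshold=0.5):
    """Hypothetical exploratory loop: run k-Means for k = 1, 2, ... and stop
    once the relative SSE improvement over the previous k drops below
    `threshold`; returns (chosen k, {k: sse})."""
    history = {}
    for k in range(1, max_k + 1):
        centroids, labels = kmeans(points, k)
        history[k] = sse(points, centroids, labels)
        prev = history.get(k - 1)
        if prev and (prev - history[k]) / prev < threshold:
            return k - 1, history  # adding a cluster no longer pays off
    return max_k, history

# Three small, well-separated groups of four points each.
blobs = [(0, 0), (10, 0), (0, 10)]
offsets = [(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5)]
points = [(cx + dx, cy + dy) for cx, cy in blobs for dx, dy in offsets]

best_k, history = explore(points, max_k=6)
print(best_k)  # 3
```

On this toy input the loop executes k-Means four times (k = 1 to 4) before settling on k = 3. On a voluminous dataset, each of those executions is itself expensive, which is why approximating the per-k results is attractive.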

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
Entry date: May 27, 2021