|Bibliography||Fritz, Manuel; Tschechlov, Dennis; Schwarz, Holger: Efficient Exploratory Clustering Analyses with Qualitative Approximations.
In: Proceedings of the 24th International Conference on Extending Database Technology (EDBT).
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology.
pp. 1-6, English.
Online, March 2021.
Article in Proceedings (Conference Paper).
|CR-Schema||H.2.8 (Database Applications)|
|Abstract||Clustering is a fundamental primitive for exploratory data analyses. Yet, finding valuable clustering results for previously unseen datasets is a pivotal challenge. Analysts as well as automated exploration methods often perform an exploratory clustering analysis, i.e., they repeatedly execute a clustering algorithm with varying parameters until valuable results can be found. k-center clustering algorithms, such as k-Means, are commonly used in such exploratory processes. However, in the worst case, each single execution of k-Means requires a super-polynomial runtime, making the overall exploratory process on voluminous datasets infeasible in a reasonable time frame. We propose a novel and efficient approach for approximating results of k-center clustering algorithms, thus supporting analysts in an ad-hoc exploratory process for valuable clustering results. Our evaluation on an Apache Spark cluster unveils that our approach significantly outperforms the regular execution of a k-center clustering algorithm by several orders of magnitude in runtime with a predefinable qualitative demand. Hence, our approach is a strong fit for clustering voluminous datasets in exploratory settings.
|Department(s)||University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems|
|Entry date||May 27, 2021|