Article in Proceedings INPROC-2021-02

Bibliographic data
Fritz, Manuel; Tschechlov, Dennis; Schwarz, Holger: Efficient Exploratory Clustering Analyses with Qualitative Approximations.
In: Proceedings of the 24th International Conference on Extending Database Technology (EDBT).
Universität Stuttgart, Fakultät Informatik, Elektrotechnik und Informationstechnik.
pp. 1-6, English.
Online, March 2021.
DOI: 10.5441/002/EDBT.2021.31.
Article in proceedings (conference paper).
CR classification: H.2.8 (Database Applications)
Abstract

Clustering is a fundamental primitive for exploratory data analyses. Yet, finding valuable clustering results for previously unseen datasets is a pivotal challenge. Analysts as well as automated exploration methods often perform an exploratory clustering analysis, i.e., they repeatedly execute a clustering algorithm with varying parameters until valuable results can be found. k-center clustering algorithms, such as k-Means, are commonly used in such exploratory processes. However, in the worst case, each single execution of k-Means requires a super-polynomial runtime, making the overall exploratory process on voluminous datasets infeasible in a reasonable time frame. We propose a novel and efficient approach for approximating results of k-center clustering algorithms, thus supporting analysts in an ad-hoc exploratory process for valuable clustering results. Our evaluation on an Apache Spark cluster unveils that our approach significantly outperforms the regular execution of a k-center clustering algorithm by several orders of magnitude in runtime with a predefinable qualitative demand. Hence, our approach is a strong fit for clustering voluminous datasets in exploratory settings.

Department(s): Universität Stuttgart, Institut für Parallele und Verteilte Systeme, Anwendersoftware
Project(s): INTERACT
Entry date: May 27, 2021