Article in Proceedings INPROC-2019-20

Fritz, Manuel; Schwarz, Holger: Initializing k-Means Efficiently: Benefits for Exploratory Cluster Analysis.
In: Panetto, Hervé (ed.); Debruyne, Christophe (ed.); Hepp, Martin (ed.); Lewis, Dave (ed.); Ardagna, Claudio Agostino (ed.); Meersman, Robert (ed.): On the Move to Meaningful Internet Systems: OTM 2019 Conferences.
Universität Stuttgart, Fakultät Informatik, Elektrotechnik und Informationstechnik.
Lecture Notes in Computer Science (LNCS); 11877, pp. 146-163, English.
Springer Nature Switzerland AG, January 2019.
ISBN: 978-3-030-33245-7; DOI: 10.1007/978-3-030-33246-4.
Article in proceedings (conference contribution).
Corporate body: On the Move to Meaningful Internet Systems
CR classification: E.0 (Data General)
H.2.8 (Database Applications)
H.3.3 (Information Search and Retrieval)
Keywords: Exploratory cluster analysis; k-Means; Initialization

Data analysis is a highly exploratory task, where various algorithms with different parameters are executed until a solid result is achieved. This is especially evident for cluster analyses, where the number of clusters must be provided prior to the execution of the clustering algorithm. Since this number is rarely known in advance, the algorithm is typically executed several times with varying parameters. Hence, the duration of the exploratory analysis heavily depends on the runtime of each execution of the clustering algorithm. While previous work shows that the initialization of clustering algorithms is crucial for fast and solid results, it solely focuses on a single execution of the clustering algorithm and thereby neglects previous executions. We propose Delta Initialization as an initialization strategy for k-Means in such an exploratory setting. The core idea of this new algorithm is to exploit the clustering results of previous executions in order to enhance the initialization of subsequent executions. We show that this algorithm is well suited for exploratory cluster analysis as considerable speedups can be achieved while additionally achieving superior clustering results compared to state-of-the-art initialization strategies.
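
The abstract does not spell out how Delta Initialization works, so the following is only a minimal sketch of the general idea it names: reusing the centroids of a previous k-Means run to seed the next run with a different k. It is not the algorithm from the paper. The functions `lloyd`, `warm_start`, and `explore`, and the rule of adding the point farthest from all current centroids, are assumptions made purely for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, centroids, max_iter=100):
    """Plain Lloyd iterations from the given centroids.

    Returns (final_centroids, iterations_used).
    """
    centroids = [tuple(c) for c in centroids]
    for it in range(1, max_iter + 1):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster;
        # an empty cluster keeps its old centroid.
        new = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new == centroids:
            return new, it
        centroids = new
    return centroids, max_iter


def warm_start(points, prev_centroids, k):
    """Hypothetical warm start (an assumption, not the paper's method):
    keep up to k previous centroids, then add the point farthest from
    all current centroids until k centroids exist."""
    centroids = [tuple(c) for c in prev_centroids[:k]]
    if not centroids:
        centroids.append(tuple(points[0]))
    while len(centroids) < k:
        farthest = max(points,
                       key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(tuple(farthest))
    return centroids


def explore(points, ks):
    """Run k-Means for each k in ks, seeding each run from the previous result."""
    results, prev = {}, []
    for k in ks:
        init = warm_start(points, prev, k)
        prev, iters = lloyd(points, init)
        results[k] = (prev, iters)
    return results
```

Calling `explore(points, [2, 3, 4])` returns, for each k, the final centroids and the number of Lloyd iterations used, so the effect of seeding a run from the previous result can be compared against a run started from scratch.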

Full text and other links: Springer Link
Department(s): Universität Stuttgart, Institut für Parallele und Verteilte Systeme, Anwendersoftware
Entry date: October 16, 2019