Article in Proceedings INPROC-2019-20

Bibliography: Fritz, Manuel; Schwarz, Holger: Initializing k-Means Efficiently: Benefits for Exploratory Cluster Analysis.
In: Panetto, Hervé (ed.); Debruyne, Christophe (ed.); Hepp, Martin (ed.); Lewis, Dave (ed.); Ardagna, Claudio Agostino (ed.); Meersman, Robert (ed.): On the Move to Meaningful Internet Systems: OTM 2019 Conferences.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology.
Lecture Notes in Computer Science (LNCS); 11877, pp. 146-163, English.
Springer Nature Switzerland AG, January 2019.
ISBN: 978-3-030-33245-7; DOI: 10.1007/978-3-030-33246-4.
Article in Proceedings (Conference Paper).
Corporation: On the Move to Meaningful Internet Systems
CR-Schema: E.0 (Data General)
H.2.8 (Database Applications)
H.3.3 (Information Search and Retrieval)
Keywords: Exploratory cluster analysis; k-Means; Initialization
Abstract

Data analysis is a highly exploratory task, where various algorithms with different parameters are executed until a solid result is achieved. This is especially evident for cluster analyses, where the number of clusters must be provided prior to the execution of the clustering algorithm. Since this number is rarely known in advance, the algorithm is typically executed several times with varying parameters. Hence, the duration of the exploratory analysis heavily depends on the runtime of each execution of the clustering algorithm. While previous work shows that the initialization of clustering algorithms is crucial for fast and solid results, it solely focuses on a single execution of the clustering algorithm and thereby neglects previous executions. We propose Delta Initialization as an initialization strategy for k-Means in such an exploratory setting. The core idea of this new algorithm is to exploit the clustering results of previous executions in order to enhance the initialization of subsequent executions. We show that this algorithm is well suited for exploratory cluster analysis as considerable speedups can be achieved while additionally achieving superior clustering results compared to state-of-the-art initialization strategies.
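
The abstract states only the core idea (reusing the result of a previous execution to initialize the next one) and does not describe how Delta Initialization actually works. The sketch below is therefore not the authors' algorithm. It is a generic warm start, given here only to illustrate the exploratory setting: when moving from k-1 to k clusters, the previous centroids are kept and one extra centroid is placed at the point farthest from all of them. The function names `lloyd`, `warm_start`, and `explore` and the farthest-point rule are assumptions made for this illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, centroids, max_iter=100):
    """Run Lloyd's k-Means from the given centroids.

    Returns (final_centroids, iterations_used).
    """
    centroids = [tuple(c) for c in centroids]
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its old centroid.
        updated = [
            tuple(sum(dim) / len(members) for dim in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            return centroids, iteration
        centroids = updated
    return centroids, max_iter


def warm_start(points, previous_centroids):
    """Hypothetical warm start for k = len(previous_centroids) + 1.

    Assumption for illustration only: keep the previous centroids and add
    the point that is farthest from all of them.
    """
    farthest = max(
        points,
        key=lambda p: min(squared_distance(p, c) for c in previous_centroids),
    )
    return list(previous_centroids) + [tuple(farthest)]


def explore(points, k_values, seed=0):
    """Run k-Means for each k, reusing the previous result when k grows by one."""
    rng = random.Random(seed)
    results = {}
    previous = None
    for k in k_values:
        if previous is not None and len(previous) == k - 1:
            init = warm_start(points, previous)
        else:
            init = rng.sample(points, k)  # plain random initialization
        previous, iterations = lloyd(points, init)
        results[k] = (previous, iterations)
    return results
```

Calling `explore(points, [2, 3, 4])` on a list of coordinate tuples returns, for each k, the final centroids and the number of Lloyd iterations, so the iteration counts of warm-started and randomly initialized runs can be compared.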

Full text and other links: Springer Link
Contact: manuel.fritz@ipvs.uni-stuttgart.de
Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
Entry date: October 16, 2019