Article in Proceedings INPROC-2020-54

Bibliographic data
Fritz, Manuel; Tschechlov, Dennis; Schwarz, Holger: Learning from Past Observations: Meta-Learning for Efficient Clustering Analyses.
In: Song, Min (ed.); Song, Il-Yeol (ed.); Kotsis, Gabriele (ed.); Tjoa, A Min (ed.); Khalil, Ismail (ed.): Proceedings of 22nd Big Data Analytics and Knowledge Discovery (DaWaK), 2020.
Universität Stuttgart, Fakultät Informatik, Elektrotechnik und Informationstechnik.
Lecture Notes in Computer Science; 12393, pp. 364-379, English.
Springer, Cham, 11 September 2020.
ISBN: 978-3-030-59065-9; DOI: https://doi.org/10.1007/978-3-030-59065-9_28.
Article in proceedings (conference contribution).
CR classification: H.3.3 (Information Search and Retrieval)
Abstract

Many clustering algorithms require the number of clusters as input parameter prior to execution. Since the “best” number of clusters is most often unknown in advance, analysts typically execute clustering algorithms multiple times with varying parameters and subsequently choose the most promising result. Several methods for an automated estimation of suitable parameters have been proposed. Similar to the procedure of an analyst, these estimation methods draw on repetitive executions of a clustering algorithm with varying parameters. However, when working with voluminous datasets, each single execution tends to be very time-consuming. Especially in today’s Big Data era, such a repetitive execution of a clustering algorithm is not feasible for an efficient exploration. We propose a novel and efficient approach to accelerate estimations for the number of clusters in datasets. Our approach relies on the idea of meta-learning and terminates each execution of the clustering algorithm as soon as an expected qualitative demand is met. We show that this new approach is generally applicable, i.e., it can be used with existing estimation methods. Our comprehensive evaluation reveals that our approach is able to speed up the estimation of the number of clusters by an order of magnitude, while still achieving accurate estimates.

Department(s): Universität Stuttgart, Institut für Parallele und Verteilte Systeme, Anwendersoftware
Project(s): INTERACT
Entry date: 19 November 2020