Article in Proceedings INPROC-2020-54

Bibliography: Fritz, Manuel; Tschechlov, Dennis; Schwarz, Holger: Learning from Past Observations: Meta-Learning for Efficient Clustering Analyses.
In: Song, Min (ed.); Song, Il-Yeol (ed.); Kotsis, Gabriele (ed.); Tjoa, A Min (ed.); Khalil, Ismail (ed.): Proceedings of 22nd Big Data Analytics and Knowledge Discovery (DaWaK), 2020.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology.
Lecture Notes in Computer Science; 12393, pp. 364-379, English.
Springer, Cham, September 11, 2020.
ISBN: 978-3-030-59065-9; DOI: https://doi.org/10.1007/978-3-030-59065-9_28.
Article in Proceedings (Conference Paper).
CR-Schema: H.3.3 (Information Search and Retrieval)
Abstract

Many clustering algorithms require the number of clusters as input parameter prior to execution. Since the “best” number of clusters is most often unknown in advance, analysts typically execute clustering algorithms multiple times with varying parameters and subsequently choose the most promising result. Several methods for an automated estimation of suitable parameters have been proposed. Similar to the procedure of an analyst, these estimation methods draw on repetitive executions of a clustering algorithm with varying parameters. However, when working with voluminous datasets, each single execution tends to be very time-consuming. Especially in today’s Big Data era, such a repetitive execution of a clustering algorithm is not feasible for an efficient exploration. We propose a novel and efficient approach to accelerate estimations for the number of clusters in datasets. Our approach relies on the idea of meta-learning and terminates each execution of the clustering algorithm as soon as an expected qualitative demand is met. We show that this new approach is generally applicable, i.e., it can be used with existing estimation methods. Our comprehensive evaluation reveals that our approach is able to speed up the estimation of the number of clusters by an order of magnitude, while still achieving accurate estimates.
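
The estimation procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes k-means (Lloyd's algorithm) as the clustering algorithm, the silhouette coefficient as the quality measure, and a fixed relative-improvement threshold `tol` standing in for the "expected qualitative demand". The paper obtains that demand through meta-learning, which this sketch does not reproduce. All function and parameter names are hypothetical.

```python
import math

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic seeding: each new center is the point farthest from those chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, tol=0.0, max_iter=100):
    """Lloyd's algorithm; stops once the relative SSE gain of an iteration is <= tol.

    tol is an assumed, fixed stand-in for a learned quality demand.
    """
    centers = farthest_first(points, k)
    prev_sse, labels, iterations = None, [], 0
    for iterations in range(1, max_iter + 1):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        sse = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
        if prev_sse is not None and prev_sse - sse <= tol * prev_sse:
            break  # quality demand met: terminate this execution early
        prev_sse = sse
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster runs empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, iterations

def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def estimate_k(points, k_values, tol=0.0):
    """Run the clustering once per candidate k and keep the best-scoring k."""
    best_k, best_score, total_iters = None, -2.0, 0
    for k in k_values:
        labels, iters = kmeans(points, k, tol)
        total_iters += iters
        score = silhouette(points, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, total_iters

# Toy data: three well-separated groups of five points each.
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0.0, 0.0)]
points = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (5, 9)] for dx, dy in offsets]

exact_k, exact_iters = estimate_k(points, range(2, 7))             # run to convergence
early_k, early_iters = estimate_k(points, range(2, 7), tol=0.05)   # terminate early
```

With `tol=0.0` every execution runs until the sum of squared errors stops improving, which corresponds to the exhaustive procedure the abstract describes. A positive `tol` can only end an execution at the same iteration or earlier, so `early_iters` never exceeds `exact_iters`. How much is saved, and whether the estimate stays accurate, depends on the data and on the chosen threshold.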

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
Project(s): INTERACT
Entry date: November 19, 2020