Article in Journal ART-2019-11

Bibliography: Fritz, Manuel; Muazzen, Osama; Behringer, Michael; Schwarz, Holger: ASAP-DM: a framework for automatic selection of analytic platforms for data mining.
In: Software-Intensive Cyber-Physical Systems.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology.
pp. 1-13, English.
Springer Berlin Heidelberg, August 17, 2019.
ISSN: 2524-8510; 2524-8529; DOI: 10.1007/s00450-019-00408-7.
Article in Journal.
CR-Schema: E.0 (Data General)
H.2.8 (Database Applications)
H.3.3 (Information Search and Retrieval)
Keywords: Data mining; Analytic platform; Platform selection
Abstract

The plethora of analytic platforms makes it increasingly difficult to select the platform that best fits the data mining task at hand, the dataset, and additional user-defined criteria. Analysts in particular, who are focused primarily on the analytics domain, have difficulty keeping up with the latest developments. In this work, we introduce the ASAP-DM framework, which enables analysts to seamlessly use several platforms, while programmers can easily add further platforms to the framework. Furthermore, we investigate how to predict a platform based on specific criteria, such as lowest runtime or resource consumption during the execution of a data mining task. We formulate this task as an optimization problem, which can be solved by today’s classification algorithms. We evaluate the proposed framework on several analytic platforms such as Spark, Mahout, and WEKA, along with several data mining algorithms for classification, clustering, and association rule discovery. Our experiments show that the automatic selection process can save up to 99.71% of the execution time by automatically choosing a faster platform.
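
The idea of treating platform selection as a classification problem can be illustrated with a minimal sketch. It is not the ASAP-DM implementation: the features (row and attribute counts), the k-nearest-neighbor classifier, the function names, and all training values are assumptions invented for illustration, not measurements from the paper.

```python
from collections import Counter
from math import dist, log10

# Hypothetical training records: (rows, attributes) -> platform that ran fastest.
# All values are invented for illustration; they are NOT results from the paper.
TRAINING = [
    ((1_000, 10), "WEKA"),
    ((5_000, 20), "WEKA"),
    ((20_000, 15), "WEKA"),
    ((2_000_000, 10), "Spark"),
    ((10_000_000, 50), "Spark"),
    ((50_000_000, 30), "Spark"),
]


def scale(features):
    """Log-scale the dataset characteristics so that large row counts do not dominate."""
    rows, attributes = features
    return (log10(rows), log10(attributes))


def predict_platform(features, k=3):
    """Predict the fastest platform by majority vote among the k nearest training records."""
    query = scale(features)
    nearest = sorted(TRAINING, key=lambda record: dist(scale(record[0]), query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(predict_platform((3_000, 12)))
    print(predict_platform((8_000_000, 40)))
```

In this sketch the class label is the platform with the lowest runtime; a different criterion mentioned in the abstract, such as resource consumption, would only change how the training labels are assigned.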

Copyright: Springer Berlin Heidelberg
Contact: manuel.fritz@ipvs.uni-stuttgart.de
Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
Entry date: August 19, 2019