|Franke, Max: Sparse grid datamining with huge datasets. |
Universität Stuttgart, Fakultät Informatik, Elektrotechnik und Informationstechnik, Bachelor's thesis (2015).
73 pages, English.
|CR classification||H.2.8 (Database Applications)|
Due to the falling cost of disk space and the ubiquity of sensor equipment, the scientific world is flooded with huge amounts of data. To benefit from these data, data mining algorithms are used to evaluate them. Since conventional data mining methods scale at least linearly with the problem size and exponentially with the dimension of the input, the computing power required to mine these data poses a major problem. Moreover, very few real-world reference datasets exist for testing data mining algorithms. Using an existing toolkit for data mining on sparse grids, the goal of this thesis is to generate one or more real-world reference datasets for data mining purposes. For this purpose, multiple weather and photovoltaic datasets were used. It was possible to learn 6-dimensional datasets with 1.2 million data points and to obtain a very good prediction of photovoltaic power. Thus, a dataset for testing regression was obtained. For classification, a 9-dimensional dataset with 200,000 data points was generated; however, it did not yield particularly good results, with a hit rate of 41% over 4 classes. Here, further processing of the data will be necessary.
|PDF (4,146,227 bytes)|
|Department(s)||Universität Stuttgart, Institut für Parallele und Verteilte Systeme, Simulation großer Systeme|
|Supervisors||Pflüger, Jun.-Prof. Dirk; Pfander, David|
|Entry date||25 September 2018|