Bachelor's Thesis BCLR-2022-27

Bibliographic data
Hauf, Nicolas: Multi-Node Parallelization of the PLSSVM Library.
Universität Stuttgart, Faculty of Computer Science, Electrical Engineering and Information Technology, Bachelor's Thesis No. 27 (2022).
39 pages, English.
Abstract

As more and more sensor data accumulates, algorithms must be able to process very large amounts of it. Machine learning algorithms are a popular application for such big data sets. In this thesis, a library that implements a support vector machine, a machine learning method, is extended to better meet this need. The C++ library PLSSVM aims to be one of the fastest support vector machine solver libraries for big data. This thesis shows that PLSSVM achieves this by using the least squares method and parallelization with MPI. It describes the structure and workings of the parallelization method of the PLSSVM C++ library. The implementation is discussed in detail, and the reasoning behind each important step is presented. The performance of the library is evaluated on large test data using multiple metrics and compared to different implementations. On the IPVS servers, the fast and space-efficient library can be used to its fullest potential. Compared to the original version of the library, a roughly tenfold improvement is observed when using 13 nodes with a total of 100 processes. As the number of threads is scaled up, the runtime decreases and the space usage per thread decreases strictly.

Department(s): Universität Stuttgart, Institute for Parallel and Distributed Systems (IPVS), Scientific Computing
Supervisor(s): Pflüger, Prof. Dirk; Van Craen, Alexander
Entry date: October 24, 2022