Bachelor Thesis BCLR-2022-27

Bibliography: Hauf, Nicolas: Multi-Node Parallelization of the PLSSVM Library.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Bachelor Thesis No. 27 (2022).
39 pages, English.

As more and more sensor data accumulates, algorithms need to be able to process huge amounts of it. Machine learning algorithms are a popular use for large data sets. In this thesis, a library that implements a support vector machine, a machine learning method, is extended to better meet this need. The C++ library PLSSVM aims to be one of the fastest support vector machine libraries for big data. This thesis shows that PLSSVM achieves this by using the least squares method and parallelization with MPI. It describes the structure and workings of the PLSSVM C++ library's parallelization method. The implementation is discussed in detail, and the reasoning for each important step is presented. Using multiple metrics, the performance of the library is evaluated on large test data and compared to other implementations. On the IPVS servers, the fast and space-efficient library can be used to its fullest potential. Compared to the original version of the library, a roughly tenfold improvement can be seen when using 13 nodes with a total of 100 processes. When scaling the number of threads, a decreasing runtime as well as a strictly decreasing space usage per thread are achieved.

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Scientific Computing
Supervisor(s): Pflüger, Prof. Dirk; Van Craen, Alexander
Entry date: October 24, 2022
   Publ. Computer Science