Master Thesis MSTR-2023-07

Bibliography: Schrader, Timo Pierre: Efficient application of accelerator cards for the coupling library preCICE.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 7 (2023).
73 pages, English.
Abstract

The use of accelerator cards, mainly graphics processing units (GPUs), in scientific and industrial research has been on the rise for years due to their high data-parallel computational throughput. Common application fields include machine learning, computational physics, and cryptography.

This thesis investigates the efficient application of GPUs in the multi-physics coupling library preCICE. We look at data mapping methods, which are used to map values between two vertex clouds. More specifically, the focus lies on radial basis function (RBF) interpolation that acts on scattered data points. Solving an RBF interpolation problem requires the solution of mostly large and ill-conditioned systems of linear equations. High computational effort is needed in order to solve these systems, which increases the runtime of preCICE tremendously. We approach this problem by leveraging the high computing power of GPUs.
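As an illustration of the interpolation problem described above, the following is a minimal sketch in plain Python, not the preCICE or Ginkgo implementation. It assumes a Gaussian basis function with an arbitrarily chosen shape parameter, omits the polynomial term that RBF mappings often add, and uses a small dense solver that is only suitable for a handful of points. All function names are hypothetical.

```python
import math

def gaussian_rbf(r, shape=1.0):
    """Gaussian basis function; the shape parameter is an assumed value."""
    return math.exp(-(shape * r) ** 2)

def distance(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def assemble(rows, cols, phi):
    """Build the matrix with entries phi(||rows[i] - cols[j]||)."""
    return [[phi(distance(p, q)) for q in cols] for p in rows]

def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def rbf_map(source_pts, source_vals, target_pts, phi=gaussian_rbf):
    """Map values from the source vertex cloud to the target vertex cloud."""
    # Step 1: solve the interpolation system on the source vertices.
    coeffs = solve(assemble(source_pts, source_pts, phi), source_vals)
    # Step 2: evaluate the interpolant at the target vertices.
    E = assemble(target_pts, source_pts, phi)
    return [sum(e * c for e, c in zip(row, coeffs)) for row in E]

source_pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
source_vals = [1.0, 2.0, 3.0, 4.0]
# Mapping back onto the source vertices should reproduce the input values.
mapped = rbf_map(source_pts, source_vals, source_pts)
```

The assembly step costs one basis function evaluation per matrix entry and every entry is independent of the others, which is what makes it a natural candidate for a data-parallel GPU kernel.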

To integrate GPU support into preCICE, we make use of the Ginkgo linear algebra library, which supports multiple data-parallel backends, including Nvidia CUDA, AMD HIP, and OpenMP. It provides iterative solvers for linear systems of equations, such as conjugate gradient (CG) and GMRES, as well as preconditioners. Using Ginkgo, we implement an assembly routine for RBF matrices that is 100 to 1,000 times faster than the existing variants in preCICE. We discuss GPU-specific optimization approaches and the resulting efficiency of our implementation. The result is a nearly optimal assembly kernel that uses most of the 64-bit compute units on the GPU. Next, we evaluate CG and GMRES, combined with Jacobi and Cholesky preconditioners, on GPUs. The iterative solution approach works well on sparse system matrices, which result from RBF kernels with local support, and is very competitive with using a very high number of CPU cores. To also provide an efficient way of solving dense systems on GPUs, we additionally implement a QR decomposition using the Nvidia cuSolver library. Our experiments show that, for larger interpolation problems, the CUDA QR decomposition on dense system matrices outperforms every other variant, including iterative GPU solvers, multi-core CPU solvers, and single-core solvers, by at least a factor of five. As a last step, we investigate a matrix-free RBF solution approach that makes it possible to solve problems whose size exceeds GPU memory limits in matrix-based methods.
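To make the iterative solution approach concrete, here is a minimal sketch of a Jacobi-preconditioned conjugate gradient iteration in plain Python. It is not the Ginkgo API and does not run on a GPU. The small symmetric positive definite matrix is a made-up stand-in for an RBF system matrix, and the function names, tolerance, and iteration limit are assumptions chosen for illustration.

```python
def matvec(A, x):
    """Dense matrix-vector product; a sparse format would be used in practice."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def jacobi_pcg(A, b, tol=1e-12, max_iter=200):
    """Conjugate gradient with a Jacobi (inverse diagonal) preconditioner.

    Assumes A is symmetric positive definite. Returns (solution, iterations).
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner
    x = [0.0] * n
    r = b[:]                                      # residual for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = z[:]
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(max_iter):
        # Stop once the relative residual norm is below the tolerance.
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Hypothetical SPD test system (symmetric and strictly diagonally dominant).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iterations = jacobi_pcg(A, b)
residual = [bi - axi for bi, axi in zip(b, matvec(A, x))]
```

Each iteration consists of one matrix-vector product plus a few vector updates and reductions, all of which are data-parallel. With a sparse system matrix, the matrix-vector product touches only the nonzero entries, so it stays cheap.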

To summarize our findings, preCICE can benefit greatly from the efficient application of GPUs in RBF data mapping routines: large interpolation problems are solved much faster, which enables preCICE users to run their coupled simulations in less time.

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Simulation Software Engineering
Supervisor(s): Uekermann, Jun.-Prof. Benjamin; Schneider, David
Entry date: June 14, 2023