Article in Journal ART-2016-03

Bibliography: Hupp, Philipp; Heene, Mario; Jacob, Riko; Pflüger, Dirk: Global communication schemes for the numerical solution of high-dimensional PDEs.
In: Parallel Computing. Vol. 52.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology.
pp. 78-105, English.
Amsterdam, The Netherlands: Elsevier Science Publishers, February 2016.
ISSN: 0167-8191.
Article in Journal.
CR-Schema: G.1.0 (Numerical Analysis General)
G.1.8 (Partial Differential Equations)
D.4 (Operating Systems)
D.1.3 (Concurrent Programming)
F.2.1 (Numerical Algorithms and Problems)
Keywords: Communication model; Communication performance analysis; Experimental evaluation; Global communication; High-performance computing; Sparse grid combination technique
Abstract

We study the global communication of the numerical solution of high-dimensional PDEs. We design two optimal communication schemes for the sparse grid combination technique. We present a new communication model based on the system's latency and bandwidth. The communication model predicts the performance of the communication schemes. Experimental results on several current supercomputers confirm the predictions.

The numerical treatment of high-dimensional partial differential equations is among the most compute-hungry problems and in urgent need for current and future high-performance computing (HPC) systems. It is thus also facing the grand challenges of exascale computing such as the requirement to reduce global communication. To cope with high dimensionalities we employ a hierarchical discretization scheme, the sparse grid combination technique. Based on an extrapolation scheme, the combination technique additionally mitigates the need for global communication: multiple and much smaller problems can be computed independently for each time step, and the global communication shrinks to a reduce/broadcast step in between. Here, we focus on this remaining synchronization step of the combination technique and present two communication schemes designed to either minimize the number of communication rounds or the total communication volume. Experiments on two different supercomputers show that either of the schemes outperforms the other depending on the size of the problem. Furthermore, we present a communication model based on the system's latency and bandwidth and validate the model with the experiments. The model can be used to predict the runtime of the reduce/broadcast step for dimensionalities that are yet out of scope on current supercomputers.

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Simulation of Large Systems
Entry date: May 19, 2016