Article in Proceedings INPROC-2017-31

Bibliography: Heene, Mario; Parra Hinojosa, Alfredo; Bungartz, Hans-Joachim; Pflüger, Dirk: A Massively-Parallel, Fault-Tolerant Solver for High-Dimensional PDEs.
In: Desprez, F. (ed.); Et al. (ed.): Euro-Par 2016: Parallel Processing Workshops.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology.
Lecture Notes in Computer Science (LNCS); 10104, pp. 635-647, English.
Cham: Springer, May 28, 2017.
DOI: 10.1007/978-3-319-58943-5_51.
Article in Proceedings (Conference Paper).
Corporation: Euro-Par 2016
CR-Schema: G.4 (Mathematical Software)
Abstract

We investigate the effect of hard faults on a massively-parallel implementation of the Sparse Grid Combination Technique (SGCT), an efficient numerical approach for the solution of high-dimensional time-dependent PDEs. The SGCT allows us to increase the spatial resolution of a solver to a level that is out of reach for classical discretization schemes due to the curse of dimensionality. We exploit the inherent data redundancy of this algorithm to obtain a scalable and fault-tolerant implementation without the need for checkpointing or process replication. It is a lossy approach that can guarantee convergence for a large number of faults and a wide range of applications. We present first results obtained with our fault simulation framework, along with the first convergence and scalability results with simulated faults and algorithm-based fault tolerance for PDEs in more than three dimensions.
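
To illustrate how a combination technique can absorb the loss of a component grid without checkpointing, the following is a minimal sketch and not the implementation described in the paper. It assumes the standard inclusion-exclusion formula for combination coefficients on a downward-closed set of level vectors, and it assumes that a failed component grid is simply dropped from that set before the coefficients are recomputed. The function names `combination_coefficients` and `classical_index_set` are hypothetical and chosen only for this example.

```python
from itertools import product


def combination_coefficients(index_set):
    """Inclusion-exclusion coefficients for a downward-closed set of level vectors.

    Assumption: c_l = sum over z in {0,1}^d of (-1)^|z| * [l + z in index_set].
    Levels whose coefficient is zero are omitted from the result.
    """
    index_set = set(index_set)
    dim = len(next(iter(index_set)))
    coeffs = {}
    for level in index_set:
        c = 0
        for shift in product((0, 1), repeat=dim):
            neighbour = tuple(l + s for l, s in zip(level, shift))
            if neighbour in index_set:
                c += (-1) ** sum(shift)
        if c != 0:
            coeffs[level] = c
    return coeffs


def classical_index_set(dim, n):
    """All level vectors with entries >= 1 and |l|_1 <= n + dim - 1."""
    return {
        level
        for level in product(range(1, n + 1), repeat=dim)
        if sum(level) <= n + dim - 1
    }


# Fault-free case in two dimensions with level n = 3.
full = classical_index_set(2, 3)
print(combination_coefficients(full))

# Simulated fault: the component grid (2, 2) is lost. Removing a maximal
# level keeps the set downward closed, so the same formula still applies.
reduced = full - {(2, 2)}
print(combination_coefficients(reduced))
```

In the fault-free case the grids (1, 3), (2, 2) and (3, 1) receive coefficient +1 and the grids (1, 2) and (2, 1) receive -1. After the simulated fault, the surviving grids (1, 3) and (3, 1) keep +1 and the coarse grid (1, 1) receives -1, so the coefficients still sum to 1. The recombined solution is coarser than the original one, which is the sense in which such a recovery is lossy.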

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Simulation of Large Systems
Project(s): EXAHD
Entry date: June 19, 2017