Abstract

Recent hardware advancements mostly hinge on adding more cores to the architecture, along with increasing support for additional random access memory. This calls for scalable software that makes the most of the available cores, especially simulation software, which runs on HPC clusters not just on multiple cores, but across multiple CPUs communicating via MPI. One such simulation software is OpenDiHu, designed to scale to more than 10000 cores for modeling skeletal muscles, neural activation, and EMG signals. The OpenDiHu mechanics solver can currently run in parallel when used as a standalone component; running it in parallel when coupled with preCICE, however, is not possible. We first present a revised algorithm for computing boundary conditions that, by taking the participating ranks' ghost values into account, enables parallel execution of the mechanics solver coupled with preCICE, and we show that this reduces execution time by up to 80%, depending on the simulation experiment.

Afterwards, we present a new extensible component for creating checkpoints at a predefined interval, a common technique for increasing the resilience of simulation software against hard failures; a simulation can then be restarted from the last successfully written checkpoint. We implement multiple checkpointing backends based on two file formats, HDF5 and JSON, with options to combine all data into a single checkpoint written collaboratively using MPI-IO, as well as an independent writer, where each rank writes its own data to a separate file. Finally, we verify the correctness of these implementations and compare the overall overhead of writing checkpoints across all of them.
The results show that the implementations using the MPI-IO writer perform worst overall, with the highest overhead, in our tests on NFS4 mount points, while independent checkpointing with HDF5 on a locally mounted EXT4 disk performs best, with an overhead of only around 1 to 2 percentage points. Furthermore, we evaluate whether loading a checkpoint is faster than rerunning the simulation from scratch: in our setup, rerunning takes about 180 times longer than loading a checkpoint. We therefore recommend the HDF5-based checkpointing backend with independent checkpoints written to a local hard drive or, better, to network-attached storage; the latter simplifies restoring a checkpoint when multiple compute nodes participate in the simulation.
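The independent-writer scheme summarized above can be sketched as follows. This is a minimal illustration of the idea using the JSON format, not OpenDiHu's actual implementation (which is C++): the function names and the `checkpoint_<step>.<rank>.json` file-naming convention are assumptions made for this example only.

```python
import json
import os
import tempfile

def write_checkpoint(directory, rank, step, state):
    """Write one rank's state to its own file (independent writer).

    Each rank writes checkpoint_<step>.<rank>.json; the name scheme is
    illustrative, not OpenDiHu's actual convention.
    """
    path = os.path.join(directory, f"checkpoint_{step:06d}.{rank}.json")
    # Write to a temporary file first, then rename atomically, so a crash
    # mid-write never leaves a truncated checkpoint behind.
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, path)
    return path

def latest_checkpoint(directory, rank):
    """Return (step, state) of this rank's newest checkpoint, or None."""
    suffix = f".{rank}.json"
    # Zero-padded step numbers make lexicographic order equal to step order.
    candidates = sorted(
        p for p in os.listdir(directory)
        if p.startswith("checkpoint_") and p.endswith(suffix)
    )
    if not candidates:
        return None
    with open(os.path.join(directory, candidates[-1])) as f:
        data = json.load(f)
    return data["step"], data["state"]
```

Because every rank only touches its own file, no MPI communication is needed while writing; the trade-off, as evaluated in this work, is that restoring on a different node layout is simpler when the files live on storage reachable by all nodes.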