Partitioned Global Address Space (PGAS) is an abstraction of the shared memory model for Single Program Multiple Data (SPMD). PGAS was developed to unite the concepts of shared memory and distributed data into a single parallel programming model. The purpose of allying distributed data with shared memory was to cultivate locality-aware shared memory paradigm. In PGAS, shared space is partitioned among threads, such that each thread has a portion of shared space which is local to it. Each thread can exploit locality by effectively doing computation on data which has affinity to it or which resides in its local space.
Unified Parallel C (UPC) is a parallel extension of ISO C. It is distributed shared memory programming model which is based on PGAS to support SPMD programs. UPC aims to support locality-aware paradigm. The declarations in UPC provide control to the programmer to efficiently distribute data across multiple threads, which can later be manipulated by the threads such that each thread manipulates data which has affinity to it to exploit locality. However UPC does not restrict the access to data which has affinity to any other thread. Apart from shared space, each thread has also a private space which can only be accessed by thread which owns this space. UPC provides library functions for moving data to/from shared memory economically, efficiently dividing the tasks among the threads.
Molecular dynamics (MD) simulates interactions between molecules. In principle, given an initial set of positions and velocities of molecules, the following time progression of a set of interacting molecules is ascertain. After the system is initialized with the initial positions and velocities of molecules, calculation of forces is done on all molecules in system. Finally Newton's equations of motion are integrated to advance the positions and velocities of molecules. The simulation is advanced unless the computation of time evolution of system is completed for the aimed length of time.
We have ported an in-house MD code (CMD) to UPC, calculating interactions between the molecules and atoms in a parallel fashion. The molecules are spatially decomposed into cells which then are distributed among threads. Each thread calculates the interactions of the molecules of the cells it has affinity to. To minimize the access to remote data, the cells are distributed among the threads in a spatially coherent manner. We have tried to aid cache optimizations in most of the routines in our ported MD code. Here we elaborate the technical details of UPC which are necessary to understand our ported CMD code, then we explain the methodology we chose for porting our CMD code to UPC and finally we present the benchmark results for our UPC implementation of CMD.