Computer-based simulations are becoming increasingly important as a means to imitate real-world experiments such as crash tests. The input data of simulations often come from diverse data sources that manage their data in a multitude of proprietary formats. Corresponding simulation workflows thus have to carry out many complex data provisioning tasks. These tasks filter and transform heterogeneous input data in such a way that the underlying calculation software is able to ingest them. Hence, scientists have to spend considerable effort implementing numerous low-level data transformations in their simulation workflows.
This PhD thesis introduces novel concepts and methods that ease the implementation of complex data provisioning in simulation workflows. Firstly, it addresses the problem that existing workflow systems overwhelm scientists with a multitude of data provisioning techniques. To this end, the thesis derives a set of guidelines that assist scientists in choosing appropriate techniques for their workflows.
Another outcome is a set of workflow language extensions that offer a generic solution to data provisioning in simulation workflows. More precisely, these extensions support any data resource and any data management operation required by real simulations.
The major contribution is a pattern-based approach that completely removes from scientists the burden of implementing low-level data transformations in their workflows. Instead of designing many workflow tasks, scientists only need to select a small number of abstract patterns to describe a high-level simulation process. Furthermore, scientists are familiar with the parameters to be specified for the patterns, because these parameters are related to their domain-specific methodology. Altogether, this approach conquers the data complexity associated with simulations, allowing scientists to concentrate on their core concern again, namely the simulation itself.
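To make the idea of such a pattern concrete, the following is a minimal sketch, not the thesis's actual implementation: a hypothetical pattern class whose parameters use domain terms (a data source, a target format, a domain-specific filter), and which the workflow system could expand into the low-level tasks the scientist no longer has to design by hand. All class and parameter names here are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class SimulationDataPattern:
    """Hypothetical abstract pattern: scientists supply only domain-level
    parameters; the system derives the low-level transformation tasks."""
    source: str            # e.g. a sensor log produced by a crash test
    target_format: str     # format the calculation software ingests
    filter_expr: str = ""  # optional domain-specific selection

    def expand(self):
        """Rewrite the abstract pattern into executable low-level tasks."""
        tasks = [f"extract({self.source})"]
        if self.filter_expr:
            tasks.append(f"filter({self.filter_expr})")
        tasks.append(f"convert(to={self.target_format})")
        return tasks

# One pattern selection replaces several hand-written workflow tasks.
pattern = SimulationDataPattern(source="crash_sensors.csv",
                                target_format="hdf5",
                                filter_expr="impact_time < 0.1")
print(pattern.expand())
```

The point of the sketch is only the division of labor: the scientist states *what* data is needed in familiar terms, while the expansion step owns the *how*.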
The last contribution is a complementary optimization method that improves the performance of local data processing in simulation workflows. This method introduces various techniques that judiciously partition the relevant local data processing tasks among the components of a workflow system: each such task is assigned either to the workflow execution engine or to a tightly integrated local database system. Corresponding experiments revealed that, even for a moderate data size of about 0.5 MB, this method reduces workflow duration by nearly a factor of 9.
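The partitioning decision can be pictured with a small sketch. This is an assumed, illustrative heuristic, not the thesis's actual method: it pushes set-oriented operations on non-trivial data volumes into the local database and leaves everything else to the workflow engine. The task names and the threshold value are invented for illustration.

```python
# Illustrative only: kinds of tasks a local database typically handles well.
SET_ORIENTED = {"join", "aggregate", "sort", "filter"}

def assign(task_kind, data_size_bytes, db_threshold=100 * 1024):
    """Assign a local data processing task to a workflow system component.

    Returns 'database' for set-oriented work on data above a (hypothetical)
    size threshold, and 'engine' otherwise.
    """
    if task_kind in SET_ORIENTED and data_size_bytes >= db_threshold:
        return "database"
    return "engine"

# Partition a few example tasks at ~0.5 MB, the data size from the experiments.
plan = {t: assign(t, 512 * 1024) for t in ["join", "aggregate", "send_mail"]}
print(plan)  # {'join': 'database', 'aggregate': 'database', 'send_mail': 'engine'}
```

A real partitioning method would weigh more factors (operation cost models, data location, transfer overhead); the sketch only shows the shape of the engine-versus-database decision.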