Bachelor Thesis BCLR-0156

Bibliography: Gessler, Alexander: MapReduce to Couple a Bio-mechanical and a Systems-biological Simulation.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Bachelor Thesis No. 156 (2014).
106 pages, English.
CR-Schema: H.2.4 (Database Management Systems)
H.2.8 (Database Applications)
H.2.3 (Database Management Languages)
H.3.4 (Information Storage and Retrieval Systems and Software)
H.4.1 (Office Automation)
Abstract

Recently, workflow technology has raised hopes in the scientific community that it could make complex scientific simulations easier to implement and maintain. The subject of this thesis is an existing workflow for a multi-scalar simulation which calculates the flux of porous mass in human bones. The simulation consists of separate systems-biological and bio-mechanical simulation steps coupled through additional data processing steps. The workflow exhibits a high potential for parallelism, of which only a marginal degree is currently exploited. We therefore investigate whether _Big Data_ concepts such as MapReduce or NoSQL can be integrated into the workflow.

A prototype of the workflow is developed using the Apache Hadoop ecosystem to parallelize the simulation, and this prototype is compared against a hand-parallelized baseline prototype in terms of performance and scalability. NoSQL concepts for storing inputs and results are utilized, with an emphasis on HDFS, the Hadoop Distributed File System, as a schemaless distributed file system, and on MySQL Cluster as an intermediary between a classic database system and a NoSQL system.

Lastly, the MapReduce-based prototype is implemented in the WS-BPEL workflow language using the SIMPL[0] framework and a custom Web Service to access Hadoop functionality. We show the simplicity of the resulting workflow model and argue that the approach greatly decreases implementation effort and at the same time enables simulations to scale to very large data volumes with ease.

[0] P. Reimann, M. Reiter, H. Schwarz, D. Karastoyanova, F. Leymann. SIMPL - A Framework for Accessing External Data in Simulation Workflows. In BTW, pp. 534–553. Kaiserslautern, Germany, 2011.

Full text and other links: PDF (4184413 Bytes)
Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
Supervisor(s): Reimann, Peter
Project(s): SimTech - DP4DDS
Entry date: January 20, 2015