Masterarbeit MSTR-2015-49

Bibliograph.
Daten
Li, Chen: Distributed Data Analytics Using Graph Processing Frameworks.
Universität Stuttgart, Fakultät Informatik, Elektrotechnik und Informationstechnik, Masterarbeit Nr. 49 (2015).
71 Seiten, englisch.
Kurzfassung

In recent years, with the growth of graphs size and associated data, e.g. Facebook user graph, large graphs trend to be distributed and processed in parallel on the cluster by using graph processing frameworks such as Pregel and PowerGraph. Graph processing frameworks generally provide a vertex-centric programming model. However, some important problems do not have available algorithms in this model. This thesis presents novel algorithms in vertex-centric programming paradigm for two problems: distributed subgraph isomorphism, and geospatial simulation using Agent-Based Cellular Automata. We evaluate the proposed algorithms on PowerGraph, and use these algorithms to examine a framework prototype called GrapH with respect to the improvement in communication. The subgraph isomorphism problem is to determine whether a given graph contains any subgraph that is isomorphic to a given pattern. Most existing algorithms for subgraph isomorphism are highly computation intensive and are not capable to scale to large graphs. As solving subgraph isomorphism often requires a global overview of graph, developing a vertex-centric algorithm is very challenging. We present a distributed algorithm (DSI) and two variants (optimized DSI and PDSI) based on the GAS model for subgraph isomorphism. We also analyse the properties of the algorithms and point out the bottlenecks for the performance. Geospatial simulation has assisted the investigation and modelling the environmental systems, which helps people understanding complex dynamics of our society and making better decisions. As the scale of simulations becomes larger, simulation trends to be executed on distributed systems. Therefore using graph processing frameworks is a good option for simulation. Thus we present an algorithm for geospatial simulation using Agent-Based Cellular Automata. Our evaluation show that optimized DSI needs on average only one third of the execution time as DSI needs. PDSI can remarkably reduce the runtime with a low proportion, and still be able to find a relative reasonable quantity of the isomorphic subgraphs. Moreover, algorithms for both subgraph isomorphism and geospatial simulation display heterogeneous vertex synchronization costs. Finally, we show that by taking the heterogeneity into account and introducing adaptive partitioning GrapH reduces overall communication more than 20%, compared with PowerGraph partitioning strategy.

Abteilung(en)Universität Stuttgart, Institut für Parallele und Verteilte Systeme, Verteilte Systeme
BetreuerRothermel, Prof. Kurt; Tariq, Dr. Muhammad Adnan; Mayer, Christian
Eingabedatum14. Februar 2022
   Publ. Informatik