Student Thesis STUD-2412

Bibliography
Mayer, Christian: Design and Implementation of a Coordination Service for Distributed Applications (In-memory Paxos).
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Student Thesis No. 2412 (2013).
48 pages, English.
CR-Schema: C.2.4 (Distributed Systems)
C.4 (Performance of Systems)
H.3.4 (Information Storage and Retrieval Systems and Software)
Abstract

Coordinating different, independent processes is an important aspect of distributed systems. To coordinate with each other, participants of a distributed system often have to agree on some common knowledge, such as which participant holds the lock on a shared resource. The general problem of how to reach agreement on a value is known as the consensus problem. In many practical systems, solving the consensus problem is outsourced to a dedicated distributed system consisting of multiple servers to increase its availability. Each server can be contacted by clients that want to reach consensus on a specific value. Examples are Google's lock service Chubby and Yahoo's coordination service ZooKeeper. The standard Paxos algorithm solves this problem in an environment where nodes may recover after a crash and messages can be delayed arbitrarily. However, a system based on the classical Paxos algorithm relies on expensive stable storage operations to guarantee that a Paxos server that has crashed and recovered can still participate in the protocol. Studies have shown that these disk operations are the bottleneck of the whole system. This work investigates a performance-oriented version of Paxos that still solves the consensus problem but trades availability of the consensus system for performance by not using stable storage. Without careful design, this is problematic: a Paxos server that has lost its memory can endanger the correctness of the protocol, for example because it has forgotten promises it made before the crash. This is solved by no longer allowing a crashed and recovered Paxos server to participate in the protocol under its old identity. Instead, a recovered server rejoins the protocol with a new id, so that no active process relies on anything the recovered process said before the crash. To join the group of active processes, a majority of servers has to be active and agree to the join.
Our evaluations show that a coordination system based on the in-memory Paxos approach has a very short response time of only one millisecond and a high throughput of up to 18,000 write requests per second.
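The rejoin rule described in the abstract can be illustrated with a minimal sketch. It assumes a group with a fixed configured size, ids handed out by a simple counter, and a join that succeeds only if a majority of the configured group is active and approves. The names `Group`, `crash`, `rejoin`, and `majority` are hypothetical; this is not the thesis's implementation, and in a real system the join itself would have to be agreed on through the consensus protocol rather than by counting approvals locally.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count


def majority(n: int) -> int:
    """Smallest number of servers that forms a strict majority of n."""
    return n // 2 + 1


@dataclass
class Group:
    """Hypothetical membership model for in-memory Paxos servers."""
    size: int                                        # configured number of servers
    active: set[int] = field(default_factory=set)    # ids currently in the protocol
    retired: set[int] = field(default_factory=set)   # ids that may never rejoin
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def __post_init__(self) -> None:
        # Start with `size` servers, each under a fresh id.
        for _ in range(self.size):
            self.active.add(next(self._ids))

    def crash(self, server_id: int) -> None:
        # In-memory state is lost, so the old id is retired permanently.
        self.active.discard(server_id)
        self.retired.add(server_id)

    def rejoin(self, approvals: set[int]) -> int | None:
        """Give a recovered server a fresh id if a majority of the configured
        group is active and approves; otherwise refuse (return None)."""
        if len(self.active) >= self.size:
            return None                              # no free slot to fill
        votes = approvals & self.active              # only active servers may vote
        if len(votes) < majority(self.size):
            return None
        new_id = next(self._ids)                     # never reuses a retired id
        self.active.add(new_id)
        return new_id
```

For example, in a group of five, crashing server 3 and then calling `rejoin({1, 2, 4})` admits the recovered server under the new id 6, while id 3 stays retired. If three of five servers have crashed, the two remaining servers cannot form a majority, so no server can rejoin; this is the availability the approach gives up in exchange for avoiding stable storage.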

Full text and other links: PDF (902,781 bytes)
Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Distributed Systems
Supervisor(s): Dürr, Frank
Entry date: July 18, 2013