Diploma Thesis DIP-3729

Bibliographic data
Bader, Andreas: Comparison of Time Series Databases.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering and Information Technology, Diploma Thesis No. 3729 (2016).
157 pages, English.
CR classification: H.2.4 (Database Management Systems)
H.3.4 (Information Storage and Retrieval Systems and Software)
C.2.4 (Distributed Systems)
C.4 (Performance of Systems)
Keywords: Time Series Database; TSDB; Comparison; Survey; Benchmark; YCSB-TS; TSDBBench; Vergleich; Zeitreihendatenbank
Abstract

Storing and analyzing large amounts of data has grown in importance since the fourth industrial revolution. As more devices become "smart" and are equipped with sensors, the amount of data that can be stored and analyzed grows. Insights from this data are important for several industries, e.g., for energy companies controlling smart grids.

Traditional Relational Database Management Systems (RDBMS) have reached their limits with such huge amounts of data, which led to a new database type, the NoSQL Database Management Systems (DBMS). NoSQL DBMS specialize in handling huge amounts of data with the help of distribution and weaker consistency. Between these two, a new type arose: the Time Series Database (TSDB), which is specialized for storing and querying time series data.

The number of existing TSDBs is large: 75 TSDBs were found for this thesis. 42 of them are open source; the remaining TSDBs are commercial. Many of the open source TSDBs found are under ongoing development. The challenge is selecting one TSDB for a given scenario or problem. Benchmarks that can compare several TSDBs, either for a specific scenario or in general, hardly exist. Currently, a choice based on performance is therefore only possible if TSDBs are compared manually in a test environment or with a self-written benchmark.

In this thesis, a feature comparison of ten of these TSDBs, using 19 criteria in five groups, is presented and discussed. After metrics, scenarios, and requirements for a benchmark are introduced, a benchmark for TSDBs, TSDBBench, is presented. TSDBBench uses an Elastic Infrastructure (EI) and alterable workloads to measure query latency and space consumption in different scenarios that include an alterable cluster setup. All benchmarking steps are automated, so that no user interaction is required after starting the benchmark. To create and measure the queries of a workload, it uses an adapted version of the Yahoo! Cloud Serving Benchmark (YCSB), named Yahoo Cloud Serving Benchmark for Time Series (YCSB-TS), which is also presented in this thesis. For the performance part of the comparison, the ten TSDBs are compared in two scenarios using TSDBBench. The results of the performance comparison are then presented and discussed. The thesis concludes with a discussion of the results from the feature and performance comparison.

Full text and other links
PDF (1153144 Bytes)
TSDBBench (Github)
Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Application Software
Supervisor(s): Kopp, Oliver; Falkenthal, Michael
Project(s): NEMAR
Entry date: 30 September 2016