Bachelor Thesis BCLR-2023-20

Bibliography: Kuksina, Olena: Differential privacy by sampling.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Bachelor Thesis No. 20 (2023).
87 pages, English.
Abstract

The collection and storage of immense volumes of data have become commonplace in today's digital age, making the protection of personal data increasingly important. Private data often includes sensitive information about individuals. It is gathered by medical and financial institutions, research and social science organisations, government bodies, and others, which use data-driven analytics and knowledge-based decision-making to improve products and services, perform enterprise statistical analysis, conduct comprehensive studies of demographic trends, and more. Disclosing or sharing such information among different parties could infringe on privacy, and the information can be used for malicious purposes such as identity theft, scams, or targeted advertising. This work examines the field of privacy-preserving data publishing.

The quality of published data significantly affects not only how the data is understood and processed, but also the accuracy of data analysis and, consequently, the interpretations and decisions derived from it. To meet this challenge, syntactic anonymization techniques, such as k-anonymity and its enhanced variants, are applied. However, their guarantees depend on assumptions about the adversary's background knowledge. Differential privacy, a semantic model, is a more rigorous mathematical notion of privacy assurance that makes no such assumptions. Nevertheless, differential privacy is usually applied in the subsequent phase, namely privacy-preserving data mining, query answering, and aggregate statistics.

In the scope of this work, DP-anonym is developed: a subsampling anonymization algorithm that provides k-anonymity with integrated differential privacy mechanisms, such as the Laplace mechanism and the exponential mechanism. The algorithm provides both syntactic and semantic privacy, combining the best of the two areas of private data exploration. According to the experimental results, DP-anonym provides better data utility than standard anonymization algorithms across general data utility metrics. It also gives more precise answers to typical database queries because it uses a multidimensional generalization approach. In contrast to standard methods, DP-anonym achieves (epsilon, delta)-differential privacy, which guarantees the privacy of the published anonymized data more effectively.
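
As a brief illustration of one building block named above, the Laplace mechanism can be sketched for a simple counting query. This is a minimal sketch and not the DP-anonym algorithm from the thesis: the function name `laplace_count`, its parameters, and the choice of a counting query with sensitivity 1 are assumptions made for illustration only.

```python
import random


def laplace_count(values, predicate, epsilon, rng=None):
    """Return a noisy count of the items in `values` that satisfy `predicate`.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()

    true_count = sum(1 for v in values if predicate(v))

    # Adding or removing one record changes a count by at most 1,
    # so the L1 sensitivity is 1 and the Laplace scale is 1 / epsilon.
    sensitivity = 1.0
    rate = epsilon / sensitivity

    # The difference of two i.i.d. exponential variables with this rate
    # follows a Laplace distribution with scale sensitivity / epsilon.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


if __name__ == "__main__":
    ages = [23, 35, 41, 58, 62, 19]
    # Smaller epsilon means more noise and stronger privacy.
    print(laplace_count(ages, lambda a: a >= 40, epsilon=0.5))
```

Sampling noise with ordinary floating-point arithmetic, as done here, is adequate for illustration but is known to be unsuitable for deployments that need a strict guarantee.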

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
Supervisor(s): Mitschang, Prof. Bernhard; Stach, Dr. Christoph
Entry date: September 14, 2023