News

Efficient Bayesian Network Construction for Increased Privacy on Synthetic Data

The IEEE Big Data conference took place in Osaka, Japan, from December 17 to 20, 2022. Rudolf Mayer, senior researcher at SBA Research, gave a talk on Efficient Bayesian Network Construction for Increased Privacy on Synthetic Data.

Title

Efficient Bayesian Network Construction for Increased Privacy on Synthetic Data

Authors

Markus Hittmeir, Rudolf Mayer, Andreas Ekelhart

Report

IEEE Big Data

Abstract

The use of synthetic data is a widely acknowledged privacy-preserving measure that reduces identity and attribute disclosure risks in micro-data. The idea is to learn the statistical properties of an original dataset, store this information in a model, and then use this model to generate artificial samples and build a synthetic dataset that resembles the original. One of the many approaches used by synthetization tools relies on describing the original dataset with a Bayesian network. This method is implemented in the open-source tool DataSynthesizer and has proven particularly suitable for datasets with a small to moderate number of attributes. In this paper, we replace the greedy algorithm used for learning the Bayesian network with a substantially faster genetic algorithm. In addition, our goal is to protect particularly sensitive attributes by decreasing specific correlations in the synthetic data that may reveal personal information. We thus show how to customize the network structures for specific machine learning tasks. Our experiments demonstrate that this technique further decreases the disclosure risks and hence adds to the applicability of synthetic data as a technique for privacy preservation.

Index Terms: Synthetic Data, Machine
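To make the learn, store, and sample idea from the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' method and not the DataSynthesizer API: the network structure is a hand-picked chain rather than one learned by a greedy or genetic algorithm, no privacy noise is added, and the function names (`fit_chain`, `draw`, `sample`) and the toy `age`/`income` data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_chain(rows, order):
    """Estimate P(first attribute) and P(attribute | previous attribute) by counting.

    Assumption: the network is a fixed chain given by `order`; no structure
    learning takes place here.
    """
    root = Counter(row[order[0]] for row in rows)
    cond = {}
    for parent, child in zip(order, order[1:]):
        table = defaultdict(Counter)
        for row in rows:
            table[row[parent]][row[child]] += 1
        cond[child] = (parent, table)
    return root, cond


def draw(counter, rng):
    """Draw one value with probability proportional to its observed count."""
    values = list(counter.keys())
    weights = list(counter.values())
    return rng.choices(values, weights=weights, k=1)[0]


def sample(model, order, n, seed=0):
    """Generate n synthetic rows by sampling each attribute given its parent."""
    rng = random.Random(seed)
    root, cond = model
    out = []
    for _ in range(n):
        row = {order[0]: draw(root, rng)}
        for attr in order[1:]:
            parent, table = cond[attr]
            row[attr] = draw(table[row[parent]], rng)
        out.append(row)
    return out


# Toy "original" dataset (hypothetical values, for illustration only).
original = [
    {"age": "young", "income": "low"},
    {"age": "young", "income": "low"},
    {"age": "young", "income": "high"},
    {"age": "old", "income": "high"},
    {"age": "old", "income": "high"},
]
order = ["age", "income"]
model = fit_chain(original, order)
synthetic = sample(model, order, n=1000)
```

The synthetic rows reproduce the dependency between `age` and `income` that was observed in the toy data. Removing or weakening such a dependency in the network structure is, in spirit, how specific correlations involving sensitive attributes could be decreased.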

Links

GitHub – sbaresearch/EnhancedDataSynthesizer

Efficient Bayesian Network Construction for Increased Privacy on Synthetic Data | IEEE Conference Publication | IEEE Xplore