Efficient Bayesian Network Construction for Increased Privacy on Synthetic Data
From December 17 to 20, 2022, the IEEE Big Data conference took place in Osaka, Japan. Rudolf Mayer, senior researcher at SBA Research, gave a talk on "Efficient Bayesian Network Construction for Increased Privacy on Synthetic Data".
Title
Efficient Bayesian Network Construction for Increased Privacy on Synthetic Data
Authors
Markus Hittmeir, Rudolf Mayer, Andreas Ekelhart
Report
IEEE Big Data
Abstract
The use of synthetic data is a widely acknowledged privacy-preserving measure that reduces identity and attribute disclosure risks in microdata. The idea is to learn the statistical properties of an original dataset, store this information in a model, and then use this model to generate artificial samples and build a synthetic dataset that resembles the original. One of the many approaches used by synthetization tools describes the original dataset with a Bayesian network. This method is implemented in the open-source tool DataSynthesizer and has proven particularly suitable for datasets with a small to moderate number of attributes. In this paper, we replace the greedy algorithm used for learning the Bayesian network with a substantially faster genetic algorithm. In addition, we aim to protect particularly sensitive attributes by decreasing specific correlations in the synthetic data that may reveal personal information. We thus show how to customize the network structures for specific machine learning tasks. Our experiments demonstrate that this technique allows us to further decrease the disclosure risks and, hence, adds to the applicability of synthetic data as a technique for privacy preservation.
Index Terms: Synthetic Data, Machine
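The abstract does not describe the genetic algorithm in detail, so the following is only a minimal sketch under our own assumptions. It is not the authors' implementation and not DataSynthesizer's API. Each candidate network is encoded as an attribute order, which keeps the network acyclic. Every attribute then picks, among earlier attributes, the parent set of at most k members (an assumed degree bound) with the highest empirical mutual information. The orders are evolved with tournament selection, an order-preserving crossover and swap mutation. DataSynthesizer can select attribute-parent pairs in a differentially private way; this sketch omits that entirely. The hypothetical `forbidden` parameter shows one possible way to suppress a specific correlation: attribute pairs listed there (for example a sensitive attribute and a quasi-identifier) are never linked directly. All function names here are illustrative.

```python
import itertools
import math
import random
from collections import Counter


def mutual_information(rows, child, parents):
    """Empirical mutual information I(child; parents) in bits (no DP noise)."""
    if not parents:
        return 0.0
    n = len(rows)
    joint = Counter((r[child], tuple(r[p] for p in parents)) for r in rows)
    p_child = Counter(r[child] for r in rows)
    p_par = Counter(tuple(r[p] for p in parents) for r in rows)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (count / n) * math.log2(count * n / (p_child[x] * p_par[y]))
    return mi


def decode(order, rows, k, forbidden):
    """Each attribute picks, from the allowed earlier attributes, the parent set
    of size min(k, #allowed) with the highest MI (no parents if all MI is zero).
    Pairs in the hypothetical `forbidden` set are never linked directly.
    Returns ([(child, parents), ...], total MI)."""
    network, total = [], 0.0
    for i, child in enumerate(order):
        allowed = [a for a in order[:i] if frozenset((a, child)) not in forbidden]
        best, best_mi = (), 0.0
        for parents in itertools.combinations(allowed, min(k, len(allowed))):
            mi = mutual_information(rows, child, parents)
            if mi > best_mi:
                best, best_mi = parents, mi
        network.append((child, best))
        total += best_mi
    return network, total


def order_crossover(p1, p2, rng):
    """Keep a random slice of p1; fill the other slots in p2's order."""
    a, b = sorted(rng.sample(range(len(p1) + 1), 2))
    middle = p1[a:b]
    rest = [g for g in p2 if g not in middle]
    return rest[:a] + middle + rest[a:]


def genetic_structure_search(rows, attributes, k=2, forbidden=frozenset(),
                             pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Evolve attribute orders (pop_size >= 3); fitness = total MI of the network."""
    rng = random.Random(seed)
    cache = {}

    def fitness(order):
        key = tuple(order)
        if key not in cache:
            cache[key] = decode(order, rows, k, forbidden)[1]
        return cache[key]

    population = [rng.sample(attributes, len(attributes)) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: keep the two best orders unchanged
        while len(next_gen) < pop_size:
            # size-3 tournament selection for each parent
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            child = order_crossover(p1, p2, rng)
            if rng.random() < mutation_rate:  # swap mutation
                i, j = rng.sample(range(len(child)), 2)
                child[i], child[j] = child[j], child[i]
            next_gen.append(child)
        population = next_gen
    return decode(max(population, key=fitness), rows, k, forbidden)


if __name__ == "__main__":
    # Hypothetical toy data: 'zip' is strongly tied to the sensitive 'disease'.
    gen = random.Random(1)
    rows = []
    for _ in range(200):
        zip_code = gen.choice(["A", "B", "C"])
        disease = "flu" if zip_code == "A" else gen.choice(["flu", "none"])
        age = gen.choice(["young", "old"])
        income = "high" if age == "old" else gen.choice(["high", "low"])
        rows.append({"zip": zip_code, "disease": disease, "age": age, "income": income})
    network, score = genetic_structure_search(
        rows, ["zip", "disease", "age", "income"], k=1,
        forbidden={frozenset(("zip", "disease"))})
    for child, parents in network:
        print(child, "<-", parents)
```

Encoding individuals as orders rather than as edge sets means crossover and mutation can never produce a cycle. Forbidding an edge only removes the direct dependency. The two attributes can still be correlated through other paths in the network, so the disclosure risk of the resulting synthetic data would still have to be measured.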