Efficient Bayesian Network Construction for Increased Privacy on Synthetic Data

December 22, 2022

From December 17 to 20, 2022, the IEEE Big Data conference took place in Osaka, Japan. Rudolf Mayer, senior researcher at SBA Research, gave a talk on "Efficient Bayesian Network Construction for Increased Privacy on Synthetic Data".

Title

Efficient Bayesian Network Construction for Increased Privacy on Synthetic Data

Authors

Markus Hittmeir, Rudolf Mayer, Andreas Ekelhart

Report

IEEE Big Data

Abstract

The use of synthetic data is a widely acknowledged privacy-preserving measure that reduces identity and attribute disclosure risks in micro-data. The idea is to learn the statistical properties of an original dataset, store this information in a model, and then use this model to generate artificial samples and build a synthetic dataset that resembles the original. One of the many approaches taken by synthetization tools describes the original dataset with a Bayesian network. This method is implemented in the open-source tool DataSynthesizer and has proven particularly suitable for datasets with a small to moderate number of attributes. In this paper, we replace the greedy algorithm used to learn the Bayesian network with a substantially faster genetic algorithm. In addition, our goal is to protect particularly sensitive attributes by decreasing specific correlations in the synthetic data that may reveal personal information. We thus show how to customize the network structures for specific machine learning tasks. Our experiments demonstrate that this technique makes it possible to further decrease disclosure risks and, hence, adds to the applicability of synthetic data as a technique for privacy preservation.

Index Terms—Synthetic Data, Machine
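The abstract combines two ideas: searching for the Bayesian network structure with a genetic algorithm instead of DataSynthesizer's greedy procedure, and shaping that structure so a sensitive attribute is only linked to the attributes a downstream task needs. The Python block below is a minimal sketch of both ideas under stated assumptions; it is not the authors' algorithm and not the EnhancedDataSynthesizer code. Assumptions: a candidate solution is an ordering of the attributes; each attribute takes as parents up to k earlier attributes with the highest pairwise mutual information; fitness is the sum of I(X; Pa(X)) over all attributes, which loosely mirrors the mutual-information objective of greedy Bayesian network construction; no differential-privacy noise is added. All function names (mutual_information, decode, fitness, order_crossover, swap_mutation, sensitive_forbidden, genetic_bayes) and the toy records are hypothetical.

```python
# Hypothetical sketch: a genetic algorithm for Bayesian network structure search
# with forbidden edges around a sensitive attribute. Not the EnhancedDataSynthesizer code.
import math
import random
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in nats, of two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def decode(order, k, forbidden, pair_mi):
    """Map an ordering to a DAG: each attribute takes up to k earlier attributes as parents,
    preferring the highest pairwise MI and skipping forbidden (parent, child) edges."""
    network = {}
    for i, child in enumerate(order):
        candidates = [p for p in order[:i] if (p, child) not in forbidden]
        candidates.sort(key=lambda p: pair_mi[frozenset((p, child))], reverse=True)
        network[child] = candidates[:k]
    return network


def fitness(network, data):
    """Sum of I(X; Pa(X)) over all attributes; higher means more dependencies are kept."""
    total = 0.0
    for child, parents in network.items():
        if parents:
            joint = list(zip(*(data[p] for p in parents)))  # parent configuration per row
            total += mutual_information(data[child], joint)
    return total


def order_crossover(a, b, rng):
    """Keep a random slice of ordering a in place; fill the other positions in b's order."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    middle = a[i:j]
    rest = [g for g in b if g not in middle]
    return rest[:i] + middle + rest[i:]


def swap_mutation(order, rng, rate):
    """With probability `rate`, swap two attributes in the ordering."""
    order = list(order)
    if len(order) > 1 and rng.random() < rate:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order


def sensitive_forbidden(attrs, sensitive, allowed):
    """Forbid edges (both directions) between `sensitive` and every attribute outside `allowed`."""
    return frozenset(
        edge
        for a in attrs
        if a != sensitive and a not in allowed
        for edge in ((a, sensitive), (sensitive, a))
    )


def genetic_bayes(data, k=2, forbidden=frozenset(), pop_size=20, generations=30,
                  mutation_rate=0.3, seed=0):
    """Evolve attribute orderings with truncation selection, crossover and swap mutation."""
    rng = random.Random(seed)
    attrs = list(data)
    pair_mi = {frozenset(p): mutual_information(data[p[0]], data[p[1]])
               for p in combinations(attrs, 2)}
    cache = {}

    def score(order):
        key = tuple(order)
        if key not in cache:
            cache[key] = fitness(decode(order, k, forbidden, pair_mi), data)
        return cache[key]

    population = [rng.sample(attrs, len(attrs)) for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(population, key=score, reverse=True)[: max(2, pop_size // 2)]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(swap_mutation(order_crossover(a, b, rng), rng, mutation_rate))
        population = elite + children
    best = max(population, key=score)
    return decode(best, k, forbidden, pair_mi), score(best)


# Made-up toy records; "diagnosis" plays the sensitive attribute and may only be
# linked to "income" (both choices are illustrative assumptions).
data = {
    "age":       ["30", "30", "40", "40", "50", "50", "30", "40"],
    "zip":       ["A", "A", "B", "B", "C", "C", "A", "B"],
    "income":    ["lo", "lo", "mid", "mid", "hi", "hi", "lo", "mid"],
    "diagnosis": ["x", "x", "y", "y", "y", "x", "x", "y"],
}
forbidden = sensitive_forbidden(list(data), "diagnosis", allowed={"income"})
network, total_mi = genetic_bayes(data, k=2, forbidden=forbidden)
for child, parents in network.items():
    print(child, "<-", parents)
```

Forbidding edges, rather than penalizing them in the fitness function, keeps every decoded network valid, so the search never has to repair a structure. A soft penalty would be an alternative if some weak correlation with the sensitive attribute is acceptable for the intended machine learning task.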

Links

GitHub – sbaresearch/EnhancedDataSynthesizer

IEEE Xplore – Efficient Bayesian Network Construction for Increased Privacy on Synthetic Data
