Differentially Private Synthetic Mixed-Type Data Generation For Unsupervised Learning

Tantipongpipat, Uthaipon; Waites, Chris; Boob, Digvijay; Siva, Amaresh Ankit; Cummings, Rachel

Computer Science > Machine Learning

arXiv:1912.03250 (cs)

[Submitted on 6 Dec 2019 (v1), last revised 10 Dec 2020 (this version, v2)]

Title: Differentially Private Synthetic Mixed-Type Data Generation For Unsupervised Learning

Authors: Uthaipon Tantipongpipat, Chris Waites, Digvijay Boob, Amaresh Ankit Siva, Rachel Cummings

Abstract: We introduce the DP-auto-GAN framework for synthetic data generation, which combines the low-dimensional representation of autoencoders with the flexibility of Generative Adversarial Networks (GANs). This framework can be used to take in raw sensitive data and privately train a model for generating synthetic data that satisfies similar statistical properties to the original data. The learned model can generate an arbitrary amount of synthetic data, which can then be freely shared due to the post-processing guarantee of differential privacy. Our framework is applicable to unlabeled mixed-type data, which may include binary, categorical, and real-valued data. We implement this framework on both binary data (MIMIC-III) and mixed-type data (ADULT), and compare its performance with existing private algorithms on metrics in unsupervised settings. We also introduce a new quantitative metric able to detect diversity, or lack thereof, of synthetic data.
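The sampling stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the single-layer generator and decoder, the column layout, and all function names are hypothetical, and the random weights merely stand in for parameters that differentially private training would produce. The point it shows is that sampling reads only the trained weights and fresh noise, never the sensitive data, which is why any number of records can be drawn under the post-processing guarantee.

```python
import math
import random

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
NOISE_DIM = 4      # dimension of the generator's input noise
LATENT_DIM = 3     # dimension of the autoencoder's latent space
N_CATEGORIES = 3   # number of levels of the single categorical column


def random_matrix(rows, cols, rng):
    """Stand-in for weights that private training would have produced."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sample_record(gen_w, dec_w, rng):
    """Noise -> generator -> latent code -> decoder -> one mixed-type record."""
    z = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
    latent = [math.tanh(v) for v in matvec(gen_w, z)]
    # Decoder output layout (assumed): 1 binary logit, N_CATEGORIES logits, 1 real.
    out = matvec(dec_w, latent)
    binary = 1 if out[0] > 0 else 0            # same as sigmoid(out[0]) > 0.5
    cat_logits = out[1:1 + N_CATEGORIES]
    category = cat_logits.index(max(cat_logits))  # argmax over category logits
    real = out[1 + N_CATEGORIES]
    return {"binary": binary, "category": category, "real": real}


def sample_synthetic(n, seed=0):
    """Draw n synthetic records; only weights and fresh noise are used."""
    rng = random.Random(seed)
    gen_w = random_matrix(LATENT_DIM, NOISE_DIM, rng)
    dec_w = random_matrix(1 + N_CATEGORIES + 1, LATENT_DIM, rng)
    return [sample_record(gen_w, dec_w, rng) for _ in range(n)]


if __name__ == "__main__":
    for record in sample_synthetic(3):
        print(record)
```

In a real pipeline the two weight matrices would be loaded from a privately trained model rather than drawn at random, and `n` could be as large as needed without further privacy cost.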

Subjects:	Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Cite as:	arXiv:1912.03250 [cs.LG]
	(or arXiv:1912.03250v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1912.03250

Submission history

From: Rachel Cummings
[v1] Fri, 6 Dec 2019 17:46:07 UTC (7,677 KB)
[v2] Thu, 10 Dec 2020 00:46:37 UTC (32,319 KB)
