An Iterative Algorithm for Differentially Private $k$-PCA with Adaptive Noise

Düngler, Johanna; Sanyal, Amartya


arXiv:2508.10879 (stat)

[Submitted on 14 Aug 2025]

Title: An Iterative Algorithm for Differentially Private $k$-PCA with Adaptive Noise

Authors: Johanna Düngler, Amartya Sanyal

Abstract: Given $n$ i.i.d. random matrices $A_i \in \mathbb{R}^{d \times d}$ that share a common expectation $\Sigma$, the objective of Differentially Private Stochastic PCA is to identify a subspace of dimension $k$ that captures the largest variance directions of $\Sigma$, while preserving differential privacy (DP) of each individual $A_i$. Existing methods either (i) require the sample size $n$ to scale super-linearly with dimension $d$, even under Gaussian assumptions on the $A_i$, or (ii) introduce excessive noise for DP even when the intrinsic randomness within $A_i$ is small. Liu et al. (2022a) addressed these issues for sub-Gaussian data but only for estimating the top eigenvector ($k=1$) using their algorithm DP-PCA. We propose the first algorithm capable of estimating the top $k$ eigenvectors for arbitrary $k \leq d$, whilst overcoming both limitations above. For $k=1$, our algorithm matches the utility guarantees of DP-PCA, achieving near-optimal statistical error even when $n = \tilde{O}(d)$. We further provide a lower bound for general $k > 1$, matching our upper bound up to a factor of $k$, and experimentally demonstrate the advantages of our algorithm over comparable baselines.
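To make the objective concrete, the following is a minimal NumPy sketch of the non-private version of the problem only: it averages the $A_i$ and takes the top $k$ eigenvectors of the result. It is not the paper's algorithm and provides no differential privacy. The choice $A_i = x_i x_i^\top$ with Gaussian $x_i$, the diagonal $\Sigma$, and the helper names `top_k_subspace` and `captured_variance` are all assumptions made for illustration.

```python
import numpy as np


def top_k_subspace(matrices, k):
    """Non-private baseline: top-k eigenvectors of the empirical mean of the A_i."""
    mean = np.mean(matrices, axis=0)
    mean = (mean + mean.T) / 2  # symmetrize to guard against round-off
    _, eigvecs = np.linalg.eigh(mean)  # eigenvalues come back in ascending order
    return eigvecs[:, ::-1][:, :k]  # columns for the k largest eigenvalues


def captured_variance(U, sigma):
    """Variance of sigma captured by the subspace spanned by the columns of U."""
    return float(np.trace(U.T @ sigma @ U))


# Illustrative data (assumption): A_i = x_i x_i^T with x_i ~ N(0, sigma),
# so that E[A_i] = sigma.
rng = np.random.default_rng(0)
d, k, n = 10, 3, 5000
sigma = np.diag(np.linspace(d, 1, d))  # eigenvalues 10, 9, ..., 1
x = rng.multivariate_normal(np.zeros(d), sigma, size=n)
matrices = np.einsum("ni,nj->nij", x, x)

U = top_k_subspace(matrices, k)
best = float(np.sum(np.sort(np.diag(sigma))[::-1][:k]))  # optimum for a k-dim subspace
ratio = captured_variance(U, sigma) / best
print(f"fraction of optimal variance captured: {ratio:.3f}")
```

A private method has to answer the same question while limiting how much any single $A_i$ can influence the output, which is where the sample-size and noise trade-offs described in the abstract arise.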

Subjects:	Machine Learning (stat.ML); Cryptography and Security (cs.CR); Information Theory (cs.IT); Machine Learning (cs.LG); Statistics Theory (math.ST)
Cite as:	arXiv:2508.10879 [stat.ML]
	(or arXiv:2508.10879v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2508.10879

Submission history

From: Johanna Düngler
[v1] Thu, 14 Aug 2025 17:48:45 UTC (3,067 KB)
