
Computer Science > Machine Learning

arXiv:2411.00775 (cs)
[Submitted on 1 Nov 2024]

Title: Dimension-free Private Mean Estimation for Anisotropic Distributions


Authors:Yuval Dagan, Michael I. Jordan, Xuelin Yang, Lydia Zakynthinou, Nikita Zhivotovskiy
Abstract: We present differentially private algorithms for high-dimensional mean estimation. Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality, as they require $\Omega(d^{1/2})$ samples to achieve non-trivial error, even in cases where $O(1)$ samples suffice without privacy. This rate is unavoidable when the distribution is isotropic, namely, when the covariance is a multiple of the identity matrix, or when accuracy is measured with respect to the affine-invariant Mahalanobis distance. Yet, real-world data is often highly anisotropic, with signals concentrated on a small number of principal components. We develop estimators that are appropriate for such signals: our estimators are $(\varepsilon,\delta)$-differentially private and have sample complexity that is dimension-independent for anisotropic subgaussian distributions. Given $n$ samples from a distribution with known covariance-proxy $\Sigma$ and unknown mean $\mu$, we present an estimator $\hat{\mu}$ that achieves error $\|\hat{\mu}-\mu\|_2\leq \alpha$, as long as $n\gtrsim\mathrm{tr}(\Sigma)/\alpha^2+ \mathrm{tr}(\Sigma^{1/2})/(\alpha\varepsilon)$. In particular, when $\pmb{\sigma}^2=(\sigma_1^2, \ldots, \sigma_d^2)$ are the singular values of $\Sigma$, we have $\mathrm{tr}(\Sigma)=\|\pmb{\sigma}\|_2^2$ and $\mathrm{tr}(\Sigma^{1/2})=\|\pmb{\sigma}\|_1$, and hence our bound avoids dimension-dependence when the signal is concentrated in a few principal components. We show that this is the optimal sample complexity for this task up to logarithmic factors. Moreover, for the case of unknown covariance, we present an algorithm whose sample complexity has improved dependence on the dimension, from $d^{1/2}$ to $d^{1/4}$.
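As a quick numerical sanity check of the trace identities used in the abstract, the sketch below (illustrative only; it is not the paper's estimator, and all variable names are ours) verifies $\mathrm{tr}(\Sigma)=\|\pmb{\sigma}\|_2^2$ and $\mathrm{tr}(\Sigma^{1/2})=\|\pmb{\sigma}\|_1$ for a random PSD matrix, then contrasts the privacy term $\|\pmb{\sigma}\|_1$ under an isotropic versus a fully concentrated spectrum with the same total variance $\mathrm{tr}(\Sigma)=1$.

```python
import numpy as np

# Illustrative check (not the paper's algorithm) of the abstract's
# trace identities for a PSD covariance(-proxy) Sigma with singular
# values sigma_1^2, ..., sigma_d^2:
#   tr(Sigma)       = ||sigma||_2^2
#   tr(Sigma^{1/2}) = ||sigma||_1

rng = np.random.default_rng(0)
d = 100

A = rng.normal(size=(d, d))
Sigma = A @ A.T                         # random PSD matrix

eig = np.linalg.eigvalsh(Sigma)         # eigenvalues of Sigma are sigma_i^2
eig = np.clip(eig, 0.0, None)           # guard tiny negative rounding error
sigma = np.sqrt(eig)                    # the singular-value vector sigma

assert np.isclose(np.trace(Sigma), np.sum(sigma**2))   # tr(Sigma) = ||sigma||_2^2
tr_sqrt = np.sum(np.sqrt(eig))                         # tr(Sigma^{1/2})
assert np.isclose(tr_sqrt, np.sum(np.abs(sigma)))      # = ||sigma||_1

# Both spectra below have tr(Sigma) = 1, so the non-private term matches;
# only the privacy term ||sigma||_1 differs.
iso_l1 = np.sum(np.sqrt(np.full(d, 1.0 / d)))          # isotropic: sqrt(d)
aniso_l1 = np.sum(np.sqrt(np.array([1.0] + [0.0] * (d - 1))))  # concentrated: 1

print(iso_l1, aniso_l1)  # sqrt(d) vs 1: the anisotropic bound is dimension-free
```

With all variance on one principal component, $\|\pmb{\sigma}\|_1=1$ regardless of $d$, whereas the isotropic spectrum gives $\|\pmb{\sigma}\|_1=\sqrt{d}$, matching the abstract's claim that the bound avoids dimension-dependence for concentrated signals.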
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2411.00775 [cs.LG]
  (or arXiv:2411.00775v1 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2411.00775
arXiv-issued DOI via DataCite

Submission history

From: Lydia Zakynthinou
[v1] Fri, 1 Nov 2024 17:59:53 UTC (87 KB)
