
Computer Science > Machine Learning

arXiv:2411.00775 (cs)
[Submitted on 1 Nov 2024]

Title: Dimension-free Private Mean Estimation for Anisotropic Distributions


Authors:Yuval Dagan, Michael I. Jordan, Xuelin Yang, Lydia Zakynthinou, Nikita Zhivotovskiy
Abstract: We present differentially private algorithms for high-dimensional mean estimation. Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality, as they require $\Omega(d^{1/2})$ samples to achieve non-trivial error, even in cases where $O(1)$ samples suffice without privacy. This rate is unavoidable when the distribution is isotropic, namely, when the covariance is a multiple of the identity matrix, or when accuracy is measured with respect to the affine-invariant Mahalanobis distance. Yet, real-world data is often highly anisotropic, with signals concentrated on a small number of principal components. We develop estimators that are appropriate for such signals: our estimators are $(\varepsilon,\delta)$-differentially private and have sample complexity that is dimension-independent for anisotropic subgaussian distributions. Given $n$ samples from a distribution with known covariance-proxy $\Sigma$ and unknown mean $\mu$, we present an estimator $\hat{\mu}$ that achieves error $\|\hat{\mu}-\mu\|_2\leq \alpha$, as long as $n\gtrsim\mathrm{tr}(\Sigma)/\alpha^2+ \mathrm{tr}(\Sigma^{1/2})/(\alpha\varepsilon)$. In particular, when $\pmb{\sigma}^2=(\sigma_1^2, \ldots, \sigma_d^2)$ are the singular values of $\Sigma$, we have $\mathrm{tr}(\Sigma)=\|\pmb{\sigma}\|_2^2$ and $\mathrm{tr}(\Sigma^{1/2})=\|\pmb{\sigma}\|_1$, and hence our bound avoids dimension-dependence when the signal is concentrated in a few principal components. We show that this is the optimal sample complexity for this task up to logarithmic factors. Moreover, for the case of unknown covariance, we present an algorithm whose sample complexity has improved dependence on the dimension, from $d^{1/2}$ to $d^{1/4}$.
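As a quick numerical sanity check of the trace identities used in the abstract, the sketch below (illustrative only; it is not the paper's estimator, and all variable names are ours) verifies $\mathrm{tr}(\Sigma)=\|\pmb{\sigma}\|_2^2$ and $\mathrm{tr}(\Sigma^{1/2})=\|\pmb{\sigma}\|_1$ for a random PSD matrix, then contrasts the privacy term $\|\pmb{\sigma}\|_1$ under an isotropic versus a fully concentrated spectrum with the same total variance $\mathrm{tr}(\Sigma)=1$.

```python
import numpy as np

# Illustrative check (not the paper's algorithm) of the abstract's
# trace identities for a PSD covariance(-proxy) Sigma with singular
# values sigma_1^2, ..., sigma_d^2:
#   tr(Sigma)       = ||sigma||_2^2
#   tr(Sigma^{1/2}) = ||sigma||_1

rng = np.random.default_rng(0)
d = 100

A = rng.normal(size=(d, d))
Sigma = A @ A.T                         # random PSD matrix

eig = np.linalg.eigvalsh(Sigma)         # eigenvalues of Sigma are sigma_i^2
eig = np.clip(eig, 0.0, None)           # guard tiny negative rounding error
sigma = np.sqrt(eig)                    # the singular-value vector sigma

assert np.isclose(np.trace(Sigma), np.sum(sigma**2))   # tr(Sigma) = ||sigma||_2^2
tr_sqrt = np.sum(np.sqrt(eig))                         # tr(Sigma^{1/2})
assert np.isclose(tr_sqrt, np.sum(np.abs(sigma)))      # = ||sigma||_1

# Both spectra below have tr(Sigma) = 1, so the non-private term matches;
# only the privacy term ||sigma||_1 differs.
iso_l1 = np.sum(np.sqrt(np.full(d, 1.0 / d)))          # isotropic: sqrt(d)
aniso_l1 = np.sum(np.sqrt(np.array([1.0] + [0.0] * (d - 1))))  # concentrated: 1

print(iso_l1, aniso_l1)  # sqrt(d) vs 1: the anisotropic bound is dimension-free
```

With all variance on one principal component, $\|\pmb{\sigma}\|_1=1$ regardless of $d$, whereas the isotropic spectrum gives $\|\pmb{\sigma}\|_1=\sqrt{d}$, matching the abstract's claim that the bound avoids dimension-dependence for concentrated signals.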
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2411.00775 [cs.LG]
  (or arXiv:2411.00775v1 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2411.00775
arXiv-issued DOI via DataCite

Submission history

From: Lydia Zakynthinou
[v1] Fri, 1 Nov 2024 17:59:53 UTC (87 KB)
