Testing probability distributions using conditional samples

Canonne, Clement; Ron, Dana; Servedio, Rocco A.


arXiv:1211.2664 (cs)

[Submitted on 12 Nov 2012 (v1), last revised 16 Jan 2015 (this version, v2)]

Title: Testing probability distributions using conditional samples

Authors: Clement Canonne, Dana Ron, Rocco A. Servedio

Abstract: We study a new framework for property testing of probability distributions, by considering distribution testing algorithms that have access to a conditional sampling oracle.* This is an oracle that takes as input a subset $S \subseteq [N]$ of the domain $[N]$ of the unknown probability distribution $D$ and returns a draw from the conditional probability distribution $D$ restricted to $S$. This new model allows considerable flexibility in the design of distribution testing algorithms; in particular, testing algorithms in this model can be adaptive. We study a wide range of natural distribution testing problems in this new framework and some of its variants, giving both upper and lower bounds on query complexity. These problems include testing whether $D$ is the uniform distribution $\mathcal{U}$; testing whether $D = D^\ast$ for an explicitly provided $D^\ast$; testing whether two unknown distributions $D_1$ and $D_2$ are equivalent; and estimating the variation distance between $D$ and the uniform distribution. At a high level, our main finding is that the new "conditional sampling" framework we consider is a powerful one: while all the problems mentioned above have $\Omega(\sqrt{N})$ sample complexity in the standard model (and in some cases the complexity must be almost linear in $N$), we give $\mathrm{poly}(\log N, 1/\varepsilon)$-query algorithms (and in some cases $\mathrm{poly}(1/\varepsilon)$-query algorithms independent of $N$) for all these problems in our conditional sampling setting. * Independently from our work, Chakraborty et al. also considered this framework. We discuss their work in Subsection 1.4.
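To make the oracle described in the abstract concrete, the following is a minimal simulation sketch in Python. It is not taken from the paper: the class name `CondOracle`, the helper `estimate_pair_fraction`, and the choice to raise an error when $D(S) = 0$ are all assumptions made for illustration (the abstract does not say how that case is handled). The sketch assumes the distribution $D$ is known to the simulator, which a real tester would of course not have; the tester only sees the returned draws.

```python
import random
from typing import Dict, Iterable, Optional


class CondOracle:
    """Simulated conditional sampling oracle for a distribution D on [N].

    Hypothetical helper for illustration; D is given as {element: probability}.
    """

    def __init__(self, probs: Dict[int, float], seed: Optional[int] = None):
        self.probs = dict(probs)
        self.rng = random.Random(seed)
        self.queries = 0  # number of oracle calls made so far

    def sample(self, subset: Iterable[int]) -> int:
        """Return one draw from D restricted to the query set S."""
        elems = sorted(set(subset))
        weights = [self.probs.get(x, 0.0) for x in elems]
        # Assumption of this sketch: reject query sets with D(S) = 0.
        if not elems or sum(weights) <= 0:
            raise ValueError("query set must have positive probability under D")
        self.queries += 1
        # random.choices normalizes the weights, which yields exactly
        # D(x) / D(S) for each x in S.
        return self.rng.choices(elems, weights=weights, k=1)[0]


def estimate_pair_fraction(oracle: CondOracle, x: int, y: int,
                           num_queries: int) -> float:
    """Estimate D(x) / (D(x) + D(y)) by repeatedly querying S = {x, y}."""
    hits = sum(oracle.sample({x, y}) == x for _ in range(num_queries))
    return hits / num_queries


if __name__ == "__main__":
    oracle = CondOracle({1: 0.5, 2: 0.25, 3: 0.25}, seed=0)
    # Under the uniform distribution this fraction would be 1/2 for every pair;
    # here the true value is 0.5 / 0.75 = 2/3.
    print(estimate_pair_fraction(oracle, 1, 2, 20000))
```

Querying two-element sets in this way hints at why the model can be more powerful than plain sampling: the relative weight of two specific elements can be estimated with a number of queries that does not depend on $N$, and later query sets can be chosen adaptively based on earlier answers.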

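Comments:	Significant changes to Section 9 (detailing and expanding the proof of Theorem 16). Several clarifications and typo fixes in various places.
Subjects:	Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Probability (math.PR); Statistics Theory (math.ST)
Cite as:	arXiv:1211.2664 [cs.DS]
	(or arXiv:1211.2664v2 [cs.DS] for this version)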
	https://doi.org/10.48550/arXiv.1211.2664

Submission history

From: Clément Canonne
[v1] Mon, 12 Nov 2012 15:39:28 UTC (103 KB)
[v2] Fri, 16 Jan 2015 18:23:16 UTC (116 KB)
