Hypothesis Selection: A High Probability Conundrum

Aamand, Anders; Aliakbarpour, Maryam; Chen, Justin Y.; Silwal, Sandeep

计算机科学 > 数据结构与算法

arXiv:2509.03734 (cs)

[提交于 2025年9月3日 ]

标题：假设选择：一个高概率的难题

标题： Hypothesis Selection: A High Probability Conundrum

Authors:Anders Aamand, Maryam Aliakbarpour, Justin Y. Chen, Sandeep Silwal

摘要：在假设选择问题中，我们给定一个有限的候选分布集合（假设）$\mathcal{H} = \{H_1, \ldots, H_n\}$，并且有来自一个未知分布$P$的样本。我们的目标是找到一个假设$H_i$，其与$P$的总变分距离与$\mathcal{H}$中最近的假设的距离相当。如果最小距离是$\mathsf{OPT}$，我们旨在输出一个$H_i$，使得以至少$1-\delta$的概率，其与$P$的总变分距离至多为$C \cdot \mathsf{OPT} + \varepsilon$。尽管经过数十年的研究，这个问题的关键方面仍然未解决，包括达到最优样本复杂度的算法的最优运行时间和 $C=3$的最佳可能近似因子。之前最先进的结果[Aliakbarpour, Bun, Smith, NeurIPS 2024]提供了一个在$n$上几乎线性时间的算法，但对其他参数的依赖次优，运行时间为$\tilde{O}(n/(\delta^3\varepsilon^3))$时间。我们将这个时间复杂度改进为$\tilde{O}(n/(\delta \varepsilon^2))$，显著减少了对置信度和误差参数的依赖。此外，我们在三种替代设置中研究了假设选择，解决了或在若干先前工作中的开放问题上取得了进展。 (1) 我们确定了当限制输出假设的\textit{期望距离}时的最优近似因子，而不是其高概率性能。 (2) 假设提前知道\textit{$\mathsf{OPT}$是已知的}的数值，我们提出一种算法，得到$C=3$和运行时$\tilde{O}(n/\varepsilon^2)$，具有最优的样本复杂度，并在$n$中以高概率成功。 (3) 在观察样本之前允许在假设类$\mathcal{H}$上进行多项式\textit{预处理}步骤，我们提出一个具有$C=3$和次二次运行时间的算法，在$n$中以高概率成功。

摘要： In the hypothesis selection problem, we are given a finite set of candidate distributions (hypotheses), $\mathcal{H} = \{H_1, \ldots, H_n\}$, and samples from an unknown distribution $P$. Our goal is to find a hypothesis $H_i$ whose total variation distance to $P$ is comparable to that of the nearest hypothesis in $\mathcal{H}$. If the minimum distance is $\mathsf{OPT}$, we aim to output an $H_i$ such that, with probability at least $1-\delta$, its total variation distance to $P$ is at most $C \cdot \mathsf{OPT} + \varepsilon$. Despite decades of work, key aspects of this problem remain unresolved, including the optimal running time for algorithms that achieve the optimal sample complexity and best possible approximation factor of $C=3$. The previous state-of-the-art result [Aliakbarpour, Bun, Smith, NeurIPS 2024] provided a nearly linear in $n$ time algorithm but with a sub-optimal dependence on the other parameters, running in $\tilde{O}(n/(\delta^3\varepsilon^3))$ time. We improve this time complexity to $\tilde{O}(n/(\delta \varepsilon^2))$, significantly reducing the dependence on the confidence and error parameters. Furthermore, we study hypothesis selection in three alternative settings, resolving or making progress on several open questions from prior works. (1) We settle the optimal approximation factor when bounding the \textit{expected distance} of the output hypothesis, rather than its high-probability performance. (2) Assuming the numerical value of \textit{$\mathsf{OPT}$ is known} in advance, we present an algorithm obtaining $C=3$ and runtime $\tilde{O}(n/\varepsilon^2)$ with the optimal sample complexity and succeeding with high probability in $n$. (3) Allowing polynomial \textit{preprocessing} step on the hypothesis class $\mathcal{H}$ before observing samples, we present an algorithm with $C=3$ and subquadratic runtime which succeeds with high probability in $n$.

评论：	摘要已缩短以符合arxiv要求
主题：	数据结构与算法 (cs.DS) ; 机器学习 (cs.LG)
引用方式：	arXiv:2509.03734 [cs.DS]
	(或者 arXiv:2509.03734v1 [cs.DS] 对于此版本)
	https://doi.org/10.48550/arXiv.2509.03734

提交历史

来自： Justin Chen [查看电子邮件]
[v1] 星期三， 2025 年 9 月 3 日 21:44:48 UTC (61 KB)

计算机科学 > 数据结构与算法

标题：假设选择：一个高概率的难题

标题： Hypothesis Selection: A High Probability Conundrum

提交历史

获取论文：

参考文献与引用

收藏

文献和引用工具

与本文相关的代码，数据和媒体

演示

推荐器和搜索工具

arXivLabs：与社区合作伙伴的实验项目

计算机科学 > 数据结构与算法

标题： 假设选择：一个高概率的难题 显示英文标题

标题： Hypothesis Selection: A High Probability Conundrum

提交历史

获取论文：

参考文献与引用

BibTeX 格式的引用

收藏

文献和引用工具

与本文相关的代码，数据和媒体

演示

推荐器和搜索工具

arXivLabs：与社区合作伙伴的实验项目

标题：假设选择：一个高概率的难题