A Simple Method for PMF Estimation on Large Supports

Shtoff, Alex

Computer Science > Machine Learning

arXiv:2510.15132 (cs)

[Submitted on 16 Oct 2025]


Title: A Simple Method for PMF Estimation on Large Supports

Authors: Alex Shtoff


Abstract: We study nonparametric estimation of a probability mass function (PMF) on a large discrete support, where the PMF is multi-modal and heavy-tailed. The core idea is to treat the empirical PMF as a signal on a line graph and apply a data-dependent low-pass filter. Concretely, we form a symmetric tridiagonal operator, the path graph Laplacian perturbed with a diagonal matrix built from the empirical PMF, then compute the eigenvectors corresponding to the smallest few eigenvalues. Projecting the empirical PMF onto this low-dimensional subspace produces a smooth, multi-modal estimate that preserves coarse structure while suppressing noise. A light post-processing step of clipping and re-normalizing yields a valid PMF. Because we compute the eigenpairs of a symmetric tridiagonal matrix, the computation is reliable and runs in time and memory proportional to the support size times the dimension of the desired low-dimensional subspace. We also provide a practical, data-driven rule for selecting the dimension based on an orthogonal-series risk estimate, so the method "just works" with minimal tuning. On synthetic and real heavy-tailed examples, the approach preserves coarse structure while suppressing sampling noise, and compares favorably to logspline and Gaussian-KDE baselines in the intended regimes. However, it has known failure modes (e.g., abrupt discontinuities). The method is short to implement, robust across sample sizes, and suitable for automated pipelines and exploratory analysis at scale because of its reliability and speed.
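
The pipeline described in the abstract (tridiagonal operator, smallest eigenpairs, projection, clip and renormalize) can be illustrated with a minimal Python sketch. This is not the author's implementation. The abstract does not say how the diagonal perturbation is built from the empirical PMF, so the `potential` below is a hypothetical choice made only for illustration. The function name `smooth_pmf` and the parameters `k` and `strength` are likewise assumptions, and the subspace dimension `k` is passed by hand rather than chosen by the paper's risk-based rule.

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal


def smooth_pmf(counts, k, strength=1.0):
    """Smooth an empirical PMF by projecting onto k low-frequency eigenvectors.

    counts   : non-negative counts, one per support point (length n >= 2)
    k        : subspace dimension, 1 <= k <= n (chosen by hand in this sketch)
    strength : scale of the diagonal perturbation (hypothetical parameter)
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.size
    p_hat = counts / counts.sum()  # empirical PMF

    # Path-graph Laplacian: diagonal (1, 2, ..., 2, 1), off-diagonals -1.
    diag = np.full(n, 2.0)
    diag[0] = diag[-1] = 1.0
    off = np.full(n - 1, -1.0)

    # ASSUMPTION: the abstract only says the perturbation is "built from the
    # empirical PMF". This form (small where mass is high) is illustrative.
    potential = strength * (1.0 - p_hat / p_hat.max())

    # Eigenvectors for the k smallest eigenvalues of the tridiagonal operator.
    _, basis = eigh_tridiagonal(
        diag + potential, off, select="i", select_range=(0, k - 1)
    )

    # Project the empirical PMF onto the subspace, then clip and renormalize.
    smooth = basis @ (basis.T @ p_hat)
    smooth = np.clip(smooth, 0.0, None)
    return smooth / smooth.sum()
```

A call such as `smooth_pmf(counts, k=5)` returns a non-negative vector of the same length as `counts` that sums to one. Only `k` eigenvectors of length `n` are stored, which matches the abstract's statement that memory scales with the support size times the subspace dimension.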

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2510.15132 [cs.LG]
	(or arXiv:2510.15132v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2510.15132

Submission history

From: Alex Shtoff
[v1] Thu, 16 Oct 2025 20:47:40 UTC (3,619 KB)
