Consistent estimation of the missing mass for feature models

Ayed, Fadhel; Battiston, Marco; Camerlenghi, Federico; Favaro, Stefano

arXiv:1902.10530 (math)

[Submitted on 27 Feb 2019]

Title: Consistent estimation of the missing mass for feature models

Authors: Fadhel Ayed, Marco Battiston, Federico Camerlenghi, Stefano Favaro

Abstract: Feature models are popular in machine learning and have recently been used to solve many unsupervised learning problems. In these models every observation is endowed with a finite set of features, usually selected from an infinite collection $(F_{j})_{j\geq 1}$. Every observation can display feature $F_{j}$ with an unknown probability $p_{j}$. A statistical problem inherent to these models is how to estimate, given an initial sample, the conditional expected number of hitherto unseen features that will be displayed in a future observation. This problem is usually referred to as the missing mass problem. In this work we prove that, using a suitable multiplicative loss function and without imposing any assumptions on the parameters $p_{j}$, there does not exist any universally consistent estimator for the missing mass. In the second part of the paper, we focus on a special class of heavy-tailed probabilities $(p_{j})_{j\geq 1}$, which are common in many real applications, and we show that, within this restricted class of probabilities, the nonparametric estimator of the missing mass suggested by Ayed et al. (2017) is strongly consistent. As a byproduct, we derive concentration inequalities for the missing mass and for the number of features observed with a specified frequency in a sample of size $n$.
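The setting described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a truncated feature collection of `J` features, independent Bernoulli displays, and an illustrative heavy-tailed choice $p_{j} = j^{-2}$. The function names are hypothetical, and the estimator shown (number of features seen exactly once, divided by $n$) is a Good-Turing-style form assumed here for illustration, since the abstract does not state the estimator's formula.

```python
import random


def simulate_counts(p, n, rng):
    """Return counts[j] = number of the n observations that display feature j.

    Each display is an independent Bernoulli(p[j]) draw (modeling assumption).
    """
    return [sum(rng.random() < pj for _ in range(n)) for pj in p]


def missing_mass(p, counts):
    """Sum of p[j] over the features not displayed by any observation so far.

    This equals the conditional expected number of hitherto unseen features
    displayed by one future observation.
    """
    return sum(pj for pj, c in zip(p, counts) if c == 0)


def singleton_estimate(counts, n):
    """Good-Turing-style estimate: (# features seen exactly once) / n.

    Assumed form for illustration; not quoted from the abstract.
    """
    return sum(1 for c in counts if c == 1) / n


rng = random.Random(0)
J = 2000                                      # truncation level (assumption)
p = [j ** -2.0 for j in range(1, J + 1)]      # illustrative heavy-tailed p_j
n = 200                                       # sample size

counts = simulate_counts(p, n, rng)
true_mass = missing_mass(p, counts)
estimate = singleton_estimate(counts, n)

# A multiplicative loss compares the two through their ratio, not their difference.
print("missing mass:", true_mass)
print("estimate:    ", estimate)
print("ratio:       ", estimate / true_mass if true_mass > 0 else float("nan"))
```

Increasing `n` (and `J` with it) in this sketch gives a rough empirical feel for how the ratio behaves under a heavy-tailed choice of $p_{j}$; it does not substitute for the consistency result stated in the abstract.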

Subjects:	Statistics Theory (math.ST)
Cite as:	arXiv:1902.10530 [math.ST]
	(or arXiv:1902.10530v1 [math.ST] for this version)
	https://doi.org/10.48550/arXiv.1902.10530

Submission history

From: Fadhel Ayed
[v1] Wed, 27 Feb 2019 13:51:35 UTC (16 KB)
