P-values for classification

Duembgen, Lutz; Igl, Bernd-Wolfgang; Munk, Axel

doi:10.1214/08-EJS245

arXiv:0801.2934 (math)

[Submitted on 18 Jan 2008 (v1), last revised 26 Jun 2008 (this version, v3)]

Title: P-values for classification

Authors: Lutz Duembgen, Bernd-Wolfgang Igl, Axel Munk

Abstract: Let $(X,Y)$ be a random variable consisting of an observed feature vector $X\in \mathcal{X}$ and an unobserved class label $Y\in \{1,2,\ldots,L\}$ with unknown joint distribution. In addition, let $\mathcal{D}$ be a training data set consisting of $n$ completely observed independent copies of $(X,Y)$. Usual classification procedures provide point predictors (classifiers) $\widehat{Y}(X,\mathcal{D})$ of $Y$ or estimate the conditional distribution of $Y$ given $X$. In order to quantify the certainty of classifying $X$ we propose to construct for each $\theta =1,2,\ldots,L$ a p-value $\pi_{\theta}(X,\mathcal{D})$ for the null hypothesis that $Y=\theta$, treating $Y$ temporarily as a fixed parameter. In other words, the point predictor $\widehat{Y}(X,\mathcal{D})$ is replaced with a prediction region for $Y$ with a certain confidence. We argue that (i) this approach is advantageous over traditional approaches and (ii) any reasonable classifier can be modified to yield nonparametric p-values. We discuss issues such as optimality, single use and multiple use validity, as well as computational and graphical aspects.
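To make the idea of a class-wise p-value $\pi_{\theta}(X,\mathcal{D})$ and the resulting prediction region concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not necessarily the authors' construction: the function names `class_p_value` and `prediction_region`, the squared-distance-to-mean atypicality score, and the toy data are all hypothetical choices made for this example. The p-value is rank-based: the new point is pooled with the training points of class $\theta$, and the p-value is the fraction of pooled points that look at least as atypical as the new one.

```python
from statistics import fmean


def class_p_value(x, train, theta):
    """Rank-based p-value for the null hypothesis Y = theta (illustrative sketch)."""
    # Pool the class-theta training features with the new point so that all
    # pooled points are treated symmetrically under the null hypothesis.
    pooled = [xi for xi, yi in train if yi == theta] + [x]
    # Assumed atypicality score: squared Euclidean distance to the pooled mean.
    centre = [fmean(col) for col in zip(*pooled)]

    def score(p):
        return sum((a - c) ** 2 for a, c in zip(p, centre))

    s_new = score(x)
    # Fraction of pooled points at least as atypical as the new point.
    return sum(score(p) >= s_new for p in pooled) / len(pooled)


def prediction_region(x, train, labels, alpha):
    """All labels theta whose p-value exceeds the level alpha."""
    return [theta for theta in labels if class_p_value(x, train, theta) > alpha]


# Toy training data: (feature vector, class label) pairs, made up for illustration.
train = [((0.0,), 1), ((0.2,), 1), ((-0.1,), 1), ((0.1,), 1),
         ((5.0,), 2), ((5.2,), 2), ((4.9,), 2), ((5.1,), 2)]
x_new = (0.05,)
region = prediction_region(x_new, train, labels=[1, 2], alpha=0.25)
```

With only four training points per class, the smallest attainable p-value in this sketch is $1/5$, so a class can be excluded from the region only when the level `alpha` is at least $0.2$. A region containing several labels signals an ambiguous observation, and an empty region signals one that looks atypical for every class.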

Comments:	Published at http://dx.doi.org/10.1214/08-EJS245 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)
Subjects:	Statistics Theory (math.ST); Machine Learning (stat.ML)
MSC classes:	62C05, 62F25, 62G09, 62G15, 62H30 (Primary)
Cite as:	arXiv:0801.2934 [math.ST]
	(or arXiv:0801.2934v3 [math.ST] for this version)
	https://doi.org/10.48550/arXiv.0801.2934
Journal reference:	IMS-EJS-EJS_2008_245
Related DOI:	https://doi.org/10.1214/08-EJS245

Submission history

From: Lutz Dümbgen
[v1] Fri, 18 Jan 2008 16:44:02 UTC (362 KB)
[v2] Tue, 3 Jun 2008 09:34:53 UTC (363 KB)
[v3] Thu, 26 Jun 2008 08:14:11 UTC (440 KB)
