A comparison of the discrete Kolmogorov-Smirnov statistic and the Euclidean distance

Carruth, Jacob; Tygert, Mark; Ward, Rachel

arXiv:1206.6367 (stat)

[Submitted on 27 Jun 2012]

Abstract: Goodness-of-fit tests gauge whether a given set of observations is consistent (up to expected random fluctuations) with arising as independent and identically distributed (i.i.d.) draws from a user-specified probability distribution known as the "model." The standard gauges involve the discrepancy between the model and the empirical distribution of the observed draws. Some measures of discrepancy are cumulative; others are not. The most popular cumulative measure is the Kolmogorov-Smirnov statistic; when all probability distributions under consideration are discrete, a natural noncumulative measure is the Euclidean distance between the model and the empirical distributions. In the present paper, both mathematical analysis and its illustration via various data sets indicate that the Kolmogorov-Smirnov statistic tends to be more powerful than the Euclidean distance when there is a natural ordering for the values that the draws can take -- that is, when the data is ordinal -- whereas the Euclidean distance is more reliable and more easily understood than the Kolmogorov-Smirnov statistic when there is no natural ordering (or partial order) -- that is, when the data is nominal.
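The two discrepancy measures named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`ks_statistic`, `euclidean_distance`, `monte_carlo_pvalue`) are hypothetical, the model is assumed to be a list of bin probabilities, and the observations are assumed to be summarized as per-bin counts. The Kolmogorov-Smirnov statistic is taken as the largest absolute gap between the cumulative model and cumulative empirical distributions (so it depends on the order of the bins), while the Euclidean distance compares the two probability vectors bin by bin (so it does not). Any sqrt(n) normalization the paper may use is omitted; it would not change a Monte Carlo p-value, because every simulated data set has the same size n as the observed one. Calibrating significance by simulation is an assumption of this sketch.

```python
import math
import random
from itertools import accumulate


def empirical(counts):
    """Empirical distribution: observed counts divided by the number of draws."""
    n = sum(counts)
    return [c / n for c in counts]


def ks_statistic(model, counts):
    """Discrete Kolmogorov-Smirnov statistic (hypothetical helper): the largest
    absolute difference between the cumulative empirical and cumulative model
    distributions, accumulating over the bins in the order given."""
    diffs = [e - p for e, p in zip(empirical(counts), model)]
    return max(abs(s) for s in accumulate(diffs))


def euclidean_distance(model, counts):
    """Euclidean distance (hypothetical helper) between the empirical and model
    probability vectors; permuting the bins leaves it unchanged."""
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(empirical(counts), model)))


def monte_carlo_pvalue(statistic, model, counts, trials=2000, seed=0):
    """Estimate P(statistic >= observed) under the model (hypothetical helper)
    by simulating `trials` data sets of n i.i.d. draws from the model."""
    rng = random.Random(seed)
    n, m = sum(counts), len(model)
    observed = statistic(model, counts)
    exceed = 0
    for _ in range(trials):
        sim_counts = [0] * m
        for d in rng.choices(range(m), weights=model, k=n):
            sim_counts[d] += 1
        if statistic(model, sim_counts) >= observed:
            exceed += 1
    return exceed / trials


if __name__ == "__main__":
    model = [0.2] * 5                  # uniform model over 5 ordered bins
    counts = [30, 25, 20, 15, 10]      # a steady trend across the bins
    print("KS:", ks_statistic(model, counts),
          "p ~", monte_carlo_pvalue(ks_statistic, model, counts))
    print("Euclidean:", euclidean_distance(model, counts),
          "p ~", monte_carlo_pvalue(euclidean_distance, model, counts))
```

For the counts in the usage example, the KS statistic is about 0.15 and the Euclidean distance is about 0.158. Reordering the counts to `[30, 10, 25, 15, 20]` (the model is uniform, so it needs no reordering) leaves the Euclidean distance unchanged but lowers the KS statistic to about 0.1. This is the order dependence that makes the KS statistic natural for ordinal data and less meaningful for nominal data.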

Comments:	15 pages, 6 figures, 3 tables
Subjects:	Methodology (stat.ME); Statistics Theory (math.ST)
Cite as:	arXiv:1206.6367 [stat.ME]
	(or arXiv:1206.6367v1 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.1206.6367

Submission history

From: Mark Tygert
[v1] Wed, 27 Jun 2012 19:15:25 UTC (25 KB)