Evaluating Independence and Conditional Independence Measures

Ma, Jian


arXiv:2205.07253 (stat)

[Submitted on 15 May 2022]

Title: Evaluating Independence and Conditional Independence Measures

Authors: Jian Ma

Abstract: Independence and conditional independence (CI) are two fundamental concepts in probability and statistics, which can be applied to solve many central problems of statistical inference. Many independence and CI measures exist, defined from diverse principles and concepts. In this paper, 16 independence measures and 16 CI measures were reviewed and then evaluated with simulated and real data. For the independence measures, eight simulated datasets were generated from the normal distribution and from normal and Archimedean copula functions to compare the measures in bivariate or multivariate, linear or nonlinear settings. Two UCI datasets, the heart disease data and the wine quality data, were used to test the power of the independence measures in real conditions. For the CI measures, two simulated datasets, one from the normal distribution and one from the Gumbel copula, and one real dataset (the Beijing air data) were used to test the CI measures in prespecified linear or nonlinear settings and in a real scenario. From the experimental results, we found that most of the measures work well on the simulated data, presenting the right monotonicity of the simulations. However, the independence measures and the CI measures each behaved differently on the much more complex real data, and only a few can be considered as working well with reference to domain knowledge. We also found that the measures tend to separate into groups based on the similarity of their behaviors in each setting and in general. According to the experiments, we recommend CE as a good choice for both independence and CI measures. This is also due to its rigorous distribution-free definition and consistent nonparametric estimator.

Comments:	53 pages, 26 figures, 3 tables
Subjects:	Methodology (stat.ME); Machine Learning (cs.LG)
Cite as:	arXiv:2205.07253 [stat.ME]
	(or arXiv:2205.07253v1 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2205.07253

Submission history

From: Jian Ma
[v1] Sun, 15 May 2022 10:38:41 UTC (399 KB)
