Assessing high-order effects in feature importance via predictability decomposition

Ontivero-Ortega, Marlis; Faes, Luca; Cortes, Jesus M; Marinazzo, Daniele; Stramaglia, Sebastiano

doi:10.1103/PhysRevE.111.L033301

Physics > Data Analysis, Statistics and Probability

arXiv:2412.09964 (physics)

[Submitted on 13 Dec 2024 (v1), last revised 12 Mar 2025 (this version, v2)]

Title: Assessing high-order effects in feature importance via predictability decomposition

Authors: Marlis Ontivero-Ortega, Luca Faes, Jesus M Cortes, Daniele Marinazzo, Sebastiano Stramaglia

Abstract: Leveraging the large body of recent work on redundancy and synergy in multivariate interactions among random variables, we propose a novel approach to quantify cooperative effects in feature importance, one of the most widely used techniques in explainable artificial intelligence. In particular, we propose an adaptive version of a well-known feature importance metric, Leave One Covariate Out (LOCO), to disentangle high-order effects involving a given input feature in regression problems. LOCO is the reduction in prediction error when the feature under consideration is added to the set of all the other features used for regression. Instead of calculating LOCO using all the features at hand, as in its standard version, our method searches for the multiplet of features that maximizes LOCO and for the one that minimizes it. This provides a decomposition of LOCO as the sum of a two-body component and higher-order components (redundant and synergistic), and also highlights the features that contribute to building these high-order effects alongside the driving feature. We report an application to proton/pion discrimination from detector measurements simulated with GEANT.
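The search described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: the prediction error is the in-sample mean squared residual of an ordinary least-squares fit, the search over conditioning sets is exhaustive (feasible only for a handful of features), and redundancy and synergy are taken as the gaps between the two-body LOCO and its minimum and maximum. The function names `residual_mse`, `loco`, and `loco_decomposition`, and the toy data, are hypothetical and are not taken from the paper.

```python
from itertools import combinations

import numpy as np


def residual_mse(X, y, cols):
    """In-sample mean squared residual of an OLS fit of y on X[:, cols] plus an intercept."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.mean((y - design @ coef) ** 2))


def loco(X, y, driver, conditioning):
    """Error reduction obtained by adding `driver` to the features in `conditioning`."""
    without = residual_mse(X, y, list(conditioning))
    with_driver = residual_mse(X, y, list(conditioning) + [driver])
    return without - with_driver


def loco_decomposition(X, y, driver):
    """Exhaustive search for the conditioning sets that minimize and maximize LOCO."""
    rest = [c for c in range(X.shape[1]) if c != driver]
    values = {
        subset: loco(X, y, driver, subset)
        for k in range(len(rest) + 1)
        for subset in combinations(rest, k)
    }
    pairwise = values[()]  # empty conditioning set: the two-body term
    min_set = min(values, key=values.get)
    max_set = max(values, key=values.get)
    return {
        "pairwise": pairwise,
        "redundancy": pairwise - values[min_set],  # assumed definition
        "synergy": values[max_set] - pairwise,     # assumed definition
        "min_set": min_set,
        "max_set": max_set,
    }


# Toy data (illustrative only): feature 0 is the driver, feature 1 is a
# noisy copy of it (redundant), feature 2 is the noise hidden in feature 0
# (knowing it makes feature 0 far more informative: synergy).
rng = np.random.default_rng(0)
n = 2000
signal = rng.normal(size=n)
noise = rng.normal(size=n)
x0 = signal + noise
x1 = x0 + 0.1 * rng.normal(size=n)
x2 = noise
X = np.column_stack([x0, x1, x2])
y = signal + 0.1 * rng.normal(size=n)

result = loco_decomposition(X, y, driver=0)
print(result)
```

In this toy setting the minimizing set should contain the redundant copy (feature 1) and the maximizing set the noise feature (feature 2). With many features the exhaustive loop grows exponentially, so a greedy or otherwise restricted search would be needed in practice.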

Comments:	11 pages, 3 figures
Subjects:	Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
Cite as:	arXiv:2412.09964 [physics.data-an]
	(or arXiv:2412.09964v2 [physics.data-an] for this version)
	https://doi.org/10.48550/arXiv.2412.09964
Journal reference:	Phys. Rev. E 111, L033301 (2025)
Related DOI:	https://doi.org/10.1103/PhysRevE.111.L033301

Submission history

From: Sebastiano Stramaglia
[v1] Fri, 13 Dec 2024 08:47:16 UTC (768 KB)
[v2] Wed, 12 Mar 2025 18:06:05 UTC (815 KB)
