Feature Selection with Distance Correlation

Das, Ranit; Kasieczka, Gregor; Shih, David

High Energy Physics - Phenomenology

arXiv:2212.00046 (hep-ph)

[Submitted on 30 Nov 2022]

Title: Feature Selection with Distance Correlation

Authors: Ranit Das, Gregor Kasieczka, David Shih

Abstract: Choosing which properties of the data to use as input to multivariate decision algorithms -- a.k.a. feature selection -- is an important step in solving any problem with machine learning. While there is a clear trend towards training sophisticated deep networks on large numbers of relatively unprocessed inputs (so-called automated feature engineering), for many tasks in physics, sets of theoretically well-motivated and well-understood features already exist. Working with such features can bring many benefits, including greater interpretability, reduced training and run time, and enhanced stability and robustness. We develop a new feature selection method based on Distance Correlation (DisCo) and demonstrate its effectiveness on the tasks of boosted top- and $W$-tagging. Using our method to select features from a set of over 7,000 energy flow polynomials, we show that we can match the performance of much deeper architectures using only ten features and two orders of magnitude fewer model parameters.
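
For readers unfamiliar with the statistic named in the abstract, the following is a minimal sketch of the standard sample distance correlation (the biased, V-statistic form), assuming NumPy. The ranking helper `rank_features` is a hypothetical, simplified illustration that scores each feature independently against a target; it is not the selection procedure developed in the paper, which the abstract does not spell out.

```python
import numpy as np


def _centered_distances(x):
    """Pairwise Euclidean distance matrix of x, double-centered."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D array as n samples of one variable
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # Subtract row and column means, then add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between x and y (same number of samples)."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()    # squared sample distance covariance
    dvar_x = (a * a).mean()   # squared sample distance variance of x
    dvar_y = (b * b).mean()   # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0            # convention when either input is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def rank_features(X, y, k):
    """Hypothetical helper: indices of the k columns of X with the largest
    distance correlation to y. An illustration only, not the paper's method."""
    X = np.asarray(X, dtype=float)
    scores = np.array([distance_correlation(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(-scores)  # descending by score
    return [int(j) for j in order[:k]]
```

Unlike the Pearson coefficient, distance correlation is sensitive to nonlinear dependence, which is the usual motivation for using it to score features. The pairwise distance matrices make this sketch quadratic in the number of samples in both time and memory, so it is only practical on modest sample sizes.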

Comments:	14 pages, 8 figures, 3 tables
Subjects:	High Energy Physics - Phenomenology (hep-ph); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex); Data Analysis, Statistics and Probability (physics.data-an)
Cite as:	arXiv:2212.00046 [hep-ph]
	(or arXiv:2212.00046v1 [hep-ph] for this version)
	https://doi.org/10.48550/arXiv.2212.00046

Submission history

From: Ranit Das
[v1] Wed, 30 Nov 2022 19:00:04 UTC (1,774 KB)
