Machine learning method for enforcing variable independence in background estimation with LHC data: ABCDisCoTEC

CMS Collaboration

High Energy Physics - Experiment

arXiv:2506.08826v1 (hep-ex)

[Submitted on 10 June 2025]

Title: Machine learning method for enforcing variable independence in background estimation with LHC data: ABCDisCoTEC

Authors: CMS Collaboration

Abstract: A novel solution is presented for the problem of estimating the backgrounds of a signal search using observed data while simultaneously maximizing the sensitivity of the search to the signal. The "ABCD method" provides a reliable framework for background estimation by partitioning events into one signal-enhanced region (A) and three background-enhanced control regions (B, C, and D) via two statistically independent variables. In practice, even slight correlations between the two variables can significantly undermine the method's performance. Thus, choosing appropriate variables by hand can present a formidable challenge, especially when background and signal differ only subtly. To address this issue, the ABCD with distance correlation (ABCDisCo) method was developed to construct two artificial variables from the output scores of a neural network trained to maximize signal-background discrimination while minimizing correlations using the distance correlation measure. However, relying solely on minimizing the distance correlation can yield undesirable characteristics in the resulting distributions, which may compromise the validity of the background prediction obtained using this method. The ABCDisCo training enhanced with closure (ABCDisCoTEC) method is introduced to solve this issue by directly minimizing the nonclosure, expressed as a dedicated differentiable loss term. This extended method is applied to a data set of proton-proton collisions at a center-of-mass energy of 13 TeV recorded by the CMS detector at the CERN LHC. Additionally, given the complexity of the minimization problem with constraints on multiple loss terms, the modified differential method of multipliers is applied and shown to greatly improve the stability and robustness of the ABCDisCoTEC method, compared to grid search hyperparameter optimization procedures.
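The quantities named in the abstract can be illustrated with a minimal sketch. It is not the CMS implementation: the region convention (A passes both thresholds, D fails both), the nonclosure definition |1 - N_A(predicted) / N_A(observed)|, and all function names are assumptions made here for illustration. The distance correlation is the standard biased sample estimator, and nothing below is differentiable or tied to a neural network, unlike the loss terms the abstract describes.

```python
import math
import random


def abcd_counts(x, y, cut_x, cut_y):
    """Count events in four regions split by one threshold per variable.

    Assumed convention: A passes both cuts, B passes only the x cut,
    C passes only the y cut, D fails both.
    """
    counts = {"A": 0, "B": 0, "C": 0, "D": 0}
    for xi, yi in zip(x, y):
        if xi > cut_x:
            counts["A" if yi > cut_y else "B"] += 1
        else:
            counts["C" if yi > cut_y else "D"] += 1
    return counts


def abcd_prediction(counts):
    """Predicted yield in A, valid when x and y are independent.

    Assumes region D is not empty.
    """
    return counts["B"] * counts["C"] / counts["D"]


def nonclosure(counts):
    """Relative mismatch between predicted and observed yield in A.

    Assumes region A is not empty.
    """
    return abs(1.0 - abcd_prediction(counts) / counts["A"])


def _centered_distances(v):
    """Pairwise |v_i - v_j| matrix, double-centered."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # equals column means (d is symmetric)
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Biased sample distance correlation of two equal-length samples."""
    a, b = _centered_distances(x), _centered_distances(y)
    n = len(x)

    def mean_prod(p, q):
        return sum(p[i][j] * q[i][j]
                   for i in range(n) for j in range(n)) / (n * n)

    dvar2 = mean_prod(a, a) * mean_prod(b, b)
    if dvar2 <= 0.0:
        return 0.0  # at least one sample is constant
    return math.sqrt(max(mean_prod(a, b), 0.0) / math.sqrt(dvar2))


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy background: two independent scores, so the ABCD estimate should close.
    x = [rng.random() for _ in range(400)]
    y = [rng.random() for _ in range(400)]
    counts = abcd_counts(x, y, 0.7, 0.7)
    print("counts:", counts)
    print("predicted N_A:", abcd_prediction(counts))
    print("nonclosure:", nonclosure(counts))
    print("distance correlation:", distance_correlation(x, y))
```

For independent variables the prediction N_B * N_C / N_D approaches the observed N_A and the distance correlation approaches zero as the sample grows. Introducing a dependence between `x` and `y` in the toy data raises both the distance correlation and the nonclosure.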

Comments:	Submitted to Machine Learning: Science and Technology. All figures and tables can be found at http://cms-results.web.cern.ch/cms-results/public-results/publications/MLG-23-003 (CMS Public Pages)
Subjects:	High Energy Physics - Experiment (hep-ex)
Cite as:	arXiv:2506.08826 [hep-ex]
	(or arXiv:2506.08826v1 [hep-ex] for this version)
	https://doi.org/10.48550/arXiv.2506.08826
Report number:	CMS-MLG-23-003, CERN-EP-2025-092

Submission history

From: The CMS Collaboration
[v1] Tue, 10 Jun 2025 14:15:46 UTC (2,940 KB)
