Post-reduction inference for confidence sets of models

Battey, Heather; Rasines, Daniel Garcia; Tang, Yanbo


arXiv:2507.10373 (math)

[Submitted on 14 Jul 2025]


Title: Post-reduction inference for confidence sets of models

Authors: Heather Battey, Daniel Garcia Rasines, Yanbo Tang


Abstract: Sparsity in a regression context makes the model itself an object of interest, pointing to a confidence set of models as the appropriate presentation of evidence. A difficulty in areas such as genomics, where the number of candidate variables is vast, arises from the need for preliminary reduction prior to the assessment of models. The present paper considers a resolution using inferential separations fundamental to the Fisherian approach to conditional inference, namely, the sufficiency/co-sufficiency separation, and the ancillary/co-ancillary separation. The advantage of these separations is that no direction for departure from any hypothesised model is needed, avoiding issues that would otherwise arise from using the same data for reduction and for model assessment. In idealised cases with no nuisance parameters, the separations extract all the information in the data, solely for the purpose for which it is useful, without loss or redundancy. The extent to which estimation of nuisance parameters affects the idealised information extraction is illustrated in detail for the normal-theory linear regression model, extending immediately to a log-normal accelerated-life model for time-to-event outcomes. This idealised analysis provides insight into when sample-splitting is likely to perform as well as, or better than, the co-sufficient or ancillary tests, and when it may be unreliable. The considerations involved in extending the detailed implementation to canonical exponential-family and more general regression models are briefly discussed. As part of the analysis for the Gaussian model, we introduce a modified version of the refitted cross-validation estimator of Fan et al. (2012), whose distribution theory is exact in an appropriate conditional sense.

Subjects:	Statistics Theory (math.ST); Methodology (stat.ME)
Cite as:	arXiv:2507.10373 [math.ST]
	(or arXiv:2507.10373v1 [math.ST] for this version)
	https://doi.org/10.48550/arXiv.2507.10373

Submission history

From: Heather Battey Dr
[v1] Mon, 14 Jul 2025 15:14:27 UTC (36 KB)
