COMBO and COMMA: R packages for regression modeling and inference in the presence of misclassified binary mediator or outcome variables

Webb, Kimberly A. Hochstedler; Wells, Martin T.

Statistics > Computation

arXiv:2501.08320 (stat)

[Submitted on 14 Jan 2025]

Title: COMBO and COMMA: R packages for regression modeling and inference in the presence of misclassified binary mediator or outcome variables

Authors: Kimberly A. Hochstedler Webb, Martin T. Wells

Abstract: Misclassified binary outcome or mediator variables can cause unpredictable bias in resulting parameter estimates. As more datasets that were not originally collected for research purposes are being used for studies in the social and health sciences, the need for methods that address data quality concerns is growing. In this paper, we describe two R packages, COMBO and COMMA, that implement bias-correction methods for misclassified binary outcome and mediator variables, respectively. These likelihood-based approaches do not require gold standard measures and allow for estimation of sensitivity and specificity rates for the misclassified variable(s). In addition, these R packages automatically apply crucial label switching corrections, allowing researchers to circumvent the inherent permutation invariance of the misclassification model likelihood. We demonstrate COMBO for single-outcome cases using a study of bar exam passage. We develop and evaluate a risk prediction model based on noisy indicators in a pretrial risk assessment study to demonstrate COMBO for multi-outcome cases. In addition, we use COMMA to evaluate the mediating effect of potentially misdiagnosed gestational hypertension on the maternal ethnicity-birthweight relationship.
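
The permutation invariance mentioned in the abstract can be illustrated with a small, language-neutral sketch. It is not the COMBO or COMMA API. It assumes a simple setup that this page does not spell out: a latent binary outcome Y following a logistic model in one covariate x, and an observed label Y* that is misclassified with constant sensitivity and specificity. All function names, parameter values, and the toy data are hypothetical. Under these assumptions, negating the regression coefficients while swapping the error rates (sensitivity becomes 1 - specificity, and specificity becomes 1 - sensitivity) leaves the observed-data likelihood unchanged, which is why some labeling convention is needed.

```python
import math


def expit(t):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def prob_observed_one(x, beta0, beta1, sens, spec):
    """P(Y* = 1 | x): the latent outcome Y is logistic in x (assumed model),
    and the observed label Y* errs with the given sensitivity/specificity."""
    p_true = expit(beta0 + beta1 * x)
    return sens * p_true + (1.0 - spec) * (1.0 - p_true)


def log_likelihood(data, beta0, beta1, sens, spec):
    """Observed-data log-likelihood over (x, y_star) pairs."""
    total = 0.0
    for x, y_star in data:
        p = prob_observed_one(x, beta0, beta1, sens, spec)
        total += math.log(p) if y_star == 1 else math.log(1.0 - p)
    return total


def switch_labels(beta0, beta1, sens, spec):
    """Relabel the latent classes: negate coefficients, swap error rates."""
    return -beta0, -beta1, 1.0 - spec, 1.0 - sens


# Hypothetical toy data: (covariate, observed noisy label).
data = [(-1.5, 0), (-0.5, 0), (0.0, 1), (0.4, 0), (1.2, 1), (2.0, 1)]

# Hypothetical parameter values: beta0, beta1, sensitivity, specificity.
theta = (0.3, 1.1, 0.9, 0.8)
theta_switched = switch_labels(*theta)

ll = log_likelihood(data, *theta)
ll_switched = log_likelihood(data, *theta_switched)

# The two parameter sets are indistinguishable from the data alone.
print(abs(ll - ll_switched) < 1e-9)
```

One way to pick between the two equivalent parameter sets is to keep the labeling for which sensitivity plus specificity exceeds 1, that is, to assume the observed label is better than chance. Whether the packages use exactly this rule is an assumption here; the paper describes the correction they actually apply.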

Comments:	99 pages, 7 figures
Subjects:	Computation (stat.CO); Other Statistics (stat.OT)
Cite as:	arXiv:2501.08320 [stat.CO]
	(or arXiv:2501.08320v1 [stat.CO] for this version)
	https://doi.org/10.48550/arXiv.2501.08320

Submission history

From: Kimberly Hochstedler Webb
[v1] Tue, 14 Jan 2025 18:53:22 UTC (339 KB)
