Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics

Chion, Marie; Carapito, Christine; Bertrand, Frédéric

doi:10.1371/journal.pcbi.1010420

Statistics > Methodology

arXiv:2108.07086 (stat)

[Submitted on 16 Aug 2021]

Title: Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics

Authors: Marie Chion, Christine Carapito, Frédéric Bertrand

Abstract: Imputing missing values is common practice in label-free quantitative proteomics. Imputation aims at replacing a missing value with a user-defined one. However, the imputation itself may not be optimally accounted for downstream of the imputation process, as imputed datasets are often treated as if they had always been complete. Hence, the uncertainty due to the imputation is not adequately taken into account. We provide a rigorous multiple imputation strategy, leading to a less biased estimation of the parameters' variability thanks to Rubin's rules. The imputation-based variance estimator of the peptide intensities is then moderated using Bayesian hierarchical models. This estimator is finally included in moderated t-test statistics to provide differential analysis results. This workflow can be used at both the peptide and protein level in quantification datasets. For protein-level results based on peptide-level quantification data, an aggregation step is also included. Our methodology, named mi4p, was compared to the state-of-the-art limma workflow implemented in the DAPAR R package, on both simulated and real datasets. We observed a trade-off between sensitivity and specificity, while mi4p outperforms DAPAR overall in terms of F-score.
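As an illustration of the pooling step mentioned in the abstract, the following is a minimal sketch of Rubin's rules for a single scalar parameter, written in Python for readability. It is not the mi4p implementation (which is in R); the function name `pool_rubin` and its arguments are hypothetical and chosen only for this example. Given one estimate and one within-imputation variance per imputed dataset, it returns the pooled estimate and the total variance, which adds the between-imputation variability that a single-imputation analysis would ignore.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Pool per-imputation results for one scalar parameter (Rubin's rules).

    estimates        -- one point estimate per imputed dataset
    within_variances -- the variance of each of those estimates
    Both names are illustrative assumptions, not part of any package API.
    """
    d = len(estimates)
    if d < 2 or d != len(within_variances):
        raise ValueError("need at least two imputations and matching lengths")

    pooled_estimate = mean(estimates)        # average of the D estimates
    within = mean(within_variances)          # average within-imputation variance
    between = variance(estimates)            # sample variance, denominator D - 1
    total = within + (1 + 1 / d) * between   # total variance of the pooled estimate
    return pooled_estimate, within, between, total


# Example with three imputed datasets (made-up numbers for illustration only).
pooled, within, between, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, within, between, total)
```

The total variance exceeds the average within-imputation variance whenever the estimates differ across imputed datasets, which is the extra uncertainty the abstract refers to.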

Comments:	The methodology described here was implemented in the R environment and can be found on GitHub: https://github.com/mariechion/mi4p. The R scripts that led to the results presented here can also be found in this repository. The real datasets are available on ProteomeXchange under the dataset identifiers PXD003841 and PXD027800.
Subjects:	Methodology (stat.ME); Quantitative Methods (q-bio.QM); Applications (stat.AP)
Cite as:	arXiv:2108.07086 [stat.ME]
	(or arXiv:2108.07086v1 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2108.07086
Journal reference:	Chion M, Carapito C, Bertrand F (2022) Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics. PLoS Comput Biol 18(8): e1010420
Related DOI:	https://doi.org/10.1371/journal.pcbi.1010420

Submission history

From: Marie Chion
[v1] Mon, 16 Aug 2021 13:40:35 UTC (353 KB)
