Polygenic Modeling with Bayesian Sparse Linear Mixed Models

Zhou, Xiang; Carbonetto, Peter; Stephens, Matthew

arXiv:1209.1341 (q-bio)

[Submitted on 6 Sep 2012 (v1), last revised 14 Nov 2012 (this version, v2)]

Abstract: Both linear mixed models (LMMs) and sparse regression models are widely used in genetics applications, including, recently, polygenic modeling in genome-wide association studies. These two approaches make very different assumptions, and so are expected to perform well in different situations. However, in practice, for a given data set one typically does not know which assumptions will be more accurate. Motivated by this, we consider a hybrid of the two, which we refer to as a "Bayesian sparse linear mixed model" (BSLMM), that includes both these models as special cases. We address several key computational and statistical issues that arise when applying BSLMM, including appropriate prior specification for the hyper-parameters, and a novel Markov chain Monte Carlo algorithm for posterior inference. We apply BSLMM and compare it with other methods for two polygenic modeling applications: estimating the proportion of variance in phenotypes explained (PVE) by available genotypes, and phenotype (or breeding value) prediction. For PVE estimation, we demonstrate that BSLMM combines the advantages of both standard LMMs and sparse regression modeling. For phenotype prediction, it considerably outperforms either of the other two methods, as well as several other large-scale regression methods previously suggested for this problem. Software implementing our method is freely available from http://stephenslab.uchicago.edu/software.html
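
The hybrid idea in the abstract can be illustrated with a small simulation. The abstract does not state the model equations, so the following is only a sketch under stated assumptions: every marker carries a small normally distributed effect (the LMM-like part), a sparse subset carries an additional larger effect (the sparse-regression-like part), and PVE is taken as the genetic share of total variance. The function name `simulate_hybrid_phenotype` and all parameter names and default values are hypothetical, and the code performs no BSLMM inference.

```python
import numpy as np

def simulate_hybrid_phenotype(n=500, p=2000, pi=0.005,
                              sigma_a=0.5, sigma_b=0.02, seed=0):
    """Simulate a phenotype with a few large effects on top of many small ones.

    All names and defaults are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical genotypes: independent markers coded 0/1/2, column-centered.
    freqs = rng.uniform(0.05, 0.5, size=p)
    X = rng.binomial(2, freqs, size=(n, p)).astype(float)
    X -= X.mean(axis=0)

    # Polygenic component: a small effect on every marker (LMM-like assumption).
    small = rng.normal(0.0, sigma_b, size=p)

    # Sparse component: each marker gets an extra, larger effect with
    # probability pi (sparse-regression-like assumption).
    is_large = rng.random(p) < pi
    large = np.where(is_large, rng.normal(0.0, sigma_a, size=p), 0.0)

    beta = small + large
    genetic = X @ beta
    noise = rng.normal(0.0, 1.0, size=n)
    y = genetic + noise

    # Realized PVE in this sample: genetic variance over genetic plus noise variance.
    pve = genetic.var() / (genetic.var() + noise.var())
    return X, y, beta, is_large, pve

if __name__ == "__main__":
    X, y, beta, is_large, pve = simulate_hybrid_phenotype()
    print(f"markers with large effects: {is_large.sum()}, realized PVE: {pve:.2f}")
```

In this sketch, setting `pi=0` leaves only the small effects on every marker, which mimics the LMM assumption, while setting `sigma_b=0` leaves only the sparse effects, which mimics the sparse-regression assumption. This mirrors the abstract's statement that the hybrid contains both models as special cases.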

Subjects:	Quantitative Methods (q-bio.QM); Genomics (q-bio.GN); Applications (stat.AP); Methodology (stat.ME)
Cite as:	arXiv:1209.1341 [q-bio.QM]
	(or arXiv:1209.1341v2 [q-bio.QM] for this version)
	https://doi.org/10.48550/arXiv.1209.1341

Submission history

From: Xiang Zhou
[v1] Thu, 6 Sep 2012 16:48:45 UTC (1,570 KB)
[v2] Wed, 14 Nov 2012 22:30:27 UTC (2,476 KB)
