Inferring network structure in non-normal and mixed discrete-continuous genomic data

Bhadra, Anindya; Rao, Arvind; Baladandayuthapani, Veerabhadran

arXiv:1604.00376v1 (stat)

[Submitted on 1 Apr 2016]

Title: Inferring network structure in non-normal and mixed discrete-continuous genomic data

Authors: Anindya Bhadra, Arvind Rao, Veerabhadran Baladandayuthapani

Abstract: Inferring dependence structure through undirected graphs is crucial for uncovering the major modes of multivariate interaction among high-dimensional genomic markers that are potentially associated with cancer. Traditionally, conditional independence has been studied using sparse Gaussian graphical models for continuous data and sparse Ising models for discrete data. However, there are two clear situations when these approaches are inadequate. The first occurs when the data are continuous but display non-normal marginal behavior such as heavy tails or skewness, rendering an assumption of normality inappropriate. The second occurs when a part of the data is ordinal or discrete (e.g., presence or absence of a mutation) and the other part is continuous (e.g., expression levels of genes or proteins). In this case, the existing Bayesian approaches typically employ a latent variable framework for the discrete part that precludes inferring conditional independence among the data that are actually observed. The current article overcomes these two challenges in a unified framework using Gaussian scale mixtures. Our framework is able to handle continuous data that are not normal and data that are of mixed continuous and discrete nature, while still being able to infer a sparse conditional sign independence structure among the observed data. Extensive performance comparison in simulations with alternative techniques and an analysis of a real cancer genomics data set demonstrate the effectiveness of the proposed approach.
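
As a rough illustration of why Gaussian scale mixtures can accommodate heavy-tailed margins, the sketch below draws from one textbook scale mixture: a standard normal multiplied by the square root of a random variance. It is not the model from the paper. The inverse-gamma mixing distribution (which gives a Student-t margin), the degrees of freedom `df=5.0`, the cutoff of 3, and the helper names `sample_scale_mixture` and `tail_fraction` are all assumptions chosen for illustration.

```python
import math
import random


def sample_scale_mixture(n, df, rng):
    """Draw n values of X = sqrt(tau) * Z with Z ~ N(0, 1).

    Assumption for illustration: tau = 1 / W with W ~ Gamma(shape=df/2,
    scale=2/df), so X is marginally Student-t with df degrees of freedom.
    """
    draws = []
    for _ in range(n):
        w = rng.gammavariate(df / 2.0, 2.0 / df)  # second argument is the scale
        tau = 1.0 / w                             # random variance
        draws.append(math.sqrt(tau) * rng.gauss(0.0, 1.0))
    return draws


def tail_fraction(xs, cutoff=3.0):
    """Fraction of draws whose absolute value exceeds the cutoff."""
    return sum(abs(x) > cutoff for x in xs) / len(xs)


rng = random.Random(0)  # fixed seed for reproducibility
n = 100_000
mixture = sample_scale_mixture(n, df=5.0, rng=rng)
normal = [rng.gauss(0.0, 1.0) for _ in range(n)]

mixture_tail = tail_fraction(mixture)
normal_tail = tail_fraction(normal)
print(f"P(|X| > 3), scale mixture: {mixture_tail:.4f}")
print(f"P(|X| > 3), plain normal:  {normal_tail:.4f}")
```

Conditional on the variance `tau`, each draw is still Gaussian, which is the property that lets Gaussian machinery be reused; the heavier tail only appears after mixing over `tau`.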

Subjects:	Methodology (stat.ME)
Cite as:	arXiv:1604.00376 [stat.ME]
	(or arXiv:1604.00376v1 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.1604.00376

Submission history

From: Anindya Bhadra
[v1] Fri, 1 Apr 2016 19:37:31 UTC (678 KB)
