High-Dimensional Multi-Study Multi-Modality Covariate-Augmented Generalized Factor Model

Liu, Wei; Zhong, Qingzhi


arXiv:2507.09889 (stat)

[Submitted on 14 Jul 2025]

Abstract: Latent factor models that integrate data from multiple sources/studies or modalities have garnered considerable attention across various disciplines. However, existing methods predominantly focus on either multi-study integration or multi-modality integration, rendering them insufficient for analyzing diverse modalities measured across multiple studies. To address this limitation and cater to practical needs, we introduce a high-dimensional generalized factor model that seamlessly integrates multi-modality data from multiple studies while also accommodating additional covariates. We conduct a thorough investigation of the identifiability conditions to enhance the model's interpretability. To tackle the complexity of high-dimensional nonlinear integration caused by four large latent random matrices, we use a variational lower bound, built from a variational posterior distribution, to approximate the observed log-likelihood. By profiling the variational parameters, we establish the asymptotic properties of the estimators of the model parameters using M-estimation theory. Furthermore, we devise a computationally efficient variational EM algorithm to carry out the estimation and a criterion to determine the optimal numbers of study-shared and study-specific factors. Extensive simulation studies and a real-world application show that the proposed method significantly outperforms existing methods in terms of estimation accuracy and computational efficiency. The R package for the proposed method is publicly accessible at https://CRAN.R-project.org/package=MMGFM.
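
For readers unfamiliar with the variational step mentioned in the abstract, the generic form of a variational lower bound is sketched below. The abstract does not give the paper's notation, so every symbol here is an assumption chosen for illustration: X for the observed data, Z for all latent variables, \theta for the model parameters, and q for the variational posterior. This is the standard textbook bound, not the paper's specific objective.

```latex
% Generic variational lower bound (illustrative notation, not the paper's).
% X: observed data, Z: latent variables, \theta: model parameters,
% q: variational posterior over Z.
\begin{align*}
\log p_\theta(X)
  &= \log \int p_\theta(X, Z)\, dZ \\
  &\ge \mathbb{E}_{q}\bigl[\log p_\theta(X, Z)\bigr]
     - \mathbb{E}_{q}\bigl[\log q(Z)\bigr]
   =: \mathrm{ELBO}(\theta, q), \\
\log p_\theta(X) - \mathrm{ELBO}(\theta, q)
  &= \mathrm{KL}\bigl(q(Z) \,\|\, p_\theta(Z \mid X)\bigr) \ge 0 .
\end{align*}
% Profiling out the variational parameters leaves a criterion in \theta alone:
\[
\hat{\theta} = \arg\max_{\theta} \, \max_{q \in \mathcal{Q}} \, \mathrm{ELBO}(\theta, q).
\]
```

Under this reading, maximizing over q within a chosen family \mathcal{Q} first and over \theta second yields an objective that depends only on \theta, which is the kind of criterion M-estimation theory can then be applied to.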

Subjects:	Methodology (stat.ME)
Cite as:	arXiv:2507.09889 [stat.ME]
	(or arXiv:2507.09889v1 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2507.09889

Submission history

From: Qingzhi Zhong
[v1] Mon, 14 Jul 2025 03:48:53 UTC (1,536 KB)
