InSight: AI Mobile Screening Tool for Multiple Eye Disease Detection using Multimodal Fusion

Raghu, Ananya; Raghu, Anisha; Tang, Alice S.; Paulus, Yannis M.; Kim, Tyson N.; Oskotsky, Tomiko T.

arXiv:2507.12669 (eess)

[Submitted on 16 July 2025]

Authors: Ananya Raghu, Anisha Raghu, Alice S. Tang, Yannis M. Paulus, Tyson N. Kim, Tomiko T. Oskotsky

Abstract: Background/Objectives: Age-related macular degeneration, glaucoma, diabetic retinopathy (DR), diabetic macular edema, and pathological myopia affect hundreds of millions of people worldwide. Early screening for these diseases is essential, yet access to medical care remains limited in low- and middle-income countries and in other resource-limited settings. We develop InSight, an AI-based app that combines patient metadata with fundus images to accurately diagnose five common eye diseases and make screening more accessible. Methods: InSight features a three-stage pipeline: real-time image quality assessment, a disease diagnosis model, and a DR grading model to assess severity. Our disease diagnosis model incorporates three key innovations: (a) a multimodal fusion technique (MetaFusion) combining clinical metadata and images; (b) a pretraining method leveraging supervised and self-supervised loss functions; and (c) a multitask model that simultaneously predicts five diseases. We use the BRSET (lab-captured images) and mBRSET (smartphone-captured images) datasets, both of which also contain clinical metadata, for model training and evaluation. Results: Trained on a dataset of BRSET and mBRSET images, the image quality checker achieves near-100% accuracy in filtering out low-quality fundus images. The multimodal pretrained disease diagnosis model outperforms models using only images by 6% in balanced accuracy for BRSET and 4% for mBRSET. Conclusions: The InSight pipeline is robust across varied image conditions and achieves high diagnostic accuracy across all five diseases, generalizing to both smartphone-captured and lab-captured images. The multitask model contributes to the lightweight nature of the pipeline, making it five times more computationally efficient than five individual models, one per disease.
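The results above are reported in balanced accuracy, which matters for screening data where healthy eyes usually far outnumber diseased ones. As a minimal sketch, assuming the standard definition (the mean of per-class recall) and using hypothetical toy labels rather than anything from the paper, the metric can be computed as follows. The function name `balanced_accuracy` and the label values are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (not data from the paper): 8 healthy (0), 2 diseased (1).
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]  # one diseased eye is missed

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                      # 0.9
print(balanced_accuracy(y_true, y_pred))   # 0.75
```

In this toy case plain accuracy looks strong at 0.9 because the majority class dominates, while balanced accuracy drops to 0.75 because half of the diseased cases are missed.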

Subjects:	Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2507.12669 [eess.IV]
	(or arXiv:2507.12669v1 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2507.12669

Submission history

From: Tomiko Oskotsky
[v1] Wed, 16 Jul 2025 23:00:10 UTC (1,386 KB)
