AdCare-VLM: Leveraging Large Vision Language Model (LVLM) to Monitor Long-Term Medication Adherence and Care

Jabin, Md Asaduzzaman; Jiang, Hanqi; Li, Yiwei; Kaggwa, Patrick; Douglass, Eugene; Sekandi, Juliet N.; Liu, Tianming

arXiv:2505.00275 (cs)

[Submitted on 1 May 2025]

Abstract: Chronic diseases, including diabetes, hypertension, asthma, HIV-AIDS, epilepsy, and tuberculosis, necessitate rigorous adherence to medication to avert disease progression, manage symptoms, and decrease mortality rates. Adherence is frequently undermined by factors including patient behavior, caregiver support, elevated medical costs, and insufficient healthcare infrastructure. We propose AdCare-VLM, a specialized Video-LLaVA-based multimodal large vision language model (LVLM) aimed at visual question answering (VQA) concerning medication adherence through patient videos. We employ a private dataset comprising 806 custom-annotated tuberculosis (TB) medication monitoring videos, which have been labeled by clinical experts, to fine-tune the model for adherence pattern detection. We present LLM-TB-VQA, a detailed medical adherence VQA dataset that encompasses positive, negative, and ambiguous adherence cases. Our method identifies correlations between visual features, such as the clear visibility of the patient's face, medication, water intake, and the act of ingestion, and their associated medical concepts in captions. This facilitates the integration of aligned visual-linguistic representations and improves multimodal interactions. Experimental results indicate that our method surpasses parameter-efficient fine-tuning (PEFT) enabled VLM models, such as LLaVA-V1.5 and Chat-UniVi, with absolute improvements ranging from 3.1% to 3.54% across pre-trained, regular, and low-rank adaptation (LoRA) configurations. Comprehensive ablation studies and attention map visualizations substantiate our approach, enhancing interpretability.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2505.00275 [cs.CV]
	(or arXiv:2505.00275v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2505.00275

Submission history

From: Md Asaduzzaman Jabin
[v1] Thu, 1 May 2025 03:48:12 UTC (2,144 KB)
