Towards a Personal Health Large Language Model

Cosentino, Justin; Belyaeva, Anastasiya; Liu, Xin; Furlotte, Nicholas A.; Yang, Zhun; Lee, Chace; Schenck, Erik; Patel, Yojan; Cui, Jian; Schneider, Logan Douglas; Bryant, Robby; Gomes, Ryan G.; Jiang, Allen; Lee, Roy; Liu, Yun; Perez, Javier; Rogers, Jameson K.; Speed, Cathy; Tailor, Shyam; Walker, Megan; Yu, Jeffrey; Althoff, Tim; Heneghan, Conor; Hernandez, John; Malhotra, Mark; Stern, Leor; Matias, Yossi; Corrado, Greg S.; Patel, Shwetak; Shetty, Shravya; Zhan, Jiening; Prabhakara, Shruthi; McDuff, Daniel; McLean, Cory Y.

arXiv:2406.06474 (cs)

[Submitted on 10 Jun 2024]

Title: Towards a Personal Health Large Language Model

Abstract: In health, most large language model (LLM) research has focused on clinical tasks. However, mobile and wearable devices, which are rarely integrated into such tasks, provide rich, longitudinal data for personal health monitoring. Here we present Personal Health Large Language Model (PH-LLM), fine-tuned from Gemini for understanding and reasoning over numerical time-series personal health data. We created and curated three datasets that test 1) production of personalized insights and recommendations from sleep patterns, physical activity, and physiological responses, 2) expert domain knowledge, and 3) prediction of self-reported sleep outcomes. For the first task we designed 857 case studies in collaboration with domain experts to assess real-world scenarios in sleep and fitness. Through comprehensive evaluation of domain-specific rubrics, we observed that Gemini Ultra 1.0 and PH-LLM are not statistically different from expert performance in fitness and, while experts remain superior for sleep, fine-tuning PH-LLM provided significant improvements in using relevant domain knowledge and personalizing information for sleep insights. We evaluated PH-LLM domain knowledge using multiple choice sleep medicine and fitness examinations. PH-LLM achieved 79% on sleep and 88% on fitness, exceeding average scores from a sample of human experts. Finally, we trained PH-LLM to predict self-reported sleep quality outcomes from textual and multimodal encoding representations of wearable data, and demonstrate that multimodal encoding is required to match performance of specialized discriminative models. Although further development and evaluation are necessary in the safety-critical personal health domain, these results demonstrate both the broad knowledge and capabilities of Gemini models and the benefit of contextualizing physiological data for personal health applications as done with PH-LLM.

Comments:	72 pages
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2406.06474 [cs.AI]
	(or arXiv:2406.06474v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2406.06474

Submission history

From: Justin Cosentino
[v1] Mon, 10 Jun 2024 17:16:49 UTC (3,752 KB)
