Towards Visualizing Electronic Medical Records via Natural Language Queries

Zhang, Haodi; Ning, Siqi; Zheng, Qiyong; Nie, Jinyin; Zhang, Liangjie; Wang, Weicheng; Song, Yuanfeng

Computer Science > Databases

arXiv:2506.12837v1 (cs)

[Submitted on 15 Jun 2025]

Authors: Haodi Zhang, Siqi Ning, Qiyong Zheng, Jinyin Nie, Liangjie Zhang, Weicheng Wang, Yuanfeng Song

Abstract: Electronic medical records (EMRs) contain essential data for patient care and clinical research. Given the diversity of structured and unstructured data in electronic health records (EHRs), data visualization is an invaluable tool for managing and explaining this complexity. However, the scarcity of relevant medical visualization data and the high cost of the manual annotation required to build such datasets pose significant challenges to advancing medical visualization techniques. To address this issue, we propose an approach that uses large language models (LLMs) to generate visualization data without labor-intensive manual annotation. We introduce a new pipeline for building text-to-visualization benchmarks suitable for EMRs, enabling users to visualize EMR statistics through natural language queries (NLQs). The dataset presented in this paper consists primarily of paired textual medical records, NLQs, and corresponding visualizations, forming MedicalVis, the first large-scale text-to-visualization dataset for electronic medical record information, with 35,374 examples. Additionally, we introduce an LLM-based approach called MedCodeT5 and demonstrate its viability in generating EMR visualizations from NLQs, where it outperforms various strong text-to-visualization baselines. Our work facilitates standardized evaluation of EMR visualization methods while providing researchers with tools to advance this influential application area. In short, this study and dataset have the potential to promote advances in eliciting medical insights through visualization.

Subjects:	Databases (cs.DB)
Cite as:	arXiv:2506.12837 [cs.DB]
	(or arXiv:2506.12837v1 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.2506.12837

Submission history

From: Siqi Ning
[v1] Sun, 15 Jun 2025 13:15:28 UTC (666 KB)
