Computer Science > Databases

arXiv:2506.12837v1 (cs)
[Submitted on 15 Jun 2025 ]

Title: Towards Visualizing Electronic Medical Records via Natural Language Queries

Authors: Haodi Zhang, Siqi Ning, Qiyong Zheng, Jinyin Nie, Liangjie Zhang, Weicheng Wang, Yuanfeng Song
Abstract: Electronic medical records (EMRs) contain essential data for patient care and clinical research. Given the diversity of structured and unstructured data in electronic health records (EHRs), data visualization is a valuable tool for managing and explaining this complexity. However, relevant medical visualization data is scarce, and the manual annotation needed to build such datasets is costly, which poses significant challenges to advancing medical visualization techniques. To address this issue, we propose using large language models (LLMs) to generate visualization data without labor-intensive manual annotation. We introduce a new pipeline for building text-to-visualization benchmarks suited to EMRs, enabling users to visualize EMR statistics through natural language queries (NLQs). The resulting dataset, MedicalVis, consists primarily of textual medical records paired with NLQs and their corresponding visualizations; with 35,374 examples, it is the first large-scale text-to-visualization dataset for EMR information. We also introduce MedCodeT5, an LLM-based approach that demonstrates the viability of generating EMR visualizations from NLQs and outperforms various strong text-to-visualization baselines. Our work facilitates standardized evaluation of EMR visualization methods and gives researchers tools to advance this field of application. In short, this study and dataset have the potential to advance the extraction of medical insights through visualization.
Subjects: Databases (cs.DB)
Cite as: arXiv:2506.12837 [cs.DB]
  (or arXiv:2506.12837v1 [cs.DB] for this version)
  https://doi.org/10.48550/arXiv.2506.12837
arXiv-issued DOI via DataCite

Submission history

From: Siqi Ning
[v1] Sun, 15 Jun 2025 13:15:28 UTC (666 KB)