Large Language Model-Based Agents for Automated Research Reproducibility: An Exploratory Study in Alzheimer's Disease

Dobbins, Nic; Xiong, Christelle; Lan, Kristine; Yetisgen, Meliha

Computer Science > Computation and Language

arXiv:2505.23852 (cs)

[Submitted on 29 May 2025]

Title: Large Language Model-Based Agents for Automated Research Reproducibility: An Exploratory Study in Alzheimer's Disease

Authors: Nic Dobbins, Christelle Xiong, Kristine Lan, Meliha Yetisgen

Abstract: Objective: To demonstrate the capabilities of Large Language Models (LLMs) as autonomous agents to reproduce findings of published research studies using the same or similar dataset. Materials and Methods: We used the "Quick Access" dataset of the National Alzheimer's Coordinating Center (NACC). We identified highly cited published research manuscripts using NACC data and selected five studies that appeared reproducible using this dataset alone. Using GPT-4o, we created a simulated research team of LLM-based autonomous agents tasked with writing and executing code to dynamically reproduce the findings of each study, given only study Abstracts, Methods sections, and data dictionary descriptions of the dataset. Results: We extracted 35 key findings described in the Abstracts across 5 Alzheimer's studies. On average, LLM agents approximately reproduced 53.2% of findings per study. Numeric values and range-based findings often differed between studies and agents. The agents also applied statistical methods or parameters that varied from the originals, though overall trends and significance were sometimes similar. Discussion: In some cases, LLM-based agents replicated research techniques and findings. In others, they failed due to implementation flaws or missing methodological detail. These discrepancies show the current limits of LLMs in fully automating reproducibility assessments. Still, this early investigation highlights the potential of structured agent-based systems to provide scalable evaluation of scientific rigor. Conclusion: This exploratory work illustrates both the promise and limitations of LLMs as autonomous agents for automating reproducibility in biomedical research.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Applications (stat.AP)
Cite as:	arXiv:2505.23852 [cs.CL]
	(or arXiv:2505.23852v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2505.23852

Submission history

From: Minqi Xiong
[v1] Thu, 29 May 2025 01:31:55 UTC (178 KB)
