ArchiveGPT: A human-centered evaluation of using a vision language model for image cataloguing

Abele, Line; Anders, Gerrit; Aydın, Tolgahan; Buder, Jürgen; Fischer, Helen; Kimmel, Dominik; Huff, Markus

arXiv:2507.07551 (cs)

[Submitted on 10 Jul 2025]

Title: ArchiveGPT: A human-centered evaluation of using a vision language model for image cataloguing

Authors: Line Abele, Gerrit Anders, Tolgahan Aydın, Jürgen Buder, Helen Fischer, Dominik Kimmel, Markus Huff

Abstract: The accelerating growth of photographic collections has outpaced manual cataloguing, motivating the use of vision language models (VLMs) to automate metadata generation. This study examines whether AI-generated catalogue descriptions can approximate human-written quality and how generative AI might integrate into cataloguing workflows in archival and museum collections. A VLM (InternVL2) generated catalogue descriptions for photographic prints on labelled cardboard mounts with archaeological content, evaluated by archive and archaeology experts and non-experts in a human-centered, experimental framework. Participants classified descriptions as AI-generated or expert-written, rated quality, and reported willingness to use and trust in AI tools. Classification performance was above chance level, with both groups underestimating their ability to detect AI-generated descriptions. OCR errors and hallucinations limited perceived quality, yet descriptions rated higher in accuracy and usefulness were harder to classify, suggesting that human review is necessary to ensure the accuracy and quality of catalogue descriptions generated by the out-of-the-box model, particularly in specialized domains like archaeological cataloguing. Experts showed lower willingness to adopt AI tools, emphasizing concerns about preservation responsibility over technical performance. These findings advocate for a collaborative approach where AI supports draft generation but remains subordinate to human verification, ensuring alignment with curatorial values (e.g., provenance, transparency). The successful integration of this approach depends not only on technical advancements, such as domain-specific fine-tuning, but even more on establishing trust among professionals, which could both be fostered through a transparent and explainable AI pipeline.

Comments:	56 pages, 7 figures
Subjects:	Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Digital Libraries (cs.DL)
Cite as:	arXiv:2507.07551 [cs.HC]
	(or arXiv:2507.07551v1 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2507.07551

Submission history

From: Markus Huff
[v1] Thu, 10 Jul 2025 08:49:15 UTC (5,631 KB)
