
Digital Libraries


Showing new listings for Tuesday, 30 September 2025

Total of 3 entries

Cross submissions (showing 2 of 2 entries)

[1] arXiv:2509.24283 (cross-list from cs.DL) [cn-pdf, pdf, html, other]
Title: Overview of SCIDOCA 2025 Shared Task on Citation Prediction, Discovery, and Placement
An Dao, Vu Tran, Le-Minh Nguyen, Yuji Matsumoto
Comments: 16 pages, SCIDOCA 2025
Subjects: Digital Libraries (cs.DL); Computation and Language (cs.CL)

We present an overview of the SCIDOCA 2025 Shared Task, which focuses on citation discovery and prediction in scientific documents. The task is divided into three subtasks: (1) Citation Discovery, where systems must identify relevant references for a given paragraph; (2) Masked Citation Prediction, which requires selecting the correct citation for masked citation slots; and (3) Citation Sentence Prediction, where systems must determine the correct reference for each cited sentence. We release a large-scale dataset constructed from the Semantic Scholar Open Research Corpus (S2ORC), containing over 60,000 annotated paragraphs and a curated reference set. The test set consists of 1,000 paragraphs from distinct papers, each annotated with ground-truth citations and distractor candidates. A total of seven teams registered, with three submitting results. We report performance metrics across all subtasks and analyze the effectiveness of submitted systems. This shared task provides a new benchmark for evaluating citation modeling and encourages future research in scientific document understanding. The dataset and task materials are publicly available at https://github.com/daotuanan/scidoca2025-shared-task.
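
As a concrete illustration of Subtask 1 (Citation Discovery), the sketch below ranks a paragraph's candidate references by TF-IDF cosine similarity and scores the ranking with recall@k. It is a minimal baseline sketch under assumed interfaces (plain text in, candidate indices out, recall@5 cutoff); it does not reflect the shared task's official data format, baselines, or evaluation metrics.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_candidates(paragraph: str, candidate_texts: list[str]) -> list[int]:
    """Rank candidate references by TF-IDF cosine similarity to the paragraph."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([paragraph] + candidate_texts)
    scores = cosine_similarity(matrix[0], matrix[1:]).ravel()
    return sorted(range(len(candidate_texts)), key=lambda i: -scores[i])

def recall_at_k(ranked: list[int], gold: set[int], k: int = 5) -> float:
    """Fraction of ground-truth citation indices retrieved in the top k."""
    return len(set(ranked[:k]) & gold) / len(gold) if gold else 0.0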


[2] arXiv:2509.24511 (cross-list from cs.DL) [cn-pdf, pdf, html, other]
Title: The Landscape of problematic papers in the field of non-coding RNA
Ying Lou, Zhengyi Zhou, Guosheng Wang, Zhesi Shen, Menghui Li
Comments: 13 pages, 6 figures, 2 tables
Subjects: Digital Libraries (cs.DL); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)

In recent years, the surge in retractions has been accompanied by numerous papers receiving comments that raise concerns about their reliability. The prevalence of problematic papers undermines the reliability of scientific research and threatens the foundation of evidence-based medicine. In this study, we focus on the field of non-coding RNA (ncRNA) as a case study to explore the typical characteristics of problematic papers from various perspectives, aiming to provide insights for addressing large-scale fraudulent publications. Research on under-investigated ncRNAs is more likely to yield problematic papers. These problematic papers often exhibit significant textual similarity, and many other papers sharing this similarity also display suspicious instances of image duplication. Healthcare institutions, especially those with low publication volumes, are particularly prone to publishing problematic papers. Most problematic papers are concentrated in a limited number of journals, and many journals fail to adequately address papers that have received such comments. Our findings suggest that numerous problematic papers may still remain unidentified. The characteristics revealed here offer valuable insights for formulating strategies to address fraudulent papers at scale.
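
To make the textual-similarity finding above concrete, the sketch below flags pairs of papers whose abstracts exceed a cosine-similarity threshold over TF-IDF vectors. This is only an illustrative approach, not the authors' pipeline; the dict-of-abstracts input and the 0.8 threshold are assumptions for the sketch.

from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def flag_similar_pairs(abstracts: dict[str, str], threshold: float = 0.8):
    """Return (id_a, id_b, similarity) for abstract pairs above the threshold."""
    ids = list(abstracts)
    matrix = TfidfVectorizer(stop_words="english").fit_transform(
        [abstracts[i] for i in ids]
    )
    sims = cosine_similarity(matrix)  # dense pairwise similarity matrix
    return [
        (ids[a], ids[b], float(sims[a, b]))
        for a, b in combinations(range(len(ids)), 2)
        if sims[a, b] >= threshold
    ]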


Replacement submissions (showing 1 of 1 entries)

[3] arXiv:2505.12452 (replaced) [cn-pdf, pdf, html, other]
Title: Automatically Advancing LLM Expertise in Technology Judgment
Siyang Wu, Honglin Bao, Nadav Kunievsky, James A. Evans
Comments: We open-source our patent dataset at https://huggingface.co/datasets/UchiKlab/patent_understanding
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Digital Libraries (cs.DL); Information Retrieval (cs.IR)

Large language models (LLMs) are rapidly becoming core tools for science, engineering, and innovation. Their promise lies not just in remembering facts, but in putting knowledge to work. Despite their impressive ability to answer increasingly difficult questions, it remains unclear whether LLMs truly use their knowledge when confronted with new and challenging tasks. We address this question with a patent classification task that requires deep conceptual understanding: distinguishing objectively different but semantically similar patents. To evaluate this approach, we introduce a challenging new benchmark of 1.3 million post-2015 computer science patent pairs, characterized by dense technical jargon and strategically complex writing. We find that LLMs often fail our benchmark and struggle to distinguish among semantically similar patents. To probe this failure, we introduce a novel framework that decomposes model errors into two sources: missing and unused knowledge. Our approach asks models to generate clarifying questions to improve their understanding, and then compares three settings: raw performance, self-answered questions, and externally supplied answers. This decomposition reveals that LLMs often possess the relevant knowledge internally but fail to deploy it, while a smaller share of errors arises from genuine knowledge gaps. We then ask whether the ability of models to construct a task-specific database of questions and answers differs across models. We find that smaller models generate simpler, broadly transferable questions, while larger models propose more complex but less generalizable ones. This suggests new strategies for combining strengths across models. Our findings highlight a critical limitation of current LLMs and their evaluation: models often know more than they can use. LLM evaluation should shift from recall of static facts to application of dynamic knowledge.
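
The three-setting comparison behind the error decomposition can be pictured with the sketch below. Here ask_llm and lookup_answer are hypothetical stand-ins for an LLM call and an external knowledge source, and the prompts and data fields are illustrative assumptions rather than the paper's exact protocol.

from typing import Callable

def decompose_errors(
    pairs: list[dict],                     # assumed fields: "patent_a", "patent_b", "label"
    ask_llm: Callable[[str], str],         # hypothetical stand-in: prompt in, answer out
    lookup_answer: Callable[[str], str],   # hypothetical external knowledge source
) -> dict[str, float]:
    """Compare accuracy under three settings to separate unused from missing knowledge."""
    correct = {"raw": 0, "self_answered": 0, "externally_answered": 0}
    for pair in pairs:
        base = (
            "Do these two patents describe the same invention? Answer yes or no.\n"
            f"A: {pair['patent_a']}\nB: {pair['patent_b']}"
        )
        # Setting 1: raw performance, no extra context.
        correct["raw"] += ask_llm(base).strip().lower() == pair["label"]
        # Setting 2: the model writes clarifying questions and answers them itself.
        questions = ask_llm("List clarifying questions you would need to judge:\n" + base)
        self_context = ask_llm("Answer these questions:\n" + questions)
        correct["self_answered"] += (
            ask_llm(base + "\nContext:\n" + self_context).strip().lower() == pair["label"]
        )
        # Setting 3: the same questions, answered by an external source.
        external_context = lookup_answer(questions)
        correct["externally_answered"] += (
            ask_llm(base + "\nContext:\n" + external_context).strip().lower() == pair["label"]
        )
    n = len(pairs)
    # Accuracy recovered in setting 2 points to knowledge that was present but unused;
    # gains that appear only in setting 3 point to genuinely missing knowledge.
    return {setting: hits / n for setting, hits in correct.items()}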

