Enhancing Semantic Document Retrieval- Employing Group Steiner Tree Algorithm with Domain Knowledge Enrichment

Kulkarni, Apurva; Ramanathan, Chandrashekar; Venugopal, Vinu E

Computer Science > Information Retrieval

arXiv:2508.20543 (cs)

[Submitted on 28 Aug 2025]

Title: Enhancing Semantic Document Retrieval- Employing Group Steiner Tree Algorithm with Domain Knowledge Enrichment

Authors: Apurva Kulkarni, Chandrashekar Ramanathan, Vinu E Venugopal

Abstract: Retrieving pertinent documents from various data sources with diverse characteristics poses a significant challenge for document retrieval systems. The challenge is compounded when the semantic relationship between data and domain knowledge must be taken into account. Existing semantic retrieval systems, which usually represent semantics as knowledge graphs built from open-access resources and generic domain knowledge, hold promise in delivering relevant results, but their precision may be compromised by the absence of domain-specific information and by reliance on outdated knowledge sources. This research makes two key contributions: a) the development of a versatile algorithm, 'Semantic-based Concept Retrieval using Group Steiner Tree', that incorporates domain information to enhance semantic-aware knowledge representation and data access, and b) the practical implementation of the proposed algorithm within a document retrieval system using real-world data. To assess the effectiveness of the SemDR system, the work evaluates its performance on a benchmark of 170 real-world search queries. Domain experts rigorously evaluate and verify the results to ensure their validity and accuracy. The experimental findings demonstrate substantial advancements over the baseline systems, with precision and accuracy reaching 90% and 82% respectively.
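To make the Group Steiner Tree idea named in the abstract concrete, here is a minimal sketch in Python. It is not the authors' 'Semantic-based Concept Retrieval using Group Steiner Tree' algorithm, whose details the abstract does not give. It assumes the usual problem statement: given a weighted undirected graph and several groups of nodes (for example, one group per query keyword, holding the concepts that match it), find a low-cost tree touching at least one node from every group. The heuristic shown tries each node as a root and joins it to the nearest member of each group along shortest paths; it returns a valid connecting tree but not necessarily the cheapest one. The graph, node names, weights, and function names are all hypothetical.

```python
import heapq
from typing import Dict, FrozenSet, List, Set, Tuple

Graph = Dict[str, Dict[str, float]]  # undirected weighted adjacency map


def build_graph(edges: List[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric adjacency map from (u, v, weight) triples."""
    graph: Graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def dijkstra(graph: Graph, source: str):
    """Return shortest distances from source and the predecessor map."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def approx_group_steiner_tree(
    graph: Graph, groups: List[Set[str]]
) -> Tuple[float, Set[FrozenSet[str]]]:
    """Heuristic: for each root, connect the nearest node of every group."""
    best_cost, best_edges = float("inf"), set()
    for root in graph:
        dist, prev = dijkstra(graph, root)
        edges: Set[FrozenSet[str]] = set()
        feasible = True
        for group in groups:
            reachable = [n for n in sorted(group) if n in dist]
            if not reachable:
                feasible = False
                break
            node = min(reachable, key=lambda n: dist[n])
            while node != root:  # walk back along the shortest-path tree
                edges.add(frozenset((prev[node], node)))
                node = prev[node]
        if not feasible:
            continue
        cost = sum(graph[u][v] for u, v in (tuple(e) for e in edges))
        if cost < best_cost:
            best_cost, best_edges = cost, edges
    return best_cost, best_edges


# Hypothetical toy concept graph and keyword groups.
graph = build_graph([
    ("budget", "scheme", 1.0), ("scheme", "subsidy", 1.0),
    ("subsidy", "farmer", 1.0), ("budget", "ministry", 4.0),
    ("ministry", "farmer", 1.0), ("scheme", "loan", 2.0),
])
groups = [{"budget"}, {"farmer", "loan"}, {"subsidy"}]
cost, tree = approx_group_steiner_tree(graph, groups)
print(cost, sorted(tuple(sorted(e)) for e in tree))
```

Because every path is taken from the same shortest-path tree, the union of the selected edges is always a tree rooted at the chosen node. Exact Group Steiner Tree is NP-hard, so practical systems generally rely on heuristics or approximations of this kind.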

Subjects:	Information Retrieval (cs.IR); Computers and Society (cs.CY)
Cite as:	arXiv:2508.20543 [cs.IR]
	(or arXiv:2508.20543v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2508.20543

Submission history

From: Apurva Kulkarni
[v1] Thu, 28 Aug 2025 08:29:55 UTC (930 KB)
