Heterogeneous LLM Methods for Ontology Learning (Few-Shot Prompting, Ensemble Typing, and Attention-Based Taxonomies)

Beliaeva, Aleksandra; Rahmatullaev, Temurbek

arXiv:2508.19428v1 (cs)

[Submitted on 26 Aug 2025]

Abstract: We present a comprehensive system for addressing Tasks A, B, and C of the LLMs4OL 2025 challenge, which together span the full ontology construction pipeline: term extraction, typing, and taxonomy discovery. Our approach combines retrieval-augmented prompting, zero-shot classification, and attention-based graph modeling, each tailored to the demands of the respective task. For Task A, we jointly extract domain-specific terms and their ontological types using a retrieval-augmented generation (RAG) pipeline. Training data is reformulated into document-to-terms-and-types correspondences, and test-time inference leverages semantically similar training examples. This single-pass method requires no model fine-tuning and improves overall performance through lexical augmentation. Task B, which involves assigning types to given terms, is handled via a dual strategy. In the few-shot setting (for domains with labeled training data), we reuse the RAG scheme with few-shot prompting. In the zero-shot setting (for previously unseen domains), we use a zero-shot classifier that combines cosine similarity scores from multiple embedding models using confidence-based weighting. In Task C, we model taxonomy discovery as graph inference. Using embeddings of type labels, we train a lightweight cross-attention layer to predict is-a relations by approximating a soft adjacency matrix. These modular, task-specific solutions enabled us to achieve top-ranking results on the official leaderboard across all three tasks. Taken together, these strategies showcase the scalability, adaptability, and robustness of LLM-based architectures for ontology learning across heterogeneous domains. Code is available at: https://github.com/BelyaevaAlex/LLMs4OL-Challenge-Alexbek
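The zero-shot typing step of Task B (confidence-weighted cosine scores from several embedding models) and the soft-adjacency scoring of Task C can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation (the linked repository holds that). The softmax temperature, the use of each model's peak softmax probability as its confidence, the child-query/parent-key orientation of the attention scores, and the element-wise sigmoid are all assumptions made for illustration, and the helper names (`ensemble_type`, `soft_adjacency`) are hypothetical. Only the forward pass of the attention layer is shown; in practice its projection matrices would be learned, for example with a binary cross-entropy loss against the gold is-a adjacency.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def softmax(scores: Sequence[float], temperature: float) -> List[float]:
    """Numerically stable softmax with a temperature (assumed hyperparameter)."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# --- Task B (zero-shot): confidence-weighted ensemble of embedding models ---
def ensemble_type(
    term_vecs: Dict[str, Vector],             # model name -> term embedding
    type_vecs: Dict[str, Dict[str, Vector]],  # model name -> {type label -> embedding}
    labels: List[str],
    temperature: float = 0.1,
) -> str:
    combined = [0.0] * len(labels)
    total_weight = 0.0
    for model, term_vec in term_vecs.items():
        sims = [cosine(term_vec, type_vecs[model][label]) for label in labels]
        # Assumption: a model's confidence is its peak softmax probability,
        # so models that separate the candidate types sharply count for more.
        confidence = max(softmax(sims, temperature))
        for i, sim in enumerate(sims):
            combined[i] += confidence * sim
        total_weight += confidence
    combined = [score / total_weight for score in combined]
    return labels[max(range(len(labels)), key=combined.__getitem__)]


# --- Task C: cross-attention scores read as a soft adjacency matrix ---
def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(row[k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for row in a]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def soft_adjacency(label_emb: Matrix, w_q: Matrix, w_k: Matrix) -> Matrix:
    """A[i][j] approximates P(type i is-a type j).

    Queries come from candidate children and keys from candidate parents
    (assumed orientation). An element-wise sigmoid, rather than a row softmax,
    lets a type have zero or several parents. w_q and w_k would be learned.
    """
    queries = matmul(label_emb, w_q)
    keys = matmul(label_emb, w_k)
    scale = math.sqrt(len(w_q[0]))
    return [[sigmoid(sum(q * k for q, k in zip(q_row, k_row)) / scale) for k_row in keys]
            for q_row in queries]


if __name__ == "__main__":
    labels = ["disease", "drug"]
    term = {"m1": [1.0, 0.0], "m2": [0.6, 0.8]}
    types = {m: {"disease": [1.0, 0.0], "drug": [0.0, 1.0]} for m in term}
    print(ensemble_type(term, types, labels))  # prints: disease

    identity = [[1.0, 0.0], [0.0, 1.0]]
    adj = soft_adjacency([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], identity, identity)
    print([[round(v, 3) for v in row] for row in adj])
```

In the toy run, model `m1` separates the two candidate types sharply and so carries almost full weight, while `m2` (whose scores are closer together) contributes less. Weighting raw cosine scores rather than probabilities keeps the combined score on the same scale as each individual model's similarity.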

Subjects:	Computation and Language (cs.CL); Logic in Computer Science (cs.LO); Symbolic Computation (cs.SC)
MSC classes:	68T30, 68T50, 68T07, 68U15
ACM classes:	I.2.4; I.2.7; H.3.1; H.3.3; I.2.6
Cite as:	arXiv:2508.19428 [cs.CL]
	(or arXiv:2508.19428v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2508.19428

Submission history

From: Temurbek Rahmatullaev
[v1] Tue, 26 Aug 2025 20:50:16 UTC (750 KB)
