
Computer Science > Computation and Language

arXiv:2501.00330 (cs)
[Submitted on 31 Dec 2024]

Title: Exploring the Implicit Semantic Ability of Multimodal Large Language Models: A Pilot Study on Entity Set Expansion


Authors: Hebin Wang, Yangning Li, Yinghui Li, Hai-Tao Zheng, Wenhao Jiang, Hong-Gee Kim
Abstract: The rapid development of multimodal large language models (MLLMs) has brought significant improvements to a wide range of tasks in real-world applications. However, LLMs still exhibit certain limitations in extracting implicit semantic information. In this paper, we apply MLLMs to the Multi-modal Entity Set Expansion (MESE) task, which aims to expand a handful of seed entities with new entities belonging to the same semantic class, where multi-modal information is provided for each entity. Through the MESE task, we explore the capability of MLLMs to understand implicit semantic information at entity-level granularity, introducing LUSAR, a listwise ranking method that maps local scores to global rankings. LUSAR delivers significant improvements in MLLM performance on the MESE task, marking the first use of a generative MLLM for ESE tasks and extending the applicability of listwise ranking.
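
The abstract names LUSAR's central idea, mapping local (per-list) scores to a global ranking, without giving the procedure. As a rough, hypothetical illustration of that general listwise pattern (not the paper's actual method), the Python sketch below samples candidates into small lists, ranks each list with a stand-in scoring function where an MLLM prompt would sit, and accumulates Borda-style points into one global ordering; the names score_list and global_ranking and the toy relevance scores are all invented for this sketch.

```python
import random
from collections import defaultdict

def score_list(candidates, toy_relevance):
    # Stand-in for the listwise step: in the paper's setting an MLLM would
    # rank one small candidate list; here a toy relevance score fakes that.
    return sorted(candidates, key=lambda c: -toy_relevance[c])

def global_ranking(candidates, toy_relevance, list_size=5, rounds=20, seed=0):
    # Hypothetical aggregation: sample many small lists, rank each one
    # "locally", and accumulate Borda-style points into a global score.
    rng = random.Random(seed)
    points = defaultdict(float)
    appearances = defaultdict(int)
    for _ in range(rounds):
        sampled = rng.sample(candidates, list_size)
        for pos, entity in enumerate(score_list(sampled, toy_relevance)):
            points[entity] += list_size - pos  # earlier position, more points
            appearances[entity] += 1
    # Average over how often each entity was sampled, then sort globally.
    avg = {e: points[e] / appearances[e] for e in points}
    return sorted(avg, key=avg.get, reverse=True)

if __name__ == "__main__":
    cands = [f"entity_{i}" for i in range(12)]
    toy = {c: random.Random(i).random() for i, c in enumerate(cands)}
    print(global_ranking(cands, toy))
```

Repeated sampling matters here because each listwise call only orders a handful of entities; aggregating overlapping local comparisons is what yields a single global ranking.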
Comments: ICASSP 2025
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Cite as: arXiv:2501.00330 [cs.CL]
  (or arXiv:2501.00330v1 [cs.CL] for this version)
  https://doi.org/10.48550/arXiv.2501.00330
arXiv-issued DOI via DataCite

Submission history

From: Yinghui Li
[v1] Tue, 31 Dec 2024 08:03:48 UTC (269 KB)
Full-text links:

Access Paper:

  • View Chinese PDF
  • View PDF
  • HTML (experimental)
  • TeX Source
  • Other Formats
view license

References & Citations

  • NASA ADS
  • Google Scholar
  • Semantic Scholar

