Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment

Zhou, Kai; Zhang, Shuhai; You, Zeng; Hu, Jinwu; Tan, Mingkui; Liu, Fei

arXiv:2507.00566 (cs)

[Submitted on 1 Jul 2025]

Title: Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment

Authors: Kai Zhou, Shuhai Zhang, Zeng You, Jinwu Hu, Mingkui Tan, Fei Liu

Abstract: Zero-shot skeleton-based action recognition aims to classify unseen skeleton-based human actions without prior exposure to such categories during training. This task is extremely challenging due to the difficulty in generalizing from known to unknown actions. Previous studies typically use two-stage training: pre-training skeleton encoders on seen action categories using cross-entropy loss and then aligning pre-extracted skeleton and text features, enabling knowledge transfer to unseen classes through skeleton-text alignment and language models' generalization. However, their efficacy is hindered by 1) insufficient discrimination for skeleton features, as the fixed skeleton encoder fails to capture necessary alignment information for effective skeleton-text alignment; 2) the neglect of alignment bias between skeleton and unseen text features during testing. To this end, we propose a prototype-guided feature alignment paradigm for zero-shot skeleton-based action recognition, termed PGFA. Specifically, we develop an end-to-end cross-modal contrastive training framework to improve skeleton-text alignment, ensuring sufficient discrimination for skeleton features. Additionally, we introduce a prototype-guided text feature alignment strategy to mitigate the adverse impact of the distribution discrepancy during testing. We provide a theoretical analysis to support our prototype-guided text feature alignment strategy and empirically evaluate our overall PGFA on three well-known datasets. Compared with the top competitor SMIE method, our PGFA achieves absolute accuracy improvements of 22.96%, 12.53%, and 18.54% on the NTU-60, NTU-120, and PKU-MMD datasets, respectively.
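
The abstract names two ingredients, a cross-modal contrastive objective and a prototype-guided alignment step at test time, without giving their exact form. The sketch below is only an illustration of what such components might look like, not the authors' implementation: it assumes a symmetric InfoNCE loss over paired skeleton and text features, and a test-time step that pseudo-labels unseen-class samples by cosine similarity, averages them into class prototypes, and reclassifies against those prototypes. All function names, the temperature value, and the fallback rule for empty classes are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def cosine_logits(skel, text):
    """Cosine similarity between every skeleton feature and every class feature."""
    s = [normalize(v) for v in skel]
    t = [normalize(v) for v in text]
    return [[sum(a * b for a, b in zip(si, tj)) for tj in t] for si in s]

def contrastive_loss(skel, text, temperature=0.1):
    """Assumed symmetric InfoNCE loss; skel[i] is the positive pair of text[i]."""
    sim = cosine_logits(skel, text)
    n = len(sim)

    def cross_entropy(rows):
        # Mean of -log softmax(row / temperature)[i] with the diagonal as target.
        total = 0.0
        for i, row in enumerate(rows):
            scaled = [x / temperature for x in row]
            m = max(scaled)
            lse = m + math.log(sum(math.exp(x - m) for x in scaled))
            total += lse - scaled[i]
        return total / n

    cols = [[sim[i][j] for i in range(n)] for j in range(n)]
    return 0.5 * (cross_entropy(sim) + cross_entropy(cols))

def argmax(row):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(row)), key=row.__getitem__)

def prototype_refine(skel, text):
    """Hypothetical test-time step: pseudo-label, build prototypes, reclassify."""
    pseudo = [argmax(row) for row in cosine_logits(skel, text)]
    protos = []
    for c, t in enumerate(text):
        members = [normalize(s) for s, p in zip(skel, pseudo) if p == c]
        if members:
            dim = len(members[0])
            protos.append([sum(m[d] for m in members) / len(members)
                           for d in range(dim)])
        else:
            protos.append(list(t))  # assumption: keep the text feature if no sample maps here
    return [argmax(row) for row in cosine_logits(skel, protos)]
```

In this toy form, the prototypes move the decision boundary toward where the test skeleton features actually lie, which is one way to read the abstract's "alignment bias between skeleton and unseen text features"; the paper and the linked repository remain the authority on the actual procedure.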

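Comments:	This paper has been accepted by IEEE TIP 2025. Code is publicly available at https://github.com/kaai520/PGFA.
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2507.00566 [cs.CV]
	(or arXiv:2507.00566v1 [cs.CV] for this version)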
	https://doi.org/10.48550/arXiv.2507.00566

Submission history

From: Mingkui Tan
[v1] Tue, 1 Jul 2025 08:34:35 UTC (7,110 KB)