Improving Skeleton-based Action Recognition with Interactive Object Information

Wen, Hao; Lu, Ziqian; Shen, Fengli; Lu, Zhe-Ming; Cui, Jialin

Computer Science > Computer Vision and Pattern Recognition

arXiv:2501.05066 (cs)

[Submitted on 9 Jan 2025]

Title: Improving Skeleton-based Action Recognition with Interactive Object Information

Authors: Hao Wen, Ziqian Lu, Fengli Shen, Zhe-Ming Lu, Jialin Cui

Abstract: Human skeleton information is important in skeleton-based action recognition because it provides a simple and efficient way to describe human pose. However, existing skeleton-based methods focus mainly on the skeleton and ignore the objects that humans interact with, which leads to poor performance on actions that involve object interactions. We propose a new action recognition framework that introduces object nodes to supply the missing interactive object information. We also propose Spatial Temporal Variable Graph Convolutional Networks (ST-VGCN) to effectively model the Variable Graph (VG) containing object nodes. Specifically, to validate the role of interactive object information, we use a simple self-training approach to establish a new dataset, JXGC 24, and an extended dataset, NTU RGB+D+Object 60, including more than 2 million additional object nodes. We also design a Variable Graph construction method to accommodate a variable number of nodes in the graph structure. Additionally, we are the first to explore the overfitting issue introduced by incorporating additional object information, and we propose a VG-based data augmentation method, called Random Node Attack, to address it. Finally, regarding the network structure, we introduce two fusion modules, CAF and WNPool, along with a novel Node Balance Loss, to improve overall performance by effectively fusing and balancing skeleton and object node information. Our method surpasses the previous state of the art on multiple skeleton-based action recognition benchmarks. On NTU RGB+D 60, it achieves 96.7% accuracy on the cross-subject split and 99.2% on the cross-view split.
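
The abstract does not spell out how the Variable Graph is constructed, so the following is only a minimal sketch of one generic way to handle a variable number of object nodes: pad the graph to a fixed number of object slots and keep a validity mask. The function name `build_variable_adjacency`, the `object_links` rule (each object attaches to a single joint), and the `max_objects` padding size are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def build_variable_adjacency(
    num_joints: int,
    skeleton_edges: List[Edge],
    object_links: List[int],
    max_objects: int,
) -> Tuple[List[List[int]], List[int]]:
    """Return a padded adjacency matrix and a node-validity mask.

    object_links[i] is the joint index that object i is attached to.
    This linking rule is hypothetical and chosen only for illustration.
    """
    if len(object_links) > max_objects:
        raise ValueError("more objects than padding slots")

    total = num_joints + max_objects
    adj = [[0] * total for _ in range(total)]

    # Self-loops and bone connections for the fixed human joints.
    for i in range(num_joints):
        adj[i][i] = 1
    for a, b in skeleton_edges:
        adj[a][b] = adj[b][a] = 1

    # Each object that is present occupies one slot after the joints.
    # Unused slots keep all-zero rows and columns and a mask value of 0.
    mask = [1] * num_joints + [0] * max_objects
    for k, joint in enumerate(object_links):
        node = num_joints + k
        mask[node] = 1
        adj[node][node] = 1
        adj[node][joint] = adj[joint][node] = 1

    return adj, mask


# Toy example: 3 joints in a chain, one object attached to joint 2,
# padded so that up to two objects fit.
adj, mask = build_variable_adjacency(3, [(0, 1), (1, 2)], [2], max_objects=2)
```

With this layout every sample has the same tensor shape, and a downstream layer could use `mask` to ignore the empty object slots. Whether ST-VGCN uses padding, masking, or a different scheme is not stated in the abstract.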

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2501.05066 [cs.CV]
	(or arXiv:2501.05066v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.05066

Submission history

From: Hao Wen
[v1] Thu, 9 Jan 2025 08:43:09 UTC (3,368 KB)
