VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under real Occlusions

Garg, Yash; Bachu, Saketh; Dutta, Arindam; Lal, Rohit; Bose, Sarosij; Ta, Calvin-Khang; Asif, M. Salman; Roy-Chowdhury, Amit

Computer Science > Computer Vision and Pattern Recognition

arXiv:2508.06757 (cs)

[Submitted on 9 Aug 2025]

Abstract: Human pose and shape (HPS) estimation methods have been extensively studied, with many demonstrating high zero-shot performance on in-the-wild images and videos. However, these methods often struggle in challenging scenarios involving complex human poses or significant occlusions. Although some studies address 3D human pose estimation under occlusion, they typically evaluate performance on datasets that lack realistic or substantial occlusions. For example, most existing datasets introduce occlusions by placing random patches or clipart-style overlays over the human, which may not reflect real-world challenges. To bridge this gap in realistic occlusion datasets, we introduce a novel benchmark dataset, VOccl3D, a Video-based human Occlusion dataset with 3D body pose and shape annotations. Inspired by works such as AGORA and BEDLAM, we constructed this dataset using advanced computer graphics rendering techniques, incorporating diverse real-world occlusion scenarios, clothing textures, and human motions. Additionally, we fine-tuned recent HPS methods, CLIFF and BEDLAM-CLIFF, on our dataset, demonstrating significant qualitative and quantitative improvements across multiple public datasets, as well as on the test split of our dataset, and compared their performance with other state-of-the-art methods. Furthermore, we leveraged our dataset to enhance human detection performance under occlusion by fine-tuning an existing object detector, YOLO11, leading to a robust end-to-end HPS estimation system under occlusions. Overall, this dataset serves as a valuable resource for future research aimed at benchmarking methods designed to handle occlusions, offering a more realistic alternative to existing occlusion datasets. See the project page for code and dataset: https://yashgarg98.github.io/VOccl3D-dataset/

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Cite as:	arXiv:2508.06757 [cs.CV]
	(or arXiv:2508.06757v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2508.06757

Submission history

From: Yash Garg
[v1] Sat, 9 Aug 2025 00:13:46 UTC (22,651 KB)
