Hyper Diffusion Avatars: Dynamic Human Avatar Generation using Network Weight Space Diffusion

Cao, Dongliang; Sun, Guoxing; Habermann, Marc; Bernard, Florian

Computer Science > Graphics

arXiv:2509.04145 (cs)

[Submitted on 4 Sep 2025]


Title: Hyper Diffusion Avatars: Dynamic Human Avatar Generation using Network Weight Space Diffusion

Authors: Dongliang Cao, Guoxing Sun, Marc Habermann, Florian Bernard


Abstract: Creating human avatars is a highly desirable yet challenging task. Recent advancements in radiance field rendering have achieved unprecedented photorealism and real-time performance for personalized dynamic human avatars. However, these approaches are typically limited to person-specific rendering models trained on multi-view video data for a single individual, limiting their ability to generalize across different identities. On the other hand, generative approaches leveraging prior knowledge from pre-trained 2D diffusion models can produce cartoonish, static human avatars, which are animated through simple skeleton-based articulation. Therefore, the avatars generated by these methods suffer from lower rendering quality compared to person-specific rendering methods and fail to capture pose-dependent deformations such as cloth wrinkles. In this paper, we propose a novel approach that unites the strengths of person-specific rendering and diffusion-based generative modeling to enable dynamic human avatar generation with both high photorealism and realistic pose-dependent deformations. Our method follows a two-stage pipeline: first, we optimize a set of person-specific UNets, with each network representing a dynamic human avatar that captures intricate pose-dependent deformations. In the second stage, we train a hyper diffusion model over the optimized network weights. During inference, our method generates network weights for real-time, controllable rendering of dynamic human avatars. Using a large-scale, cross-identity, multi-view video dataset, we demonstrate that our approach outperforms state-of-the-art human avatar generation methods.
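
The idea of treating optimized network weights as the data for a diffusion model can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the tiny random "networks" stand in for the person-specific UNets, the helper names (`flatten_weights`, `unflatten_weights`, `make_alpha_bar`, `add_noise`) are hypothetical, and the linear DDPM-style noise schedule is a common default rather than the authors' choice. The sketch only shows how per-identity weights could be flattened into vectors and noised in the forward diffusion process; it does not train a denoiser.

```python
import numpy as np

def flatten_weights(params):
    """Concatenate a dict of weight arrays into one 1-D vector (sorted by name)."""
    names = sorted(params)
    shapes = {n: params[n].shape for n in names}
    vec = np.concatenate([params[n].ravel() for n in names])
    return vec, shapes

def unflatten_weights(vec, shapes):
    """Inverse of flatten_weights: rebuild the dict of arrays from the vector."""
    params, offset = {}, 0
    for name in sorted(shapes):
        size = int(np.prod(shapes[name]))
        params[name] = vec[offset:offset + size].reshape(shapes[name])
        offset += size
    return params

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(w0, t, alpha_bar, rng):
    """Sample w_t from q(w_t | w_0) = N(sqrt(a_bar_t) * w_0, (1 - a_bar_t) * I)."""
    eps = rng.standard_normal(w0.shape)
    wt = np.sqrt(alpha_bar[t]) * w0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return wt, eps

rng = np.random.default_rng(0)

# Stand-in for stage 1: three "optimized" per-identity networks (random here).
avatars = [
    {
        "conv1.weight": rng.standard_normal((4, 3, 3, 3)),
        "conv1.bias": rng.standard_normal(4),
    }
    for _ in range(3)
]

# Stage 2 input: one flat weight vector per identity.
flat = [flatten_weights(p) for p in avatars]
weight_vectors = np.stack([v for v, _ in flat])
shapes = flat[0][1]

# Forward diffusion on weight space at an arbitrary timestep.
alpha_bar = make_alpha_bar()
noisy, eps = add_noise(weight_vectors, t=500, alpha_bar=alpha_bar, rng=rng)

# A generated (denoised) vector would be mapped back to network weights like this.
restored = unflatten_weights(weight_vectors[0], shapes)
```

In a full system, a denoising network would be trained to predict `eps` from `noisy` and the timestep, and sampled vectors would be unflattened into the weights of a renderable avatar network.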

Subjects:	Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2509.04145 [cs.GR]
	(or arXiv:2509.04145v1 [cs.GR] for this version)
	https://doi.org/10.48550/arXiv.2509.04145

Submission history

From: Dongliang Cao
[v1] Thu, 4 Sep 2025 12:15:55 UTC (6,255 KB)
