LuxDiT: Lighting Estimation with Video Diffusion Transformer

Liang, Ruofan; He, Kai; Gojcic, Zan; Gilitschenski, Igor; Fidler, Sanja; Vijaykumar, Nandita; Wang, Zian

arXiv:2509.03680 (cs)

[Submitted on 3 Sep 2025]

Abstract: Estimating scene lighting from a single image or video remains a longstanding challenge in computer vision and graphics. Learning-based approaches are constrained by the scarcity of ground-truth HDR environment maps, which are expensive to capture and limited in diversity. While recent generative models offer strong priors for image synthesis, lighting estimation remains difficult due to its reliance on indirect visual cues, the need to infer global (non-local) context, and the recovery of high-dynamic-range outputs. We propose LuxDiT, a novel data-driven approach that fine-tunes a video diffusion transformer to generate HDR environment maps conditioned on visual input. Trained on a large synthetic dataset with diverse lighting conditions, our model learns to infer illumination from indirect visual cues and generalizes effectively to real-world scenes. To improve semantic alignment between the input and the predicted environment map, we introduce a low-rank adaptation (LoRA) fine-tuning strategy using a collected dataset of HDR panoramas. Our method produces accurate lighting predictions with realistic angular high-frequency details, outperforming existing state-of-the-art techniques in both quantitative and qualitative evaluations.
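
The LoRA fine-tuning stage mentioned in the abstract builds on the general low-rank adaptation idea: the pretrained weight W stays frozen and only a rank-r update B·A is learned, so the adapted layer computes W·x + (alpha / r)·B·A·x. The NumPy sketch below illustrates that generic technique only; it is not the authors' implementation. The class name `LoRALinear`, the `rank`/`alpha` defaults, and the initialization scale are hypothetical choices for illustration, and which transformer layers LuxDiT adapts (and at what rank) is not stated here.

```python
import numpy as np


class LoRALinear:
    """Hypothetical illustration: frozen linear weight W plus a trainable low-rank update."""

    def __init__(self, weight: np.ndarray, rank: int = 4, alpha: float = 8.0, seed: int = 0):
        d_out, d_in = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight  # frozen pretrained weight, shape (d_out, d_in)
        # A gets a small random init; B starts at zero, so the adapter is a no-op before training.
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out); base path plus scaled low-rank path.
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return base + self.scale * update

    def trainable_params(self) -> int:
        # Only A and B would be updated during fine-tuning.
        return self.A.size + self.B.size

    def merged_weight(self) -> np.ndarray:
        # After fine-tuning, the update can be folded into W, adding no inference cost.
        return self.weight + self.scale * (self.B @ self.A)


rng = np.random.default_rng(1)
W = rng.normal(size=(64, 32))
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(2, 32))
print(np.allclose(layer(x), x @ W.T))       # adapter starts as identity on the base layer
print(layer.trainable_params(), W.size)     # low-rank params vs. full weight size
```

With rank 4 on a 64×32 layer, only 4·32 + 64·4 = 384 parameters are trained instead of 2,048, which is why low-rank adaptation suits fine-tuning a large pretrained model on a comparatively small dataset such as a collection of HDR panoramas.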

Comments:	Project page: https://research.nvidia.com/labs/toronto-ai/LuxDiT/
Subjects:	Graphics (cs.GR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2509.03680 [cs.GR]
	(or arXiv:2509.03680v1 [cs.GR] for this version)
	https://doi.org/10.48550/arXiv.2509.03680

Submission history

From: Ruofan Liang
[v1] Wed, 3 Sep 2025 19:59:20 UTC (29,773 KB)
