arXiv:2506.06037v1 (cs)
[Submitted on 6 Jun 2025]

Title: SVD: Spatial Video Dataset

Authors:M. H. Izadimehr, Milad Ghanbari, Guodong Chen, Wei Zhou, Xiaoshuai Hao, Mallesham Dasari, Christian Timmerer, Hadi Amirpour
Abstract: Stereoscopic video has long been the subject of research due to its capacity to deliver immersive three-dimensional content across a wide range of applications, from virtual and augmented reality to advanced human-computer interaction. The dual-view format inherently provides binocular disparity cues that enhance depth perception and realism, making it indispensable for fields such as telepresence, 3D mapping, and robotic vision. Until recently, however, end-to-end pipelines for capturing, encoding, and viewing high-quality 3D video were neither widely accessible nor optimized for consumer-grade devices. Today's smartphones, such as the iPhone Pro, and modern Head-Mounted Displays (HMDs) offer built-in support for stereoscopic video capture and hardware-accelerated encoding, with seamless playback on devices like the Apple Vision Pro (AVP) and Meta Quest 3, requiring minimal user intervention. Apple refers to this streamlined workflow as spatial video. Making the full stereoscopic video pipeline available to everyone has enabled new applications. Despite these advances, there remains a notable absence of publicly available datasets that cover the complete spatial video pipeline. In this paper, we introduce SVD, a spatial video dataset comprising 300 five-second video sequences, 150 captured using an iPhone Pro and 150 with an AVP. Additionally, 10 longer videos with a minimum duration of 2 minutes have been recorded. The SVD dataset is publicly released under an open-access license to facilitate research in codec performance evaluation, subjective and objective quality of experience (QoE) assessment, depth-based computer vision, stereoscopic video streaming, and other emerging 3D applications such as neural rendering and volumetric capture. Link to the dataset: https://cd-athena.github.io/SVD/
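The binocular disparity cue mentioned in the abstract can be made concrete with a toy example: for a rectified stereo pair, the horizontal offset between corresponding pixels encodes depth. The sketch below is a naive block-matching (sum-of-absolute-differences) disparity estimator in pure NumPy, written for illustration only; it is not the dataset's pipeline, and `disparity_map` is a hypothetical helper, not an API from the paper.

```python
import numpy as np

def disparity_map(left, right, max_disp=16, block=5):
    """Naive block-matching stereo: for each pixel in the left image,
    find the horizontal shift into the right image that minimizes the
    sum of absolute differences (SAD) over a small square patch."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            best_cost, best_d = np.inf, 0
            # Only consider shifts that keep the candidate patch in bounds.
            for d in range(min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1]
                cost = np.abs(patch - cand).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp

# Synthetic rectified pair: the right view is the left view shifted
# 4 px, so the ground-truth disparity is 4 wherever it is observable.
rng = np.random.default_rng(0)
left = rng.integers(0, 255, size=(32, 48))
right = np.roll(left, -4, axis=1)
d = disparity_map(left, right, max_disp=8, block=5)
print(int(np.median(d[8:24, 8:36])))  # median over interior pixels
```

On this synthetic pair the SAD is exactly zero at the true shift, so interior pixels recover a disparity of 4; real stereo footage additionally requires rectification, occlusion handling, and sub-pixel refinement, which production estimators provide.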
Subjects: Multimedia (cs.MM)
Cite as: arXiv:2506.06037 [cs.MM]
  (or arXiv:2506.06037v1 [cs.MM] for this version)
  https://doi.org/10.48550/arXiv.2506.06037
arXiv-issued DOI via DataCite

Submission history

From: Hadi Amirpour
[v1] Fri, 6 Jun 2025 12:38:01 UTC (1,203 KB)