CRAM: Large-scale Video Continual Learning with Bootstrapped Compression

Mall, Shivani; Henriques, Joao F.


arXiv:2508.05001 (cs)

[Submitted on 7 Aug 2025]

Title: CRAM: Large-scale Video Continual Learning with Bootstrapped Compression

Authors: Shivani Mall, Joao F. Henriques

Abstract: Continual learning (CL) promises to allow neural networks to learn from continuous streams of inputs, instead of IID (independent and identically distributed) sampling, which requires random access to a full dataset. This would allow for much smaller storage requirements and self-sufficiency of deployed systems that cope with natural distribution shifts, similarly to biological learning. We focus on video CL employing a rehearsal-based approach, which reinforces past samples from a memory buffer. We posit that part of the reason why practical video CL is challenging is the high memory requirements of video, further exacerbated by long videos and continual streams, which are at odds with the common rehearsal-buffer size constraints. To address this, we propose to use compressed vision, i.e., store video codes (embeddings) instead of raw inputs, and train a video classifier by IID sampling from this rolling buffer. Training a video compressor online (so not depending on any pre-trained networks) means that it is also subject to catastrophic forgetting. We propose a scheme to deal with this forgetting by refreshing video codes, which requires careful decompression with a previous version of the network and recompression with a new one. We name our method Continually Refreshed Amodal Memory (CRAM). We expand current video CL benchmarks to large-scale settings, namely EpicKitchens-100 and Kinetics-700, storing thousands of relatively long videos in under 2 GB, and demonstrate empirically that our video CL method outperforms prior art with a significantly reduced memory footprint.
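
The refresh step described in the abstract (decompress stored codes with the previous compressor, then recompress them with the new one) can be illustrated with a toy sketch. This is not the authors' implementation: `ToyCodec`, `CodeBuffer`, the scalar `scale` standing in for network weights, and the drop-oldest eviction policy are all hypothetical names and assumptions chosen only to make the data flow concrete.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ToyCodec:
    """Hypothetical stand-in for a learned video compressor (assumption)."""
    scale: float  # plays the role of the network weights

    def encode(self, frame: List[float]) -> List[int]:
        # Lossy "compression": scale each value, then round to an integer.
        return [round(v * self.scale) for v in frame]

    def decode(self, code: List[int]) -> List[float]:
        # Approximate reconstruction of the original values.
        return [c / self.scale for c in code]


class CodeBuffer:
    """Fixed-capacity rolling buffer that stores codes, never raw inputs."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: List[Tuple[List[int], int]] = []  # (code, label) pairs

    def add(self, code: List[int], label: int) -> None:
        if len(self.items) >= self.capacity:
            self.items.pop(0)  # drop the oldest entry (assumed policy)
        self.items.append((code, label))

    def sample(self, k: int, rng: random.Random) -> List[Tuple[List[int], int]]:
        # Uniform sampling without replacement from the stored codes.
        return rng.sample(self.items, min(k, len(self.items)))

    def refresh(self, old: ToyCodec, new: ToyCodec) -> None:
        # Decompress with the previous codec, recompress with the new one,
        # so stored codes stay readable after the compressor is updated.
        self.items = [(new.encode(old.decode(code)), label)
                      for code, label in self.items]


old, new = ToyCodec(scale=10.0), ToyCodec(scale=20.0)
buf = CodeBuffer(capacity=2)
for label, frame in enumerate([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]):
    buf.add(old.encode(frame), label)  # only codes enter the buffer
buf.refresh(old, new)                  # codes now match the new codec
batch = buf.sample(2, random.Random(0))
```

In this sketch the buffer holds only the two most recent codes, and after `refresh` they decode correctly under `new` rather than `old`. A real system would replace `ToyCodec` with successive checkpoints of a trained video compressor, and would train the classifier on the sampled batch.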

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Performance (cs.PF)
Cite as:	arXiv:2508.05001 [cs.CV]
	(or arXiv:2508.05001v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2508.05001
Journal reference:	International Conference on Computer Vision, ICCV 2025

Submission history

From: Shivani Mall
[v1] Thu, 7 Aug 2025 03:32:20 UTC (2,040 KB)
