Frame Sampling Strategies Matter: A Benchmark for small vision language models

Brkic, Marija; Razzouki, Anas Filali; Tevissen, Yannis; Guetari, Khalil; Yacoubi, Mounim A. El

arXiv:2509.14769 (cs)

[Submitted on 18 Sep 2025]

Authors: Marija Brkic, Anas Filali Razzouki, Yannis Tevissen, Khalil Guetari, Mounim A. El Yacoubi

Abstract: Comparing vision language models (VLMs) on videos is particularly complex, as performance is jointly determined by the model's visual representation capacity and the frame-sampling strategy used to construct the input. Current video benchmarks are suspected to suffer from substantial frame-sampling bias, as models are evaluated with different frame selection strategies. In this work, we propose the first frame-accurate benchmark of state-of-the-art small VLMs (SVLMs) for video question-answering, evaluated under controlled frame-sampling strategies. Our results confirm the suspected bias and highlight both data-specific and task-specific behaviors of SVLMs under different frame-sampling techniques. By open-sourcing our benchmarking code, we provide the community with a reproducible and unbiased protocol for evaluating video VLMs and emphasize the need for standardized frame-sampling strategies tailored to each benchmarking dataset in future research.
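
To make the notion of a frame-sampling strategy concrete, the sketch below contrasts two commonly used schemes: picking a fixed number of evenly spaced frames, and picking frames at a fixed rate. The abstract does not name the strategies the benchmark evaluates, so both functions and their names (`uniform_indices`, `fps_indices`) are illustrative assumptions, not the authors' protocol.

```python
def uniform_indices(total_frames: int, num_frames: int) -> list[int]:
    """Hypothetical helper: pick up to num_frames indices spread evenly
    over the video, taking the midpoint of each equal-length segment."""
    if total_frames <= 0 or num_frames <= 0:
        return []
    n = min(num_frames, total_frames)  # cannot sample more frames than exist
    step = total_frames / n
    return [int(step * i + step / 2) for i in range(n)]


def fps_indices(total_frames: int, native_fps: float, target_fps: float) -> list[int]:
    """Hypothetical helper: pick roughly target_fps frames per second,
    so the number of sampled frames grows with video duration."""
    if total_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        return []
    stride = max(1, round(native_fps / target_fps))  # frames to skip between picks
    return list(range(0, total_frames, stride))


if __name__ == "__main__":
    # Assumed example: a 10 s clip and a 60 s clip, both recorded at 30 fps.
    for total in (300, 1800):
        print(total, len(uniform_indices(total, 8)), len(fps_indices(total, 30.0, 1.0)))
```

Under these assumptions, the two schemes hand a model similar inputs on a 10-second clip (8 versus 10 frames) but very different ones on a 60-second clip (8 versus 60 frames), which illustrates why scores obtained under different sampling choices are hard to compare.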

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:2509.14769 [cs.CV]
	(or arXiv:2509.14769v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2509.14769

Submission history

From: Yannis Tevissen
[v1] Thu, 18 Sep 2025 09:18:42 UTC (44 KB)
