Computer Science > Machine Learning

arXiv:2510.00028v1 (cs)
[Submitted on 26 Sep 2025]

Title: Rethinking RoPE Scaling in Quantized LLM: Theory, Outlier, and Channel-Band Analysis with Weight Rescaling

Authors:Ye Qiao, Haocheng Xu, Xiaofan Zhang, Sitao Huang
Abstract: Extending the context window of large language models (LLMs) is crucial for tasks with long-range dependencies. RoPE-based interpolation and extrapolation methods, such as linear scaling and frequency-aware schemes, support longer inputs without retraining, while post-training quantization (PTQ) makes deployment practical. However, we show that combining RoPE position interpolation (PI) with PTQ degrades accuracy due to coupled effects: long-context aliasing, dynamic-range dilation, anisotropy between axis-aligned quantizers and rotated RoPE pairs, and outlier shifting that produces position-dependent logit noise. We provide, to the best of our knowledge, the first systematic analysis of the PI+PTQ combination and introduce two practical diagnostics: interpolation pressure (per-band sensitivity to phase scaling) and tail-inflation ratios (outlier shift from short to long contexts). Guided by this analysis, we propose Q-ROAR (Quantization, RoPE-interpolation, and Outlier Aware Rescaling), a weight-only, interpolation-aware stabilization of PI for quantized LLMs. Q-ROAR groups RoPE dimensions into a small number of frequency bands and performs a lightweight search over per-band scales for the Key and Query weights (with an optional symmetric variant that preserves logit scale). The search is guided by our diagnostics, uses only a tiny long-context development set, and requires no fine-tuning of the model, no architecture or kernel changes, and no additional deployment overhead. Empirically, Q-ROAR reduces perplexity on long-context workloads by more than 14%, while preserving short-context performance, inference throughput, and compatibility with existing LLM system stacks.
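
The following is a minimal, hypothetical Python sketch (not the authors' released code) of the two mechanisms the abstract describes: linear RoPE position interpolation and Q-ROAR-style per-band rescaling of the Query/Key projection weights. The band boundaries, scale values, and function names are illustrative assumptions; in Q-ROAR the per-band scales would come from the diagnostic-guided search over a small long-context development set.

    # Minimal sketch, assuming standard RoPE conventions; all names are hypothetical.
    import numpy as np

    def rope_angles(positions, head_dim, base=10000.0, pi_scale=1.0):
        """Per-pair RoPE rotation angles; pi_scale > 1 applies linear position
        interpolation (positions are compressed into the trained range)."""
        inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)   # one frequency per dim pair
        return np.outer(positions / pi_scale, inv_freq)              # shape [num_pos, head_dim // 2]

    def apply_band_rescaling(w_q, w_k, band_edges, band_scales):
        """Weight-only, per-band rescaling of Query/Key projection rows, in the
        spirit of Q-ROAR: RoPE dimensions are grouped into frequency bands and
        each band gets one scale. The symmetric variant (s on W_q, 1/s on W_k)
        leaves the Q.K^T logit scale unchanged."""
        w_q, w_k = w_q.copy(), w_k.copy()
        for (lo, hi), s in zip(band_edges, band_scales):
            w_q[lo:hi, :] *= s        # rows feeding the RoPE dims of this band
            w_k[lo:hi, :] /= s        # symmetric counterpart preserves logits
        return w_q, w_k

    # Toy usage: 8x context extension via PI, three bands over a 128-dim head.
    pos = np.arange(0, 32768)
    angles = rope_angles(pos, head_dim=128, pi_scale=8.0)
    w_q = np.random.randn(128, 512).astype(np.float32)
    w_k = np.random.randn(128, 512).astype(np.float32)
    bands = [(0, 32), (32, 96), (96, 128)]    # hypothetical band boundaries
    scales = [1.0, 1.1, 0.9]                  # would come from the diagnostic-guided search
    w_q2, w_k2 = apply_band_rescaling(w_q, w_k, bands, scales)
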
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2510.00028 [cs.LG]
  (or arXiv:2510.00028v1 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2510.00028
arXiv-issued DOI via DataCite

Submission history

From: Ye Qiao
[v1] Fri, 26 Sep 2025 01:23:32 UTC (376 KB)