Chunk-Distilled Language Modeling

Li, Yanhong; Livescu, Karen; Zhou, Jiawei

arXiv:2501.00343 (cs)

[Submitted on 31 Dec 2024]

Abstract: We introduce Chunk-Distilled Language Modeling (CD-LM), an approach to text generation that addresses two challenges in current large language models (LLMs): the inefficiency of token-level generation, and the difficulty of adapting to new data and knowledge. Our method combines deep network-based LLMs with a straightforward retrieval module, which allows the generation of multi-token text chunks in a single decoding step. Our retrieval framework enables flexible construction of model- or domain-specific datastores, either leveraging the internal knowledge of existing models or incorporating expert insights from human-annotated corpora. This adaptability allows for enhanced control over the language model's distribution without requiring additional training. We present the CD-LM formulation along with performance metrics demonstrating its ability to improve language model performance and efficiency across a diverse set of downstream tasks. Code and data will be made publicly available.
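To make the idea of emitting a multi-token chunk in a single decoding step concrete, here is a minimal, hypothetical sketch of that control flow. It assumes a toy next-token function in place of a real LLM and a datastore that maps the last generated token to a stored chunk by exact match. The names `generate`, `next_token`, and `datastore` are illustrative, and the exact-match lookup is a deliberate simplification, not CD-LM's actual retrieval or acceptance criterion, which the abstract does not specify.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical datastore: maps a trigger token to the multi-token chunk
# that should follow it. Exact matching on the last token is an
# illustrative simplification, not CD-LM's retrieval criterion.
Datastore = Dict[str, List[str]]


def generate(
    prompt: List[str],
    next_token: Callable[[List[str]], str],
    datastore: Datastore,
    max_tokens: int = 20,
    eos: str = "<eos>",
) -> Tuple[List[str], int]:
    """Decode from `prompt`, returning (new tokens, decoding steps used)."""
    output = list(prompt)
    steps = 0
    while len(output) - len(prompt) < max_tokens:
        steps += 1
        chunk = datastore.get(output[-1]) if output else None
        if chunk:
            # Retrieval hit: append the whole chunk in a single step.
            output.extend(chunk)
        else:
            # Retrieval miss: fall back to ordinary token-level decoding.
            token = next_token(output)
            if token == eos:
                break
            output.append(token)
    # Trim in case the last chunk overshot the token budget.
    return output[len(prompt):len(prompt) + max_tokens], steps


if __name__ == "__main__":
    # Toy stand-in for an LLM: predicts the next token from the last one.
    toy_lm = {"Paris": ".", ".": "<eos>"}
    tokens, steps = generate(
        ["The", "capital"],
        lambda ctx: toy_lm.get(ctx[-1], "<eos>"),
        {"capital": ["of", "France", "is", "Paris"]},
    )
    print(tokens, steps)  # ['of', 'France', 'is', 'Paris', '.'] 3
```

In this toy run, the chunk hit lets five output tokens be produced in three decoding steps, whereas a purely token-level loop over the same toy model would need six (five tokens plus the end-of-sequence step). Under these assumptions, that reduction in decoding steps is the kind of efficiency gain the abstract describes.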

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2501.00343 [cs.CL]
	(or arXiv:2501.00343v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2501.00343

Submission history

From: Yanhong Li
[v1] Tue, 31 Dec 2024 08:32:15 UTC (1,050 KB)
