HKGAI-V1: Towards Regional Sovereign Large Language Model for Hong Kong

Han, Sirui; Zhu, Junqi; Zhang, Ruiyuan; Guo, Yike

arXiv:2507.11502v1 (cs)

[Submitted on 14 July 2025]

Authors: Sirui Han, Junqi Zhu, Ruiyuan Zhang, Yike Guo

Abstract: This paper presents HKGAI-V1, a foundational sovereign large language model (LLM) developed as part of an initiative to establish value-aligned AI infrastructure tailored specifically to Hong Kong. To address the region's unique multilingual environment (Cantonese, Mandarin, and English), its distinct socio-legal context under the "one country, two systems" framework, and specific local cultural and value considerations, the model is built upon the DeepSeek architecture and systematically aligned with regional norms through a multifaceted full-parameter fine-tuning process. It is further integrated with a retrieval-augmented generation (RAG) system to ensure timely and factually grounded access to information. The core contribution lies in the design and implementation of a comprehensive, region-specific AI alignment and safety framework, demonstrated through two key achievements: 1) the successful development of HKGAI-V1 itself, which outperforms general-purpose models in handling Hong Kong-specific, culturally sensitive queries and embodies a "governance-embedded" approach to digital sovereignty, empowering Hong Kong to exercise control over AI applications in critical sectors including public services, legal systems, and education; and 2) the development of the proprietary Adversarial HK Value Benchmark, a rigorous tool for evaluating model alignment with local ethical and legal standards under challenging conditions. By documenting these achievements, the paper provides not only a technological artifact but also a replicable blueprint for developing advanced, regionally focused AI systems deeply rooted in their local identities.

Subjects:	Computation and Language (cs.CL); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
Cite as:	arXiv:2507.11502 [cs.CL]
	(or arXiv:2507.11502v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2507.11502

Submission history

From: Sirui Han
[v1] Mon, 14 Jul 2025 15:09:05 UTC (1,092 KB)
