SPDZCoder: Combining Expert Knowledge with LLMs for Generating Privacy-Computing Code

Dong, Xiaoning; Xin, Peilin; Li, Jia; Xu, Wei

arXiv:2501.00363 (cs)

[Submitted on 31 Dec 2024 (v1), last revised 21 Mar 2025 (this version, v2)]

Title: SPDZCoder: Combining Expert Knowledge with LLMs for Generating Privacy-Computing Code

Authors: Xiaoning Dong, Peilin Xin, Jia Li, Wei Xu

Abstract: Privacy computing is receiving increasing attention, but writing privacy-computing code remains challenging for developers. Library functions are limited, so functionality often has to be implemented from scratch, and the data-oblivious requirement runs counter to programmers' intuition and usual practice. Automating the generation of privacy-computing code with Large Language Models (LLMs) can streamline development and lower the barrier to using privacy-computing frameworks. However, existing LLMs still struggle with code translation for privacy-preserving computation, such as translating Python to MP-SPDZ, because the MP-SPDZ data needed for effective pre-training or fine-tuning is scarce. Moreover, the lack of a benchmark further complicates the evaluation of translation quality. To address these limitations, this work proposes SPDZCoder, a rule-based framework that combines LLMs with expert knowledge to generate privacy-computing code without requiring additional training data. Specifically, SPDZCoder employs a rigorous procedure for collecting high-quality expert knowledge that captures the differences in how Python and MP-SPDZ express semantics, and derives transformation rules for translating Python to MP-SPDZ from this knowledge. SPDZCoder then progressively converts Python code into MP-SPDZ code using these transformation rules in a three-stage pipeline. To evaluate SPDZCoder, we manually constructed a benchmark dataset, SPDZEval, which comprises six data splits, each representing a distinct class of challenging tasks in MP-SPDZ implementation. Extensive experiments show that SPDZCoder significantly surpasses the baselines in pass@1 and pass@2. Specifically, SPDZCoder attains an overall correctness of 85.94% in pass@1 and 92.01% in pass@2, whereas the best-performing baseline achieves 63.58% and 76.36%, respectively.
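The data-oblivious requirement mentioned in the abstract can be illustrated with a small sketch. It is not taken from the paper and does not use the MP-SPDZ API: plain Python integers stand in for secret values, and the function names are hypothetical. The point is only to contrast the usual branching style with a style in which the sequence of operations and memory accesses does not depend on the data.

```python
# Illustrative sketch only: plain Python ints stand in for secret values.
# All function names here are hypothetical; none belong to MP-SPDZ or SPDZCoder.

def max_branching(a: int, b: int) -> int:
    """Conventional style: which branch runs depends on the data."""
    if a > b:
        return a
    return b


def max_oblivious(a: int, b: int) -> int:
    """Data-oblivious style: the same arithmetic runs for every input."""
    gt = int(a > b)  # in a real MPC framework this would be a secret 0/1 value
    return gt * a + (1 - gt) * b  # arithmetic selection instead of a branch


def lookup_oblivious(values: list, index: int) -> int:
    """Read values[index] while touching every element,
    so the access pattern does not depend on index."""
    result = 0
    for i, v in enumerate(values):
        hit = int(i == index)  # 1 only at the requested position
        result += hit * v
    return result
```

Both oblivious variants return the same results as their conventional counterparts, but they replace `if` statements and direct indexing with arithmetic over 0/1 flags. This is the kind of rewriting that, per the abstract, conflicts with programmers' usual habits.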

Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Cite as:	arXiv:2501.00363 [cs.CR]
	(or arXiv:2501.00363v2 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2501.00363

Submission history

From: Xiaoning Dong
[v1] Tue, 31 Dec 2024 09:29:38 UTC (8,304 KB)
[v2] Fri, 21 Mar 2025 12:52:57 UTC (1,294 KB)
