Identifying Obfuscated Code through Graph-Based Semantic Analysis of Binary Code

Cohen, Roxane; David, Robin; Yger, Florian; Rossi, Fabrice

Computer Science > Cryptography and Security

arXiv:2504.01481 (cs)

[Submitted on 2 Apr 2025]

Title: Identifying Obfuscated Code through Graph-Based Semantic Analysis of Binary Code

Authors: Roxane Cohen (LAMSADE), Robin David, Florian Yger (LITIS), Fabrice Rossi (CEREMADE)

Abstract: Protecting sensitive program content is a critical issue in various situations, ranging from legitimate use cases to unethical contexts. Obfuscation is one of the most used techniques to ensure such protection. Consequently, attackers must first detect and characterize obfuscation before launching any attack against it. This paper investigates the problem of function-level obfuscation detection using graph-based approaches, comparing algorithms, from elementary baselines to promising techniques like GNN (Graph Neural Networks), on different feature choices. We consider various obfuscation types and obfuscators, resulting in two complex datasets. Our findings demonstrate that GNNs need meaningful features that capture aspects of function semantics to outperform baselines. Our approach shows satisfactory results, especially in a challenging 11-class classification task and in a practical malware analysis example.

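Comments:	The 13th International Conference on Complex Networks and Their Applications, Dec 2024, Istanbul, Turkey
Subjects:	Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2504.01481 [cs.CR]
	(or arXiv:2504.01481v1 [cs.CR] for this version)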
	https://doi.org/10.48550/arXiv.2504.01481

Submission history

From: Fabrice Rossi
[v1] Wed, 2 Apr 2025 08:36:27 UTC (57 KB)
