The Man Behind the Sound: Demystifying Audio Private Attribute Profiling via Multimodal Large Language Model Agents

Wang, Lixu; Yao, Kaixiang; Li, Xinfeng; Yang, Dong; Li, Haoyang; Wang, Xiaofeng; Dong, Wei

Computer Science > Cryptography and Security

arXiv:2507.10016 (cs)

[Submitted on 14 Jul 2025]

Authors: Lixu Wang, Kaixiang Yao, Xinfeng Li, Dong Yang, Haoyang Li, Xiaofeng Wang, Wei Dong

Abstract: Our research uncovers a novel privacy risk associated with multimodal large language models (MLLMs): the ability to infer sensitive personal attributes from audio data, a technique we term audio private attribute profiling. This capability poses a significant threat, as audio can be covertly captured without direct interaction or visibility. Moreover, compared to images and text, audio carries unique characteristics, such as tone and pitch, which can be exploited for more detailed profiling. However, two key challenges exist in understanding MLLM-employed private attribute profiling from audio: (1) the lack of audio benchmark datasets with sensitive attribute annotations and (2) the limited ability of current MLLMs to infer such attributes directly from audio. To address these challenges, we introduce AP^2, an audio benchmark dataset that consists of two subsets collected and composed from real-world data, both of which are annotated with sensitive attribute labels. Additionally, we propose Gifts, a hybrid multi-agent framework that leverages the complementary strengths of audio-language models (ALMs) and large language models (LLMs) to enhance inference capabilities. Gifts employs an LLM to guide the ALM in inferring sensitive attributes, then forensically analyzes and consolidates the ALM's inferences, overcoming severe hallucinations of existing ALMs in generating long-context responses. Our evaluations demonstrate that Gifts significantly outperforms baseline approaches in inferring sensitive attributes. Finally, we investigate model-level and data-level defense strategies to mitigate the risks of audio private attribute profiling. Our work validates the feasibility of audio-based privacy attacks using MLLMs, highlighting the need for robust defenses, and provides a dataset and framework to facilitate future research.

Comments:	22 pages, 4 figures
Subjects:	Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2507.10016 [cs.CR]
	(or arXiv:2507.10016v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2507.10016

Submission history

From: Lixu Wang
[v1] Mon, 14 Jul 2025 07:51:56 UTC (592 KB)
