HARPT: A Corpus for Analyzing Consumers' Trust and Privacy Concerns in Mobile Health Apps

Kelly, Timoteo; Korkmaz, Abdulkadir; Mallet, Samuel; Souders, Connor; Aliakbarpour, Sadra; Rao, Praveen

arXiv:2506.19268 (cs)

[Submitted on 24 Jun 2025 (v1), last revised 26 Jun 2025 (this version, v2)]

Authors: Timoteo Kelly, Abdulkadir Korkmaz, Samuel Mallet, Connor Souders, Sadra Aliakbarpour, Praveen Rao

Abstract: We present HARPT, a large-scale annotated corpus of mobile health app store reviews aimed at advancing research in user privacy and trust. The dataset comprises over 480,000 user reviews labeled into seven categories that capture critical aspects of trust in applications, trust in providers, and privacy concerns. Creating HARPT required addressing multiple complexities, such as defining a nuanced label schema, isolating relevant content from large volumes of noisy data, and designing an annotation strategy that balanced scalability with accuracy. This strategy integrated rule-based filtering, iterative manual labeling with review, targeted data augmentation, and weak supervision using transformer-based classifiers to accelerate coverage. In parallel, a carefully curated subset of 7,000 reviews was manually annotated to support model development and evaluation. We benchmark a broad range of classification models, demonstrating that strong performance is achievable and providing a baseline for future research. HARPT is released as a public resource to support work in health informatics, cybersecurity, and natural language processing.

Subjects:	Human-Computer Interaction (cs.HC); Cryptography and Security (cs.CR); Emerging Technologies (cs.ET); Machine Learning (cs.LG)
Cite as:	arXiv:2506.19268 [cs.HC]
	(or arXiv:2506.19268v2 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2506.19268

Submission history

From: Timoteo Kelly
[v1] Tue, 24 Jun 2025 02:59:14 UTC (214 KB)
[v2] Thu, 26 Jun 2025 15:23:54 UTC (214 KB)
