HARPT: A Corpus for Analyzing Consumers' Trust and Privacy Concerns in Electronic Health Apps

Kelly, Timoteo; Korkmaz, Abdulkadir; Mallet, Samuel; Souders, Connor; Aliakbarpour, Sadra; Rao, Praveen

arXiv:2506.19268 (cs)

[Submitted on 24 Jun 2025 (v1), last revised 20 Sep 2025 (this version, v3)]

Authors: Timoteo Kelly, Abdulkadir Korkmaz, Samuel Mallet, Connor Souders, Sadra Aliakbarpour, Praveen Rao

Abstract: We present Health App Reviews for Privacy & Trust (HARPT), a large-scale annotated corpus of user reviews from Electronic Health (eHealth) applications (apps) aimed at advancing research in user privacy and trust. The dataset comprises 480K user reviews labeled in seven categories that capture critical aspects of trust in applications (TA), trust in providers (TP), and privacy concerns (PC). Our multistage strategy integrated keyword-based filtering, iterative manual labeling with review, targeted data augmentation, and weak supervision using transformer-based classifiers. In parallel, we manually annotated a curated subset of 7,000 reviews to support the development and evaluation of machine learning models. We benchmarked a broad range of models, providing a baseline for future work. HARPT is released under an open resource license to support reproducible research in usable privacy and trust in digital libraries and health informatics.
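As a rough illustration of the keyword-based filtering stage mentioned in the abstract, the following minimal sketch selects candidate reviews that mention any seed term. The keyword list, the function name `keyword_filter`, and the sample reviews are hypothetical; the abstract does not state which keywords or matching rules the authors used.

```python
import re

# Hypothetical seed keywords; the abstract does not list the ones used for HARPT.
KEYWORDS = ("privacy", "personal data", "trust", "secure", "tracking")

# One case-insensitive pattern with word boundaries around each keyword.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def keyword_filter(reviews):
    """Return the reviews that mention at least one seed keyword."""
    return [r for r in reviews if PATTERN.search(r)]


# Made-up example reviews, for illustration only.
sample = [
    "Great app, easy to book appointments.",
    "I do not trust this app with my personal data.",
    "Why does it need tracking permissions?",
]
candidates = keyword_filter(sample)
```

A filter like this only narrows the pool of reviews; in the pipeline the abstract describes, manual labeling and transformer-based weak supervision would still be needed to assign the actual categories.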

Subjects:	Human-Computer Interaction (cs.HC); Cryptography and Security (cs.CR); Emerging Technologies (cs.ET); Machine Learning (cs.LG)
Cite as:	arXiv:2506.19268 [cs.HC]
	(or arXiv:2506.19268v3 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2506.19268

Submission history

From: Timoteo Kelly
[v1] Tue, 24 Jun 2025 02:59:14 UTC (214 KB)
[v2] Thu, 26 Jun 2025 15:23:54 UTC (214 KB)
[v3] Sat, 20 Sep 2025 07:58:59 UTC (73 KB)
