Gender inference: can chatGPT outperform common commercial tools?

Alexopoulos, Michelle; Lyons, Kelly; Mahetaji, Kaushar; Barnes, Marcus Emmanuel; Gutwillinger, Rogan


arXiv:2312.00805 (cs)

[Submitted on 24 Nov 2023]


Abstract: An increasing number of studies use gender information to understand phenomena such as gender bias, inequity in access and participation, or the impact of the Covid pandemic response. Unfortunately, most datasets do not include self-reported gender information, making it necessary for researchers to infer gender from other information, such as names or names and country information. An important limitation of the tools used for this inference is that they fail to appropriately capture the fact that gender exists on a non-binary scale; however, it remains important to evaluate and compare how well these tools perform in a variety of contexts. In this paper, we compare the performance of a generative Artificial Intelligence (AI) tool, ChatGPT, with three commercially available list-based and machine learning-based gender inference tools (Namsor, Gender-API, and genderize.io) on a unique dataset. Specifically, we use a large Olympic athlete dataset and report how variations in the input (e.g., first name alone versus first and last name, with and without country information) impact the accuracy of their predictions. We report results for the full set, as well as for the subsets: medal versus non-medal winners, athletes from the largest English-speaking countries, and athletes from East Asia. On these sets, we find that Namsor is the best traditional commercially available tool. However, ChatGPT performs at least as well as Namsor and often outperforms it, especially for the female sample when country and/or last name information is available. All tools perform better on medalists than on non-medalists and on names from English-speaking countries. Although not designed for this purpose, ChatGPT may be a cost-effective tool for gender prediction. In the future, it might even be possible for ChatGPT or other large-scale language models to better identify self-reported gender rather than report gender on a binary scale.

Comments:	14 pages, 8 tables
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2312.00805 [cs.CL]
	(or arXiv:2312.00805v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2312.00805
Journal reference:	Proceedings of CASCON 2023, ACM, New York, NY, USA, 161-166

Submission history

From: Kelly Lyons
[v1] Fri, 24 Nov 2023 22:09:14 UTC (86 KB)
