SignNet: Single Channel Sign Generation using Metric Embedded Learning

Ananthanarayana, Tejaswini; Chaudhary, Lipisha; Nwogu, Ifeoma

arXiv:2212.02848 (cs)

[Submitted on 6 Dec 2022]

Title: SignNet: Single Channel Sign Generation using Metric Embedded Learning

Authors: Tejaswini Ananthanarayana, Lipisha Chaudhary, Ifeoma Nwogu

Abstract: A true interpreting agent not only understands sign language and translates to text, but also understands text and translates to signs. Much of the AI work in sign language translation to date has focused mainly on translating from signs to text. Towards the latter goal, we propose a text-to-sign translation model, SignNet, which exploits the notion of similarity (and dissimilarity) of visual signs in translating. The module presented is only one part of a dual-learning, two-task process involving text-to-sign (T2S) as well as sign-to-text (S2T). We currently implement SignNet as a single-channel architecture so that the output of the T2S task can be fed into S2T in a continuous dual-learning framework. By single channel, we refer to a single modality, the body pose joints. In this work, we present SignNet, a T2S model that uses a novel metric embedding learning process to preserve the distances between sign embeddings relative to their dissimilarity. We also describe how to choose positive and negative examples of signs for similarity testing. From our analysis, we observe that metric embedding learning-based models perform significantly better than models trained with traditional losses when evaluated using BLEU scores. In the gloss-to-pose task, SignNet performed as well as its state-of-the-art (SoTA) counterparts, and it outperformed them in the text-to-pose task, with notable gains in BLEU 1 through BLEU 4 scores (BLEU 1: 31->39, ~26% improvement; BLEU 4: 10.43->11.84, ~14% improvement) when tested on the popular RWTH PHOENIX-Weather-2014T benchmark dataset.
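
The abstract does not give the exact form of the metric embedding loss. As a rough illustration of how positive and negative examples can shape an embedding space, the sketch below uses a standard triplet margin loss. This is an assumed stand-in, not necessarily the loss used in SignNet. The function names, the margin value, and the toy 2-D embeddings are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (an assumption, not the paper's stated loss).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is; otherwise it grows linearly.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    return max(0.0, gap)


# Toy 2-D "sign embeddings" with made-up values, for illustration only.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]  # stands in for a visually similar sign
negative = [3.0, 4.0]  # stands in for a visually dissimilar sign

# Similar sign is close and dissimilar sign is far: no penalty.
well_separated = triplet_margin_loss(anchor, positive, negative)

# Roles swapped, so the "positive" is far and the "negative" is close: penalized.
violating = triplet_margin_loss(anchor, negative, positive)
```

In this toy setup the loss vanishes when the similar sign already sits closer to the anchor than the dissimilar one by at least the margin, which is the general behavior a distance-preserving embedding objective aims for.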

Comments:	9 pages, 4 figures, 4 tables - IEEE Face and Gesture, 2023
Subjects:	Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2212.02848 [cs.HC]
	(or arXiv:2212.02848v1 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2212.02848

Submission history

From: Ifeoma Nwogu
[v1] Tue, 6 Dec 2022 09:37:01 UTC (4,333 KB)
