Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues

Jang, Youngjoon; Raajesh, Haran; Momeni, Liliane; Varol, Gül; Zisserman, Andrew

arXiv:2501.09754 (cs)

[Submitted on 16 Jan 2025 (v1), last revised 29 Mar 2025 (this version, v2)]

Abstract: Our objective is to translate continuous sign language into spoken language text. Inspired by the way human interpreters rely on context for accurate translation, we incorporate additional contextual cues, together with the signing video, into a new translation framework. Specifically, besides visual sign recognition features that encode the input video, we integrate complementary textual information from (i) captions describing the background show, (ii) translations of previous sentences, and (iii) pseudo-glosses transcribing the signing. These cues are automatically extracted and fed, along with the visual features, to a pre-trained large language model (LLM), which we fine-tune to generate spoken language translations in text form. Through extensive ablation studies, we show the positive contribution of each input cue to the translation performance. We train and evaluate our approach on BOBSL, the largest British Sign Language dataset currently available. We show that our contextual approach significantly enhances the quality of the translations compared to previously reported results on BOBSL, and also to state-of-the-art methods that we implement as baselines. Furthermore, we demonstrate the generality of our approach by also applying it to How2Sign, an American Sign Language dataset, and achieve competitive results.
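
As a rough illustration of how the three textual cues listed in the abstract might be combined before being passed to an LLM, the sketch below assembles them into a single text prompt. Everything here is an assumption made for illustration: the `ContextCues` class, the `build_text_prompt` function, the `max_previous` parameter, and the prompt layout are hypothetical and are not taken from the paper. The visual sign recognition features, which the abstract says are supplied alongside the text, are left out of this sketch entirely.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContextCues:
    """Hypothetical container for the three textual cues named in the abstract."""
    background_caption: str = ""                                     # (i) caption of the background show
    previous_translations: List[str] = field(default_factory=list)   # (ii) earlier translated sentences
    pseudo_glosses: List[str] = field(default_factory=list)          # (iii) pseudo-glosses for the signing


def build_text_prompt(cues: ContextCues, max_previous: int = 2) -> str:
    """Join the available cues into one prompt string (assumed layout, not the paper's)."""
    parts = []
    if cues.background_caption:
        parts.append(f"Background: {cues.background_caption}")
    # Keep only the most recent sentences so the context stays short.
    recent = cues.previous_translations[-max_previous:] if max_previous > 0 else []
    if recent:
        parts.append("Previous: " + " ".join(recent))
    if cues.pseudo_glosses:
        parts.append("Glosses: " + " ".join(cues.pseudo_glosses))
    # The model would be asked to continue the text after this marker.
    parts.append("Translation:")
    return "\n".join(parts)


if __name__ == "__main__":
    cues = ContextCues(
        background_caption="A presenter walks through a garden.",
        previous_translations=["Spring has arrived.", "The flowers are out."],
        pseudo_glosses=["BEE", "FLOWER", "VISIT"],
    )
    print(build_text_prompt(cues))
```

Cues that are missing are skipped, so the same function covers ablation-style settings in which only a subset of the cues is provided.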

Comments:	CVPR 2025 Camera Ready, Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2501.09754 [cs.CV]
	(or arXiv:2501.09754v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.09754

Submission history

From: Youngjoon Jang
[v1] Thu, 16 Jan 2025 18:59:03 UTC (21,457 KB)
[v2] Sat, 29 Mar 2025 09:02:32 UTC (21,404 KB)