Showing 1–32 of 32 results for author: Camgöz, N C

  1. POET: Prompt Offset Tuning for Continual Human Action Adaptation

    Authors: Prachi Garg, Joseph K J, Vineeth N Balasubramanian, Necati Cihan Camgoz, Chengde Wan, Kenrick Kin, Weiguang Si, Shugao Ma, Fernando De La Torre

    Abstract: As extended reality (XR) is redefining how users interact with computing devices, research in human action recognition is gaining prominence. Typically, models deployed on immersive computing devices are static and limited to their default set of classes. The goal of our research is to provide users and developers with the capability to personalize their experience by adding new action classes to…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: ECCV 2024 (Oral), webpage https://humansensinglab.github.io/POET-continual-action-recognition/

    Journal ref: ECCV 2024, Lecture Notes in Computer Science, vol. 15122, Springer, 2025, pp. 436-455

  2. arXiv:2504.13915  [pdf, other]

    cs.CV

    Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding

    Authors: Dibyadip Chatterjee, Edoardo Remelli, Yale Song, Bugra Tekin, Abhay Mittal, Bharat Bhatnagar, Necati Cihan Camgöz, Shreyas Hampali, Eric Sauser, Shugao Ma, Angela Yao, Fadime Sener

    Abstract: We introduce ProVideLLM, an end-to-end framework for real-time procedural video understanding. ProVideLLM integrates a multimodal cache configured to store two types of tokens - verbalized text tokens, which provide compressed textual summaries of long-term observations, and visual tokens, encoded with DETR-QFormer to capture fine-grained details from short-term observations. This design reduces t…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: 13 pages, 5 figures; https://dibschat.github.io/ProVideLLM

  3. arXiv:2503.08529  [pdf, other]

    cs.CV

    SignRep: Enhancing Self-Supervised Sign Representations

    Authors: Ryan Wong, Necati Cihan Camgoz, Richard Bowden

    Abstract: Sign language representation learning presents unique challenges due to the complex spatio-temporal nature of signs and the scarcity of labeled datasets. Existing methods often rely either on models pre-trained on general visual tasks, that lack sign-specific features, or use complex multimodal and multi-branch architectures. To bridge this gap, we introduce a scalable, self-supervised framework f…

    Submitted 11 March, 2025; originally announced March 2025.

  4. arXiv:2412.08274  [pdf, other]

    cs.CL

    2M-BELEBELE: Highly Multilingual Speech and American Sign Language Comprehension Dataset

    Authors: Marta R. Costa-jussà, Bokai Yu, Pierre Andrews, Belen Alastruey, Necati Cihan Camgoz, Joe Chuang, Jean Maillard, Christophe Ropers, Arina Turkantenko, Carleigh Wood

    Abstract: We introduce the first highly multilingual speech and American Sign Language (ASL) comprehension dataset by extending BELEBELE. Our dataset covers 74 spoken languages at the intersection of BELEBELE and FLEURS, and one sign language (ASL). We evaluate the 2M-BELEBELE dataset in both 5-shot and zero-shot settings; across languages, speech comprehension accuracy is on average ~2-3% lower compared…

    Submitted 23 December, 2024; v1 submitted 11 December, 2024; originally announced December 2024.

    ACM Class: I.2.7

  5. arXiv:2405.04164  [pdf, other]

    cs.CV

    Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation

    Authors: Ryan Wong, Necati Cihan Camgoz, Richard Bowden

    Abstract: Automatic Sign Language Translation requires the integration of both computer vision and natural language processing to effectively bridge the communication gap between sign and spoken languages. However, the deficiency in large-scale training data to support sign language translation means we need to leverage resources from spoken language. We introduce Sign2GPT, a novel framework for sign langu…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted at ICLR2024

  6. arXiv:2403.10434  [pdf, other]

    cs.CV

    Using an LLM to Turn Sign Spottings into Spoken Language Sentences

    Authors: Ozge Mercanoglu Sincan, Necati Cihan Camgoz, Richard Bowden

    Abstract: Sign Language Translation (SLT) is a challenging task that aims to generate spoken language sentences from sign language videos. In this paper, we introduce a hybrid SLT approach, Spotter+GPT, that utilizes a sign spotter and a powerful Large Language Model (LLM) to improve SLT performance. Spotter+GPT breaks down the SLT task into two stages. The videos are first processed by the Spotter, which i…

    Submitted 14 June, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  7. arXiv:2402.09611  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Towards Privacy-Aware Sign Language Translation at Scale

    Authors: Phillip Rust, Bowen Shi, Skyler Wang, Necati Cihan Camgöz, Jean Maillard

    Abstract: A major impediment to the advancement of sign language translation (SLT) is data scarcity. Much of the sign language data currently available on the web cannot be used for training supervised models due to the lack of aligned captions. Furthermore, scaling SLT using large-scale web-scraped datasets bears privacy risks due to the presence of biometric information, which the responsible development…

    Submitted 7 August, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: ACL 2024

  8. arXiv:2308.09622  [pdf, other]

    cs.CV

    Is context all you need? Scaling Neural Sign Language Translation to Large Domains of Discourse

    Authors: Ozge Mercanoglu Sincan, Necati Cihan Camgoz, Richard Bowden

    Abstract: Sign Language Translation (SLT) is a challenging task that aims to generate spoken language sentences from sign language videos, both of which have different grammar and word/gloss order. From a Neural Machine Translation (NMT) perspective, the straightforward way of training translation models is to use sign language phrase-spoken language sentence pairs. However, human interpreters heavily rely…

    Submitted 18 August, 2023; originally announced August 2023.

  9. arXiv:2308.09515  [pdf, other]

    cs.CV

    Learnt Contrastive Concept Embeddings for Sign Recognition

    Authors: Ryan Wong, Necati Cihan Camgoz, Richard Bowden

    Abstract: In natural language processing (NLP) of spoken languages, word embeddings have been shown to be a useful method to encode the meaning of words. Sign languages are visual languages, which require sign embeddings to capture the visual and linguistic semantics of sign. Unlike many common approaches to Sign Recognition, we focus on explicitly creating sign embeddings that bridge the gap between sign l…

    Submitted 18 August, 2023; originally announced August 2023.

  10. arXiv:2210.00951  [pdf, other]

    cs.CV

    Hierarchical I3D for Sign Spotting

    Authors: Ryan Wong, Necati Cihan Camgöz, Richard Bowden

    Abstract: Most of the vision-based sign language research to date has focused on Isolated Sign Language Recognition (ISLR), where the objective is to predict a single sign class given a short video clip. Although there has been significant progress in ISLR, its real-life applications are limited. In this paper, we focus on the challenging task of Sign Spotting instead, where the goal is to simultaneously id…

    Submitted 3 October, 2022; originally announced October 2022.

  11. arXiv:2203.15354  [pdf, other]

    cs.CV

    Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production

    Authors: Ben Saunders, Necati Cihan Camgoz, Richard Bowden

    Abstract: Sign languages are visual languages, with vocabularies as rich as their spoken language counterparts. However, current deep-learning based Sign Language Production (SLP) models produce under-articulated skeleton pose sequences from constrained vocabularies and this limits applicability. To be understandable and accepted by the deaf, an automatic SLP system must be able to generate co-articulated p…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: arXiv admin note: text overlap with arXiv:2011.09846

  12. arXiv:2202.09096  [pdf, other]

    cs.LG stat.ME stat.ML

    A Free Lunch with Influence Functions? Improving Neural Network Estimates with Concepts from Semiparametric Statistics

    Authors: Matthew J. Vowels, Sina Akbari, Necati Cihan Camgoz, Richard Bowden

    Abstract: Parameter estimation in empirical fields is usually undertaken using parametric models, and such models readily facilitate statistical inference. Unfortunately, they are unlikely to be sufficiently flexible to be able to adequately model real-world phenomena, and may yield biased estimates. Conversely, non-parametric approaches are flexible but do not readily facilitate statistical inference and m…

    Submitted 10 June, 2022; v1 submitted 18 February, 2022; originally announced February 2022.

  13. arXiv:2112.05277  [pdf, other]

    cs.CV cs.CL

    Skeletal Graph Self-Attention: Embedding a Skeleton Inductive Bias into Sign Language Production

    Authors: Ben Saunders, Necati Cihan Camgoz, Richard Bowden

    Abstract: Recent approaches to Sign Language Production (SLP) have adopted spoken language Neural Machine Translation (NMT) architectures, applied without sign-specific modifications. In addition, these works represent sign language as a sequence of skeleton pose vectors, projected to an abstract representation with no inherent skeletal structure. In this paper, we represent sign language sequences as a ske…

    Submitted 6 December, 2021; originally announced December 2021.

  14. arXiv:2108.04229  [pdf, other]

    cs.CV

    Looking for the Signs: Identifying Isolated Sign Instances in Continuous Video Footage

    Authors: Tao Jiang, Necati Cihan Camgoz, Richard Bowden

    Abstract: In this paper, we focus on the task of one-shot sign spotting, i.e. given an example of an isolated sign (query), we want to identify whether/where this sign appears in a continuous, co-articulated sign language video (target). To achieve this goal, we propose a transformer-based network, called SignLookup. We employ 3D Convolutional Neural Networks (CNNs) to extract spatio-temporal representation…

    Submitted 20 November, 2021; v1 submitted 21 July, 2021; originally announced August 2021.

    Comments: 8 pages, 2 figures

  15. arXiv:2107.11317  [pdf, other]

    cs.CV

    Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives

    Authors: Ben Saunders, Necati Cihan Camgoz, Richard Bowden

    Abstract: It is common practice to represent spoken languages at their phonetic level. However, for sign languages, this implies breaking motion into its constituent motion primitives. Avatar based Sign Language Production (SLP) has traditionally done just this, building up animation from sequences of hand motions, shapes and facial expressions. However, more recent deep learning based solutions to SLP have…

    Submitted 26 July, 2021; v1 submitted 23 July, 2021; originally announced July 2021.

    Journal ref: International Conference of Computer Vision (ICCV 2021)

  16. arXiv:2107.10685  [pdf, other]

    cs.CV

    AnonySIGN: Novel Human Appearance Synthesis for Sign Language Video Anonymisation

    Authors: Ben Saunders, Necati Cihan Camgoz, Richard Bowden

    Abstract: The visual anonymisation of sign language data is an essential task to address privacy concerns raised by large-scale dataset collection. Previous anonymisation techniques have either significantly affected sign comprehension or required manual, labour-intensive work. In this paper, we formally introduce the task of Sign Language Video Anonymisation (SLVA) as an automatic method to anonymise the…

    Submitted 23 July, 2021; v1 submitted 22 July, 2021; originally announced July 2021.

    Journal ref: Face and Gesture Conference 2021

  17. arXiv:2105.02351  [pdf, other]

    cs.CV cs.CL

    Content4All Open Research Sign Language Translation Datasets

    Authors: Necati Cihan Camgoz, Ben Saunders, Guillaume Rochette, Marco Giovanelli, Giacomo Inches, Robin Nachtrab-Ribback, Richard Bowden

    Abstract: Computational sign language research lacks the large-scale datasets that enable the creation of useful real-life applications. To date, most research has been limited to prototype systems on small domains of discourse, e.g. weather forecasts. To address this issue and to push the field forward, we release six datasets comprised of 190 hours of footage on the larger domain of news. From this, 20 ho…

    Submitted 5 May, 2021; originally announced May 2021.

  18. arXiv:2104.11712  [pdf, other]

    cs.CV

    Skeletor: Skeletal Transformers for Robust Body-Pose Estimation

    Authors: Tao Jiang, Necati Cihan Camgoz, Richard Bowden

    Abstract: Predicting 3D human pose from a single monoscopic video can be highly challenging due to factors such as low resolution, motion blur and occlusion, in addition to the fundamental ambiguity in estimating 3D from 2D. Approaches that directly regress the 3D pose from independent images can be particularly susceptible to these factors and result in jitter, noise and/or inconsistencies in skeletal esti…

    Submitted 23 April, 2021; originally announced April 2021.

  19. arXiv:2104.10166  [pdf, other]

    cs.CL

    Evaluating the Immediate Applicability of Pose Estimation for Sign Language Recognition

    Authors: Amit Moryossef, Ioannis Tsochantaridis, Joe Dinn, Necati Cihan Camgöz, Richard Bowden, Tao Jiang, Annette Rios, Mathias Müller, Sarah Ebling

    Abstract: Signed languages are visual languages produced by the movement of the hands, face, and body. In this paper, we evaluate representations based on skeleton poses, as these are explainable, person-independent, privacy-preserving, low-dimensional representations. Basically, skeletal representations generalize over an individual's appearance and background, allowing us to focus on the recognition of mo…

    Submitted 20 April, 2021; originally announced April 2021.

  20. arXiv:2104.08183  [pdf, other]

    cs.CV stat.AP stat.ML

    Shadow-Mapping for Unsupervised Neural Causal Discovery

    Authors: Matthew J. Vowels, Necati Cihan Camgoz, Richard Bowden

    Abstract: An important goal across most scientific fields is the discovery of causal structures underlying a set of observations. Unfortunately, causal discovery methods which are based on correlation or mutual information can often fail to identify causal links in systems which exhibit dynamic relationships. Such dynamic systems (including the famous coupled logistic map) exhibit 'mirage' correlations which…

    Submitted 28 April, 2021; v1 submitted 16 April, 2021; originally announced April 2021.

  21. arXiv:2103.07292  [pdf, other]

    cs.CV cs.LG

    VDSM: Unsupervised Video Disentanglement with State-Space Modeling and Deep Mixtures of Experts

    Authors: Matthew J. Vowels, Necati Cihan Camgoz, Richard Bowden

    Abstract: Disentangled representations support a range of downstream tasks including causal reasoning, generative modeling, and fair machine learning. Unfortunately, disentanglement has been shown to be impossible without the incorporation of supervision or inductive bias. Given that supervision is often expensive or infeasible to acquire, we choose to incorporate structural inductive bias and present an un…

    Submitted 15 December, 2021; v1 submitted 12 March, 2021; originally announced March 2021.

  22. arXiv:2103.06982  [pdf, other]

    cs.CV

    Continuous 3D Multi-Channel Sign Language Production via Progressive Transformers and Mixture Density Networks

    Authors: Ben Saunders, Necati Cihan Camgoz, Richard Bowden

    Abstract: Sign languages are multi-channel visual languages, where signers use a continuous 3D space to communicate. Sign Language Production (SLP), the automatic translation from spoken to sign languages, must embody both the continuous articulation and full morphology of sign to be truly understandable by the Deaf community. Previous deep learning-based SLP works have produced only a concatenation of isola…

    Submitted 11 March, 2021; originally announced March 2021.

  23. arXiv:2103.02582  [pdf, other]

    cs.LG stat.ME stat.ML

    D'ya like DAGs? A Survey on Structure Learning and Causal Discovery

    Authors: Matthew J. Vowels, Necati Cihan Camgoz, Richard Bowden

    Abstract: Causal reasoning is a crucial part of science and human intelligence. In order to discover causal relationships from data, we need structure discovery methods. We provide a review of background theory and a survey of methods for structure discovery. We primarily focus on modern, continuous optimization methods, and provide reference to further resources such as benchmark datasets and software pack…

    Submitted 4 March, 2021; v1 submitted 3 March, 2021; originally announced March 2021.

    Comments: 35 pages

  24. arXiv:2011.09846  [pdf, other]

    cs.CV cs.CL cs.LG

    Everybody Sign Now: Translating Spoken Language to Photo Realistic Sign Language Video

    Authors: Ben Saunders, Necati Cihan Camgoz, Richard Bowden

    Abstract: To be truly understandable and accepted by Deaf communities, an automatic Sign Language Production (SLP) system must generate a photo-realistic signer. Prior approaches based on graphical avatars have proven unpopular, whereas recent neural SLP works that produce skeleton pose sequences have been shown to be not understandable to Deaf viewers. In this paper, we propose SignGAN, the first SLP mod…

    Submitted 26 November, 2020; v1 submitted 19 November, 2020; originally announced November 2020.

  25. arXiv:2009.13472  [pdf, other]

    stat.ML cs.AI cs.LG

    Targeted VAE: Variational and Targeted Learning for Causal Inference

    Authors: Matthew James Vowels, Necati Cihan Camgoz, Richard Bowden

    Abstract: Undertaking causal inference with observational data is incredibly useful across a wide range of tasks including the development of medical treatments, advertisements and marketing, and policy making. There are two significant challenges associated with undertaking causal inference using observational data: treatment assignment heterogeneity (i.e., differences between the treated and untr…

    Submitted 15 January, 2022; v1 submitted 28 September, 2020; originally announced September 2020.

  26. arXiv:2009.00299  [pdf, other]

    cs.CV

    Multi-channel Transformers for Multi-articulatory Sign Language Translation

    Authors: Necati Cihan Camgoz, Oscar Koller, Simon Hadfield, Richard Bowden

    Abstract: Sign languages use multiple asynchronous information channels (articulators), not just the hands but also the face and body, which computational approaches often ignore. In this paper we tackle the multi-articulatory sign language translation task and propose a novel multi-channel transformer architecture. The proposed architecture allows both the inter and intra contextual relationships between d…

    Submitted 1 September, 2020; originally announced September 2020.

  27. arXiv:2008.12405  [pdf, other]

    cs.CV

    Adversarial Training for Multi-Channel Sign Language Production

    Authors: Ben Saunders, Necati Cihan Camgoz, Richard Bowden

    Abstract: Sign Languages are rich multi-channel languages, requiring articulation of both manual (hands) and non-manual (face and body) features in a precise, intricate manner. Sign Language Production (SLP), the automatic translation from spoken to sign languages, must embody this full sign morphology to be truly understandable by the Deaf community. Previous work has mainly focused on manual feature produ…

    Submitted 27 August, 2020; originally announced August 2020.

  28. arXiv:2004.14874  [pdf, other]

    cs.CV cs.CL cs.LG

    Progressive Transformers for End-to-End Sign Language Production

    Authors: Ben Saunders, Necati Cihan Camgoz, Richard Bowden

    Abstract: The goal of automatic Sign Language Production (SLP) is to translate spoken language to a continuous stream of sign language video at a level comparable to a human translator. If this were achievable, then it would revolutionise Deaf-hearing communications. Previous work on predominantly isolated SLP has shown the need for architectures that are better suited to the continuous domain of full sign s…

    Submitted 20 July, 2020; v1 submitted 30 April, 2020; originally announced April 2020.

  29. arXiv:2004.01283  [pdf, other]

    cs.CV

    BosphorusSign22k Sign Language Recognition Dataset

    Authors: Oğulcan Özdemir, Ahmet Alp Kındıroğlu, Necati Cihan Camgöz, Lale Akarun

    Abstract: Sign Language Recognition is a challenging research domain. It has recently seen several advancements with the increased availability of data. In this paper, we introduce the BosphorusSign22k, a publicly available large scale sign language dataset aimed at computer vision, video recognition and deep learning research communities. The primary objective of this dataset is to serve as a new benchmark…

    Submitted 9 April, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

    Comments: 8 pages

  30. arXiv:2003.13830  [pdf, other]

    cs.CV cs.CL cs.HC cs.LG

    Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation

    Authors: Necati Cihan Camgoz, Oscar Koller, Simon Hadfield, Richard Bowden

    Abstract: Prior work on Sign Language Translation has shown that having a mid-level sign gloss representation (effectively recognizing the individual signs) improves the translation performance drastically. In fact, the current state-of-the-art in translation requires gloss level tokenization in order to work. We introduce a novel transformer based architecture that jointly learns Continuous Sign Language R…

    Submitted 30 March, 2020; originally announced March 2020.

  31. arXiv:2002.11576  [pdf, other]

    cs.LG stat.ML

    NestedVAE: Isolating Common Factors via Weak Supervision

    Authors: Matthew J. Vowels, Necati Cihan Camgoz, Richard Bowden

    Abstract: Fair and unbiased machine learning is an important and active field of research, as decision processes are increasingly driven by models that learn from data. Unfortunately, any biases present in the data may be learned by the model, thereby inappropriately transferring that bias into the decision making process. We identify the connection between the task of bias reduction and that of isolating f…

    Submitted 26 February, 2020; originally announced February 2020.

  32. arXiv:1911.06443  [pdf, other]

    cs.CV cs.LG

    Gated Variational AutoEncoders: Incorporating Weak Supervision to Encourage Disentanglement

    Authors: Matthew J. Vowels, Necati Cihan Camgoz, Richard Bowden

    Abstract: Variational AutoEncoders (VAEs) provide a means to generate representational latent embeddings. Previous research has highlighted the benefits of achieving representations that are disentangled, particularly for downstream tasks. However, there is some debate about how to encourage disentanglement with VAEs and evidence indicates that existing implementations of VAEs do not achieve disentanglement…

    Submitted 14 November, 2019; originally announced November 2019.