
Showing 1–7 of 7 results for author: Ugan, E Y

  1. arXiv:2505.19679  [pdf, ps, other]

    cs.CL cs.AI

    KIT's Low-resource Speech Translation Systems for IWSLT2025: System Enhancement with Synthetic Data and Model Regularization

    Authors: Zhaolin Li, Yining Liu, Danni Liu, Tuan Nam Nguyen, Enes Yavuz Ugan, Tu Anh Dinh, Carlos Mullov, Alexander Waibel, Jan Niehues

    Abstract: This paper presents KIT's submissions to the IWSLT 2025 low-resource track. We develop both cascaded systems, consisting of Automatic Speech Recognition (ASR) and Machine Translation (MT) models, and end-to-end (E2E) Speech Translation (ST) systems for three language pairs: Bemba, North Levantine Arabic, and Tunisian Arabic into English. Building upon pre-trained models, we fine-tune our systems w…

    Submitted 26 May, 2025; originally announced May 2025.

  2. arXiv:2501.09512  [pdf, ps, other]

    cs.CL cs.LG

    PIER: A Novel Metric for Evaluating What Matters in Code-Switching

    Authors: Enes Yavuz Ugan, Ngoc-Quan Pham, Leonard Bärmann, Alex Waibel

    Abstract: Code-switching, the alternation of languages within a single discourse, presents a significant challenge for Automatic Speech Recognition. Despite the unique nature of the task, performance is commonly measured with established metrics such as Word-Error-Rate (WER). However, in this paper, we question whether these general metrics accurately assess performance on code-switching. Specifically, usin…

    Submitted 21 January, 2025; v1 submitted 16 January, 2025; originally announced January 2025.

    Comments: Accepted at ICASSP 2025

  3. arXiv:2410.11434  [pdf, other]

    cs.CL

    Titanic Calling: Low Bandwidth Video Conference from the Titanic Wreck

    Authors: Fevziye Irem Eyiokur, Christian Huber, Thai-Binh Nguyen, Tuan-Nam Nguyen, Fabian Retkowski, Enes Yavuz Ugan, Dogucan Yaman, Alexander Waibel

    Abstract: In this paper, we report on communication experiments conducted in the summer of 2022 during a deep dive to the wreck of the Titanic. Radio transmission is not possible in deep sea water, and communication links rely on sonar signals. Due to the low bandwidth of sonar signals and the need to communicate readable data, text messaging is used in deep-sea missions. In this paper, we report results an…

    Submitted 15 October, 2024; originally announced October 2024.

  4. arXiv:2308.03415  [pdf, other]

    cs.CL cs.AI

    End-to-End Evaluation for Low-Latency Simultaneous Speech Translation

    Authors: Christian Huber, Tu Anh Dinh, Carlos Mullov, Ngoc Quan Pham, Thai Binh Nguyen, Fabian Retkowski, Stefan Constantin, Enes Yavuz Ugan, Danni Liu, Zhaolin Li, Sai Koneru, Jan Niehues, Alexander Waibel

    Abstract: The challenge of low-latency speech translation has recently drawn significant interest in the research community, as shown by several publications and shared tasks. Therefore, it is essential to evaluate these different approaches in realistic scenarios. However, currently only specific aspects of the systems are evaluated and often it is not possible to compare different approaches. In this work…

    Submitted 17 July, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: Demo paper at EMNLP 2023

  5. arXiv:2306.05320  [pdf, other]

    cs.CL cs.SD

    KIT's Multilingual Speech Translation System for IWSLT 2023

    Authors: Danni Liu, Thai Binh Nguyen, Sai Koneru, Enes Yavuz Ugan, Ngoc-Quan Pham, Tuan-Nam Nguyen, Tu Anh Dinh, Carlos Mullov, Alexander Waibel, Jan Niehues

    Abstract: Many existing speech translation benchmarks focus on native-English speech in high-quality recording conditions, which often do not match the conditions in real-life use-cases. In this paper, we describe our speech translation system for the multilingual track of IWSLT 2023, which evaluates translation quality on scientific conference talks. The test condition features accented input speech and te…

    Submitted 12 July, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: IWSLT 2023

  6. arXiv:2210.08992  [pdf, other]

    cs.CL cs.SD eess.AS

    Language-agnostic Code-Switching in Sequence-To-Sequence Speech Recognition

    Authors: Enes Yavuz Ugan, Christian Huber, Juan Hussain, Alexander Waibel

    Abstract: Code-Switching (CS) refers to the phenomenon of alternately using words and phrases from different languages. While today's neural end-to-end (E2E) models deliver state-of-the-art performance on the task of automatic speech recognition (ASR), it is commonly known that these systems are very data-intensive. However, only a small amount of transcribed and aligned CS speech is available. To overcome t…

    Submitted 3 July, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: 18 pages

  7. arXiv:2210.01512  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Code-Switching without Switching: Language Agnostic End-to-End Speech Translation

    Authors: Christian Huber, Enes Yavuz Ugan, Alexander Waibel

    Abstract: We propose a) a Language Agnostic end-to-end Speech Translation model (LAST), and b) a data augmentation strategy to increase code-switching (CS) performance. With increasing globalization, multiple languages are increasingly used interchangeably during fluent speech. Such CS complicates traditional speech recognition and translation, as we must recognize which language was spoken first and then a…

    Submitted 9 November, 2022; v1 submitted 4 October, 2022; originally announced October 2022.