
Showing 1–7 of 7 results for author: Prakash, J

Searching in archive eess.
  1. arXiv:2506.14434

    cs.SD cs.AI eess.AS

    Unifying Streaming and Non-streaming Zipformer-based ASR

    Authors: Bidisha Sharma, Karthik Pandia Durai, Shankar Venkatesan, Jeena J Prakash, Shashi Kumar, Malolan Chetlur, Andreas Stolcke

    Abstract: There has been increasing interest in unifying streaming and non-streaming automatic speech recognition (ASR) models to reduce development, training, and deployment costs. We present a unified framework that trains a single end-to-end ASR model for both streaming and non-streaming applications, leveraging future context information. We propose to use dynamic right-context through the chunked atten…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Accepted in ACL2025 Industry track

  2. arXiv:2506.11089

    eess.AS cs.AI cs.CL

    Better Pseudo-labeling with Multi-ASR Fusion and Error Correction by SpeechLLM

    Authors: Jeena Prakash, Blessingh Kumar, Kadri Hacioglu, Bidisha Sharma, Sindhuja Gopalan, Malolan Chetlur, Shankar Venkatesan, Andreas Stolcke

    Abstract: Automatic speech recognition (ASR) models rely on high-quality transcribed data for effective training. Generating pseudo-labels for large unlabeled audio datasets often relies on complex pipelines that combine multiple ASR outputs through multi-stage processing, leading to error propagation, information loss and disjoint optimization. We propose a unified multi-ASR prompt-driven framework using p…

    Submitted 5 June, 2025; originally announced June 2025.

  3. arXiv:2506.04981

    cs.CL cs.SD eess.AS

    Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering

    Authors: Andres Carofilis, Pradeep Rangappa, Srikanth Madikeri, Shashi Kumar, Sergio Burdisso, Jeena Prakash, Esau Villatoro-Tello, Petr Motlicek, Bidisha Sharma, Kadri Hacioglu, Shankar Venkatesan, Saurabh Vyas, Andreas Stolcke

    Abstract: Fine-tuning pretrained ASR models for specific domains is challenging when labeled data is scarce. But unlabeled audio and labeled data from related domains are often available. We propose an incremental semi-supervised learning pipeline that first integrates a small in-domain labeled set and an auxiliary dataset from a closely related domain, achieving a relative improvement of 4% over no auxilia…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: Accepted at Interspeech 2025, Netherlands

  4. arXiv:2506.03681

    cs.CL cs.SD eess.AS

    Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering

    Authors: Pradeep Rangappa, Andres Carofilis, Jeena Prakash, Shashi Kumar, Sergio Burdisso, Srikanth Madikeri, Esau Villatoro-Tello, Bidisha Sharma, Petr Motlicek, Kadri Hacioglu, Shankar Venkatesan, Saurabh Vyas, Andreas Stolcke

    Abstract: Fine-tuning pretrained ASR models for specific domains is challenging for small organizations with limited labeled data and computational resources. Here, we explore different data selection pipelines and propose a robust approach that improves ASR adaptation by filtering pseudo-labels generated using Whisper (encoder-decoder) and Zipformer (transducer) models. Our approach integrates multiple sel…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Accepted at Interspeech 2025, Netherlands

  5. arXiv:2505.17070

    cs.CL cs.AI cs.SD eess.AS

    Improving endpoint detection in end-to-end streaming ASR for conversational speech

    Authors: Anandh C, Karthik Pandia Durai, Jeena Prakash, Manickavela Arumugam, Kadri Hacioglu, S. Pavankumar Dubagunta, Andreas Stolcke, Shankar Venkatesan, Aravind Ganapathiraju

    Abstract: ASR endpointing (EP) plays a major role in delivering a good user experience in products supporting human or artificial agents in human-human/machine conversations. Transducer-based ASR (T-ASR) is an end-to-end (E2E) ASR modelling technique preferred for streaming. A major limitation of T-ASR is delayed emission of ASR outputs, which could lead to errors or delays in EP. Inaccurate EP will cut the…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: Submitted to Interspeech 2024

  6. arXiv:2305.00911

    cs.RO eess.SY

    SRPT vs Smith Predictor for Vehicle Teleoperation

    Authors: Jai Prakash, Michele Vignati, Edoardo Sabbioni

    Abstract: Vehicle teleoperation has potential applications in fallback solutions for autonomous vehicles, remote delivery services, and hazardous operations. However, network delays and limited situational awareness can compromise teleoperation performance and increase the cognitive workload of human operators. To address these issues, we previously introduced the novel successive reference pose tracking (S…

    Submitted 27 April, 2023; originally announced May 2023.

    Comments: This work has been submitted to the IEEE for possible publication

  7. arXiv:1707.08391

    physics.med-ph cs.CV eess.IV physics.optics

    Maximum entropy based non-negative optoacoustic tomographic image reconstruction

    Authors: Jaya Prakash, Subhamoy Mandal, Daniel Razansky, Vasilis Ntziachristos

    Abstract: Objective: Optoacoustic (photoacoustic) tomography is aimed at reconstructing maps of the initial pressure rise induced by the absorption of light pulses in tissue. In practice, due to inaccurate assumptions in the forward model, noise and other experimental factors, the images are often afflicted by artifacts, occasionally manifested as negative values. The aim of the work is to develop an inversi…

    Submitted 11 January, 2019; v1 submitted 26 July, 2017; originally announced July 2017.

    Comments: This article has been accepted for publication in IEEE Transactions on Biomedical Engineering (30 Dec 2018)