Search | arXiv e-print repository

Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward

Authors: Shashi Kumar, Iuliia Thorbecke, Sergio Burdisso, Esaú Villatoro-Tello, Manjunath K E, Kadri Hacioğlu, Pradeep Rangappa, Petr Motlicek, Aravind Ganapathiraju, Andreas Stolcke

Abstract: Recent research has demonstrated that training a linear connector between speech foundation encoders and large language models (LLMs) enables this architecture to achieve strong ASR capabilities. Despite the impressive results, it remains unclear whether these simple approaches are robust enough across different scenarios and speech conditions, such as domain shifts and speech perturbations. In this paper, we address these questions by conducting various ablation experiments using a recent and widely adopted approach called SLAM-ASR. We present novel empirical findings that offer insights on how to effectively utilize the SLAM-ASR architecture across a wide range of settings. Our main findings indicate that SLAM-ASR exhibits poor performance in cross-domain evaluation settings. Additionally, speech perturbations on in-domain data, such as changes in speech rate or additive noise, can significantly degrade performance. Our findings offer critical insights for fine-tuning and configuring robust LLM-based ASR models, tailored to different data characteristics and computational resources.

Submitted 22 January, 2025; v1 submitted 6 November, 2024; originally announced November 2024.

Comments: Accepted in ICASSP 2025 SALMA Workshop

Journal ref: Proc. ICASSP Workshop on Speech and Audio Language Models (SALMA), 2025

arXiv:2407.04439 [pdf, other]

XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models

Authors: Shashi Kumar, Srikanth Madikeri, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Iuliia Thorbecke, Petr Motlicek, Manjunath K E, Aravind Ganapathiraju

Abstract: Self-supervised pretrained models exhibit competitive performance in automatic speech recognition on finetuning, even with limited in-domain supervised data. However, popular pretrained models are not suitable for streaming ASR because they are trained with full attention context. In this paper, we introduce XLSR-Transducer, where the XLSR-53 model is used as encoder in transducer setup. Our experiments on the AMI dataset reveal that the XLSR-Transducer achieves 4% absolute WER improvement over Whisper large-v2 and 8% over a Zipformer transducer model trained from scratch. To enable streaming capabilities, we investigate different attention masking patterns in the self-attention computation of transformer layers within the XLSR-53 model. We validate XLSR-Transducer on AMI and 5 languages from CommonVoice under low-resource scenarios. Finally, with the introduction of attention sinks, we reduce the left context by half while achieving a relative 12% improvement in WER.

Submitted 8 October, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

Comments: 5 pages, double column

arXiv:2311.17235 [pdf]

doi 10.2458/azu_uapress_9780816540945-ch012

Photochemistry and Haze Formation

Authors: Mandt K. E., Luspay-Kuti A., Cheng A., Jessup K.-L., Gao P.

Abstract: One of the many exciting revelations of the New Horizons flyby of Pluto was the observation of global haze layers at altitudes as high as 200 km in the visible wavelengths. This haze is produced in the upper atmosphere through photochemical processes, similar to the processes in Titan's atmosphere. As the haze particles grow in size and descend to the lower atmosphere, they coagulate and interact with the gases in the atmosphere through condensation and sticking processes that serve as temporary and permanent loss processes. New Horizons observations confirm studies of Titan haze analogs suggesting that photochemically produced haze particles harden as they grow in size. We outline in this chapter what is known about the photochemical processes that lead to haze production and outline feedback processes resulting from the presence of haze in the atmosphere, connect this to the evolution of Pluto's atmosphere, and discuss open questions that need to be addressed in future work.

Submitted 28 November, 2023; originally announced November 2023.

MSC Class: 85-01

Journal ref: In Pluto System After New Horizons (S. A. Stern, R. P. Binzel, W. M. Grundy, J. M. Moore, and L. A. Young, eds.), Univ. of Arizona, Tucson (2021)

arXiv:1812.02474 [pdf]

doi 10.5121/ijcnc.2018.10607

A Proactive Flow Admission and Re-Routing Scheme for Load Balancing and Mitigation of Congestion Propagation in SDN Data Plane

Authors: Sminesh C. N., Grace Mary Kanaga E., Ranjitha K

Abstract: The centralized architecture in software-defined networks (SDN) provides a global view of the underlying network, paving the way for enormous research in the area of SDN traffic engineering (SDN TE). This research focuses on the load balancing aspects of SDN TE, given that the existing reactive methods for data-plane load balancing eventually result in packet loss, and proactive schemes for data-plane load balancing do not address congestion propagation. In the proposed work, the SDN controller periodically monitors flow-level statistics and utilization on each link in the network, and over-utilized links that cause network congestion and packet loss are identified as bottleneck links. For load balancing, the identified largest flow and further traffic through these bottleneck links are rerouted through a lightly loaded alternate path. The proposed scheme models a Bayesian Network using the observed port utilization and residual bandwidth to decide whether the newly computed alternate path can handle the new flow load before flow admission, which in turn reduces congestion propagation. The simulation results show that when the network traffic increases, the proposed method efficiently re-routes the flows and balances the network load, which substantially improves the network efficiency and the quality of service (QoS) parameters.

Submitted 6 December, 2018; originally announced December 2018.

Comments: 18 pages, 117-134

Journal ref: International Journal of Computer Networks & Communications (IJCNC) Vol.10, No.6, November 2018

Showing 1–4 of 4 results for author: E, M K