Showing 1–11 of 11 results for author: Sathyendra, K M

  1. arXiv:2506.12103

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  2. arXiv:2506.01215

    cs.CL cs.LG

    Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers

    Authors: Woomin Song, Sai Muralidhar Jayanthi, Srikanth Ronanki, Kanthashree Mysore Sathyendra, Jinwoo Shin, Aram Galstyan, Shubham Katiyar, Sravan Babu Bodapati

    Abstract: As large language models increasingly gain popularity in real-world applications, processing extremely long contexts, often exceeding the model's pre-trained context limits, has emerged as a critical challenge. While existing approaches to efficient long-context processing show promise, recurrent compression-based methods struggle with information preservation, whereas random access approaches req…

    Submitted 1 June, 2025; originally announced June 2025.

  3. arXiv:2305.05271

    cs.CL cs.LG cs.SD eess.AS

    Robust Acoustic and Semantic Contextual Biasing in Neural Transducers for Speech Recognition

    Authors: Xuandi Fu, Kanthashree Mysore Sathyendra, Ankur Gandhe, Jing Liu, Grant P. Strimel, Ross McGowan, Athanasios Mouchtaris

    Abstract: Attention-based contextual biasing approaches have shown significant improvements in the recognition of generic and/or personal rare-words in End-to-End Automatic Speech Recognition (E2E ASR) systems like neural transducers. These approaches employ cross-attention to bias the model towards specific contextual entities injected as bias-phrases to the model. Prior approaches typically relied on subw…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: Accepted at ICASSP 2023

  4. arXiv:2304.01905

    cs.SD cs.CL cs.LG eess.AS

    Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition

    Authors: Saumya Y. Sahai, Jing Liu, Thejaswi Muniyappa, Kanthashree M. Sathyendra, Anastasios Alexandridis, Grant P. Strimel, Ross McGowan, Ariya Rastrow, Feng-Ju Chang, Athanasios Mouchtaris, Siegfried Kunzmann

    Abstract: We present dual-attention neural biasing, an architecture designed to boost Wake Words (WW) recognition and improve inference time latency on speech recognition tasks. This architecture enables a dynamic switch for its runtime compute paths by exploiting WW spotting to select which branch of its attention networks to execute for an input audio frame. With this approach, we effectively improve WW s…

    Submitted 4 April, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

    Comments: Accepted to Proc. IEEE ICASSP 2023

  5. arXiv:2303.17799

    cs.CL cs.SD eess.AS

    Dialog act guided contextual adapter for personalized speech recognition

    Authors: Feng-Ju Chang, Thejaswi Muniyappa, Kanthashree Mysore Sathyendra, Kai Wei, Grant P. Strimel, Ross McGowan

    Abstract: Personalization in multi-turn dialogs has been a long standing challenge for end-to-end automatic speech recognition (E2E ASR) models. Recent work on contextual adapters has tackled rare word recognition using user catalogs. This adaptation, however, does not incorporate an important cue, the dialog act, which is available in a multi-turn dialog scenario. In this work, we propose a dialog act guid…

    Submitted 31 March, 2023; originally announced March 2023.

    Comments: Accepted at ICASSP 2023

  6. arXiv:2205.13660

    cs.CL cs.LG

    Contextual Adapters for Personalized Speech Recognition in Neural Transducers

    Authors: Kanthashree Mysore Sathyendra, Thejaswi Muniyappa, Feng-Ju Chang, Jing Liu, Jinru Su, Grant P. Strimel, Athanasios Mouchtaris, Siegfried Kunzmann

    Abstract: Personal rare word recognition in end-to-end Automatic Speech Recognition (E2E ASR) models is a challenge due to the lack of training data. A standard way to address this issue is with shallow fusion methods at inference time. However, due to their dependence on external language models and the deterministic approach to weight boosting, their performance is limited. In this paper, we propose train…

    Submitted 26 May, 2022; originally announced May 2022.

    Comments: Accepted at ICASSP 2022

  7. arXiv:2204.00558

    cs.CL cs.SD eess.AS

    Multi-task RNN-T with Semantic Decoder for Streamable Spoken Language Understanding

    Authors: Xuandi Fu, Feng-Ju Chang, Martin Radfar, Kai Wei, Jing Liu, Grant P. Strimel, Kanthashree Mysore Sathyendra

    Abstract: End-to-end Spoken Language Understanding (E2E SLU) has attracted increasing interest due to its advantages of joint optimization and low latency when compared to traditionally cascaded pipelines. Existing E2E SLU models usually follow a two-stage configuration where an Automatic Speech Recognition (ASR) network first predicts a transcript which is then passed to a Natural Language Understanding (N…

    Submitted 1 April, 2022; originally announced April 2022.

    Comments: Accepted at ICASSP 2022

  8. arXiv:2112.06743

    cs.CL cs.AI

    Attentive Contextual Carryover for Multi-Turn End-to-End Spoken Language Understanding

    Authors: Kai Wei, Thanh Tran, Feng-Ju Chang, Kanthashree Mysore Sathyendra, Thejaswi Muniyappa, Jing Liu, Anirudh Raju, Ross McGowan, Nathan Susanj, Ariya Rastrow, Grant P. Strimel

    Abstract: Recent years have seen significant advances in end-to-end (E2E) spoken language understanding (SLU) systems, which directly predict intents and slots from spoken audio. While dialogue history has been exploited to improve conventional text-based natural language understanding systems, current E2E SLU approaches have not yet incorporated such critical contextual signals in multi-turn and task-orien…

    Submitted 13 December, 2021; originally announced December 2021.

    Journal ref: ASRU2021

  9. arXiv:2012.00124

    cs.CL cs.AI cs.LG

    Extreme Model Compression for On-device Natural Language Understanding

    Authors: Kanthashree Mysore Sathyendra, Samridhi Choudhary, Leah Nicolich-Henkin

    Abstract: In this paper, we propose and experiment with techniques for extreme compression of neural natural language understanding (NLU) models, making them suitable for execution on resource-constrained devices. We propose a task-aware, end-to-end compression approach that performs word-embedding compression jointly with NLU task learning. We show our results on a large-scale, commercial NLU system traine…

    Submitted 30 November, 2020; originally announced December 2020.

    Comments: Long paper at COLING 2020

  10. arXiv:1807.07520

    cs.CL

    Statistical Model Compression for Small-Footprint Natural Language Understanding

    Authors: Grant P. Strimel, Kanthashree Mysore Sathyendra, Stanislav Peshterliev

    Abstract: In this paper we investigate statistical model compression applied to natural language understanding (NLU) models. Small-footprint NLU models are important for enabling offline systems on hardware restricted devices, and for decreasing on-demand model loading latency in cloud-based systems. To compress NLU models, we present two main techniques, parameter quantization and perfect feature hashing.…

    Submitted 19 July, 2018; originally announced July 2018.

    Comments: Interspeech 2018

  11. arXiv:1706.07230

    cs.LG cs.AI cs.CL cs.RO

    Gated-Attention Architectures for Task-Oriented Language Grounding

    Authors: Devendra Singh Chaplot, Kanthashree Mysore Sathyendra, Rama Kumar Pasumarthi, Dheeraj Rajagopal, Ruslan Salakhutdinov

    Abstract: To perform tasks specified by natural language instructions, autonomous agents need to extract semantically meaningful representations of language and map it to visual elements and actions in the environment. This problem is called task-oriented language grounding. We propose an end-to-end trainable neural architecture for task-oriented language grounding in 3D environments which assumes no prior…

    Submitted 8 January, 2018; v1 submitted 22 June, 2017; originally announced June 2017.

    Comments: To appear in AAAI-18