Skip to main content

Showing 1–18 of 18 results for author: Sunkara, M

Searching in archive cs. Search in all archives.
.
  1. arXiv:2503.21760  [pdf, other

    cs.CL

    MemInsight: Autonomous Memory Augmentation for LLM Agents

    Authors: Rana Salama, Jason Cai, Michelle Yuan, Anna Currey, Monica Sunkara, Yi Zhang, Yassine Benajiba

    Abstract: Large language model (LLM) agents have evolved to intelligently process information, make decisions, and interact with users or tools. A key capability is the integration of long-term memory capabilities, enabling these agents to draw upon historical interactions and knowledge. However, the growing memory size and need for semantic structuring pose significant challenges. In this work, we propose… ▽ More

    Submitted 27 March, 2025; originally announced March 2025.

  2. arXiv:2502.12094  [pdf, other

    cs.AI cs.CL

    A Study on Leveraging Search and Self-Feedback for Agent Reasoning

    Authors: Karthikeyan K, Michelle Yuan, Elman Mansimov, Katerina Margatina, Anurag Pratik, Daniele Bonadiman, Monica Sunkara, Yi Zhang, Yassine Benajiba

    Abstract: Recent works have demonstrated that incorporating search during inference can significantly improve reasoning capabilities of language agents. Some approaches may make use of the ground truth or rely on model's own generated feedback. The search algorithm uses this feedback to then produce values that will update its criterion for exploring and exploiting various reasoning paths. In this study, we… ▽ More

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: Under review

  3. arXiv:2502.01630  [pdf, other

    cs.AI

    TReMu: Towards Neuro-Symbolic Temporal Reasoning for LLM-Agents with Memory in Multi-Session Dialogues

    Authors: Yubin Ge, Salvatore Romeo, Jason Cai, Raphael Shu, Monica Sunkara, Yassine Benajiba, Yi Zhang

    Abstract: Temporal reasoning in multi-session dialogues presents a significant challenge which has been under-studied in previous temporal reasoning benchmarks. To bridge this gap, we propose a new evaluation task for temporal reasoning in multi-session dialogues and introduce an approach to construct a new benchmark by augmenting dialogues from LoCoMo and creating multi-choice QAs. Furthermore, we present… ▽ More

    Submitted 3 February, 2025; originally announced February 2025.

  4. arXiv:2412.05449  [pdf, other

    cs.CL cs.AI

    Towards Effective GenAI Multi-Agent Collaboration: Design and Evaluation for Enterprise Applications

    Authors: Raphael Shu, Nilaksh Das, Michelle Yuan, Monica Sunkara, Yi Zhang

    Abstract: AI agents powered by large language models (LLMs) have shown strong capabilities in problem solving. Through combining many intelligent agents, multi-agent collaboration has emerged as a promising approach to tackle complex, multi-faceted problems that exceed the capabilities of single AI agents. However, designing the collaboration protocols and evaluating the effectiveness of these systems remai… ▽ More

    Submitted 6 December, 2024; originally announced December 2024.

    Comments: Technical report for multi-agent collaboration on AWS Bedrock Agents

  5. arXiv:2411.07161  [pdf, other

    cs.MA cs.AI

    RoundTable: Investigating Group Decision-Making Mechanism in Multi-Agent Collaboration

    Authors: Young-Min Cho, Raphael Shu, Nilaksh Das, Tamer Alkhouli, Yi-An Lai, Jason Cai, Monica Sunkara, Yi Zhang

    Abstract: This study investigates the efficacy of Multi-Agent Systems in eliciting cross-agent communication and enhancing collective intelligence through group decision-making in a decentralized setting. Unlike centralized mechanisms, where a fixed hierarchy governs social choice, decentralized group decision-making allows agents to engage in joint deliberation. Our research focuses on the dynamics of comm… ▽ More

    Submitted 11 November, 2024; originally announced November 2024.

    Comments: preprint

  6. arXiv:2410.19206  [pdf, other

    cs.LG cs.CL

    Inference time LLM alignment in single and multidomain preference spectrum

    Authors: Sadat Shahriar, Zheng Qi, Nikolaos Pappas, Srikanth Doss, Monica Sunkara, Kishaloy Halder, Manuel Mager, Yassine Benajiba

    Abstract: Aligning Large Language Models (LLM) to address subjectivity and nuanced preference levels requires adequate flexibility and control, which can be a resource-intensive and time-consuming procedure. Existing training-time alignment methods require full re-training when a change is needed and inference-time ones typically require access to the reward model at each inference step. To address these li… ▽ More

    Submitted 24 October, 2024; originally announced October 2024.

  7. CERET: Cost-Effective Extrinsic Refinement for Text Generation

    Authors: Jason Cai, Hang Su, Monica Sunkara, Igor Shalyminov, Saab Mansour

    Abstract: Large Language Models (LLMs) are powerful models for generation tasks, but they may not generate good quality outputs in their first attempt. Apart from model fine-tuning, existing approaches to improve prediction accuracy and quality typically involve LLM self-improvement / self-reflection that incorporate feedback from models themselves. Despite their effectiveness, these methods are hindered by… ▽ More

    Submitted 1 November, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: The source code and data samples are released at https://github.com/amazon-science/CERET-LLM-refine

    Journal ref: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

  8. arXiv:2405.08295  [pdf, other

    cs.CL cs.SD eess.AS

    SpeechVerse: A Large-scale Generalizable Audio Language Model

    Authors: Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, Zhaocheng Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sravan Bodapati, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore devel… ▽ More

    Submitted 24 March, 2025; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Single Column, 13 page

  9. A-DisETrac Advanced Analytic Dashboard for Distributed Eye Tracking

    Authors: Yasasi Abeysinghe, Bhanuka Mahanama, Gavindya Jayawardena, Yasith Jayawardana, Mohan Sunkara, Andrew T. Duchowski, Vikas Ashok, Sampath Jayarathna

    Abstract: Understanding how individuals focus and perform visual searches during collaborative tasks can help improve user engagement. Eye tracking measures provide informative cues for such understanding. This article presents A-DisETrac, an advanced analytic dashboard for distributed eye tracking. It uses off-the-shelf eye trackers to monitor multiple users in parallel, compute both traditional and advanc… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

    Journal ref: International Journal of Multimedia Data Engineering and Management (IJMDEM) 15.1 (2024) 1-20

  10. arXiv:2305.07677  [pdf, other

    cs.SD cs.CL cs.LG

    Masked Audio Text Encoders are Effective Multi-Modal Rescorers

    Authors: Jinglun Cai, Monica Sunkara, Xilai Li, Anshu Bhatia, Xiao Pan, Sravan Bodapati

    Abstract: Masked Language Models (MLMs) have proven to be effective for second-pass rescoring in Automatic Speech Recognition (ASR) systems. In this work, we propose Masked Audio Text Encoder (MATE), a multi-modal masked language model rescorer which incorporates acoustic representations into the input space of MLM. We adopt contrastive learning for effectively aligning the modalities by learning shared rep… ▽ More

    Submitted 24 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

  11. arXiv:2305.03837  [pdf, other

    eess.AS cs.LG cs.SD

    Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR with Internal Language Model Estimation

    Authors: Nilaksh Das, Monica Sunkara, Sravan Bodapati, Jinglun Cai, Devang Kulshreshtha, Jeff Farris, Katrin Kirchhoff

    Abstract: End-to-end ASR models trained on large amount of data tend to be implicitly biased towards language semantics of the training data. Internal language model estimation (ILME) has been proposed to mitigate this bias for autoregressive models such as attention-based encoder-decoder and RNN-T. Typically, ILME is performed by modularizing the acoustic and language components of the model architecture,… ▽ More

    Submitted 5 May, 2023; originally announced May 2023.

    Comments: Accepted to ICASSP 2023

  12. arXiv:2211.07828  [pdf, other

    cs.CL

    Adaptation Approaches for Nearest Neighbor Language Models

    Authors: Rishabh Bhardwaj, George Polovets, Monica Sunkara

    Abstract: Semi-parametric Nearest Neighbor Language Models ($k$NN-LMs) have produced impressive gains over purely parametric LMs, by leveraging large-scale neighborhood retrieval over external memory datastores. However, there has been little investigation into adapting such models for new domains. This work attempts to fill that gap and suggests the following approaches for adapting $k$NN-LMs -- 1) adaptin… ▽ More

    Submitted 12 June, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

    Comments: 10 pages, 4 figures

  13. arXiv:2210.09510  [pdf, other

    cs.CL cs.SD eess.AS

    Towards Personalization of CTC Speech Recognition Models with Contextual Adapters and Adaptive Boosting

    Authors: Saket Dingliwal, Monica Sunkara, Sravan Bodapati, Srikanth Ronanki, Jeff Farris, Katrin Kirchhoff

    Abstract: End-to-end speech recognition models trained using joint Connectionist Temporal Classification (CTC)-Attention loss have gained popularity recently. In these models, a non-autoregressive CTC decoder is often used at inference time due to its speed and simplicity. However, such models are hard to personalize because of their conditional independence assumption that prevents output tokens from previ… ▽ More

    Submitted 13 November, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: To appear in SLT 2022

  14. arXiv:2109.05092  [pdf, other

    eess.AS cs.SD

    Remember the context! ASR slot error correction through memorization

    Authors: Dhanush Bekal, Ashish Shenoy, Monica Sunkara, Sravan Bodapati, Katrin Kirchhoff

    Abstract: Accurate recognition of slot values such as domain specific words or named entities by automatic speech recognition (ASR) systems forms the core of the Goal-oriented Dialogue Systems. Although it is a critical step with direct impact on downstream tasks such as language understanding, many domain agnostic ASR systems tend to perform poorly on domain specific or long tail words. They are often supp… ▽ More

    Submitted 17 September, 2021; v1 submitted 10 September, 2021; originally announced September 2021.

    Comments: 8 pages, 3 figures, 4 tables, Accepted to ASRU 2021

  15. Adapting Long Context NLM for ASR Rescoring in Conversational Agents

    Authors: Ashish Shenoy, Sravan Bodapati, Monica Sunkara, Srikanth Ronanki, Katrin Kirchhoff

    Abstract: Neural Language Models (NLM), when trained and evaluated with context spanning multiple utterances, have been shown to consistently outperform both conventional n-gram language models and NLMs that use limited context. In this paper, we investigate various techniques to incorporate turn based context history into both recurrent (LSTM) and Transformer-XL based NLMs. For recurrent based NLMs, we exp… ▽ More

    Submitted 4 June, 2021; v1 submitted 20 April, 2021; originally announced April 2021.

    Comments: Accepted to Interspeech 2021. arXiv admin note: text overlap with arXiv:2103.10325

  16. arXiv:2102.06380  [pdf, ps, other

    cs.CL eess.AS

    Neural Inverse Text Normalization

    Authors: Monica Sunkara, Chaitanya Shivade, Sravan Bodapati, Katrin Kirchhoff

    Abstract: While there have been several contributions exploring state of the art techniques for text normalization, the problem of inverse text normalization (ITN) remains relatively unexplored. The best known approaches leverage finite state transducer (FST) based models which rely on manually curated rules and are hence not scalable. We propose an efficient and robust neural solution for ITN leveraging tr… ▽ More

    Submitted 12 February, 2021; originally announced February 2021.

    Comments: 5 pages, accepted to ICASSP 2021

  17. arXiv:2008.00702  [pdf, other

    eess.AS cs.CL

    Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech

    Authors: Monica Sunkara, Srikanth Ronanki, Dhanush Bekal, Sravan Bodapati, Katrin Kirchhoff

    Abstract: In this work, we explore a multimodal semi-supervised learning approach for punctuation prediction by learning representations from large amounts of unlabelled audio and text data. Conventional approaches in speech processing typically use forced alignment to encoder per frame acoustic features to word level features and perform multimodal fusion of the resulting acoustic and lexical representatio… ▽ More

    Submitted 3 August, 2020; originally announced August 2020.

    Comments: Accepted for Interspeech 2020

  18. arXiv:2007.02025  [pdf, other

    cs.CL cs.SD eess.AS

    Robust Prediction of Punctuation and Truecasing for Medical ASR

    Authors: Monica Sunkara, Srikanth Ronanki, Kalpit Dixit, Sravan Bodapati, Katrin Kirchhoff

    Abstract: Automatic speech recognition (ASR) systems in the medical domain that focus on transcribing clinical dictations and doctor-patient conversations often pose many challenges due to the complexity of the domain. ASR output typically undergoes automatic punctuation to enable users to speak naturally, without having to vocalise awkward and explicit punctuation commands, such as "period", "add comma" or… ▽ More

    Submitted 11 July, 2020; v1 submitted 4 July, 2020; originally announced July 2020.

    Comments: Accepted for ACL NLPMC workshop 2020