Showing 1–50 of 389 results for author: Srikanth

Searching in archive cs.
  1. arXiv:2506.12103  [pdf, other]

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  2. arXiv:2506.04981  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering

    Authors: Andres Carofilis, Pradeep Rangappa, Srikanth Madikeri, Shashi Kumar, Sergio Burdisso, Jeena Prakash, Esau Villatoro-Tello, Petr Motlicek, Bidisha Sharma, Kadri Hacioglu, Shankar Venkatesan, Saurabh Vyas, Andreas Stolcke

    Abstract: Fine-tuning pretrained ASR models for specific domains is challenging when labeled data is scarce. But unlabeled audio and labeled data from related domains are often available. We propose an incremental semi-supervised learning pipeline that first integrates a small in-domain labeled set and an auxiliary dataset from a closely related domain, achieving a relative improvement of 4% over no auxilia…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: Accepted at Interspeech 2025, Netherlands

  3. arXiv:2506.04571  [pdf, ps, other]

    cs.AI

    OpenAg: Democratizing Agricultural Intelligence

    Authors: Srikanth Thudumu, Jason Fisher

    Abstract: Agriculture is undergoing a major transformation driven by artificial intelligence (AI), machine learning, and knowledge representation technologies. However, current agricultural intelligence systems often lack contextual understanding, explainability, and adaptability, especially for smallholder farmers with limited resources. General-purpose large language models (LLMs), while powerful, typical…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 10 pages, 1 figure

  4. arXiv:2506.04453  [pdf, other]

    eess.IV cs.CR cs.CV cs.LG

    Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning

    Authors: Hasin Us Sami, Swapneel Sen, Amit K. Roy-Chowdhury, Srikanth V. Krishnamurthy, Basak Guler

    Abstract: Federated learning (FL) allows multiple data-owners to collaboratively train machine learning models by exchanging local gradients, while keeping their private data on-device. To simultaneously enhance privacy and training efficiency, recently parameter-efficient fine-tuning (PEFT) of large-scale pretrained models has gained substantial attention in FL. While keeping a pretrained (backbone) model…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025)

  5. arXiv:2506.03681  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering

    Authors: Pradeep Rangappa, Andres Carofilis, Jeena Prakash, Shashi Kumar, Sergio Burdisso, Srikanth Madikeri, Esau Villatoro-Tello, Bidisha Sharma, Petr Motlicek, Kadri Hacioglu, Shankar Venkatesan, Saurabh Vyas, Andreas Stolcke

    Abstract: Fine-tuning pretrained ASR models for specific domains is challenging for small organizations with limited labeled data and computational resources. Here, we explore different data selection pipelines and propose a robust approach that improves ASR adaptation by filtering pseudo-labels generated using Whisper (encoder-decoder) and Zipformer (transducer) models. Our approach integrates multiple sel…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Accepted at Interspeech 2025, Netherlands

  6. arXiv:2506.01215  [pdf, other]

    cs.CL cs.LG

    Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers

    Authors: Woomin Song, Sai Muralidhar Jayanthi, Srikanth Ronanki, Kanthashree Mysore Sathyendra, Jinwoo Shin, Aram Galstyan, Shubham Katiyar, Sravan Babu Bodapati

    Abstract: As large language models increasingly gain popularity in real-world applications, processing extremely long contexts, often exceeding the model's pre-trained context limits, has emerged as a critical challenge. While existing approaches to efficient long-context processing show promise, recurrent compression-based methods struggle with information preservation, whereas random access approaches req…

    Submitted 1 June, 2025; originally announced June 2025.

  7. arXiv:2506.00627  [pdf, ps, other]

    cs.GT cs.AI

    The Disparate Effects of Partial Information in Bayesian Strategic Learning

    Authors: Srikanth Avasarala, Serena Wang, Juba Ziani

    Abstract: We study how partial information about scoring rules affects fairness in strategic learning settings. In strategic learning, a learner deploys a scoring rule, and agents respond strategically by modifying their features -- at some cost -- to improve their outcomes. However, in our work, agents do not observe the scoring rule directly; instead, they receive a noisy signal of said rule. We consider…

    Submitted 31 May, 2025; originally announced June 2025.

  8. arXiv:2505.24765  [pdf, ps, other]

    quant-ph cs.AI

    Supervised Quantum Machine Learning: A Future Outlook from Qubits to Enterprise Applications

    Authors: Srikanth Thudumu, Jason Fisher, Hung Du

    Abstract: Supervised Quantum Machine Learning (QML) represents an intersection of quantum computing and classical machine learning, aiming to use quantum resources to support model training and inference. This paper reviews recent developments in supervised QML, focusing on methods such as variational quantum circuits, quantum neural networks, and quantum kernel methods, along with hybrid quantum-classical…

    Submitted 17 June, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

    Comments: Future outlook and roadmap of QML with 7 pages and 1 figure

  9. arXiv:2505.24477  [pdf, ps, other]

    cs.CY cs.AI cs.LG

    Evaluating Gemini in an arena for learning

    Authors: LearnLM Team, Abhinit Modi, Aditya Srikanth Veerubhotla, Aliya Rysbek, Andrea Huber, Ankit Anand, Avishkar Bhoopchand, Brett Wiltshire, Daniel Gillick, Daniel Kasenberg, Eleni Sgouritsa, Gal Elidan, Hengrui Liu, Holger Winnemoeller, Irina Jurenka, James Cohan, Jennifer She, Julia Wilkowski, Kaiz Alarakyia, Kevin R. McKee, Komal Singh, Lisa Wang, Markus Kunesch, Miruna Pîslar, Niv Efron, et al. (12 additional authors not shown)

    Abstract: Artificial intelligence (AI) is poised to transform education, but the research community lacks a robust, general benchmark to evaluate AI models for learning. To assess state-of-the-art support for educational use cases, we ran an "arena for learning" where educators and pedagogy experts conduct blind, head-to-head, multi-turn comparisons of leading AI models. In particular, $N = 189$ educators d…

    Submitted 30 May, 2025; originally announced May 2025.

  10. arXiv:2505.19259  [pdf, ps, other]

    cs.LG cs.AI

    Towards Large Reasoning Models for Agriculture

    Authors: Hossein Zaremehrjerdi, Shreyan Ganguly, Ashlyn Rairdin, Elizabeth Tranel, Benjamin Feuer, Juan Ignacio Di Salvo, Srikanth Panthulugiri, Hernan Torres Pacin, Victoria Moser, Sarah Jones, Joscif G Raigne, Yanben Shen, Heidi M. Dornath, Aditya Balu, Adarsh Krishnamurthy, Asheesh K Singh, Arti Singh, Baskar Ganapathysubramanian, Chinmay Hegde, Soumik Sarkar

    Abstract: Agricultural decision-making involves complex, context-specific reasoning, where choices about crops, practices, and interventions depend heavily on geographic, climatic, and economic conditions. Traditional large language models (LLMs) often fall short in navigating this nuanced problem due to limited reasoning capacity. We hypothesize that recent advances in large reasoning models (LRMs) can bet…

    Submitted 27 May, 2025; v1 submitted 25 May, 2025; originally announced May 2025.

  11. arXiv:2505.17168  [pdf]

    cs.SE

    Designing and Implementing Robust Test Automation Frameworks using Cucumber BDD and Java

    Authors: Srikanth Srinivas, Lagan Goel

    Abstract: Modern software development demands rapid, reliable testing methods to maintain high quality in increasingly complex systems. This paper details a comprehensive approach to designing and implementing robust test automation frameworks by leveraging Cucumber BDD with Java. By utilizing Cucumber BDD natural language syntax, the framework enables clear communication between technical and non-technical…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 5 pages

  12. arXiv:2505.17047  [pdf]

    cs.CL cs.AI

    Assessing the Quality of AI-Generated Clinical Notes: A Validated Evaluation of a Large Language Model Scribe

    Authors: Erin Palm, Astrit Manikantan, Mark E. Pepin, Herprit Mahal, Srikanth Subramanya Belwadi

    Abstract: In medical practices across the United States, physicians have begun implementing generative artificial intelligence (AI) tools to perform the function of scribes in order to reduce the burden of documenting clinical encounters. Despite their widespread use, no established methods exist to gauge the quality of AI scribes. To address this gap, we developed a blinded study comparing the relative per…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 15 pages, 5 tables, 1 figure. Submitted for peer review 05/15/2025

  13. arXiv:2505.16404  [pdf, ps, other]

    eess.AS cs.SD

    UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension

    Authors: Kishan Gupta, Srikanth Korse, Andreas Brendel, Nicola Pia, Guillaume Fuchs

    Abstract: In practical application of speech codecs, a multitude of factors such as the quality of the radio connection, limiting hardware or required user experience necessitate trade-offs between achievable perceptual quality, engendered bitrate and computational complexity. Most conventional and neural speech codecs operate on wideband (WB) speech signals to achieve this compromise. To further enhance th…

    Submitted 22 May, 2025; originally announced May 2025.

  14. arXiv:2505.13979  [pdf, ps, other]

    cs.CL

    Mixed Signals: Understanding Model Disagreement in Multimodal Empathy Detection

    Authors: Maya Srikanth, Run Chen, Julia Hirschberg

    Abstract: Multimodal models play a key role in empathy detection, but their performance can suffer when modalities provide conflicting cues. To understand these failures, we examine cases where unimodal and multimodal predictions diverge. Using fine-tuned models for text, audio, and video, along with a gated fusion model, we find that such disagreements often reflect underlying ambiguity, as evidenced by an…

    Submitted 20 May, 2025; originally announced May 2025.

  15. arXiv:2505.09739  [pdf, ps, other]

    cs.RO

    Trailblazer: Learning offroad costmaps for long range planning

    Authors: Kasi Viswanath, Felix Sanchez, Timothy Overbye, Jason M. Gregory, Srikanth Saripalli

    Abstract: Autonomous navigation in off-road environments remains a significant challenge in field robotics, particularly for Unmanned Ground Vehicles (UGVs) tasked with search and rescue, exploration, and surveillance. Effective long-range planning relies on the integration of onboard perception systems with prior environmental knowledge, such as satellite imagery and LiDAR data. This work introduces Trailb…

    Submitted 10 June, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

  16. arXiv:2505.00010  [pdf]

    cs.CL cs.AI

    Jailbreak Detection in Clinical Training LLMs Using Feature-Based Predictive Models

    Authors: Tri Nguyen, Lohith Srikanth Pentapalli, Magnus Sieverding, Laurah Turner, Seth Overla, Weibing Zheng, Chris Zhou, David Furniss, Danielle Weber, Michael Gharib, Matt Kelleher, Michael Shukis, Cameron Pawlik, Kelly Cohen

    Abstract: Jailbreaking in Large Language Models (LLMs) threatens their safe use in sensitive domains like education by allowing users to bypass ethical safeguards. This study focuses on detecting jailbreaks in 2-Sigma, a clinical education platform that simulates patient interactions using LLMs. We annotated over 2,300 prompts across 158 conversations using four linguistic variables shown to correlate stron…

    Submitted 21 April, 2025; originally announced May 2025.

  17. arXiv:2504.18636  [pdf, ps, other]

    cs.CR cs.AI cs.LO

    A Gradient-Optimized TSK Fuzzy Framework for Explainable Phishing Detection

    Authors: Lohith Srikanth Pentapalli, Jon Salisbury, Josette Riep, Kelly Cohen

    Abstract: Phishing attacks represent an increasingly sophisticated and pervasive threat to individuals and organizations, causing significant financial losses, identity theft, and severe damage to institutional reputations. Existing phishing detection methods often struggle to simultaneously achieve high accuracy and explainability, either failing to detect novel attacks or operating as opaque black-box mod…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: 14 pages, 5 figures

  18. arXiv:2504.15629  [pdf, ps, other]

    cs.IR cs.CL

    CiteFix: Enhancing RAG Accuracy Through Post-Processing Citation Correction

    Authors: Harsh Maheshwari, Srikanth Tenneti, Alwarappan Nakkiran

    Abstract: Retrieval Augmented Generation (RAG) has emerged as a powerful application of Large Language Models (LLMs), revolutionizing information search and consumption. RAG systems combine traditional search capabilities with LLMs to generate comprehensive answers to user queries, ideally with accurate citations. However, in our experience of developing a RAG product, LLMs often struggle with source attrib…

    Submitted 11 June, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

  19. arXiv:2504.03991  [pdf, other]

    cs.CL cs.AI cs.HC cs.MA

    Algorithmic Prompt Generation for Diverse Human-like Teaming and Communication with Large Language Models

    Authors: Siddharth Srikanth, Varun Bhatt, Boshen Zhang, Werner Hager, Charles Michael Lewis, Katia P. Sycara, Aaquib Tabrez, Stefanos Nikolaidis

    Abstract: Understanding how humans collaborate and communicate in teams is essential for improving human-agent teaming and AI-assisted decision-making. However, relying solely on data from large-scale user studies is impractical due to logistical, ethical, and practical constraints, necessitating synthetic models of multiple diverse human behaviors. Recently, agents powered by Large Language Models (LLMs) h…

    Submitted 4 April, 2025; originally announced April 2025.

  20. arXiv:2504.00180  [pdf, other]

    cs.CL cs.AI

    Contradiction Detection in RAG Systems: Evaluating LLMs as Context Validators for Improved Information Consistency

    Authors: Vignesh Gokul, Srikanth Tenneti, Alwarappan Nakkiran

    Abstract: Retrieval Augmented Generation (RAG) systems have emerged as a powerful method for enhancing large language models (LLMs) with up-to-date information. However, the retrieval step in RAG can sometimes surface documents containing contradictory information, particularly in rapidly evolving domains such as news. These contradictions can significantly impact the performance of LLMs, leading to inconsi…

    Submitted 31 March, 2025; originally announced April 2025.

  21. arXiv:2503.15494  [pdf, ps, other]

    cs.HC cs.AI cs.CY

    AI-Powered Assistive Technologies for Visual Impairment

    Authors: Prudhvi Naayini, Praveen Kumar Myakala, Chiranjeevi Bura, Anil Kumar Jonnalagadda, Srikanth Kamatala

    Abstract: Artificial Intelligence (AI) is revolutionizing assistive technologies. It offers innovative solutions to enhance the quality of life for individuals with visual impairments. This review examines the development, applications, and impact of AI-powered tools in key domains, such as computer vision, natural language processing (NLP), and wearable devices. Specific advancements include object recogni…

    Submitted 13 January, 2025; originally announced March 2025.

  22. arXiv:2503.12370  [pdf, other]

    cs.CL

    Understanding Common Ground Misalignment in Goal-Oriented Dialog: A Case-Study with Ubuntu Chat Logs

    Authors: Rupak Sarkar, Neha Srikanth, Taylor Hudson, Rachel Rudinger, Claire Bonial, Philip Resnik

    Abstract: While it is commonly accepted that maintaining common ground plays a role in conversational success, little prior research exists connecting conversational grounding to success in task-oriented conversations. We study failures of grounding in the Ubuntu IRC dataset, where participants use text-only communication to resolve technical issues. We find that disruptions in conversational flow often ste…

    Submitted 16 March, 2025; originally announced March 2025.

    Comments: 8 pages

  23. arXiv:2503.00714  [pdf, other]

    cs.DB cs.AI cs.HC cs.LG cs.MA

    Speculative Ad-hoc Querying

    Authors: Haoyu Li, Srikanth Kandula, Maria Angels de Luis Balaguer, Aditya Akella, Venkat Arun

    Abstract: Analyzing large datasets requires responsive query execution, but executing SQL queries on massive datasets can be slow. This paper explores whether query execution can begin even before the user has finished typing, allowing results to appear almost instantly. We propose SpeQL, a system that leverages Large Language Models (LLMs) to predict likely queries based on the database schema, the user's…

    Submitted 1 March, 2025; originally announced March 2025.

  24. arXiv:2502.18760  [pdf, other]

    cs.RO cs.AI cs.LG

    Learning Autonomy: Off-Road Navigation Enhanced by Human Input

    Authors: Akhil Nagariya, Dimitar Filev, Srikanth Saripalli, Gaurav Pandey

    Abstract: In the area of autonomous driving, navigating off-road terrains presents a unique set of challenges, from unpredictable surfaces like grass and dirt to unexpected obstacles such as bushes and puddles. In this work, we present a novel learning-based local planner that addresses these challenges by directly capturing human driving nuances from real-world demonstrations using only a monocular camera.…

    Submitted 14 May, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

    Journal ref: 12th IFAC Symposium on Intelligent Autonomous Vehicles 2025

  25. arXiv:2502.13138  [pdf, other]

    cs.AI cs.LG

    AIDE: AI-Driven Exploration in the Space of Code

    Authors: Zhengyao Jiang, Dominik Schmidt, Dhruv Srikanth, Dixing Xu, Ian Kaplan, Deniss Jacenko, Yuxiang Wu

    Abstract: Machine learning, the foundation of modern artificial intelligence, has driven innovations that have fundamentally transformed the world. Yet, behind advancements lies a complex and often tedious process requiring labor and compute intensive iteration and experimentation. Engineers and scientists developing machine learning models spend much of their time on trial-and-error tasks instead of concep…

    Submitted 18 February, 2025; originally announced February 2025.

  26. arXiv:2502.08705  [pdf, other]

    cs.CY cs.DL physics.ed-ph

    Beyond the Lens: Quantifying the Impact of Scientific Documentaries through Amazon Reviews

    Authors: Jill Naiman, Aria Pessianzadeh, Hanyu Zhao, AJ Christensen, Kalina Borkiewicz, Shriya Srikanth, Anushka Gami, Emma Maxwell, Louisa Zhang, Sri Nithya Yeragorla, Rezvaneh Rezapour

    Abstract: Engaging the public with science is critical for a well-informed population. A popular method of scientific communication is documentaries. Once released, it can be difficult to assess the impact of such works on a large scale, due to the overhead required for in-depth audience feedback studies. In what follows, we overview our complementary approach to qualitative studies through quantitative imp…

    Submitted 4 March, 2025; v1 submitted 12 February, 2025; originally announced February 2025.

    Comments: Camera-ready version for WebSci 2025

  27. arXiv:2502.08080  [pdf, other]

    cs.CL

    NLI under the Microscope: What Atomic Hypothesis Decomposition Reveals

    Authors: Neha Srikanth, Rachel Rudinger

    Abstract: Decomposition of text into atomic propositions is a flexible framework allowing for the closer inspection of input and output text. We use atomic decomposition of hypotheses in two natural language reasoning tasks, traditional NLI and defeasible NLI, to form atomic sub-problems, or granular inferences that models must weigh when solving the overall problem. These atomic sub-problems serve as a too…

    Submitted 7 March, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

    Comments: Accepted to NAACL 2025

  28. arXiv:2502.02442  [pdf, ps, other]

    cs.CC

    The Algebraic Cost of a Boolean Sum

    Authors: Ian Orzel, Srikanth Srinivasan, Sébastien Tavenas, Amir Yehudayoff

    Abstract: The P versus NP problem is about the computational power of an existential $\exists_{w \in \{0,1\}^n}$ quantifier. The VP versus VNP problem is about the power of a boolean sum $\sum_{w \in \{0,1\}^n}$ operation. We study the power of a single boolean sum $\sum_{w \in \{0,1\}}$, and prove that in some cases the cost of eliminating this sum is large. This identifies a fundamental difference between…

    Submitted 4 February, 2025; originally announced February 2025.

  29. The Dead Internet Theory: A Survey on Artificial Interactions and the Future of Social Media

    Authors: Prathamesh Muzumdar, Sumanth Cheemalapati, Srikanth Reddy RamiReddy, Kuldeep Singh, George Kurian, Apoorva Muley

    Abstract: The Dead Internet Theory (DIT) suggests that much of today's internet, particularly social media, is dominated by non-human activity, AI-generated content, and corporate agendas, leading to a decline in authentic human interaction. This study explores the origins, core claims, and implications of DIT, emphasizing its relevance in the context of social media platforms. The theory emerged as a respo…

    Submitted 6 January, 2025; originally announced February 2025.

  30. Determinants of Human Development Index (HDI): A Regression Analysis of Economic and Social Indicators

    Authors: Kuldeep Singh, Sumanth Cheemalapati, Srikanth Reddy RamiReddy, George Kurian, Prathamesh Muzumdar, Apoorva Muley

    Abstract: This study aims to investigate the factors influencing the Human Development Index (HDI). Five variables (GDP per capita, health expenditure, education expenditure, infant mortality rate per 1,000 live births, and average years of schooling) were analyzed to develop a regression model assessing their impact on HDI. The results indicate that GDP per capita, infant mortality rate, and average years…

    Submitted 6 January, 2025; originally announced February 2025.

  31. arXiv:2501.19274  [pdf, other]

    cs.RO

    GO: The Great Outdoors Multimodal Dataset

    Authors: Peng Jiang, Kasi Viswanath, Akhil Nagariya, George Chustz, Maggie Wigness, Philip Osteen, Timothy Overbye, Christian Ellis, Long Quang, Srikanth Saripalli

    Abstract: The Great Outdoors (GO) dataset is a multi-modal annotated data resource aimed at advancing ground robotics research in unstructured environments. This dataset provides the most comprehensive set of data modalities and annotations compared to existing off-road datasets. In total, the GO dataset includes six unique sensor types with high-quality semantic annotations and GPS traces to support tasks…

    Submitted 31 January, 2025; originally announced January 2025.

    Comments: 5 pages, 5 figures

  32. arXiv:2501.18229  [pdf, other]

    cs.RO

    GPD: Guided Polynomial Diffusion for Motion Planning

    Authors: Ajit Srikanth, Parth Mahanjan, Kallol Saha, Vishal Mandadi, Pranjal Paul, Pawan Wadhwani, Brojeshwar Bhowmick, Arun Singh, Madhava Krishna

    Abstract: Diffusion-based motion planners are becoming popular due to their well-established performance improvements, stemming from sample diversity and the ease of incorporating new constraints directly during inference. However, a primary limitation of the diffusion process is the requirement for a substantial number of denoising steps, especially when the denoising process is coupled with gradient-based…

    Submitted 30 January, 2025; originally announced January 2025.

  33. arXiv:2501.17361  [pdf, other]

    cs.LG cs.AI

    The M-factor: A Novel Metric for Evaluating Neural Architecture Search in Resource-Constrained Environments

    Authors: Srikanth Thudumu, Hy Nguyen, Hung Du, Nhat Duong, Zafaryab Rasool, Rena Logothetis, Scott Barnett, Rajesh Vasa, Kon Mouzakis

    Abstract: Neural Architecture Search (NAS) aims to automate the design of deep neural networks. However, existing NAS techniques often focus on maximising accuracy, neglecting model efficiency. This limitation restricts their use in resource-constrained environments like mobile devices and edge computing systems. Moreover, current evaluation metrics prioritise performance over efficiency, lacking a balanced…

    Submitted 28 January, 2025; originally announced January 2025.

  34. arXiv:2501.16753  [pdf, other]

    cs.CV cs.AI

    Overcoming Semantic Dilution in Transformer-Based Next Frame Prediction

    Authors: Hy Nguyen, Srikanth Thudumu, Hung Du, Rajesh Vasa, Kon Mouzakis

    Abstract: Next-frame prediction in videos is crucial for applications such as autonomous driving, object tracking, and motion prediction. The primary challenge in next-frame prediction lies in effectively capturing and processing both spatial and temporal information from previous video sequences. The transformer architecture, known for its prowess in handling sequence data, has made remarkable progress in…

    Submitted 28 January, 2025; originally announced January 2025.

  35. arXiv:2501.15695  [pdf, other]

    cs.MA cs.AI

    Contextual Knowledge Sharing in Multi-Agent Reinforcement Learning with Decentralized Communication and Coordination

    Authors: Hung Du, Srikanth Thudumu, Hy Nguyen, Rajesh Vasa, Kon Mouzakis

    Abstract: Decentralized Multi-Agent Reinforcement Learning (Dec-MARL) has emerged as a pivotal approach for addressing complex tasks in dynamic environments. Existing Multi-Agent Reinforcement Learning (MARL) methodologies typically assume a shared objective among agents and rely on centralized control. However, many real-world scenarios feature agents with individual goals and limited observability of othe…

    Submitted 26 January, 2025; originally announced January 2025.

  36. arXiv:2501.14000  [pdf, other]

    cs.LG cs.AI

    Local Control Networks (LCNs): Optimizing Flexibility in Neural Network Data Pattern Capture

    Authors: Hy Nguyen, Duy Khoa Pham, Srikanth Thudumu, Hung Du, Rajesh Vasa, Kon Mouzakis

    Abstract: The widespread use of Multi-layer perceptrons (MLPs) often relies on a fixed activation function (e.g., ReLU, Sigmoid, Tanh) for all nodes within the hidden layers. While effective in many scenarios, this uniformity may limit the network's ability to capture complex data patterns. We argue that employing the same activation function at every node is suboptimal and propose leveraging different activ…

    Submitted 25 April, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

  37. arXiv:2501.13994  [pdf, other]

    cs.CV cs.AI cs.RO

    CSAOT: Cooperative Multi-Agent System for Active Object Tracking

    Authors: Hy Nguyen, Bao Pham, Hung Du, Srikanth Thudumu, Rajesh Vasa, Kon Mouzakis

    Abstract: Object Tracking is essential for many computer vision applications, such as autonomous navigation, surveillance, and robotics. Unlike Passive Object Tracking (POT), which relies on static camera viewpoints to detect and track objects across consecutive frames, Active Object Tracking (AOT) requires a controller agent to actively adjust its viewpoint to maintain visual contact with a moving target i…

    Submitted 23 January, 2025; originally announced January 2025.

  38. arXiv:2501.13992  [pdf, other]

    cs.LG cs.AI

    Dual-Branch HNSW Approach with Skip Bridges and LID-Driven Optimization

    Authors: Hy Nguyen, Nguyen Hung Nguyen, Nguyen Linh Bao Nguyen, Srikanth Thudumu, Hung Du, Rajesh Vasa, Kon Mouzakis

    Abstract: The Hierarchical Navigable Small World (HNSW) algorithm is widely used for approximate nearest neighbor (ANN) search, leveraging the principles of navigable small-world graphs. However, it faces some limitations. The first is the local optima problem, which arises from the algorithm's greedy search strategy, selecting neighbors based solely on proximity at each step. This often leads to cluster di…

    Submitted 25 April, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

  39. arXiv:2501.06762  [pdf]

    q-bio.NC cs.LG cs.NE

    Improving the adaptive and continuous learning capabilities of artificial neural networks: Lessons from multi-neuromodulatory dynamics

    Authors: Jie Mei, Alejandro Rodriguez-Garcia, Daigo Takeuchi, Gabriel Wainstein, Nina Hubig, Yalda Mohsenzadeh, Srikanth Ramaswamy

    Abstract: Continuous, adaptive learning, the ability to adapt to the environment and improve performance, is a hallmark of both natural and artificial intelligence. Biological organisms excel in acquiring, transferring, and retaining knowledge while adapting to dynamic environments, making them a rich source of inspiration for artificial neural networks (ANNs). This study explores how neuromodulation, a funda…

    Submitted 12 January, 2025; originally announced January 2025.

  40. arXiv:2412.18566  [pdf, other]

    cs.CL eess.AS

    Zero-resource Speech Translation and Recognition with LLMs

    Authors: Karel Mundnich, Xing Niu, Prashant Mathur, Srikanth Ronanki, Brady Houston, Veera Raghavendra Elluru, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Anshu Bhatia, Daniel Garcia-Romero, Kyu J. Han, Katrin Kirchhoff

    Abstract: Despite recent advancements in speech processing, zero-resource speech translation (ST) and automatic speech recognition (ASR) remain challenging problems. In this work, we propose to leverage a multilingual Large Language Model (LLM) to perform ST and ASR in languages for which the model has never seen paired audio-text data. We achieve this by using a pre-trained multilingual speech encoder, a m…

    Submitted 30 December, 2024; v1 submitted 24 December, 2024; originally announced December 2024.

    Comments: ICASSP 2025, 5 pages, 2 figures, 2 tables

  41. arXiv:2412.16530  [pdf, other]

    cs.SD cs.CL cs.CV cs.MM eess.AS

    Improving Lip-synchrony in Direct Audio-Visual Speech-to-Speech Translation

    Authors: Lucas Goncalves, Prashant Mathur, Xing Niu, Brady Houston, Chandrashekhar Lavania, Srikanth Vishnubhotla, Lijia Sun, Anthony Ferritto

    Abstract: Audio-Visual Speech-to-Speech Translation typically prioritizes improving translation quality and naturalness. However, an equally critical aspect in audio-visual content is lip-synchrony (ensuring that the movements of the lips match the spoken content), which is essential for maintaining realism in dubbed videos. Despite its importance, the inclusion of lip-synchrony constraints in AVS2S models has been lar…

    Submitted 21 December, 2024; originally announced December 2024.

    Comments: Accepted at ICASSP, 4 pages

  42. arXiv:2412.16500  [pdf, other]

    eess.AS cs.AI cs.CL

    Speech Retrieval-Augmented Generation without Automatic Speech Recognition

    Authors: Do June Min, Karel Mundnich, Andy Lapastora, Erfan Soltanmohammadi, Srikanth Ronanki, Kyu Han

    Abstract: One common approach for question answering over speech data is to first transcribe speech using automatic speech recognition (ASR) and then employ text-based retrieval-augmented generation (RAG) on the transcriptions. While this cascaded pipeline has proven effective in many practical settings, ASR errors can propagate to the retrieval and generation steps. To overcome this limitation, we introduc…

    Submitted 3 January, 2025; v1 submitted 21 December, 2024; originally announced December 2024.

    Comments: ICASSP 2025

  43. arXiv:2412.16429  [pdf, other]

    cs.CY cs.AI cs.LG

    LearnLM: Improving Gemini for Learning

    Authors: LearnLM Team, Abhinit Modi, Aditya Srikanth Veerubhotla, Aliya Rysbek, Andrea Huber, Brett Wiltshire, Brian Veprek, Daniel Gillick, Daniel Kasenberg, Derek Ahmed, Irina Jurenka, James Cohan, Jennifer She, Julia Wilkowski, Kaiz Alarakyia, Kevin R. McKee, Lisa Wang, Markus Kunesch, Mike Schaekermann, Miruna Pîslar, Nikhil Joshi, Parsa Mahmoudieh, Paul Jhun, Sara Wiltberger, Shakir Mohamed , et al. (21 additional authors not shown)

    Abstract: Today's generative AI systems are tuned to present information by default rather than engage users in service of learning as a human tutor would. To address the wide range of potential education use cases for these systems, we reframe the challenge of injecting pedagogical behavior as one of pedagogical instruction following, where training and evaluation examples include system-level ins…

    Submitted 25 December, 2024; v1 submitted 20 December, 2024; originally announced December 2024.

  44. arXiv:2412.00622  [pdf, other]

    cs.CV cs.AI cs.LG

    Visual Modality Prompt for Adapting Vision-Language Object Detectors

    Authors: Heitor R. Medeiros, Atif Belal, Srikanth Muralidharan, Eric Granger, Marco Pedersoli

    Abstract: The zero-shot performance of object detectors degrades when tested on different modalities, such as infrared and depth. While recent work has explored image translation techniques to adapt detectors to new modalities, these methods are limited to a single modality and apply only to traditional detectors. Recently, vision-language detectors, such as YOLO-World and Grounding DINO, have shown promisi…

    Submitted 14 March, 2025; v1 submitted 30 November, 2024; originally announced December 2024.

  45. arXiv:2411.14611  [pdf, other]

    cs.SE cs.LG

    CodeSAM: Source Code Representation Learning by Infusing Self-Attention with Multi-Code-View Graphs

    Authors: Alex Mathai, Kranthi Sedamaki, Debeshee Das, Noble Saji Mathews, Srikanth Tamilselvam, Sridhar Chimalakonda, Atul Kumar

    Abstract: Machine Learning (ML) for software engineering (SE) has gained prominence due to its ability to significantly enhance the performance of various SE applications. This progress is largely attributed to the development of generalizable source code representations that effectively capture the syntactic and semantic characteristics of code. In recent years, pre-trained transformer-based models, inspir…

    Submitted 21 November, 2024; originally announced November 2024.

  46. arXiv:2411.07374  [pdf, ps, other]

    cs.CC

    Low Degree Local Correction Over the Boolean Cube

    Authors: Prashanth Amireddy, Amik Raj Behera, Manaswi Paraashar, Srikanth Srinivasan, Madhu Sudan

    Abstract: In this work, we show that the class of multivariate degree-$d$ polynomials mapping $\{0,1\}^{n}$ to any Abelian group $G$ is locally correctable with $\widetilde{O}_{d}((\log n)^{d})$ queries for up to a fraction of errors approaching half the minimum distance of the underlying code. In particular, this result holds even for polynomials over the reals or the rationals, special cases that were pre…

    Submitted 12 November, 2024; v1 submitted 11 November, 2024; originally announced November 2024.

    Comments: 64 pages, To appear in SODA 2025, deleted image files

  47. arXiv:2411.07264  [pdf, other]

    cs.IR cs.CL

    Multi-Document Financial Question Answering using LLMs

    Authors: Shalin Shah, Srikanth Ryali, Ramasubbu Venkatesh

    Abstract: We propose two new methods for multi-document financial question answering. The first (RAG_SEM) uses semantic tagging and then queries the index to get the context. The second (KG_RAG) is a Knowledge Graph based method that uses semantic tagging and retrieves knowledge graph triples from a graph database as context. KG_RAG uses knowledge graphs constructed using a small model that is…

    Submitted 8 November, 2024; originally announced November 2024.

  48. arXiv:2410.21422  [pdf, other]

    cs.CE

    A Foundation Model for Chemical Design and Property Prediction

    Authors: Feiyang Cai, Katelin Hanna, Tianyu Zhu, Tzuen-Rong Tzeng, Yongping Duan, Ling Liu, Srikanth Pilla, Gang Li, Feng Luo

    Abstract: Artificial intelligence (AI) has significantly advanced computational chemistry research in various tasks. However, traditional AI methods often rely on task-specific model designs and training, which constrain both the scalability of model size and generalization across different tasks. Here, we introduce ChemFM, a large foundation model specifically developed for chemicals. ChemFM comprises 3 bi…

    Submitted 23 January, 2025; v1 submitted 28 October, 2024; originally announced October 2024.

  49. arXiv:2410.19206  [pdf, other]

    cs.LG cs.CL

    Inference time LLM alignment in single and multidomain preference spectrum

    Authors: Sadat Shahriar, Zheng Qi, Nikolaos Pappas, Srikanth Doss, Monica Sunkara, Kishaloy Halder, Manuel Mager, Yassine Benajiba

    Abstract: Aligning Large Language Models (LLMs) to address subjectivity and nuanced preference levels requires adequate flexibility and control, which can be a resource-intensive and time-consuming procedure. Existing training-time alignment methods require full re-training when a change is needed, and inference-time ones typically require access to the reward model at each inference step. To address these li…

    Submitted 24 October, 2024; originally announced October 2024.

  50. arXiv:2410.18481  [pdf, other]

    cs.CL cs.AI cs.LG

    Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction

    Authors: Sergio Burdisso, Srikanth Madikeri, Petr Motlicek

    Abstract: Efficiently deriving structured workflows from unannotated dialogs remains an underexplored and formidable challenge in computational linguistics. Automating this process could significantly accelerate the manual design of workflows in new domains and enable the grounding of large language models in domain-specific flowcharts, enhancing transparency and controllability. In this paper, we introduce…

    Submitted 5 November, 2024; v1 submitted 24 October, 2024; originally announced October 2024.

    Comments: Accepted to EMNLP 2024 main conference

    Journal ref: https://aclanthology.org/2024.emnlp-main.310/