Showing 1–50 of 88 results for author: Sraavan

Searching in archive cs.

  1. arXiv:2506.12103 [pdf, other]

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  2. arXiv:2506.09068 [pdf, ps, other]

    cs.CV cs.LG cs.RO

    BG-HOP: A Bimanual Generative Hand-Object Prior

    Authors: Sriram Krishna, Sravan Chittupalli, Sungjae Park

    Abstract: In this work, we present BG-HOP, a generative prior that seeks to model bimanual hand-object interactions in 3D. We address the challenge of limited bimanual interaction data by extending existing single-hand generative priors, demonstrating preliminary results in capturing the joint distribution of hands and objects. Our experiments showcase the model's capability to generate bimanual interaction…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: Presented at Agents in Interaction, from Humans to Robots, CVPR 2025

  3. arXiv:2506.04708 [pdf, other]

    cs.CL

    Accelerated Test-Time Scaling with Model-Free Speculative Sampling

    Authors: Woomin Song, Saket Dingliwal, Sai Muralidhar Jayanthi, Bhavana Ganesh, Jinwoo Shin, Aram Galstyan, Sravan Babu Bodapati

    Abstract: Language models have demonstrated remarkable capabilities in reasoning tasks through test-time scaling techniques like best-of-N sampling and tree search. However, these approaches often demand substantial computational resources, creating a critical trade-off between performance and efficiency. We introduce STAND (STochastic Adaptive N-gram Drafting), a novel model-free speculative decoding appro…

    Submitted 5 June, 2025; originally announced June 2025.

  4. arXiv:2506.03194 [pdf, ps, other]

    cs.CV cs.AI cs.LG

    HueManity: Probing Fine-Grained Visual Perception in MLLMs

    Authors: Rynaa Grover, Jayant Sravan Tamarapalli, Sahiti Yerramilli, Nilay Pande

    Abstract: Multimodal Large Language Models (MLLMs) excel at high-level visual reasoning, but their performance on nuanced perceptual tasks remains surprisingly limited. We present HueManity, a benchmark designed to assess visual perception in MLLMs. The dataset comprises 83,850 images featuring two-character alphanumeric strings embedded in Ishihara test style dot patterns, challenging models on precise pat…

    Submitted 31 May, 2025; originally announced June 2025.

  5. arXiv:2506.01215 [pdf, other]

    cs.CL cs.LG

    Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers

    Authors: Woomin Song, Sai Muralidhar Jayanthi, Srikanth Ronanki, Kanthashree Mysore Sathyendra, Jinwoo Shin, Aram Galstyan, Shubham Katiyar, Sravan Babu Bodapati

    Abstract: As large language models increasingly gain popularity in real-world applications, processing extremely long contexts, often exceeding the model's pre-trained context limits, has emerged as a critical challenge. While existing approaches to efficient long-context processing show promise, recurrent compression-based methods struggle with information preservation, whereas random access approaches req…

    Submitted 1 June, 2025; originally announced June 2025.

  6. arXiv:2506.01206 [pdf, other]

    cs.CL cs.AI

    Mamba Drafters for Speculative Decoding

    Authors: Daewon Choi, Seunghyuk Oh, Saket Dingliwal, Jihoon Tack, Kyuyoung Kim, Woomin Song, Seojin Kim, Insu Han, Jinwoo Shin, Aram Galstyan, Shubham Katiyar, Sravan Babu Bodapati

    Abstract: Speculative decoding has emerged as a promising approach to accelerating large language model (LLM) generation using a fast drafter while maintaining alignment with the target model's distribution. However, existing approaches face a trade-off: external drafters offer flexibility but can suffer from slower drafting, while self-speculation methods use drafters tailored to the target model but requi…

    Submitted 1 June, 2025; originally announced June 2025.

  7. arXiv:2506.00785 [pdf, ps, other]

    cs.AI cs.CV cs.LG

    GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning

    Authors: Sahiti Yerramilli, Nilay Pande, Rynaa Grover, Jayant Sravan Tamarapalli

    Abstract: This paper introduces GeoChain, a large-scale benchmark for evaluating step-by-step geographic reasoning in multimodal large language models (MLLMs). Leveraging 1.46 million Mapillary street-level images, GeoChain pairs each image with a 21-step chain-of-thought (CoT) question sequence (over 30 million Q&A pairs). These sequences guide models from coarse attributes to fine-grained localization acr…

    Submitted 31 May, 2025; originally announced June 2025.

  8. arXiv:2505.21681 [pdf, ps, other]

    cs.IT

    Residual Diffusion Models for Variable-Rate Joint Source Channel Coding of MIMO CSI

    Authors: Sravan Kumar Ankireddy, Heasung Kim, Hyeji Kim

    Abstract: Despite significant advancements in deep learning-based CSI compression, some key limitations remain unaddressed. Current approaches predominantly treat CSI compression as a source coding problem, neglecting transmission errors. In finite block length regimes, separate source and channel coding proves suboptimal, with reconstruction performance deteriorating significantly under challenging channel…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 13 pages, 11 figures

  9. arXiv:2504.08177 [pdf, other]

    eess.IV cs.AI cs.CV

    SynthFM: Training Modality-agnostic Foundation Models for Medical Image Segmentation without Real Medical Data

    Authors: Sourya Sengupta, Satrajit Chakrabarty, Keerthi Sravan Ravi, Gopal Avinash, Ravi Soni

    Abstract: Foundation models like the Segment Anything Model (SAM) excel in zero-shot segmentation for natural images but struggle with medical image segmentation due to differences in texture, contrast, and noise. Annotating medical images is costly and requires domain expertise, limiting large-scale annotated data availability. To address this, we propose SynthFM, a synthetic data generation framework that…

    Submitted 10 April, 2025; originally announced April 2025.

  10. arXiv:2503.04992 [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Wanda++: Pruning Large Language Models via Regional Gradients

    Authors: Yifan Yang, Kai Zhen, Bhavana Ganesh, Aram Galstyan, Goeric Huybrechts, Markus Müller, Jonas M. Kübler, Rupak Vignesh Swaminathan, Athanasios Mouchtaris, Sravan Babu Bodapati, Nathan Susanj, Zheng Zhang, Jack FitzGerald, Abhishek Kumar

    Abstract: Large Language Model (LLM) pruning seeks to remove unimportant weights for inference speedup with minimal accuracy impact. However, existing methods often suffer from accuracy degradation without full-model sparsity-aware fine-tuning. This paper presents Wanda++, a novel pruning framework that outperforms the state-of-the-art methods by utilizing decoder-block-level regional gradients.…

    Submitted 1 June, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    Comments: Paper accepted at ACL 2025 Findings

  11. arXiv:2502.18002 [pdf, other]

    cs.LG cs.AI

    A Radon-Nikodým Perspective on Anomaly Detection: Theory and Implications

    Authors: Shlok Mehendale, Aditya Challa, Rahul Yedida, Sravan Danda, Santonu Sarkar, Snehanshu Saha

    Abstract: Which principle underpins the design of an effective anomaly detection loss function? The answer lies in the Radon-Nikodým theorem, a fundamental result in measure theory. The key insight from this article is: multiplying the vanilla loss function by the Radon-Nikodým derivative improves performance across the board. We refer to this as RN-Loss. We prove this using the setting of…

    Submitted 16 May, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

  12. arXiv:2501.07957 [pdf, other]

    cs.RO cs.AI cs.CV cs.HC cs.LG

    AI Guide Dog: Egocentric Path Prediction on Smartphone

    Authors: Aishwarya Jadhav, Jeffery Cao, Abhishree Shetty, Urvashi Priyam Kumar, Aditi Sharma, Ben Sukboontip, Jayant Sravan Tamarapalli, Jingyi Zhang, Anirudh Koul

    Abstract: This paper presents AI Guide Dog (AIGD), a lightweight egocentric (first-person) navigation system for visually impaired users, designed for real-time deployment on smartphones. AIGD employs a vision-only multi-label classification approach to predict directional commands, ensuring safe navigation across diverse environments. We introduce a novel technique for goal-based outdoor navigation by inte…

    Submitted 16 February, 2025; v1 submitted 14 January, 2025; originally announced January 2025.

    Comments: Accepted at the AAAI 2025 Spring Symposium on Human-Compatible AI for Well-being: Harnessing Potential of GenAI for AI-Powered Science

  13. arXiv:2412.03035 [pdf, other]

    cs.LG

    A Granger-Causal Perspective on Gradient Descent with Application to Pruning

    Authors: Aditya Shah, Aditya Challa, Sravan Danda, Archana Mathur, Snehanshu Saha

    Abstract: Stochastic Gradient Descent (SGD) is the main approach to optimizing neural networks. Several generalization properties of deep networks, such as convergence to flatter minima, are believed to arise from SGD. This article explores the causality aspect of gradient descent. Specifically, we show that the gradient descent procedure has an implicit Granger-causal relationship between the reduction i…

    Submitted 4 December, 2024; originally announced December 2024.

  14. arXiv:2410.20252 [pdf, other]

    cs.CV cs.AI

    Adaptive Video Understanding Agent: Enhancing efficiency with dynamic frame sampling and feedback-driven reasoning

    Authors: Sullam Jeoung, Goeric Huybrechts, Bhavana Ganesh, Aram Galstyan, Sravan Bodapati

    Abstract: Understanding long-form video content presents significant challenges due to its temporal complexity and the substantial computational resources required. In this work, we propose an agent-based approach to enhance both the efficiency and effectiveness of long-form video understanding by utilizing large language models (LLMs) and their tool-harnessing ability. A key aspect of our method is query-a…

    Submitted 26 October, 2024; originally announced October 2024.

  15. arXiv:2410.09362 [pdf, other]

    cs.LG cs.AI

    SeRA: Self-Reviewing and Alignment of Large Language Models using Implicit Reward Margins

    Authors: Jongwoo Ko, Saket Dingliwal, Bhavana Ganesh, Sailik Sengupta, Sravan Bodapati, Aram Galstyan

    Abstract: Direct alignment algorithms (DAAs), such as direct preference optimization (DPO), have become popular alternatives to Reinforcement Learning from Human Feedback (RLHF) due to their simplicity, efficiency, and stability. However, the preferences used in DAAs are usually collected before the alignment training begins and remain unchanged (off-policy). This can lead to two problems where the policy…

    Submitted 12 October, 2024; originally announced October 2024.

  16. arXiv:2410.03775 [pdf, other]

    cs.HC cs.AI

    Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge

    Authors: Aparna Elangovan, Lei Xu, Jongwoo Ko, Mahsa Elyasi, Ling Liu, Sravan Bodapati, Dan Roth

    Abstract: The effectiveness of automatic evaluation of generative models is typically measured by comparing the labels generated via automation with labels generated by humans using correlation metrics. However, metrics like Krippendorff's α and Randolph's κ were originally designed to measure the reliability of human labeling, and thus make assumptions about typical human labeling behavior, and these assumptions may…

    Submitted 27 January, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: Accepted at ICLR 2025

  17. arXiv:2409.11580 [pdf, other]

    cs.RO

    PLATO: Planning with LLMs and Affordances for Tool Manipulation

    Authors: Arvind Car, Sai Sravan Yarlagadda, Alison Bartsch, Abraham George, Amir Barati Farimani

    Abstract: As robotic systems become increasingly integrated into complex real-world environments, there is a growing need for approaches that enable robots to understand and act upon natural language instructions without relying on extensive pre-programmed knowledge of their surroundings. This paper presents PLATO, an innovative system that addresses this challenge by leveraging specialized large language m…

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: 7 pages, 4 figures, submitted to ICRA 2025

  18. arXiv:2407.06443 [pdf, other]

    cs.AI

    Exposing Privacy Gaps: Membership Inference Attack on Preference Data for LLM Alignment

    Authors: Qizhang Feng, Siva Rajesh Kasa, Santhosh Kumar Kasa, Hyokun Yun, Choon Hui Teo, Sravan Babu Bodapati

    Abstract: Large Language Models (LLMs) have seen widespread adoption due to their remarkable natural language capabilities. However, when deploying them in real-world settings, it is important to align LLMs to generate texts according to acceptable human standards. Methods such as Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) have enabled significant progress in refining LLMs u…

    Submitted 27 April, 2025; v1 submitted 8 July, 2024; originally announced July 2024.

  19. arXiv:2407.02233 [pdf, other]

    cs.CL cs.AI cs.LG

    Synthetic Multimodal Question Generation

    Authors: Ian Wu, Sravan Jayanthi, Vijay Viswanathan, Simon Rosenberg, Sina Pakazad, Tongshuang Wu, Graham Neubig

    Abstract: Multimodal Retrieval Augmented Generation (MMRAG) is a powerful approach to question-answering over multimodal documents. A key challenge with evaluating MMRAG is the paucity of high-quality datasets matching the question styles and modalities of interest. In light of this, we propose SMMQG, a synthetic data generation framework. SMMQG leverages the interplay between a retriever, large language model…

    Submitted 3 October, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted to EMNLP 2024 Findings; Camera Ready

  20. arXiv:2406.01727 [pdf, other]

    cs.LG cs.MA eess.SP

    Federated Learning-based Collaborative Wideband Spectrum Sensing and Scheduling for UAVs in UTM Systems

    Authors: Sravan Reddy Chintareddy, Keenan Roach, Kenny Cheung, Morteza Hashemi

    Abstract: In this paper, we propose a data-driven framework for collaborative wideband spectrum sensing and scheduling for networked unmanned aerial vehicles (UAVs), which act as the secondary users (SUs) to opportunistically utilize detected "spectrum holes". Our overall framework consists of three main stages. Firstly, in the model training stage, we explore dataset generation in a multi-cell environment…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: This is a preprint version submitted to IEEE Transactions on Machine Learning in Communications and Networking. arXiv admin note: text overlap with arXiv:2308.05036

  21. arXiv:2406.00144 [pdf, other]

    cs.LG cs.AI cs.CE

    Query2CAD: Generating CAD models using natural language queries

    Authors: Akshay Badagabettu, Sai Sravan Yarlagadda, Amir Barati Farimani

    Abstract: Computer Aided Design (CAD) engineers typically do not achieve their best prototypes in a single attempt. Instead, they iterate and refine their designs to achieve an optimal solution through multiple revisions. This traditional approach, though effective, is time-consuming and relies heavily on the expertise of skilled engineers. To address these challenges, we introduce Query2CAD, a novel framew…

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures

  22. ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models

    Authors: Aparna Elangovan, Ling Liu, Lei Xu, Sravan Bodapati, Dan Roth

    Abstract: In this position paper, we argue that human evaluation of generative large language models (LLMs) should be a multidisciplinary undertaking that draws upon insights from disciplines such as user experience research and human behavioral psychology to ensure that the experimental design and results are reliable. The conclusions from these evaluations, thus, must consider factors such as usability, a…

    Submitted 31 August, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted in ACL 2024

  23. arXiv:2405.11573 [pdf, other]

    cs.LG

    Quantile Activation: Correcting a Failure Mode of ML Models

    Authors: Aditya Challa, Sravan Danda, Laurent Najman, Snehanshu Saha

    Abstract: Standard ML models fail to infer the context distribution and suitably adapt. For instance, the learning fails when the underlying distribution is actually a mixture of distributions with contradictory labels. Learning also fails if there is a shift between train and test distributions. Standard neural network architectures like MLPs or CNNs are not equipped to handle this. In this article, we p…

    Submitted 2 April, 2025; v1 submitted 19 May, 2024; originally announced May 2024.

  24. arXiv:2405.08295 [pdf, other]

    cs.CL cs.SD eess.AS

    SpeechVerse: A Large-scale Generalizable Audio Language Model

    Authors: Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, Zhaocheng Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sravan Bodapati, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore devel…

    Submitted 24 March, 2025; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Single column, 13 pages

  25. arXiv:2404.09067 [pdf, other]

    cs.CV cs.AI

    Exploring Explainability in Video Action Recognition

    Authors: Avinab Saha, Shashank Gupta, Sravan Kumar Ankireddy, Karl Chahine, Joydeep Ghosh

    Abstract: Image Classification and Video Action Recognition are perhaps the two most foundational tasks in computer vision. Consequently, explaining the inner workings of trained deep neural networks is of prime importance. While numerous efforts focus on explaining the decisions of trained deep neural networks in image classification, exploration in the domain of its temporal version, video action recognit…

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: 6 pages, 10 figures, Accepted to the 3rd Explainable AI for Computer Vision (XAI4CV) Workshop at CVPR 2024

  26. arXiv:2404.02359 [pdf, ps, other]

    cs.LG

    Attribution Regularization for Multimodal Paradigms

    Authors: Sahiti Yerramilli, Jayant Sravan Tamarapalli, Jonathan Francis, Eric Nyberg

    Abstract: Multimodal machine learning has gained significant attention in recent years due to its potential for integrating information from multiple modalities to enhance learning and decision-making processes. However, it is commonly observed that unimodal models outperform multimodal models, despite the latter having access to richer information. Additionally, the influence of a single modality often dom…

    Submitted 2 April, 2024; originally announced April 2024.

  27. arXiv:2404.02353 [pdf, other]

    cs.CV cs.AI cs.LG

    Semantic Augmentation in Images using Language

    Authors: Sahiti Yerramilli, Jayant Sravan Tamarapalli, Tanmay Girish Kulkarni, Jonathan Francis, Eric Nyberg

    Abstract: Deep Learning models are incredibly data-hungry and require very large labeled datasets for supervised learning. As a consequence, these models often suffer from overfitting, limiting their ability to generalize to real-world examples. Recent advancements in diffusion models have enabled the generation of photorealistic images based on textual inputs. Leveraging the substantial datasets used to tr…

    Submitted 2 April, 2024; originally announced April 2024.

  28. arXiv:2403.10751 [pdf, other]

    cs.IT cs.AI

    LightCode: Light Analytical and Neural Codes for Channels with Feedback

    Authors: Sravan Kumar Ankireddy, Krishna Narayanan, Hyeji Kim

    Abstract: The design of reliable and efficient codes for channels with feedback remains a longstanding challenge in communication theory. While significant improvements have been achieved by leveraging deep learning techniques, neural codes often suffer from high computational costs, a lack of interpretability, and limited practicality in resource-constrained settings. We focus on designing low-complexity c…

    Submitted 16 November, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: 16 pages, 12 figures, To appear in IEEE Journal on Selected Areas in Communications, 2024

  29. arXiv:2402.08864 [pdf, other]

    cs.IT cs.LG

    DeepPolar: Inventing Nonlinear Large-Kernel Polar Codes via Deep Learning

    Authors: S Ashwin Hebbar, Sravan Kumar Ankireddy, Hyeji Kim, Sewoong Oh, Pramod Viswanath

    Abstract: Progress in designing channel codes has been driven by human ingenuity and, fittingly, has been sporadic. Polar codes, developed on the foundation of Arikan's polarization kernel, represent the latest breakthrough in coding theory and have emerged as the state-of-the-art error-correction code for short-to-medium block length regimes. In an effort to automate the invention of good channel codes, es…

    Submitted 4 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: 22 pages, 24 figures

  30. arXiv:2402.08749 [pdf]

    cs.CV cs.LG

    Automated detection of motion artifacts in brain MR images using deep learning and explainable artificial intelligence

    Authors: Marina Manso Jimeno, Keerthi Sravan Ravi, Maggie Fung, John Thomas Vaughan, Jr., Sairam Geethanath

    Abstract: Quality assessment, including inspecting the images for artifacts, is a critical step during MRI data acquisition to ensure data quality and downstream analysis or interpretation success. This study demonstrates a deep learning model to detect rigid motion in T1-weighted brain images. We leveraged a 2D CNN for three-class classification and tested it on publicly available retrospective and prospec…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: 25 pages, 9 figures, 1 table. Submitted to NMR in Biomedicine

  31. arXiv:2402.08405 [pdf, other]

    cs.LG

    A Novel Approach to Regularising 1NN classifier for Improved Generalization

    Authors: Aditya Challa, Sravan Danda, Laurent Najman

    Abstract: In this paper, we propose a class of non-parametric classifiers that learn arbitrary boundaries and generalize well. Our approach is based on a novel way to regularize 1NN classifiers using a greedy approach. We refer to this class of classifiers as Watershed Classifiers. 1NN classifiers are known to trivially over-fit and have very large VC dimension, and hence do not generalize well. We show that…

    Submitted 13 February, 2024; originally announced February 2024.

  32. arXiv:2401.17188 [pdf, other]

    cs.IT cs.AI

    Nested Construction of Polar Codes via Transformers

    Authors: Sravan Kumar Ankireddy, S Ashwin Hebbar, Heping Wan, Joonyoung Cho, Charlie Zhang

    Abstract: Tailoring polar code construction for decoding algorithms beyond successive cancellation has remained a topic of significant interest in the field. However, despite the inherent nested structure of polar codes, the use of sequence models in polar code construction is understudied. In this work, we propose using a sequence modeling framework to iteratively construct a polar code for any given lengt…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 7 pages; 8 figures

  33. arXiv:2312.08356 [pdf, other]

    cs.DB cs.DC

    CUTTANA: Scalable Graph Partitioning for Faster Distributed Graph Databases and Analytics

    Authors: Milad Rezaei Hajidehi, Sraavan Sridhar, Margo Seltzer

    Abstract: Graph partitioning plays a pivotal role in various distributed graph processing applications, including graph analytics, graph neural network training, and distributed graph databases. Graphs that require distributed settings are often too large to fit in the main memory of a single machine. This challenge renders traditional in-memory graph partitioners infeasible, leading to the emergence of str…

    Submitted 9 December, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: Accepted at VLDB 2025 (Vol.18, No.1). Please use VLDB version of paper for the updated version and bibtex

  34. arXiv:2311.18618 [pdf, other]

    cs.CV

    JPPF: Multi-task Fusion for Consistent Panoptic-Part Segmentation

    Authors: Shishir Muralidhara, Sravan Kumar Jagadeesh, René Schuster, Didier Stricker

    Abstract: Part-aware panoptic segmentation is a problem of computer vision that aims to provide a semantic understanding of the scene at multiple levels of granularity. More precisely, semantic areas, object instances, and semantic parts are predicted simultaneously. In this paper, we present our Joint Panoptic Part Fusion (JPPF) that combines the three individual segmentations effectively to obtain a panop…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: Accepted for Springer Nature Computer Science. arXiv admin note: substantial text overlap with arXiv:2212.07671

  35. arXiv:2311.11518 [pdf, other]

    cs.CL cs.LG

    Multi-teacher Distillation for Multilingual Spelling Correction

    Authors: Jingfen Zhang, Xuan Guo, Sravan Bodapati, Christopher Potts

    Abstract: Accurate spelling correction is a critical step in modern search interfaces, especially in an era of mobile devices and speech-to-text interfaces. For services that are deployed around the world, this poses a significant challenge for multilingual NLP: spelling errors need to be caught and corrected in all languages, and even in queries that use multiple languages. In this paper, we tackle this ch…

    Submitted 19 November, 2023; originally announced November 2023.

  36. arXiv:2311.08402 [pdf, other]

    cs.CL cs.IR cs.SD eess.AS

    Retrieve and Copy: Scaling ASR Personalization to Large Catalogs

    Authors: Sai Muralidhar Jayanthi, Devang Kulshreshtha, Saket Dingliwal, Srikanth Ronanki, Sravan Bodapati

    Abstract: Personalization of automatic speech recognition (ASR) models is a widely studied topic because of its many practical applications. Most recently, attention-based contextual biasing techniques have been used to improve the recognition of rare words and domain-specific entities. However, due to performance constraints, the biasing is often limited to a few thousand entities, restricting real-world usabili…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023

  37. arXiv:2311.02482 [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Generalized zero-shot audio-to-intent classification

    Authors: Veera Raghavendra Elluru, Devang Kulshreshtha, Rohit Paturi, Sravan Bodapati, Srikanth Ronanki

    Abstract: Spoken language understanding systems using audio-only data are gaining popularity, yet their ability to handle unseen intents remains limited. In this study, we propose a generalized zero-shot audio-to-intent classification framework with only a few sample text sentences per intent. To achieve this, we first train a supervised audio-to-intent classifier by making use of a self-supervised pre-trai…

    Submitted 4 November, 2023; originally announced November 2023.

  38. arXiv:2310.08660 [pdf, other]

    cs.LG cs.AI eess.SP

    Learning RL-Policies for Joint Beamforming Without Exploration: A Batch Constrained Off-Policy Approach

    Authors: Heasung Kim, Sravan Kumar Ankireddy

    Abstract: In this work, we consider the problem of network parameter optimization for rate maximization. We frame this as a joint optimization problem of power control, beamforming, and interference cancellation. We consider the setting where multiple Base Stations (BSs) communicate with multiple user equipment (UEs). Because of the exponential computational complexity of brute-force search, we instead sol…

    Submitted 11 November, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: 10 pages, 8 figures

  39. arXiv:2308.11357 [pdf, other]

    cs.CV

    Exemplar-Free Continual Transformer with Convolutions

    Authors: Anurag Roy, Vinay Kumar Verma, Sravan Voonna, Kripabandhu Ghosh, Saptarshi Ghosh, Abir Das

    Abstract: Continual Learning (CL) involves training a machine learning model in a sequential manner to learn new information while retaining previously learned tasks without the presence of previous training data. Although there has been significant interest in CL, most recent CL approaches in computer vision have focused on convolutional architectures only. However, with the recent success of vision transf…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: Accepted in ICCV 2023

  40. arXiv:2308.05036 [pdf, other]

    eess.SP cs.LG cs.MA cs.NI

    Collaborative Wideband Spectrum Sensing and Scheduling for Networked UAVs in UTM Systems

    Authors: Sravan Reddy Chintareddy, Keenan Roach, Kenny Cheung, Morteza Hashemi

    Abstract: In this paper, we propose a data-driven framework for collaborative wideband spectrum sensing and scheduling for networked unmanned aerial vehicles (UAVs), which act as the secondary users to opportunistically utilize detected spectrum holes. To this end, we propose a multi-class classification problem for wideband spectrum sensing to detect vacant spectrum spots based on collected I/Q samples. To…

    Submitted 9 August, 2023; originally announced August 2023.

  41. arXiv:2307.13850 [pdf, other]

    cs.LG cs.AI cs.CV cs.RO

    MAEA: Multimodal Attribution for Embodied AI

    Authors: Vidhi Jain, Jayant Sravan Tamarapalli, Sahiti Yerramilli, Yonatan Bisk

    Abstract: Understanding multimodal perception for embodied AI is an open question because such inputs may contain highly complementary as well as redundant information for the task. A relevant direction for multimodal policies is understanding the global trends of each modality at the fusion layer. To this end, we disentangle the attributions for visual, language, and previous action inputs across different…

    Submitted 25 July, 2023; originally announced July 2023.

  42. arXiv:2307.00759

    cs.CL cs.SD eess.AS

    Multilingual Contextual Adapters To Improve Custom Word Recognition In Low-resource Languages

    Authors: Devang Kulshreshtha, Saket Dingliwal, Brady Houston, Sravan Bodapati

    Abstract: Connectionist Temporal Classification (CTC) models are popular for their balance between speed and performance for Automatic Speech Recognition (ASR). However, these CTC models still struggle in other areas, such as personalization towards custom words. A recent approach explores Contextual Adapters, wherein an attention-based biasing model for CTC is used to improve the recognition of custom enti…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: Published at INTERSPEECH 2023

  43. arXiv:2307.00453

    cs.CL cs.SD eess.AS

    Don't Stop Self-Supervision: Accent Adaptation of Speech Representations via Residual Adapters

    Authors: Anshu Bhatia, Sanchit Sinha, Saket Dingliwal, Karthik Gopalakrishnan, Sravan Bodapati, Katrin Kirchhoff

    Abstract: Speech representations learned in a self-supervised fashion from massive unlabeled speech corpora have been adapted successfully toward several downstream tasks. However, such representations may be skewed toward canonical data characteristics of such corpora and perform poorly on atypical, non-native accented speaker populations. With the state-of-the-art HuBERT model as a baseline, we propose an…

    Submitted 1 July, 2023; originally announced July 2023.

  44. arXiv:2306.08175

    eess.AS cs.AI cs.LG cs.SD

    DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer ASR

    Authors: Goeric Huybrechts, Srikanth Ronanki, Xilai Li, Hadis Nosrati, Sravan Bodapati, Katrin Kirchhoff

    Abstract: Conformer-based end-to-end models have become ubiquitous these days and are commonly used in both streaming and non-streaming automatic speech recognition (ASR). Techniques like dual-mode and dynamic chunk training helped unify streaming and non-streaming systems. However, there remains a performance gap between streaming with a full and limited past context. To address this issue, we propose the…

    Submitted 1 March, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

  45. arXiv:2305.15523

    cs.IT cs.CV

    Task-aware Distributed Source Coding under Dynamic Bandwidth

    Authors: Po-han Li, Sravan Kumar Ankireddy, Ruihan Zhao, Hossein Nourkhiz Mahjoub, Ehsan Moradi-Pari, Ufuk Topcu, Sandeep Chinchali, Hyeji Kim

    Abstract: Efficient compression of correlated data is essential to minimize communication overload in multi-sensor networks. In such networks, each sensor independently compresses the data and transmits them to a central node due to limited communication bandwidth. A decoder at the central node decompresses and passes the data to a pre-trained machine learning-based task to generate the final output. Thus,…

    Submitted 2 December, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Journal ref: NeurIPS 2023

  46. arXiv:2305.07677

    cs.SD cs.CL cs.LG

    Masked Audio Text Encoders are Effective Multi-Modal Rescorers

    Authors: Jinglun Cai, Monica Sunkara, Xilai Li, Anshu Bhatia, Xiao Pan, Sravan Bodapati

    Abstract: Masked Language Models (MLMs) have proven to be effective for second-pass rescoring in Automatic Speech Recognition (ASR) systems. In this work, we propose Masked Audio Text Encoder (MATE), a multi-modal masked language model rescorer which incorporates acoustic representations into the input space of MLM. We adopt contrastive learning for effectively aligning the modalities by learning shared rep…

    Submitted 24 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

  47. arXiv:2305.03837

    eess.AS cs.LG cs.SD

    Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR with Internal Language Model Estimation

    Authors: Nilaksh Das, Monica Sunkara, Sravan Bodapati, Jinglun Cai, Devang Kulshreshtha, Jeff Farris, Katrin Kirchhoff

    Abstract: End-to-end ASR models trained on large amounts of data tend to be implicitly biased towards language semantics of the training data. Internal language model estimation (ILME) has been proposed to mitigate this bias for autoregressive models such as attention-based encoder-decoder and RNN-T. Typically, ILME is performed by modularizing the acoustic and language components of the model architecture,…

    Submitted 5 May, 2023; originally announced May 2023.

    Comments: Accepted to ICASSP 2023

  48. arXiv:2304.09325

    eess.AS cs.SD

    Dynamic Chunk Convolution for Unified Streaming and Non-Streaming Conformer ASR

    Authors: Xilai Li, Goeric Huybrechts, Srikanth Ronanki, Jeff Farris, Sravan Bodapati

    Abstract: Recently, there has been an increasing interest in unifying streaming and non-streaming speech recognition models to reduce development, training and deployment cost. The best-known approaches rely on either window-based or dynamic chunk-based attention strategy and causal convolutions to minimize the degradation due to streaming. However, the performance gap still remains relatively large between…

    Submitted 25 April, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: 5 pages, 3 figures, 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023)

  49. arXiv:2302.10170

    cs.IT

    Compressed Error HARQ: Feedback Communication on Noise-Asymmetric Channels

    Authors: Sravan Kumar Ankireddy, S. Ashwin Hebbar, Yihan Jiang, Hyeji Kim, Pramod Viswanath

    Abstract: In modern communication systems with feedback, there are increasingly more scenarios where the transmitter has much less power than the receiver (e.g., medical implant devices), which we refer to as noise-asymmetric channels. For such channels, the feedback link is of higher quality than the forward link. However, feedback schemes for cellular communications, such as hybrid ARQ, do not fully utili…

    Submitted 20 February, 2023; originally announced February 2023.

  50. arXiv:2212.09095

    cs.CL cs.AI

    Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale

    Authors: Hritik Bansal, Karthik Gopalakrishnan, Saket Dingliwal, Sravan Bodapati, Katrin Kirchhoff, Dan Roth

    Abstract: Language models have been shown to perform better with an increase in scale on a wide variety of tasks via the in-context learning paradigm. In this paper, we investigate the hypothesis that the ability of a large language model to in-context learn-perform a task is not uniformly spread across all of its underlying components. Using a 66 billion parameter language model (OPT-66B) across a diverse…

    Submitted 16 August, 2023; v1 submitted 18 December, 2022; originally announced December 2022.

    Comments: Accepted at Annual Meeting of the Association for Computational Linguistics (ACL) 2023, Main Proceedings