Showing 1–50 of 544 results for author: Balaji

Searching in archive cs.
  1. arXiv:2506.15847  [pdf, ps, other]

    cs.RO cs.AI

    SafeMimic: Towards Safe and Autonomous Human-to-Robot Imitation for Mobile Manipulation

    Authors: Arpit Bahety, Arnav Balaji, Ben Abbatematteo, Roberto Martín-Martín

    Abstract: For robots to become efficient helpers in the home, they must learn to perform new mobile manipulation tasks simply by watching humans perform them. Learning from a single video demonstration from a human is challenging as the robot needs to first extract from the demo what needs to be done and how, translate the strategy from a third to a first-person perspective, and then adapt it to be successf…

    Submitted 18 June, 2025; originally announced June 2025.

  2. arXiv:2506.12627  [pdf, ps, other]

    eess.AS cs.SD

    Towards Neural Audio Codec Source Parsing

    Authors: Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Arun Balaji Buduru, Rajesh Sharma

    Abstract: A new class of audio deepfakes, codecfakes (CFs), has recently caught attention, synthesized by Audio Language Models that leverage neural audio codecs (NACs) in the backend. In response, the community has introduced dedicated benchmarks and tailored detection strategies. As the field advances, efforts have moved beyond binary detection toward source attribution, including open-set attribution, whic…

    Submitted 14 June, 2025; originally announced June 2025.

  3. arXiv:2506.10248  [pdf, ps, other]

    cs.DC

    Resilience through Automated Adaptive Configuration for Distribution and Replication

    Authors: Scott D. Stoller, Balaji Jayasankar, Yanhong A. Liu

    Abstract: This paper presents a powerful automated framework for making complex systems resilient under failures, by optimized adaptive distribution and replication of interdependent software components across heterogeneous hardware components with widely varying capabilities. A configuration specifies how software is distributed and replicated: which software components to run on each computer, which softw…

    Submitted 11 June, 2025; originally announced June 2025.

  4. arXiv:2506.08210  [pdf, ps, other]

    cs.CV cs.AI cs.CL cs.LG

    A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation

    Authors: Andrew Z. Wang, Songwei Ge, Tero Karras, Ming-Yu Liu, Yogesh Balaji

    Abstract: Both text-to-image generation and large language models (LLMs) have made significant advancements. However, many text-to-image models still employ the somewhat outdated T5 and CLIP as their text encoders. In this work, we investigate the effectiveness of using modern decoder-only LLMs as text encoders for text-to-image diffusion models. We build a standardized training and evaluation pipeline that…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: CVPR 2025

    Journal ref: Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 28575-28585

  5. arXiv:2506.04531  [pdf, ps, other]

    cs.LG

    HALoS: Hierarchical Asynchronous Local SGD over Slow Networks for Geo-Distributed Large Language Model Training

    Authors: Geon-Woo Kim, Junbo Li, Shashidhar Gandham, Omar Baldonado, Adithya Gangidi, Pavan Balaji, Zhangyang Wang, Aditya Akella

    Abstract: Training large language models (LLMs) increasingly relies on geographically distributed accelerators, causing prohibitive communication costs across regions and uneven utilization of heterogeneous hardware. We propose HALoS, a hierarchical asynchronous optimization framework that tackles these issues by introducing local parameter servers (LPSs) within each region and a global parameter server (GP…

    Submitted 4 June, 2025; originally announced June 2025.

  6. arXiv:2506.03378  [pdf, ps, other]

    eess.AS cs.CV cs.MM

    SNIFR : Boosting Fine-Grained Child Harmful Content Detection Through Audio-Visual Alignment with Cascaded Cross-Transformer

    Authors: Orchid Chetia Phukan, Mohd Mujtaba Akhtar, Girish, Swarup Ranjan Behera, Abu Osama Siddiqui, Sarthak Jain, Priyabrata Mallick, Jaya Sai Kiran Patibandla, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma

    Abstract: As video-sharing platforms have grown over the past decade, child viewership has surged, increasing the need for precise detection of harmful content like violence or explicit scenes. Malicious users exploit moderation systems by embedding unsafe content in minimal frames to evade detection. While prior research has focused on visual cues and advanced such fine-grained detection, audio features re…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Accepted to INTERSPEECH 2025

  7. arXiv:2506.03364  [pdf, ps, other]

    eess.AS cs.MM cs.SD

    Towards Source Attribution of Singing Voice Deepfake with Multimodal Foundation Models

    Authors: Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Priyabrata Mallick, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma

    Abstract: In this work, we introduce the task of singing voice deepfake source attribution (SVDSA). We hypothesize that multimodal foundation models (MMFMs) such as ImageBind and LanguageBind will be most effective for SVDSA as they are better equipped for capturing subtle source-specific characteristics, such as unique timbre, pitch manipulation, or synthesis artifacts of each singing voice deepfake source due…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Accepted to INTERSPEECH 2025

  8. arXiv:2506.02519  [pdf, ps, other]

    cs.CL

    Learning Together to Perform Better: Teaching Small-Scale LLMs to Collaborate via Preferential Rationale Tuning

    Authors: Sohan Patnaik, Milan Aggarwal, Sumit Bhatia, Balaji Krishnamurthy

    Abstract: LLMs such as GPT-4 have shown a remarkable ability to solve complex questions by generating step-by-step rationales. Prior works have utilized this capability to improve smaller and cheaper LMs (say, with 7B parameters). However, various practical constraints, such as copyright and legal issues, owing to lack of transparency in the pre-training data of large (often closed) models, prevent their use…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Accepted at ACL Main 2025

  9. arXiv:2506.02258  [pdf, ps, other]

    eess.AS cs.SD

    Are Mamba-based Audio Foundation Models the Best Fit for Non-Verbal Emotion Recognition?

    Authors: Mohd Mujtaba Akhtar, Orchid Chetia Phukan, Girish, Swarup Ranjan Behera, Ananda Chandra Nayak, Sanjib Kumar Nayak, Arun Balaji Buduru, Rajesh Sharma

    Abstract: In this work, we focus on non-verbal vocal sounds emotion recognition (NVER). We investigate mamba-based audio foundation models (MAFMs) for the first time for NVER and hypothesize that MAFMs will outperform attention-based audio foundation models (AAFMs) for NVER by leveraging their state-space modeling to capture intrinsic emotional structures more effectively. Unlike AAFMs, which may amplify irre…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Accepted to EUSIPCO 2025

  10. arXiv:2506.02232  [pdf, ps, other]

    eess.AS cs.SD

    Investigating the Reasonable Effectiveness of Speaker Pre-Trained Models and their Synergistic Power for SingMOS Prediction

    Authors: Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma

    Abstract: In this study, we focus on Singing Voice Mean Opinion Score (SingMOS) prediction. Previous research has shown the performance benefit of using state-of-the-art (SOTA) pre-trained models (PTMs). However, they haven't explored speaker recognition speech PTMs (SPTMs) such as x-vector and ECAPA, and we hypothesize that they will be the most effective for SingMOS prediction. We believe that due to th…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Accepted to INTERSPEECH 2025

  11. arXiv:2506.02230  [pdf, ps, other]

    eess.AS cs.SD

    Towards Machine Unlearning for Paralinguistic Speech Processing

    Authors: Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Shubham Singh, Swarup Ranjan Behera, Vandana Rajan, Muskaan Singh, Arun Balaji Buduru, Rajesh Sharma

    Abstract: In this work, we pioneer the study of Machine Unlearning (MU) for Paralinguistic Speech Processing (PSP). We focus on two key PSP tasks: Speech Emotion Recognition (SER) and Depression Detection (DD). To this end, we propose SISA++, a novel extension to the previous state-of-the-art (SOTA) MU method, SISA, that merges models trained on different shards with weight-averaging. With such modifications, we…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Accepted to INTERSPEECH 2025

  12. arXiv:2506.01157  [pdf, ps, other]

    eess.AS cs.SD

    Source Tracing of Synthetic Speech Systems Through Paralinguistic Pre-Trained Representations

    Authors: Girish, Mohd Mujtaba Akhtar, Orchid Chetia Phukan, Drishti Singh, Swarup Ranjan Behera, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma

    Abstract: In this work, we focus on source tracing of synthetic speech generation systems (STSGS). Each source embeds distinctive paralinguistic features--such as pitch, tone, rhythm, and intonation--into their synthesized speech, reflecting the underlying design of the generation model. While previous research has explored representations from speech pre-trained models (SPTMs), the use of representations f…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: Accepted to EUSIPCO 2025

  13. arXiv:2506.01148  [pdf, ps, other]

    eess.AS cs.SD

    Towards Fusion of Neural Audio Codec-based Representations with Spectral for Heart Murmur Classification via Bandit-based Cross-Attention Mechanism

    Authors: Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Priyabrata Mallick, Santanu Roy, Arun Balaji Buduru, Rajesh Sharma

    Abstract: In this study, we focus on heart murmur classification (HMC) and hypothesize that combining neural audio codec representations (NACRs) such as EnCodec with spectral features (SFs), such as MFCC, will yield superior performance. We believe such fusion will trigger their complementary behavior as NACRs excel at capturing fine-grained acoustic patterns such as rhythm changes, spectral features focus…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: Accepted to INTERSPEECH 2025

  14. arXiv:2506.01138  [pdf, other]

    eess.AS cs.SD

    PARROT: Synergizing Mamba and Attention-based SSL Pre-Trained Models via Parallel Branch Hadamard Optimal Transport for Speech Emotion Recognition

    Authors: Orchid Chetia Phukan, Mohd Mujtaba Akhtar, Girish, Swarup Ranjan Behera, Jaya Sai Kiran Patibandla, Arun Balaji Buduru, Rajesh Sharma

    Abstract: The emergence of Mamba as an alternative to attention-based architectures has led to the development of Mamba-based self-supervised learning (SSL) pre-trained models (PTMs) for speech and audio processing. Recent studies suggest that these models achieve comparable or superior performance to state-of-the-art (SOTA) attention-based PTMs for speech emotion recognition (SER). Motivated by prior work…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: Accepted to INTERSPEECH 2025

  15. arXiv:2506.00815  [pdf, ps, other]

    cs.CL

    From Plain Text to Poetic Form: Generating Metrically-Constrained Sanskrit Verses

    Authors: Manoj Balaji Jagadeeshan, Samarth Bhatia, Pretam Ray, Harshul Raj Surana, Akhil Rajeev P, Priya Mishra, Annarao Kulkarni, Ganesh Ramakrishnan, Prathosh AP, Pawan Goyal

    Abstract: Recent advances in large language models (LLMs) have significantly improved natural language generation, including creative tasks like poetry composition. However, most progress remains concentrated in high-resource languages. This raises an important question: Can LLMs be adapted for structured poetic generation in a low-resource, morphologically rich language such as Sanskrit? In this work, we i…

    Submitted 31 May, 2025; originally announced June 2025.

  16. arXiv:2506.00145  [pdf]

    cs.CL cs.SD eess.AS

    Vedavani: A Benchmark Corpus for ASR on Vedic Sanskrit Poetry

    Authors: Sujeet Kumar, Pretam Ray, Abhinay Beerukuri, Shrey Kamoji, Manoj Balaji Jagadeeshan, Pawan Goyal

    Abstract: Sanskrit, an ancient language with a rich linguistic heritage, presents unique challenges for automatic speech recognition (ASR) due to its phonemic complexity and the phonetic transformations that occur at word junctures, similar to the connected speech found in natural conversations. Due to these complexities, there has been limited exploration of ASR in Sanskrit, particularly in the context of…

    Submitted 30 May, 2025; originally announced June 2025.

  17. Advancing Digital Accessibility In Digital Pharmacy, Healthcare, And Wearable Devices: Inclusive Solutions for Enhanced Patient Engagement

    Authors: Vishnu Ramineni, Balaji Shesharao Ingole, Nikhil Kumar Pulipeta, Balakrishna Pothineni, Aditya Gupta

    Abstract: Modern healthcare facilities demand digital accessibility to guarantee equal access to telemedicine platforms, online pharmacy services, and health monitoring devices that can be worn or are handy. With the rising call for the implementation of robust digital healthcare solutions, people with disabilities encounter impediments in their endeavor of managing and getting accustomed to these modern te…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 15 pages

    Journal ref: International Journal of Healthcare Information Systems and Informatics (IJHISI) Volume 2, Issue 1, January-June 2025, pp. 10-24, Article ID: IJHISI_02_01_002

  18. Bridging the Gap: Enhancing Digital Accessibility for Medicaid Populations in Telehealth Adoption

    Authors: Vishnu Ramineni, Aditya Gupta, Balakrishna Pothineni, Isan Sahoo, Shivareddy Devarapalli, Balaji Shesharao Ingole

    Abstract: The swift evolution of telehealth has revolutionized how medical professionals deliver healthcare services and boost convenience and accessibility. Yet, the Medicaid population encounters several impediments in utilizing facilities especially owing to poor internet connectivity, less awareness about digital platforms, and a shortage of assistive technologies. The paper aims to explicate key factor…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 16 pages

    Journal ref: European Journal of Computer Science and Information Technology, 13(23), 1-16, 2025. Print ISSN: 2054-0957

  19. arXiv:2505.21251  [pdf, ps, other]

    cs.LG

    Copresheaf Topological Neural Networks: A Generalized Deep Learning Framework

    Authors: Mustafa Hajij, Lennart Bastian, Sarah Osentoski, Hardik Kabaria, John L. Davenport, Sheik Dawood, Balaji Cherukuri, Joseph G. Kocheemoolayil, Nastaran Shahmansouri, Adrian Lew, Theodore Papamarkou, Tolga Birdal

    Abstract: We introduce copresheaf topological neural networks (CTNNs), a powerful and unifying framework that encapsulates a wide spectrum of deep learning architectures, designed to operate on structured data: including images, point clouds, graphs, meshes, and topological manifolds. While deep learning has profoundly impacted domains ranging from digital assistants to autonomous systems, the principled de…

    Submitted 28 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  20. arXiv:2505.20343  [pdf, ps, other]

    cs.CL cs.AI

    Do LLMs have a Gender (Entropy) Bias?

    Authors: Sonal Prabhune, Balaji Padmanabhan, Kaushik Dutta

    Abstract: We investigate the existence and persistence of a specific type of gender bias in some of the popular LLMs and contribute a new benchmark dataset, RealWorldQuestioning (released on HuggingFace), developed from real-world questions across four key domains in business and health contexts: education, jobs, personal financial management, and general health. We define and study entropy bias, which we…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: 18 pages, 4 figures

    MSC Class: 68T42; 68T50 ACM Class: I.2.7

  21. arXiv:2505.19494  [pdf, other]

    cs.CL cs.IR

    Anveshana: A New Benchmark Dataset for Cross-Lingual Information Retrieval On English Queries and Sanskrit Documents

    Authors: Manoj Balaji Jagadeeshan, Prince Raj, Pawan Goyal

    Abstract: The study presents a comprehensive benchmark for retrieving Sanskrit documents using English queries, focusing on the chapters of the Srimadbhagavatam. It employs a tripartite approach: Direct Retrieval (DR), Translation-based Retrieval (DT), and Query Translation (QT), utilizing shared embedding spaces and advanced translation methods to enhance retrieval systems in a RAG framework. The study fin…

    Submitted 26 May, 2025; originally announced May 2025.

  22. arXiv:2505.18217  [pdf, other]

    cs.SD cs.AI eess.AS

    ABHINAYA -- A System for Speech Emotion Recognition In Naturalistic Conditions Challenge

    Authors: Soumya Dutta, Smruthi Balaji, Varada R, Viveka Salinamakki, Sriram Ganapathy

    Abstract: Speech emotion recognition (SER) in naturalistic settings remains a challenge due to the intrinsic variability, diverse recording conditions, and class imbalance. As participants in the Interspeech Naturalistic SER Challenge which focused on these complexities, we present Abhinaya, a system integrating speech-based, text-based, and speech-text models. Our approach fine-tunes self-supervised and sp…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: 5 pages, 2 figures, 4 tables, accepted at Interspeech 2025

  23. arXiv:2505.03394  [pdf, ps, other]

    cs.CV

    EOPose : Exemplar-based object reposing using Generalized Pose Correspondences

    Authors: Sarthak Mehrotra, Rishabh Jain, Mayur Hemani, Balaji Krishnamurthy, Mausoom Sarkar

    Abstract: Reposing objects in images has a myriad of applications, especially for e-commerce where several variants of product images need to be produced quickly. In this work, we leverage the recent advances in unsupervised keypoint correspondence detection between different object images of the same class to propose an end-to-end framework for generic object reposing. Our method, EOPose, takes a target po…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Accepted in CVPR 2025 AI4CC workshop

  24. arXiv:2504.19735  [pdf, other]

    cs.CV

    Measuring Train Driver Performance as Key to Approval of Driverless Trains

    Authors: Rustam Tagiew, Prasannavenkatesh Balaji

    Abstract: Points 2.1.4(b), 2.4.2(b) and 2.4.3(b) in Annex I of Implementing Regulation (EU) No. 402/2013 allow a simplified approach for the safety approval of computer vision systems for driverless trains, if they have 'similar' functions and interfaces as the replaced human driver. The human driver is not replaced one-to-one by a technical system - only a limited set of cognitive functions are replaced. H…

    Submitted 29 April, 2025; v1 submitted 28 April, 2025; originally announced April 2025.

    Comments: 6 pages, 3 figures, abstract accepted by IAVVC 2025, full paper to be submitted to IAVVC 2025

  25. arXiv:2504.18649  [pdf, other]

    cs.DC

    Raptr: Prefix Consensus for Robust High-Performance BFT

    Authors: Andrei Tonkikh, Balaji Arun, Zhuolun Xiang, Zekun Li, Alexander Spiegelman

    Abstract: In this paper, we present Raptr--a Byzantine fault-tolerant state machine replication (BFT SMR) protocol that combines strong robustness with high throughput, while attaining near-optimal theoretical latency. Raptr delivers exceptionally low latency and high throughput under favorable conditions, and it degrades gracefully in the presence of Byzantine faults and network attacks. Existing high-th…

    Submitted 29 April, 2025; v1 submitted 25 April, 2025; originally announced April 2025.

  26. arXiv:2504.18082  [pdf, other]

    cs.LG cs.AI

    Efficient GNN Training Through Structure-Aware Randomized Mini-Batching

    Authors: Vignesh Balaji, Christos Kozyrakis, Gal Chechik, Haggai Maron

    Abstract: Graph Neural Networks (GNNs) enable learning on real-world graphs, and mini-batch training has emerged as the de facto standard for training GNNs because it can scale to very large graphs and improve convergence. Current mini-batch construction policies largely ignore efficiency considerations of GNN training. Specifically, existing mini-batching techniques employ randomization schemes to improve ac…

    Submitted 25 April, 2025; originally announced April 2025.

  27. arXiv:2504.17017  [pdf, other]

    cs.AI cs.FL cs.LG cs.LO

    Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification

    Authors: Balaji Rao, William Eiers, Carlo Lipizzi

    Abstract: Formally verifying properties of software code has been a highly desirable task, especially with the emergence of LLM-generated code. In the same vein, they provide an interesting avenue for the exploration of formal verification and mechanistic interpretability. Since the introduction of code-specific models, despite their successes in generating code in Lean4 and Isabelle, the task of generalize…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: Accepted to the Proceedings of the 19th Conference on Neurosymbolic Learning and Reasoning (NeSy 2025)

  28. arXiv:2504.12075  [pdf]

    cs.LG physics.chem-ph

    Generative Deep Learning Framework for Inverse Design of Fuels

    Authors: Kiran K. Yalamanchi, Pinaki Pal, Balaji Mohan, Abdullah S. AlRamadan, Jihad A. Badra, Yuanjiang Pei

    Abstract: In the present work, a generative deep learning framework combining a Co-optimized Variational Autoencoder (Co-VAE) architecture with quantitative structure-property relationship (QSPR) techniques is developed to enable accelerated inverse design of fuels. The Co-VAE integrates a property prediction component coupled with the VAE latent space, enhancing molecular reconstruction and accurate estima…

    Submitted 16 April, 2025; originally announced April 2025.

  29. arXiv:2504.07229  [pdf, other]

    cs.CL eess.AS eess.SP

    Visual-Aware Speech Recognition for Noisy Scenarios

    Authors: Lakshmipathi Balaji, Karan Singla

    Abstract: Humans have the ability to utilize visual cues, such as lip movements and visual scenes, to enhance auditory perception, particularly in noisy environments. However, current Automatic Speech Recognition (ASR) or Audio-Visual Speech Recognition (AVSR) models often struggle in noisy scenarios. To solve this task, we propose a model that improves transcription by correlating noise sources to visual c…

    Submitted 9 April, 2025; originally announced April 2025.

  30. arXiv:2504.07132  [pdf]

    cs.CR cs.CE cs.LG

    SolRPDS: A Dataset for Analyzing Rug Pulls in Solana Decentralized Finance

    Authors: Abdulrahman Alhaidari, Bhavani Kalal, Balaji Palanisamy, Shamik Sural

    Abstract: Rug pulls in Solana have caused significant damage to users interacting with Decentralized Finance (DeFi). A rug pull occurs when developers exploit users' trust and drain liquidity from token pools on Decentralized Exchanges (DEXs), leaving users with worthless tokens. Although rug pulls in Ethereum and Binance Smart Chain (BSC) have gained attention recently, analysis of rug pulls in Solana rema…

    Submitted 6 April, 2025; originally announced April 2025.

    Comments: Accepted paper to appear in the 15th ACM Conference on Data and Application Security and Privacy (CODASPY 2025)

  31. arXiv:2504.03789  [pdf, other]

    cs.CY cs.MA

    Steve: LLM Powered ChatBot for Career Progression

    Authors: Naveen Mathews Renji, Balaji R Rao, Carlo Lipizzi

    Abstract: The advancements in systems deploying large language models (LLMs), as well as improvements in their ability to act as agents with predefined templates, provide an opportunity to conduct qualitative, individualized assessments, creating a bridge between qualitative and quantitative methods for candidates seeking career progression. In this paper, we develop a platform that allows candidates to run…

    Submitted 3 April, 2025; originally announced April 2025.

  32. arXiv:2503.19371  [pdf, other]

    cs.LG cs.AI

    Flow to Learn: Flow Matching on Neural Network Parameters

    Authors: Daniel Saragih, Deyu Cao, Tejas Balaji, Ashwin Santhosh

    Abstract: Foundational language models show a remarkable ability to learn new concepts during inference via context data. However, similar work for images lags behind. To address this challenge, we introduce FLoWN, a flow matching model that learns to generate neural network parameters for different tasks. Our approach models the flow on latent space, while conditioning the process on context data. Experimen…

    Submitted 19 April, 2025; v1 submitted 25 March, 2025; originally announced March 2025.

    Comments: Accepted at the ICLR Workshop on Neural Network Weights as a New Data Modality 2025

  33. VocalEyes: Enhancing Environmental Perception for the Visually Impaired through Vision-Language Models and Distance-Aware Object Detection

    Authors: Kunal Chavan, Keertan Balaji, Spoorti Barigidad, Samba Raju Chiluveru

    Abstract: With an increasing demand for assistive technologies that promote the independence and mobility of visually impaired people, this study suggests an innovative real-time system that gives audio descriptions of a user's surroundings to improve situational awareness. The system acquires live video input and processes it with a quantized and fine-tuned Florence-2 big model, adjusted to 4-bit accuracy…

    Submitted 10 March, 2025; originally announced March 2025.

  34. arXiv:2503.15456  [pdf, ps, other]

    cs.LG

    Temporal Encoding Strategies for Energy Time Series Prediction

    Authors: Aayam Bansal, Keertan Balaji, Zeus Lalani

    Abstract: In contemporary power systems, energy consumption prediction plays a crucial role in maintaining grid stability and resource allocation enabling power companies to minimize energy waste and avoid overloading the grid. While there are several research works on energy optimization, they often fail to address the complexities of real-time fluctuations and the cyclic pattern of energy consumption. Thi…

    Submitted 19 March, 2025; originally announced March 2025.

  35. arXiv:2503.12780  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV stat.ML

    LangDA: Building Context-Awareness via Language for Domain Adaptive Semantic Segmentation

    Authors: Chang Liu, Bavesh Balaji, Saad Hossain, C Thomas, Kwei-Herng Lai, Raviteja Vemulapalli, Alexander Wong, Sirisha Rambhatla

    Abstract: Unsupervised domain adaptation for semantic segmentation (DASS) aims to transfer knowledge from a label-rich source domain to a target domain with no labels. Two key approaches in DASS are (1) vision-only approaches using masking or multi-resolution crops, and (2) language-based approaches that use generic class-wise prompts informed by target domain (e.g. "a {snowy} photo of a {class}"). However,…

    Submitted 16 March, 2025; originally announced March 2025.

    MSC Class: 68Txx ACM Class: I.2.1

  36. arXiv:2503.11444  [pdf, other]

    cs.MA cs.AI cs.CL cs.OS

    Cerebrum (AIOS SDK): A Platform for Agent Development, Deployment, Distribution, and Discovery

    Authors: Balaji Rama, Kai Mei, Yongfeng Zhang

    Abstract: Autonomous LLM-based agents have emerged as a powerful paradigm for complex task execution, yet the field lacks standardized tools for development, deployment, distribution and discovery of agents. We present Cerebrum, an Agent SDK for AIOS that addresses this gap through three key components: (1) a comprehensive SDK featuring a modular four-layer architecture for agent development, encompassing L…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: Accepted to the 2025 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) - System Demonstration Track

  37. arXiv:2503.02950  [pdf, other]

    cs.AI cs.CL cs.MA

    LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications

    Authors: Danqing Zhang, Balaji Rama, Jingyi Ni, Shiying He, Fu Zhao, Kunyu Chen, Arnold Chen, Junyu Cao

    Abstract: We introduce LiteWebAgent, an open-source suite for VLM-based web agent applications. Our framework addresses a critical gap in the web agent ecosystem with a production-ready solution that combines minimal serverless backend configuration, intuitive user and browser interfaces, and extensible research capabilities in agent planning, memory, and tree search. For the core LiteWebAgent agent framewo…

    Submitted 6 May, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

  38. arXiv:2503.02463  [pdf, other]

    cs.CL

    It Helps to Take a Second Opinion: Teaching Smaller LLMs to Deliberate Mutually via Selective Rationale Optimisation

    Authors: Sohan Patnaik, Milan Aggarwal, Sumit Bhatia, Balaji Krishnamurthy

    Abstract: Very large language models (LLMs) such as GPT-4 have shown the ability to handle complex tasks by generating and self-refining step-by-step rationales. Smaller language models (SLMs), typically with < 13B parameters, have been improved by using the data generated from very-large LMs through knowledge distillation. However, various practical constraints such as API costs, copyright, legal and ethic…

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: Accepted at ICLR 2025

  39. arXiv:2503.01944  [pdf]

    cs.CR

    Protecting DeFi Platforms against Non-Price Flash Loan Attacks

    Authors: Abdulrahman Alhaidari, Balaji Palanisamy, Prashant Krishnamurthy

    Abstract: Smart contracts in Decentralized Finance (DeFi) platforms are attractive targets for attacks as their vulnerabilities can lead to massive amounts of financial losses. Flash loan attacks, in particular, pose a major threat to DeFi protocols that hold a Total Value Locked (TVL) exceeding \…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Accepted paper to appear in the 15th ACM Conference on Data and Application Security and Privacy (CODASPY 2025)

  40. arXiv:2503.01068  [pdf, other]

    cs.RO cs.AI

    Language-Guided Object Search in Agricultural Environments

    Authors: Advaith Balaji, Saket Pradhan, Dmitry Berenson

    Abstract: Creating robots that can assist in farms and gardens can help reduce the mental and physical workload experienced by farm workers. We tackle the problem of object search in a farm environment, providing a method that allows a robot to semantically reason about the location of an unseen target object among a set of previously seen objects in the environment using a Large Language Model (LLM). We le…

    Submitted 2 March, 2025; originally announced March 2025.

    Comments: 8 pages, 4 figures, 2 tables, accepted to the 2025 International Conference on Robotics and Automation (ICRA 2025)

  41. arXiv:2503.00591  [pdf, other]

    cs.CV

    AesthetiQ: Enhancing Graphic Layout Design via Aesthetic-Aware Preference Alignment of Multi-modal Large Language Models

    Authors: Sohan Patnaik, Rishabh Jain, Balaji Krishnamurthy, Mausoom Sarkar

    Abstract: Visual layouts are essential in graphic design fields such as advertising, posters, and web interfaces. The application of generative models for content-aware layout generation has recently gained traction. However, these models fail to understand the contextual aesthetic requirements of layout design and do not align with human-like preferences, primarily treating it as a prediction task without…

    Submitted 1 March, 2025; originally announced March 2025.

    Comments: Accepted for publication in CVPR 2025

  42. arXiv:2502.17198  [pdf, other]

    cs.CV

    Dimitra: Audio-driven Diffusion model for Expressive Talking Head Generation

    Authors: Baptiste Chopin, Tashvik Dhamija, Pranav Balaji, Yaohui Wang, Antitza Dantcheva

    Abstract: We propose Dimitra, a novel framework for audio-driven talking head generation, streamlined to learn lip motion, facial expression, as well as head pose motion. Specifically, we train a conditional Motion Diffusion Transformer (cMDT) by modeling facial motion sequences with 3D representation. We condition the cMDT with only two input signals, an audio-sequence, as well as a reference facial image.…

    Submitted 24 February, 2025; originally announced February 2025.

    Comments: 5 pages + 2 pages for supplementary material, 2 figures

  43. arXiv:2502.15860  [pdf, other]

    cs.CL cs.AI cs.LG

    Synthetic vs. Gold: The Role of LLM-Generated Labels and Data in Cyberbullying Detection

    Authors: Arefeh Kazemi, Sri Balaaji Natarajan Kalaivendan, Joachim Wagner, Hamza Qadeer, Brian Davis

    Abstract: Cyberbullying (CB) presents a pressing threat, especially to children, underscoring the urgent need for robust detection systems to ensure online safety. However, progress in developing such systems is hindered by the scarcity of large, labeled datasets that are specifically tailored for specialized tasks and the target age groups. Creating these datasets relies heavily on human annotation, which…

    Submitted 5 April, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

  44. arXiv:2502.12183  [pdf]

    cs.CL cs.LG

    Leveraging large language models for structured information extraction from pathology reports

    Authors: Jeya Balaji Balasubramanian, Daniel Adams, Ioannis Roxanis, Amy Berrington de Gonzalez, Penny Coulson, Jonas S. Almeida, Montserrat García-Closas

    Abstract: Background: Structured information extraction from unstructured histopathology reports facilitates data accessibility for clinical research. Manual extraction by experts is time-consuming and expensive, limiting scalability. Large language models (LLMs) offer efficient automated extraction through zero-shot prompting, requiring only natural language instructions without labeled data or training. W…

    Submitted 14 February, 2025; originally announced February 2025.

    Comments: 29 pages, 6 figures

  45. arXiv:2502.11195  [pdf]

    cs.CV cs.AI

    From Deception to Perception: The Surprising Benefits of Deepfakes for Detecting, Measuring, and Mitigating Bias

    Authors: Yizhi Liu, Balaji Padmanabhan, Siva Viswanathan

    Abstract: While deepfake technologies have predominantly been criticized for potential misuse, our study demonstrates their significant potential as tools for detecting, measuring, and mitigating biases in key societal domains. By employing deepfake technology to generate controlled facial images, we extend the scope of traditional correspondence studies beyond mere textual manipulations. This enhancement i…

    Submitted 16 February, 2025; originally announced February 2025.

    ACM Class: I.2.0; I.2.10; I.4.0; J.4; H.4; K.4.1; K.4.2

  46. arXiv:2502.05330  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Multi-Class Segmentation of Aortic Branches and Zones in Computed Tomography Angiography: The AortaSeg24 Challenge

    Authors: Muhammad Imran, Jonathan R. Krebs, Vishal Balaji Sivaraman, Teng Zhang, Amarjeet Kumar, Walker R. Ueland, Michael J. Fassler, Jinlong Huang, Xiao Sun, Lisheng Wang, Pengcheng Shi, Maximilian Rokuss, Michael Baumgartner, Yannick Kirchhof, Klaus H. Maier-Hein, Fabian Isensee, Shuolin Liu, Bing Han, Bong Thanh Nguyen, Dong-jin Shin, Park Ji-Woo, Mathew Choi, Kwang-Hyun Uhm, Sung-Jea Ko, Chanwoong Lee, et al. (38 additional authors not shown)

    Abstract: Multi-class segmentation of the aorta in computed tomography angiography (CTA) scans is essential for diagnosing and planning complex endovascular treatments for patients with aortic dissections. However, existing methods reduce aortic segmentation to a binary problem, limiting their ability to measure diameters across different branches and zones. Furthermore, no open-source dataset is currently…

    Submitted 7 February, 2025; originally announced February 2025.

  47. arXiv:2501.18777  [pdf, other]

    cs.LG

    Navigating the Fragrance space Via Graph Generative Models And Predicting Odors

    Authors: Mrityunjay Sharma, Sarabeshwar Balaji, Pinaki Saha, Ritesh Kumar

    Abstract: We explore a suite of generative modelling techniques to efficiently navigate and explore the complex landscapes of odor and the broader chemical space. Unlike traditional approaches, we not only generate molecules but also predict the odor likeliness with ROC AUC score of 0.97 and assign probable odor labels. We correlate odor likeliness with physicochemical features of molecules using machine le…

    Submitted 30 January, 2025; originally announced January 2025.

  48. arXiv:2501.17270  [pdf, other]

    cs.CL cs.DB

    Comprehensive Evaluation for a Large Scale Knowledge Graph Question Answering Service

    Authors: Saloni Potdar, Daniel Lee, Omar Attia, Varun Embar, De Meng, Ramesh Balaji, Chloe Seivwright, Eric Choi, Mina H. Farid, Yiwen Sun, Yunyao Li

    Abstract: Question answering systems for knowledge graphs (KGQA) answer factoid questions based on the data in the knowledge graph. KGQA systems are complex because the system has to understand the relations and entities in the knowledge-seeking natural language queries and map them to structured queries against the KG to answer them. In this paper, we introduce Chronos, a comprehensive evaluation framework…

    Submitted 28 January, 2025; originally announced January 2025.

  49. arXiv:2501.16334  [pdf]

    eess.SP cs.LG

    RNN-Based Models for Predicting Seizure Onset in Epileptic Patients

    Authors: Mathan Kumar Mounagurusamy, Thiyagarajan V S, Abdur Rahman, Shravan Chandak, D. Balaji, Venkateswara Rao Jallepalli

    Abstract: Early management and better clinical outcomes for epileptic patients depend on seizure prediction. The accuracy and false alarm rates of existing systems are often compromised by their dependence on static thresholds and basic Electroencephalogram (EEG) properties. A novel Recurrent Neural Network (RNN)-based method for seizure start prediction is proposed in the article to overcome these limitati…

    Submitted 24 December, 2024; originally announced January 2025.

  50. arXiv:2501.15747  [pdf, other]

    cs.CL cs.AI

    IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding

    Authors: Sankalp KJ, Ashutosh Kumar, Laxmaan Balaji, Nikunj Kotecha, Vinija Jain, Aman Chadha, Sreyoshi Bhaduri

    Abstract: Known by more than 1.5 billion people in the Indian subcontinent, Indic languages present unique challenges and opportunities for natural language processing (NLP) research due to their rich cultural heritage, linguistic diversity, and complex structures. IndicMMLU-Pro is a comprehensive benchmark designed to evaluate Large Language Models (LLMs) across Indic languages, building upon the MMLU Pro…

    Submitted 27 January, 2025; v1 submitted 26 January, 2025; originally announced January 2025.