Showing 1–50 of 709 results for author: Sriram

Searching in archive cs.

  1. arXiv:2506.16644  [pdf, ps, other]

    cs.LG cs.IR

    Semantic Outlier Removal with Embedding Models and LLMs

    Authors: Eren Akbiyik, João Almeida, Rik Melis, Ritu Sriram, Viviana Petrescu, Vilhjálmur Vilhjálmsson

    Abstract: Modern text processing pipelines demand robust methods to remove extraneous content while preserving a document's core message. Traditional approaches such as HTML boilerplate extraction or keyword filters often fail in multilingual settings and struggle with context-sensitive nuances, whereas Large Language Models (LLMs) offer improved quality at high computational cost. We introduce SORE (Semant…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Accepted to the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025) Industry Track, 10 pages

  2. arXiv:2506.13957  [pdf, ps, other]

    cs.RO

    A Cooperative Contactless Object Transport with Acoustic Robots

    Authors: Narsimlu Kemsaram, Akin Delibasi, James Hardwick, Bonot Gautam, Diego Martinez Plasencia, Sriram Subramanian

    Abstract: Cooperative transport, the simultaneous movement of an object by multiple agents, has been widely observed in biological systems such as ant colonies, which improve efficiency and adaptability in dynamic environments. Inspired by these natural phenomena, we present a novel acoustic robotic system for the transport of contactless objects in mid-air. Our system leverages phased ultrasonic transducer…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: This paper has been accepted for publication in the Proceedings of the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025) as oral presentation, 8 pages with 8 figures

  3. arXiv:2506.12103  [pdf, other]

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  4. arXiv:2506.10897  [pdf, ps, other]

    cs.AI

    GenPlanX. Generation of Plans and Execution

    Authors: Daniel Borrajo, Giuseppe Canonaco, Tomás de la Rosa, Alfredo Garrachón, Sriram Gopalakrishnan, Simerjot Kaur, Marianela Morales, Sunandita Patra, Alberto Pozanco, Keshav Ramani, Charese Smiley, Pietro Totis, Manuela Veloso

    Abstract: Classical AI Planning techniques generate sequences of actions for complex tasks. However, they lack the ability to understand planning tasks when provided using natural language. The advent of Large Language Models (LLMs) has introduced novel capabilities in human-computer interaction. In the context of planning tasks, LLMs have been shown to be particularly good at interpreting human intents among ot…

    Submitted 12 June, 2025; originally announced June 2025.

  5. arXiv:2506.09068  [pdf, ps, other]

    cs.CV cs.LG cs.RO

    BG-HOP: A Bimanual Generative Hand-Object Prior

    Authors: Sriram Krishna, Sravan Chittupalli, Sungjae Park

    Abstract: In this work, we present BG-HOP, a generative prior that seeks to model bimanual hand-object interactions in 3D. We address the challenge of limited bimanual interaction data by extending existing single-hand generative priors, demonstrating preliminary results in capturing the joint distribution of hands and objects. Our experiments showcase the model's capability to generate bimanual interaction…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: Presented at Agents in Interaction, from Humans to Robots, CVPR 2025

  6. arXiv:2506.03145  [pdf]

    cs.CL cs.AI

    Entity-Augmented Neuroscience Knowledge Retrieval Using Ontology and Semantic Understanding Capability of LLM

    Authors: Pralaypati Ta, Sriram Venkatesaperumal, Keerthi Ram, Mohanasankar Sivaprakasam

    Abstract: Neuroscience research publications encompass a vast wealth of knowledge. Accurately retrieving existing information and discovering new insights from this extensive literature is essential for advancing the field. However, when knowledge is dispersed across multiple sources, current state-of-the-art retrieval methods often struggle to extract the necessary information. A knowledge graph (KG) can i…

    Submitted 3 June, 2025; originally announced June 2025.

  7. arXiv:2506.02306  [pdf, ps, other]

    cs.LG stat.ML

    CACTI: Leveraging Copy Masking and Contextual Information to Improve Tabular Data Imputation

    Authors: Aditya Gorla, Ryan Wang, Zhengtong Liu, Ulzee An, Sriram Sankararaman

    Abstract: We present CACTI, a masked autoencoding approach for imputing tabular data that leverages the structure in missingness patterns and contextual information. Our approach employs a novel median truncated copy masking training strategy that encourages the model to learn from empirical patterns of missingness while incorporating semantic relationships between features - captured by column names and te…

    Submitted 2 June, 2025; originally announced June 2025.

    Journal ref: In Proc. 42nd International Conference on Machine Learning (ICML 2025 Spotlight)

  8. arXiv:2505.23945  [pdf, ps, other]

    cs.CL cs.AI

    A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models

    Authors: Sriram Balasubramanian, Samyadeep Basu, Soheil Feizi

    Abstract: Chain-of-thought (CoT) reasoning enhances performance of large language models, but questions remain about whether these reasoning traces faithfully reflect the internal processes of the model. We present the first comprehensive study of CoT faithfulness in large vision-language models (LVLMs), investigating how both text-based and previously unexplored image-based biases affect reasoning and bias…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 34 pages, 25 figures

    ACM Class: I.2.10; I.2.7

  9. arXiv:2505.23913  [pdf, ps, other]

    cs.LG stat.ML

    Simplifying Bayesian Optimization Via In-Context Direct Optimum Sampling

    Authors: Gustavo Sutter Pessurno de Carvalho, Mohammed Abdulrahman, Hao Wang, Sriram Ganapathi Subramanian, Marc St-Aubin, Sharon O'Sullivan, Lawrence Wan, Luis Ricardez-Sandoval, Pascal Poupart, Agustinus Kristiadi

    Abstract: The optimization of expensive black-box functions is ubiquitous in science and engineering. A common solution to this problem is Bayesian optimization (BO), which is generally comprised of two components: (i) a surrogate model and (ii) an acquisition function, which generally require expensive re-training and optimization steps at each iteration, respectively. Although recent work enabled in-conte…

    Submitted 29 May, 2025; originally announced May 2025.

  10. arXiv:2505.23714  [pdf, ps, other]

    cs.CL cs.AI

    SenWiCh: Sense-Annotation of Low-Resource Languages for WiC using Hybrid Methods

    Authors: Roksana Goworek, Harpal Karlcut, Muhammad Shezad, Nijaguna Darshana, Abhishek Mane, Syam Bondada, Raghav Sikka, Ulvi Mammadov, Rauf Allahverdiyev, Sriram Purighella, Paridhi Gupta, Muhinyia Ndegwa, Haim Dubossarsky

    Abstract: This paper addresses the critical need for high-quality evaluation datasets in low-resource languages to advance cross-lingual transfer. While cross-lingual transfer offers a key strategy for leveraging multilingual pretraining to expand language technologies to understudied and typologically diverse languages, its effectiveness is dependent on quality and suitable benchmarks. We release new sense…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 8 pages, 22 figures, submitted to SIGTYP 2025 workshop in ACL

  11. arXiv:2505.18217  [pdf, other]

    cs.SD cs.AI eess.AS

    ABHINAYA -- A System for Speech Emotion Recognition In Naturalistic Conditions Challenge

    Authors: Soumya Dutta, Smruthi Balaji, Varada R, Viveka Salinamakki, Sriram Ganapathy

    Abstract: Speech emotion recognition (SER) in naturalistic settings remains a challenge due to the intrinsic variability, diverse recording conditions, and class imbalance. As participants in the Interspeech Naturalistic SER Challenge which focused on these complexities, we present Abhinaya, a system integrating speech-based, text-based, and speech-text models. Our approach fine-tunes self-supervised and sp…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: 5 pages, 2 figures, 4 tables, accepted at Interspeech 2025

  12. arXiv:2505.18135  [pdf, ps, other]

    cs.AI cs.CL cs.CR cs.LG

    Gaming Tool Preferences in Agentic LLMs

    Authors: Kazem Faghih, Wenxiao Wang, Yize Cheng, Siddhant Bharti, Gaurang Sriramanan, Sriram Balasubramanian, Parsa Hosseini, Soheil Feizi

    Abstract: Large language models (LLMs) can now access a wide range of external tools, thanks to the Model Context Protocol (MCP). This greatly expands their abilities as various agents. However, LLMs rely entirely on the text descriptions of tools to decide which ones to use--a process that is surprisingly fragile. In this work, we expose a vulnerability in prevalent tool/function-calling protocols by inves…

    Submitted 23 May, 2025; originally announced May 2025.

  13. arXiv:2505.17655  [pdf, ps, other]

    eess.AS cs.SD

    Audio-to-Audio Emotion Conversion With Pitch And Duration Style Transfer

    Authors: Soumya Dutta, Avni Jain, Sriram Ganapathy

    Abstract: Given a pair of source and reference speech recordings, audio-to-audio (A2A) style transfer involves the generation of an output speech that mimics the style characteristics of the reference while preserving the content and speaker attributes of the source. In this paper, we propose a novel framework, termed as A2A Zero-shot Emotion Style Transfer (A2A-ZEST), that enables the transfer of reference…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: 11 pages, 9 figures, 5 tables

  14. arXiv:2505.13115  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning

    Authors: Debarpan Bhattacharya, Apoorva Kulkarni, Sriram Ganapathy

    Abstract: The popular success of text-based large language models (LLMs) has streamlined the attention of the multimodal community to combine other modalities like vision and audio along with text to achieve similar multimodal capabilities. In this quest, large audio language models (LALMs) have to be evaluated on reasoning related tasks which are different from traditional classification or generation tasks…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: Accepted in INTERSPEECH, 2025, Rotterdam, The Netherlands

  15. arXiv:2505.12238  [pdf, ps, other]

    cs.CL cs.AI

    PANORAMA: A synthetic PII-laced dataset for studying sensitive data memorization in LLMs

    Authors: Sriram Selvam, Anneswa Ghosh

    Abstract: The memorization of sensitive and personally identifiable information (PII) by large language models (LLMs) poses growing privacy risks as models scale and are increasingly deployed in real-world applications. Existing efforts to study sensitive and PII data memorization and develop mitigation strategies are hampered by the absence of comprehensive, realistic, and ethically sourced datasets reflec…

    Submitted 18 May, 2025; originally announced May 2025.

  16. arXiv:2505.11758  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    Generalizable Vision-Language Few-Shot Adaptation with Predictive Prompts and Negative Learning

    Authors: Sriram Mandalika

    Abstract: Few-shot adaptation remains a core challenge for vision-language models (VLMs), especially under limited supervision and noisy support samples. We propose PromptFuseNL, a unified framework that enhances few-shot generalization by combining predictive prompt tuning with dual-branch positive and negative learning. The method refines class prototypes through task-conditioned residuals, multi-stage cr…

    Submitted 16 May, 2025; originally announced May 2025.

  17. arXiv:2505.07808  [pdf]

    cs.RO

    AcoustoBots: A swarm of robots for acoustophoretic multimodal interactions

    Authors: Narsimlu Kemsaram, James Hardwick, Jincheng Wang, Bonot Gautam, Ceylan Besevli, Giorgos Christopoulos, Sourabh Dogra, Lei Gao, Akin Delibasi, Diego Martinez Plasencia, Orestis Georgiou, Marianna Obrist, Ryuji Hirayama, Sriram Subramanian

    Abstract: Acoustophoresis has enabled novel interaction capabilities, such as levitation, volumetric displays, mid-air haptic feedback, and directional sound generation, to open new forms of multimodal interactions. However, its traditional implementation as a singular static unit limits its dynamic range and application versatility. This paper introduces AcoustoBots - a novel convergence of acoustophoresis…

    Submitted 12 May, 2025; originally announced May 2025.

  18. arXiv:2505.07731  [pdf, other]

    cs.CL cs.LG eess.AS

    Spoken Language Understanding on Unseen Tasks With In-Context Learning

    Authors: Neeraj Agrawal, Sriram Ganapathy

    Abstract: Spoken language understanding (SLU) tasks involve diverse skills that probe the information extraction, classification and/or generation capabilities of models. In this setting, task-specific training data may not always be available. While traditional task-specific SLU models are unable to cater to such requirements, the speech-text large language models (LLMs) offer a promising alternative with…

    Submitted 12 May, 2025; originally announced May 2025.

  19. arXiv:2505.04886  [pdf, other]

    cs.HC cs.LG

    Fairness Perceptions in Regression-based Predictive Models

    Authors: Mukund Telukunta, Venkata Sriram Siddhardh Nadendla, Morgan Stuart, Casey Canfield

    Abstract: Regression-based predictive analytics used in modern kidney transplantation is known to inherit biases from training data. This leads to social discrimination and inefficient organ utilization, particularly in the context of a few social groups. Despite this concern, there is limited research on fairness in regression and its impact on organ utilization and placement. This paper introduces three n…

    Submitted 9 May, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

  20. arXiv:2505.04787  [pdf, other]

    cs.CV cs.AI cs.LG

    Replay to Remember (R2R): An Efficient Uncertainty-driven Unsupervised Continual Learning Framework Using Generative Replay

    Authors: Sriram Mandalika, Harsha Vardhan, Athira Nambiar

    Abstract: Continual Learning entails progressively acquiring knowledge from new data while retaining previously acquired knowledge, thereby mitigating "Catastrophic Forgetting" in neural networks. Our work presents a novel uncertainty-driven Unsupervised Continual Learning framework using Generative Replay, namely "Replay to Remember (R2R)". The proposed R2R architecture efficiently uses unlabelled and…

    Submitted 9 May, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

    Comments: Submitted to the 28th European Conference on Artificial Intelligence (ECAI-2025)

  21. arXiv:2505.04732  [pdf, other]

    cs.IR cs.AI

    QBD-RankedDataGen: Generating Custom Ranked Datasets for Improving Query-By-Document Search Using LLM-Reranking with Reduced Human Effort

    Authors: Sriram Gopalakrishnan, Sunandita Patra

    Abstract: The Query-By-Document (QBD) problem is an information retrieval problem where the query is a document, and the retrieved candidates are documents that match the query document, often in a domain or query specific manner. This can be crucial for tasks such as patent matching, legal or compliance case retrieval, and academic literature review. Existing retrieval methods, including keyword search and…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 13 pages

  22. arXiv:2504.21781  [pdf, other]

    cs.DC cs.DS

    Message Optimality and Message-Time Trade-offs for APSP and Beyond

    Authors: Fabien Dufoulon, Shreyas Pai, Gopal Pandurangan, Sriram Pemmaraju, Peter Robinson

    Abstract: Round complexity is an extensively studied metric of distributed algorithms. In contrast, our knowledge of the message complexity of distributed computing problems and its relationship (if any) with round complexity is still quite limited. To illustrate, for many fundamental distributed graph optimization problems such as (exact) diameter computation, All-Pairs Shortest Paths (APSP), Maximu…

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: Accepted to PODC 2025, abstract shortened to fit arXiv constraints

  23. arXiv:2504.19716  [pdf, other]

    cs.RO

    QuickGrasp: Lightweight Antipodal Grasp Planning with Point Clouds

    Authors: Navin Sriram Ravie, Keerthi Vasan M, Asokan Thondiyath, Bijo Sebastian

    Abstract: Grasping has been a long-standing challenge in facilitating the final interface between a robot and the environment. As environments and tasks become complicated, the need to embed higher intelligence to infer from the surroundings and act on them has become necessary. Although most methods utilize techniques to estimate grasp pose by treating the problem via pure sampling-based approaches in the…

    Submitted 9 May, 2025; v1 submitted 28 April, 2025; originally announced April 2025.

  24. arXiv:2504.11713  [pdf, other]

    cs.LG cs.AI

    Adjoint Sampling: Highly Scalable Diffusion Samplers via Adjoint Matching

    Authors: Aaron Havens, Benjamin Kurt Miller, Bing Yan, Carles Domingo-Enrich, Anuroop Sriram, Brandon Wood, Daniel Levine, Bin Hu, Brandon Amos, Brian Karrer, Xiang Fu, Guan-Horng Liu, Ricky T. Q. Chen

    Abstract: We introduce Adjoint Sampling, a highly scalable and efficient algorithm for learning diffusion processes that sample from unnormalized densities, or energy functions. It is the first on-policy approach that allows significantly more gradient updates than the number of energy evaluations and model samples, allowing us to scale to much larger problem settings than previously explored by similar met…

    Submitted 28 May, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

  25. arXiv:2504.10728  [pdf, other]

    cs.GT

    Iterative Recommendations based on Monte Carlo Sampling and Trust Estimation in Multi-Stage Vehicular Traffic Routing Games

    Authors: Doris E. M. Brown, Venkata Sriram Siddhardh Nadendla, Sajal K. Das

    Abstract: The shortest-time route recommendations offered by modern navigation systems fuel selfish routing in urban vehicular traffic networks and are therefore one of the main reasons for the growth of congestion. In contrast, intelligent transportation systems (ITS) prefer to steer driver-vehicle systems (DVS) toward system-optimal route recommendations, which are primarily designed to mitigate network c…

    Submitted 14 April, 2025; originally announced April 2025.

  26. arXiv:2504.09387  [pdf, other]

    cs.CL

    On Language Models' Sensitivity to Suspicious Coincidences

    Authors: Sriram Padmanabhan, Kanishka Misra, Kyle Mahowald, Eunsol Choi

    Abstract: Humans are sensitive to suspicious coincidences when generalizing inductively over data, as they make assumptions as to how the data was sampled. This results in smaller, more specific hypotheses being favored over more general ones. For instance, when provided the set {Austin, Dallas, Houston}, one is more likely to think that this is sampled from "Texas Cities" over "US Cities" even though both…

    Submitted 12 April, 2025; originally announced April 2025.

  27. arXiv:2504.05908  [pdf, other]

    cs.CV cs.AI cs.LG

    PRIMEDrive-CoT: A Precognitive Chain-of-Thought Framework for Uncertainty-Aware Object Interaction in Driving Scene Scenario

    Authors: Sriram Mandalika, Lalitha V, Athira Nambiar

    Abstract: Driving scene understanding is a critical real-world problem that involves interpreting and associating various elements of a driving environment, such as vehicles, pedestrians, and traffic signals. Despite advancements in autonomous driving, traditional pipelines rely on deterministic models that fail to capture the probabilistic nature and inherent uncertainty of real-world driving. To address t…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: Accepted at The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025 - CVPRW

  28. arXiv:2504.01190  [pdf, other]

    cs.MM

    Video Quality Assessment for Resolution Cross-Over in Live Sports

    Authors: Jingwen Zhu, Yixu Chen, Hai Wei, Sriram Sethuraman, Yongjun Wu

    Abstract: In adaptive bitrate streaming, resolution cross-over refers to the point on the convex hull where the encoding resolution should switch to achieve better quality. Accurate cross-over prediction is crucial for streaming providers to optimize resolution at given bandwidths. Most existing works rely on objective Video Quality Metrics (VQM), particularly VMAF, to determine the resolution cross-over. H…

    Submitted 1 April, 2025; originally announced April 2025.

  29. arXiv:2503.17136  [pdf, other]

    cs.CL

    CoKe: Customizable Fine-Grained Story Evaluation via Chain-of-Keyword Rationalization

    Authors: Brihi Joshi, Sriram Venkatapathy, Mohit Bansal, Nanyun Peng, Haw-Shiuan Chang

    Abstract: Evaluating creative text such as human-written stories using language models has always been a challenging task -- owing to the subjectivity of multi-annotator ratings. To mimic the thinking process of humans, chain of thought (CoT) generates free-text explanations that help guide a model's predictions and Self-Consistency (SC) marginalizes predictions over multiple generated explanations. In this…

    Submitted 21 March, 2025; originally announced March 2025.

  30. arXiv:2503.12193  [pdf, other]

    cs.CV cs.LG

    S2IL: Structurally Stable Incremental Learning

    Authors: S Balasubramanian, Yedu Krishna P, Talasu Sai Sriram, M Sai Subramaniam, Manepalli Pranav Phanindra Sai, Darshan Gera

    Abstract: Feature Distillation (FD) strategies are proven to be effective in mitigating Catastrophic Forgetting (CF) seen in Class Incremental Learning (CIL). However, current FD approaches enforce strict alignment of feature magnitudes and directions across incremental steps, limiting the model's ability to adapt to new knowledge. In this paper we propose Structurally Stable Incremental Learning (S2IL), a…

    Submitted 15 March, 2025; originally announced March 2025.

  31. arXiv:2503.08884  [pdf, other]

    cs.CV cs.CL cs.LG

    Seeing What's Not There: Spurious Correlation in Multimodal LLMs

    Authors: Parsa Hosseini, Sumit Nawathe, Mazda Moayeri, Sriram Balasubramanian, Soheil Feizi

    Abstract: Unimodal vision models are known to rely on spurious correlations, but it remains unclear to what extent Multimodal Large Language Models (MLLMs) exhibit similar biases despite language supervision. In this paper, we investigate spurious bias in MLLMs and introduce SpurLens, a pipeline that leverages GPT-4 and open-set object detectors to automatically identify spurious visual cues without human s…

    Submitted 11 March, 2025; originally announced March 2025.

  32. arXiv:2503.07025  [pdf, other]

    cs.IR cs.AI cs.LG

    Weak Supervision for Improved Precision in Search Systems

    Authors: Sriram Vasudevan

    Abstract: Labeled datasets are essential for modern search engines, which increasingly rely on supervised learning methods like Learning to Rank and massive amounts of data to power deep learning models. However, creating these datasets is both time-consuming and costly, leading to the common use of user click and activity logs as proxies for relevance. In this paper, we present a weak supervision approach…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: Accepted to the AAAI 2025 Workshop on Computational Jobs Marketplace

  33. arXiv:2503.05312  [pdf, ps, other]

    cs.DS cs.CC

    On the Parameterized Complexity of Odd Coloring

    Authors: Sriram Bhyravarapu, Swati Kumari, I. Vinod Reddy

    Abstract: A proper vertex coloring of a connected graph $G$ is called an odd coloring if, for every vertex $v$ in $G$, there exists a color that appears an odd number of times in the open neighborhood of $v$. The minimum number of colors required to obtain an odd coloring of $G$ is called the odd chromatic number of $G$, denoted by $χ_{o}(G)$. Determining $χ_o(G)$ is known to be NP-hard. Given a gr…

    Submitted 7 March, 2025; originally announced March 2025.

    Comments: Appeared in CALDAM 2025

  34. arXiv:2503.03965  [pdf, other]

    cs.LG cs.AI

    All-atom Diffusion Transformers: Unified generative modelling of molecules and materials

    Authors: Chaitanya K. Joshi, Xiang Fu, Yi-Lun Liao, Vahe Gharakhanyan, Benjamin Kurt Miller, Anuroop Sriram, Zachary W. Ulissi

    Abstract: Diffusion models are the standard toolkit for generative modelling of 3D atomic systems. However, for different types of atomic systems -- such as molecules and materials -- the generative processes are usually highly specific to the target system despite the underlying physics being the same. We introduce the All-atom Diffusion Transformer (ADiT), a unified latent diffusion framework for jointly…

    Submitted 22 May, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

    Comments: ICML 2025

  35. arXiv:2503.03866  [pdf, other]

    cs.AI cs.GT cs.LG cs.MA

    Learning to Negotiate via Voluntary Commitment

    Authors: Shuhui Zhu, Baoxiang Wang, Sriram Ganapathi Subramanian, Pascal Poupart

    Abstract: The partial alignment and conflict of autonomous agents lead to mixed-motive scenarios in many real-world applications. However, agents may fail to cooperate in practice even when cooperation yields a better outcome. One well known reason for this failure comes from non-credible commitments. To facilitate commitments among agents for better cooperation, we define Markov Commitment Games (MCGs), a…

    Submitted 19 March, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

    Comments: Accepted by AISTATS 2025

  36. arXiv:2503.03192  [pdf, other]

    cs.RO

    Distributed Certifiably Correct Range-Aided SLAM

    Authors: Alexander Thoms, Alan Papalia, Jared Velasquez, David M. Rosen, Sriram Narasimhan

    Abstract: Reliable simultaneous localization and mapping (SLAM) algorithms are necessary for safety-critical autonomous navigation. In the communication-constrained multi-agent setting, navigation systems increasingly use point-to-point range sensors as they afford measurements with low bandwidth requirements and known data association. The state estimation problem for these systems takes the form of range-…

    Submitted 13 May, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

    Comments: 8 pages, 3 figures, accepted to the 2025 International Conference on Robotics and Automation (ICRA). This version includes minor clerical edits to the published version in the conference proceedings

  37. arXiv:2502.20389  [pdf, ps, other]

    cs.CV

    From Thousands to Billions: 3D Visual Language Grounding via Render-Supervised Distillation from 2D VLMs

    Authors: Ang Cao, Sergio Arnaud, Oleksandr Maksymets, Jianing Yang, Ayush Jain, Sriram Yenamandra, Ada Martin, Vincent-Pierre Berges, Paul McVay, Ruslan Partsey, Aravind Rajeswaran, Franziska Meier, Justin Johnson, Jeong Joon Park, Alexander Sax

    Abstract: 3D vision-language grounding faces a fundamental data bottleneck: while 2D models train on billions of images, 3D models have access to only thousands of labeled scenes--a six-order-of-magnitude gap that severely limits performance. We introduce LIFT-GS, a practical distillation technique that overcomes this limitation by using differentiable rendering to bridge 3D and 2D supervision. L…

    Submitted 9 June, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

    Comments: Project page: https://liftgs.github.io

  38. arXiv:2502.19297  [pdf, other]

    cs.MA cs.AI cs.LG

    Combining Planning and Reinforcement Learning for Solving Relational Multiagent Domains

    Authors: Nikhilesh Prabhakar, Ranveer Singh, Harsha Kokel, Sriraam Natarajan, Prasad Tadepalli

    Abstract: Multiagent Reinforcement Learning (MARL) poses significant challenges due to the exponential growth of state and action spaces and the non-stationary nature of multiagent environments. This results in notable sample inefficiency and hinders generalization across diverse tasks. The complexity is further pronounced in relational settings, where domain knowledge is crucial but often underutilized by…

    Submitted 26 February, 2025; originally announced February 2025.

  39. arXiv:2502.17516  [pdf, other]

    cs.LG cs.AI

    A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models

    Authors: Zihao Lin, Samyadeep Basu, Mohammad Beigi, Varun Manjunatha, Ryan A. Rossi, Zichao Wang, Yufan Zhou, Sriram Balasubramanian, Arman Zarei, Keivan Rezaei, Ying Shen, Barry Menglong Yao, Zhiyang Xu, Qin Liu, Yuxiang Zhang, Yan Sun, Shilong Liu, Li Shen, Hongxuan Li, Soheil Feizi, Lifu Huang

    Abstract: The rise of foundation models has transformed machine learning research, prompting efforts to uncover their inner workings and develop more efficient and reliable applications for better control. While significant progress has been made in interpreting Large Language Models (LLMs), multimodal foundation models (MMFMs) - such as contrastive vision-language models, generative vision-language models,…

    Submitted 22 February, 2025; originally announced February 2025.

    Comments: 30 pages, 4 Figures, 10 Tables

  40. arXiv:2502.17439  [pdf, other]

    cs.SE cs.AI cs.DC cs.OS

    Large Language Models as Realistic Microservice Trace Generators

    Authors: Donghyun Kim, Sriram Ravula, Taemin Ha, Alexandros G. Dimakis, Daehyeok Kim, Aditya Akella

    Abstract: Workload traces are essential to understand complex computer systems' behavior and manage processing and memory resources. Since real-world traces are hard to obtain, synthetic trace generation is a promising alternative. This paper proposes a first-of-a-kind approach that relies on training a large language model (LLM) to generate synthetic workload traces, specifically microservice call graphs.…

    Submitted 25 February, 2025; v1 submitted 16 December, 2024; originally announced February 2025.

  41. arXiv:2502.15297  [pdf, other]

    astro-ph.IM cs.LG quant-ph

    Comparative Analysis of Black Hole Mass Estimation in Type-2 AGNs: Classical vs. Quantum Machine Learning and Deep Learning Approaches

    Authors: Sathwik Narkedimilli, Venkata Sriram Amballa, N V Saran Kumar, R Arun Kumar, R Praneeth Reddy, Satvik Raghav, Manish M, Aswath Babu H

    Abstract: In the case of Type-2 AGNs, estimating the mass of the black hole is challenging. Understanding how galaxies form and evolve requires considerable insight into the mass of black holes. This work compared different classical and quantum machine learning (QML) algorithms for black hole mass estimation, wherein the classical algorithms are Linear Regression, XGBoost Regression, Random Forest Regresso…

    Submitted 24 February, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

    Comments: 29 pages, 12 Figures, 6 Tables

  42. arXiv:2502.12918  [pdf, ps, other]

    cs.DB

    Query Rewriting via LLMs

    Authors: Sriram Dharwada, Himanshu Devrani, Jayant Haritsa, Harish Doraiswamy

    Abstract: When complex SQL queries suffer slow executions despite query optimization, DBAs typically invoke automated query rewriting tools to recommend ``lean'' equivalents that are conducive to faster execution. The rewritings are usually achieved via transformation rules, but these rules are limited in scope and difficult to update in a production system. Recently, LLM-based techniques have also been sug…

    Submitted 10 June, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

  43. arXiv:2502.09927  [pdf, other]

    cs.CV cs.AI

    Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence

    Authors: Granite Vision Team, Leonid Karlinsky, Assaf Arbelle, Abraham Daniels, Ahmed Nassar, Amit Alfassi, Bo Wu, Eli Schwartz, Dhiraj Joshi, Jovana Kondic, Nimrod Shabtay, Pengyuan Li, Roei Herzig, Shafiq Abedin, Shaked Perek, Sivan Harary, Udi Barzelay, Adi Raz Goldfarb, Aude Oliva, Ben Wieles, Bishwaranjan Bhattacharjee, Brandon Huang, Christoph Auer, Dan Gutfreund, David Beymer, et al. (38 additional authors not shown)

    Abstract: We introduce Granite Vision, a lightweight large language model with vision capabilities, specifically designed to excel in enterprise use cases, particularly in visual document understanding. Our model is trained on a comprehensive instruction-following dataset, including document-related tasks, such as content extraction from tables, charts, diagrams, sketches, and infographics, as well as gener…

    Submitted 14 February, 2025; originally announced February 2025.

  44. arXiv:2502.06973  [pdf, other]

    cs.CV

    Indoor Light and Heat Estimation from a Single Panorama

    Authors: Guanzhou Ji, Sriram Narayanan, Azadeh Sawyer, Srinivasa Narasimhan

    Abstract: This paper presents a novel application for directly estimating indoor light and heat maps from captured indoor-outdoor High Dynamic Range (HDR) panoramas. In our image-based rendering method, the indoor panorama is used to estimate the 3D room layout, while the corresponding outdoor panorama serves as an environment map to infer spatially-varying light and material properties. We establish a conn…

    Submitted 10 February, 2025; originally announced February 2025.

  45. arXiv:2502.01060  [pdf, other]

    cs.LG cs.AI cs.CR

    Learning Nonlinearity of Boolean Functions: An Experimentation with Neural Networks

    Authors: Sriram Ranga, Nandish Chattopadhyay, Anupam Chattopadhyay

    Abstract: This paper investigates the learnability of the nonlinearity property of Boolean functions using neural networks. We train encoder style deep neural networks to learn to predict the nonlinearity of Boolean functions from examples of functions in the form of a truth table and their corresponding nonlinearity values. We report empirical results to show that deep neural networks are able to learn to…

    Submitted 3 February, 2025; originally announced February 2025.

    Comments: To be published in International conference on Artificial Intelligence and Sustainable Computing, AISC 2024

  46. arXiv:2501.16450  [pdf, other]

    cs.IR cs.AI

    360Brew: A Decoder-only Foundation Model for Personalized Ranking and Recommendation

    Authors: Hamed Firooz, Maziar Sanjabi, Adrian Englhardt, Aman Gupta, Ben Levine, Dre Olgiati, Gungor Polatkan, Iuliia Melnychuk, Karthik Ramgopal, Kirill Talanine, Kutta Srinivasan, Luke Simon, Natesh Sivasubramoniapillai, Necip Fazil Ayan, Qingquan Song, Samira Sriram, Souvik Ghosh, Tao Song, Tejas Dharamsi, Vignesh Kothapalli, Xiaoling Zhai, Ya Xu, Yu Wang, Yun Dai

    Abstract: Ranking and recommendation systems are the foundation for numerous online experiences, ranging from search results to personalized content delivery. These systems have evolved into complex, multilayered architectures that leverage vast datasets and often incorporate thousands of predictive models. The maintenance and enhancement of these models is a labor intensive process that requires extensive…

    Submitted 7 February, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

  47. arXiv:2501.14486  [pdf, other]

    cs.RO

    Visual-Lidar Map Alignment for Infrastructure Inspections

    Authors: Jake McLaughlin, Nicholas Charron, Sriram Narasimhan

    Abstract: Routine and repetitive infrastructure inspections present safety, efficiency, and consistency challenges as they are performed manually, often in challenging or hazardous environments. They can also introduce subjectivity and errors into the process, resulting in undesirable outcomes. Simultaneous localization and mapping (SLAM) presents an opportunity to generate high-quality 3D maps that can be…

    Submitted 27 January, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 8 pages, 8 figures, for associated code see https://github.com/jakemclaughlin6/vlma

    MSC Class: 68T40 (primary); 62P30 (secondary) ACM Class: J.2; I.4

  48. arXiv:2501.11618  [pdf, other]

    cs.CR

    Enhancing IoT Network Security through Adaptive Curriculum Learning and XAI

    Authors: Sathwik Narkedimilli, Sujith Makam, Amballa Venkata Sriram, Sai Prashanth Mallellu, MSVPJ Sathvik, Ranga Rao Venkatesha Prasad

    Abstract: To address the critical need for secure IoT networks, this study presents a scalable and lightweight curriculum learning framework enhanced with Explainable AI (XAI) techniques, including LIME, to ensure transparency and adaptability. The proposed model employs novel neural network architecture utilized at every stage of Curriculum Learning to efficiently capture and focus on both short- and long-…

    Submitted 20 January, 2025; originally announced January 2025.

    Comments: 2 tables, 5 figures

  49. arXiv:2501.11468  [pdf, other]

    eess.AS cs.SD

    LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations

    Authors: Soumya Dutta, Sriram Ganapathy

    Abstract: Emotion recognition in conversations (ERC) is challenging due to the multimodal nature of the emotion expression. In this paper, we propose to pretrain a text-based recognition model from unsupervised speech transcripts with LLM guidance. These transcriptions are obtained from a raw speech dataset with a pre-trained ASR system. A text LLM model is queried to provide pseudo-labels for these transcr…

    Submitted 20 January, 2025; originally announced January 2025.

    Comments: ICASSP 2025; 5 pages, 4 figures, 2 tables

  50. arXiv:2501.09755  [pdf, other]

    cs.CV cs.AI

    Learnings from Scaling Visual Tokenizers for Reconstruction and Generation

    Authors: Philippe Hansen-Estruch, David Yan, Ching-Yao Chung, Orr Zohar, Jialiang Wang, Tingbo Hou, Tao Xu, Sriram Vishwanath, Peter Vajda, Xinlei Chen

    Abstract: Visual tokenization via auto-encoding empowers state-of-the-art image and video generative models by compressing pixels into a latent space. Although scaling Transformer-based generators has been central to recent advances, the tokenizer component itself is rarely scaled, leaving open questions about how auto-encoder design choices influence both its objective of reconstruction and downstream gene…

    Submitted 16 January, 2025; originally announced January 2025.

    Comments: 28 pages, 25 figures, 7 Tables

    ACM Class: I.2.10; I.4.2; I.4.5