Showing 1–50 of 371 results for author: Shivam

  1. arXiv:2506.15085  [pdf, ps, other]

    cs.RO cs.HC

    EmojiVoice: Towards long-term controllable expressivity in robot speech

    Authors: Paige Tuttösí, Shivam Mehta, Zachary Syvenky, Bermet Burkanova, Gustav Eje Henter, Angelica Lim

    Abstract: Humans vary their expressivity when speaking for extended periods to maintain engagement with their listener. Although social robots tend to be deployed with "expressive" joyful voices, they lack this long-term variation found in human speech. Foundation model text-to-speech systems are beginning to mimic the expressivity in human speech, but they are difficult to deploy offline on robots. We pr…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Accepted to RO-MAN 2025, Demo at HRI 2025: https://dl.acm.org/doi/10.5555/3721488.3721774

  2. arXiv:2506.12103  [pdf, other]

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  3. arXiv:2506.11398  [pdf, ps, other]

    cs.LG physics.flu-dyn

    FIGNN: Feature-Specific Interpretability for Graph Neural Network Surrogate Models

    Authors: Riddhiman Raut, Romit Maulik, Shivam Barwey

    Abstract: This work presents a novel graph neural network (GNN) architecture, the Feature-specific Interpretable Graph Neural Network (FIGNN), designed to enhance the interpretability of deep learning surrogate models defined on unstructured grids in scientific applications. Traditional GNNs often obscure the distinct spatial influences of different features in multivariate prediction tasks. FIGNN addresses…

    Submitted 12 June, 2025; originally announced June 2025.

  4. arXiv:2506.05508  [pdf, ps, other]

    cs.DC cs.AI

    Beyond the Buzz: A Pragmatic Take on Inference Disaggregation

    Authors: Tiyasa Mitra, Ritika Borkar, Nidhi Bhatia, Ramon Matas, Shivam Raj, Dheevatsa Mudigere, Ritchie Zhao, Maximilian Golub, Arpan Dutta, Sailaja Madduri, Dharmesh Jani, Brian Pharris, Bita Darvish Rouhani

    Abstract: As inference scales to multi-node deployments, disaggregation (splitting inference into distinct phases) offers a promising path to improving the throughput-interactivity Pareto frontier. Despite growing enthusiasm and a surge of open-source efforts, practical deployment of disaggregated serving remains limited due to the complexity of the optimization search space and system-level coordination.…

    Submitted 5 June, 2025; originally announced June 2025.

  5. arXiv:2506.04358  [pdf, ps, other]

    cs.LG

    A Risk-Aware Reinforcement Learning Reward for Financial Trading

    Authors: Uditansh Srivastava, Shivam Aryan, Shaurya Singh

    Abstract: We propose a novel composite reward function for reinforcement learning in financial trading that balances return and risk using four differentiable terms: annualized return, downside risk, differential return, and the Treynor ratio. Unlike single-metric objectives (for example, the Sharpe ratio), our formulation is modular and parameterized by weights w1, w2, w3, and w4, enabling practitioners to encode di…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 14 pages, 11 figures

  6. arXiv:2506.03448  [pdf, ps, other]

    cs.CV

    RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions

    Authors: Bimsara Pathiraja, Maitreya Patel, Shivam Singh, Yezhou Yang, Chitta Baral

    Abstract: Despite recent advances in inversion and instruction-based image editing, existing approaches primarily excel at editing single, prominent objects but significantly struggle when applied to complex scenes containing multiple entities. To quantify this gap, we first introduce RefEdit-Bench, a rigorous real-world benchmark rooted in RefCOCO, where even baselines trained on millions of samples perfor…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Project page: http://refedit.vercel.app

  7. arXiv:2506.01085  [pdf, ps, other]

    cs.CV cs.AI

    Learning What Matters: Prioritized Concept Learning via Relative Error-driven Sample Selection

    Authors: Shivam Chandhok, Qian Yang, Oscar Manas, Kanishk Jain, Leonid Sigal, Aishwarya Agrawal

    Abstract: Instruction tuning has been central to the success of recent vision-language models (VLMs), but it remains expensive, requiring large-scale datasets, high-quality annotations, and large compute budgets. We propose PRioritized cOncept learninG via Relative Error-driven Sample Selection (PROGRESS), a data- and compute-efficient framework that enables VLMs to dynamically select what to learn next base…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: Preprint

  8. arXiv:2506.00348  [pdf, ps, other]

    stat.ML cs.AI cs.LG

    Beyond Winning: Margin of Victory Relative to Expectation Unlocks Accurate Skill Ratings

    Authors: Shivam Shorewala, Zihao Yang

    Abstract: Knowledge of accurate relative skills in any competitive system is essential, but foundational approaches such as Elo discard extremely relevant performance data by concentrating exclusively on binary outcomes. While margin of victory (MOV) extensions exist, they often lack a definitive method for incorporating this information. We introduce Margin of Victory Differential Analysis (MOVDA), a frame…

    Submitted 30 May, 2025; originally announced June 2025.

  9. arXiv:2505.24584  [pdf, ps, other]

    cs.LG cs.AI cs.IR

    AutoChemSchematic AI: A Closed-Loop, Physics-Aware Agentic Framework for Auto-Generating Chemical Process and Instrumentation Diagrams

    Authors: Sakhinana Sagar Srinivas, Shivam Gupta, Venkataramana Runkana

    Abstract: Recent advancements in generative AI have accelerated the discovery of novel chemicals and materials; however, transitioning these discoveries to industrial-scale production remains a critical bottleneck, as it requires the development of entirely new chemical manufacturing processes. Current AI methods cannot auto-generate PFDs or PIDs, despite their critical role in scaling chemical processes, w…

    Submitted 1 June, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

  10. arXiv:2505.24365  [pdf]

    cs.LG cs.PF

    Anomaly Detection and Improvement of Clusters using Enhanced K-Means Algorithm

    Authors: Vardhan Shorewala, Shivam Shorewala

    Abstract: This paper introduces a unified approach to cluster refinement and anomaly detection in datasets. We propose a novel algorithm that iteratively reduces the intra-cluster variance of N clusters until a global minimum is reached, yielding tighter clusters than the standard k-means algorithm. We evaluate the method using intrinsic measures for unsupervised learning, including the silhouette coefficie…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: IEEE ICCCSP

  11. arXiv:2505.23802  [pdf, ps, other]

    cs.CL cs.AI

    MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks

    Authors: Suhana Bedi, Hejie Cui, Miguel Fuentes, Alyssa Unell, Michael Wornow, Juan M. Banda, Nikesh Kotecha, Timothy Keyes, Yifan Mai, Mert Oez, Hao Qiu, Shrey Jain, Leonardo Schettini, Mehr Kashyap, Jason Alan Fries, Akshay Swaminathan, Philip Chung, Fateme Nateghi, Asad Aali, Ashwin Nayak, Shivam Vedak, Sneha S. Jain, Birju Patel, Oluseyi Fayanju, Shreya Shah, et al. (56 additional authors not shown)

    Abstract: While large language models (LLMs) achieve near-perfect scores on medical licensing exams, these evaluations inadequately reflect the complexity and diversity of real-world clinical practice. We introduce MedHELM, an extensible evaluation framework for assessing LLM performance for medical tasks with three key contributions. First, a clinician-validated taxonomy spanning 5 categories, 22 subcatego…

    Submitted 2 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  12. arXiv:2505.21652  [pdf, ps, other]

    cs.RO cs.AI

    PartInstruct: Part-level Instruction Following for Fine-grained Robot Manipulation

    Authors: Yifan Yin, Zhengtao Han, Shivam Aarya, Jianxin Wang, Shuhang Xu, Jiawei Peng, Angtian Wang, Alan Yuille, Tianmin Shu

    Abstract: Fine-grained robot manipulation, such as lifting and rotating a bottle to display the label on the cap, requires robust reasoning about object parts and their relationships with intended tasks. Despite recent advances in training general-purpose robot manipulation policies guided by language instructions, there is a notable lack of large-scale datasets for fine-grained manipulation tasks with part…

    Submitted 16 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  13. arXiv:2505.18839  [pdf, ps, other]

    cs.DS

    DNF Learning via Locally Mixing Random Walks

    Authors: Josh Alman, Shivam Nadimpalli, Shyamal Patel, Rocco A. Servedio

    Abstract: We give two results on PAC learning DNF formulas using membership queries in the challenging "distribution-free" learning framework, where learning algorithms must succeed for an arbitrary and unknown distribution over $\{0,1\}^n$. (1) We first give a quasi-polynomial time "list-decoding" algorithm for learning a single term of an unknown DNF formula. More precisely, for any target $s$-term DNF…

    Submitted 24 May, 2025; originally announced May 2025.

  14. arXiv:2505.18625  [pdf, ps, other]

    math.AG cs.CV

    Tropical Geometry Based Edge Detection Using Min-Plus and Max-Plus Algebra

    Authors: Shivam Kumar Jha S, Jaya NN Iyer

    Abstract: This paper proposes a tropical geometry-based edge detection framework that reformulates convolution and gradient computations using min-plus and max-plus algebra. The tropical formulation emphasizes dominant intensity variations, contributing to sharper and more continuous edge representations. Three variants are explored: an adaptive threshold-based method, a multi-kernel min-plus method, and a…

    Submitted 24 May, 2025; originally announced May 2025.

    MSC Class: 14T90; 14-04

  15. arXiv:2505.18247  [pdf, ps, other]

    cs.CL cs.AI cs.IR cs.LG

    MetaGen Blended RAG: Unlocking Zero-Shot Precision for Specialized Domain Question-Answering

    Authors: Kunal Sawarkar, Shivam R. Solanki, Abhilasha Mangal

    Abstract: Retrieval-Augmented Generation (RAG) struggles with domain-specific enterprise datasets, often isolated behind firewalls and rich in complex, specialized terminology unseen by LLMs during pre-training. Semantic variability across domains like medicine, networking, or law hampers RAG's context precision, while fine-tuning solutions are costly, slow, and lack generalization as new data emerges. Achi…

    Submitted 4 June, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

    Comments: Preprint. Paper submitted to NeurIPS 2025 (The Thirty-Ninth Annual Conference on Neural Information Processing Systems)

  16. arXiv:2505.15134  [pdf, ps, other]

    cs.LG cs.AI

    The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning

    Authors: Shivam Agarwal, Zimin Zhang, Lifan Yuan, Jiawei Han, Hao Peng

    Abstract: Entropy minimization (EM) trains the model to concentrate even more probability mass on its most confident outputs. We show that this simple objective alone, without any labeled data, can substantially improve large language models' (LLMs) performance on challenging math, physics, and coding tasks. We explore three approaches: (1) EM-FT minimizes token-level entropy similarly to instruction finetu…

    Submitted 21 May, 2025; originally announced May 2025.

  17. arXiv:2505.10022  [pdf, ps, other]

    cs.RO

    APEX: Action Priors Enable Efficient Exploration for Skill Imitation on Articulated Robots

    Authors: Shivam Sood, Laukik B Nakhwa, Yuhong Cao, Sun Ge, Guillaume Sartoretti

    Abstract: Learning by imitation provides an effective way for robots to develop well-regulated complex behaviors and directly benefit from natural demonstrations. State-of-the-art imitation learning (IL) approaches typically leverage Adversarial Motion Priors (AMP), which, despite their impressive results, suffer from two key limitations. They are prone to mode collapse, which often leads to overfitting to…

    Submitted 12 June, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

  18. arXiv:2505.05885  [pdf, ps, other]

    cs.DB cs.IR

    Cost-Effective, Low Latency Vector Search with Azure Cosmos DB

    Authors: Nitish Upreti, Krishnan Sundaram, Hari Sudan Sundar, Samer Boshra, Balachandar Perumalswamy, Shivam Atri, Martin Chisholm, Revti Raman Singh, Greg Yang, Subramanyam Pattipaka, Tamara Hass, Nitesh Dudhey, James Codella, Mark Hildebrand, Magdalen Manohar, Jack Moffitt, Haiyang Xu, Naren Datha, Suryansh Gupta, Ravishankar Krishnaswamy, Prashant Gupta, Abhishek Sahu, Ritika Mor, Santosh Kulkarni, Hemeswari Varada, et al. (11 additional authors not shown)

    Abstract: Vector indexing enables semantic search over diverse corpora and has become an important interface to databases for both users and AI agents. Efficient vector search requires deep optimizations in database systems. This has motivated a new class of specialized vector databases that optimize for vector search quality and cost. Instead, we argue that a scalable, high-performance, and cost-efficient…

    Submitted 9 May, 2025; originally announced May 2025.

    ACM Class: H.3.3

  19. arXiv:2505.03742  [pdf, other]

    cs.CR

    Hardware-Enabled Mechanisms for Verifying Responsible AI Development

    Authors: Aidan O'Gara, Gabriel Kulp, Will Hodgkins, James Petrie, Vincent Immler, Aydin Aysu, Kanad Basu, Shivam Bhasin, Stjepan Picek, Ankur Srivastava

    Abstract: Advancements in AI capabilities, driven in large part by scaling up computing resources used for AI training, have created opportunities to address major global challenges but also pose risks of misuse. Hardware-enabled mechanisms (HEMs) can support responsible AI development by enabling verifiable reporting of key properties of AI training activities such as quantity of compute used, training clu…

    Submitted 2 April, 2025; originally announced May 2025.

  20. arXiv:2505.00490  [pdf, other]

    cs.RO cs.AI

    Optimal Interactive Learning on the Job via Facility Location Planning

    Authors: Shivam Vats, Michelle Zhao, Patrick Callaghan, Mingxi Jia, Maxim Likhachev, Oliver Kroemer, George Konidaris

    Abstract: Collaborative robots must continually adapt to novel tasks and user preferences without overburdening the user. While prior interactive robot learning methods aim to reduce human effort, they are typically limited to single-task scenarios and are not well-suited for sustained, multi-task collaboration. We propose COIL (Cost-Optimal Interactive Learning), a multi-task interaction planner that min…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: Accepted to Robotics: Science and Systems (RSS) 2025

  21. arXiv:2504.21536  [pdf, other]

    cs.DC

    Scientific Workflow Scheduling in Cloud Considering Cold Start and Variable Pricing Model

    Authors: Suvarthi Sarkar, Sparsh Mittal, Shivam Garg, Aryabartta Sahu

    Abstract: Cloud computing has become a pivotal platform for executing scientific workflows due to its scalable and cost-effective infrastructure. Scientific Cloud Service Providers (SCSPs) act as intermediaries that rent virtual machines (VMs) from Infrastructure-as-a-Service (IaaS) providers to meet users' workflow execution demands. The SCSP earns profit from the execution of scientific workflows if it co…

    Submitted 30 April, 2025; originally announced April 2025.

  22. arXiv:2504.18509  [pdf, other]

    cs.CV

    Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation

    Authors: Shivam Duggal, Yushi Hu, Oscar Michel, Aniruddha Kembhavi, William T. Freeman, Noah A. Smith, Ranjay Krishna, Antonio Torralba, Ali Farhadi, Wei-Chiu Ma

    Abstract: Despite the unprecedented progress in the field of 3D generation, current systems still often fail to produce high-quality 3D assets that are visually appealing and geometrically and semantically consistent across multiple viewpoints. To effectively assess the quality of the generated 3D data, there is a need for a reliable 3D evaluation tool. Unfortunately, existing 3D evaluation metrics often ov…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: CVPR 2025. Project page and codes: https://eval3d.github.io/

  23. arXiv:2504.15286  [pdf, other]

    cs.SE cs.AI

    CUBETESTERAI: Automated JUnit Test Generation using the LLaMA Model

    Authors: Daniele Gorla, Shivam Kumar, Pietro Nicolaus Roselli Lorenzini, Alireza Alipourfaz

    Abstract: This paper presents an approach to automating JUnit test generation for Java applications using the Spring Boot framework, leveraging the LLaMA (Large Language Model Architecture) model to enhance the efficiency and accuracy of the testing process. The resulting tool, called CUBETESTERAI, includes a user-friendly web interface and the integration of a CI/CD pipeline using GitLab and Docker. These…

    Submitted 13 March, 2025; originally announced April 2025.

    Comments: Accepted to ICST 2025 Industry Track

  24. arXiv:2504.13263  [pdf, other]

    cs.AI

    Causal-Copilot: An Autonomous Causal Analysis Agent

    Authors: Xinyue Wang, Kun Zhou, Wenyi Wu, Har Simrat Singh, Fang Nan, Songyao Jin, Aryan Philip, Saloni Patnaik, Hou Zhu, Shivam Singh, Parjanya Prashant, Qian Shen, Biwei Huang

    Abstract: Causal analysis plays a foundational role in scientific discovery and reliable decision-making, yet it remains largely inaccessible to domain experts due to its conceptual and algorithmic complexity. This disconnect between causal methodology and practical usability presents a dual challenge: domain experts are unable to leverage recent advances in causal learning, while causal researchers lack br…

    Submitted 21 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

  25. arXiv:2504.12683  [pdf, other]

    stat.ME cs.LG stat.ML

    Cluster weighted models with multivariate skewed distributions for functional data

    Authors: Cristina Anton, Roy Shivam Ram Shreshtth

    Abstract: We propose a clustering method, funWeightClustSkew, based on mixtures of functional linear regression models and three skewed multivariate distributions: the variance-gamma distribution, the skew-t distribution, and the normal-inverse Gaussian distribution. Our approach follows the framework of the functional high dimensional data clustering (funHDDC) method, and we extend to functional data the c…

    Submitted 17 April, 2025; originally announced April 2025.

  26. arXiv:2504.06011  [pdf, other]

    cs.CL

    Llama-3-Nanda-10B-Chat: An Open Generative Large Language Model for Hindi

    Authors: Monojit Choudhury, Shivam Chauhan, Rocktim Jyoti Das, Dhruv Sahnan, Xudong Han, Haonan Li, Aaryamonvikram Singh, Alok Anil Jadhav, Utkarsh Agarwal, Mukund Choudhary, Debopriyo Banerjee, Fajri Koto, Junaid Bhat, Awantika Shukla, Samujjwal Ghosh, Samta Kamboj, Onkar Pandit, Lalit Pradhan, Rahul Pal, Sunil Sahu, Soundar Doraiswamy, Parvez Mullah, Ali El Filali, Neha Sengupta, Gokul Ramakrishnan, et al. (5 additional authors not shown)

    Abstract: Developing high-quality large language models (LLMs) for moderately resourced languages presents unique challenges in data availability, model adaptation, and evaluation. We introduce Llama-3-Nanda-10B-Chat, or Nanda for short, a state-of-the-art Hindi-centric instruction-tuned generative LLM, designed to push the boundaries of open-source Hindi language models. Built upon Llama-3-8B, Nanda incorp…

    Submitted 8 April, 2025; originally announced April 2025.

  27. arXiv:2504.04737  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    TathyaNyaya and FactLegalLlama: Advancing Factual Judgment Prediction and Explanation in the Indian Legal Context

    Authors: Shubham Kumar Nigam, Balaramamahanthi Deepak Patnaik, Shivam Mishra, Noel Shallum, Kripabandhu Ghosh, Arnab Bhattacharya

    Abstract: In the landscape of Fact-based Judgment Prediction and Explanation (FJPE), reliance on factual data is essential for developing robust and realistic AI-driven decision-making tools. This paper introduces TathyaNyaya, the largest annotated dataset for FJPE tailored to the Indian legal context, encompassing judgments from the Supreme Court of India and various High Courts. Derived from the Hindi ter…

    Submitted 7 April, 2025; originally announced April 2025.

  28. arXiv:2504.03174  [pdf, other]

    cs.CL

    Multi-lingual Multi-turn Automated Red Teaming for LLMs

    Authors: Abhishek Singhania, Christophe Dupuy, Shivam Mangale, Amani Namboori

    Abstract: Large Language Models (LLMs) have improved dramatically in the past few years, increasing their adoption and the scope of their capabilities over time. A significant amount of work is dedicated to "model alignment", i.e., preventing LLMs from generating unsafe responses when deployed into customer-facing applications. One popular method to evaluate safety risks is red-teaming, where agents…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: Accepted at TrustNLP@NAACL 2025

  29. arXiv:2504.01281  [pdf, ps, other]

    cs.LG cs.AI cs.CL cs.IR

    Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding

    Authors: Sakhinana Sagar Srinivas, Akash Das, Shivam Gupta, Venkataramana Runkana

    Abstract: We present a comprehensive framework for enhancing Retrieval-Augmented Generation (RAG) systems through dynamic retrieval strategies and reinforcement fine-tuning. This approach significantly improves large language models on knowledge-intensive tasks, including open-domain question answering and complex reasoning. Our framework integrates two complementary techniques: Policy-Optimized Retrieval-Aug…

    Submitted 20 May, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

  30. arXiv:2504.00338  [pdf, other]

    cs.LG cs.AI cs.MA cs.SI

    Agentic Multimodal AI for Hyperpersonalized B2B and B2C Advertising in Competitive Markets: An AI-Driven Competitive Advertising Framework

    Authors: Sakhinana Sagar Srinivas, Akash Das, Shivam Gupta, Venkataramana Runkana

    Abstract: The growing use of foundation models (FMs) in real-world applications demands adaptive, reliable, and efficient strategies for dynamic markets. In the chemical industry, AI-discovered materials drive innovation, but commercial success hinges on market adoption, requiring FM-driven advertising frameworks that operate in-the-wild. We present a multilingual, multimodal AI framework for autonomous, hy…

    Submitted 31 March, 2025; originally announced April 2025.

  31. arXiv:2504.00294  [pdf, other]

    cs.LG cs.AI cs.CL

    Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead

    Authors: Vidhisha Balachandran, Jingya Chen, Lingjiao Chen, Shivam Garg, Neel Joshi, Yash Lara, John Langford, Besmira Nushi, Vibhav Vineet, Yue Wu, Safoora Yousefi

    Abstract: Inference-time scaling can enhance the reasoning capabilities of large language models (LLMs) on complex problems that benefit from step-by-step problem solving. Although lengthening generated scratchpads has proven effective for mathematical tasks, the broader impact of this approach on other tasks remains less clear. In this work, we investigate the benefits and limitations of scaling methods ac…

    Submitted 31 March, 2025; originally announced April 2025.

    ACM Class: I.2

  32. Make Some Noise: Towards LLM audio reasoning and generation using sound tokens

    Authors: Shivam Mehta, Nebojsa Jojic, Hannes Gamper

    Abstract: Integrating audio comprehension and generation into large language models (LLMs) remains challenging due to the continuous nature of audio and the resulting high sampling rates. Here, we introduce a novel approach that combines Variational Quantization with Conditional Flow Matching to convert audio into ultra-low bitrate discrete tokens of 0.23 kbps, allowing for seamless integration with text tok…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: 5 pages, 2 figures, Accepted at ICASSP 2025

    MSC Class: 68T07 ACM Class: I.2.7; I.2.6; H.5.5

  33. arXiv:2503.20191  [pdf, other]

    cs.LG cs.DC

    Maya: Optimizing Deep Learning Training Workloads using Emulated Virtual Accelerators

    Authors: Srihas Yarlagadda, Amey Agrawal, Elton Pinto, Hakesh Darapaneni, Mitali Meratwal, Shivam Mittal, Pranavi Bajjuri, Srinivas Sridharan, Alexey Tumanov

    Abstract: Training large foundation models costs hundreds of millions of dollars, making deployment optimization critical. Current approaches require machine learning engineers to manually craft training recipes through error-prone trial-and-error on expensive compute clusters. To enable efficient exploration of training configurations, researchers have developed performance modeling systems. However, these…

    Submitted 25 March, 2025; originally announced March 2025.

  34. arXiv:2503.17494  [pdf, other]

    cs.LG cs.AI math.ST stat.ML

    Efficient Knowledge Distillation via Curriculum Extraction

    Authors: Shivam Gupta, Sushrut Karmalkar

    Abstract: Knowledge distillation is a technique used to train a small student network using the output generated by a large teacher network, and has many empirical advantages (Hinton et al., 2015). While the standard one-shot approach to distillation only uses the output of the final teacher network, recent work (Panigrahi et al., 2024) has shown that using intermediate checkpoints from the…

    Submitted 21 March, 2025; originally announced March 2025.

  35. arXiv:2503.13418  [pdf, other]

    cs.RO cs.AI

    FLEX: A Framework for Learning Robot-Agnostic Force-based Skills Involving Sustained Contact Object Manipulation

    Authors: Shijie Fang, Wenchang Gao, Shivam Goel, Christopher Thierauf, Matthias Scheutz, Jivko Sinapov

    Abstract: Learning to manipulate objects efficiently, particularly those involving sustained contact (e.g., pushing, sliding) and articulated parts (e.g., drawers, doors), presents significant challenges. Traditional methods, such as robot-centric reinforcement learning (RL), imitation learning, and hybrid techniques, require massive training and often struggle to generalize across different objects and rob…

    Submitted 17 March, 2025; originally announced March 2025.

    Comments: Accepted at IEEE-ICRA-2025

  36. arXiv:2503.09477  [pdf, other]

    cs.RO cs.LG cs.NE

    Neural reservoir control of a soft bio-hybrid arm

    Authors: Noel Naughton, Arman Tekinalp, Keshav Shivam, Seung Hung Kim, Volodymyr Kindratenko, Mattia Gazzola

    Abstract: A long-standing engineering problem, the control of soft robots is difficult because of their highly non-linear, heterogeneous, anisotropic, and distributed nature. Here, bridging engineering and biology, a neural reservoir is employed for the dynamic control of a bio-hybrid model arm made of multiple muscle-tendon groups enveloping an elastic spine. We show how the use of reservoirs facilitates s…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: 12 pages; 4 figures

  37. Can KAN CANs? Input-convex Kolmogorov-Arnold Networks (KANs) as hyperelastic constitutive artificial neural networks (CANs)

    Authors: Prakash Thakolkaran, Yaqi Guo, Shivam Saini, Mathias Peirlinck, Benjamin Alheit, Siddhant Kumar

    Abstract: Traditional constitutive models rely on hand-crafted parametric forms with limited expressivity and generalizability, while neural network-based models can capture complex material behavior but often lack interpretability. To balance these trade-offs, we present monotonic Input-Convex Kolmogorov-Arnold Networks (ICKANs) for learning polyconvex hyperelastic constitutive laws. ICKANs leverage the Ko…

    Submitted 4 June, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

    Comments: 36 pages, 16 figures

    Journal ref: Computer Methods in Applied Mechanics and Engineering 443 (2025), 118089

  38. arXiv:2503.05397  [pdf, other]

    cs.MA cs.CL

    Multi Agent based Medical Assistant for Edge Devices

    Authors: Sakharam Gawade, Shivam Akhouri, Chinmay Kulkarni, Jagdish Samant, Pragya Sahu, Aastik, Jai Pahal, Saswat Meher

    Abstract: Large Action Models (LAMs) have revolutionized intelligent automation, but their application in healthcare faces challenges due to privacy concerns, latency, and dependency on internet access. This report introduces an on-device, multi-agent healthcare assistant that overcomes these limitations. The system utilizes smaller, task-specific agents to optimize resources, ensure scalability and high per…

    Submitted 7 March, 2025; originally announced March 2025.

  39. arXiv:2503.01131  [pdf, other]

    cs.CL cs.AI

    Beyond QA Pairs: Assessing Parameter-Efficient Fine-Tuning for Fact Embedding in LLMs

    Authors: Shivam Ratnakar, Abhiroop Talasila, Raghav Chamadiya, Nikhil Agarwal, Vinayak K Doifode

    Abstract: This paper presents an extensive examination of Parameter-Efficient Fine-Tuning (PEFT) for embedding domain-specific facts into Large Language Models (LLMs), focusing on improving the fine-tuning process by categorizing question-answer (QA) pairs into Factual and Conceptual classes using a BERT-based classifier. Two distinct Llama-2 models are fine-tuned based on these classifications and evaluate…

    Submitted 2 March, 2025; originally announced March 2025.

    Comments: Presented at the Workshop on Preparing Good Data for Generative AI: Challenges and Approaches (Good-Data) in conjunction with AAAI 2025. The authors retain the copyright

    Journal ref: Workshop on Preparing Good Data for Generative AI: Challenges and Approaches, 2025

  40. arXiv:2502.19662  [pdf, other]

    cs.AR cs.AI

    HALO: Hardware-aware quantization with low critical-path-delay weights for LLM acceleration

    Authors: Rohan Juneja, Shivam Aggarwal, Safeen Huda, Tulika Mitra, Li-Shiuan Peh

    Abstract: Quantization is critical for efficiently deploying large language models (LLMs). Yet conventional methods remain hardware-agnostic, limited to bit-width constraints, and do not account for intrinsic circuit characteristics such as the timing behaviors and energy profiles of Multiply-Accumulate (MAC) units. This disconnect from circuit-level behavior limits the ability to exploit available timing m…

    Submitted 25 April, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

  41. arXiv:2502.16473  [pdf, other]

    cs.AR

    TerEffic: Highly Efficient Ternary LLM Inference on FPGA

    Authors: Chenyang Yin, Zhenyu Bai, Pranav Venkatram, Shivam Aggarwal, Zhaoying Li, Tulika Mitra

    Abstract: Deploying Large Language Models (LLMs) efficiently on edge devices is often constrained by limited memory capacity and high power consumption. Low-bit quantization methods, particularly ternary quantization, have demonstrated significant potential in preserving model accuracy while substantially decreasing memory footprint and computational costs. However, existing general-purpose architectures an…

    Submitted 1 May, 2025; v1 submitted 23 February, 2025; originally announced February 2025.

  42. arXiv:2502.14732  [pdf, other]

    cs.GT

    FLIGHT: Facility Location Integrating Generalized, Holistic Theory of Welfare

    Authors: Avyukta Manjunatha Vummintala, Shivam Gupta, Shweta Jain, Sujit Gujar

    Abstract: The Facility Location Problem (FLP) is a well-studied optimization problem with applications in many real-world scenarios. Past literature has explored the solutions from different perspectives to tackle FLPs. These include investigating FLPs under objective functions such as utilitarian, egalitarian, Nash welfare, etc. Also, there is no treatment for asymmetric welfare functions around the facili…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: 21 pages, 1 figure

  43. arXiv:2502.14718  [pdf, ps, other]

    cs.CL

    Entity Framing and Role Portrayal in the News

    Authors: Tarek Mahmoud, Zhuohan Xie, Dimitar Dimitrov, Nikolaos Nikolaidis, Purificação Silvano, Roman Yangarber, Shivam Sharma, Elisa Sartori, Nicolas Stefanovitch, Giovanni Da San Martino, Jakub Piskorski, Preslav Nakov

    Abstract: We introduce a novel multilingual hierarchical corpus annotated for entity framing and role portrayal in news articles. The dataset uses a unique taxonomy inspired by storytelling elements, comprising 22 fine-grained roles, or archetypes, nested within three main categories: protagonist, antagonist, and innocent. Each archetype is carefully defined, capturing nuanced portrayals of entities such as…

    Submitted 15 June, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: 25 pages, 13 figures. Accepted to ACL 2025

  44. arXiv:2502.12393  [pdf, other]

    stat.ME cs.AI cs.LG stat.ML

    Time Series Treatment Effects Analysis with Always-Missing Controls

    Authors: Juan Shu, Qiyu Han, George Chen, Xihao Cao, Kangming Luo, Dan Pallotta, Shivam Agrawal, Yuping Lu, Xiaoyu Zhang, Jawad Mansoor, Jyoti Anand

    Abstract: Estimating treatment effects in time series data presents a significant challenge, especially when the control group is always unobservable. For example, in analyzing the effects of Christmas on retail sales, we lack direct observation of what would have occurred in late December without the Christmas impact. To address this, we try to recover the control group in the event period while accounting…

    Submitted 17 February, 2025; originally announced February 2025.

  45. arXiv:2502.07328  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM

    Music for All: Representational Bias and Cross-Cultural Adaptability of Music Generation Models

    Authors: Atharva Mehta, Shivam Chauhan, Amirbek Djanibekov, Atharva Kulkarni, Gus Xia, Monojit Choudhury

    Abstract: The advent of Music-Language Models has greatly enhanced the automatic music generation capability of AI systems, but they are also limited in their coverage of the musical genres and cultures of the world. We present a study of the datasets and research papers for music generation and quantify the bias and under-representation of genres. We find that only 5.7% of the total hours of existing music…

    Submitted 6 May, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

    Comments: 17 pages, 5 figures, accepted to NAACL'25

  46. arXiv:2502.02067  [pdf, other]

    cs.RO cs.AI cs.CL cs.LG

    AdaptBot: Combining LLM with Knowledge Graphs and Human Input for Generic-to-Specific Task Decomposition and Knowledge Refinement

    Authors: Shivam Singh, Karthik Swaminathan, Nabanita Dash, Ramandeep Singh, Snehasis Banerjee, Mohan Sridharan, Madhava Krishna

    Abstract: An embodied agent assisting humans is often asked to complete new tasks, and there may not be sufficient time or labeled examples to train the agent to perform these new tasks. Large Language Models (LLMs) trained on considerable knowledge across many domains can be used to predict a sequence of abstract actions for completing such tasks, although the agent may not be able to execute this sequence…

    Submitted 6 March, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

    Comments: Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2025

  47. arXiv:2502.02066  [pdf, other]

    cs.RO cs.CL cs.LG

    Anticipate & Act : Integrating LLMs and Classical Planning for Efficient Task Execution in Household Environments

    Authors: Raghav Arora, Shivam Singh, Karthik Swaminathan, Ahana Datta, Snehasis Banerjee, Brojeshwar Bhowmick, Krishna Murthy Jatavallabhula, Mohan Sridharan, Madhava Krishna

    Abstract: Assistive agents performing household tasks such as making the bed or cooking breakfast often compute and execute actions that accomplish one task at a time. However, efficiency can be improved by anticipating upcoming tasks and computing an action sequence that jointly achieves these tasks. State-of-the-art methods for task anticipation use data-driven deep networks and Large Language Models (LLM…

    Submitted 4 February, 2025; originally announced February 2025.

    Comments: Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2024

  48. Dancing With Chains: Ideating Under Constraints With UIDEC in UI/UX Design

    Authors: Atefeh Shokrizadeh, Boniface Bahati Tadjuidje, Shivam Kumar, Sohan Kamble, Jinghui Cheng

    Abstract: UI/UX designers often work under constraints like brand identity, design norms, and industry guidelines. How these constraints impact designers' ideation and exploration processes should be addressed in creativity-support tools for design. Through an exploratory interview study, we identified three designer personas with varying views on having constraints in the ideation process, which guided the…

    Submitted 30 January, 2025; originally announced January 2025.

    Comments: 23 pages, 8 figures, CHI 2025

  49. arXiv:2501.10375  [pdf, other]

    cs.DC cs.LG

    DAOP: Data-Aware Offloading and Predictive Pre-Calculation for Efficient MoE Inference

    Authors: Yujie Zhang, Shivam Aggarwal, Tulika Mitra

    Abstract: Mixture-of-Experts (MoE) models, though highly effective for various machine learning tasks, face significant deployment challenges on memory-constrained devices. While GPUs offer fast inference, their limited memory compared to CPUs means not all experts can be stored on the GPU simultaneously, necessitating frequent, costly data transfers from CPU memory, often negating GPU speed advantages. To…

    Submitted 4 May, 2025; v1 submitted 16 December, 2024; originally announced January 2025.

    Comments: 7 pages, 10 figures, Accepted by DATE Conference 2025

  50. arXiv:2501.02184  [pdf, other]

    cs.RO math.OC

    Model-Free and Real-Time Bioinspired Unicycle-Based Source Seeking: Differential Wheeled Robotic Experiments

    Authors: Ahmed A. Elgohary, Sameh A. Eisa, Shivam Bajpai

    Abstract: Bioinspired robots aimed at source-seeking are often studied, and their controls designed, using unicycle modeling and formulation. This is true not only for model-based controllers, but also for model-free, real-time control methods such as extremum seeking control (ESC). In this paper, we propose a unicycle-based ESC design applicable to differential wheeled robots that: (1) is very simple design…

    Submitted 11 January, 2025; v1 submitted 3 January, 2025; originally announced January 2025.