
Showing 1–50 of 542 results for author: Jain, S

Searching in archive cs.
  1. arXiv:2509.25435 [pdf, ps, other]

    cs.AI

    GESA: Graph-Enhanced Semantic Allocation for Generalized, Fair, and Explainable Candidate-Role Matching

    Authors: Rishi Ashish Shah, Shivaay Dhondiyal, Kartik Sharma, Sukriti Talwar, Saksham Jain, Sparsh Jain

    Abstract: Accurate, fair, and explainable allocation of candidates to roles represents a fundamental challenge across multiple domains including corporate hiring, academic admissions, fellowship awards, and volunteer placement systems. Current state-of-the-art approaches suffer from semantic inflexibility, persistent demographic bias, opacity in decision-making processes, and poor scalability under dynamic…

    Submitted 29 September, 2025; originally announced September 2025.

  2. arXiv:2509.21267 [pdf, ps, other]

    cs.CL cs.CY

    LLM Output Homogenization is Task Dependent

    Authors: Shomik Jain, Jack Lanchantin, Maximilian Nickel, Karen Ullrich, Ashia Wilson, Jamelle Watson-Daniels

    Abstract: A large language model can be less helpful if it exhibits output response homogenization. But whether two responses are considered homogeneous, and whether such homogenization is problematic, both depend on the task category. For instance, in objective math tasks, we often expect no variation in the final answer but anticipate variation in the problem-solving strategy. Whereas, for creative writin…

    Submitted 25 September, 2025; originally announced September 2025.

  3. arXiv:2509.19708 [pdf, ps, other]

    cs.SE cs.AI cs.LG

    Intuition to Evidence: Measuring AI's True Impact on Developer Productivity

    Authors: Anand Kumar, Vishal Khare, Deepak Sharma, Satyam Kumar, Vijay Saini, Anshul Yadav, Sachendra Jain, Ankit Rana, Pratham Verma, Vaibhav Meena, Avinash Edubilli

    Abstract: We present a comprehensive real-world evaluation of AI-assisted software development tools deployed at enterprise scale. Over one year, 300 engineers across multiple teams integrated an in-house AI platform (DeputyDev) that combines code generation and automated review capabilities into their daily workflows. Through rigorous cohort analysis, our study demonstrates statistically significant produc…

    Submitted 23 September, 2025; originally announced September 2025.

    Comments: 16 pages, 10 figures, 5 tables

  4. arXiv:2509.12517 [pdf, ps, other]

    cs.HC

    Extended AI Interactions Shape Sycophancy and Perspective Mimesis

    Authors: Shomik Jain, Charlotte Park, Matheus Mesquita Viana, Ashia Wilson, Dana Calacci

    Abstract: We investigate whether long-context interactions between users and LLMs lead to AI mirroring behaviors. We focus on two forms of mirroring: (1) sycophancy -- the tendency of models to be overly agreeable with users, and (2) perspective mimesis -- the extent to which models reflect a user's perspective. Using two weeks of interaction context collected from 38 users, we compare model responses with…

    Submitted 15 September, 2025; originally announced September 2025.

  5. arXiv:2509.12239 [pdf]

    cs.LG cs.CV

    InJecteD: Analyzing Trajectories and Drift Dynamics in Denoising Diffusion Probabilistic Models for 2D Point Cloud Generation

    Authors: Sanyam Jain, Khuram Naveed, Illia Oleksiienko, Alexandros Iosifidis, Ruben Pauwels

    Abstract: This work introduces InJecteD, a framework for interpreting Denoising Diffusion Probabilistic Models (DDPMs) by analyzing sample trajectories during the denoising process of 2D point cloud generation. We apply this framework to three datasets from the Datasaurus Dozen (bullseye, dino, and circle) using a simplified DDPM architecture with customizable input and time embeddings. Our approach quantifie…

    Submitted 9 September, 2025; originally announced September 2025.

  6. arXiv:2509.10582 [pdf, ps, other]

    cs.CY cs.AI

    LearnLens: An AI-Enhanced Dashboard to Support Teachers in Open-Ended Classrooms

    Authors: Namrata Srivastava, Shruti Jain, Clayton Cohn, Naveeduddin Mohammed, Umesh Timalsina, Gautam Biswas

    Abstract: Exploratory learning environments (ELEs), such as simulation-based platforms and open-ended science curricula, promote hands-on exploration and problem-solving but make it difficult for teachers to gain timely insights into students' conceptual understanding. This paper presents LearnLens, a generative AI (GenAI)-enhanced teacher-facing dashboard designed to support problem-based instruction in mi…

    Submitted 11 September, 2025; originally announced September 2025.

    Comments: 9 pages

    ACM Class: K.3.1

  7. arXiv:2509.07325 [pdf, ps, other]

    cs.LG

    CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement Estimation

    Authors: Alyssa Unell, Noel C. F. Codella, Sam Preston, Peniel Argaw, Wen-wai Yim, Zelalem Gero, Cliff Wong, Rajesh Jena, Eric Horvitz, Amanda K. Hall, Ruican Rachel Zhong, Jiachen Li, Shrey Jain, Mu Wei, Matthew Lungren, Hoifung Poon

    Abstract: The National Comprehensive Cancer Network (NCCN) provides evidence-based guidelines for cancer treatment. Translating complex patient presentations into guideline-compliant treatment recommendations is time-intensive, requires specialized expertise, and is prone to error. Advances in large language model (LLM) capabilities promise to reduce the time required to generate treatment recommendations a…

    Submitted 8 September, 2025; originally announced September 2025.

  8. arXiv:2509.06602 [pdf, ps, other]

    cs.LG cs.AI

    Demo: Healthcare Agent Orchestrator (HAO) for Patient Summarization in Molecular Tumor Boards

    Authors: Matthias Blondeel, Noel Codella, Sam Preston, Hao Qiu, Leonardo Schettini, Frank Tuan, Wen-wai Yim, Smitha Saligrama, Mert Öz, Shrey Jain, Matthew P. Lungren, Thomas Osborne

    Abstract: Molecular Tumor Boards (MTBs) are multidisciplinary forums where oncology specialists collaboratively assess complex patient cases to determine optimal treatment strategies. A central element of this process is the patient summary, typically compiled by a medical oncologist, radiation oncologist, or surgeon, or their trained medical assistant, who distills heterogeneous medical records into a conc…

    Submitted 11 September, 2025; v1 submitted 8 September, 2025; originally announced September 2025.

    Comments: 9 pages, 1 figure; Added missing co-authors and contributors

  9. arXiv:2509.06553 [pdf, ps, other]

    eess.IV cs.CV cs.LG

    Impact of Labeling Inaccuracy and Image Noise on Tooth Segmentation in Panoramic Radiographs using Federated, Centralized and Local Learning

    Authors: Johan Andreas Balle Rubak, Khuram Naveed, Sanyam Jain, Lukas Esterle, Alexandros Iosifidis, Ruben Pauwels

    Abstract: Objectives: Federated learning (FL) may mitigate privacy constraints, heterogeneous data quality, and inconsistent labeling in dental diagnostic AI. We compared FL with centralized (CL) and local learning (LL) for tooth segmentation in panoramic radiographs across multiple data corruption scenarios. Methods: An Attention U-Net was trained on 2066 radiographs from six institutions across four setti…

    Submitted 8 September, 2025; originally announced September 2025.

  10. arXiv:2509.03741 [pdf, ps, other]

    cs.HC cs.AI

    Designing Gaze Analytics for ELA Instruction: A User-Centered Dashboard with Conversational AI Support

    Authors: Eduardo Davalos, Yike Zhang, Shruti Jain, Namrata Srivastava, Trieu Truong, Nafees-ul Haque, Tristan Van, Jorge Salas, Sara McFadden, Sun-Joo Cho, Gautam Biswas, Amanda Goodwin

    Abstract: Eye-tracking offers rich insights into student cognition and engagement, but remains underutilized in classroom-facing educational technology due to challenges in data interpretation and accessibility. In this paper, we present the iterative design and evaluation of a gaze-based learning analytics dashboard for English Language Arts (ELA), developed through five studies involving teachers and stud…

    Submitted 3 September, 2025; originally announced September 2025.

    Comments: 22 pages, 9 figures, 3 tables, submitted to IUI2026

  11. arXiv:2509.00955 [pdf, ps, other]

    cs.LG cs.AI stat.ML

    ART: Adaptive Resampling-based Training for Imbalanced Classification

    Authors: Arjun Basandrai, Shourya Jain, K. Ilanthenral

    Abstract: Traditional resampling methods for handling class imbalance typically use fixed distributions, undersampling the majority or oversampling the minority. These static strategies ignore changes in class-wise learning difficulty, which can limit the overall performance of the model. This paper proposes an Adaptive Resampling-based Training (ART) method that periodically updates the distribution of…

    Submitted 31 August, 2025; originally announced September 2025.

    Comments: Submitted to SIGKDD'26

  12. arXiv:2508.19316 [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Sycophancy as compositions of Atomic Psychometric Traits

    Authors: Shreyans Jain, Alexandra Yost, Amirali Abdullah

    Abstract: Sycophancy is a key behavioral risk in LLMs, yet is often treated as an isolated failure mode that occurs via a single causal mechanism. We instead propose modeling it as geometric and causal compositions of psychometric traits such as emotionality, openness, and agreeableness, similar to factor decomposition in psychometrics. Using Contrastive Activation Addition (CAA), we map activation directi…

    Submitted 26 August, 2025; originally announced August 2025.

    Comments: 8 pages, 4 figures

    ACM Class: I.2.7; I.2.4

  13. arXiv:2508.14565 [pdf, ps, other]

    cs.LG cs.DC

    Cooperative SGD with Dynamic Mixing Matrices

    Authors: Soumya Sarkar, Shweta Jain

    Abstract: One of the most common methods to train machine learning algorithms today is the stochastic gradient descent (SGD). In a distributed setting, SGD-based algorithms have been shown to converge theoretically under specific circumstances. A substantial number of works in the distributed SGD setting assume a fixed topology for the edge devices. These papers also assume that the contribution of nodes to…

    Submitted 21 August, 2025; v1 submitted 20 August, 2025; originally announced August 2025.

    Comments: Accepted at 28th European Conference on Artificial Intelligence (ECAI-2025) in Main Paper track

  14. arXiv:2508.14444 [pdf, ps, other]

    cs.CL cs.AI cs.LG

    NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

    Authors: NVIDIA, :, Aarti Basant, Abhijit Khairnar, Abhijit Paithankar, Abhinav Khattar, Adithya Renduchintala, Aditya Malte, Akhiad Bercovich, Akshay Hazare, Alejandra Rico, Aleksander Ficek, Alex Kondratenko, Alex Shaposhnikov, Alexander Bukharin, Ali Taghibakhshi, Amelia Barton, Ameya Sunil Mahabaleshwarkar, Amy Shen, Andrew Tao, Ann Guan, Anna Shors, Anubhav Mandarwal, Arham Mehta, Arun Venkatesan, et al. (192 additional authors not shown)

    Abstract: We introduce Nemotron-Nano-9B-v2, a hybrid Mamba-Transformer language model designed to increase throughput for reasoning workloads while achieving state-of-the-art accuracy compared to similarly-sized models. Nemotron-Nano-9B-v2 builds on the Nemotron-H architecture, in which the majority of the self-attention layers in the common Transformer architecture are replaced with Mamba-2 layers, to achi…

    Submitted 2 September, 2025; v1 submitted 20 August, 2025; originally announced August 2025.

  15. arXiv:2508.13856 [pdf, ps, other]

    cs.MA cs.GT

    The Multi-Stage Assignment Problem: A Fairness Perspective

    Authors: Vibulan J, Swapnil Dhamal, Shweta Jain

    Abstract: This paper explores the problem of fair assignment on Multi-Stage graphs. A multi-stage graph consists of nodes partitioned into $K$ disjoint sets (stages) structured as a sequence of weighted bipartite graphs formed across adjacent stages. The goal is to assign node-disjoint paths to $n$ agents starting from the first stage and ending in the last stage. We show that an efficient assignment that m…

    Submitted 19 August, 2025; originally announced August 2025.

    Comments: The original version of this paper is accepted in the 28th European Conference on Artificial Intelligence (ECAI), 2025

  16. arXiv:2508.10925 [pdf, ps, other]

    cs.CL cs.AI

    gpt-oss-120b & gpt-oss-20b Model Card

    Authors: OpenAI, :, Sandhini Agarwal, Lama Ahmad, Jason Ai, Sam Altman, Andy Applebaum, Edwin Arbus, Rahul K. Arora, Yu Bai, Bowen Baker, Haiming Bao, Boaz Barak, Ally Bennett, Tyler Bertao, Nivedita Brett, Eugene Brevdo, Greg Brockman, Sebastien Bubeck, Che Chang, Kai Chen, Mark Chen, Enoch Cheung, Aidan Clark, Dan Cook, et al. (102 additional authors not shown)

    Abstract: We present gpt-oss-120b and gpt-oss-20b, two open-weight reasoning models that push the frontier of accuracy and inference cost. The models use an efficient mixture-of-expert transformer architecture and are trained using large-scale distillation and reinforcement learning. We optimize the models to have strong agentic capabilities (deep research browsing, python tool use, and support for develope…

    Submitted 8 August, 2025; originally announced August 2025.

  17. arXiv:2508.09478 [pdf, ps, other]

    cs.CV

    GazeLT: Visual attention-guided long-tailed disease classification in chest radiographs

    Authors: Moinak Bhattacharya, Gagandeep Singh, Shubham Jain, Prateek Prasanna

    Abstract: In this work, we present GazeLT, a human visual attention integration-disintegration approach for long-tailed disease classification. A radiologist's eye gaze has distinct patterns that capture both fine-grained and coarser level disease related information. While interpreting an image, a radiologist's attention varies throughout the duration; it is critical to incorporate this into a deep learnin…

    Submitted 13 August, 2025; originally announced August 2025.

  18. arXiv:2508.09224 [pdf, ps, other]

    cs.CY cs.AI cs.CL

    From Hard Refusals to Safe-Completions: Toward Output-Centric Safety Training

    Authors: Yuan Yuan, Tina Sriskandarajah, Anna-Luisa Brakman, Alec Helyar, Alex Beutel, Andrea Vallone, Saachi Jain

    Abstract: Large Language Models used in ChatGPT have traditionally been trained to learn a refusal boundary: depending on the user's intent, the model is taught to either fully comply or outright refuse. While this is a strong mitigation for explicitly malicious prompts, focusing safety training on refusals can lead to brittleness for prompts with obscured user intent. Binary refusal boundaries are especial…

    Submitted 11 August, 2025; originally announced August 2025.

  19. arXiv:2508.01503 [pdf, ps, other]

    cs.CL

    A Theory of Adaptive Scaffolding for LLM-Based Pedagogical Agents

    Authors: Clayton Cohn, Surya Rayala, Namrata Srivastava, Joyce Horn Fonteles, Shruti Jain, Xinying Luo, Divya Mereddy, Naveeduddin Mohammed, Gautam Biswas

    Abstract: Large language models (LLMs) present new opportunities for creating pedagogical agents that engage in meaningful dialogue to support student learning. However, the current use of LLM systems like ChatGPT in classrooms often lacks the solid theoretical foundation found in earlier intelligent tutoring systems. To bridge this gap, we propose a framework that combines Evidence-Centered Design with Soc…

    Submitted 2 August, 2025; originally announced August 2025.

  20. arXiv:2507.21200 [pdf]

    cs.CV cs.ET cs.LG eess.IV

    PanoGAN A Deep Generative Model for Panoramic Dental Radiographs

    Authors: Soren Pedersen, Sanyam Jain, Mikkel Chavez, Viktor Ladehoff, Bruna Neves de Freitas, Ruben Pauwels

    Abstract: This paper presents the development of a generative adversarial network (GAN) for synthesizing dental panoramic radiographs. Although exploratory in nature, the study aims to address the scarcity of data in dental research and education. We trained a deep convolutional GAN (DCGAN) using a Wasserstein loss with gradient penalty (WGAN-GP) on a dataset of 2322 radiographs of varying quality. The focus…

    Submitted 28 July, 2025; originally announced July 2025.

  21. arXiv:2507.18675 [pdf]

    cs.CV cs.LG

    Advancing Vision-based Human Action Recognition: Exploring Vision-Language CLIP Model for Generalisation in Domain-Independent Tasks

    Authors: Utkarsh Shandilya, Marsha Mariya Kappan, Sanyam Jain, Vijeta Sharma

    Abstract: Human action recognition plays a critical role in healthcare and medicine, supporting applications such as patient behavior monitoring, fall detection, surgical robot supervision, and procedural skill assessment. While traditional models like CNNs and RNNs have achieved moderate success, they often struggle to generalize across diverse and complex actions. Recent advancements in vision-language mo…

    Submitted 30 July, 2025; v1 submitted 24 July, 2025; originally announced July 2025.

  22. arXiv:2507.18674 [pdf]

    nlin.CG cs.FL

    Frequency-Histogram Coarse Graining in Elementary Cellular Automata and 2D CA

    Authors: Sanyam Jain, Stefano Nichele

    Abstract: Cellular automata and other discrete dynamical systems have long been studied as models of emergent complexity. Recently, neural cellular automata have been proposed as models to investigate the emergence of a more general artificial intelligence, thanks to their propensity to support properties such as self-organization, emergence, and open-endedness. However, understanding emergent complexity in la…

    Submitted 24 July, 2025; originally announced July 2025.

  23. Clo-HDnn: A 4.66 TFLOPS/W and 3.78 TOPS/W Continual On-Device Learning Accelerator with Energy-efficient Hyperdimensional Computing via Progressive Search

    Authors: Chang Eun Song, Weihong Xu, Keming Fan, Soumil Jain, Gopabandhu Hota, Haichao Yang, Leo Liu, Kerem Akarvardar, Meng-Fan Chang, Carlos H. Diaz, Gert Cauwenberghs, Tajana Rosing, Mingu Kang

    Abstract: Clo-HDnn is an on-device learning (ODL) accelerator designed for emerging continual learning (CL) tasks. Clo-HDnn integrates hyperdimensional computing (HDC) along with low-cost Kronecker HD Encoder and weight clustering feature extraction (WCFE) to optimize accuracy and efficiency. Clo-HDnn adopts gradient-free CL to efficiently update and store the learned knowledge in the form of class hypervec…

    Submitted 23 July, 2025; originally announced July 2025.

    Comments: Published in 2025 Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), Kyoto, Japan, 2025

  24. arXiv:2507.14372 [pdf, ps, other]

    cs.CL cs.AI cs.DB cs.HC

    Text-to-SQL for Enterprise Data Analytics

    Authors: Albert Chen, Manas Bundele, Gaurav Ahlawat, Patrick Stetz, Zhitao Wang, Qiang Fei, Donghoon Jung, Audrey Chu, Bharadwaj Jayaraman, Ayushi Panth, Yatin Arora, Sourav Jain, Renjith Varma, Alexey Ilin, Iuliia Melnychuk, Chelsea Chueh, Joyan Sil, Xiaofeng Wang

    Abstract: The introduction of large language models has brought rapid progress on Text-to-SQL benchmarks, but it is not yet easy to build a working enterprise solution. In this paper, we present insights from building an internal chatbot that enables LinkedIn's product managers, engineers, and operations teams to self-serve data insights from a large, dynamic data lake. Our approach features three component…

    Submitted 18 July, 2025; originally announced July 2025.

    Comments: 11 pages, 8 figures, Workshop on Agentic AI for Enterprise at KDD '25

  25. Federated Learning for Commercial Image Sources

    Authors: Shreyansh Jain, Koteswar Rao Jerripothula

    Abstract: Federated Learning is a collaborative machine learning paradigm that enables multiple clients to learn a global model without exposing their data to each other. Consequently, it provides a secure learning platform with privacy-preserving capabilities. This paper introduces a new dataset containing 23,326 images collected from eight different commercial sources and classified into 31 categories, si…

    Submitted 17 July, 2025; originally announced July 2025.

    Comments: Published in the Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023 with DOI: 10.1109/WACV56688.2023.00647

    Journal ref: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 6523-6532, 2023

  26. arXiv:2507.09227 [pdf, ps, other]

    eess.IV cs.AI cs.CV cs.LG

    PanoDiff-SR: Synthesizing Dental Panoramic Radiographs using Diffusion and Super-resolution

    Authors: Sanyam Jain, Bruna Neves de Freitas, Andreas Basse-OConnor, Alexandros Iosifidis, Ruben Pauwels

    Abstract: There has been increasing interest in the generation of high-quality, realistic synthetic medical images in recent years. Such synthetic datasets can mitigate the scarcity of public datasets for artificial intelligence research, and can also be used for educational purposes. In this paper, we propose a combination of diffusion-based generation (PanoDiff) and Super-Resolution (SR) for generating sy…

    Submitted 12 July, 2025; originally announced July 2025.

  27. arXiv:2507.09075 [pdf, ps, other]

    cs.CL

    OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique

    Authors: Wasi Uddin Ahmad, Somshubra Majumdar, Aleksander Ficek, Sean Narenthiran, Mehrzad Samadi, Jocelyn Huang, Siddhartha Jain, Vahid Noroozi, Boris Ginsburg

    Abstract: Recent advancements in reasoning-based Large Language Models (LLMs), particularly their potential through test-time scaling, have created significant opportunities for distillation in code generation and critique. However, progress in both areas fundamentally depends on large-scale, high-quality datasets. In this work, we introduce OpenCodeReasoning-II, a dataset consisting of 2.5M question-solution…

    Submitted 11 July, 2025; originally announced July 2025.

    Comments: work in progress

  28. Analysis of Propaganda in Tweets From Politically Biased Sources

    Authors: Vivek Sharma, Mohammad Mahdi Shokri, Sarah Ita Levitan, Elena Filatova, Shweta Jain

    Abstract: News outlets are well known to have political associations, and many national outlets cultivate political biases to cater to different audiences. Journalists working for these news outlets have a big impact on the stories they cover. In this work, we present a methodology to analyze the role of journalists, affiliated with popular news outlets, in propagating their bias using some form of propagan…

    Submitted 10 July, 2025; originally announced July 2025.

    Comments: The International FLAIRS Conference Proceedings, 38(1). https://doi.org/10.32473/flairs.38.1.138706

  29. arXiv:2507.06261 [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3284 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 22 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  30. arXiv:2507.01042 [pdf, ps, other]

    cs.IR cs.AI cs.CL

    Can Argus Judge Them All? Comparing VLMs Across Domains

    Authors: Harsh Joshi, Gautam Siddharth Kashyap, Rafiq Ali, Ebad Shabbir, Niharika Jain, Sarthak Jain, Jiechao Gao, Usman Naseem

    Abstract: Vision-Language Models (VLMs) are advancing multimodal AI, yet their performance consistency across tasks is underexamined. We benchmark CLIP, BLIP, and LXMERT across diverse datasets spanning retrieval, captioning, and reasoning. Our evaluation includes task accuracy, generation quality, efficiency, and a novel Cross-Dataset Consistency (CDC) metric. CLIP shows strongest generalization (CDC: 0.92…

    Submitted 23 June, 2025; originally announced July 2025.

  31. arXiv:2506.22554 [pdf, ps, other]

    cs.CV cs.AI

    Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset

    Authors: Vasu Agrawal, Akinniyi Akinyemi, Kathryn Alvero, Morteza Behrooz, Julia Buffalini, Fabio Maria Carlucci, Joy Chen, Junming Chen, Zhang Chen, Shiyang Cheng, Praveen Chowdary, Joe Chuang, Antony D'Avirro, Jon Daly, Ning Dong, Mark Duppenthaler, Cynthia Gao, Jeff Girard, Martin Gleize, Sahir Gomez, Hongyu Gong, Srivathsan Govindarajan, Brandon Han, Sen He, Denise Hernandez, et al. (59 additional authors not shown)

    Abstract: Human communication involves a complex interplay of verbal and nonverbal signals, essential for conveying meaning and achieving interpersonal goals. To develop socially intelligent AI technologies, it is crucial to develop models that can both comprehend and generate dyadic behavioral dynamics. To this end, we introduce the Seamless Interaction Dataset, a large-scale collection of over 4,000 hours…

    Submitted 30 June, 2025; v1 submitted 27 June, 2025; originally announced June 2025.

  32. arXiv:2506.20703 [pdf, ps, other]

    cs.GR cs.CV

    Generative Blocks World: Moving Things Around in Pictures

    Authors: Vaibhav Vavilala, Seemandhar Jain, Rahul Vasanth, D. A. Forsyth, Anand Bhattad

    Abstract: We describe Generative Blocks World to interact with the scene of a generated image by manipulating simple geometric abstractions. Our method represents scenes as assemblies of convex 3D primitives, and the same scene can be represented by different numbers of primitives, allowing an editor to move either whole structures or small details. Once the scene geometry has been edited, the image is gene…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 23 pages, 16 figures, 2 tables

  33. Efficient Computation of Closed Substrings

    Authors: Samkith K Jain, Neerja Mhaskar

    Abstract: A closed string $u$ is either of length one or contains a border that occurs only as a prefix and as a suffix in $u$ and nowhere else within $u$. In this paper, we present a fast and practical $O(n\log n)$ time algorithm to compute all $\Theta(n^2)$ closed substrings by introducing a compact representation for all closed substrings of a string $w[1..n]$, using only $O(n \log n)$ space. We also present…

    Submitted 22 September, 2025; v1 submitted 6 June, 2025; originally announced June 2025.

    Comments: Published at SPIRE - London, UK 2025

  34. arXiv:2506.04567 [pdf, ps, other]

    cs.LG cs.CV

    StatsMerging: Statistics-Guided Model Merging via Task-Specific Teacher Distillation

    Authors: Ranjith Merugu, Bryan Bo Cao, Shubham Jain

    Abstract: Model merging has emerged as a promising solution to accommodate multiple large models within constrained memory budgets. We present StatsMerging, a novel lightweight learning-based model merging method guided by weight distribution statistics without requiring ground truth labels or test samples. StatsMerging offers three key advantages: (1) It uniquely leverages singular values from singular val…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 14 pages, 4 figures, 7 tables

    MSC Class: 68T05; 68T07; 68T45

    ACM Class: I.4.0; I.4.9; I.5.1; I.5.4

  35. arXiv:2506.03793 [pdf, other]

    cs.CL

    Mark My Words: A Robust Multilingual Model for Punctuation in Text and Speech Transcripts

    Authors: Sidharth Pulipaka, Sparsh Jain, Ashwin Sankar, Raj Dabre

    Abstract: Punctuation plays a vital role in structuring meaning, yet current models often struggle to restore it accurately in transcripts of spontaneous speech, especially in the presence of disfluencies such as false starts and backtracking. These limitations hinder the performance of downstream tasks like translation, text to speech, summarization, etc. where sentence boundaries are critical for preservi…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Work in Progress

  36. arXiv:2506.03378 [pdf, ps, other]

    eess.AS cs.CV cs.MM

    SNIFR: Boosting Fine-Grained Child Harmful Content Detection Through Audio-Visual Alignment with Cascaded Cross-Transformer

    Authors: Orchid Chetia Phukan, Mohd Mujtaba Akhtar, Girish, Swarup Ranjan Behera, Abu Osama Siddiqui, Sarthak Jain, Priyabrata Mallick, Jaya Sai Kiran Patibandla, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma

    Abstract: As video-sharing platforms have grown over the past decade, child viewership has surged, increasing the need for precise detection of harmful content like violence or explicit scenes. Malicious users exploit moderation systems by embedding unsafe content in minimal frames to evade detection. While prior research has focused on visual cues and advanced such fine-grained detection, audio features re…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Accepted to INTERSPEECH 2025

  37. arXiv:2506.00450 [pdf, ps, other]

    cs.IR cs.LG

    DV365: Extremely Long User History Modeling at Instagram

    Authors: Wenhan Lyu, Devashish Tyagi, Yihang Yang, Ziwei Li, Ajay Somani, Karthikeyan Shanmugasundaram, Nikola Andrejevic, Ferdi Adeputra, Curtis Zeng, Arun K. Singh, Maxime Ransan, Sagar Jain

    Abstract: Long user history is a highly valuable signal for recommendation systems, but effectively incorporating it often comes with high cost in terms of data center power consumption and GPU. In this work, we chose offline embedding over end-to-end sequence length optimization methods to enable extremely long user sequence modeling as a cost-effective solution, and propose a new user embedding learning str…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: SIGKDD 2025 accepted

  38. arXiv:2505.23802 [pdf, ps, other]

    cs.CL cs.AI

    MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks

    Authors: Suhana Bedi, Hejie Cui, Miguel Fuentes, Alyssa Unell, Michael Wornow, Juan M. Banda, Nikesh Kotecha, Timothy Keyes, Yifan Mai, Mert Oez, Hao Qiu, Shrey Jain, Leonardo Schettini, Mehr Kashyap, Jason Alan Fries, Akshay Swaminathan, Philip Chung, Fateme Nateghi, Asad Aali, Ashwin Nayak, Shivam Vedak, Sneha S. Jain, Birju Patel, Oluseyi Fayanju, Shreya Shah, et al. (56 additional authors not shown)

    Abstract: While large language models (LLMs) achieve near-perfect scores on medical licensing exams, these evaluations inadequately reflect the complexity and diversity of real-world clinical practice. We introduce MedHELM, an extensible evaluation framework for assessing LLM performance for medical tasks with three key contributions. First, a clinician-validated taxonomy spanning 5 categories, 22 subcatego…

    Submitted 2 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  39. arXiv:2505.18893 [pdf, ps, other]

    cs.CY cs.AI

    Reality Check: A New Evaluation Ecosystem Is Necessary to Understand AI's Real World Effects

    Authors: Reva Schwartz, Rumman Chowdhury, Akash Kundu, Heather Frase, Marzieh Fadaee, Tom David, Gabriella Waters, Afaf Taik, Morgan Briggs, Patrick Hall, Shomik Jain, Kyra Yee, Spencer Thomas, Sundeep Bhandari, Paul Duncan, Andrew Thompson, Maya Carlyle, Qinghua Lu, Matthew Holmes, Theodora Skeadas

    Abstract: Conventional AI evaluation approaches concentrated within the AI stack exhibit systemic limitations for exploring, navigating and resolving the human and societal factors that play out in real world deployment such as in education, finance, healthcare, and employment sectors. AI capability evaluations can capture detail about first-order effects, such as whether immediate system outputs are accura…

    Submitted 30 May, 2025; v1 submitted 24 May, 2025; originally announced May 2025.

    Comments: 9 pages

  40. arXiv:2505.17238

    cs.CL

    Personalizing Student-Agent Interactions Using Log-Contextualized Retrieval Augmented Generation (RAG)

    Authors: Clayton Cohn, Surya Rayala, Caitlin Snyder, Joyce Fonteles, Shruti Jain, Naveeduddin Mohammed, Umesh Timalsina, Sarah K. Burriss, Ashwin T S, Namrata Srivastava, Menton Deweese, Angela Eeds, Gautam Biswas

    Abstract: Collaborative dialogue offers rich insights into students' learning and critical thinking, which is essential for personalizing pedagogical agent interactions in STEM+C settings. While large language models (LLMs) facilitate dynamic pedagogical interactions, hallucinations undermine confidence, trust, and instructional value. Retrieval-augmented generation (RAG) grounds LLM outputs in curated know…

    Submitted 16 June, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: To appear in the International Conference on Artificial Intelligence in Education (AIED25) Workshop on Epistemics and Decision-Making in AI-Supported Education

  41. arXiv:2505.14978

    cs.SE cs.AI cs.LG

    JARVIS: A Multi-Agent Code Assistant for High-Quality EDA Script Generation

    Authors: Ghasem Pasandi, Kishor Kunal, Varun Tej, Kunjal Shah, Hanfei Sun, Sumit Jain, Chunhui Li, Chenhui Deng, Teodor-Dumitru Ene, Haoxing Ren, Sreedhar Pratty

    Abstract: This paper presents JARVIS, a novel multi-agent framework that leverages Large Language Models (LLMs) and domain expertise to generate high-quality scripts for specialized Electronic Design Automation (EDA) tasks. By combining a domain-specific LLM trained with synthetically generated data, a custom compiler for structural verification, rule enforcement, code fixing capabilities, and advanced retr…

    Submitted 15 August, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

  42. arXiv:2505.06771

    cs.RO cs.LG cs.MA

    JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes

    Authors: Shalin Anand Jain, Jiazhen Liu, Siva Kailas, Harish Ravichandar

    Abstract: Multi-agent reinforcement learning (MARL) has emerged as a promising solution for learning complex and scalable coordination behaviors in multi-robot systems. However, established MARL platforms (e.g., SMAC and MPE) lack robotics relevance and hardware deployment, leaving multi-robot learning researchers to develop bespoke environments and hardware testbeds dedicated to the development and evaluat…

    Submitted 26 May, 2025; v1 submitted 10 May, 2025; originally announced May 2025.

    Comments: 22 pages, 14 figures, 10 tables. https://github.com/GT-STAR-Lab/JaxRobotarium

  43. arXiv:2505.00949

    cs.CL cs.AI cs.LG

    Llama-Nemotron: Efficient Reasoning Models

    Authors: Akhiad Bercovich, Itay Levy, Izik Golan, Mohammad Dabbah, Ran El-Yaniv, Omri Puny, Ido Galil, Zach Moshe, Tomer Ronen, Najeeb Nabwani, Ido Shahaf, Oren Tropp, Ehud Karpas, Ran Zilberstein, Jiaqi Zeng, Soumye Singhal, Alexander Bukharin, Yian Zhang, Tugrul Konuk, Gerald Shen, Ameya Sunil Mahabaleshwarkar, Bilal Kartal, Yoshi Suhara, Olivier Delalleau, Zijia Chen, et al. (111 additional authors not shown)

    Abstract: We introduce the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use. The family comes in three sizes -- Nano (8B), Super (49B), and Ultra (253B) -- and performs competitively with state-of-the-art reasoning models such as DeepSeek-R1 while offering superior i…

    Submitted 9 September, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

  44. arXiv:2504.13180

    cs.CV cs.AI cs.LG

    PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

    Authors: Jang Hyun Cho, Andrea Madotto, Effrosyni Mavroudi, Triantafyllos Afouras, Tushar Nagarajan, Muhammad Maaz, Yale Song, Tengyu Ma, Shuming Hu, Suyog Jain, Miguel Martin, Huiyu Wang, Hanoona Rasheed, Peize Sun, Po-Yao Huang, Daniel Bolya, Nikhila Ravi, Shashank Jain, Tammy Stark, Shane Moon, Babak Damavandi, Vivian Lee, Andrew Westbury, Salman Khan, Philipp Krähenbühl, et al. (4 additional authors not shown)

    Abstract: Vision-language models are integral to computer vision research, yet many high-performing models remain closed-source, obscuring their data, design and training recipe. The research community has responded by using distillation from black-box models to label training data, achieving strong benchmark results, at the cost of measurable scientific progress. However, without knowing the details of the…

    Submitted 23 July, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: Technical Report

  45. CiMBA: Accelerating Genome Sequencing through On-Device Basecalling via Compute-in-Memory

    Authors: William Andrew Simon, Irem Boybat, Riselda Kodra, Elena Ferro, Gagandeep Singh, Mohammed Alser, Shubham Jain, Hsinyu Tsai, Geoffrey W. Burr, Onur Mutlu, Abu Sebastian

    Abstract: As genome sequencing is finding utility in a wide variety of domains beyond the confines of traditional medical settings, its computational pipeline faces two significant challenges. First, the creation of up to 0.5 GB of data per minute imposes substantial communication and storage overheads. Second, the sequencing pipeline is bottlenecked at the basecalling step, consuming >40% of genome analysi…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: Accepted to IEEE Transactions on Parallel and Distributed Systems

    Journal ref: IEEE Transactions on Parallel and Distributed Systems, pp. 1-15, 2025

  46. arXiv:2504.03624

    cs.CL cs.AI cs.LG

    Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

    Authors: NVIDIA, :, Aaron Blakeman, Aarti Basant, Abhinav Khattar, Adithya Renduchintala, Akhiad Bercovich, Aleksander Ficek, Alexis Bjorlin, Ali Taghibakhshi, Amala Sanjay Deshmukh, Ameya Sunil Mahabaleshwarkar, Andrew Tao, Anna Shors, Ashwath Aithal, Ashwin Poojary, Ayush Dattagupta, Balaram Buddharaju, Bobby Chen, Boris Ginsburg, Boxin Wang, Brandon Norick, Brian Butterfield, Bryan Catanzaro, Carlo del Mundo, et al. (176 additional authors not shown)

    Abstract: As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly becoming important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transf…

    Submitted 5 September, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

  47. arXiv:2504.01943

    cs.CL

    OpenCodeReasoning: Advancing Data Distillation for Competitive Coding

    Authors: Wasi Uddin Ahmad, Sean Narenthiran, Somshubra Majumdar, Aleksander Ficek, Siddhartha Jain, Jocelyn Huang, Vahid Noroozi, Boris Ginsburg

    Abstract: Since the advent of reasoning-based large language models, many have found great success from distilling reasoning capabilities into student models. Such techniques have significantly bridged the gap between reasoning and standard LLMs on coding tasks. Despite this, much of the progress on distilling reasoning models remains locked behind proprietary datasets or lacks details on data curation, fil…

    Submitted 7 August, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

    Comments: Published at COLM 2025

  48. arXiv:2503.17431

    math.OC cs.CE

    Adjoint Sensitivities for the Optimization of Nonlinear Structural Dynamics via Spectral Submanifolds

    Authors: Matteo Pozzi, Jacopo Marconi, Shobhit Jain, Mingwu Li, Francesco Braghin

    Abstract: This work presents an optimization framework for tailoring the nonlinear dynamic response of lightly damped mechanical systems using Spectral Submanifold (SSM) reduction. We derive the SSM-based backbone curve and its sensitivity with respect to parameters up to arbitrary polynomial orders, enabling efficient and accurate optimization of the nonlinear frequency-amplitude relation. We use the adjoi…

    Submitted 21 March, 2025; originally announced March 2025.

  49. Allocation Multiplicity: Evaluating the Promises of the Rashomon Set

    Authors: Shomik Jain, Margaret Wang, Kathleen Creel, Ashia Wilson

    Abstract: The Rashomon set of equally-good models promises less discriminatory algorithms, reduced outcome homogenization, and fairer decisions through model ensembles or reconciliation. However, we argue from the perspective of allocation multiplicity that these promises may remain unfulfilled. When there are more qualified candidates than resources available, many different allocations of scarce resources…

    Submitted 1 September, 2025; v1 submitted 20 March, 2025; originally announced March 2025.

    Comments: To appear in the proceedings of the ACM Conference on Fairness, Accountability, and Transparency (FAccT 2025)

    ACM Class: K.4.0

  50. arXiv:2503.12964

    cs.CV cs.AI cs.LG

    Training Video Foundation Models with NVIDIA NeMo

    Authors: Zeeshan Patel, Ethan He, Parth Mannan, Xiaowei Ren, Ryan Wolf, Niket Agarwal, Jacob Huffman, Zhuoyao Wang, Carl Wang, Jack Chang, Yan Bai, Tommy Huang, Linnan Wang, Sahil Jain, Shanmugam Ramasamy, Joseph Jennings, Ekaterina Sirazitdinova, Oleg Sudakov, Mingyuan Ma, Bobby Chen, Forrest Lin, Hao Wang, Vasanth Rao Naik Sabavat, Sriharsha Niverty, Rong Ou, et al. (4 additional authors not shown)

    Abstract: Video Foundation Models (VFMs) have recently been used to simulate the real world to train physical AI systems and develop creative visual experiences. However, there are significant challenges in training large-scale, high quality VFMs that can generate high-quality videos. We present a scalable, open-source VFM training pipeline with NVIDIA NeMo, providing accelerated video dataset curation, mul…

    Submitted 17 March, 2025; originally announced March 2025.