Showing 1–50 of 605 results for author: Prateek

  1. arXiv:2506.15029  [pdf]

    cs.SD cs.CL cs.CV eess.AS

    An accurate and revised version of optical character recognition-based speech synthesis using LabVIEW

    Authors: Prateek Mehta, Anasuya Patil

    Abstract: Knowledge extraction through sound is a distinctive property. Visually impaired individuals often rely solely on Braille books and audio recordings provided by NGOs. Due to limitations in these approaches, blind individuals often cannot access books of their choice. Speech is a more effective mode of communication than text for blind and visually impaired persons, as they can easily respond to sou…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 9 pages, 9 figures

    MSC Class: 14J60 ACM Class: I.2.7; I.4; I.5; I.7.5

  2. arXiv:2506.06644  [pdf, ps, other]

    cs.LG stat.ML

    Spark Transformer: Reactivating Sparsity in FFN and Attention

    Authors: Chong You, Kan Wu, Zhipeng Jia, Lin Chen, Srinadh Bhojanapalli, Jiaxian Guo, Utku Evci, Jan Wassenberg, Praneeth Netrapalli, Jeremiah J. Willcock, Suvinay Subramanian, Felix Chern, Alek Andreev, Shreya Pathak, Felix Yu, Prateek Jain, David E. Culler, Henry M. Levy, Sanjiv Kumar

    Abstract: The discovery of the lazy neuron phenomenon in trained Transformers, where the vast majority of neurons in their feed-forward networks (FFN) are inactive for each token, has spurred tremendous interest in activation sparsity for enhancing large model efficiency. While notable progress has been made in translating such sparsity to wall-time benefits, modern Transformers have moved away from the Re…

    Submitted 6 June, 2025; originally announced June 2025.

  3. arXiv:2506.05127  [pdf, ps, other]

    eess.IV cs.CV q-bio.QM

    PixCell: A generative foundation model for digital histopathology images

    Authors: Srikar Yellapragada, Alexandros Graikos, Zilinghan Li, Kostas Triaridis, Varun Belagali, Saarthak Kapse, Tarak Nath Nandi, Ravi K Madduri, Prateek Prasanna, Tahsin Kurc, Rajarsi R. Gupta, Joel Saltz, Dimitris Samaras

    Abstract: The digitization of histology slides has revolutionized pathology, providing massive datasets for cancer diagnosis and research. Contrastive self-supervised and vision-language models have been shown to effectively mine large pathology datasets to learn discriminative representations. On the other hand, generative models, capable of synthesizing realistic and diverse images, present a compelling s…

    Submitted 5 June, 2025; originally announced June 2025.

  4. arXiv:2506.04165  [pdf, ps, other]

    cs.LG cs.DS

    Faster Approx. Top-K: Harnessing the Full Power of Two Stages

    Authors: Yashas Samaga, Varun Yerram, Spandana Raj Babbula, Prateek Jain, Praneeth Netrapalli

    Abstract: We consider the Top-$K$ selection problem, which aims to identify the largest-$K$ elements from an array. Top-$K$ selection arises in many machine learning algorithms and often becomes a bottleneck on accelerators, which are optimized for dense matrix multiplications. To address this problem, Chern et al. (2022) proposed a fast two-stage approximate Top-$K$ algorithm:…

    Submitted 5 June, 2025; v1 submitted 4 June, 2025; originally announced June 2025.

    Comments: Includes appendix, 29 pages, and 10 figures. A preliminary version of this paper was rejected from MLSys 2025

  5. arXiv:2505.24703  [pdf, ps, other]

    cs.CR cs.CV cs.LG

    PatchDEMUX: A Certifiably Robust Framework for Multi-label Classifiers Against Adversarial Patches

    Authors: Dennis Jacob, Chong Xiang, Prateek Mittal

    Abstract: Deep learning techniques have enabled vast improvements in computer vision technologies. Nevertheless, these models are vulnerable to adversarial patch attacks which catastrophically impair performance. The physically realizable nature of these attacks calls for certifiable defenses, which feature provable guarantees on robustness. While certifiable defenses have been successfully applied to singl…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: CVPR 2025

  6. arXiv:2505.23675  [pdf, ps, other]

    eess.IV cs.CV

    ImmunoDiff: A Diffusion Model for Immunotherapy Response Prediction in Lung Cancer

    Authors: Moinak Bhattacharya, Judy Huang, Amna F. Sher, Gagandeep Singh, Chao Chen, Prateek Prasanna

    Abstract: Accurately predicting immunotherapy response in Non-Small Cell Lung Cancer (NSCLC) remains a critical unmet need. Existing radiomics and deep learning-based predictive models rely primarily on pre-treatment imaging to predict categorical response outcomes, limiting their ability to capture the complex morphological and textural transformations induced by immunotherapy. This study introduces Immuno…

    Submitted 29 May, 2025; originally announced May 2025.

  7. arXiv:2505.23337  [pdf, ps, other]

    cs.LG cs.AI

    Matryoshka Model Learning for Improved Elastic Student Models

    Authors: Chetan Verma, Aditya Srinivas Timmaraju, Cho-Jui Hsieh, Suyash Damle, Ngot Bui, Yang Zhang, Wen Chen, Xin Liu, Prateek Jain, Inderjit S Dhillon

    Abstract: Industry-grade ML models are carefully designed to meet rapidly evolving serving constraints, which requires significant resources for model development. In this paper, we propose MatTA, a framework for training multiple accurate Student models using a novel Teacher-TA-Student recipe. TA models are larger versions of the Student models with higher capacity, and thus allow Student models to better…

    Submitted 2 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

    Comments: 10 pages, 5 figures, Accepted at KDD 2025

  8. arXiv:2505.22894  [pdf, other]

    cs.CC cs.DM

    Monotone Bounded-Depth Complexity of Homomorphism Polynomials

    Authors: C. S. Bhargav, Shiteng Chen, Radu Curticapean, Prateek Dwivedi

    Abstract: For every fixed graph $H$, it is known that homomorphism counts from $H$ and colorful $H$-subgraph counts can be determined in $O(n^{t+1})$ time on $n$-vertex input graphs $G$, where $t$ is the treewidth of $H$. On the other hand, a running time of $n^{o(t / \log t)}$ would refute the exponential-time hypothesis. Komarath, Pandey and Rahul (Algorithmica, 2023) studied algebraic variants of these c…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 22 pages, 1 figure

  9. arXiv:2505.17283  [pdf, ps, other]

    stat.ML cs.LG math.OC stat.AP

    Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine

    Authors: Prateek Jaiswal, Esmaeil Keyvanshokooh, Junyu Cao

    Abstract: Randomized clinical trials often require large patient cohorts before drawing definitive conclusions, yet abundant observational data from parallel studies remains underutilized due to confounding and hidden biases. To bridge this gap, we propose Deconfounded Warm-Start Thompson Sampling (DWTS), a practical approach that leverages a Doubly Debiased LASSO (DDL) procedure to identify a sparse set of…

    Submitted 22 May, 2025; originally announced May 2025.

  10. arXiv:2505.17091  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG cs.SD eess.AS

    Large Language Models Implicitly Learn to See and Hear Just By Reading

    Authors: Prateek Verma, Mert Pilanci

    Abstract: This paper presents a fascinating find: By training an auto-regressive LLM model on text tokens, the text model inherently develops internally an ability to understand images and audio, thereby developing the ability to see and hear just by reading. Popular audio and visual LLM models fine-tune text LLM models to give text output conditioned on images and audio embeddings. On the other hand, our a…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: 6 pages, 3 figures, 4 tables. Under Review WASPAA 2025

  11. arXiv:2505.13250  [pdf, ps, other]

    cs.CV

    Joint Depth and Reflectivity Estimation using Single-Photon LiDAR

    Authors: Hashan K. Weerasooriya, Prateek Chennuri, Weijian Zhang, Istvan Gyongy, Stanley H. Chan

    Abstract: Single-Photon Light Detection and Ranging (SP-LiDAR) is emerging as a leading technology for long-range, high-precision 3D vision tasks. In SP-LiDAR, timestamps encode two complementary pieces of information: pulse travel time (depth) and the number of photons reflected by the object (reflectivity). Existing SP-LiDAR reconstruction methods typically recover depth and reflectivity separately or sequ…

    Submitted 19 May, 2025; originally announced May 2025.

  12. arXiv:2505.08182  [pdf]

    cs.IT

    Semantic De-boosting in e-commerce Query Autocomplete

    Authors: Adithya Rajan, Weiqi Tong, Greg Sharp, Prateek Verma, Kevin Li

    Abstract: In e-commerce search, query autocomplete plays a critical role in helping users in their shopping journey. Oftentimes, query autocomplete presents users with semantically similar queries, which can impede the user's ability to find diverse and relevant results. This paper proposes a novel strategy to enhance this service by refining the presentation of typeahead suggestions based on their semantic si…

    Submitted 12 May, 2025; originally announced May 2025.

  13. arXiv:2505.07351  [pdf, ps, other]

    cs.LG

    From Search To Sampling: Generative Models For Robust Algorithmic Recourse

    Authors: Prateek Garg, Lokesh Nagalapatti, Sunita Sarawagi

    Abstract: Algorithmic Recourse provides recommendations to individuals who are adversely impacted by automated model decisions, on how to alter their profiles to achieve a favorable outcome. Effective recourse methods must balance three conflicting goals: proximity to the original profile to minimize cost, plausibility for realistic recourse, and validity to ensure the desired outcome. We show that existing…

    Submitted 12 May, 2025; originally announced May 2025.

  14. arXiv:2505.04196  [pdf]

    cs.LG cs.MA

    A Large Language Model for Feasible and Diverse Population Synthesis

    Authors: Sung Yoo Lim, Hyunsoo Yun, Prateek Bansal, Dong-Kyu Kim, Eui-Jin Kim

    Abstract: Generating a synthetic population that is both feasible and diverse is crucial for ensuring the validity of downstream activity schedule simulation in activity-based models (ABMs). While deep generative models (DGMs), such as variational autoencoders and generative adversarial networks, have been applied to this task, they often struggle to balance the inclusion of rare but plausible combinations…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 28 pages, 7 figures, 6 tables. Submitted to Transportation Research Part C: Emerging Technologies. Preprint version

  15. arXiv:2505.02433  [pdf, other]

    cs.LG cs.AI

    FairPO: Robust Preference Optimization for Fair Multi-Label Learning

    Authors: Soumen Kumar Mondal, Akshit Varmora, Prateek Chanda, Ganesh Ramakrishnan

    Abstract: We propose FairPO, a novel framework designed to promote fairness in multi-label classification by directly optimizing preference signals with a group robustness perspective. In our framework, the set of labels is partitioned into privileged and non-privileged groups, and a preference-based loss inspired by Direct Preference Optimization (DPO) is employed to more effectively differentiate true pos…

    Submitted 16 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

  16. arXiv:2504.21831  [pdf, other]

    cs.CV cs.AI

    Early Exit and Multi Stage Knowledge Distillation in VLMs for Video Summarization

    Authors: Anas Anwarul Haq Khan, Utkarsh Verma, Prateek Chanda, Ganesh Ramakrishnan

    Abstract: We introduce DEEVISum (Distilled Early Exit Vision language model for Summarization), a lightweight, efficient, and scalable vision language model designed for segment wise video summarization. Leveraging multi modal prompts that combine textual and audio derived signals, DEEVISum incorporates Multi Stage Knowledge Distillation (MSKD) and Early Exit (EE) to strike a balance between performance and…

    Submitted 30 April, 2025; originally announced April 2025.

  17. arXiv:2504.19413  [pdf, other]

    cs.CL cs.AI

    Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

    Authors: Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, Deshraj Yadav

    Abstract: Large Language Models (LLMs) have demonstrated remarkable prowess in generating contextually coherent responses, yet their fixed context windows pose fundamental challenges for maintaining consistency over prolonged multi-session dialogues. We introduce Mem0, a scalable memory-centric architecture that addresses this issue by dynamically extracting, consolidating, and retrieving salient informatio…

    Submitted 27 April, 2025; originally announced April 2025.

  18. arXiv:2504.19327  [pdf, other]

    cs.CV cs.AI

    Platonic Grounding for Efficient Multimodal Language Models

    Authors: Moulik Choraria, Xinbo Wu, Akhil Bhimaraju, Nitesh Sekhar, Yue Wu, Xu Zhang, Prateek Singhal, Lav R. Varshney

    Abstract: The hyperscaling of data and parameter count in Transformer-based models is yielding diminishing performance improvement, especially when weighed against training costs. Such plateauing indicates the importance of methods for more efficient finetuning and inference, while retaining similar performance. This is especially relevant for multimodal learning paradigms, where inference costs of processi…

    Submitted 27 April, 2025; originally announced April 2025.

  19. arXiv:2504.18246  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Efficient Single-Pass Training for Multi-Turn Reasoning

    Authors: Ritesh Goru, Shanay Mehta, Prateek Jain

    Abstract: Training Large Language Models (LLMs) to generate explicit reasoning before they produce an answer has been shown to improve their performance across various tasks such as mathematics and coding. However, fine-tuning LLMs on multi-turn reasoning datasets presents a unique challenge: LLMs must generate reasoning tokens that are excluded from subsequent inputs to the LLM. This discrepancy prevents…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: 9 pages, 3 figures

  20. arXiv:2504.07483  [pdf, other]

    cs.PL cs.SE

    Program Skeletons for Automated Program Translation

    Authors: Bo Wang, Tianyu Li, Ruishi Li, Umang Mathur, Prateek Saxena

    Abstract: Translating software between programming languages is a challenging task, for which automated techniques have been elusive and hard to scale up to larger programs. A key difficulty in cross-language translation is that one has to re-express the intended behavior of the source program into idiomatic constructs of a different target language. This task needs abstracting away from the source language…

    Submitted 22 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

    Comments: Accepted by PLDI 2025 (46th ACM SIGPLAN Conference on Programming Language Design and Implementation)

  21. arXiv:2504.07097  [pdf, other]

    cs.LG cs.AI cs.CL math.PR stat.ML

    Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning

    Authors: Nikhil Shivakumar Nayak, Krishnateja Killamsetty, Ligong Han, Abhishek Bhandwaldar, Prateek Chanda, Kai Xu, Hao Wang, Aldo Pareja, Oleg Silkin, Mustafa Eyceoz, Akash Srivastava

    Abstract: Continual learning in large language models (LLMs) is prone to catastrophic forgetting, where adapting to new tasks significantly degrades performance on previously learned ones. Existing methods typically rely on low-rank, parameter-efficient updates that limit the model's expressivity and introduce additional parameters per task, leading to scalability issues. To address these limitations, we pr…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: 25 pages, 13 figures, 6 tables

    MSC Class: 68T50 ACM Class: I.2.0; G.3

  22. arXiv:2504.04532  [pdf, ps, other]

    eess.IV cs.CV

    BrainMRDiff: A Diffusion Model for Anatomically Consistent Brain MRI Synthesis

    Authors: Moinak Bhattacharya, Saumya Gupta, Annie Singh, Chao Chen, Gagandeep Singh, Prateek Prasanna

    Abstract: Accurate brain tumor diagnosis relies on the assessment of multiple Magnetic Resonance Imaging (MRI) sequences. However, in clinical practice, the acquisition of certain sequences may be affected by factors like motion artifacts or contrast agent contraindications, leading to suboptimal outcomes, such as poor image quality. This can then affect image interpretation by radiologists. Synthesizing hig…

    Submitted 29 May, 2025; v1 submitted 6 April, 2025; originally announced April 2025.

  23. arXiv:2504.01009  [pdf, other]

    cs.CV

    GECKO: Gigapixel Vision-Concept Contrastive Pretraining in Histopathology

    Authors: Saarthak Kapse, Pushpak Pati, Srikar Yellapragada, Srijan Das, Rajarsi R. Gupta, Joel Saltz, Dimitris Samaras, Prateek Prasanna

    Abstract: Pretraining a Multiple Instance Learning (MIL) aggregator enables the derivation of Whole Slide Image (WSI)-level embeddings from patch-level representations without supervision. While recent multimodal MIL pretraining approaches leveraging auxiliary modalities have demonstrated performance gains over unimodal WSI pretraining, the acquisition of these additional modalities necessitates extensive c…

    Submitted 1 April, 2025; originally announced April 2025.

  24. arXiv:2503.24370  [pdf, other]

    cs.LG cs.AI cs.CL

    Effectively Controlling Reasoning Models through Thinking Intervention

    Authors: Tong Wu, Chong Xiang, Jiachen T. Wang, G. Edward Suh, Prateek Mittal

    Abstract: Reasoning-enhanced large language models (LLMs) explicitly generate intermediate reasoning steps prior to generating final answers, helping the model excel in complex problem-solving. In this paper, we demonstrate that this emerging generation framework offers a unique opportunity for more fine-grained control over model behavior. We propose Thinking Intervention, a novel paradigm designed to expl…

    Submitted 21 May, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

  25. arXiv:2503.19447  [pdf, other]

    cs.AR cs.PL

    Anvil: A General-Purpose Timing-Safe Hardware Description Language

    Authors: Jason Zhijingcheng Yu, Aditya Ranjan Jha, Umang Mathur, Trevor E. Carlson, Prateek Saxena

    Abstract: Hardware designs routinely use stateless signals which change with their underlying registers. Unintended behaviours arise when a register is mutated even when its dependent signals are expected to remain stable (unchanged). Such timing hazards are common because, with a few exceptions, existing HDLs lack the abstraction for stable values and delegate this responsibility to hardware designers, who…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: 22 pages, 8 figures

    ACM Class: B.5.2; D.3.1

  26. arXiv:2503.17469  [pdf, other]

    cs.LG

    OmniLearn: A Framework for Distributed Deep Learning over Heterogeneous Clusters

    Authors: Sahil Tyagi, Prateek Sharma

    Abstract: Deep learning systems are optimized for clusters with homogeneous resources. However, heterogeneity is prevalent in computing infrastructure across edge, cloud and HPC. When training neural networks using stochastic gradient descent techniques on heterogeneous resources, performance degrades due to stragglers and stale updates. In this work, we develop an adaptive batch-scaling framework called Om…

    Submitted 21 March, 2025; originally announced March 2025.

  27. arXiv:2503.16833  [pdf, other]

    cs.SD cs.AI cs.CL cs.CY eess.AS

    The Deployment of End-to-End Audio Language Models Should Take into Account the Principle of Least Privilege

    Authors: Luxi He, Xiangyu Qi, Michel Liao, Inyoung Cheong, Prateek Mittal, Danqi Chen, Peter Henderson

    Abstract: We are at a turning point for language models that accept audio input. The latest end-to-end audio language models (Audio LMs) process speech directly instead of relying on a separate transcription step. This shift preserves detailed information, such as intonation or the presence of multiple speakers, that would otherwise be lost in transcription. However, it also introduces new safety risks, inc…

    Submitted 21 March, 2025; originally announced March 2025.

  28. arXiv:2503.16248  [pdf, other]

    cs.CR cs.AI

    Real AI Agents with Fake Memories: Fatal Context Manipulation Attacks on Web3 Agents

    Authors: Atharv Singh Patlan, Peiyao Sheng, S. Ashwin Hebbar, Prateek Mittal, Pramod Viswanath

    Abstract: The integration of AI agents with Web3 ecosystems harnesses their complementary potential for autonomy and openness yet also introduces underexplored security risks, as these agents dynamically interact with financial protocols and immutable smart contracts. This paper investigates the vulnerabilities of AI agents within blockchain-based financial ecosystems when exposed to adversarial threats in…

    Submitted 30 April, 2025; v1 submitted 20 March, 2025; originally announced March 2025.

    Comments: 29 pages, 21 figures

    ACM Class: I.2.7

  29. arXiv:2503.15297  [pdf, other]

    cs.NI

    Probabilistic Delay Forecasting in 5G Using Recurrent and Attention-Based Architectures

    Authors: Samie Mostafavi, Gourav Prateek Sharma, Ahmad Traboulsi, James Gross

    Abstract: With the emergence of new application areas such as cyber-physical systems and human-in-the-loop applications, ensuring a specific level of end-to-end network latency with high reliability (e.g., 99.9%) is becoming increasingly critical. To align wireless links with these reliability requirements, it is essential to analyze and control network latency in terms of its full probability distribution.…

    Submitted 19 March, 2025; originally announced March 2025.

  30. arXiv:2503.11591  [pdf, other]

    eess.IV cs.CV

    Pathology Image Compression with Pre-trained Autoencoders

    Authors: Srikar Yellapragada, Alexandros Graikos, Kostas Triaridis, Zilinghan Li, Tarak Nath Nandi, Ravi K Madduri, Prateek Prasanna, Joel Saltz, Dimitris Samaras

    Abstract: The growing volume of high-resolution Whole Slide Images in digital histopathology poses significant storage, transmission, and computational efficiency challenges. Standard compression methods, such as JPEG, reduce file sizes but often fail to preserve fine-grained phenotypic details critical for downstream tasks. In this work, we repurpose autoencoders (AEs) designed for Latent Diffusion Models…

    Submitted 14 March, 2025; originally announced March 2025.

  31. arXiv:2503.09919  [pdf, ps, other]

    math.CO cs.CG math.MG

    Drums of high width

    Authors: Alex Davies, Prateek Gupta, Sebastien Racaniere, Grzegorz Swirszcz, Adam Zsolt Wagner, Theophane Weber, Geordie Williamson

    Abstract: We provide a family of $5$-dimensional prismatoids whose width grows linearly in the number of vertices. This provides a new infinite family of counter-examples to the Hirsch conjecture whose excess width grows linearly in the number of vertices, and answers a question of Matschke, Santos and Weibel.

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: 31 pages

  32. Design and Implementation of FourCropNet: A CNN-Based System for Efficient Multi-Crop Disease Detection and Management

    Authors: H. P. Khandagale, Sangram Patil, V. S. Gavali, S. V. Chavan, P. P. Halkarnikar, Prateek A. Meshram

    Abstract: Plant disease detection is a critical task in agriculture, directly impacting crop yield, food security, and sustainable farming practices. This study proposes FourCropNet, a novel deep learning model designed to detect diseases in multiple crops, including CottonLeaf, Grape, Soybean, and Corn. The model leverages an advanced architecture comprising residual blocks for efficient feature extraction…

    Submitted 11 March, 2025; originally announced March 2025.

    Journal ref: Journal of Information Systems Engineering and Management 2025, 10(7s) e-ISSN: 2468-4376

  33. arXiv:2503.06808  [pdf, other]

    cs.CR cs.AI cs.LG

    Privacy Auditing of Large Language Models

    Authors: Ashwinee Panda, Xinyu Tang, Milad Nasr, Christopher A. Choquette-Choo, Prateek Mittal

    Abstract: Current techniques for privacy auditing of large language models (LLMs) have limited efficacy -- they rely on basic approaches to generate canaries which leads to weak membership inference attacks that in turn give loose lower bounds on the empirical privacy leakage. We develop canaries that are far more effective than those used in prior work under threat models that cover a range of realistic se…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: ICLR 2025

  34. arXiv:2503.04639  [pdf, other]

    cs.CV cs.LG

    Enhancing SAM with Efficient Prompting and Preference Optimization for Semi-supervised Medical Image Segmentation

    Authors: Aishik Konwer, Zhijian Yang, Erhan Bas, Cao Xiao, Prateek Prasanna, Parminder Bhatia, Taha Kass-Hout

    Abstract: Foundational models such as the Segment Anything Model (SAM) are gaining traction in medical imaging segmentation, supporting multiple downstream tasks. However, such models are supervised in nature, still relying on large annotated datasets or prompts supplied by experts. Conventional techniques such as active learning to alleviate such limitations are limited in scope and still necessitate conti…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: Accepted to CVPR 2025

  35. arXiv:2503.01820  [pdf, other]

    cs.LG cs.AI cs.CL

    RSQ: Learning from Important Tokens Leads to Better Quantized LLMs

    Authors: Yi-Lin Sung, Prateek Yadav, Jialu Li, Jaehong Yoon, Mohit Bansal

    Abstract: Layer-wise quantization is a key technique for efficiently compressing large models without expensive retraining. Previous methods typically quantize the weights of each layer by "uniformly" optimizing the layer reconstruction loss across all output tokens. However, in this paper, we demonstrate that better-quantized models can be obtained by prioritizing learning from important tokens (e.g. which…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Our code is available at https://github.com/ylsung/rsq

  36. arXiv:2502.17422  [pdf, other]

    cs.CV cs.AI cs.CL

    MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs

    Authors: Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, Filip Ilievski

    Abstract: Multimodal Large Language Models (MLLMs) have experienced rapid progress in visual recognition tasks in recent years. Given their potential integration into many critical applications, it is important to understand the limitations of their visual perception. In this work, we study whether MLLMs can perceive small visual details as effectively as large ones when answering questions about images. We…

    Submitted 24 February, 2025; originally announced February 2025.

    Comments: Published as a conference paper at ICLR 2025. Code at: https://github.com/saccharomycetes/mllms_know

  37. arXiv:2502.13450  [pdf, other]

    cs.LG cs.AI

    Interleaved Gibbs Diffusion for Constrained Generation

    Authors: Gautham Govind Anil, Sachin Yadav, Dheeraj Nagaraj, Karthikeyan Shanmugam, Prateek Jain

    Abstract: We introduce Interleaved Gibbs Diffusion (IGD), a novel generative modeling framework for mixed continuous-discrete data, focusing on constrained generation problems. Prior works on discrete and continuous-discrete diffusion models assume factorized denoising distribution for fast generation, which can hinder the modeling of strong dependencies between random variables encountered in constrained g…

    Submitted 19 February, 2025; originally announced February 2025.

  38. arXiv:2502.11028  [pdf, ps, other]

    cs.CL cs.AI

    Mind the Confidence Gap: Overconfidence, Calibration, and Distractor Effects in Large Language Models

    Authors: Prateek Chhikara

    Abstract: Large Language Models (LLMs) show remarkable proficiency in natural language tasks, yet their frequent overconfidence (misalignment between predicted confidence and true correctness) poses significant risks in critical decision-making applications. We present a comprehensive analysis on calibration in LLMs across nine LLMs and three factual Question-Answering (QA) datasets, systematically comparing…

    Submitted 5 June, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

  39. arXiv:2502.06786  [pdf, other]

    cs.LG cs.AI

    Matryoshka Quantization

    Authors: Pranav Nair, Puranjay Datta, Jeff Dean, Prateek Jain, Aditya Kusupati

    Abstract: Quantizing model weights is critical for reducing the communication and inference costs of large models. However, quantizing models -- especially to low precisions like int4 or int2 -- requires a trade-off in model quality; int2, in particular, is known to severely degrade model quality. Consequently, practitioners are often forced to maintain multiple models with different quantization levels or…

    Submitted 3 March, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

  40. arXiv:2502.04248  [pdf, other]

    cs.LG

    Adapting to Evolving Adversaries with Regularized Continual Robust Training

    Authors: Sihui Dai, Christian Cianfarani, Arjun Bhagoji, Vikash Sehwag, Prateek Mittal

    Abstract: Robust training methods typically defend against specific attack types, such as Lp attacks with fixed budgets, and rarely account for the fact that defenders may encounter new attacks over time. A natural solution is to adapt the defended model to new adversaries as they arise via fine-tuning, a method which we call continual robust training (CRT). However, when implemented naively, fine-tuning on…

    Submitted 6 February, 2025; originally announced February 2025.

  41. arXiv:2502.00706  [pdf, other]

    cs.CR cs.CL cs.LG

    Model Provenance Testing for Large Language Models

    Authors: Ivica Nikolic, Teodora Baluta, Prateek Saxena

    Abstract: Large language models are increasingly customized through fine-tuning and other adaptations, creating challenges in enforcing licensing terms and managing downstream impacts. Tracking model origins is crucial both for protecting intellectual property and for identifying derived models when biases or vulnerabilities are discovered in foundation models. We address this challenge by developing a fram…

    Submitted 2 February, 2025; originally announced February 2025.

  42. arXiv:2502.00382  [pdf, other]

    cs.CV cs.AI cs.LG

    Masked Generative Nested Transformers with Decode Time Scaling

    Authors: Sahil Goyal, Debapriya Tula, Gagan Jain, Pradeep Shenoy, Prateek Jain, Sujoy Paul

    Abstract: Recent advances in visual generation have made significant strides in producing content of exceptional quality. However, most methods suffer from a fundamental problem - a bottleneck of inference computational efficiency. Most of these algorithms involve multiple passes over a transformer model to generate tokens or denoise inputs. However, the model size is kept consistent throughout all iteratio…

    Submitted 1 February, 2025; originally announced February 2025.

  43. arXiv:2501.09826  [pdf, other]

    cs.CV

    PIXELS: Progressive Image Xemplar-based Editing with Latent Surgery

    Authors: Shristi Das Biswas, Matthew Shreve, Xuelu Li, Prateek Singhal, Kaushik Roy

    Abstract: Recent advancements in language-guided diffusion models for image editing are often bottle-necked by cumbersome prompt engineering to precisely articulate desired changes. An intuitive alternative calls on guidance from in-the-wild image exemplars to help users bring their imagined edits to life. Contemporary exemplar-based editing methods shy away from leveraging the rich latent space learnt by p…

    Submitted 16 January, 2025; originally announced January 2025.

  44. arXiv:2501.09672  [pdf, other]

    cs.CV cs.AI

    Robin: a Suite of Multi-Scale Vision-Language Models and the CHIRP Evaluation Benchmark

    Authors: Alexis Roger, Prateek Humane, Daniel Z. Kaplan, Kshitij Gupta, Qi Sun, George Adamopoulos, Jonathan Siu Chi Lim, Quentin Anthony, Edwin Fennell, Irina Rish

    Abstract: The proliferation of Vision-Language Models (VLMs) in the past several years calls for rigorous and comprehensive evaluation methods and benchmarks. This work analyzes existing VLM evaluation techniques, including automated metrics, AI-based assessments, and human evaluations across diverse tasks. We first introduce Robin - a novel suite of VLMs that we built by combining Large Language Models (LL…

    Submitted 20 January, 2025; v1 submitted 16 January, 2025; originally announced January 2025.

  45. arXiv:2412.16926  [pdf, other]

    cs.CL cs.AI cs.LG

    Revisiting In-Context Learning with Long Context Language Models

    Authors: Jinheon Baek, Sun Jae Lee, Prakhar Gupta, Geunseob Oh, Siddharth Dalmia, Prateek Kolhar

    Abstract: In-Context Learning (ICL) is a technique by which language models make predictions based on examples provided in their input context. Previously, their context window size imposed a limit on the number of examples that can be shown, making example selection techniques crucial for identifying the maximally effective set of examples. However, the recent advent of Long Context Language Models (LCLMs)…

    Submitted 28 May, 2025; v1 submitted 22 December, 2024; originally announced December 2024.

    Comments: ACL Findings 2025

  46. arXiv:2412.16429  [pdf, other]

    cs.CY cs.AI cs.LG

    LearnLM: Improving Gemini for Learning

    Authors: LearnLM Team, Abhinit Modi, Aditya Srikanth Veerubhotla, Aliya Rysbek, Andrea Huber, Brett Wiltshire, Brian Veprek, Daniel Gillick, Daniel Kasenberg, Derek Ahmed, Irina Jurenka, James Cohan, Jennifer She, Julia Wilkowski, Kaiz Alarakyia, Kevin R. McKee, Lisa Wang, Markus Kunesch, Mike Schaekermann, Miruna Pîslar, Nikhil Joshi, Parsa Mahmoudieh, Paul Jhun, Sara Wiltberger, Shakir Mohamed, et al. (21 additional authors not shown)

    Abstract: Today's generative AI systems are tuned to present information by default rather than engage users in service of learning as a human tutor would. To address the wide range of potential education use cases for these systems, we reframe the challenge of injecting pedagogical behavior as one of pedagogical instruction following, where training and evaluation examples include system-level ins…

    Submitted 25 December, 2024; v1 submitted 20 December, 2024; originally announced December 2024.

  47. arXiv:2412.14327  [pdf, other]

    cs.CV

    Personalized Generative Low-light Image Denoising and Enhancement

    Authors: Xijun Wang, Prateek Chennuri, Yu Yuan, Bole Ma, Xingguang Zhang, Stanley Chan

    Abstract: While smartphone cameras today can produce astonishingly good photos, their performance in low light is still not completely satisfactory because of the fundamental limits in photon shot noise and sensor read noise. Generative image restoration methods have demonstrated promising results compared to traditional methods, but they suffer from hallucinatory content generation when the signal-to-noise…

    Submitted 10 March, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

  48. arXiv:2412.11449  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Whisper-GPT: A Hybrid Representation Audio Large Language Model

    Authors: Prateek Verma

    Abstract: We propose WHISPER-GPT: A generative large language model (LLM) for speech and music that allows us to work with continuous audio representations and discrete tokens simultaneously as part of a single architecture. There has been a huge surge in generative audio, speech, and music models that utilize discrete audio tokens derived from neural compression algorithms, e.g. ENCODEC. However, one of th…

    Submitted 16 December, 2024; originally announced December 2024.

    Comments: 6 pages, 3 figures. 50th International Conference on Acoustics, Speech and Signal Processing, Hyderabad, India

  49. arXiv:2412.09988  [pdf]

    cs.CY cs.AI

    AI and the Future of Digital Public Squares

    Authors: Beth Goldberg, Diana Acosta-Navas, Michiel Bakker, Ian Beacock, Matt Botvinick, Prateek Buch, Renée DiResta, Nandika Donthi, Nathanael Fast, Ravi Iyer, Zaria Jalan, Andrew Konya, Grace Kwak Danciu, Hélène Landemore, Alice Marwick, Carl Miller, Aviv Ovadya, Emily Saltz, Lisa Schirch, Dalit Shalom, Divya Siddarth, Felix Sieker, Christopher Small, Jonathan Stray, Audrey Tang, et al. (2 additional authors not shown)

    Abstract: Two substantial technological advances have reshaped the public square in recent decades: first with the advent of the internet and second with the recent introduction of large language models (LLMs). LLMs offer opportunities for a paradigm shift towards more decentralized, participatory online spaces that can be used to facilitate deliberative dialogues at scale, but also create risks of exacerba…

    Submitted 13 December, 2024; originally announced December 2024.

    Comments: 40 pages, 5 figures

  50. arXiv:2412.09538  [pdf, other]

    cs.LG stat.ML

    Capturing the Temporal Dependence of Training Data Influence

    Authors: Jiachen T. Wang, Dawn Song, James Zou, Prateek Mittal, Ruoxi Jia

    Abstract: Traditional data influence estimation methods, like influence function, assume that learning algorithms are permutation-invariant with respect to training data. However, modern training paradigms, especially for foundation models using stochastic algorithms and multi-stage curricula, are sensitive to data ordering, thus violating this assumption. This mismatch renders influence functions inadequat…

    Submitted 12 December, 2024; originally announced December 2024.

    Comments: Correspondence to Jiachen T. Wang and Ruoxi Jia