Showing 1–50 of 579 results for author: Anurag

  1. Echo-DND: A dual noise diffusion model for robust and precise left ventricle segmentation in echocardiography

    Authors: Abdur Rahman, Keerthiveena Balraj, Manojkumar Ramteke, Anurag Singh Rathore

    Abstract: Recent advancements in diffusion probabilistic models (DPMs) have revolutionized image processing, demonstrating significant potential in medical applications. Accurate segmentation of the left ventricle (LV) in echocardiograms is crucial for diagnostic procedures and necessary treatments. However, ultrasound images are notoriously noisy with low contrast and ambiguous LV boundaries, thereby compl… ▽ More

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: Version of record published in Discover Applied Sciences (Springer Nature). The definitive article is available at https://doi.org/10.1007/s42452-025-07055-5

    Journal ref: Discov Appl Sci 7, 514 (2025)

  2. arXiv:2506.03373  [pdf, ps, other]

    cs.CV cs.AI

    A Foundation Model for Spatial Proteomics

    Authors: Muhammad Shaban, Yuzhou Chang, Huaying Qiu, Yao Yu Yeo, Andrew H. Song, Guillaume Jaume, Yuchen Wang, Luca L. Weishaupt, Tong Ding, Anurag Vaidya, Abdallah Lamane, Daniel Shao, Mohammed Zidane, Yunhao Bai, Paige McCallum, Shuli Luo, Wenrui Wu, Yang Wang, Precious Cramer, Chi Ngai Chan, Pierre Stephan, Johanna Schaffenrath, Jia Le Lee, Hendrik A. Michel, Caiwei Tian, et al. (35 additional authors not shown)

    Abstract: Foundation models have begun to transform image analysis by acting as pretrained generalist backbones that can be adapted to many tasks even when post-training data are limited, yet their impact on spatial proteomics, imaging that maps proteins at single-cell resolution, remains limited. Here, we introduce KRONOS, a foundation model built for spatial proteomics. KRONOS was trained in a self-superv… ▽ More

    Submitted 3 June, 2025; originally announced June 2025.

  3. arXiv:2506.03189  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Continual Learning in Vision-Language Models via Aligned Model Merging

    Authors: Ghada Sokar, Gintare Karolina Dziugaite, Anurag Arnab, Ahmet Iscen, Pablo Samuel Castro, Cordelia Schmid

    Abstract: Continual learning is conventionally tackled through sequential fine-tuning, a process that, while enabling adaptation, inherently favors plasticity over the stability needed to retain prior knowledge. While existing approaches attempt to mitigate catastrophic forgetting, a bias towards recent tasks persists as they build upon this sequential nature. In this work we present a new perspective based… ▽ More

    Submitted 30 May, 2025; originally announced June 2025.

  4. arXiv:2506.02527  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    Multilingual Information Retrieval with a Monolingual Knowledge Base

    Authors: Yingying Zhuang, Aman Gupta, Anurag Beniwal

    Abstract: Multilingual information retrieval has emerged as a powerful tool for expanding knowledge sharing across languages. On the other hand, high-quality knowledge bases are often scarce and available in only a few languages; therefore, an effective embedding model that transforms sentences from different languages into the same feature vector space as the knowledge base language becomes the key ingredient for…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 6 pages, accepted at GENNEXT@SIGIR25

  5. arXiv:2506.01611  [pdf, ps, other]

    eess.AS cs.SD eess.SP

    Lessons Learned from the URGENT 2024 Speech Enhancement Challenge

    Authors: Wangyou Zhang, Kohei Saijo, Samuele Cornell, Robin Scheibler, Chenda Li, Zhaoheng Ni, Anurag Kumar, Marvin Sach, Wei Wang, Yihui Fu, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian

    Abstract: The URGENT 2024 Challenge aims to foster speech enhancement (SE) techniques with great universality, robustness, and generalizability, featuring a broader task definition, large-scale multi-domain data, and comprehensive evaluation metrics. Nourished by the challenge outcomes, this paper presents an in-depth analysis of two key, yet understudied, issues in SE system development: data cleaning and… ▽ More

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: 5 pages, 4 figures, 1 table. Accepted by Interspeech 2025. Code available at https://github.com/urgent-challenge/urgent2024_analysis

  6. arXiv:2506.00943  [pdf, ps, other]

    cs.SE cs.AI

    Legal Compliance Evaluation of Smart Contracts Generated By Large Language Models

    Authors: Chanuka Wijayakoon, Hai Dong, H. M. N. Dilum Bandara, Zahir Tari, Anurag Soin

    Abstract: Smart contracts can implement and automate parts of legal contracts, but ensuring their legal compliance remains challenging. Existing approaches such as formal specification, verification, and model-based development require expertise in both legal and software development domains, as well as extensive manual effort. Given the recent advances of Large Language Models (LLMs) in code generation, we… ▽ More

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: Accepted for publication at IEEE International Conference on Blockchain and Cryptocurrency (ICBC) 2025

  7. arXiv:2506.00210  [pdf, ps, other]

    cs.CL cs.AI

    REIC: RAG-Enhanced Intent Classification at Scale

    Authors: Ziji Zhang, Michael Yang, Zhiyu Chen, Yingying Zhuang, Shu-Ting Pi, Qun Liu, Rajashekar Maragoud, Vy Nguyen, Anurag Beniwal

    Abstract: Accurate intent classification is critical for efficient routing in customer service, ensuring customers are connected with the most suitable agents while reducing handling times and operational costs. However, as companies expand their product lines, intent classification faces scalability challenges due to the increasing number of intents and variations in taxonomy across different verticals. In… ▽ More

    Submitted 30 May, 2025; originally announced June 2025.

  8. arXiv:2505.22342  [pdf, ps, other]

    cs.CV cs.LG

    Progressive Data Dropout: An Embarrassingly Simple Approach to Faster Training

    Authors: Shriram M S, Xinyue Hao, Shihao Hou, Yang Lu, Laura Sevilla-Lara, Anurag Arnab, Shreyank N Gowda

    Abstract: The success of the machine learning field has reliably depended on training on large datasets. While effective, this trend comes at an extraordinary cost. This is due to two deeply intertwined factors: the size of models and the size of datasets. While promising research efforts focus on reducing the size of models, the other half of the equation remains fairly mysterious. Indeed, it is surprising… ▽ More

    Submitted 6 June, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

  9. arXiv:2505.20451  [pdf, ps, other]

    cs.CL

    Amulet: Putting Complex Multi-Turn Conversations on the Stand with LLM Juries

    Authors: Sahana Ramnath, Anurag Mudgil, Brihi Joshi, Skyler Hallinan, Xiang Ren

    Abstract: Today, large language models are widely used as judges to evaluate responses from other language models. Hence, it is imperative to benchmark and improve these LLM-judges on real-world language model usage: a typical human-assistant conversation is lengthy, and shows significant diversity in topics, intents, and requirements across turns, e.g. social interactions, task requests, feedback. We prese… ▽ More

    Submitted 26 May, 2025; originally announced May 2025.

  10. arXiv:2505.17073  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Mechanistic Interpretability of GPT-like Models on Summarization Tasks

    Authors: Anurag Mishra

    Abstract: Mechanistic interpretability research seeks to reveal the inner workings of large language models, yet most work focuses on classification or generative tasks rather than summarization. This paper presents an interpretability framework for analyzing how GPT-like models adapt to summarization tasks. We conduct differential analysis between pre-trained and fine-tuned models, quantifying changes in a… ▽ More

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: 8 pages (6 content + 2 references/appendix), 6 figures, 2 tables; under review for the ACL 2025 Student Research Workshop

  11. arXiv:2505.16086  [pdf, other]

    cs.AI cs.CL

    Optimizing LLM-Based Multi-Agent System with Textual Feedback: A Case Study on Software Development

    Authors: Ming Shen, Raphael Shu, Anurag Pratik, James Gung, Yubin Ge, Monica Sunkara, Yi Zhang

    Abstract: We have seen remarkable progress in multi-agent systems empowered by large language models (LLMs) solving complex tasks that necessitate cooperation among experts with diverse skills. However, optimizing LLM-based multi-agent systems remains challenging. In this work, we perform an empirical case study on group optimization of role-based multi-agent systems utilizing natural language feedback for challe…

    Submitted 21 May, 2025; originally announced May 2025.

  12. arXiv:2505.13535  [pdf, ps, other]

    cs.IR cs.AI

    Information Extraction from Visually Rich Documents using LLM-based Organization of Documents into Independent Textual Segments

    Authors: Aniket Bhattacharyya, Anurag Tripathi, Ujjal Das, Archan Karmakar, Amit Pathak, Maneesh Gupta

    Abstract: Information extraction (IE) from Visually Rich Documents (VRDs) containing layout features along with text is a critical and well-studied task. Specialized non-LLM NLP-based solutions typically involve training models using both textual and geometric information to label sequences/tokens as named entities or answers to specific questions. However, these approaches lack reasoning, are not able to i… ▽ More

    Submitted 18 May, 2025; originally announced May 2025.

    Comments: Accepted to ACL Main 2025

  13. arXiv:2505.12154  [pdf, ps, other]

    cs.CV cs.SD eess.AS

    Learning to Highlight Audio by Watching Movies

    Authors: Chao Huang, Ruohan Gao, J. M. F. Tsang, Jan Kurcius, Cagdas Bilen, Chenliang Xu, Anurag Kumar, Sanjeel Parekh

    Abstract: Recent years have seen a significant increase in video content creation and consumption. Crafting engaging content requires the careful curation of both visual and audio elements. While visual cue curation, through techniques like optimal viewpoint selection or post-editing, has been central to media production, its natural counterpart, audio, has not undergone equivalent advancements. This often… ▽ More

    Submitted 17 May, 2025; originally announced May 2025.

    Comments: CVPR 2025. Project page: https://wikichao.github.io/VisAH/

  14. arXiv:2505.11423  [pdf, other]

    cs.CL

    When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs

    Authors: Xiaomin Li, Zhou Yu, Zhiwei Zhang, Xupeng Chen, Ziji Zhang, Yingying Zhuang, Narayanan Sadagopan, Anurag Beniwal

    Abstract: Reasoning-enhanced large language models (RLLMs), whether explicitly trained for reasoning or prompted via chain-of-thought (CoT), have achieved state-of-the-art performance on many complex reasoning tasks. However, we uncover a surprising and previously overlooked phenomenon: explicit CoT reasoning can significantly degrade instruction-following accuracy. Evaluating 15 models on two benchmarks: I… ▽ More

    Submitted 20 May, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

  15. arXiv:2505.07706  [pdf, other]

    math.CO cs.IT

    The generalized trifference problem

    Authors: Anurag Bishnoi, Bartłomiej Kielak, Benedek Kovács, Zoltán Lóránt Nagy, Gábor Somlai, Máté Vizer, Zeyu Zheng

    Abstract: We study the problem of finding the largest number $T(n, m)$ of ternary vectors of length $n$ such that for any three distinct vectors there are at least $m$ coordinates where they pairwise differ. For $m = 1$, this is the classical trifference problem which is wide open. We prove upper and lower bounds on $T(n, m)$ for various ranges of the parameter $m$ and determine the phase transition thr… ▽ More

    Submitted 12 May, 2025; originally announced May 2025.

  16. arXiv:2504.17289  [pdf, other]

    cs.CG

    Separating Two Points with Obstacles in the Plane: Improved Upper and Lower Bounds

    Authors: Jack Spalding-Jamieson, Anurag Murty Naredla

    Abstract: Given two points in the plane, and a set of "obstacles" given as curves through the plane with assigned weights, we consider the point-separation problem, which asks for the minimum-weight subset of the obstacles separating the two points. A few computational models for this problem have been previously studied. We give a unified approach to this problem in all models via a reduction to a particul… ▽ More

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: 32 pages, 16 figures

  17. arXiv:2504.13157  [pdf, other]

    cs.CV

    AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis

    Authors: Khiem Vuong, Anurag Ghosh, Deva Ramanan, Srinivasa Narasimhan, Shubham Tulsiani

    Abstract: We explore the task of geometric reconstruction of images captured from a mixture of ground and aerial views. Current state-of-the-art learning-based approaches fail to handle the extreme viewpoint variation between aerial-ground image pairs. Our hypothesis is that the lack of high-quality, co-registered aerial-ground datasets for training is a key reason for this failure. Such data is difficult t… ▽ More

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Appearing in CVPR 2025. Project page: https://aerial-megadepth.github.io

  18. arXiv:2504.12755  [pdf, other]

    cs.RO cs.AI

    Trajectory Adaptation using Large Language Models

    Authors: Anurag Maurya, Tashmoy Ghosh, Ravi Prakash

    Abstract: Adapting robot trajectories based on human instructions as per new situations is essential for achieving more intuitive and scalable human-robot interactions. This work proposes a flexible language-based framework to adapt generic robotic trajectories produced by off-the-shelf motion planners like RRT, A-star, etc, or learned from human demonstrations. We utilize pre-trained LLMs to adapt trajecto… ▽ More

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Accepted to CoRL LangRob workshop 2024

  19. arXiv:2504.10746  [pdf, ps, other]

    cs.CV cs.AI cs.LG cs.SD eess.AS

    Hearing Anywhere in Any Environment

    Authors: Xiulong Liu, Anurag Kumar, Paul Calamia, Sebastia V. Amengual, Calvin Murdock, Ishwarya Ananthabhotla, Philip Robinson, Eli Shlizerman, Vamsi Krishna Ithapu, Ruohan Gao

    Abstract: In mixed reality applications, a realistic acoustic experience in spatial environments is as crucial as the visual experience for achieving true immersion. Despite recent advances in neural approaches for Room Impulse Response (RIR) estimation, most existing methods are limited to the single environment on which they are trained, lacking the ability to generalize to new rooms with different geomet… ▽ More

    Submitted 4 June, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

    Comments: CVPR 2025; Project Page: https://dragonliu1995.github.io/hearinganywhereinanyenvironment/

  20. arXiv:2504.07111  [pdf, other]

    cs.CE cond-mat.mtrl-sci

    High-Performance Gradient Evaluation for Complex Soft Materials Using MPI-based DFS Algorithm

    Authors: Anurag Bhattacharyya

    Abstract: This article presents a depth-first search (DFS)-based algorithm for evaluating sensitivity gradients in the topology optimization of soft materials exhibiting complex deformation behavior. The algorithm is formulated using a time-dependent adjoint sensitivity approach and is implemented within a PETSc-based C++ MPI framework for efficient parallel computing. It has been found that on a single pro… ▽ More

    Submitted 18 March, 2025; originally announced April 2025.

  21. arXiv:2504.02920  [pdf, other]

    cs.CV cs.LG

    LiDAR-based Object Detection with Real-time Voice Specifications

    Authors: Anurag Kulkarni

    Abstract: This paper presents a LiDAR-based object detection system with real-time voice specifications, integrating KITTI's 3D point clouds and RGB images through a multi-modal PointNet framework. It achieves 87.0% validation accuracy on a 3000-sample subset, surpassing a 200-sample baseline of 67.5% by combining spatial and visual data, addressing class imbalance with weighted loss, and refining training… ▽ More

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: 10 pages, 4 figures, submitted as part of MSc research

  22. arXiv:2503.22069  [pdf, other]

    cs.CV cs.AI

    Contrasting Low and High-Resolution Features for HER2 Scoring using Deep Learning

    Authors: Ekansh Chauhan, Anila Sharma, Amit Sharma, Vikas Nishadham, Asha Ghughtyal, Ankur Kumar, Gurudutt Gupta, Anurag Mehta, C. V. Jawahar, P. K. Vinod

    Abstract: Breast cancer, the most common malignancy among women, requires precise detection and classification for effective treatment. Immunohistochemistry (IHC) biomarkers like HER2, ER, and PR are critical for identifying breast cancer subtypes. However, traditional IHC classification relies on pathologists' expertise, making it labor-intensive and subject to significant inter-observer variability. To ad… ▽ More

    Submitted 27 March, 2025; originally announced March 2025.

  23. Data to Decisions: A Computational Framework to Identify skill requirements from Advertorial Data

    Authors: Aakash Singh, Anurag Kanaujia, Vivek Kumar Singh

    Abstract: Among the factors of production, human capital or skilled manpower is the one that keeps evolving and adapts to changing conditions and resources. This adaptability makes human capital the most crucial factor in ensuring a sustainable growth of industry/sector. As new technologies are developed and adopted, the new generations are required to acquire skills in newer technologies in order to be emp… ▽ More

    Submitted 21 March, 2025; originally announced March 2025.

  24. arXiv:2503.16395  [pdf, other]

    cs.LG

    Truthful Elicitation of Imprecise Forecasts

    Authors: Anurag Singh, Siu Lun Chau, Krikamol Muandet

    Abstract: The quality of probabilistic forecasts is crucial for decision-making under uncertainty. While proper scoring rules incentivize truthful reporting of precise forecasts, they fall short when forecasters face epistemic uncertainty about their beliefs, limiting their use in safety-critical domains where decision-makers (DMs) prioritize proper uncertainty management. To address this, we propose a fram… ▽ More

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: 32 pages, 3 figures

  25. arXiv:2503.08734  [pdf, ps, other]

    cs.CR cs.AI

    Zero-to-One IDV: A Conceptual Model for AI-Powered Identity Verification

    Authors: Aniket Vaidya, Anurag Awasthi

    Abstract: In today's increasingly digital interactions, robust Identity Verification (IDV) is crucial for security and trust. Artificial Intelligence (AI) is transforming IDV, enhancing accuracy and fraud detection. This paper introduces "Zero to One," a holistic conceptual framework for developing AI-powered IDV products. This paper outlines the foundational problem and research objectives that necessita…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: 7 pages

  26. arXiv:2503.06387  [pdf, other]

    cs.SE cs.CR

    R+R: Security Vulnerability Dataset Quality Is Critical

    Authors: Anurag Swarnim Yadav, Joseph N. Wilson

    Abstract: Large Language Models (LLMs) are of great interest in vulnerability detection and repair. The effectiveness of these models hinges on the quality of the datasets used for both training and evaluation. Our investigation reveals that a number of studies featured in prominent software engineering conferences have employed datasets that are plagued by high duplication rates, questionable label accurac… ▽ More

    Submitted 8 March, 2025; originally announced March 2025.

    Comments: 15 pages, 1 figure, 35 tables. To be published in Proceedings of the 2024 Annual Computer Security Applications Conference (ACSAC)

  27. arXiv:2503.04666  [pdf, other]

    cs.CV

    What Are You Doing? A Closer Look at Controllable Human Video Generation

    Authors: Emanuele Bugliarello, Anurag Arnab, Roni Paiss, Pieter-Jan Kindermans, Cordelia Schmid

    Abstract: High-quality benchmarks are crucial for driving progress in machine learning research. However, despite the growing interest in video generation, there is no comprehensive dataset to evaluate human generation. Humans can perform a wide variety of actions and interactions, but existing datasets, like TikTok and TED-Talks, lack the diversity and complexity to fully capture the capabilities of video… ▽ More

    Submitted 6 March, 2025; originally announced March 2025.

  28. arXiv:2502.21239  [pdf, other]

    cs.CL

    Semantic Volume: Quantifying and Detecting both External and Internal Uncertainty in LLMs

    Authors: Xiaomin Li, Zhou Yu, Ziji Zhang, Yingying Zhuang, Swair Shah, Narayanan Sadagopan, Anurag Beniwal

    Abstract: Large language models (LLMs) have demonstrated remarkable performance across diverse tasks by encoding vast amounts of factual knowledge. However, they are still prone to hallucinations, generating incorrect or misleading information, often accompanied by high uncertainty. Existing methods for hallucination detection primarily focus on quantifying internal uncertainty, which arises from missing or… ▽ More

    Submitted 5 May, 2025; v1 submitted 28 February, 2025; originally announced February 2025.

  29. arXiv:2502.14998  [pdf, other]

    cs.LG

    Generative Modeling of Individual Behavior at Scale

    Authors: Nabil Omi, Lucas Caccia, Anurag Sarkar, Jordan T. Ash, Siddhartha Sen

    Abstract: There has been a growing interest in using AI to model human behavior, particularly in domains where humans interact with this technology. While most existing work models human behavior at an aggregate level, our goal is to model behavior at the individual level. Recent approaches to behavioral stylometry -- or the task of identifying a person from their actions alone -- have shown promise in doma… ▽ More

    Submitted 20 February, 2025; originally announced February 2025.

  30. arXiv:2502.12094  [pdf, other]

    cs.AI cs.CL

    A Study on Leveraging Search and Self-Feedback for Agent Reasoning

    Authors: Karthikeyan K, Michelle Yuan, Elman Mansimov, Katerina Margatina, Anurag Pratik, Daniele Bonadiman, Monica Sunkara, Yi Zhang, Yassine Benajiba

    Abstract: Recent works have demonstrated that incorporating search during inference can significantly improve reasoning capabilities of language agents. Some approaches may make use of the ground truth or rely on model's own generated feedback. The search algorithm uses this feedback to then produce values that will update its criterion for exploring and exploiting various reasoning paths. In this study, we… ▽ More

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: Under review

  31. arXiv:2502.07166  [pdf, other]

    cs.MA cs.GT cs.LG stat.ML

    Bayesian Optimization for Building Social-Influence-Free Consensus

    Authors: Masaki Adachi, Siu Lun Chau, Wenjie Xu, Anurag Singh, Michael A. Osborne, Krikamol Muandet

    Abstract: We introduce Social Bayesian Optimization (SBO), a vote-efficient algorithm for consensus-building in collective decision-making. In contrast to single-agent scenarios, collective decision-making encompasses group dynamics that may distort agents' preference feedback, thereby impeding their capacity to achieve a social-influence-free consensus -- the most preferable decision based on the aggregate… ▽ More

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: 50 pages, 8 figures

    MSC Class: 62C10; 62F15

  32. arXiv:2502.07001  [pdf, other]

    cs.CV cs.AI cs.LG

    From Image to Video: An Empirical Study of Diffusion Representations

    Authors: Pedro Vélez, Luisa F. Polanía, Yi Yang, Chuhan Zhang, Rishabh Kabra, Anurag Arnab, Mehdi S. M. Sajjadi

    Abstract: Diffusion models have revolutionized generative modeling, enabling unprecedented realism in image and video synthesis. This success has sparked interest in leveraging their representations for visual understanding tasks. While recent works have explored this potential for image generation, the visual understanding capabilities of video diffusion models remain largely uncharted. To address this gap… ▽ More

    Submitted 19 March, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

  33. arXiv:2502.06750  [pdf, ps, other]

    cs.CV

    Accelerating Data Processing and Benchmarking of AI Models for Pathology

    Authors: Andrew Zhang, Guillaume Jaume, Anurag Vaidya, Tong Ding, Faisal Mahmood

    Abstract: Advances in foundation modeling have reshaped computational pathology. However, the increasing number of available models and lack of standardized benchmarks make it increasingly complex to assess their strengths, limitations, and potential for further development. To address these challenges, we introduce a new suite of software tools for whole-slide image processing, foundation model benchmarkin… ▽ More

    Submitted 10 February, 2025; originally announced February 2025.

  34. arXiv:2502.05888  [pdf, other]

    cs.DS

    Faster Approximation Algorithms for k-Center via Data Reduction

    Authors: Arnold Filtser, Shaofeng H. -C. Jiang, Yi Li, Anurag Murty Naredla, Ioannis Psarros, Qiaoyuan Yang, Qin Zhang

    Abstract: We study efficient algorithms for the Euclidean $k$-Center problem, focusing on the regime of large $k$. We take the approach of data reduction by considering $α$-coreset, which is a small subset $S$ of the dataset $P$ such that any $β$-approximation on $S$ is an $(α+ β)$-approximation on $P$. We give efficient algorithms to construct coresets whose size is $k \cdot o(n)$, which immediately speeds… ▽ More

    Submitted 9 February, 2025; originally announced February 2025.

  35. arXiv:2502.01507  [pdf, other]

    cs.CV

    End-to-end Training for Text-to-Image Synthesis using Dual-Text Embeddings

    Authors: Yeruru Asrar Ahmed, Anurag Mittal

    Abstract: Text-to-Image (T2I) synthesis is a challenging task that requires modeling complex interactions between two modalities (i.e., text and image). A common framework adopted in recent state-of-the-art approaches to achieving such multimodal interactions is to bootstrap the learning process with pre-trained image-aligned text embeddings trained using contrastive loss. Furthermore, these embeddings are…

    Submitted 3 February, 2025; originally announced February 2025.

  36. arXiv:2501.18157  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Efficient Audiovisual Speech Processing via MUTUD: Multimodal Training and Unimodal Deployment

    Authors: Joanna Hong, Sanjeel Parekh, Honglie Chen, Jacob Donley, Ke Tan, Buye Xu, Anurag Kumar

    Abstract: Building reliable speech systems often requires combining multiple modalities, like audio and visual cues. While such multimodal solutions frequently lead to improvements in performance and may even be critical in certain cases, they come with several constraints such as increased sensory requirements, computational cost, and modality synchronization, to mention a few. These challenges constrain t… ▽ More

    Submitted 30 January, 2025; originally announced January 2025.

  37. arXiv:2501.16652  [pdf, other]

    cs.CV cs.AI

    Molecular-driven Foundation Model for Oncologic Pathology

    Authors: Anurag Vaidya, Andrew Zhang, Guillaume Jaume, Andrew H. Song, Tong Ding, Sophia J. Wagner, Ming Y. Lu, Paul Doucet, Harry Robertson, Cristina Almagro-Perez, Richard J. Chen, Dina ElHarouni, Georges Ayoub, Connor Bossi, Keith L. Ligon, Georg Gerber, Long Phi Le, Faisal Mahmood

    Abstract: Foundation models are reshaping computational pathology by enabling transfer learning, where models pre-trained on vast datasets can be adapted for downstream diagnostic, prognostic, and therapeutic response tasks. Despite these advances, foundation models are still limited in their ability to encode the entire gigapixel whole-slide images without additional training and often lack complementary m… ▽ More

    Submitted 27 January, 2025; originally announced January 2025.

  38. arXiv:2501.11218   

    cs.CV cs.AI cs.LG

    Leveraging GANs For Active Appearance Models Optimized Model Fitting

    Authors: Anurag Awasthi

    Abstract: Active Appearance Models (AAMs) are a well-established technique for fitting deformable models to images, but they are limited by linear appearance assumptions and can struggle with complex variations. In this paper, we explore if the AAM fitting process can benefit from a Generative Adversarial Network (GAN). We use a U-Net based generator and a PatchGAN discriminator for GAN-augmented framework…

    Submitted 7 April, 2025; v1 submitted 19 January, 2025; originally announced January 2025.

    Comments: The full text of this preprint has been withdrawn, as it was submitted in error at a much earlier stage, with work still needing substantial refinement and validation. Therefore, the authors do not wish this work to be cited as a reference

  39. arXiv:2501.11156  [pdf, ps, other]

    math.CO cs.CG

    Covering half-grids with lines and planes

    Authors: Anurag Bishnoi, Shantanu Nene

    Abstract: We study hyperplane covering problems for finite grid-like structures in $\mathbb{R}^d$. We call a set $\mathcal{C}$ of points in $\mathbb{R}^2$ a conical grid if the line $y = a_i$ intersects $\mathcal{C}$ in exactly $i$ points, for some $a_1 > \cdots > a_n \in \mathbb{R}$. We prove that the number of lines required to cover every point of such a grid at least $k$ times is at least… ▽ More

    Submitted 26 January, 2025; v1 submitted 19 January, 2025; originally announced January 2025.

    Comments: 10 pages; minor revision; added a new reference

  40. arXiv:2501.08421  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models

    Authors: Anurag Kumar, Rohit Paturi, Amber Afshan, Sundararajan Srinivasan

    Abstract: Speaker Diarization (SD) is a crucial component of modern end-to-end ASR pipelines. Traditional SD systems, which are typically audio-based and operate independently of ASR, often introduce speaker errors, particularly during speaker transitions and overlapping speech. Recently, language models including fine-tuned large language models (LLMs) have been shown to be effective as a second-pass speaker er…

    Submitted 14 January, 2025; originally announced January 2025.

    Comments: Accepted at ICASSP 2025

  41. arXiv:2501.06234  [pdf, other]

    cs.OS cs.CR

    Fast, Secure, Adaptable: LionsOS Design, Implementation and Performance

    Authors: Gernot Heiser, Ivan Velickovic, Peter Chubb, Alwin Joshy, Anuraag Ganesh, Bill Nguyen, Cheng Li, Courtney Darville, Guangtao Zhu, James Archer, Jingyao Zhou, Krishnan Winter, Lucy Parker, Szymon Duchniewicz, Tianyi Bai

    Abstract: We present LionsOS, an operating system for security- and safety-critical embedded systems. LionsOS is based on the formally verified seL4 microkernel and designed with verification in mind. It uses a static architecture and features a highly modular design driven by strict separation of concerns and a focus on simplicity. We demonstrate that LionsOS achieves excellent performance on system-call…

    Submitted 27 May, 2025; v1 submitted 8 January, 2025; originally announced January 2025.

    Comments: 14 pages, 13 figures

    ACM Class: D.4.7; D.4.8

  42. arXiv:2412.19966  [pdf, other]

    cs.CL cs.AI

    Bridging Context Gaps: Enhancing Comprehension in Long-Form Social Conversations Through Contextualized Excerpts

    Authors: Shrestha Mohanty, Sarah Xuan, Jacob Jobraeel, Anurag Kumar, Deb Roy, Jad Kabbara

    Abstract: We focus on enhancing comprehension in small-group recorded conversations, which serve as a medium to bring people together and provide a space for sharing personal stories and experiences on crucial social matters. One way to parse and convey information from these conversations is by sharing highlighted excerpts in subsequent conversations. This can help promote a collective understanding of rel…

    Submitted 27 December, 2024; originally announced December 2024.

    Comments: Accepted at COLING 2025

  43. arXiv:2412.18669  [pdf, other]

    cs.AI cs.CL

    Advancing Explainability in Neural Machine Translation: Analytical Metrics for Attention and Alignment Consistency

    Authors: Anurag Mishra

    Abstract: Neural Machine Translation (NMT) models have shown remarkable performance but remain largely opaque in their decision-making processes. The interpretability of these models, especially their internal attention mechanisms, is critical for building trust and verifying that these systems behave as intended. In this work, we introduce a systematic framework to quantitatively evaluate the explainabilit…

    Submitted 24 December, 2024; originally announced December 2024.

    Comments: 4 pages, 3 figures, research paper from the Rochester Institute of Technology, focused on explainability in Neural Machine Translation. Validated metrics using English-German data subset from WMT14 and mT5 model. Results connect attention entropy and alignment agreement with translation quality

    MSC Class: 68T50 ACM Class: I.2.7; I.2.3

  44. arXiv:2412.15220  [pdf, other]

    cs.MM cs.SD eess.AS

    SyncFlow: Toward Temporally Aligned Joint Audio-Video Generation from Text

    Authors: Haohe Liu, Gael Le Lan, Xinhao Mei, Zhaoheng Ni, Anurag Kumar, Varun Nagaraja, Wenwu Wang, Mark D. Plumbley, Yangyang Shi, Vikas Chandra

    Abstract: Video and audio are closely correlated modalities that humans naturally perceive together. While recent advancements have enabled the generation of audio or video from text, producing both modalities simultaneously still typically relies on either a cascaded process or multi-modal contrastive encoders. These approaches, however, often lead to suboptimal results due to inherent information losses d…

    Submitted 3 December, 2024; originally announced December 2024.

  45. arXiv:2412.09948  [pdf, other]

    cs.GT cs.ET

    SMEVCA: Stable Matching-based EV Charging Assignment in Subscription-Based Models

    Authors: Arindam Khanda, Anurag Satpathy, Anusha Vangala, Sajal K. Das

    Abstract: The rapid shift from internal combustion engine vehicles to battery-powered electric vehicles (EVs) presents considerable challenges, such as limited charging points (CPs), unpredictable wait times, and difficulty selecting appropriate CPs. To address these challenges, we propose a novel end-to-end framework called Stable Matching EV Charging Assignment (SMEVCA) that efficiently assigns charge-see…

    Submitted 13 December, 2024; originally announced December 2024.

    Comments: This paper has been accepted for presentation at the 26th International Conference on Distributed Computing and Networking (ICDCN), 2025

  46. arXiv:2412.09578  [pdf, other]

    cs.SI cs.AI cs.CL

    DISHONEST: Dissecting misInformation Spread using Homogeneous sOcial NEtworks and Semantic Topic classification

    Authors: Caleb Stam, Emily Saldanha, Mahantesh Halappanavar, Anurag Acharya

    Abstract: The emergence of the COVID-19 pandemic resulted in a significant rise in the spread of misinformation on online platforms such as Twitter. Oftentimes this growth is blamed on the idea of the "echo chamber." However, the behavior said to characterize these echo chambers exists in two dimensions. The first is in a user's social interactions, where they are said to stick with the same clique of like-…

    Submitted 12 December, 2024; originally announced December 2024.

  47. arXiv:2412.06827  [pdf, other]

    cs.LG cs.AI

    Enhancing LLMs for Physics Problem-Solving using Reinforcement Learning with Human-AI Feedback

    Authors: Avinash Anand, Kritarth Prasad, Chhavi Kirtani, Ashwin R Nair, Mohit Gupta, Saloni Garg, Anurag Gautam, Snehal Buldeo, Rajiv Ratn Shah

    Abstract: Large Language Models (LLMs) have demonstrated strong capabilities in text-based tasks but struggle with the complex reasoning required for physics problems, particularly in advanced arithmetic and conceptual understanding. While some research has explored ways to enhance LLMs in physics education using techniques such as prompt engineering and Retrieval-Augmented Generation (RAG), not enough e…

    Submitted 6 December, 2024; originally announced December 2024.

  48. arXiv:2412.06652  [pdf]

    cs.DL

    Institutional Shifts in Contribution to Indian Research Output during the last two decades

    Authors: Vivek Kumar Singh, Mousumi Karmakar, Anurag Kanaujia

    Abstract: In the past few decades, India has emerged as a major knowledge producer, with research output being contributed by a diverse set of institutions ranging from centrally funded to state funded, and from public funded to private funded institutions. A significant change has been witnessed in Indian institutional actors during the last two decades, with various new private universities being set up a…

    Submitted 9 December, 2024; originally announced December 2024.

  49. arXiv:2412.03791  [pdf, ps, other]

    cs.LG cs.AI

    INRFlow: Flow Matching for INRs in Ambient Space

    Authors: Yuyang Wang, Anurag Ranjan, Josh Susskind, Miguel Angel Bautista

    Abstract: Flow matching models have emerged as a powerful method for generative modeling on domains like images or videos, and even on irregular or unstructured data like 3D point clouds or even protein structures. These models are commonly trained in two stages: first, a data compressor is trained, and in a subsequent training stage a flow matching generative model is trained in the latent space of the dat…

    Submitted 28 May, 2025; v1 submitted 4 December, 2024; originally announced December 2024.

    Comments: 22 pages, 14 figures, 13 tables

  50. arXiv:2412.01946  [pdf, ps, other]

    cs.AI

    The Reality of AI and Biorisk

    Authors: Aidan Peppin, Anka Reuel, Stephen Casper, Elliot Jones, Andrew Strait, Usman Anwar, Anurag Agrawal, Sayash Kapoor, Sanmi Koyejo, Marie Pellat, Rishi Bommasani, Nick Frosst, Sara Hooker

    Abstract: To accurately and confidently answer the question 'could an AI model or system increase biorisk', it is necessary to have both a sound theoretical threat model for how AI models or systems could increase biorisk and a robust method for testing that threat model. This paper provides an analysis of existing available research surrounding two AI and biorisk threat models: 1) access to information and…

    Submitted 2 January, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

    Comments: Updated to correct author affiliations and clarify findings of evaluations of the o1 model