Showing 1–50 of 1,328 results for author: Kumar, A

Searching in archive cs.
  1. arXiv:2507.07302  [pdf, ps, other]

    cs.AI cs.RO

    Application of LLMs to Multi-Robot Path Planning and Task Allocation

    Authors: Ashish Kumar

    Abstract: Efficient exploration is a well-known problem in deep reinforcement learning, and this problem is exacerbated in multi-agent reinforcement learning due to the intrinsic complexities of such algorithms. There are several approaches to efficiently explore an environment to learn to solve tasks by multiple agents operating in that environment, of which the idea of expert exploration is investigated in this… ▽ More

    Submitted 9 July, 2025; originally announced July 2025.

  2. arXiv:2507.07247  [pdf, ps, other]

    cs.LG cs.AI cs.NE

    Attentions Under the Microscope: A Comparative Study of Resource Utilization for Variants of Self-Attention

    Authors: Zhengyu Tian, Anantha Padmanaban Krishna Kumar, Hemant Krishnakumar, Reza Rawassizadeh

    Abstract: As large language models (LLMs) and visual language models (VLMs) grow in scale and application, attention mechanisms have become a central computational bottleneck due to their high memory and time complexity. While many efficient attention variants have been proposed, there remains a lack of rigorous evaluation on their actual energy usage and hardware resource demands during training. In this w… ▽ More

    Submitted 9 July, 2025; originally announced July 2025.

    Comments: 6 pages, 8 figures

  3. arXiv:2507.07028  [pdf, ps, other]

    cs.DM

    On Construction of Approximate Real Mutually Unbiased Bases for an infinite class of dimensions $d \not\equiv 0 \bmod 4$

    Authors: Ajeet Kumar, Rakesh Kumar, Subhamoy Maitra, Uddipto Mandal

    Abstract: It is known that real Mutually Unbiased Bases (MUBs) do not exist for any dimension $d > 2$ which is not divisible by 4. Thus, the next combinatorial question is how one can construct Approximate Real MUBs (ARMUBs) in this direction with encouraging parameters. In this paper, for the first time, we show that it is possible to construct $> \lceil \sqrt{d} \rceil$ many ARMUBs for certain odd dimensi… ▽ More

    Submitted 9 July, 2025; originally announced July 2025.

  4. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu , et al. (3278 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde… ▽ More

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  5. arXiv:2507.03152  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Expert-level validation of AI-generated medical text with scalable language models

    Authors: Asad Aali, Vasiliki Bikia, Maya Varma, Nicole Chiou, Sophie Ostmeier, Arnav Singhvi, Magdalini Paschali, Ashwin Kumar, Andrew Johnston, Karimar Amador-Martinez, Eduardo Juan Perez Guerrero, Paola Naovi Cruz Rivera, Sergios Gatidis, Christian Bluethgen, Eduardo Pontes Reis, Eddy D. Zandee van Rilland, Poonam Laxmappa Hosamani, Kevin R Keet, Minjoung Go, Evelyn Ling, David B. Larson, Curtis Langlotz, Roxana Daneshjou, Jason Hom, Sanmi Koyejo , et al. (2 additional authors not shown)

    Abstract: With the growing use of language models (LMs) in clinical environments, there is an immediate need to evaluate the accuracy and safety of LM-generated medical text. Currently, such evaluation relies solely on manual physician review. However, detecting errors in LM-generated text is challenging because 1) manual review is costly and 2) expert-composed reference outputs are often unavailable in rea… ▽ More

    Submitted 3 July, 2025; originally announced July 2025.

  6. arXiv:2507.02660  [pdf, ps, other]

    cs.AI cs.AR

    Hey AI, Generate Me a Hardware Code! Agentic AI-based Hardware Design & Verification

    Authors: Deepak Narayan Gadde, Keerthan Kopparam Radhakrishna, Vaisakh Naduvodi Viswambharan, Aman Kumar, Djones Lettnin, Wolfgang Kunz, Sebastian Simon

    Abstract: Modern Integrated Circuits (ICs) are becoming increasingly complex, and so is their development process. Hardware design verification entails a methodical and disciplined approach to the planning, development, execution, and sign-off of functionally correct hardware designs. This tedious process requires significant effort and time to ensure a bug-free tape-out. The field of Natural Language Proce… ▽ More

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: To appear at the 38th SBC/SBMicro/IEEE Symposium on Integrated Circuits and Systems Design (SBCCI), August 25-29, 2025, Manaus, BRAZIL

  7. arXiv:2507.02492  [pdf, ps, other]

    cs.DM

    On Obtaining New MUBs by Finding Points on Complete Intersection Varieties over $\mathbb{R}$

    Authors: Arindam Banerjee, Kanoy Kumar Das, Ajeet Kumar, Rakesh Kumar, Subhamoy Maitra

    Abstract: Mutually Unbiased Bases (MUBs) are closely connected with quantum physics, and the structure has a rich mathematical background. We provide equivalent criteria for extending a set of MUBs for $C^n$ by studying real points of a certain affine algebraic variety. This variety comes from the relations that determine the extendability of a system of MUBs. Finally, we show that some part of this variety… ▽ More

    Submitted 3 July, 2025; originally announced July 2025.

  8. arXiv:2507.00971  [pdf, ps, other]

    cs.LG cs.AI

    Reasoning as an Adaptive Defense for Safety

    Authors: Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan, Aviral Kumar

    Abstract: Reasoning methods that adaptively allocate test-time compute have advanced LLM performance on easy-to-verify domains such as math and code. In this work, we study how to utilize this approach to train models that exhibit a degree of robustness to safety vulnerabilities, and show that doing so can provide benefits. We build a recipe called $\textit{TARS}$ (Training Adaptive Reasoners for Safety), a… ▽ More

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 42 pages, 11 Figures, 7 Tables

  9. arXiv:2506.23924  [pdf, ps, other]

    cs.AI

    Performance of LLMs on Stochastic Modeling Operations Research Problems: From Theory to Practice

    Authors: Akshit Kumar, Tianyi Peng, Yuhang Wu, Assaf Zeevi

    Abstract: Large language models (LLMs) have exhibited expert-level capabilities across various domains. However, their abilities to solve problems in Operations Research (OR) -- the analysis and optimization of mathematical models derived from real-world problems or their verbal descriptions -- remain underexplored. In this work, we take a first step toward evaluating LLMs' abilities to solve stochastic mod… ▽ More

    Submitted 30 June, 2025; originally announced June 2025.

  10. arXiv:2506.23874  [pdf, ps, other]

    eess.AS cs.SD

    URGENT-PK: Perceptually-Aligned Ranking Model Designed for Speech Enhancement Competition

    Authors: Jiahe Wang, Chenda Li, Wei Wang, Wangyou Zhang, Samuele Cornell, Marvin Sach, Robin Scheibler, Kohei Saijo, Yihui Fu, Zhaoheng Ni, Anurag Kumar, Tim Fingscheidt, Shinji Watanabe, Yanmin Qian

    Abstract: The Mean Opinion Score (MOS) is fundamental to speech quality assessment. However, its acquisition requires significant human annotation. Although deep neural network approaches, such as DNSMOS and UTMOS, have been developed to predict MOS to avoid this issue, they often suffer from insufficient training data. Recognizing that the comparison of speech enhancement (SE) systems prioritizes a reliabl… ▽ More

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: Submitted to ASRU2025

  11. arXiv:2506.23859  [pdf, ps, other]

    eess.AS cs.SD

    Less is More: Data Curation Matters in Scaling Speech Enhancement

    Authors: Chenda Li, Wangyou Zhang, Wei Wang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Yihui Fu, Marvin Sach, Zhaoheng Ni, Anurag Kumar, Tim Fingscheidt, Shinji Watanabe, Yanmin Qian

    Abstract: The vast majority of modern speech enhancement systems rely on data-driven neural network models. Conventionally, larger datasets are presumed to yield superior model performance, an observation empirically validated across numerous tasks in other domains. However, recent studies reveal diminishing returns when scaling speech enhancement data. We focus on a critical factor: prevalent quality issue… ▽ More

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: Submitted to ASRU2025

  12. arXiv:2506.19863  [pdf, ps, other]

    physics.comp-ph cs.AI

    Exploring the Capabilities of the Frontier Large Language Models for Nuclear Energy Research

    Authors: Ahmed Almeldein, Mohammed Alnaggar, Rick Archibald, Tom Beck, Arpan Biswas, Rike Bostelmann, Wes Brewer, Chris Bryan, Christopher Calle, Cihangir Celik, Rajni Chahal, Jong Youl Choi, Arindam Chowdhury, Mark Cianciosa, Franklin Curtis, Gregory Davidson, Sebastian De Pascuale, Lisa Fassino, Ana Gainaru, Yashika Ghai, Luke Gibson, Qian Gong, Christopher Greulich, Scott Greenwood, Cory Hauck , et al. (25 additional authors not shown)

    Abstract: The AI for Nuclear Energy workshop at Oak Ridge National Laboratory evaluated the potential of Large Language Models (LLMs) to accelerate fusion and fission research. Fourteen interdisciplinary teams explored diverse nuclear science challenges using ChatGPT, Gemini, Claude, and other AI models over a single day. Applications ranged from developing foundation models for fusion reactor control to au… ▽ More

    Submitted 26 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  13. arXiv:2506.19014  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    IndieFake Dataset: A Benchmark Dataset for Audio Deepfake Detection

    Authors: Abhay Kumar, Kunal Verma, Omkar More

    Abstract: Advancements in audio deepfake technology offer benefits like AI assistants, better accessibility for speech impairments, and enhanced entertainment. However, they also pose significant risks to security, privacy, and trust in digital communications. Detecting and mitigating these threats requires comprehensive datasets. Existing datasets lack diverse ethnic accents, making them inadequate for man… ▽ More

    Submitted 26 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

    Comments: Project Website: https://indie-fake-dataset.netlify.app/

  14. A Sea of Cyber Threats: Maritime Cybersecurity from the Perspective of Mariners

    Authors: Anna Raymaker, Akshaya Kumar, Miuyin Yong Wong, Ryan Pickren, Animesh Chhotaray, Frank Li, Saman Zonouz, Raheem Beyah

    Abstract: Maritime systems, including ships and ports, are critical components of global infrastructure, essential for transporting over 80% of the world's goods and supporting internet connectivity. However, these systems face growing cybersecurity threats, as shown by recent attacks disrupting Maersk, one of the world's largest shipping companies, causing widespread impacts on international trade. The uni… ▽ More

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 18 pages, 2 figures, To appear in the Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security (CCS '25)

  15. arXiv:2506.12347  [pdf, ps, other]

    cs.SE cs.HC

    Sharp Tools: How Developers Wield Agentic AI in Real Software Engineering Tasks

    Authors: Aayush Kumar, Yasharth Bajpai, Sumit Gulwani, Gustavo Soares, Emerson Murphy-Hill

    Abstract: Software Engineering Agents (SWE agents) can autonomously perform development tasks on benchmarks like SWE Bench, but still face challenges when tackling complex and ambiguous real-world tasks. Consequently, SWE agents are often designed to allow interactivity with developers, enabling collaborative problem-solving. To understand how developers collaborate with SWE agents and the communication cha… ▽ More

    Submitted 17 June, 2025; v1 submitted 14 June, 2025; originally announced June 2025.

  16. arXiv:2506.12181  [pdf, ps, other]

    cs.LG cs.CL

    Generative or Discriminative? Revisiting Text Classification in the Era of Transformers

    Authors: Siva Rajesh Kasa, Karan Gupta, Sumegh Roychowdhury, Ashutosh Kumar, Yaswanth Biruduraju, Santhosh Kumar Kasa, Nikhil Priyatam Pattisapu, Arindam Bhattacharya, Shailendra Agarwal, Vijay Huddar

    Abstract: The comparison between discriminative and generative classifiers has intrigued researchers since Efron's seminal analysis of logistic regression versus discriminant analysis. While early theoretical work established that generative classifiers exhibit lower sample complexity but higher asymptotic error in simple linear settings, these trade-offs remain unexplored in the transformer era. We present… ▽ More

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: 19 pages

  17. arXiv:2506.12103  [pdf, other]

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue , et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents… ▽ More

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  18. arXiv:2506.10999  [pdf]

    cs.SE cs.AI

    Automated Validation of COBOL to Java Transformation

    Authors: Atul Kumar, Diptikalyan Saha, Toshikai Yasue, Kohichi Ono, Saravanan Krishnan, Sandeep Hans, Fumiko Satoh, Gerald Mitchell, Sachin Kumar

    Abstract: Recent advances in Large Language Model (LLM) based Generative AI techniques have made it feasible to translate enterprise-level code from legacy languages such as COBOL to modern languages such as Java or Python. While the results of LLM-based automatic transformation are encouraging, the resulting code cannot be trusted to correctly translate the original code. We propose a framework and a tool t… ▽ More

    Submitted 14 April, 2025; originally announced June 2025.

    Comments: arXiv admin note: text overlap with arXiv:2504.10548

    Journal ref: ASE 2024

  19. arXiv:2506.10797  [pdf]

    physics.med-ph cs.CV

    Modality-AGnostic Image Cascade (MAGIC) for Multi-Modality Cardiac Substructure Segmentation

    Authors: Nicholas Summerfield, Qisheng He, Alex Kuo, Ahmed I. Ghanem, Simeng Zhu, Chase Ruff, Joshua Pan, Anudeep Kumar, Prashant Nagpal, Jiwei Zhao, Ming Dong, Carri K. Glide-Hurst

    Abstract: Cardiac substructures are essential in thoracic radiation therapy planning to minimize risk of radiation-induced heart disease. Deep learning (DL) offers efficient methods to reduce contouring burden but lacks generalizability across different modalities and overlapping structures. This work introduces and validates a Modality-AGnostic Image Cascade (MAGIC) for comprehensive and multi-modal cardia… ▽ More

    Submitted 12 June, 2025; originally announced June 2025.

  20. arXiv:2506.10150  [pdf, ps, other]

    cs.CL cs.HC

    When Large Language Models are Reliable for Judging Empathic Communication

    Authors: Aakriti Kumar, Nalin Poungpeth, Diyi Yang, Erina Farrell, Bruce Lambert, Matthew Groh

    Abstract: Large language models (LLMs) excel at generating empathic responses in text-based conversations. But, how reliably do they judge the nuances of empathic communication? We investigate this question by comparing how experts, crowdworkers, and LLMs annotate empathic communication across four evaluative frameworks drawn from psychology, natural language processing, and communications applied to 200 re… ▽ More

    Submitted 11 June, 2025; originally announced June 2025.

  21. arXiv:2506.09661  [pdf, ps, other]

    eess.IV cs.CV q-bio.TO

    A Cytology Dataset for Early Detection of Oral Squamous Cell Carcinoma

    Authors: Garima Jain, Sanghamitra Pati, Mona Duggal, Amit Sethi, Abhijeet Patil, Gururaj Malekar, Nilesh Kowe, Jitender Kumar, Jatin Kashyap, Divyajeet Rout, Deepali, Hitesh, Nishi Halduniya, Sharat Kumar, Heena Tabassum, Rupinder Singh Dhaliwal, Sucheta Devi Khuraijam, Sushma Khuraijam, Sharmila Laishram, Simmi Kharb, Sunita Singh, K. Swaminadtan, Ranjana Solanki, Deepika Hemranjani, Shashank Nath Singh , et al. (12 additional authors not shown)

    Abstract: Oral squamous cell carcinoma (OSCC) is a major global health burden, particularly in several regions across Asia, Africa, and South America, where it accounts for a significant proportion of cancer cases. Early detection dramatically improves outcomes, with stage I cancers achieving up to 90 percent survival. However, traditional diagnosis based on histopathology has limited accessibility in low-res… ▽ More

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 7 pages, 2 figures

  22. arXiv:2506.09026  [pdf, ps, other]

    cs.LG cs.CL

    e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs

    Authors: Amrith Setlur, Matthew Y. R. Yang, Charlie Snell, Jeremy Greer, Ian Wu, Virginia Smith, Max Simchowitz, Aviral Kumar

    Abstract: Test-time scaling offers a promising path to improve LLM reasoning by utilizing more compute at inference time; however, the true promise of this paradigm lies in extrapolation (i.e., improvement in performance on hard problems as LLMs keep "thinking" for longer, beyond the maximum token budget they were trained on). Surprisingly, we find that most existing reasoning models do not extrapolate well… ▽ More

    Submitted 13 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  23. arXiv:2506.07976  [pdf, ps, other]

    cs.LG cs.AI

    Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

    Authors: Junhong Shen, Hao Bai, Lunjun Zhang, Yifei Zhou, Amrith Setlur, Shengbang Tong, Diego Caples, Nan Jiang, Tong Zhang, Ameet Talwalkar, Aviral Kumar

    Abstract: The current paradigm of test-time scaling relies on generating long reasoning traces ("thinking" more) before producing a response. In agent problems that require interaction, this can be done by generating thinking traces before acting in the world. However, this process does not allow agents to acquire new information from the environment or adapt their behavior over time. In this work, we propo… ▽ More

    Submitted 10 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

    Comments: Fixed typo in Figure 6 and Conclusion

  24. arXiv:2506.07633  [pdf, ps, other]

    cs.RO

    Blending Participatory Design and Artificial Awareness for Trustworthy Autonomous Vehicles

    Authors: Ana Tanevska, Ananthapathmanabhan Ratheesh Kumar, Arabinda Ghosh, Ernesto Casablanca, Ginevra Castellano, Sadegh Soudjani

    Abstract: Current robotic agents, such as autonomous vehicles (AVs) and drones, need to deal with uncertain real-world environments with appropriate situational awareness (SA), risk awareness, coordination, and decision-making. The SymAware project strives to address this issue by designing an architecture for artificial awareness in multi-agent systems, enabling safe collaboration of autonomous vehicles an… ▽ More

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: Submitted to IEEE RO-MAN 2025

  25. arXiv:2506.07309  [pdf, other]

    cs.CL

    ConfQA: Answer Only If You Are Confident

    Authors: Yin Huang, Yifan Ethan Xu, Kai Sun, Vera Yan, Alicia Sun, Haidar Khan, Jimmy Nguyen, Mohammad Kachuee, Zhaojiang Lin, Yue Liu, Aaron Colak, Anuj Kumar, Wen-tau Yih, Xin Luna Dong

    Abstract: Can we teach Large Language Models (LLMs) to refrain from hallucinating factual statements? In this paper we present a fine-tuning strategy that we call ConfQA, which can reduce hallucination rate from 20-40% to under 5% across multiple factuality benchmarks. The core idea is simple: when the LLM answers a question correctly, it is trained to continue with the answer; otherwise, it is trained to a… ▽ More

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: 10 pages main content, 10 pages appendix, 5 figures, 7 tables

  26. arXiv:2506.05904  [pdf, ps, other]

    cs.AI cs.CL cs.CV cs.HC

    Proactive Assistant Dialogue Generation from Streaming Egocentric Videos

    Authors: Yichi Zhang, Xin Luna Dong, Zhaojiang Lin, Andrea Madotto, Anuj Kumar, Babak Damavandi, Joyce Chai, Seungwhan Moon

    Abstract: Recent advances in conversational AI have been substantial, but developing real-time systems for perceptual task guidance remains challenging. These systems must provide interactive, proactive assistance based on streaming visual inputs, yet their development is constrained by the costly and labor-intensive process of data collection and system evaluation. To address these limitations, we present… ▽ More

    Submitted 6 June, 2025; originally announced June 2025.

  27. arXiv:2506.05538  [pdf, other]

    cs.LG cs.MM

    SocialDF: Benchmark Dataset and Detection Model for Mitigating Harmful Deepfake Content on Social Media Platforms

    Authors: Arnesh Batra, Anushk Kumar, Jashn Khemani, Arush Gumber, Arhan Jain, Somil Gupta

    Abstract: The rapid advancement of deep generative models has significantly improved the realism of synthetic media, presenting both opportunities and security challenges. While deepfake technology has valuable applications in entertainment and accessibility, it has emerged as a potent vector for misinformation campaigns, particularly on social media. Existing detection frameworks struggle to distinguish be… ▽ More

    Submitted 5 June, 2025; originally announced June 2025.

  28. An Independent Discriminant Network Towards Identification of Counterfeit Images and Videos

    Authors: Shayantani Kar, B. Shresth Bhimrajka, Aditya Kumar, Sahil Gupta, Sourav Ghosh, Subhamita Mukherjee, Shauvik Paul

    Abstract: The rapid spread of false images and videos on online platforms is an emerging problem. Anyone may add, delete, clone, or modify people and entities in an image using readily available editing software. This generates false and misleading proof to hide the crime. Nowadays, these false and counterfeit images and videos are flooding the internet and spreading false information. M… ▽ More

    Submitted 30 May, 2025; originally announced June 2025.

    Comments: This research was conducted by student and professor co-authors from Techno Main Salt Lake, with co-author Sourav Ghosh serving as an alumni mentor in an invited capacity -- distinct from his primary affiliation and pre-approved by his employer. This preprint presents research originally completed in early 2023 and published in IETE Journal of Research in 2025

    Journal ref: IETE Journal of Research (TIJR), 2025

  29. arXiv:2506.04168  [pdf, ps, other]

    cs.LG cs.AI

    Horizon Reduction Makes RL Scalable

    Authors: Seohong Park, Kevin Frans, Deepinder Mann, Benjamin Eysenbach, Aviral Kumar, Sergey Levine

    Abstract: In this work, we study the scalability of offline reinforcement learning (RL) algorithms. In principle, a truly scalable offline RL algorithm should be able to solve any given problem, regardless of its complexity, given sufficient data, compute, and model capacity. We investigate if and how current offline RL algorithms match up to this promise on diverse, challenging, previously unsolved tasks,… ▽ More

    Submitted 8 June, 2025; v1 submitted 4 June, 2025; originally announced June 2025.

  30. arXiv:2506.03910  [pdf, ps, other]

    cs.LG

    Enhancing Experimental Efficiency in Materials Design: A Comparative Study of Taguchi and Machine Learning Methods

    Authors: Shyam Prabhu, P Akshay Kumar, Antov Selwinston, Pavan Taduvai, Shreya Bairi, Rohit Batra

    Abstract: Materials design problems often require optimizing multiple variables, rendering full factorial exploration impractical. Design of experiment (DOE) methods, such as the Taguchi technique, are commonly used to efficiently sample the design space, but they inherently lack the ability to capture non-linear dependencies among process variables. In this work, we demonstrate how machine learning (ML) methods can… ▽ More

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 7 pages, 3 figures

  31. arXiv:2506.03425  [pdf, ps, other]

    eess.AS cs.AI cs.LG

    A Data-Driven Diffusion-based Approach for Audio Deepfake Explanations

    Authors: Petr Grinberg, Ankur Kumar, Surya Koppisetti, Gaurav Bharaj

    Abstract: Evaluating explainability techniques, such as SHAP and LRP, in the context of audio deepfake detection is challenging due to the lack of clear ground-truth annotations. In cases where we are able to obtain the ground truth, we find that these methods struggle to provide accurate explanations. In this work, we propose a novel data-driven approach to identify artifact regions in deepfake audio. We co… ▽ More

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 5 pages, 3 figures, accepted at Interspeech 2025

  32. arXiv:2506.03369  [pdf, ps, other]

    econ.TH cs.CY cs.IR

    Impact of Rankings and Personalized Recommendations in Marketplaces

    Authors: Omar Besbes, Yash Kanoria, Akshit Kumar

    Abstract: Individuals often navigate several options with incomplete knowledge of their own preferences. Information provisioning tools such as public rankings and personalized recommendations have become central to helping individuals make choices, yet their value proposition under different marketplace environments remains unexplored. This paper studies a stylized model to explore the impact of these tool… ▽ More

    Submitted 3 June, 2025; originally announced June 2025.

  33. arXiv:2506.01611  [pdf, ps, other]

    eess.AS cs.SD eess.SP

    Lessons Learned from the URGENT 2024 Speech Enhancement Challenge

    Authors: Wangyou Zhang, Kohei Saijo, Samuele Cornell, Robin Scheibler, Chenda Li, Zhaoheng Ni, Anurag Kumar, Marvin Sach, Wei Wang, Yihui Fu, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian

    Abstract: The URGENT 2024 Challenge aims to foster speech enhancement (SE) techniques with great universality, robustness, and generalizability, featuring a broader task definition, large-scale multi-domain data, and comprehensive evaluation metrics. Nourished by the challenge outcomes, this paper presents an in-depth analysis of two key, yet understudied, issues in SE system development: data cleaning and… ▽ More

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: 5 pages, 4 figures, 1 table. Accepted by Interspeech 2025. Code available at https://github.com/urgent-challenge/urgent2024_analysis

  34. arXiv:2506.01451  [pdf]

    cs.CL cs.IR

    Building Entity Association Mining Framework for Knowledge Discovery

    Authors: Anshika Rawal, Abhijeet Kumar, Mridul Mishra

    Abstract: Extracting useful signals or patterns from unstructured text to support important business decisions, for example analyzing investment product traction, discovering customer preferences, and monitoring risk, is a challenging task. Capturing interaction of entities or concepts and association mining is a crucial component in text mining, enabling information extraction and reasoning over and knowle… ▽ More

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Presented at Business Analytics and Intelligence Conference, IIM Bengaluru

    ACM Class: I.2.7

  35. arXiv:2505.23678  [pdf, ps, other]

    cs.CV

    Grounded Reinforcement Learning for Visual Reasoning

    Authors: Gabriel Sarch, Snigdha Saha, Naitik Khandelwal, Ayush Jain, Michael J. Tarr, Aviral Kumar, Katerina Fragkiadaki

    Abstract: While reinforcement learning (RL) over chains of thought has significantly advanced language models in tasks such as mathematics and coding, visual reasoning introduces added complexity by requiring models to direct visual attention, interpret perceptual inputs, and ground abstract reasoning in spatial evidence. We introduce ViGoRL (Visually Grounded Reinforcement Learning), a vision-language mode… ▽ More

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Project website: https://visually-grounded-rl.github.io/

  36. arXiv:2505.23523  [pdf, ps, other]

    cs.LG cs.DC

    Accelerating AllReduce with a Persistent Straggler

    Authors: Arjun Devraj, Eric Ding, Abhishek Vijaya Kumar, Robert Kleinberg, Rachee Singh

    Abstract: Distributed machine learning workloads use data and tensor parallelism for training and inference, both of which rely on the AllReduce collective to synchronize gradients or activations. However, bulk-synchronous AllReduce algorithms can be delayed by a persistent straggler that is slower to reach the synchronization barrier required to begin the collective. To address this challenge, we propose S… ▽ More

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 23 pages, 11 figures

  37. arXiv:2505.23150  [pdf, ps, other]

    cs.LG

    Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners

    Authors: Michal Nauman, Marek Cygan, Carmelo Sferrazza, Aviral Kumar, Pieter Abbeel

    Abstract: Recent advances in language modeling and vision stem from training large models on diverse, multi-task data. This paradigm has had limited impact in value-based reinforcement learning (RL), where improvements are often driven by small models trained in a single-task context. This is because in multi-task RL sparse rewards and gradient conflicts make optimization of temporal difference brittle. Pra… ▽ More

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: preprint

  38. arXiv:2505.23105  [pdf, ps, other]

    cs.LG cs.NI

    LUMION: Fast Fault Recovery for ML Jobs Using Programmable Optical Fabrics

    Authors: Abhishek Vijaya Kumar, Eric Ding, Arjun Devraj, Darius Bunandar, Rachee Singh

    Abstract: When accelerators fail in modern ML datacenters, operators migrate the affected ML training or inference jobs to entirely new racks. This approach, while preserving network performance, is highly inefficient, requiring datacenters to reserve full racks of idle accelerators for fault tolerance. In this paper, we address this resource inefficiency by introducing LUMION, a novel reconfigurable optica… ▽ More

    Submitted 29 May, 2025; originally announced May 2025.

  39. arXiv:2505.22550  [pdf]

    cs.IR

    Domain specific ontologies from Linked Open Data (LOD)

    Authors: Rosario Uceda-Sosa, Nandana Mihindukulasooriya, Atul Kumar, Sahil Bansal, Seema Nagar

    Abstract: Logical and probabilistic reasoning tasks that require a deeper knowledge of semantics are increasingly relying on general purpose ontologies such as Wikidata and DBpedia. However, tasks such as entity disambiguation and linking may benefit from domain specific knowledge graphs, which make it more efficient to consume the knowledge and easier to extend with proprietary content. We discuss our expe…

    Submitted 28 May, 2025; originally announced May 2025.

  40. arXiv:2505.17206  [pdf, ps, other]

    cs.CL cs.AI

    FB-RAG: Improving RAG with Forward and Backward Lookup

    Authors: Kushal Chawla, Alfy Samuel, Anoop Kumar, Daben Liu

    Abstract: The performance of Retrieval Augmented Generation (RAG) systems relies heavily on the retriever quality and the size of the retrieved context. A large enough context ensures that the relevant information is present in the input context for the LLM, but also incorporates irrelevant content that has been shown to confuse the models. On the other hand, a smaller context reduces the irrelevant informa…

    Submitted 22 May, 2025; originally announced May 2025.

  41. arXiv:2505.14900  [pdf]

    cs.DB

    Implementing Decentralized Per-Partition Automatic Failover in Azure Cosmos DB

    Authors: Josh Rowe, Mikael Horal, Hari Sudan Sundar, Muthukumaran Arumugam, Burak Kose, Sravani Mitra Palivela, Geni Marsh, Varun Jain, Abhishek Kumar, Dhaval Patel

    Abstract: Azure Cosmos DB is a cloud-native distributed database, operating at a massive scale, powering Microsoft Cloud. Think 10s of millions of database partitions (replica-sets), 100+ PBs of data under management, 20M+ vCores. Failovers are an integral part of distributed databases to provide data availability during outages (partial or full regional outages). While failovers within a replica-set within…

    Submitted 20 May, 2025; originally announced May 2025.

    ACM Class: H.2.4; H.2.7

  42. arXiv:2505.14846  [pdf, ps, other]

    cs.CV

    Open-Set Semi-Supervised Learning for Long-Tailed Medical Datasets

    Authors: Daniya Najiha A. Kareem, Jean Lahoud, Mustansar Fiaz, Amandeep Kumar, Hisham Cholakkal

    Abstract: Many practical medical imaging scenarios include categories that are under-represented but still crucial. The relevance of image recognition models to real-world applications lies in their ability to generalize to these rare classes as well as unseen classes. Real-world generalization requires taking into account the various complexities that can be encountered in the real-world. First, training d…

    Submitted 20 May, 2025; originally announced May 2025.

  43. arXiv:2505.13487  [pdf, ps, other]

    cs.CL

    Detecting Prefix Bias in LLM-based Reward Models

    Authors: Ashwin Kumar, Yuzi He, Aram H. Markosyan, Bobbie Chern, Imanol Arrieta-Ibarra

    Abstract: Reinforcement Learning with Human Feedback (RLHF) has emerged as a key paradigm for task-specific fine-tuning of language models using human preference data. While numerous publicly available preference datasets provide pairwise comparisons of responses, the potential for biases in the resulting reward models remains underexplored. In this work, we introduce novel methods to detect and evaluate pr…

    Submitted 19 June, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

  44. arXiv:2505.12834  [pdf]

    cs.CV

    A Study on the Refining Handwritten Font by Mixing Font Styles

    Authors: Avinash Kumar, Kyeolhee Kang, Ammar ul Hassan, Jaeyoung Choi

    Abstract: Handwritten fonts have a distinct expressive character, but they are often difficult to read due to unclear or inconsistent handwriting. FontFusionGAN (FFGAN) is a novel method for improving handwritten fonts by combining them with printed fonts. Our method implements generative adversarial network (GAN) to generate font that mix the desirable features of handwritten and printed fonts. By training…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: 4 pages, 3 figures, MITA 2023 (The 19th International Conference on Multimedia Information Technology and Applications July. 11 ~ July 14, 2023, Technical University of Ostrava, Ostrava, Czech)

  45. arXiv:2505.12425  [pdf, other]

    cs.CV

    Kornia-rs: A Low-Level 3D Computer Vision Library In Rust

    Authors: Edgar Riba, Jian Shi, Aditya Kumar, Andrew Shen, Gary Bradski

    Abstract: We present \textit{kornia-rs}, a high-performance 3D computer vision library written entirely in native Rust, designed for safety-critical and real-time applications. Unlike C++-based libraries like OpenCV or wrapper-based solutions like OpenCV-Rust, \textit{kornia-rs} is built from the ground up to leverage Rust's ownership model and type system for memory and thread safety. \textit{kornia-rs} ad…

    Submitted 18 May, 2025; originally announced May 2025.

  46. arXiv:2505.12154  [pdf, ps, other]

    cs.CV cs.SD eess.AS

    Learning to Highlight Audio by Watching Movies

    Authors: Chao Huang, Ruohan Gao, J. M. F. Tsang, Jan Kurcius, Cagdas Bilen, Chenliang Xu, Anurag Kumar, Sanjeel Parekh

    Abstract: Recent years have seen a significant increase in video content creation and consumption. Crafting engaging content requires the careful curation of both visual and audio elements. While visual cue curation, through techniques like optimal viewpoint selection or post-editing, has been central to media production, its natural counterpart, audio, has not undergone equivalent advancements. This often…

    Submitted 17 May, 2025; originally announced May 2025.

    Comments: CVPR 2025. Project page: https://wikichao.github.io/VisAH/

  47. arXiv:2505.12143  [pdf, ps, other]

    cs.LG cs.AI

    Structured Representation

    Authors: Arun Kumar, Paul Schrater

    Abstract: Invariant representations are core to representation learning, yet a central challenge remains: uncovering invariants that are stable and transferable without suppressing task-relevant signals. This raises fundamental questions, requiring further inquiry, about the appropriate level of abstraction at which such invariants should be defined, and which aspects of a system they should characterize. I…

    Submitted 17 May, 2025; originally announced May 2025.

  48. arXiv:2505.11958  [pdf, ps, other]

    cs.CL

    Counterspeech the ultimate shield! Multi-Conditioned Counterspeech Generation through Attributed Prefix Learning

    Authors: Aswini Kumar, Anil Bandhakavi, Tanmoy Chakraborty

    Abstract: Counterspeech has proven to be a powerful tool to combat hate speech online. Previous studies have focused on generating counterspeech conditioned only on specific intents (single attributed). However, a holistic approach considering multiple attributes simultaneously can yield more nuanced and effective responses. Here, we introduce HiPPrO, Hierarchical Prefix learning with Preference Optimizatio…

    Submitted 31 May, 2025; v1 submitted 17 May, 2025; originally announced May 2025.

    Comments: Accepted in ACL 2025 Main Conference

  49. arXiv:2505.11581  [pdf, ps, other]

    cs.CV cs.LG cs.NE

    Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis

    Authors: Akarsh Kumar, Jeff Clune, Joel Lehman, Kenneth O. Stanley

    Abstract: Much of the excitement in modern AI is driven by the observation that scaling up existing systems leads to better performance. But does better performance necessarily imply better internal representations? While the representational optimist assumes it must, this position paper challenges that view. We compare neural networks evolved through an open-ended search process to networks trained via con…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: 43 pages, 25 figures

  50. arXiv:2505.09251  [pdf]

    cs.CV

    A Surrogate Model for the Forward Design of Multi-layered Metasurface-based Radar Absorbing Structures

    Authors: Vineetha Joy, Aditya Anand, Nidhi, Anshuman Kumar, Amit Sethi, Hema Singh

    Abstract: Metasurface-based radar absorbing structures (RAS) are highly preferred for applications like stealth technology, electromagnetic (EM) shielding, etc. due to their capability to achieve frequency selective absorption characteristics with minimal thickness and reduced weight penalty. However, the conventional approach for the EM design and optimization of these structures relies on forward simulati…

    Submitted 14 May, 2025; originally announced May 2025.