Showing 1–50 of 223 results for author: Jain, K

Searching in archive cs.
  1. arXiv:2507.03541  [pdf, ps, other]

    cs.CV cs.AI

    Foundation versus Domain-specific Models: Performance Comparison, Fusion, and Explainability in Face Recognition

    Authors: Redwan Sony, Parisa Farmanifard, Arun Ross, Anil K. Jain

    Abstract: In this paper, we address the following question: How do generic foundation models (e.g., CLIP, BLIP, LLaVa, DINO) compare against a domain-specific face recognition model (viz., AdaFace or ArcFace) on the face recognition task? Through a series of experiments involving several foundation models and benchmark datasets, we are able to report the following findings: (a) In all datasets considered, d…

    Submitted 4 July, 2025; originally announced July 2025.

  2. arXiv:2506.16784  [pdf, ps, other]

    cs.CV cs.MM

    TextBraTS: Text-Guided Volumetric Brain Tumor Segmentation with Innovative Dataset Development and Fusion Module Exploration

    Authors: Xiaoyu Shi, Rahul Kumar Jain, Yinhao Li, Ruibo Hou, Jingliang Cheng, Jie Bai, Guohua Zhao, Lanfen Lin, Rui Xu, Yen-wei Chen

    Abstract: Deep learning has demonstrated remarkable success in medical image segmentation and computer-aided diagnosis. In particular, numerous advanced methods have achieved state-of-the-art performance in brain tumor segmentation from MRI scans. While recent studies in other medical imaging domains have revealed that integrating textual reports with visual data can enhance segmentation accuracy, the field…

    Submitted 23 June, 2025; v1 submitted 20 June, 2025; originally announced June 2025.

  3. arXiv:2506.10910  [pdf, ps, other]

    cs.CL

    Magistral

    Authors: Mistral-AI: Abhinav Rastogi, Albert Q. Jiang, Andy Lo, Gabrielle Berrada, Guillaume Lample, Jason Rute, Joep Barmentlo, Karmesh Yadav, Kartik Khandelwal, Khyathi Raghavi Chandu, Léonard Blier, Lucile Saulnier, Matthieu Dinot, Maxime Darrin, Neha Gupta, Roman Soletskyi, Sagar Vaze, Teven Le Scao, Yihan Wang, Adam Yang, Alexander H. Liu, Alexandre Sablayrolles, Amélie Héliou, et al. (76 additional authors not shown)

    Abstract: We introduce Magistral, Mistral's first reasoning model and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior models, we follow a ground-up approach, relying solely on our own models and infrastructure. Notably, we demonstrate a stack that enabled us to explore the limits of pure RL training of LLMs, present a s…

    Submitted 12 June, 2025; originally announced June 2025.

  4. arXiv:2506.06452  [pdf, ps, other]

    cs.DS

    Efficient Computation of Closed Substrings

    Authors: Samkith K Jain, Neerja Mhaskar

    Abstract: A closed string $u$ is either of length one or contains a border that occurs only as a prefix and as a suffix in $u$ and nowhere else within $u$. In this paper, we present a fast and practical $O(n\log n)$ time algorithm to compute all $Θ(n^2)$ closed substrings by introducing a compact representation for all closed substrings of a string $w[1..n]$, using only $O(n \log n)$ space. We also present…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: Submitted to SPIRE 2025

  5. arXiv:2506.05294  [pdf, ps, other]

    cs.LG

    A Smooth Sea Never Made a Skilled $\texttt{SAILOR}$: Robust Imitation via Learning to Search

    Authors: Arnav Kumar Jain, Vibhakar Mohta, Subin Kim, Atiksh Bhardwaj, Juntao Ren, Yunhai Feng, Sanjiban Choudhury, Gokul Swamy

    Abstract: The fundamental limitation of the behavioral cloning (BC) approach to imitation learning is that it only teaches an agent what the expert did at states the expert visited. This means that when a BC agent makes a mistake which takes them out of the support of the demonstrations, they often don't know how to recover from it. In this sense, BC is akin to giving the agent the fish -- giving them dense…

    Submitted 5 June, 2025; originally announced June 2025.

  6. arXiv:2506.01085  [pdf, ps, other]

    cs.CV cs.AI

    Learning What Matters: Prioritized Concept Learning via Relative Error-driven Sample Selection

    Authors: Shivam Chandhok, Qian Yang, Oscar Manas, Kanishk Jain, Leonid Sigal, Aishwarya Agrawal

    Abstract: Instruction tuning has been central to the success of recent vision-language models (VLMs), but it remains expensive, requiring large-scale datasets, high-quality annotations, and large compute budgets. We propose PRioritized cOncept learninG via Relative Error-driven Sample Selection (PROGRESS), a data- and compute-efficient framework that enables VLMs to dynamically select what to learn next base…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: Preprint

  7. arXiv:2505.00035  [pdf, other]

    cs.CL cs.AI

    Linguistic Complexity and Socio-cultural Patterns in Hip-Hop Lyrics

    Authors: Aayam Bansal, Raghav Agarwal, Kaashvi Jain

    Abstract: This paper presents a comprehensive computational framework for analyzing linguistic complexity and socio-cultural trends in hip-hop lyrics. Using a dataset of 3,814 songs from 146 influential artists spanning four decades (1980-2020), we employ natural language processing techniques to quantify multiple dimensions of lyrical complexity. Our analysis reveals a 23.7% increase in vocabulary diversit…

    Submitted 29 April, 2025; originally announced May 2025.

    Comments: 12 pages

  8. arXiv:2504.10883  [pdf, other]

    cs.CV cs.AI

    Bringing together invertible UNets with invertible attention modules for memory-efficient diffusion models

    Authors: Karan Jain, Mohammad Nayeem Teli

    Abstract: Diffusion models have recently achieved state-of-the-art performance on many image generation tasks. However, most models require significant computational resources to achieve this. This becomes apparent in the application of medical image synthesis due to the 3D nature of medical datasets such as CT scans, MRIs, and electron microscopy. In this paper we propose a novel architecture for a single GPU…

    Submitted 15 April, 2025; originally announced April 2025.

  9. arXiv:2504.07250  [pdf, other]

    cs.SE

    Improving Examples in Web API Specifications using Iterated-Calls In-Context Learning

    Authors: Kush Jain, Kiran Kate, Jason Tsay, Claire Le Goues, Martin Hirzel

    Abstract: Examples in web API specifications can be essential for API testing, API understanding, and even building chat-bots for APIs. Unfortunately, most API specifications lack human-written examples. This paper introduces a novel technique for generating examples for web API specifications. We start from in-context learning (ICL): given an API parameter, use a prompt context containing a few examples fr…

    Submitted 9 April, 2025; originally announced April 2025.

  10. arXiv:2504.05504  [pdf, other]

    cs.CV

    SelfMAD: Enhancing Generalization and Robustness in Morphing Attack Detection via Self-Supervised Learning

    Authors: Marija Ivanovska, Leon Todorov, Naser Damer, Deepak Kumar Jain, Peter Peer, Vitomir Štruc

    Abstract: With the continuous advancement of generative models, face morphing attacks have become a significant challenge for existing face verification systems due to their potential use in identity fraud and other malicious activities. Contemporary Morphing Attack Detection (MAD) approaches frequently rely on supervised, discriminative models trained on examples of bona fide and morphed images. These mode…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: Accepted at IEEE International Conference on Automatic Face and Gesture Recognition (FG 2025)

  11. arXiv:2504.02521  [pdf, other]

    cs.CL

    UNDO: Understanding Distillation as Optimization

    Authors: Kushal Jain, Piyushi Goyal, Kumar Shridhar

    Abstract: Knowledge distillation has emerged as an effective strategy for compressing large language models' (LLMs) knowledge into smaller, more efficient student models. However, standard one-shot distillation methods often produce suboptimal results due to a mismatch between teacher-generated rationales and the student's specific learning requirements. In this paper, we introduce the UNDO: UNderstanding D…

    Submitted 3 April, 2025; originally announced April 2025.

  12. arXiv:2504.00698  [pdf]

    cs.CL cs.AI cs.LG

    Command A: An Enterprise-Ready Large Language Model

    Authors: Team Cohere: Aakanksha, Arash Ahmadian, Marwan Ahmed, Jay Alammar, Milad Alizadeh, Yazeed Alnumay, Sophia Althammer, Arkady Arkhangorodsky, Viraat Aryabumi, Dennis Aumiller, Raphaël Avalos, Zahara Aviv, Sammie Bae, Saurabh Baji, Alexandre Barbet, Max Bartolo, Björn Bebensee, Neeral Beladia, Walter Beller-Morales, Alexandre Bérard, Andrew Berneshawi, Anna Bialas, Phil Blunsom, et al. (205 additional authors not shown)

    Abstract: In this report we describe the development of Command A, a powerful large language model purpose-built to excel at real-world enterprise use cases. Command A is an agent-optimised and multilingual-capable model, with support for 23 languages of global business, and a novel hybrid architecture balancing efficiency with top of the range performance. It offers best-in-class Retrieval Augmented Genera…

    Submitted 14 April, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

    Comments: 55 pages

  13. arXiv:2503.22984  [pdf, other]

    cs.CV

    Optimal Transport-Guided Source-Free Adaptation for Face Anti-Spoofing

    Authors: Zhuowei Li, Tianchen Zhao, Xiang Xu, Zheng Zhang, Zhihua Li, Xuanbai Chen, Qin Zhang, Alessandro Bergamo, Anil K. Jain, Yifan Xing

    Abstract: Developing a face anti-spoofing model that meets the security requirements of clients worldwide is challenging due to the domain gap between training datasets and diverse end-user test data. Moreover, for security and privacy reasons, it is undesirable for clients to share a large amount of their face data with service providers. In this work, we introduce a novel method in which the face anti-spo…

    Submitted 29 March, 2025; originally announced March 2025.

    Comments: 15 pages, 7 figures

    ACM Class: I.5.4; I.2.10; I.4.8; I.2.6; C.3

  14. arXiv:2503.14713  [pdf, other]

    cs.SE

    TestForge: Feedback-Driven, Agentic Test Suite Generation

    Authors: Kush Jain, Claire Le Goues

    Abstract: Automated test generation holds great promise for alleviating the burdens of manual test creation. However, existing search-based techniques compromise on test readability, while LLM-based approaches are prohibitively expensive in practice. We present TestForge, an agentic unit testing framework designed to cost-effectively generate high-quality test suites for real-world code. Our key insight is…

    Submitted 18 March, 2025; originally announced March 2025.

  15. arXiv:2502.20380  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Multi-Turn Code Generation Through Single-Step Rewards

    Authors: Arnav Kumar Jain, Gonzalo Gonzalez-Pumariega, Wayne Chen, Alexander M Rush, Wenting Zhao, Sanjiban Choudhury

    Abstract: We address the problem of code generation from multi-turn execution feedback. Existing methods either generate code without feedback or use complex, hierarchical reinforcement learning to optimize multi-turn rewards. We propose a simple yet scalable approach, $μ$Code, that solves multi-turn code generation using only single-step rewards. Our key insight is that code generation is a one-step recove…

    Submitted 27 June, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

    Comments: 9 pages (not including references or appendix); 5 figures (in main paper); (v2) camera-ready version

  16. arXiv:2502.20356  [pdf, other]

    cs.CL cs.AI cs.LG

    Bridging the Creativity Understanding Gap: Small-Scale Human Alignment Enables Expert-Level Humor Ranking in LLMs

    Authors: Kuan Lok Zhou, Jiayi Chen, Siddharth Suresh, Reuben Narad, Timothy T. Rogers, Lalit K Jain, Robert D Nowak, Bob Mankoff, Jifan Zhang

    Abstract: Large Language Models (LLMs) have shown significant limitations in understanding creative content, as demonstrated by Hessel et al. (2023)'s influential work on the New Yorker Cartoon Caption Contest (NYCCC). Their study exposed a substantial gap between LLMs and humans in humor comprehension, establishing that understanding and evaluating creative content is a key challenge in AI development. We re…

    Submitted 27 February, 2025; originally announced February 2025.

  17. arXiv:2502.19187  [pdf, other]

    cs.CL

    BIG-Bench Extra Hard

    Authors: Mehran Kazemi, Bahare Fatemi, Hritik Bansal, John Palowitch, Chrysovalantis Anastasiou, Sanket Vaibhav Mehta, Lalit K. Jain, Virginia Aglietti, Disha Jindal, Peter Chen, Nishanth Dikkala, Gladys Tyen, Xin Liu, Uri Shalit, Silvia Chiappa, Kate Olszewska, Yi Tay, Vinh Q. Tran, Quoc V. Le, Orhan Firat

    Abstract: Large language models (LLMs) are increasingly deployed in everyday applications, demanding robust general reasoning capabilities and a diverse reasoning skillset. However, current LLM reasoning benchmarks predominantly focus on mathematical and coding abilities, leaving a gap in evaluating broader reasoning proficiencies. One particular exception is the BIG-Bench dataset, which has served as a cruci…

    Submitted 6 May, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

  18. arXiv:2502.14617  [pdf, other]

    cs.DC

    Serving Models, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale

    Authors: Shashwat Jaiswal, Kunal Jain, Yogesh Simmhan, Anjaly Parayil, Ankur Mallick, Rujia Wang, Renee St. Amant, Chetan Bansal, Victor Rühle, Anoop Kulkarni, Steve Kofsky, Saravan Rajmohan

    Abstract: Large Language Model (LLM) inference workloads handled by global cloud providers can include both latency-sensitive and insensitive tasks, creating a diverse range of Service Level Agreement (SLA) requirements. Managing these mixed workloads is challenging due to the complexity of the inference stack, which includes multiple LLMs, hardware configurations, and geographic distributions. Current opti…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: 15 pages, 17 figures, 2 tables

  19. arXiv:2502.05879  [pdf, other]

    cs.CL cs.AI

    Enhancing Depression Detection with Chain-of-Thought Prompting: From Emotion to Reasoning Using Large Language Models

    Authors: Shiyu Teng, Jiaqing Liu, Rahul Kumar Jain, Shurong Chai, Ruibo Hou, Tomoko Tateyama, Lanfen Lin, Yen-wei Chen

    Abstract: Depression is one of the leading causes of disability worldwide, posing a severe burden on individuals, healthcare systems, and society at large. Recent advancements in Large Language Models (LLMs) have shown promise in addressing mental health challenges, including the detection of depression through text-based analysis. However, current LLM-based methods often struggle with nuanced symptom ident…

    Submitted 9 February, 2025; originally announced February 2025.

  20. arXiv:2501.17332  [pdf, other]

    cs.SD cs.LG eess.AS

    Compact Neural TTS Voices for Accessibility

    Authors: Kunal Jain, Eoin Murphy, Deepanshu Gupta, Jonathan Dyke, Saumya Shah, Vasilieios Tsiaras, Petko Petkov, Alistair Conkie

    Abstract: Contemporary text-to-speech solutions for accessibility applications can typically be classified into two categories: (i) device-based statistical parametric speech synthesis (SPSS) or unit selection (USEL) and (ii) cloud-based neural TTS. SPSS and USEL offer low latency and low disk footprint at the expense of naturalness and audio quality. Cloud-based neural TTS systems provide significantly bet…

    Submitted 28 January, 2025; originally announced January 2025.

    Comments: Accepted at ICASSP 2025

  21. arXiv:2501.14249  [pdf, other]

    cs.LG cs.AI cs.CL

    Humanity's Last Exam

    Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes, et al. (1084 additional authors not shown)

    Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…

    Submitted 19 April, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 29 pages, 6 figures

  22. arXiv:2501.00085  [pdf]

    cs.LG cs.AI cs.CR

    Machine Learning-Based Security Policy Analysis

    Authors: Krish Jain, Joann Sum, Pranav Kapoor, Amir Eaman

    Abstract: Security-Enhanced Linux (SELinux) is a robust security mechanism that enforces mandatory access controls (MAC), but its policy language's complexity creates challenges for policy analysis and management. This research investigates the automation of SELinux policy analysis using graph-based techniques combined with machine learning approaches to detect policy anomalies. The study addresses two key…

    Submitted 6 January, 2025; v1 submitted 30 December, 2024; originally announced January 2025.

  23. arXiv:2412.14161  [pdf, other]

    cs.CL

    TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

    Authors: Frank F. Xu, Yufan Song, Boxuan Li, Yuxuan Tang, Kritanjali Jain, Mengxue Bao, Zora Z. Wang, Xuhui Zhou, Zhitong Guo, Murong Cao, Mingyang Yang, Hao Yang Lu, Amaad Martin, Zhe Su, Leander Maben, Raj Mehta, Wayne Chi, Lawrence Jang, Yiqing Xie, Shuyan Zhou, Graham Neubig

    Abstract: We interact with computers on an everyday basis, be it in everyday life or work, and many aspects of work can be done entirely with access to a computer and the Internet. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and effect change in their surrounding environments. But how performant are AI agen…

    Submitted 19 May, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

    Comments: Preprint

  24. arXiv:2412.09641  [pdf, other]

    cs.CR cs.LG

    Machine Learning Driven Smishing Detection Framework for Mobile Security

    Authors: Diksha Goel, Hussain Ahmad, Ankit Kumar Jain, Nikhil Kumar Goel

    Abstract: The increasing reliance on smartphones for communication, financial transactions, and personal data management has made them prime targets for cyberattacks, particularly smishing, a sophisticated variant of phishing conducted via SMS. Despite the growing threat, traditional detection methods often struggle with the informal and evolving nature of SMS language, which includes abbreviations, slang,…

    Submitted 9 December, 2024; originally announced December 2024.

  25. arXiv:2411.15997  [pdf, other]

    cs.LG cs.AI cs.DC cs.MA

    Ensuring Fair LLM Serving Amid Diverse Applications

    Authors: Redwan Ibne Seraj Khan, Kunal Jain, Haiying Shen, Ankur Mallick, Anjaly Parayil, Anoop Kulkarni, Steve Kofsky, Pankhuri Choudhary, Renèe St. Amant, Rujia Wang, Yue Cheng, Ali R. Butt, Victor Rühle, Chetan Bansal, Saravan Rajmohan

    Abstract: In a multi-tenant large language model (LLM) serving platform hosting diverse applications, some users may submit an excessive number of requests, causing the service to become unavailable to other users and creating unfairness. Existing fairness approaches do not account for variations in token lengths across applications and multiple LLM calls, making them unsuitable for such platforms. To addre…

    Submitted 24 November, 2024; originally announced November 2024.

  26. arXiv:2411.15805  [pdf, other]

    cs.LG cs.AI

    Benchmarking Active Learning for NILM

    Authors: Dhruv Patel, Ankita Kumari Jain, Haikoo Khandor, Xhitij Choudhary, Nipun Batra

    Abstract: Non-intrusive load monitoring (NILM) focuses on disaggregating total household power consumption into appliance-specific usage. Many advanced NILM methods are based on neural networks that typically require substantial amounts of labeled appliance data, which can be challenging and costly to collect in real-world settings. We hypothesize that appliance data from all households does not uniformly c…

    Submitted 24 November, 2024; originally announced November 2024.

  27. arXiv:2411.13323  [pdf, other]

    cs.SE cs.AI cs.LG

    Are Large Language Models Memorizing Bug Benchmarks?

    Authors: Daniel Ramos, Claudia Mamede, Kush Jain, Paulo Canelas, Catarina Gamboa, Claire Le Goues

    Abstract: Large Language Models (LLMs) have become integral to various software engineering tasks, including code generation, bug detection, and repair. To evaluate model performance in these domains, numerous bug benchmarks containing real-world bugs from software projects have been developed. However, a growing concern within the software engineering community is that these benchmarks may not reliably ref…

    Submitted 31 March, 2025; v1 submitted 20 November, 2024; originally announced November 2024.

  28. arXiv:2411.07007  [pdf, other]

    cs.LG cs.AI

    Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching

    Authors: Arnav Kumar Jain, Harley Wiltzer, Jesse Farebrother, Irina Rish, Glen Berseth, Sanjiban Choudhury

    Abstract: In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment. Traditionally, IRL is treated as an adversarial game, where an adversary searches over reward models, and a learner optimizes the reward through repeated RL procedures. This game-solving approach is both computationally expensive and difficult to stabilize. In this…

    Submitted 22 April, 2025; v1 submitted 11 November, 2024; originally announced November 2024.

    Comments: Accepted to ICLR 2025

  29. arXiv:2410.22950  [pdf, other]

    cs.LG cs.AI cs.HC

    SpiroActive: Active Learning for Efficient Data Acquisition for Spirometry

    Authors: Ankita Kumari Jain, Nitish Sharma, Madhav Kanda, Nipun Batra

    Abstract: Respiratory illnesses are a significant global health burden. Respiratory illnesses, primarily chronic obstructive pulmonary disease (COPD), are the seventh leading cause of poor health worldwide and the third leading cause of death worldwide, causing 3.23 million deaths in 2019, necessitating early identification and diagnosis for effective mitigation. Among the diagnostic tools employed, spiromet…

    Submitted 30 October, 2024; originally announced October 2024.

  30. arXiv:2410.18111  [pdf, other]

    cs.IR cs.LG

    Data Efficiency for Large Recommendation Models

    Authors: Kshitij Jain, Jingru Xie, Kevin Regan, Cheng Chen, Jie Han, Steve Li, Zhuoshu Li, Todd Phillips, Myles Sussman, Matt Troup, Angel Yu, Jia Zhuo

    Abstract: Large recommendation models (LRMs) are fundamental to the multi-billion dollar online advertising industry, processing massive datasets of hundreds of billions of examples before transitioning to continuous online training to adapt to rapidly changing user behavior. The massive scale of data directly impacts both computational costs and the speed at which new methods can be evaluated (R&D velocity…

    Submitted 25 October, 2024; v1 submitted 8 October, 2024; originally announced October 2024.

  31. arXiv:2410.13754  [pdf, other]

    cs.AI cs.LG cs.MM

    MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures

    Authors: Jinjie Ni, Yifan Song, Deepanway Ghosal, Bo Li, David Junhao Zhang, Xiang Yue, Fuzhao Xue, Zian Zheng, Kaichen Zhang, Mahir Shah, Kabir Jain, Yang You, Michael Shieh

    Abstract: Perceiving and generating diverse modalities are crucial for AI models to effectively learn from and engage with real-world signals, necessitating reliable evaluations for their development. We identify two major issues in current evaluations: (1) inconsistent standards, shaped by different communities with varying protocols and maturity levels; and (2) significant query, grading, and generalizati…

    Submitted 18 October, 2024; v1 submitted 17 October, 2024; originally announced October 2024.

  32. arXiv:2410.09220  [pdf, other]

    cs.CL cs.CY cs.LG

    M3Hop-CoT: Misogynous Meme Identification with Multimodal Multi-hop Chain-of-Thought

    Authors: Gitanjali Kumari, Kirtan Jain, Asif Ekbal

    Abstract: In recent years, there has been a significant rise in the phenomenon of hate against women on social media platforms, particularly through the use of misogynous memes. These memes often target women with subtle and obscure cues, making their detection a challenging task for automated systems. Recently, Large Language Models (LLMs) have shown promising results in reasoning using Chain-of-Thought (C…

    Submitted 11 October, 2024; originally announced October 2024.

    Comments: 34 Pages. Accepted in The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Main Conference

  33. arXiv:2410.08744  [pdf, other]

    q-fin.TR cs.CE q-fin.CP

    No Tick-Size Too Small: A General Method for Modelling Small Tick Limit Order Books

    Authors: Konark Jain, Jean-François Muzy, Jonathan Kochems, Emmanuel Bacry

    Abstract: Tick sizes not only influence the granularity of the price formation process but also affect market agents' behavior. We investigate the disparity in the microstructural properties of the Limit Order Book (LOB) across different relative tick sizes. A key contribution of this study is the identification of several stylized facts, which are used to differentiate between large, medium, and small tick…

    Submitted 8 November, 2024; v1 submitted 11 October, 2024; originally announced October 2024.

    Comments: Presented at ALGODEFI 2024, Milan; SFMES Workshop in ICAIF Conference 2024, New York City

  34. arXiv:2410.00752  [pdf, other]

    cs.SE

    TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark

    Authors: Kush Jain, Gabriel Synnaeve, Baptiste Rozière

    Abstract: Code generation models can help improve many common software tasks ranging from code completion to defect prediction. Most of the existing benchmarks for code generation LLMs focus on code authoring or code completion. Surprisingly, there has been far less effort dedicated to benchmarking software testing, despite the strong correlation between well-tested software and effective bug detection. To…

    Submitted 18 March, 2025; v1 submitted 1 October, 2024; originally announced October 2024.

  35. arXiv:2409.00345  [pdf, other]

    cs.CV

    PS-StyleGAN: Illustrative Portrait Sketching using Attention-Based Style Adaptation

    Authors: Kushal Kumar Jain, Ankith Varun J, Anoop Namboodiri

    Abstract: Portrait sketching involves capturing identity-specific attributes of a real face with abstract lines and shades. Unlike photo-realistic images, a good portrait sketch generation method needs selective attention to detail, making the problem challenging. This paper introduces Portrait Sketching StyleGAN (PS-StyleGAN), a style transfer approach tailored for portrait sketch synthesis. We le…

    Submitted 31 August, 2024; originally announced September 2024.

  36. arXiv:2408.13510  [pdf, other]

    cs.DC eess.SY

    Intelligent Router for LLM Workloads: Improving Performance Through Workload-Aware Load Balancing

    Authors: Kunal Jain, Anjaly Parayil, Ankur Mallick, Esha Choukse, Xiaoting Qin, Jue Zhang, Íñigo Goiri, Rujia Wang, Chetan Bansal, Victor Rühle, Anoop Kulkarni, Steve Kofsky, Saravan Rajmohan

    Abstract: Large Language Model (LLM) workloads have distinct prefill and decode phases with different compute and memory requirements, which should ideally be accounted for when scheduling input queries across different LLM instances in a cluster. However, existing scheduling algorithms treat LLM workloads as monolithic jobs without considering the distinct characteristics of the two phases in each workload.…

    Submitted 7 January, 2025; v1 submitted 24 August, 2024; originally announced August 2024.

    Comments: 16 pages, 10 figures

  37. CLIP4Sketch: Enhancing Sketch to Mugshot Matching through Dataset Augmentation using Diffusion Models

    Authors: Kushal Kumar Jain, Steve Grosz, Anoop M. Namboodiri, Anil K. Jain

    Abstract: Forensic sketch-to-mugshot matching is a challenging task in face recognition, primarily hindered by the scarcity of annotated forensic sketches and the modality gap between sketches and photographs. To address this, we propose CLIP4Sketch, a novel approach that leverages diffusion models to generate a large and diverse set of sketch images, which helps in enhancing the performance of face recogni…

    Submitted 13 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

  38. arXiv:2407.10920  [pdf, other]

    cs.CV cs.AI cs.CL

    Benchmarking Vision Language Models for Cultural Understanding

    Authors: Shravan Nayak, Kanishk Jain, Rabiul Awal, Siva Reddy, Sjoerd van Steenkiste, Lisa Anne Hendricks, Karolina Stańczak, Aishwarya Agrawal

    Abstract: Foundation models and vision-language pre-training have notably advanced Vision Language Models (VLMs), enabling multimodal processing of visual and linguistic data. However, their performance has been typically assessed on general scene understanding - recognizing objects, attributes, and actions - rather than cultural comprehension. This study introduces CulturalVQA, a visual question-answering…

    Submitted 14 October, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted to EMNLP 2024 Main Conference

  39. arXiv:2407.08817  [pdf, other]

    eess.SP cs.NI

    CommRad: Context-Aware Sensing-Driven Millimeter-Wave Networks

    Authors: Ish Kumar Jain, Suriyaa MM, Dinesh Bharadia

    Abstract: Millimeter-wave (mmWave) technology is pivotal for next-generation wireless networks, enabling high-data-rate and low-latency applications such as autonomous vehicles and XR streaming. However, maintaining directional mmWave links in dynamic mobile environments is challenging due to mobility-induced disruptions and blockage. While effective, the current 5G NR beam training methods incur significan…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 12 pages

  40. arXiv:2407.03901  [pdf, other]

    cs.CV cs.LG

    DiCTI: Diffusion-based Clothing Designer via Text-guided Input

    Authors: Ajda Lampe, Julija Stopar, Deepak Kumar Jain, Shinichiro Omachi, Peter Peer, Vitomir Štruc

    Abstract: Recent developments in deep generative models have opened up a wide range of opportunities for image synthesis, leading to significant changes in various creative fields, including the fashion industry. While numerous methods have been proposed to benefit buyers, particularly in virtual try-on applications, there has been relatively less focus on facilitating fast prototyping for designers and cus…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to FG 2024

  41. arXiv:2406.06565  [pdf, other]

    cs.CL cs.AI cs.LG

    MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures

    Authors: Jinjie Ni, Fuzhao Xue, Xiang Yue, Yuntian Deng, Mahir Shah, Kabir Jain, Graham Neubig, Yang You

    Abstract: Evaluating large language models (LLMs) is challenging. Traditional ground-truth-based benchmarks fail to capture the comprehensiveness and nuance of real-world queries, while LLM-as-judge benchmarks suffer from grading biases and limited query quantity. Both of them may also become contaminated over time. User-facing evaluation, such as Chatbot Arena, provides reliable signals but is costly and s…

    Submitted 12 October, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to NeurIPS 2024

  42. arXiv:2406.00287  [pdf, other]

    cs.CV cs.AI

    GenPalm: Contactless Palmprint Generation with Diffusion Models

    Authors: Steven A. Grosz, Anil K. Jain

    Abstract: The scarcity of large-scale palmprint databases poses a significant bottleneck to advancements in contactless palmprint recognition. To address this, researchers have turned to synthetic data generation. While Generative Adversarial Networks (GANs) have been widely used, they suffer from instability and mode collapse. Recently, diffusion probabilistic models have emerged as a promising alternative…

    Submitted 31 May, 2024; originally announced June 2024.

  43. arXiv:2404.13791  [pdf, other]

    cs.CV cs.AI

    Universal Fingerprint Generation: Controllable Diffusion Model with Multimodal Conditions

    Authors: Steven A. Grosz, Anil K. Jain

    Abstract: The utilization of synthetic data for fingerprint recognition has garnered increased attention due to its potential to alleviate privacy concerns surrounding sensitive biometric data. However, current methods for generating fingerprints have limitations in creating impressions of the same finger with useful intra-class variations. To tackle this challenge, we present GenPrint, a framework to produ…

    Submitted 21 April, 2024; originally announced April 2024.

  44. arXiv:2404.07467  [pdf, other]

    cs.CV

    Trashbusters: Deep Learning Approach for Litter Detection and Tracking

    Authors: Kashish Jain, Manthan Juthani, Jash Jain, Anant V. Nimkar

    Abstract: The illegal disposal of trash is a major public health and environmental concern. Disposing of trash in unplanned places poses serious health and environmental risks. We should try to restrict public trash cans as much as possible. This research focuses on automating the penalization of litterbugs, addressing the persistent problem of littering in public places. Traditional approaches relying on m…

    Submitted 11 April, 2024; originally announced April 2024.

  45. arXiv:2401.08111  [pdf, other]

    cs.CV

    Mobile Contactless Palmprint Recognition: Use of Multiscale, Multimodel Embeddings

    Authors: Steven A. Grosz, Akash Godbole, Anil K. Jain

    Abstract: Contactless palmprints are comprised of both global and local discriminative features. Most prior work focuses on extracting global features or local features alone for palmprint matching, whereas this research introduces a novel framework that combines global and local features for enhanced palmprint matching accuracy. Leveraging recent advancements in deep learning, this study integrates a visio…

    Submitted 15 January, 2024; originally announced January 2024.

  46. arXiv:2312.11790  [pdf]

    cs.NI

    Improvement of inter-protocol fairness for BBR congestion control using machine learning

    Authors: Vaishnavi Mhaske, Khushi Jain, Sai Karthik Thatikonda, Asif Kunwar

    Abstract: Google's BBR (Bottleneck Bandwidth and Round-trip Propagation Time) approach is used to enhance internet network transmission. It is particularly intended to efficiently handle enormous amounts of data. Traditional TCP (Transmission Control Protocol) algorithms confront the most difficulty in calculating the proper quantity of data to send in order to prevent congestion and bottlenecks. This waste…

    Submitted 18 December, 2023; originally announced December 2023.

  47. arXiv:2312.08927  [pdf, other]

    q-fin.TR cs.CE q-fin.CP stat.AP

    Limit Order Book Dynamics and Order Size Modelling Using Compound Hawkes Process

    Authors: Konark Jain, Nick Firoozye, Jonathan Kochems, Philip Treleaven

    Abstract: Hawkes Process has been used to model Limit Order Book (LOB) dynamics in several ways in the literature; however, the focus has been limited to capturing the inter-event times while the order size is usually assumed to be constant. We propose a novel methodology of using Compound Hawkes Process for the LOB where each event has an order size sampled from a calibrated distribution. The process is form…

    Submitted 14 August, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Presented at Market Microstructure 2023, Quantitative Finance Workshop 2024, and Oxford SML Finance Seminar 2024; submitted to Finance Research Letters

  48. arXiv:2311.11753  [pdf, other]

    cs.CV

    AdvGen: Physical Adversarial Attack on Face Presentation Attack Detection Systems

    Authors: Sai Amrit Patnaik, Shivali Chansoriya, Anil K. Jain, Anoop M. Namboodiri

    Abstract: Evaluating the risk level of adversarial images is essential for safely deploying face authentication models in the real world. Popular approaches for physical-world attacks, such as print or replay attacks, suffer from some limitations, like including physical and geometrical artifacts. Recently, adversarial attacks have gained traction, which try to digitally deceive the learning strategy of a…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 10 pages, 9 figures, Accepted to the International Joint Conference on Biometrics (IJCB 2023)

  49. arXiv:2311.07945  [pdf, other]

    cs.CL

    First-Step Advantage: Importance of Starting Right in Multi-Step Math Reasoning

    Authors: Kushal Jain, Moritz Miller, Niket Tandon, Kumar Shridhar

    Abstract: Language models can solve complex reasoning tasks better by learning to generate rationales for their predictions. Often these models know how to solve a task but their auto-regressive decoding nature leads to incorrect results if they start incorrectly. We observe that smaller models in particular, when corrected, can solve a task that they would have otherwise struggled with. We demonstrate this…

    Submitted 1 July, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

  50. arXiv:2310.11787  [pdf, other]

    cs.LG

    NeuroCUT: A Neural Approach for Robust Graph Partitioning

    Authors: Rishi Shah, Krishnanshu Jain, Sahil Manchanda, Sourav Medya, Sayan Ranu

    Abstract: Graph partitioning aims to divide a graph into disjoint subsets while optimizing a specific partitioning objective. The majority of formulations related to graph partitioning exhibit NP-hardness due to their combinatorial nature. Conventional methods, like approximation algorithms or heuristics, are designed for distinct partitioning objectives and fail to achieve generalization across other impor…

    Submitted 21 June, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: To appear in Knowledge Discovery and Data Mining (KDD), 2024