Showing 1–50 of 71 results for author: Agarwal, D

  1. arXiv:2507.00310  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Open-ended Scientific Discovery via Bayesian Surprise

    Authors: Dhruv Agarwal, Bodhisattwa Prasad Majumder, Reece Adamson, Megha Chakravorty, Satvika Reddy Gavireddy, Aditya Parashar, Harshit Surana, Bhavana Dalvi Mishra, Andrew McCallum, Ashish Sabharwal, Peter Clark

    Abstract: The promise of autonomous scientific discovery (ASD) hinges not only on answering questions, but also on knowing which questions to ask. Most recent works in ASD explore the use of large language models (LLMs) in goal-driven settings, relying on human-specified research questions to guide hypothesis generation. However, scientific discovery may be accelerated further by allowing the AI system to d…

    Submitted 30 June, 2025; originally announced July 2025.

  2. arXiv:2506.11007  [pdf, other]

    cs.SE cs.AI

    Impact of Comments on LLM Comprehension of Legacy Code

    Authors: Rock Sabetto, Emily Escamilla, Devesh Agarwal, Sujay Kandwal, Justin F. Brunelle, Scott Rosen, Nitin Naik, Samruddhi Thaker, Eric O. Scott, Jacob Zimmer, Amit Madan, Arun Sridharan, Doug Wendt, Michael Doyle, Christopher Glasz, Jasper Phillips, William Macke, Colin Diggs, Michael Bartholf, Zachary Robin, Paul Ursino

    Abstract: Large language models (LLMs) have been increasingly integrated into software engineering and maintenance tasks due to their high performance with software engineering tasks and robust understanding of modern programming languages. However, the ability of LLMs to comprehend code written with legacy languages remains a research gap challenged by real-world legacy systems lacking or containing inaccu…

    Submitted 23 April, 2025; originally announced June 2025.

  3. arXiv:2505.22995  [pdf, ps, other]

    eess.AS cs.SD

    LLM-Synth4KWS: Scalable Automatic Generation and Synthesis of Confusable Data for Custom Keyword Spotting

    Authors: Pai Zhu, Quan Wang, Dhruuv Agarwal, Kurt Partridge

    Abstract: Custom keyword spotting (KWS) allows detecting user-defined spoken keywords from streaming audio. This is achieved by comparing the embeddings from voice enrollments and input audio. State-of-the-art custom KWS models are typically trained contrastively using utterances whose keywords are randomly sampled from training dataset. These KWS models often struggle with confusing keywords, such as "blue…

    Submitted 28 May, 2025; originally announced May 2025.

  4. arXiv:2505.21548  [pdf, ps, other]

    cs.CL cs.AI cs.CY physics.soc-ph

    Fluent but Culturally Distant: Can Regional Training Teach Cultural Understanding?

    Authors: Dhruv Agarwal, Anya Shukla, Sunayana Sitaram, Aditya Vashistha

    Abstract: Large language models (LLMs) are used around the world but exhibit Western cultural tendencies. To address this cultural misalignment, many countries have begun developing "regional" LLMs tailored to local communities. Yet it remains unclear whether these models merely speak the language of their users or also reflect their cultural values and practices. Using India as a case study, we evaluate fi…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: Under review

  5. arXiv:2505.18878  [pdf, other]

    cs.CL cs.AI

    CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions

    Authors: Kung-Hsiang Huang, Akshara Prabhakar, Onkar Thorat, Divyansh Agarwal, Prafulla Kumar Choubey, Yixin Mao, Silvio Savarese, Caiming Xiong, Chien-Sheng Wu

    Abstract: While AI agents hold transformative potential in business, effective performance benchmarking is hindered by the scarcity of public, realistic business data on widely used platforms. Existing benchmarks often lack fidelity in their environments, data, and agent-user interactions, with limited coverage of diverse business scenarios and industries. To address these gaps, we introduce CRMArena-Pro, a…

    Submitted 24 May, 2025; originally announced May 2025.

  6. arXiv:2505.14814  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    GraphemeAug: A Systematic Approach to Synthesized Hard Negative Keyword Spotting Examples

    Authors: Harry Zhang, Kurt Partridge, Pai Zhu, Neng Chen, Hyun Jin Park, Dhruuv Agarwal, Quan Wang

    Abstract: Spoken Keyword Spotting (KWS) is the task of distinguishing between the presence and absence of a keyword in audio. The accuracy of a KWS model hinges on its ability to correctly classify examples close to the keyword and non-keyword boundary. These boundary examples are often scarce in training data, limiting model performance. In this paper, we propose a method to systematically generate adversa…

    Submitted 24 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: Accepted at Interspeech 2025

  7. arXiv:2505.09819  [pdf, other]

    cs.HC cs.CV cs.LG eess.SY

    Visual Feedback of Pattern Separability Improves Myoelectric Decoding Performance of Upper Limb Prostheses

    Authors: Ruichen Yang, György M. Lévay, Christopher L. Hunt, Dániel Czeiner, Megan C. Hodgson, Damini Agarwal, Rahul R. Kaliki, Nitish V. Thakor

    Abstract: State-of-the-art upper limb myoelectric prostheses often use pattern recognition (PR) control systems that translate electromyography (EMG) signals into desired movements. As prosthesis movement complexity increases, users often struggle to produce sufficiently distinct EMG patterns for reliable classification. Existing training typically involves heuristic, trial-and-error user adjustments to sta…

    Submitted 15 May, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

  8. arXiv:2504.12417  [pdf]

    cs.AI

    Interpretable AI-driven Guidelines for Type 2 Diabetes Treatment from Observational Data

    Authors: Dewang Kumar Agarwal, Dimitris J. Bertsimas

    Abstract: Objective: Create precise, structured, data-backed guidelines for type 2 diabetes treatment progression, suitable for clinical adoption. Research Design and Methods: Our training cohort was composed of patient (with type 2 diabetes) visits from Boston Medical Center (BMC) from 1998 to 2014. We divide visits into 4 groups based on the patient's treatment regimen before the visit, and further divi…

    Submitted 16 April, 2025; originally announced April 2025.

  9. arXiv:2503.06550  [pdf, other]

    cs.CL

    BingoGuard: LLM Content Moderation Tools with Risk Levels

    Authors: Fan Yin, Philippe Laban, Xiangyu Peng, Yilun Zhou, Yixin Mao, Vaibhav Vats, Linnea Ross, Divyansh Agarwal, Caiming Xiong, Chien-Sheng Wu

    Abstract: Malicious content generated by large language models (LLMs) can pose varying degrees of harm. Although existing LLM-based moderators can detect harmful content, they struggle to assess risk levels and may miss lower-risk outputs. Accurate risk assessment allows platforms with different safety thresholds to tailor content filtering and rejection. In this paper, we introduce per-topic severity rubri…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: 10 pages, 4 figures, 4 tables. ICLR 2025 poster

  10. arXiv:2411.09969  [pdf, other]

    cs.HC cs.AI

    Steering AI-Driven Personalization of Scientific Text for General Audiences

    Authors: Taewook Kim, Dhruv Agarwal, Jordan Ackerman, Manaswi Saha

    Abstract: Digital media platforms (e.g., social media, science blogs) offer opportunities to communicate scientific content to general audiences at scale. However, these audiences vary in their scientific expertise, literacy levels, and personal backgrounds, making effective science communication challenging. To address this challenge, we designed TranSlider, an AI-powered tool that generates personalized t…

    Submitted 15 November, 2024; originally announced November 2024.

    Comments: 23 pages, 5 figures, 1 table

  11. arXiv:2410.23252  [pdf, other]

    cs.CL

    Evaluating Cultural and Social Awareness of LLM Web Agents

    Authors: Haoyi Qiu, Alexander R. Fabbri, Divyansh Agarwal, Kung-Hsiang Huang, Sarah Tan, Nanyun Peng, Chien-Sheng Wu

    Abstract: As large language models (LLMs) expand into performing as agents for real-world applications beyond traditional NLP tasks, evaluating their robustness becomes increasingly important. However, existing benchmarks often overlook critical dimensions like cultural and social awareness. To address these, we introduce CASA, a benchmark designed to assess LLM agents' sensitivity to cultural and social no…

    Submitted 8 March, 2025; v1 submitted 30 October, 2024; originally announced October 2024.

    Comments: NAACL 2025 Findings

  12. arXiv:2410.16647  [pdf, other]

    eess.AS cs.AI cs.LG

    GE2E-KWS: Generalized End-to-End Training and Evaluation for Zero-shot Keyword Spotting

    Authors: Pai Zhu, Jacob W. Bartel, Dhruuv Agarwal, Kurt Partridge, Hyun Jin Park, Quan Wang

    Abstract: We propose GE2E-KWS -- a generalized end-to-end training and evaluation framework for customized keyword spotting. Specifically, enrollment utterances are separated and grouped by keywords from the training batch and their embedding centroids are compared to all other test utterance embeddings to compute the loss. This simulates runtime enrollment and verification stages, and improves convergence…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: 8 pages, 6 figures, 2 tables. The paper is accepted at IEEE Spoken Language Technology (SLT) 2024

  13. arXiv:2410.07168  [pdf, other]

    cs.CL cs.SD eess.AS

    Sylber: Syllabic Embedding Representation of Speech from Raw Audio

    Authors: Cheol Jun Cho, Nicholas Lee, Akshat Gupta, Dhruv Agarwal, Ethan Chen, Alan W Black, Gopala K. Anumanchipalli

    Abstract: Syllables are compositional units of spoken language that efficiently structure human speech perception and production. However, current neural speech representations lack such structure, resulting in dense token sequences that are costly to process. To bridge this gap, we propose a new model, Sylber, that produces speech representations with clean and robust syllabic structure. Specifically, we p…

    Submitted 2 March, 2025; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: Accepted at ICLR 2025

  14. ViDAS: Vision-based Danger Assessment and Scoring

    Authors: Pranav Gupta, Advith Krishnan, Naman Nanda, Ananth Eswar, Deeksha Agarwal, Pratham Gohil, Pratyush Goel

    Abstract: We present a novel dataset aimed at advancing danger analysis and assessment by addressing the challenge of quantifying danger in video content and identifying how human-like a Large Language Model (LLM) evaluator is for the same. This is achieved by compiling a collection of 100 YouTube videos featuring various events. Each video is annotated by human participants who provided danger ratings on a…

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: Preprint

  15. AI Suggestions Homogenize Writing Toward Western Styles and Diminish Cultural Nuances

    Authors: Dhruv Agarwal, Mor Naaman, Aditya Vashistha

    Abstract: Large language models (LLMs) are being increasingly integrated into everyday products and services, such as coding tools and writing assistants. As these embedded AI applications are deployed globally, there is a growing concern that the AI models underlying these applications prioritize Western values. This paper investigates what happens when a Western-centric AI model provides writing suggestio…

    Submitted 12 March, 2025; v1 submitted 17 September, 2024; originally announced September 2024.

    Comments: Accepted at CHI 2025

  16. arXiv:2408.10463  [pdf, other]

    cs.SD cs.LG eess.AS

    Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting

    Authors: Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob Bartel, Kyle Kastner, Gary Wang, Andrew Rosenberg, Quan Wang

    Abstract: The keyword spotting (KWS) problem requires large amounts of real speech training data to achieve high accuracy across diverse populations. Utilizing large amounts of text-to-speech (TTS) synthesized data can reduce the cost and time associated with KWS development. However, TTS data may contain artifacts not present in real speech, which the KWS model can exploit (overfit), leading to degraded ac…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: to be published in a Workshop at Interspeech 2024, Synthetic Data's Transformative Role in Foundational Speech Models

  17. arXiv:2407.18879  [pdf, other]

    cs.SD cs.LG eess.AS

    Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model

    Authors: Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob Bartel, Kyle Kastner, Gary Wang, Andrew Rosenberg, Quan Wang

    Abstract: This paper explores the use of TTS synthesized training data for the KWS (keyword spotting) task while minimizing development cost and time. Keyword spotting models require a huge amount of training data to be accurate, and obtaining such training data can be costly. In the current state of the art, TTS models can generate large amounts of natural-sounding data, which can help reduce cost and time f…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: to be published in a Workshop at Interspeech 2024, Synthetic Data's Transformative Role in Foundational Speech Models

  18. arXiv:2407.16840  [pdf, other]

    eess.AS cs.AI

    Synth4Kws: Synthesized Speech for User Defined Keyword Spotting in Low Resource Environments

    Authors: Pai Zhu, Dhruuv Agarwal, Jacob W. Bartel, Kurt Partridge, Hyun Jin Park, Quan Wang

    Abstract: One of the challenges in developing a high quality custom keyword spotting (KWS) model is the lengthy and expensive process of collecting training data covering a wide range of languages, phrases and speaking styles. We introduce Synth4Kws - a framework to leverage Text to Speech (TTS) synthesized data for custom KWS in different resource settings. With no real data, we found increasing TTS phrase…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 5 pages, 5 figures, 2 tables. The paper is accepted at the Interspeech SynData4GenAI 2024 Workshop - https://syndata4genai.org/#call-for-papers

  19. arXiv:2407.01725  [pdf, other]

    cs.CL cs.AI cs.LG

    DiscoveryBench: Towards Data-Driven Discovery with Large Language Models

    Authors: Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Bhavana Dalvi Mishra, Abhijeetsingh Meena, Aryan Prakhar, Tirth Vora, Tushar Khot, Ashish Sabharwal, Peter Clark

    Abstract: Can the rapid advances in code generation, function calling, and data analysis using large language models (LLMs) help automate the search and verification of hypotheses purely from a set of provided datasets? To evaluate this question, we present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. The benchmark is designed to systemat…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Website: https://github.com/allenai/discoverybench

  20. arXiv:2406.12998  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Coding Speech through Vocal Tract Kinematics

    Authors: Cheol Jun Cho, Peter Wu, Tejas S. Prabhune, Dhruv Agarwal, Gopala K. Anumanchipalli

    Abstract: Vocal tract articulation is a natural, grounded control space of speech production. The spatiotemporal coordination of articulators combined with the vocal source shapes intelligible speech sounds to enable effective spoken communication. Based on this physiological grounding of speech, we propose a new framework of neural encoding-decoding of speech -- Speech Articulatory Coding (SPARC). SPARC co…

    Submitted 14 December, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Journal ref: IEEE Journal of Selected Topics in Signal Processing, vol. 18, no. 8, pp. 1427-1440, Dec. 2024

  21. arXiv:2406.10750  [pdf, other]

    cs.HC

    EchoGuide: Active Acoustic Guidance for LLM-Based Eating Event Analysis from Egocentric Videos

    Authors: Vineet Parikh, Saif Mahmud, Devansh Agarwal, Ke Li, François Guimbretière, Cheng Zhang

    Abstract: Self-recording eating behaviors is a step towards a healthy lifestyle recommended by many health professionals. However, the current practice of manually recording eating activities using paper records or smartphone apps is often unsustainable and inaccurate. Smart glasses have emerged as a promising wearable form factor for tracking eating behaviors, but existing systems primarily identify when e…

    Submitted 31 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted at ISWC '24

  22. SonicID: User Identification on Smart Glasses with Acoustic Sensing

    Authors: Ke Li, Devansh Agarwal, Ruidong Zhang, Vipin Gunda, Tianjun Mo, Saif Mahmud, Boao Chen, François Guimbretière, Cheng Zhang

    Abstract: Smart glasses have become more prevalent as they provide an increasing number of applications for users. They store various types of private information or can access it via connections established with other devices. Therefore, there is a growing need for user identification on smart glasses. In this paper, we introduce a low-power and minimally-obtrusive system called SonicID, designed to authen…

    Submitted 24 October, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 27 pages, 6 tables, 9 figures

  23. MunchSonic: Tracking Fine-grained Dietary Actions through Active Acoustic Sensing on Eyeglasses

    Authors: Saif Mahmud, Devansh Agarwal, Ashwin Ajit, Qikang Liang, Thalia Viranda, Francois Guimbretiere, Cheng Zhang

    Abstract: We introduce MunchSonic, an AI-powered active acoustic sensing system integrated into eyeglasses to track fine-grained dietary actions. MunchSonic emits inaudible ultrasonic waves from the eyeglass frame, with the reflected signals capturing detailed positions and movements of body parts, including the mouth, jaw, arms, and hands involved in eating. These signals are processed by a deep learning p…

    Submitted 2 August, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: 8 pages, 7 figures

  24. arXiv:2405.20254  [pdf, other]

    cs.HC cs.CY

    Conversational Agents to Facilitate Deliberation on Harmful Content in WhatsApp Groups

    Authors: Dhruv Agarwal, Farhana Shahid, Aditya Vashistha

    Abstract: WhatsApp groups have become a hotbed for the propagation of harmful content including misinformation, hate speech, polarizing content, and rumors, especially in Global South countries. Given the platform's end-to-end encryption, moderation responsibilities lie on group admins and members, who rarely contest such content. Another approach is fact-checking, which is unscalable, and can only contest…

    Submitted 16 August, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted at CSCW 2024

  25. arXiv:2404.16251  [pdf, other]

    cs.CR cs.AI cs.CL

    Prompt Leakage effect and defense strategies for multi-turn LLM interactions

    Authors: Divyansh Agarwal, Alexander R. Fabbri, Ben Risher, Philippe Laban, Shafiq Joty, Chien-Sheng Wu

    Abstract: Prompt leakage poses a compelling security and privacy threat in LLM applications. Leakage of system prompts may compromise intellectual property, and act as adversarial reconnaissance for an attacker. A systematic evaluation of prompt leakage threats and mitigation strategies is lacking, especially for multi-turn LLM interactions. In this paper, we systematically investigate LLM vulnerabilities a…

    Submitted 29 July, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  26. arXiv:2404.13924  [pdf, other]

    cs.HC cs.ET

    ActSonic: Recognizing Everyday Activities from Inaudible Acoustic Wave Around the Body

    Authors: Saif Mahmud, Vineet Parikh, Qikang Liang, Ke Li, Ruidong Zhang, Ashwin Ajit, Vipin Gunda, Devansh Agarwal, François Guimbretière, Cheng Zhang

    Abstract: We present ActSonic, an intelligent, low-power active acoustic sensing system integrated into eyeglasses that can recognize 27 different everyday activities (e.g., eating, drinking, toothbrushing) from inaudible acoustic waves around the body. It requires only a pair of miniature speakers and microphones mounted on each hinge of the eyeglasses to emit ultrasonic waves, creating an acoustic aura ar…

    Submitted 25 November, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 8, Issue 4, November 2024, IMWUT/UbiComp 2025

  27. arXiv:2404.12980  [pdf, other]

    cs.HC

    Ring-a-Pose: A Ring for Continuous Hand Pose Tracking

    Authors: Tianhong Catherine Yu, Guilin Hu, Ruidong Zhang, Hyunchul Lim, Saif Mahmud, Chi-Jung Lee, Ke Li, Devansh Agarwal, Shuyang Nie, Jinseok Oh, François Guimbretière, Cheng Zhang

    Abstract: We present Ring-a-Pose, a single untethered ring that tracks continuous 3D hand poses. Located in the center of the hand, the ring emits an inaudible acoustic signal that each hand pose reflects differently. Ring-a-Pose imposes minimal obtrusions on the hand, unlike multi-ring or glove systems. It is not affected by the choice of clothing that may cover wrist-worn systems. In a series of three use…

    Submitted 11 November, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  28. arXiv:2404.12541  [pdf, other]

    cs.CV

    GenVideo: One-shot Target-image and Shape Aware Video Editing using T2I Diffusion Models

    Authors: Sai Sree Harsha, Ambareesh Revanur, Dhwanit Agarwal, Shradha Agrawal

    Abstract: Video editing methods based on diffusion models that rely solely on a text prompt for the edit are hindered by the limited expressive power of text prompts. Thus, incorporating a reference target image as a visual guide becomes desirable for precise control over edit. Also, most existing methods struggle to accurately edit a video when the shape and size of the object in the target image differ fr…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: CVPRw 2024

  29. arXiv:2402.13610  [pdf, other]

    cs.CL cs.AI cs.LG

    Data-driven Discovery with Large Generative Models

    Authors: Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Sanchaita Hazra, Ashish Sabharwal, Peter Clark

    Abstract: With the accumulation of data at an unprecedented rate, its potential to fuel scientific discovery is growing exponentially. This position paper urges the Machine Learning (ML) community to exploit the capabilities of large generative models (LGMs) to develop automated systems for end-to-end data-driven discovery -- a paradigm encompassing the search and verification of hypotheses purely from a se…

    Submitted 21 February, 2024; originally announced February 2024.

  30. EchoWrist: Continuous Hand Pose Tracking and Hand-Object Interaction Recognition Using Low-Power Active Acoustic Sensing On a Wristband

    Authors: Chi-Jung Lee, Ruidong Zhang, Devansh Agarwal, Tianhong Catherine Yu, Vipin Gunda, Oliver Lopez, James Kim, Sicheng Yin, Boao Dong, Ke Li, Mose Sakashita, Francois Guimbretiere, Cheng Zhang

    Abstract: Our hands serve as a fundamental means of interaction with the world around us. Therefore, understanding hand poses and interaction context is critical for human-computer interaction. We present EchoWrist, a low-power wristband that continuously estimates 3D hand pose and recognizes hand-object interactions using active acoustic sensing. EchoWrist is equipped with two speakers emitting inaudible s…

    Submitted 29 March, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

  31. One Style Does Not Regulate All: Moderation Practices in Public and Private WhatsApp Groups

    Authors: Farhana Shahid, Dhruv Agarwal, Aditya Vashistha

    Abstract: WhatsApp is the largest social media platform in the Global South and is a virulent force in global misinformation and political propaganda. Due to end-to-end encryption, WhatsApp can barely review any content and mostly relies on volunteer moderation by group admins. Yet, little is known about how WhatsApp group admins manage their groups, what factors and values influence moderation decisions, and…

    Submitted 2 January, 2025; v1 submitted 15 January, 2024; originally announced January 2024.

  32. arXiv:2311.15516  [pdf, other]

    eess.SY cs.AI cs.LG

    Active Foundational Models for Fault Diagnosis of Electrical Motors

    Authors: Sriram Anbalagan, Sai Shashank GP, Deepesh Agarwal, Balasubramaniam Natarajan, Babji Srinivasan

    Abstract: Fault detection and diagnosis of electrical motors are of utmost importance in ensuring the safe and reliable operation of several industrial systems. Detection and diagnosis of faults at the incipient stage allows corrective actions to be taken in order to reduce the severity of faults. The existing data-driven deep learning approaches for machine fault diagnosis rely extensively on huge amounts…

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: 30 pages, 2 figures, 7 tables

  33. arXiv:2311.15301  [pdf]

    eess.IV cs.CV cs.LG

    Eye Disease Prediction using Ensemble Learning and Attention on OCT Scans

    Authors: Gauri Naik, Nandini Narvekar, Dimple Agarwal, Nishita Nandanwar, Himangi Pande

    Abstract: Eye diseases have posed significant challenges for decades, but advancements in technology have opened new avenues for their detection and treatment. Machine learning and deep learning algorithms have become instrumental in this domain, particularly when combined with Optical Coherence Tomography (OCT) imaging. We propose a novel method for efficient detection of eye diseases from OCT images. Our t…

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: Full paper accepted at FICC (Springer) 2024

  34. arXiv:2311.07850  [pdf, other]

    cs.CL cs.AI cs.DB cs.LG

    Bring Your Own KG: Self-Supervised Program Synthesis for Zero-Shot KGQA

    Authors: Dhruv Agarwal, Rajarshi Das, Sopan Khosla, Rashmi Gangadharaiah

    Abstract: We present BYOKG, a universal question-answering (QA) system that can operate on any knowledge graph (KG), requires no human-annotated training data, and can be ready to use within a day -- attributes that are out-of-scope for current KGQA systems. BYOKG draws inspiration from the remarkable ability of humans to comprehend information present in an unseen KG through exploration -- starting at rand…

    Submitted 21 May, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

  35. arXiv:2309.14556  [pdf, other]

    cs.CL cs.AI cs.HC

    Art or Artifice? Large Language Models and the False Promise of Creativity

    Authors: Tuhin Chakrabarty, Philippe Laban, Divyansh Agarwal, Smaranda Muresan, Chien-Sheng Wu

    Abstract: Researchers have argued that large language models (LLMs) exhibit high-quality writing capabilities from blogs to stories. However, evaluating objectively the creativity of a piece of writing is challenging. Inspired by the Torrance Test of Creative Thinking (TTCT), which measures creativity as a process, we use the Consensual Assessment Technique [3] and propose the Torrance Test of Creative Writ…

    Submitted 8 March, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: ACM CHI 2024

  36. arXiv:2307.16891  [pdf, other]

    eess.SY cs.AI cs.LG

    Foundational Models for Fault Diagnosis of Electrical Motors

    Authors: Sriram Anbalagan, Deepesh Agarwal, Balasubramaniam Natarajan, Babji Srinivasan

    Abstract: A majority of recent advancements related to the fault diagnosis of electrical motors are based on the assumption that training and testing data are drawn from the same distribution. However, the data distribution can vary across different operating conditions during real-world operating scenarios of electrical motors. Consequently, this assumption limits the practical implementation of existing s…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: 7 pages, 1 figure, 5 tables, submitted to IEEE PESGRE 2023

  37. arXiv:2307.04610  [pdf, other]

    cs.CV

    SPLAL: Similarity-based pseudo-labeling with alignment loss for semi-supervised medical image classification

    Authors: Md Junaid Mahmood, Pranaw Raj, Divyansh Agarwal, Suruchi Kumari, Pravendra Singh

    Abstract: Medical image classification is a challenging task due to the scarcity of labeled samples and class imbalance caused by the high variance in disease prevalence. Semi-supervised learning (SSL) methods can mitigate these challenges by leveraging both labeled and unlabeled data. However, SSL methods for medical image classification need to address two key challenges: (1) estimating reliable pseudo-la…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: Under Review

  38. arXiv:2305.14815  [pdf, other]

    cs.CL cs.IR

    Machine Reading Comprehension using Case-based Reasoning

    Authors: Dung Thai, Dhruv Agarwal, Mudit Chaudhary, Wenlong Zhao, Rajarshi Das, Manzil Zaheer, Jay-Yoon Lee, Hannaneh Hajishirzi, Andrew McCallum

    Abstract: We present an accurate and interpretable method for answer extraction in machine reading comprehension that is reminiscent of case-based reasoning (CBR) from classical AI. Our method (CBR-MRC) builds upon the hypothesis that contextualized answers to similar questions share semantic similarities with each other. Given a test question, CBR-MRC first retrieves a set of similar cases from a nonparame…

    Submitted 5 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: 9 pages, 2 figures

  39. arXiv:2305.14540  [pdf, other]

    cs.CL

    LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond

    Authors: Philippe Laban, Wojciech Kryściński, Divyansh Agarwal, Alexander R. Fabbri, Caiming Xiong, Shafiq Joty, Chien-Sheng Wu

    Abstract: With the recent appearance of LLMs in practical settings, having methods that can effectively detect factual inconsistencies is crucial to reduce the propagation of misinformation and improve trust in model outputs. When testing on existing factual consistency benchmarks, we find that a few large language models (LLMs) perform competitively on classification benchmarks for factual inconsistency de…

    Submitted 23 May, 2023; originally announced May 2023.

  40. arXiv:2303.05031  [pdf, other]

    cs.CV

    CoralStyleCLIP: Co-optimized Region and Layer Selection for Image Editing

    Authors: Ambareesh Revanur, Debraj Basu, Shradha Agrawal, Dhwanit Agarwal, Deepak Pai

    Abstract: Edit fidelity is a significant issue in open-world controllable generative image editing. Recently, CLIP-based approaches have traded off simplicity to alleviate these problems by introducing spatial attention in a handpicked layer of a StyleGAN. In this paper, we propose CoralStyleCLIP, which incorporates a multi-layer attention-guided blending strategy in the feature space of StyleGAN2 for obtai…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  41. arXiv:2212.08841  [pdf, other]

    cs.CL cs.IR

    AugTriever: Unsupervised Dense Retrieval and Domain Adaptation by Scalable Data Augmentation

    Authors: Rui Meng, Ye Liu, Semih Yavuz, Divyansh Agarwal, Lifu Tu, Ning Yu, Jianguo Zhang, Meghana Bhat, Yingbo Zhou

    Abstract: Dense retrievers have made significant strides in text retrieval and open-domain question answering. However, most of these achievements have relied heavily on extensive human-annotated supervision. In this study, we aim to develop unsupervised methods for improving dense retrieval models. We propose two approaches that enable annotation-free and scalable training by creating pseudo query-document…

    Submitted 29 October, 2024; v1 submitted 17 December, 2022; originally announced December 2022.

    Comments: DCAI24, October 25, 2024, Boise, ID

  42. arXiv:2211.05886  [pdf, ps, other]

    cs.CL

    CREATIVESUMM: Shared Task on Automatic Summarization for Creative Writing

    Authors: Divyansh Agarwal, Alexander R. Fabbri, Simeng Han, Wojciech Kryściński, Faisal Ladhak, Bryan Li, Kathleen McKeown, Dragomir Radev, Tianyi Zhang, Sam Wiseman

    Abstract: This paper introduces the shared task of summarizing documents in several creative domains, namely literary texts, movie scripts, and television scripts. Summarizing these creative documents requires making complex literary interpretations, as well as understanding non-trivial temporal dependencies in texts containing varied styles of plot development and narrative structure. This poses unique cha…

    Submitted 6 December, 2022; v1 submitted 10 November, 2022; originally announced November 2022.

    Comments: 4 pages + 3 for references and appendix

  43. arXiv:2210.10163  [pdf, other]

    cs.CV cs.CL

    MedCLIP: Contrastive Learning from Unpaired Medical Images and Text

    Authors: Zifeng Wang, Zhenbang Wu, Dinesh Agarwal, Jimeng Sun

    Abstract: Existing vision-text contrastive learning like CLIP aims to match the paired image and caption embeddings while pushing others apart, which improves representation transferability and supports zero-shot prediction. However, medical image-text datasets are orders of magnitude below the general images and captions from the internet. Moreover, previous methods encounter many false negatives, i.e., im…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

  44. arXiv:2206.07922  [pdf, other]

    cs.LG

    Challenges and Opportunities in Deep Reinforcement Learning with Graph Neural Networks: A Comprehensive review of Algorithms and Applications

    Authors: Sai Munikoti, Deepesh Agarwal, Laya Das, Mahantesh Halappanavar, Balasubramaniam Natarajan

    Abstract: Deep reinforcement learning (DRL) has empowered a variety of artificial intelligence fields, including pattern recognition, robotics, recommendation-systems, and gaming. Similarly, graph neural networks (GNN) have also demonstrated their superior performance in supervised learning for graph-structured data. In recent times, the fusion of GNN with DRL for graph-structured environments has attracted…

    Submitted 7 November, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: 20 pages, 3 figures, 2 tables

  45. arXiv:2205.09968  [pdf, other]

    cs.LG stat.ML

    A General Framework for quantifying Aleatoric and Epistemic uncertainty in Graph Neural Networks

    Authors: Sai Munikoti, Deepesh Agarwal, Laya Das, Balasubramaniam Natarajan

    Abstract: Graph Neural Networks (GNN) provide a powerful framework that elegantly integrates Graph theory with Machine learning for modeling and analysis of networked data. We consider the problem of quantifying the uncertainty in predictions of GNN stemming from modeling errors and measurement uncertainty. We consider aleatoric uncertainty in the form of probabilistic links and noise in feature vector of n…

    Submitted 20 May, 2022; originally announced May 2022.

    Comments: 10 pages, 1 figure, 6 Tables

  46. arXiv:2204.11716  [pdf, other]

    cs.CV cs.AI cs.LG q-bio.OT

    Masked Image Modeling Advances 3D Medical Image Analysis

    Authors: Zekai Chen, Devansh Agarwal, Kshitij Aggarwal, Wiem Safta, Samit Hirawat, Venkat Sethuraman, Mariann Micsinai Balan, Kevin Brown

    Abstract: Recently, masked image modeling (MIM) has gained considerable attention due to its capacity to learn from vast amounts of unlabeled data and has been demonstrated to be effective on a wide variety of vision tasks involving natural images. Meanwhile, the potential of self-supervised learning in modeling 3D medical images is anticipated to be immense due to the high quantities of unlabeled images, a…

    Submitted 23 August, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

    Comments: 8 pages, 6 figures, 9 tables; Accepted by WACV2023

  47. arXiv:2204.08364  [pdf, other]

    cs.CV

    Detecting, Tracking and Counting Motorcycle Rider Traffic Violations on Unconstrained Roads

    Authors: Aman Goyal, Dev Agarwal, Anbumani Subramanian, C. V. Jawahar, Ravi Kiran Sarvadevabhatla, Rohit Saluja

    Abstract: In many Asian countries with unconstrained road traffic conditions, driving violations such as not wearing helmets and triple-riding are a significant source of fatalities involving motorcycles. Identifying and penalizing such riders is vital in curbing road accidents and improving citizens' safety. With this motivation, we propose an approach for detecting, tracking, and counting motorcycle ridin…

    Submitted 18 April, 2022; originally announced April 2022.

    Comments: 10 pages, 9 figures, Accepted at The 5th Workshop and Prize Challenge: Bridging the Gap between Computational Photography and Visual Recognition (UG2+) in conjunction with IEEE CVPR 2022

  48. arXiv:2202.12441  [pdf, other]

    cs.LG math.OC stat.AP

    Long-Term Missing Value Imputation for Time Series Data Using Deep Neural Networks

    Authors: Jangho Park, Juliane Muller, Bhavna Arora, Boris Faybishenko, Gilberto Pastorello, Charuleka Varadharajan, Reetik Sahu, Deborah Agarwal

    Abstract: We present an approach that uses a deep learning model, in particular, a MultiLayer Perceptron (MLP), for estimating the missing values of a variable in multivariate time series data. We focus on filling a long continuous gap (e.g., multiple months of missing daily observations) rather than on individual randomly missing observations. Our proposed gap filling algorithm uses an automated method for…

    Submitted 24 February, 2022; originally announced February 2022.

  49. Multimodal Personality Recognition using Cross-Attention Transformer and Behaviour Encoding

    Authors: Tanay Agrawal, Dhruv Agarwal, Michal Balazia, Neelabh Sinha, Francois Bremond

    Abstract: Personality computing and affective computing have gained recent interest in many research areas. The datasets for the task generally have multiple modalities like video, audio, language and bio-signals. In this paper, we propose a flexible model for the task which exploits all available data. The task involves complex relations and to avoid using a large model for video processing specifically, w…

    Submitted 12 January, 2023; v1 submitted 22 December, 2021; originally announced December 2021.

    Comments: Preprint. Final paper accepted at the 17th International Conference on Computer Vision Theory and Applications (VISAPP), virtual, February, 2022. 8 pages

    MSC Class: 68T05; 68T10 ACM Class: I.5

  50. arXiv:2110.08270  [pdf, other]

    cs.LG cs.CL

    From Multimodal to Unimodal Attention in Transformers using Knowledge Distillation

    Authors: Dhruv Agarwal, Tanay Agrawal, Laura M. Ferrari, François Bremond

    Abstract: Multimodal Deep Learning has garnered much interest, and transformers have triggered novel approaches, thanks to the cross-attention mechanism. Here we propose an approach to deal with two key existing challenges: the high computational resource demanded and the issue of missing modalities. We introduce for the first time the concept of knowledge distillation in transformers to use only one modali…

    Submitted 19 October, 2021; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: Preprint. Final paper accepted at the 17th IEEE International Conference on Advanced Video and Signal-based Surveillance, AVSS 2021, Virtual, November 16-19, 2021. 10 pages