Showing 1–31 of 31 results for author: Kafle, K

  1. arXiv:2506.12623  [pdf, ps, other]

    cs.CV cs.CL

    MS4UI: A Dataset for Multi-modal Summarization of User Interface Instructional Videos

    Authors: Yuan Zang, Hao Tan, Seunghyun Yoon, Franck Dernoncourt, Jiuxiang Gu, Kushal Kafle, Chen Sun, Trung Bui

    Abstract: We study multi-modal summarization for instructional videos, whose goal is to provide users an efficient way to learn skills in the form of text instructions and key video frames. We observe that existing benchmarks focus on generic semantic-level video summarization, and are not suitable for providing step-by-step executable instructions and illustrations, both of which are crucial for instructio…

    Submitted 14 June, 2025; originally announced June 2025.

  2. arXiv:2501.08648  [pdf, other]

    cs.CL cs.AI

    MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities

    Authors: Savya Khosla, Aditi Tiwari, Kushal Kafle, Simon Jenni, Handong Zhao, John Collomosse, Jing Shi

    Abstract: While originally designed for unidirectional generative modeling, decoder-only large language models (LLMs) are increasingly being adapted for bidirectional modeling. However, unidirectional and bidirectional models are typically trained separately with distinct objectives (generation and representation learning). This separation overlooks the opportunity for developing a more versatile language m…

    Submitted 13 February, 2025; v1 submitted 15 January, 2025; originally announced January 2025.

  3. arXiv:2408.05334  [pdf, other]

    cs.AI cs.CL cs.CV

    Revisiting Multi-Modal LLM Evaluation

    Authors: Jian Lu, Shikhar Srivastava, Junyu Chen, Robik Shrestha, Manoj Acharya, Kushal Kafle, Christopher Kanan

    Abstract: With the advent of multi-modal large language models (MLLMs), datasets used for visual question answering (VQA) and referring expression comprehension have seen a resurgence. However, the most popular datasets used to evaluate MLLMs are some of the earliest ones created, and they have many known problems, including extreme bias, spurious correlations, and an inability to permit fine-grained analys…

    Submitted 9 August, 2024; originally announced August 2024.

  4. arXiv:2406.11331  [pdf, other]

    cs.CV cs.IR cs.LG

    They're All Doctors: Synthesizing Diverse Counterfactuals to Mitigate Associative Bias

    Authors: Salma Abdel Magid, Jui-Hsien Wang, Kushal Kafle, Hanspeter Pfister

    Abstract: Vision Language Models (VLMs) such as CLIP are powerful models; however they can exhibit unwanted biases, making them less safe when deployed directly in applications such as text-to-image, text-to-video retrievals, reverse search, or classification tasks. In this work, we propose a novel framework to generate synthetic counterfactual images to create a diverse and balanced dataset that can be use…

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2404.16123  [pdf, other]

    cs.CV cs.AI cs.CL

    FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication

    Authors: Eric Slyman, Stefan Lee, Scott Cohen, Kushal Kafle

    Abstract: Recent dataset deduplication techniques have demonstrated that content-aware dataset pruning can dramatically reduce the cost of training Vision-Language Pretrained (VLP) models without significant performance losses compared to training on the original dataset. These results have been based on pruning commonly used image-caption datasets collected from the web -- datasets that are known to harbor…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Conference paper at CVPR 2024. 6 pages, 8 figures. Project Page: https://ericslyman.com/fairdedup/

    ACM Class: I.4.10; I.2.7; E.0

  6. arXiv:2404.14715  [pdf, other]

    cs.CV cs.CL

    FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction

    Authors: Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo

    Abstract: Recent progress in large-scale pre-training has led to the development of advanced vision-language models (VLMs) with remarkable proficiency in comprehending and generating multimodal content. Despite the impressive ability to perform complex reasoning for VLMs, current models often struggle to effectively and precisely capture the compositional information on both the image and text sides. To add…

    Submitted 19 July, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: ECCV 2024

  7. arXiv:2308.12910  [pdf, other]

    cs.CV

    SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data

    Authors: Ziyan Yang, Kushal Kafle, Zhe Lin, Scott Cohen, Zhihong Ding, Vicente Ordonez

    Abstract: We propose Subject-Conditional Relation Detection (SCoRD), where conditioned on an input subject, the goal is to predict all its relations to other objects in a scene along with their locations. Based on the Open Images dataset, we propose a challenging OIv6-SCoRD benchmark such that the training and testing splits have a distribution shift in terms of the occurrence statistics of $\langle$subject,…

    Submitted 4 January, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: WACV 2024

  8. Helion: Enabling Natural Testing of Smart Homes

    Authors: Prianka Mandal, Sunil Manandhar, Kaushal Kafle, Kevin Moran, Denys Poshyvanyk, Adwait Nadkarni

    Abstract: Prior work has developed numerous systems that test the security and safety of smart homes. For these systems to be applicable in practice, it is necessary to test them with realistic scenarios that represent the use of the smart home, i.e., home automation, in the wild. This demo paper presents the technical details and usage of Helion, a system that uses n-gram language modeling to learn the reg…

    Submitted 13 August, 2023; originally announced August 2023.

    Comments: To be published in Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. arXiv admin note: text overlap with arXiv:1907.00124

  9. MASC: A Tool for Mutation-Based Evaluation of Static Crypto-API Misuse Detectors

    Authors: Amit Seal Ami, Syed Yusuf Ahmed, Radowan Mahmud Redoy, Nathan Cooper, Kaushal Kafle, Kevin Moran, Denys Poshyvanyk, Adwait Nadkarni

    Abstract: While software engineers are optimistically adopting crypto-API misuse detectors (or crypto-detectors) in their software development cycles, this momentum must be accompanied by a rigorous understanding of crypto-detectors' effectiveness at finding crypto-API misuses in practice. This demo paper presents the technical details and usage scenarios of our tool, namely Mutation Analysis for evaluating…

    Submitted 13 August, 2023; v1 submitted 4 August, 2023; originally announced August 2023.

    Comments: To be published in Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

  10. arXiv:2206.15462  [pdf, other]

    cs.CV cs.CL cs.LG

    Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations

    Authors: Ziyan Yang, Kushal Kafle, Franck Dernoncourt, Vicente Ordonez

    Abstract: We propose a margin-based loss for tuning joint vision-language models so that their gradient-based explanations are consistent with region-level annotations provided by humans for relatively smaller grounding datasets. We refer to this objective as Attention Mask Consistency (AMC) and demonstrate that it produces superior visual grounding results than previous methods that rely on using vision-la…

    Submitted 6 January, 2024; v1 submitted 30 June, 2022; originally announced June 2022.

    Comments: CVPR 2023. Fix ReferIt results. Code: https://github.com/uvavision/AMC-grounding Project Webpage: https://vislang.ai/amc

  11. arXiv:2204.02426  [pdf, other]

    cs.LG

    OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses

    Authors: Robik Shrestha, Kushal Kafle, Christopher Kanan

    Abstract: Dataset bias and spurious correlations can significantly impair generalization in deep neural networks. Many prior efforts have addressed this problem using either alternative loss functions or sampling strategies that focus on rare patterns. We propose a new direction: modifying the network architecture to impose inductive biases that make the network robust to dataset bias. Specifically, we prop…

    Submitted 14 April, 2024; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: ECCV 2022

  12. arXiv:2109.05139  [pdf, other]

    cs.CR

    Towards Practical Integrity in the Smart Home with HomeEndorser

    Authors: Kaushal Kafle, Kirti Jagtap, Mansoor Ahmed-Rengers, Trent Jaeger, Adwait Nadkarni

    Abstract: Home automation in modern smart home platforms is often facilitated using trigger-action routines. While such routines enable flexible automation, they also lead to an instance of the integrity problem in these systems: untrusted third-parties may use platform APIs to modify the abstract home objects (AHOs) that privileged, high-integrity devices such as security cameras rely on (i.e., as triggers…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: 15 pages

  13. Why Crypto-detectors Fail: A Systematic Evaluation of Cryptographic Misuse Detection Techniques

    Authors: Amit Seal Ami, Nathan Cooper, Kaushal Kafle, Kevin Moran, Denys Poshyvanyk, Adwait Nadkarni

    Abstract: The correct use of cryptography is central to ensuring data security in modern software systems. Hence, several academic and commercial static analysis tools have been developed for detecting and mitigating crypto-API misuse. While developers are optimistically adopting these crypto-API misuse detectors (or crypto-detectors) in their software development cycles, this momentum must be accompanied b…

    Submitted 24 July, 2022; v1 submitted 14 July, 2021; originally announced July 2021.

    Comments: 18 pages, 2 figures, 2 tables; paper published at 2022 IEEE Symposium on Security and Privacy (S&P)

  14. arXiv:2106.09707  [pdf, other]

    cs.CV

    Learning to Predict Visual Attributes in the Wild

    Authors: Khoi Pham, Kushal Kafle, Zhe Lin, Zhihong Ding, Scott Cohen, Quan Tran, Abhinav Shrivastava

    Abstract: Visual attributes constitute a large portion of information contained in a scene. Objects can be described using a wide variety of attributes which portray their visual appearance (color, texture), geometry (shape, size, posture), and other intrinsic properties (state, action). Existing work is mostly limited to study of attribute prediction in specific domains. In this paper, we introduce a large…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: Accepted to CVPR 2021

  15. arXiv:2104.00170  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Are Bias Mitigation Techniques for Deep Learning Effective?

    Authors: Robik Shrestha, Kushal Kafle, Christopher Kanan

    Abstract: A critical problem in deep learning is that systems learn inappropriate biases, resulting in their inability to perform well on minority groups. This has led to the creation of multiple algorithms that endeavor to mitigate bias. However, it is not clear how effective these methods are. This is because study protocols differ among papers, systems are tested on datasets that fail to test many forms…

    Submitted 23 April, 2024; v1 submitted 31 March, 2021; originally announced April 2021.

    Comments: Published in WACV 2022 under the title "An Investigation of Critical Issues in Bias Mitigation Techniques"

  16. Systematic Mutation-based Evaluation of the Soundness of Security-focused Android Static Analysis Techniques

    Authors: Amit Seal Ami, Kaushal Kafle, Kevin Moran, Adwait Nadkarni, Denys Poshyvanyk

    Abstract: Mobile application security has been a major area of focus for security research over the course of the last decade. Numerous application analysis tools have been proposed in response to malicious, curious, or vulnerable apps. However, existing tools, and specifically, static analysis tools, trade soundness of the analysis for precision and performance and are hence soundy. Unfortunately, the spec…

    Submitted 17 July, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

    Comments: Published in ACM Transactions on Privacy and Security, extends USENIX'18 paper (arXiv:1806.09761)

    Journal ref: ACM Transactions on Privacy and Security, Volume 24, Issue 3, Article No. 15, 2021

  17. $μ$SE: Mutation-based Evaluation of Security-focused Static Analysis Tools for Android

    Authors: Amit Seal Ami, Kaushal Kafle, Kevin Moran, Adwait Nadkarni, Denys Poshyvanyk

    Abstract: This demo paper presents the technical details and usage scenarios of $μ$SE: a mutation-based tool for evaluating security-focused static analysis tools for Android. Mutation testing is generally used by software practitioners to assess the robustness of a given test-suite. However, we leverage this technique to systematically evaluate static analysis tools and uncover and document soundness issue…

    Submitted 12 February, 2021; originally announced February 2021.

    Comments: 43rd International Conference on Software Engineering, Virtual (originally in Madrid, Spain) - Demonstrations Track

  18. arXiv:2005.09241  [pdf, other]

    cs.CV cs.LG

    On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law

    Authors: Damien Teney, Kushal Kafle, Robik Shrestha, Ehsan Abbasnejad, Christopher Kanan, Anton van den Hengel

    Abstract: Out-of-distribution (OOD) testing is increasingly popular for evaluating a machine learning system's ability to generalize beyond the biases of a training set. OOD benchmarks are designed to present a different joint distribution of data and labels between training and test time. VQA-CP has become the standard OOD benchmark for visual question answering, but we discovered three troubling practices…

    Submitted 19 May, 2020; originally announced May 2020.

  19. arXiv:2004.13587  [pdf, other]

    cs.CV cs.LG eess.IV stat.ML

    Do We Need Fully Connected Output Layers in Convolutional Networks?

    Authors: Zhongchao Qian, Tyler L. Hayes, Kushal Kafle, Christopher Kanan

    Abstract: Traditionally, deep convolutional neural networks consist of a series of convolutional and pooling layers followed by one or more fully connected (FC) layers to perform the final classification. While this design has been successful, for datasets with a large number of categories, the fully connected layers often account for a large percentage of the network's parameters. For applications with mem…

    Submitted 28 April, 2020; v1 submitted 28 April, 2020; originally announced April 2020.

  20. arXiv:2004.05704  [pdf, other]

    cs.CV cs.AI cs.CL

    Visual Grounding Methods for VQA are Working for the Wrong Reasons!

    Authors: Robik Shrestha, Kushal Kafle, Christopher Kanan

    Abstract: Existing Visual Question Answering (VQA) methods tend to exploit dataset biases and spurious statistical correlations, instead of producing right answers for the right reasons. To address this issue, recent bias mitigation methods for VQA propose to incorporate visual cues (e.g., human attention maps) to better ground the VQA models, showcasing impressive gains. However, we show that the performan…

    Submitted 23 April, 2024; v1 submitted 12 April, 2020; originally announced April 2020.

    Comments: Published in ACL 2020 under the title "A negative case analysis of visual grounding methods for VQA"

  21. arXiv:1910.02509  [pdf, other]

    cs.LG cs.CV cs.NE

    REMIND Your Neural Network to Prevent Catastrophic Forgetting

    Authors: Tyler L. Hayes, Kushal Kafle, Robik Shrestha, Manoj Acharya, Christopher Kanan

    Abstract: People learn throughout life. However, incrementally updating conventional neural networks leads to catastrophic forgetting. A common remedy is replay, which is inspired by how the brain consolidates memory. Replay involves fine-tuning a network on a mixture of new and old instances. While there is neuroscientific evidence that the brain replays compressed memories, existing methods for convolutio…

    Submitted 13 July, 2020; v1 submitted 6 October, 2019; originally announced October 2019.

    Comments: To appear in the European Conference on Computer Vision (ECCV-2020)

  22. arXiv:1908.01801  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Answering Questions about Data Visualizations using Efficient Bimodal Fusion

    Authors: Kushal Kafle, Robik Shrestha, Brian Price, Scott Cohen, Christopher Kanan

    Abstract: Chart question answering (CQA) is a newly proposed visual question answering (VQA) task where an algorithm must answer questions about data visualizations, e.g. bar charts, pie charts, and line graphs. CQA requires capabilities that natural-image VQA algorithms lack: fine-grained measurements, optical character recognition, and handling out-of-vocabulary words in both questions and answers. Withou…

    Submitted 22 July, 2020; v1 submitted 5 August, 2019; originally announced August 2019.

    Comments: Presented at WACV, 2020

  23. arXiv:1907.00124  [pdf, other]

    cs.CR

    Helion: Enabling a Natural Perspective of Home Automation

    Authors: Sunil Manandhar, Kevin Moran, Kaushal Kafle, Ruhao Tang, Denys Poshyvanyk, Adwait Nadkarni

    Abstract: Security researchers have recently discovered significant security and safety issues related to home automation and developed approaches to address them. Such approaches often face design and evaluation challenges which arise from their restricted perspective of home automation that is bounded by the IoT apps they analyze. The challenges of past work can be overcome by relying on a deeper understa…

    Submitted 28 June, 2019; originally announced July 2019.

  24. arXiv:1904.09317  [pdf, other]

    cs.LG cs.CL cs.CV cs.NE stat.ML

    Challenges and Prospects in Vision and Language Research

    Authors: Kushal Kafle, Robik Shrestha, Christopher Kanan

    Abstract: Language grounded image understanding tasks have often been proposed as a method for evaluating progress in artificial intelligence. Ideally, these tasks should test a plethora of capabilities that integrate computer vision, reasoning, and natural language understanding. However, rather than behaving as visual Turing tests, recent studies have demonstrated state-of-the-art systems are achieving go…

    Submitted 24 May, 2019; v1 submitted 19 April, 2019; originally announced April 2019.

  25. arXiv:1903.00366  [pdf, other]

    cs.CV

    Answer Them All! Toward Universal Visual Question Answering Models

    Authors: Robik Shrestha, Kushal Kafle, Christopher Kanan

    Abstract: Visual Question Answering (VQA) research is split into two camps: the first focuses on VQA datasets that require natural image understanding and the second focuses on synthetic datasets that test reasoning. A good VQA algorithm should be capable of both, but only a few VQA algorithms are tested in this manner. We compare five state-of-the-art VQA algorithms across eight VQA datasets covering both…

    Submitted 5 April, 2019; v1 submitted 1 March, 2019; originally announced March 2019.

    Comments: 8 pages

  26. arXiv:1812.01597  [pdf, other]

    cs.CR

    A Study of Data Store-based Home Automation

    Authors: Kaushal Kafle, Kevin Moran, Sunil Manandhar, Adwait Nadkarni, Denys Poshyvanyk

    Abstract: Home automation platforms provide a new level of convenience by enabling consumers to automate various aspects of physical objects in their homes. While the convenience is beneficial, security flaws in the platforms or integrated third-party products can have serious consequences for the integrity of a user's physical environment. In this paper we perform a systematic security evaluation of two po…

    Submitted 4 December, 2018; originally announced December 2018.

    Comments: Accepted to the 9th ACM Conference on Data and Application Security and Privacy (CODASPY'19), 12 pages

  27. arXiv:1810.12440  [pdf, other]

    cs.CV

    TallyQA: Answering Complex Counting Questions

    Authors: Manoj Acharya, Kushal Kafle, Christopher Kanan

    Abstract: Most counting questions in visual question answering (VQA) datasets are simple and require no more than object detection. Here, we study algorithms for complex counting questions that involve relationships between objects, attribute identification, reasoning, and more. To do this, we created TallyQA, the world's largest dataset for open-ended counting. We propose a new algorithm for counting that…

    Submitted 31 October, 2018; v1 submitted 29 October, 2018; originally announced October 2018.

    Comments: To appear in AAAI 2019 (to download the dataset, please go to http://www.manojacharya.com/)

  28. arXiv:1806.09761  [pdf, other]

    cs.CR cs.SE

    Discovering Flaws in Security-Focused Static Analysis Tools for Android using Systematic Mutation

    Authors: Richard Bonett, Kaushal Kafle, Kevin Moran, Adwait Nadkarni, Denys Poshyvanyk

    Abstract: Mobile application security has been one of the major areas of security research in the last decade. Numerous application analysis tools have been proposed in response to malicious, curious, or vulnerable apps. However, existing tools, and specifically, static analysis tools, trade soundness of the analysis for precision and performance, and are hence soundy. Unfortunately, the specific unsound ch…

    Submitted 27 June, 2018; v1 submitted 25 June, 2018; originally announced June 2018.

    Comments: Accepted as a technical paper at the 27th USENIX Security Symposium (USENIX'18)

  29. arXiv:1801.08163  [pdf, other]

    cs.CV cs.GR

    DVQA: Understanding Data Visualizations via Question Answering

    Authors: Kushal Kafle, Brian Price, Scott Cohen, Christopher Kanan

    Abstract: Bar charts are an effective way to convey numeric information, but today's algorithms cannot parse them. Existing methods fail when faced with even minor variations in appearance. Here, we present DVQA, a dataset that tests many aspects of bar chart understanding in a question answering framework. Unlike visual question answering (VQA), DVQA requires processing words and answers that are unique to…

    Submitted 29 March, 2018; v1 submitted 24 January, 2018; originally announced January 2018.

    Comments: CVPR 2018 Camera Ready Version

  30. arXiv:1703.09684  [pdf, other]

    cs.CV cs.AI cs.CL

    An Analysis of Visual Question Answering Algorithms

    Authors: Kushal Kafle, Christopher Kanan

    Abstract: In visual question answering (VQA), an algorithm must answer text-based questions about images. While multiple datasets for VQA have been created since late 2014, they all have flaws in both their content and the way algorithms are evaluated on them. As a result, evaluation scores are inflated and predominantly determined by answering easier questions, making it difficult to compare different meth…

    Submitted 13 September, 2017; v1 submitted 28 March, 2017; originally announced March 2017.

    Comments: To appear in ICCV 2017. Visit http://kushalkafle.com/projects/tdiuc to download the dataset

  31. Visual Question Answering: Datasets, Algorithms, and Future Challenges

    Authors: Kushal Kafle, Christopher Kanan

    Abstract: Visual Question Answering (VQA) is a recent problem in computer vision and natural language processing that has garnered a large amount of interest from the deep learning, computer vision, and natural language processing communities. In VQA, an algorithm needs to answer text-based questions about images. Since the release of the first VQA dataset in 2014, additional datasets have been released and…

    Submitted 14 June, 2017; v1 submitted 5 October, 2016; originally announced October 2016.