Showing 1–11 of 11 results for author: Akula, A R

Searching in archive cs.
  1. arXiv:2506.12346

    cs.CL cs.AI

    Refract ICL: Rethinking Example Selection in the Era of Million-Token Models

    Authors: Arjun R. Akula, Kazuma Hashimoto, Krishna Srinivasan, Aditi Chaudhary, Karthik Raman, Michael Bendersky

    Abstract: The emergence of long-context large language models (LLMs) has enabled the use of hundreds, or even thousands, of demonstrations for in-context learning (ICL), a previously impractical regime. This paper investigates whether traditional ICL selection strategies, which balance the similarity of ICL examples to the test input (using a text retriever) with diversity within the ICL set, remain effect…

    Submitted 14 June, 2025; originally announced June 2025.

  2. arXiv:2305.18373

    cs.CV cs.CL

    KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models

    Authors: Zhiwei Jia, Pradyumna Narayana, Arjun R. Akula, Garima Pruthi, Hao Su, Sugato Basu, Varun Jampani

    Abstract: Image ad understanding is a crucial task with wide real-world applications. Although highly challenging with the involvement of diverse atypical scenes, real-world entities, and reasoning over scene-texts, how to interpret image ads is relatively under-explored, especially in the era of foundational vision-language models (VLMs) featuring impressive generalizability and adaptability. In this paper…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  3. arXiv:2212.09898

    cs.CV

    MetaCLUE: Towards Comprehensive Visual Metaphors Research

    Authors: Arjun R. Akula, Brendan Driscoll, Pradyumna Narayana, Soravit Changpinyo, Zhiwei Jia, Suyash Damle, Garima Pruthi, Sugato Basu, Leonidas Guibas, William T. Freeman, Yuanzhen Li, Varun Jampani

    Abstract: Creativity is an indispensable part of human cognition and also an inherent part of how we make sense of the world. Metaphorical abstraction is fundamental in communicating creative ideas through nuanced relationships between abstract concepts such as feelings. While computer vision benchmarks and approaches predominantly focus on understanding and generating literal interpretations of images, met…

    Submitted 2 June, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: Accepted at CVPR 2023. Project page: https://metaclue.github.io/ , Video summary: https://youtu.be/V3TmeNETL-o

  4. arXiv:2201.11194

    cs.HC cs.LG

    Attention cannot be an Explanation

    Authors: Arjun R Akula, Song-Chun Zhu

    Abstract: Attention-based explanations (viz. saliency maps), by providing interpretability to black-box models such as deep neural networks, are assumed to improve human trust and reliance in the underlying models. Recently, it has been shown that attention weights are frequently uncorrelated with gradient-based measures of feature importance. Motivated by this, we ask a follow-up question: "Assuming that w…

    Submitted 26 January, 2022; originally announced January 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2109.01401, arXiv:1909.06907

  5. arXiv:2201.09639

    cs.CV

    Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding

    Authors: Arjun R. Akula

    Abstract: Visual question answering (VQA) is the multi-modal task of answering natural language questions about an input image. Through cross-dataset adaptation methods, it is possible to transfer knowledge from a source dataset with more training samples to a target dataset where the training set is limited. If a VQA model trained on one dataset's training set fails to adapt to another, it is hard to identif…

    Submitted 24 January, 2022; originally announced January 2022.

  6. arXiv:2201.06207

    cs.CV

    Discourse Analysis for Evaluating Coherence in Video Paragraph Captions

    Authors: Arjun R Akula, Song-Chun Zhu

    Abstract: Video paragraph captioning is the task of automatically generating a coherent paragraph description of the actions in a video. Previous linguistic studies have demonstrated that coherence of a natural language text is reflected by its discourse structure and relations. However, existing video captioning methods evaluate the coherence of generated paragraphs by comparing them merely against human p…

    Submitted 16 January, 2022; originally announced January 2022.

  7. arXiv:2109.01401

    cs.AI cs.CV cs.LG

    CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing Human Trust in Image Recognition Models

    Authors: Arjun R. Akula, Keze Wang, Changsong Liu, Sari Saba-Sadiya, Hongjing Lu, Sinisa Todorovic, Joyce Chai, Song-Chun Zhu

    Abstract: We propose CX-ToM, short for counterfactual explanations with theory-of-mind, a new explainable AI (XAI) framework for explaining decisions made by a deep convolutional neural network (CNN). In contrast to current XAI methods that generate explanations as a single-shot response, we pose explanation as an iterative communication process, i.e. dialog, between the machine and human user. More…

    Submitted 2 December, 2021; v1 submitted 3 September, 2021; originally announced September 2021.

    Comments: Accepted by iScience Cell Press Journal 2021. arXiv admin note: text overlap with arXiv:1909.06907

  8. arXiv:2005.01655

    cs.CL cs.CV

    Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions

    Authors: Arjun R Akula, Spandana Gella, Yaser Al-Onaizan, Song-Chun Zhu, Siva Reddy

    Abstract: Visual referring expression recognition is a challenging task that requires natural language understanding in the context of an image. We critically examine RefCOCOg, a standard benchmark for this task, using a human study and show that 83.7% of test instances do not require reasoning on linguistic structure, i.e., words are enough to identify the target object; the word order doesn't matter. To m…

    Submitted 4 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  9. arXiv:1909.06907

    cs.AI cs.CV cs.HC cs.LG

    X-ToM: Explaining with Theory-of-Mind for Gaining Justified Human Trust

    Authors: Arjun R. Akula, Changsong Liu, Sari Saba-Sadiya, Hongjing Lu, Sinisa Todorovic, Joyce Y. Chai, Song-Chun Zhu

    Abstract: We present a new explainable AI (XAI) framework aimed at increasing justified human trust and reliance in the AI machine through explanations. We pose explanation as an iterative communication process, i.e. dialog, between the machine and human user. More concretely, the machine generates a sequence of explanations in a dialog which takes into account three important aspects at each dialog turn: (a)…

    Submitted 15 September, 2019; originally announced September 2019.

    Comments: A short version of this paper was presented at the CVPR 2019 Workshop on Explainable AI

  10. arXiv:1903.05720

    cs.AI

    Natural Language Interaction with Explainable AI Models

    Authors: Arjun R Akula, Sinisa Todorovic, Joyce Y Chai, Song-Chun Zhu

    Abstract: This paper presents an explainable AI (XAI) system that provides explanations for its predictions. The system consists of two key components, namely the prediction And-Or graph (AOG) model for recognizing and localizing concepts of interest in input data, and the XAI model for providing explanations to the user about the AOG's predictions. In this work, we focus on the XAI model specified to in…

    Submitted 7 July, 2019; v1 submitted 13 March, 2019; originally announced March 2019.

    Journal ref: CVPR 2019 Workshop on Explainable AI

  11. arXiv:1903.02252

    cs.CV

    Discourse Parsing in Videos: A Multi-modal Approach

    Authors: Arjun R. Akula, Song-Chun Zhu

    Abstract: Text-level discourse parsing aims to unmask how two sentences in the text are related to each other. We propose the task of Visual Discourse Parsing, which requires understanding discourse relations among scenes in a video. Here we use the term scene to refer to a subset of video frames that can better summarize the video. In order to collect a dataset for learning discourse cues from videos, one…

    Submitted 22 January, 2022; v1 submitted 6 March, 2019; originally announced March 2019.

    Comments: Accepted at the CVPR 2019 Workshop on Language and Vision (Oral Presentation)

    Journal ref: CVPR 2019 Workshop on Language and Vision (Oral Presentation)