Showing 1–50 of 63 results for author: Russakovsky, O

  1. arXiv:2506.08065

    astro-ph.IM cs.LG

    Dynamic Diffusion Schrödinger Bridge in Astrophysical Observational Inversions

    Authors: Ye Zhu, Duo Xu, Zhiwei Deng, Jonathan C. Tan, Olga Russakovsky

    Abstract: We study Diffusion Schrödinger Bridge (DSB) models in the context of dynamical astrophysical systems, specifically tackling observational inverse prediction tasks within Giant Molecular Clouds (GMCs) for star formation. We introduce the Astro-DSB model, a variant of DSB with the pairwise domain assumption tailored for astrophysical dynamics. By investigating its learning process and prediction per…

    Submitted 11 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

    Comments: Preprint. Code will be available at https://github.com/L-YeZhu/AstroDSB

  2. arXiv:2504.21850

    cs.CV

    COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning

    Authors: Xindi Wu, Hee Seung Hwang, Polina Kirichenko, Olga Russakovsky

    Abstract: Multimodal Large Language Models (MLLMs) excel at simple vision-language tasks but struggle when faced with complex tasks that require multiple capabilities, such as simultaneously recognizing objects, counting them, and understanding their spatial relationships. This might be partially the result of the fact that Visual Instruction Tuning (VIT), a critical training step for MLLMs, has traditional…

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: 17 pages, 13 figures

  3. arXiv:2504.10745

    cs.HC cs.CV

    Interactivity x Explainability: Toward Understanding How Interactivity Can Improve Computer Vision Explanations

    Authors: Indu Panigrahi, Sunnie S. Y. Kim, Amna Liaqat, Rohan Jinturkar, Olga Russakovsky, Ruth Fong, Parastoo Abtahi

    Abstract: Explanations for computer vision models are important tools for interpreting how the underlying models work. However, they are often presented in static formats, which pose challenges for users, including information overload, a gap between semantic and pixel-level information, and limited opportunities for exploration. We investigate interactivity as a mechanism for tackling these issues in three…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: To appear in Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '25)

  4. arXiv:2503.19846

    cs.CV cs.LG

    Attention IoU: Examining Biases in CelebA using Attention Maps

    Authors: Aaron Serianni, Tyler Zhu, Olga Russakovsky, Vikram V. Ramaswamy

    Abstract: Computer vision models have been shown to exhibit and amplify biases across a wide array of datasets and tasks. Existing methods for quantifying bias in classification models primarily focus on dataset distribution and model performance on subgroups, overlooking the internal workings of a model. We introduce the Attention-IoU (Attention Intersection over Union) metric and related scores, which use…

    Submitted 25 March, 2025; v1 submitted 25 March, 2025; originally announced March 2025.

    Comments: To appear in CVPR 2025. Code and data is available at https://github.com/aaronserianni/attention-iou . 15 pages, 14 figures, including appendix

  5. Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies

    Authors: Sunnie S. Y. Kim, Jennifer Wortman Vaughan, Q. Vera Liao, Tania Lombrozo, Olga Russakovsky

    Abstract: Large language models (LLMs) can produce erroneous responses that sound fluent and convincing, raising the risk that users will rely on these responses as if they were correct. Mitigating such overreliance is a key challenge. Through a think-aloud study in which participants use an LLM-infused application to answer objective questions, we identify several features of LLM responses that shape users…

    Submitted 12 February, 2025; originally announced February 2025.

    Comments: CHI 2025. This version includes the appendix

  6. arXiv:2501.01426

    cs.CV cs.CL cs.LG

    Unifying Specialized Visual Encoders for Video Language Models

    Authors: Jihoon Chung, Tyler Zhu, Max Gonzalez Saez-Diez, Juan Carlos Niebles, Honglu Zhou, Olga Russakovsky

    Abstract: The recent advent of Large Language Models (LLMs) has ushered sophisticated reasoning capabilities into the realm of video through Video Large Language Models (VideoLLMs). However, VideoLLMs currently rely on a single vision encoder for all of their visual processing, which limits the amount and type of visual information that can be conveyed to the LLM. Our method, MERV, Multi-Encoder Representat…

    Submitted 15 June, 2025; v1 submitted 2 January, 2025; originally announced January 2025.

    Comments: Accepted to ICML 2025 as a Poster. Project page: https://tylerzhu.com/merv/

  7. arXiv:2501.00654

    cs.CV cs.CL cs.LG

    ICONS: Influence Consensus for Vision-Language Data Selection

    Authors: Xindi Wu, Mengzhou Xia, Rulin Shao, Zhiwei Deng, Pang Wei Koh, Olga Russakovsky

    Abstract: Training vision-language models via instruction tuning often relies on large mixtures of data spanning diverse tasks and domains. However, these mixtures frequently include redundant information, increasing computational costs without proportional performance gains, necessitating more effective data selection strategies. Existing methods typically rely on task-agnostic heuristics to estimate data…

    Submitted 10 June, 2025; v1 submitted 31 December, 2024; originally announced January 2025.

    Comments: 31 pages, 19 figures

  8. arXiv:2412.05101

    cs.CV

    The Silent Assistant: NoiseQuery as Implicit Guidance for Goal-Driven Image Generation

    Authors: Ruoyu Wang, Huayang Huang, Ye Zhu, Olga Russakovsky, Yu Wu

    Abstract: In this work, we introduce NoiseQuery as a novel method for enhanced noise initialization in versatile goal-driven text-to-image (T2I) generation. Specifically, we propose to leverage an aligned Gaussian noise as implicit guidance to complement explicit user-defined inputs, such as text prompts, for better generation quality and controllability. Unlike existing noise optimization methods designed…

    Submitted 17 March, 2025; v1 submitted 6 December, 2024; originally announced December 2024.

    Comments: 18 pages, 18 figures, 6 tables

  9. arXiv:2411.19182

    cs.CV cs.AI

    SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation

    Authors: Yuhan Pei, Ruoyu Wang, Yongqi Yang, Ye Zhu, Olga Russakovsky, Yu Wu

    Abstract: Originating from the diffusion phenomenon in physics, which describes the random movement and collisions of particles, diffusion generative models simulate a random walk in the data space along the denoising trajectory. This allows information to diffuse across regions, yielding harmonious outcomes. However, the chaotic and disordered nature of information diffusion in diffusion models often resul…

    Submitted 28 November, 2024; originally announced November 2024.

    Comments: Project page: https://pyh-129.github.io/SOW/

  10. arXiv:2408.14339

    cs.CV

    ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty

    Authors: Xindi Wu, Dingli Yu, Yangsibo Huang, Olga Russakovsky, Sanjeev Arora

    Abstract: Compositionality is a critical capability in Text-to-Image (T2I) models, as it reflects their ability to understand and combine multiple concepts from text descriptions. Existing evaluations of compositional capability rely heavily on human-designed text prompts or fixed templates, limiting their diversity and complexity, and yielding low discriminative power. We propose ConceptMix, a scalable, co…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 43 pages

  11. arXiv:2406.04284

    cs.LG

    What is Dataset Distillation Learning?

    Authors: William Yang, Ye Zhu, Zhiwei Deng, Olga Russakovsky

    Abstract: Dataset distillation has emerged as a strategy to overcome the hurdles associated with large datasets by learning a compact set of synthetic data that retains essential information from the original dataset. While distilled data can be used to train high performing models, little is understood about how the information is stored. In this study, we posit and answer three questions about the behavio…

    Submitted 22 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  12. arXiv:2404.04584

    cs.CV

    D$^3$: Scaling Up Deepfake Detection by Learning from Discrepancy

    Authors: Yongqi Yang, Zhihao Qian, Ye Zhu, Olga Russakovsky, Yu Wu

    Abstract: The boom of Generative AI brings opportunities entangled with risks and concerns. Existing literature emphasizes the generalization capability of deepfake detection on unseen generators, significantly promoting the detector's ability to identify more universal artifacts. This work seeks a step toward a universal deepfake detection system with better generalization and robustness. We do so by first…

    Submitted 23 March, 2025; v1 submitted 6 April, 2024; originally announced April 2024.

    Comments: 13 pages, 3 figures, accepted by CVPR 2025

  13. arXiv:2403.19669

    cs.LG cs.AI cs.CL cs.CV

    Analyzing the Roles of Language and Vision in Learning from Limited Data

    Authors: Allison Chen, Ilia Sucholutsky, Olga Russakovsky, Thomas L. Griffiths

    Abstract: Does language help make sense of the visual world? How important is it to actually see the world rather than having it described with words? These basic questions about the nature of intelligence have been difficult to answer because we only had one example of an intelligent system -- humans -- and limited access to cases that isolated language or vision. However, the development of sophisticated…

    Submitted 10 May, 2024; v1 submitted 15 February, 2024; originally announced March 2024.

    Comments: 8 pages, 4 figures

  14. arXiv:2312.10539

    cs.CV

    DETER: Detecting Edited Regions for Deterring Generative Manipulations

    Authors: Sai Wang, Ye Zhu, Ruoyu Wang, Amaya Dharmasiri, Olga Russakovsky, Yu Wu

    Abstract: Generative AI capabilities have grown substantially in recent years, raising renewed concerns about potential malicious use of generated data, or "deep fakes". However, deep fake datasets have not kept up with generative AI advancements sufficiently to enable the development of deep fake detection technology which can meaningfully alert human users in real-world settings. Existing datasets typical…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: First two authors contribute equally to this work. Project page at https://deter2024.github.io/deter/

  15. arXiv:2311.02815

    cs.CV

    Efficient, Self-Supervised Human Pose Estimation with Inductive Prior Tuning

    Authors: Nobline Yoo, Olga Russakovsky

    Abstract: The goal of 2D human pose estimation (HPE) is to localize anatomical landmarks, given an image of a person in a pose. SOTA techniques make use of thousands of labeled figures (finetuning transformers or training deep CNNs), acquired using labor-intensive crowdsourcing. On the other hand, self-supervised methods re-frame the HPE task as a reconstruction problem, enabling them to leverage the vast a…

    Submitted 5 November, 2023; originally announced November 2023.

    Comments: ICCVW 2023 Publication

  16. arXiv:2310.09213

    cs.LG cs.CV

    Discovery and Expansion of New Domains within Diffusion Models

    Authors: Ye Zhu, Yu Wu, Duo Xu, Zhiwei Deng, Yan Yan, Olga Russakovsky

    Abstract: In this work, we study the generalization properties of diffusion models in a few-shot setup, introduce a novel tuning-free paradigm to synthesize the target out-of-domain (OOD) data, and demonstrate its advantages compared to existing methods in data-sparse scenarios with large domain gaps. Specifically, given a pre-trained model and a small set of images that are OOD relative to the model's trai…

    Submitted 26 May, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Code will be released at https://github.com/L-YeZhu/DiscoveryDiff

  17. arXiv:2310.01755

    cs.CV

    ImageNet-OOD: Deciphering Modern Out-of-Distribution Detection Algorithms

    Authors: William Yang, Byron Zhang, Olga Russakovsky

    Abstract: The task of out-of-distribution (OOD) detection is notoriously ill-defined. Earlier works focused on new-class detection, aiming to identify label-altering data distribution shifts, also known as "semantic shift." However, recent works argue for a focus on failure detection, expanding the OOD evaluation framework to account for label-preserving data distribution shifts, also known as "covariate sh…

    Submitted 18 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: ICLR 2024. Code and dataset at https://github.com/princetonvisualai/imagenetood

  18. arXiv:2308.07545

    cs.CV

    Vision-Language Dataset Distillation

    Authors: Xindi Wu, Byron Zhang, Zhiwei Deng, Olga Russakovsky

    Abstract: Dataset distillation methods reduce large-scale datasets to smaller sets of synthetic data, preserving sufficient information to quickly train a new model from scratch. However, prior work on dataset distillation has focused exclusively on image classification datasets, whereas modern large-scale datasets are primarily vision-language datasets. In this work, we design the first vision-language dat…

    Submitted 20 August, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: 31 pages, 13 figures

  19. arXiv:2306.04482

    cs.CV

    ICON$^2$: Reliably Benchmarking Predictive Inequity in Object Detection

    Authors: Sruthi Sudhakar, Viraj Prabhu, Olga Russakovsky, Judy Hoffman

    Abstract: As computer vision systems are being increasingly deployed at scale in high-stakes applications like autonomous driving, concerns about social bias in these systems are rising. Analysis of fairness in real-world vision systems, such as object detection in driving scenes, has been limited to observing predictive inequity across attributes such as pedestrian skin tone, and lacks a consistent methodo…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: Accepted to CVPR 2023 SSAD Workshop

  20. Art and the science of generative AI: A deeper dive

    Authors: Ziv Epstein, Aaron Hertzmann, Laura Herman, Robert Mahari, Morgan R. Frank, Matthew Groh, Hope Schroeder, Amy Smith, Memo Akten, Jessica Fjeld, Hany Farid, Neil Leach, Alex Pentland, Olga Russakovsky

    Abstract: A new class of tools, colloquially called generative AI, can produce high-quality artistic media for visual arts, concept art, music, fiction, literature, video, and animation. The generative capabilities of these tools are likely to fundamentally alter the creative processes by which creators formulate ideas and put them into production. As creativity is reimagined, so too may be many sectors of…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: This white paper is an expanded version of Epstein et al 2023 published in Science Perspectives on July 16, 2023 which you can find at the following DOI: 10.1126/science.adh4451

  21. Humans, AI, and Context: Understanding End-Users' Trust in a Real-World Computer Vision Application

    Authors: Sunnie S. Y. Kim, Elizabeth Anne Watkins, Olga Russakovsky, Ruth Fong, Andrés Monroy-Hernández

    Abstract: Trust is an important factor in people's interactions with AI systems. However, there is a lack of empirical studies examining how real end-users trust or distrust the AI system they interact with. Most research investigates one aspect of trust in lab settings with hypothetical end-users. In this paper, we provide a holistic and nuanced understanding of trust in AI through a qualitative case study…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: FAccT 2023

  22. arXiv:2303.15632

    cs.CV

    UFO: A unified method for controlling Understandability and Faithfulness Objectives in concept-based explanations for CNNs

    Authors: Vikram V. Ramaswamy, Sunnie S. Y. Kim, Ruth Fong, Olga Russakovsky

    Abstract: Concept-based explanations for convolutional neural networks (CNNs) aim to explain model behavior and outputs using a pre-defined set of semantic concepts (e.g., the model recognizes scene class "bedroom" based on the presence of concepts "bed" and "pillow"). However, they often do not faithfully (i.e., accurately) characterize the model's behavior and can be too complex for people to unders…

    Submitted 27 March, 2023; originally announced March 2023.

  23. arXiv:2303.06167

    cs.CV cs.CY cs.LG

    Overwriting Pretrained Bias with Finetuning Data

    Authors: Angelina Wang, Olga Russakovsky

    Abstract: Transfer learning is beneficial by allowing the expressive features of models pretrained on large-scale datasets to be finetuned for the target task of smaller, more domain-specific datasets. However, there is a concern that these pretrained models may come with their own biases which would propagate into the finetuned model. In this work, we investigate bias when conceptualized as both spurious c…

    Submitted 16 August, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

    Comments: ICCV 2023 Oral

  24. arXiv:2302.08357

    cs.CV

    Boundary Guided Learning-Free Semantic Control with Diffusion Models

    Authors: Ye Zhu, Yu Wu, Zhiwei Deng, Olga Russakovsky, Yan Yan

    Abstract: Applying pre-trained generative denoising diffusion models (DDMs) for downstream tasks such as image semantic editing usually requires either fine-tuning DDMs or learning auxiliary editing networks in the existing literature. In this work, we present our BoundaryDiffusion method for efficient, effective and light-weight semantic control with frozen pre-trained DDMs, without learning any extra netw…

    Submitted 18 October, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2023. 27 pages including appendices, code at https://github.com/L-YeZhu/BoundaryDiffusion

  25. arXiv:2301.02560

    cs.CV

    GeoDE: a Geographically Diverse Evaluation Dataset for Object Recognition

    Authors: Vikram V. Ramaswamy, Sing Yu Lin, Dora Zhao, Aaron B. Adcock, Laurens van der Maaten, Deepti Ghadiyaram, Olga Russakovsky

    Abstract: Current dataset collection methods typically scrape large amounts of data from the web. While this technique is extremely scalable, data collected in this way tends to reinforce stereotypical biases, can contain personally identifiable information, and typically originates from Europe and North America. In this work, we rethink the dataset collection paradigm and introduce GeoDE, a geographically…

    Submitted 7 April, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

  26. arXiv:2210.03735

    cs.HC cs.AI cs.CV cs.CY

    "Help Me Help the AI": Understanding How Explainability Can Support Human-AI Interaction

    Authors: Sunnie S. Y. Kim, Elizabeth Anne Watkins, Olga Russakovsky, Ruth Fong, Andrés Monroy-Hernández

    Abstract: Despite the proliferation of explainable AI (XAI) methods, little is understood about end-users' explainability needs and behaviors around XAI explanations. To address this gap and contribute to understanding how explainability can support human-AI interaction, we conducted a mixed-methods study with 20 end-users of a real-world AI application, the Merlin bird identification app, and inquired abou…

    Submitted 16 February, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

    Comments: CHI 2023

    Journal ref: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23-28, 2023, Hamburg, Germany. ACM, New York, NY, USA

  27. arXiv:2207.13325

    cs.CV

    SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding

    Authors: Mengxue Qu, Yu Wu, Wu Liu, Qiqi Gong, Xiaodan Liang, Olga Russakovsky, Yao Zhao, Yunchao Wei

    Abstract: In this paper, we investigate how to achieve better visual grounding with modern vision-language transformers, and propose a simple yet powerful Selective Retraining (SiRi) mechanism for this challenging task. Particularly, SiRi conveys a significant principle to the research of visual grounding, i.e., a better initialized vision-language encoder would help the model converge to a better local min…

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: 21 pages (including Supplementary Materials); Accepted to ECCV 2022

  28. arXiv:2207.09847

    cs.CL cs.AI cs.CV

    Predicting Word Learning in Children from the Performance of Computer Vision Systems

    Authors: Sunayana Rane, Mira L. Nencheva, Zeyu Wang, Casey Lew-Williams, Olga Russakovsky, Thomas L. Griffiths

    Abstract: For human children as well as machine learning systems, a key challenge in learning a word is linking the word to the visual phenomena it describes. We explore this aspect of word learning by using the performance of computer vision systems as a proxy for the difficulty of learning a word from visual cues. We show that the age at which children acquire different categories of words is correlated w…

    Submitted 9 September, 2023; v1 submitted 7 July, 2022; originally announced July 2022.

    Comments: CogSci 2023

  29. arXiv:2207.09615

    cs.CV

    Overlooked factors in concept-based explanations: Dataset choice, concept learnability, and human capability

    Authors: Vikram V. Ramaswamy, Sunnie S. Y. Kim, Ruth Fong, Olga Russakovsky

    Abstract: Concept-based interpretability methods aim to explain deep neural network model predictions using a predefined set of semantic concepts. These methods evaluate a trained model on a new, "probe" dataset and correlate model predictions with the visual concepts labeled in that dataset. Despite their popularity, they suffer from limitations that are not well-understood and articulated by the literatur…

    Submitted 12 May, 2023; v1 submitted 19 July, 2022; originally announced July 2022.

    Comments: Published at CVPR 2023

  30. arXiv:2206.09191

    cs.CV

    Gender Artifacts in Visual Datasets

    Authors: Nicole Meister, Dora Zhao, Angelina Wang, Vikram V. Ramaswamy, Ruth Fong, Olga Russakovsky

    Abstract: Gender biases are known to exist within large-scale visual datasets and can be reflected or even amplified in downstream models. Many prior works have proposed methods for mitigating gender biases, often by attempting to remove gender expression information from images. To understand the feasibility and practicality of these approaches, we investigate what $\textit{gender artifacts}$ exist within…

    Submitted 17 September, 2023; v1 submitted 18 June, 2022; originally announced June 2022.

    Comments: ICCV 2023

  31. arXiv:2206.07690

    cs.CV cs.LG

    ELUDE: Generating interpretable explanations via a decomposition into labelled and unlabelled features

    Authors: Vikram V. Ramaswamy, Sunnie S. Y. Kim, Nicole Meister, Ruth Fong, Olga Russakovsky

    Abstract: Deep learning models have achieved remarkable success in different areas of machine learning over the past decade; however, the size and complexity of these models make them difficult to understand. In an effort to make them more interpretable, several recent works focus on explaining parts of a deep neural network through human-interpretable, semantic attributes. However, it may be impossible to…

    Submitted 16 June, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

  32. arXiv:2206.02916

    cs.LG cs.AI cs.CV

    Remember the Past: Distilling Datasets into Addressable Memories for Neural Networks

    Authors: Zhiwei Deng, Olga Russakovsky

    Abstract: We propose an algorithm that compresses the critical information of a large dataset into compact addressable memories. These memories can then be recalled to quickly re-train a neural network and recover the performance (instead of storing and re-training on the full original dataset). Building upon the dataset distillation framework, we make a key observation that a shared common representation a…

    Submitted 18 November, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

  33. Towards Intersectionality in Machine Learning: Including More Identities, Handling Underrepresentation, and Performing Evaluation

    Authors: Angelina Wang, Vikram V. Ramaswamy, Olga Russakovsky

    Abstract: Research in machine learning fairness has historically considered a single binary demographic attribute; however, the reality is of course far more complicated. In this work, we grapple with questions that arise along three stages of the machine learning pipeline when incorporating intersectionality as multiple demographic attributes: (1) which demographic attributes to include as dataset labels,…

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: ACM Conference on Fairness, Accountability, and Transparency (FAccT) 2022

  34. arXiv:2203.07613

    cs.CL cs.CV

    CARETS: A Consistency And Robustness Evaluative Test Suite for VQA

    Authors: Carlos E. Jimenez, Olga Russakovsky, Karthik Narasimhan

    Abstract: We introduce CARETS, a systematic test suite to measure consistency and robustness of modern VQA models through a series of six fine-grained capability tests. In contrast to existing VQA test sets, CARETS features balanced question generation to create pairs of instances to test models, with each pair focusing on a specific capability such as rephrasing, logical symmetry or image obfuscation. We e…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: ACL 2022

  35. arXiv:2201.03639

    cs.CV

    Multi-Query Video Retrieval

    Authors: Zeyu Wang, Yu Wu, Karthik Narasimhan, Olga Russakovsky

    Abstract: Retrieving target videos based on text descriptions is a task of great practical value and has received increasing attention over the past few years. Despite recent progress, imperfect annotations in existing video retrieval datasets have posed significant challenges on model evaluation and development. In this paper, we tackle this issue by focusing on the less-studied setting of multi-query vide…

    Submitted 20 July, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

    Comments: ECCV 2022

  36. arXiv:2112.03184

    cs.CV

    HIVE: Evaluating the Human Interpretability of Visual Explanations

    Authors: Sunnie S. Y. Kim, Nicole Meister, Vikram V. Ramaswamy, Ruth Fong, Olga Russakovsky

    Abstract: As AI technology is increasingly applied to high-impact, high-risk domains, there have been a number of new methods aimed at making AI models more human interpretable. Despite the recent growth of interpretability work, there is a lack of systematic evaluation of proposed techniques. In this work, we introduce HIVE (Human Interpretability of Visual Explanations), a novel human evaluation framework…

    Submitted 21 July, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

    Comments: ECCV 2022. Code and supplementary material are at https://princetonvisualai.github.io/HIVE

  37. arXiv:2106.08503

    cs.CV

    Understanding and Evaluating Racial Biases in Image Captioning

    Authors: Dora Zhao, Angelina Wang, Olga Russakovsky

    Abstract: Image captioning is an important task for benchmarking visual reasoning and for enabling accessibility for people with vision impairments. However, as in many machine learning settings, social biases can influence image captioning in undesirable ways. In this work, we study bias propagation pathways within image captioning, focusing specifically on the COCO dataset. Prior work has analyzed gender…

    Submitted 30 August, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: ICCV 2021

  38. [Re] Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias

    Authors: Sunnie S. Y. Kim, Sharon Zhang, Nicole Meister, Olga Russakovsky

    Abstract: Singh et al. (2020) point out the dangers of contextual bias in visual recognition datasets. They propose two methods, CAM-based and feature-split, that better recognize an object or attribute in the absence of its typical context while maintaining competitive within-context accuracy. To verify their performance, we attempted to reproduce all 12 tables in the original paper, including those in the…

    Submitted 28 April, 2021; originally announced April 2021.

    Comments: ML Reproducibility Challenge 2020. Accepted for publication in the ReScience C journal

  39. arXiv:2103.06191

    cs.CV

    A Study of Face Obfuscation in ImageNet

    Authors: Kaiyu Yang, Jacqueline Yau, Li Fei-Fei, Jia Deng, Olga Russakovsky

    Abstract: Face obfuscation (blurring, mosaicing, etc.) has been shown to be effective for privacy protection; nevertheless, object recognition research typically assumes access to complete, unobfuscated images. In this paper, we explore the effects of face obfuscation on the popular ImageNet challenge visual recognition benchmark. Most categories in the ImageNet challenge are not people categories; however,…

    Submitted 9 June, 2022; v1 submitted 10 March, 2021; originally announced March 2021.

    Comments: Accepted to ICML 2022

  40. arXiv:2102.12594

    cs.LG cs.AI

    Directional Bias Amplification

    Authors: Angelina Wang, Olga Russakovsky

    Abstract: Mitigating bias in machine learning systems requires refining our understanding of bias propagation pathways: from societal structures to large-scale data to trained models to impact on society. In this work, we focus on one aspect of the problem, namely bias amplification: the tendency of models to amplify the biases present in the data they are trained on. A metric for measuring bias amplificati…

    Submitted 7 June, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

    Comments: ICML 2021

  41. arXiv:2012.01469

    cs.CV

    Fair Attribute Classification through Latent Space De-biasing

    Authors: Vikram V. Ramaswamy, Sunnie S. Y. Kim, Olga Russakovsky

    Abstract: Fairness in visual recognition is becoming a prominent and critical topic of discussion as recognition systems are deployed at scale in the real world. Models trained from data in which target labels are correlated with protected attributes (e.g., gender, race) are known to learn and exploit those correlations. In this work, we introduce a method for training accurate target classifiers while miti…

    Submitted 2 April, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

    Comments: Accepted to CVPR 2021, code can be found at https://github.com/princetonvisualai/gan-debiasing

  42. arXiv:2011.13681  [pdf, other]

    cs.CV

    Point and Ask: Incorporating Pointing into Visual Question Answering

    Authors: Arjun Mani, Nobline Yoo, Will Hinthorn, Olga Russakovsky

    Abstract: Visual Question Answering (VQA) has become one of the key benchmarks of visual recognition progress. Multiple VQA extensions have been explored to better simulate real-world settings: different question formulations, changing training and test distributions, conversational consistency in dialogues, and explanation-based answering. In this work, we further expand this space by considering visual qu…

    Submitted 18 February, 2022; v1 submitted 27 November, 2020; originally announced November 2020.

  43. arXiv:2009.03949  [pdf, other]

    cs.CV

    Towards Unique and Informative Captioning of Images

    Authors: Zeyu Wang, Berthy Feng, Karthik Narasimhan, Olga Russakovsky

    Abstract: Despite considerable progress, state of the art image captioning models produce generic captions, leaving out important image details. Furthermore, these systems may even misrepresent the image in order to produce a simpler caption consisting of common concepts. In this paper, we first analyze both modern captioning systems and evaluation metrics through empirical experiments to quantify these phe…

    Submitted 8 September, 2020; originally announced September 2020.

    Comments: ECCV 2020

  44. arXiv:2007.05655  [pdf, other]

    cs.CV cs.AI cs.RO

    Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation

    Authors: Zhiwei Deng, Karthik Narasimhan, Olga Russakovsky

    Abstract: The ability to perform effective planning is crucial for building an instruction-following agent. When navigating through a new environment, an agent is challenged with (1) connecting the natural language instructions with its progressively growing knowledge of the world; and (2) performing long-range planning and decision making in the form of effective exploration and error correction. Current m…

    Submitted 10 July, 2020; originally announced July 2020.

  45. arXiv:2004.07999  [pdf, other]

    cs.CV

    REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets

    Authors: Angelina Wang, Alexander Liu, Ryan Zhang, Anat Kleiman, Leslie Kim, Dora Zhao, Iroha Shirai, Arvind Narayanan, Olga Russakovsky

    Abstract: Machine learning models are known to perpetuate and even amplify the biases present in the data. However, these data biases frequently do not become apparent until after the models are deployed. Our work tackles this issue and enables the preemptive analysis of large-scale datasets. REVISE (REvealing VIsual biaSEs) is a tool that assists in the investigation of a visual dataset, surfacing potentia…

    Submitted 23 July, 2021; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: Extended version of ECCV 2020 Spotlight paper

  46. arXiv:2003.14269  [pdf, other]

    cs.CV

    Take the Scenic Route: Improving Generalization in Vision-and-Language Navigation

    Authors: Felix Yu, Zhiwei Deng, Karthik Narasimhan, Olga Russakovsky

    Abstract: In the Vision-and-Language Navigation (VLN) task, an agent with egocentric vision navigates to a destination given natural language instructions. The act of manually annotating these instructions is time-consuming and expensive, such that many existing approaches automatically generate additional samples to improve agent performance. However, these approaches still have difficulty generalizing their perfo…

    Submitted 31 March, 2020; originally announced March 2020.

    Comments: 4 page short paper

  47. Towards Fairer Datasets: Filtering and Balancing the Distribution of the People Subtree in the ImageNet Hierarchy

    Authors: Kaiyu Yang, Klint Qinami, Li Fei-Fei, Jia Deng, Olga Russakovsky

    Abstract: Computer vision technology is being used by many but remains representative of only a few. People have reported misbehavior of computer vision models, including offensive prediction results and lower performance for underrepresented groups. Current computer vision models are typically developed using datasets consisting of manually annotated images or videos; the data and label distributions in th…

    Submitted 16 December, 2019; originally announced December 2019.

    Comments: Accepted to FAT* 2020

  48. arXiv:1912.02256  [pdf, other]

    cs.CV

    Compositional Temporal Visual Grounding of Natural Language Event Descriptions

    Authors: Jonathan C. Stroud, Ryan McCaffrey, Rada Mihalcea, Jia Deng, Olga Russakovsky

    Abstract: Temporal grounding entails establishing a correspondence between natural language event descriptions and their visual depictions. Compositional modeling becomes central: we first ground atomic descriptions "girl eating an apple," "batter hitting the ball" to short video segments, and then establish the temporal relationships between the segments. This compositional structure enables models to reco…

    Submitted 4 December, 2019; originally announced December 2019.

    Comments: Project page: jonathancstroud.com/ctg

  49. arXiv:1911.11834  [pdf, other]

    cs.CV

    Towards Fairness in Visual Recognition: Effective Strategies for Bias Mitigation

    Authors: Zeyu Wang, Klint Qinami, Ioannis Christos Karakozis, Kyle Genova, Prem Nair, Kenji Hata, Olga Russakovsky

    Abstract: Computer vision models learn to perform a task by capturing relevant statistics from training data. It has been shown that models learn spurious age, gender, and race correlations when trained for seemingly unrelated tasks like activity recognition or image captioning. Various mitigation techniques have been presented to prevent models from utilizing or learning such biases. However, there has bee…

    Submitted 2 April, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

    Comments: To appear in CVPR 2020

  50. arXiv:1908.07086  [pdf, other]

    cs.CV

    Human uncertainty makes classification more robust

    Authors: Joshua C. Peterson, Ruairidh M. Battleday, Thomas L. Griffiths, Olga Russakovsky

    Abstract: The classification performance of deep neural networks has begun to asymptote at near-perfect levels. However, their ability to generalize outside the training set and their robustness to adversarial attacks have not. In this paper, we make progress on this problem by training with full label distributions that reflect human perceptual uncertainty. We first present a new benchmark dataset which we…

    Submitted 19 August, 2019; originally announced August 2019.

    Comments: In Proceedings of the 2019 IEEE International Conference on Computer Vision (ICCV)