Showing 1–11 of 11 results for author: Panagopoulou, A

  1. arXiv:2506.01275  [pdf, ps, other]

    cs.AI

    Contra4: Evaluating Contrastive Cross-Modal Reasoning in Audio, Video, Image, and 3D

    Authors: Artemis Panagopoulou, Le Xue, Honglu Zhou, Silvio Savarese, Ran Xu, Caiming Xiong, Chris Callison-Burch, Mark Yatskar, Juan Carlos Niebles

    Abstract: Real-world decision-making often begins with identifying which modality contains the most relevant information for a given query. While recent multimodal models have made impressive progress in processing diverse inputs, it remains unclear whether they can reason contrastively across multiple modalities to select the one that best satisfies a natural language prompt. We argue this capability is fo…

    Submitted 15 September, 2025; v1 submitted 1 June, 2025; originally announced June 2025.

  2. arXiv:2412.08859  [pdf, other]

    cs.CV

    ViUniT: Visual Unit Tests for More Robust Visual Programming

    Authors: Artemis Panagopoulou, Honglu Zhou, Silvio Savarese, Caiming Xiong, Chris Callison-Burch, Mark Yatskar, Juan Carlos Niebles

    Abstract: Programming based approaches to reasoning tasks have substantially expanded the types of questions models can answer about visual scenes. Yet on benchmark visual reasoning data, when models answer correctly, they produce incorrect programs 33% of the time. These models are often right for the wrong reasons and risk unexpected failures on new data. Unit tests play a foundational role in ensuring co…

    Submitted 11 December, 2024; originally announced December 2024.

  3. arXiv:2405.19423  [pdf, other]

    cs.CV cs.AI

    Evaluating Vision-Language Models on Bistable Images

    Authors: Artemis Panagopoulou, Coby Melkin, Chris Callison-Burch

    Abstract: Bistable images, also known as ambiguous or reversible images, present visual stimuli that can be seen in two distinct interpretations, though not simultaneously by the observer. In this study, we conduct the most extensive examination of vision-language models using bistable images to date. We manually gathered a dataset of 29 bistable images, along with their associated labels, and subjected the…

    Submitted 29 May, 2024; originally announced May 2024.

  4. arXiv:2311.18799  [pdf, other]

    cs.CV cs.CL

    X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

    Authors: Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles

    Abstract: Recent research has achieved significant advancements in visual reasoning tasks through learning image-to-language projections and leveraging the impressive reasoning abilities of Large Language Models (LLMs). This paper introduces an efficient and effective framework that integrates multiple modalities (images, 3D, audio and video) to a frozen LLM and demonstrates an emergent ability for cross-mo…

    Submitted 9 September, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

  5. arXiv:2305.14724  [pdf, other]

    cs.CL cs.AI cs.CV cs.HC

    I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors

    Authors: Tuhin Chakrabarty, Arkadiy Saakyan, Olivia Winn, Artemis Panagopoulou, Yue Yang, Marianna Apidianaki, Smaranda Muresan

    Abstract: Visual metaphors are powerful rhetorical devices used to persuade or communicate creative ideas through images. Similar to linguistic metaphors, they convey meaning implicitly through symbolism and juxtaposition of the symbols. We propose a new task of generating visual metaphors from linguistic metaphors. This is a challenging task for diffusion-based text-to-image models, such as DALL·E 2,…

    Submitted 14 July, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: ACL 2023 (Findings)

  6. arXiv:2305.08275  [pdf, other]

    cs.CV

    ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

    Authors: Le Xue, Ning Yu, Shu Zhang, Artemis Panagopoulou, Junnan Li, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese

    Abstract: Recent advancements in multimodal pre-training have shown promising efficacy in 3D representation learning by aligning multimodal features across 3D shapes, their 2D counterparts, and language descriptions. However, the methods used by existing frameworks to curate such multimodal data, in particular language descriptions for 3D shapes, are not scalable, and the collected language descriptions are…

    Submitted 25 April, 2024; v1 submitted 14 May, 2023; originally announced May 2023.

    Comments: CVPR 2024

    Journal ref: CVPR 2024

  7. arXiv:2304.01721  [pdf, other]

    cs.OS

    Virtio-FPGA: a virtualization solution for SoC-attached FPGAs

    Authors: Anna Panagopoulou, Michele Paolino, Daniel Raho

    Abstract: Recently, FPGA accelerators have risen in popularity as they present a suitable way of satisfying the high-computation and low-power demands of real time applications. The modern electric transportation systems (such as aircraft, road vehicles) can greatly profit from embedded FPGAs, which incorporate both high-performance and flexibility features into a single SoC. At the same time, the virtualiz…

    Submitted 4 April, 2023; originally announced April 2023.

  8. arXiv:2211.11158  [pdf, other]

    cs.CV cs.CL

    Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification

    Authors: Yue Yang, Artemis Panagopoulou, Shenghao Zhou, Daniel Jin, Chris Callison-Burch, Mark Yatskar

    Abstract: Concept Bottleneck Models (CBM) are inherently interpretable models that factor model decisions into human-readable concepts. They allow people to easily understand why a model is failing, a critical feature for high-stakes applications. CBMs require manually specified concepts and often under-perform their black box counterparts, preventing their broad adoption. We address these shortcomings and…

    Submitted 25 April, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

    Comments: Published in CVPR 2023, 18 pages, 12 figures, 16 tables

  9. arXiv:2210.12905  [pdf, other]

    cs.CL

    Visualizing the Obvious: A Concreteness-based Ensemble Model for Noun Property Prediction

    Authors: Yue Yang, Artemis Panagopoulou, Marianna Apidianaki, Mark Yatskar, Chris Callison-Burch

    Abstract: Neural language models encode rich knowledge about entities and their relationships which can be extracted from their representations using probing. Common properties of nouns (e.g., red strawberries, small ant) are, however, more challenging to extract compared to other types of knowledge because they are rarely explicitly stated in texts. We hypothesize this to mainly be the case for perceptual…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022; The first two authors contributed equally

    Journal ref: Findings of EMNLP 2022

  10. arXiv:2111.09276  [pdf, other]

    cs.CV cs.CL

    Induce, Edit, Retrieve: Language Grounded Multimodal Schema for Instructional Video Retrieval

    Authors: Yue Yang, Joongwon Kim, Artemis Panagopoulou, Mark Yatskar, Chris Callison-Burch

    Abstract: Schemata are structured representations of complex tasks that can aid artificial intelligence by allowing models to break down complex tasks into intermediate steps. We propose a novel system that induces schemata from web videos and generalizes them to capture unseen tasks with the goal of improving video retrieval performance. Our system proceeds in three major phases: (1) Given a task with rela…

    Submitted 17 November, 2021; originally announced November 2021.

  11. arXiv:2104.05845  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    Visual Goal-Step Inference using wikiHow

    Authors: Yue Yang, Artemis Panagopoulou, Qing Lyu, Li Zhang, Mark Yatskar, Chris Callison-Burch

    Abstract: Understanding what sequence of steps are needed to complete a goal can help artificial intelligence systems reason about human activities. Past work in NLP has examined the task of goal-step inference for text. We introduce the visual analogue. We propose the Visual Goal-Step Inference (VGSI) task, where a model is given a textual goal and must choose which of four images represents a plausible st…

    Submitted 9 September, 2021; v1 submitted 12 April, 2021; originally announced April 2021.