Showing 1–9 of 9 results for author: Mañas, O

  1. arXiv:2506.01085  [pdf, ps, other]

    cs.CV cs.AI

    Learning What Matters: Prioritized Concept Learning via Relative Error-driven Sample Selection

    Authors: Shivam Chandhok, Qian Yang, Oscar Mañas, Kanishk Jain, Leonid Sigal, Aishwarya Agrawal

    Abstract: Instruction tuning has been central to the success of recent vision-language models (VLMs), but it remains expensive, requiring large-scale datasets, high-quality annotations, and large compute budgets. We propose PRioritized cOncept learninG via Relative Error-driven Sample Selection (PROGRESS), a data- and compute-efficient framework that enables VLMs to dynamically select what to learn next base…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: Preprint

  2. arXiv:2412.10604  [pdf, other]

    cs.CV

    EvalGIM: A Library for Evaluating Generative Image Models

    Authors: Melissa Hall, Oscar Mañas, Reyhane Askari-Hemmat, Mark Ibrahim, Candace Ross, Pietro Astolfi, Tariq Berrada Ifriqi, Marton Havasi, Yohann Benchetrit, Karen Ullrich, Carolina Braga, Abhishek Charnalia, Maeve Ryan, Mike Rabbat, Michal Drozdzal, Jakob Verbeek, Adriana Romero-Soriano

    Abstract: As the use of text-to-image generative models increases, so does the adoption of automatic benchmarking methods used in their evaluation. However, while metrics and datasets abound, there are few unified benchmarking libraries that provide a framework for performing evaluations across many datasets and metrics. Furthermore, the rapid introduction of increasingly robust benchmarking methods require…

    Submitted 18 December, 2024; v1 submitted 13 December, 2024; originally announced December 2024.

    Comments: For code, see https://github.com/facebookresearch/EvalGIM/tree/main

  3. arXiv:2406.10429  [pdf, other]

    cs.CV cs.AI

    Consistency-diversity-realism Pareto fronts of conditional image generative models

    Authors: Pietro Astolfi, Marlene Careil, Melissa Hall, Oscar Mañas, Matthew Muckley, Jakob Verbeek, Adriana Romero Soriano, Michal Drozdzal

    Abstract: Building world models that accurately and comprehensively represent the real world is the utmost aspiration for conditional image generative models as it would enable their use as world simulators. For these models to be successful world models, they should not only excel at image quality and prompt-image consistency but also ensure high representation diversity. However, current research in gener…

    Submitted 14 June, 2024; originally announced June 2024.

  4. arXiv:2405.17247  [pdf, other]

    cs.LG

    An Introduction to Vision-Language Modeling

    Authors: Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie, et al. (16 additional authors not shown)

    Abstract: Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the vision-language model (VLM) applications will significantly impact our relationship with technol…

    Submitted 27 May, 2024; originally announced May 2024.

  5. arXiv:2403.17804  [pdf, other]

    cs.CV cs.CL

    Improving Text-to-Image Consistency via Automatic Prompt Optimization

    Authors: Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal

    Abstract: Impressive advances in text-to-image (T2I) generative models have yielded a plethora of high performing models which are able to generate aesthetically appealing, photorealistic images. Despite the progress, these models still struggle to produce images that are consistent with the input prompt, oftentimes failing to capture object quantities, relations and attributes properly. Existing solutions…

    Submitted 26 March, 2024; originally announced March 2024.

  6. arXiv:2310.02567  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Improving Automatic VQA Evaluation Using Large Language Models

    Authors: Oscar Mañas, Benno Krojer, Aishwarya Agrawal

    Abstract: 8 years after the visual question answering (VQA) task was proposed, accuracy remains the primary metric for automatic evaluation. VQA Accuracy has been effective so far in the IID evaluation setting. However, our community is undergoing a shift towards open-ended generative models and OOD evaluation. In this new paradigm, the existing VQA Accuracy metric is overly stringent and underestimates the…

    Submitted 10 January, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted at AAAI 2024 (main track)

  7. arXiv:2210.07179  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting

    Authors: Oscar Mañas, Pau Rodriguez, Saba Ahmadi, Aida Nematzadeh, Yash Goyal, Aishwarya Agrawal

    Abstract: Large pre-trained models have proved to be remarkable zero- and (prompt-based) few-shot learners in unimodal vision and language tasks. We propose MAPL, a simple and parameter-efficient method that reuses frozen pre-trained unimodal models and leverages their strong generalization capabilities in multimodal vision-language (VL) settings. MAPL learns a lightweight mapping between the representation…

    Submitted 14 March, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted at EACL 2023 (main track); 26 pages, 21 figures, 6 tables; Pau Rodriguez and Saba Ahmadi had equal contributions

  8. arXiv:2103.16607  [pdf, other]

    cs.CV

    Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data

    Authors: Oscar Mañas, Alexandre Lacoste, Xavier Giro-i-Nieto, David Vazquez, Pau Rodriguez

    Abstract: Remote sensing and automatic earth monitoring are key to solve global-scale challenges such as disaster prevention, land use monitoring, or tackling climate change. Although there exist vast amounts of remote sensing data, most of it remains unlabeled and thus inaccessible for supervised learning algorithms. Transfer learning approaches can reduce the data requirements of deep learning algorithms.…

    Submitted 3 May, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

  9. arXiv:2007.02180  [pdf, other]

    eess.IV cs.CV

    A Weakly Supervised Consistency-based Learning Method for COVID-19 Segmentation in CT Images

    Authors: Issam Laradji, Pau Rodriguez, Oscar Mañas, Keegan Lensink, Marco Law, Lironne Kurzman, William Parker, David Vazquez, Derek Nowrouzezahrai

    Abstract: Coronavirus Disease 2019 (COVID-19) has spread aggressively across the world causing an existential health crisis. Thus, having a system that automatically detects COVID-19 in tomography (CT) images can assist in quantifying the severity of the illness. Unfortunately, labelling chest CT scans requires significant domain expertise, time, and effort. We address these labelling challenges by only req…

    Submitted 7 July, 2020; v1 submitted 4 July, 2020; originally announced July 2020.