Skip to main content

Showing 1–8 of 8 results for author: Assouel, R

Searching in archive cs. Search in all archives.
.
  1. arXiv:2506.15871  [pdf, ps, other

    cs.CV

    Visual symbolic mechanisms: Emergent symbol processing in vision language models

    Authors: Rim Assouel, Declan Campbell, Taylor Webb

    Abstract: To accurately process a visual scene, observers must bind features together to represent individual objects. This capacity is necessary, for instance, to distinguish an image containing a red square and a blue circle from an image containing a blue square and a red circle. Recent work has found that language models solve this 'binding problem' via a set of symbol-like, content-independent indices,… ▽ More

    Submitted 18 June, 2025; originally announced June 2025.

  2. arXiv:2502.14113  [pdf, other

    cs.CV cs.AI

    Object-centric Binding in Contrastive Language-Image Pretraining

    Authors: Rim Assouel, Pietro Astolfi, Florian Bordes, Michal Drozdzal, Adriana Romero-Soriano

    Abstract: Recent advances in vision language models (VLM) have been driven by contrastive models such as CLIP, which learn to associate visual information with their corresponding text descriptions. However, these models have limitations in understanding complex compositional scenes involving multiple objects and their spatial relationships. To address these challenges, we propose a novel approach that dive… ▽ More

    Submitted 19 February, 2025; originally announced February 2025.

  3. arXiv:2412.05467  [pdf, other

    cs.LG cs.AI cs.SE

    The BrowserGym Ecosystem for Web Agent Research

    Authors: Thibault Le Sellier De Chezelles, Maxime Gasse, Alexandre Drouin, Massimo Caccia, Léo Boisvert, Megh Thakkar, Tom Marty, Rim Assouel, Sahar Omidi Shayegan, Lawrence Keunho Jang, Xing Han Lù, Ori Yoran, Dehan Kong, Frank F. Xu, Siva Reddy, Quentin Cappart, Graham Neubig, Ruslan Salakhutdinov, Nicolas Chapados, Alexandre Lacoste

    Abstract: The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those leveraging automation and Large Language Models (LLMs). Many existing benchmarks suffer from fragmentation and inconsistent evaluation methodologies, making it challenging to achieve reliable comparisons and reproducible results. In an earlier work, Drouin et al. (2024) i… ▽ More

    Submitted 28 February, 2025; v1 submitted 6 December, 2024; originally announced December 2024.

  4. arXiv:2410.15184  [pdf, other

    cs.LG cs.AI

    Action abstractions for amortized sampling

    Authors: Oussama Boussif, Léna Néhale Ezzine, Joseph D Viviano, Michał Koziarski, Moksh Jain, Nikolay Malkin, Emmanuel Bengio, Rim Assouel, Yoshua Bengio

    Abstract: As trajectories sampled by policies used by reinforcement learning (RL) and generative flow networks (GFlowNets) grow longer, credit assignment and exploration become more challenging, and the long planning horizon hinders mode discovery and generalization. The challenge is particularly pronounced in entropy-seeking RL methods, such as generative flow networks, where the agent must learn to sample… ▽ More

    Submitted 19 October, 2024; originally announced October 2024.

  5. arXiv:2405.17247  [pdf, other

    cs.LG

    An Introduction to Vision-Language Modeling

    Authors: Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie , et al. (16 additional authors not shown)

    Abstract: Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the vision-language model (VLM) applications will significantly impact our relationship with technol… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  6. arXiv:2310.18807  [pdf, other

    cs.AI cs.CV

    OC-NMN: Object-centric Compositional Neural Module Network for Generative Visual Analogical Reasoning

    Authors: Rim Assouel, Pau Rodriguez, Perouz Taslakian, David Vazquez, Yoshua Bengio

    Abstract: A key aspect of human intelligence is the ability to imagine -- composing learned concepts in novel ways -- to make sense of new scenarios. Such capacity is not yet attained for machine learning systems. In this work, in the context of visual reasoning, we show how modularity can be leveraged to derive a compositional data augmentation framework inspired by imagination. Our method, denoted Object-… ▽ More

    Submitted 28 October, 2023; originally announced October 2023.

  7. arXiv:2006.13291  [pdf, other

    cs.CV cs.LG eess.IV

    Image-to-image Mapping with Many Domains by Sparse Attribute Transfer

    Authors: Matthew Amodio, Rim Assouel, Victor Schmidt, Tristan Sylvain, Smita Krishnaswamy, Yoshua Bengio

    Abstract: Unsupervised image-to-image translation consists of learning a pair of mappings between two domains without known pairwise correspondences between points. The current convention is to approach this task with cycle-consistent GANs: using a discriminator to encourage the generator to change the image to match the target domain, while training the generator to be inverted with another mapping. While… ▽ More

    Submitted 23 June, 2020; originally announced June 2020.

  8. arXiv:1811.09766  [pdf, other

    cs.LG cs.AI

    DEFactor: Differentiable Edge Factorization-based Probabilistic Graph Generation

    Authors: Rim Assouel, Mohamed Ahmed, Marwin H Segler, Amir Saffari, Yoshua Bengio

    Abstract: Generating novel molecules with optimal properties is a crucial step in many industries such as drug discovery. Recently, deep generative models have shown a promising way of performing de-novo molecular design. Although graph generative models are currently available they either have a graph size dependency in their number of parameters, limiting their use to only very small graphs or are formula… ▽ More

    Submitted 24 November, 2018; originally announced November 2018.