Skip to main content

Showing 1–16 of 16 results for author: Astolfi, P

Searching in archive cs. Search in all archives.
.
  1. arXiv:2502.15588  [pdf, other

    cs.LG cs.AI

    Improving the Scaling Laws of Synthetic Data with Deliberate Practice

    Authors: Reyhane Askari-Hemmat, Mohammad Pezeshki, Elvis Dohmatob, Florian Bordes, Pietro Astolfi, Melissa Hall, Jakob Verbeek, Michal Drozdzal, Adriana Romero-Soriano

    Abstract: Inspired by the principle of deliberate practice in human learning, we propose Deliberate Practice for Synthetic Data Generation (DP), a novel framework that improves sample efficiency through dynamic synthetic data generation. Prior work has shown that scaling synthetic data is inherently challenging, as naively adding new data leads to diminishing returns. To address this, pruning has been ident… ▽ More

    Submitted 21 February, 2025; originally announced February 2025.

  2. arXiv:2502.14113  [pdf, other

    cs.CV cs.AI

    Object-centric Binding in Contrastive Language-Image Pretraining

    Authors: Rim Assouel, Pietro Astolfi, Florian Bordes, Michal Drozdzal, Adriana Romero-Soriano

    Abstract: Recent advances in vision language models (VLM) have been driven by contrastive models such as CLIP, which learn to associate visual information with their corresponding text descriptions. However, these models have limitations in understanding complex compositional scenes involving multiple objects and their spatial relationships. To address these challenges, we propose a novel approach that dive… ▽ More

    Submitted 19 February, 2025; originally announced February 2025.

  3. arXiv:2412.10604  [pdf, other

    cs.CV

    EvalGIM: A Library for Evaluating Generative Image Models

    Authors: Melissa Hall, Oscar Mañas, Reyhane Askari-Hemmat, Mark Ibrahim, Candace Ross, Pietro Astolfi, Tariq Berrada Ifriqi, Marton Havasi, Yohann Benchetrit, Karen Ullrich, Carolina Braga, Abhishek Charnalia, Maeve Ryan, Mike Rabbat, Michal Drozdzal, Jakob Verbeek, Adriana Romero-Soriano

    Abstract: As the use of text-to-image generative models increases, so does the adoption of automatic benchmarking methods used in their evaluation. However, while metrics and datasets abound, there are few unified benchmarking libraries that provide a framework for performing evaluations across many datasets and metrics. Furthermore, the rapid introduction of increasingly robust benchmarking methods require… ▽ More

    Submitted 18 December, 2024; v1 submitted 13 December, 2024; originally announced December 2024.

    Comments: For code, see https://github.com/facebookresearch/EvalGIM/tree/main

  4. arXiv:2411.04873  [pdf, other

    cs.CV

    Boosting Latent Diffusion with Perceptual Objectives

    Authors: Tariq Berrada, Pietro Astolfi, Melissa Hall, Marton Havasi, Yohann Benchetrit, Adriana Romero-Soriano, Karteek Alahari, Michal Drozdzal, Jakob Verbeek

    Abstract: Latent diffusion models (LDMs) power state-of-the-art high-resolution generative image models. LDMs learn the data distribution in the latent space of an autoencoder (AE) and produce images by mapping the generated latents into RGB image space using the AE decoder. While this approach allows for efficient model training and sampling, it induces a disconnect between the training of the diffusion mo… ▽ More

    Submitted 16 January, 2025; v1 submitted 6 November, 2024; originally announced November 2024.

    Comments: Pre-print

  5. arXiv:2411.03177  [pdf, other

    cs.CV cs.AI

    On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models

    Authors: Tariq Berrada Ifriqi, Pietro Astolfi, Melissa Hall, Reyhane Askari-Hemmat, Yohann Benchetrit, Marton Havasi, Matthew Muckley, Karteek Alahari, Adriana Romero-Soriano, Jakob Verbeek, Michal Drozdzal

    Abstract: Large-scale training of latent diffusion models (LDMs) has enabled unprecedented quality in image generation. However, the key components of the best performing LDM training recipes are oftentimes not available to the research community, preventing apple-to-apple comparisons and hindering the validation of progress in the field. In this work, we perform an in-depth study of LDM training recipes fo… ▽ More

    Submitted 20 January, 2025; v1 submitted 5 November, 2024; originally announced November 2024.

    Comments: Accepted as a conference paper (poster) for NeurIPS 2024

  6. arXiv:2407.18134  [pdf, other

    cs.CV cs.LG

    $\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs

    Authors: Vlad Sobal, Mark Ibrahim, Randall Balestriero, Vivien Cabannes, Diane Bouchacourt, Pietro Astolfi, Kyunghyun Cho, Yann LeCun

    Abstract: Learning good representations involves capturing the diverse ways in which data samples relate. Contrastive loss - an objective matching related samples - underlies methods from self-supervised to multimodal learning. Contrastive losses, however, can be viewed more broadly as modifying a similarity graph to indicate how samples should relate in the embedding space. This view reveals a shortcoming… ▽ More

    Submitted 11 September, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

  7. arXiv:2406.10429  [pdf, other

    cs.CV cs.AI

    Consistency-diversity-realism Pareto fronts of conditional image generative models

    Authors: Pietro Astolfi, Marlene Careil, Melissa Hall, Oscar Mañas, Matthew Muckley, Jakob Verbeek, Adriana Romero Soriano, Michal Drozdzal

    Abstract: Building world models that accurately and comprehensively represent the real world is the utmost aspiration for conditional image generative models as it would enable their use as world simulators. For these models to be successful world models, they should not only excel at image quality and prompt-image consistency but also ensure high representation diversity. However, current research in gener… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  8. arXiv:2405.17247  [pdf, other

    cs.LG

    An Introduction to Vision-Language Modeling

    Authors: Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie , et al. (16 additional authors not shown)

    Abstract: Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the vision-language model (VLM) applications will significantly impact our relationship with technol… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  9. arXiv:2403.17804  [pdf, other

    cs.CV cs.CL

    Improving Text-to-Image Consistency via Automatic Prompt Optimization

    Authors: Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal

    Abstract: Impressive advances in text-to-image (T2I) generative models have yielded a plethora of high performing models which are able to generate aesthetically appealing, photorealistic images. Despite the progress, these models still struggle to produce images that are consistent with the input prompt, oftentimes failing to capture object quantities, relations and attributes properly. Existing solutions… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

  10. arXiv:2403.14421  [pdf, other

    cs.LG cs.CR cs.CV

    DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning

    Authors: Jonathan Lebensold, Maziar Sanjabi, Pietro Astolfi, Adriana Romero-Soriano, Kamalika Chaudhuri, Mike Rabbat, Chuan Guo

    Abstract: Text-to-image diffusion models have been shown to suffer from sample-level memorization, possibly reproducing near-perfect replica of images that they are trained on, which may be undesirable. To remedy this issue, we develop the first differentially private (DP) retrieval-augmented generation algorithm that is capable of generating high-quality image samples while providing provable privacy guara… ▽ More

    Submitted 13 May, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

  11. arXiv:2312.08578  [pdf, other

    cs.CV

    A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

    Authors: Jack Urbanek, Florian Bordes, Pietro Astolfi, Mary Williamson, Vasu Sharma, Adriana Romero-Soriano

    Abstract: Curation methods for massive vision-language datasets trade off between dataset size and quality. However, even the highest quality of available curated captions are far too short to capture the rich visual detail in an image. To show the value of dense and highly-aligned image-text pairs, we collect the Densely Captioned Images (DCI) dataset, containing 7805 natural images human-annotated with ma… ▽ More

    Submitted 17 June, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

  12. arXiv:2306.07483  [pdf, other

    cs.CV

    Semi-supervised learning made simple with self-supervised clustering

    Authors: Enrico Fini, Pietro Astolfi, Karteek Alahari, Xavier Alameda-Pineda, Julien Mairal, Moin Nabi, Elisa Ricci

    Abstract: Self-supervised learning models have been shown to learn rich visual representations without requiring human annotations. However, in many real-world scenarios, labels are partially available, motivating a recent line of work on semi-supervised methods inspired by self-supervised principles. In this paper, we propose a conceptually simple yet empirically powerful approach to turn clustering-based… ▽ More

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: CVPR 2023 - Code available at https://github.com/pietroastolfi/suave-daino

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) 3187-3197

  13. arXiv:2305.08675  [pdf

    cs.CV

    Improved baselines for vision-language pre-training

    Authors: Enrico Fini, Pietro Astolfi, Adriana Romero-Soriano, Jakob Verbeek, Michal Drozdzal

    Abstract: Contrastive learning has emerged as an efficient framework to learn multimodal representations. CLIP, a seminal work in this area, achieved impressive results by training on paired image-text data using the contrastive loss. Recent work claims improvements over CLIP using additional non-contrastive losses inspired from self-supervised learning. However, it is sometimes hard to disentangle the cont… ▽ More

    Submitted 4 November, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: TMLR, featured certification; changelog at https://openreview.net/forum?id=a7nvXxNmdV

    Journal ref: Transactions on Machine Learning Research, 10/2023, issn 2835-8856

  14. arXiv:2303.09677  [pdf, other

    cs.CV

    Instance-Conditioned GAN Data Augmentation for Representation Learning

    Authors: Pietro Astolfi, Arantxa Casanova, Jakob Verbeek, Pascal Vincent, Adriana Romero-Soriano, Michal Drozdzal

    Abstract: Data augmentation has become a crucial component to train state-of-the-art visual representation models. However, handcrafting combinations of transformations that lead to improved performances is a laborious task, which can result in visually unrealistic samples. To overcome these limitations, recent works have explored the use of generative models as learnable data augmentation tools, showing pr… ▽ More

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: TMLR reviews at https://openreview.net/forum?id=1n7q9mxG3T&referrer=%5BTMLR%5D(%2Fgroup%3Fid%3DTMLR)

  15. arXiv:2212.03300  [pdf, other

    cs.CV q-bio.NC

    Supervised Tractogram Filtering using Geometric Deep Learning

    Authors: Pietro Astolfi, Ruben Verhagen, Laurent Petit, Emanuele Olivetti, Silvio Sarubbo, Jonathan Masci, Davide Boscaini, Paolo Avesani

    Abstract: A tractogram is a virtual representation of the brain white matter. It is composed of millions of virtual fibers, encoded as 3D polylines, which approximate the white matter axonal pathways. To date, tractograms are the most accurate white matter representation and thus are used for tasks like presurgical planning and investigations of neuroplasticity, brain disorders, or brain networks. However,… ▽ More

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: Pre-print. Under review at journal

  16. arXiv:2003.11013  [pdf, other

    q-bio.NC cs.CV eess.IV

    Tractogram filtering of anatomically non-plausible fibers with geometric deep learning

    Authors: Pietro Astolfi, Ruben Verhagen, Laurent Petit, Emanuele Olivetti, Jonathan Masci, Davide Boscaini, Paolo Avesani

    Abstract: Tractograms are virtual representations of the white matter fibers of the brain. They are of primary interest for tasks like presurgical planning, and investigation of neuroplasticity or brain disorders. Each tractogram is composed of millions of fibers encoded as 3D polylines. Unfortunately, a large portion of those fibers are not anatomically plausible and can be considered artifacts of the trac… ▽ More

    Submitted 9 July, 2020; v1 submitted 24 March, 2020; originally announced March 2020.

    Comments: Accepted at MICCAI2020