Showing 1–7 of 7 results for author: Thomas, X

Searching in archive cs.
  1. arXiv:2503.17794  [pdf, other]

    cs.CV cs.AI

    Progressive Prompt Detailing for Improved Alignment in Text-to-Image Generative Models

    Authors: Ketan Suhaas Saichandran, Xavier Thomas, Prakhar Kaushik, Deepti Ghadiyaram

    Abstract: Text-to-image generative models often struggle with long prompts detailing complex scenes, diverse objects with distinct visual characteristics and spatial relationships. In this work, we propose SCoPE (Scheduled interpolation of Coarse-to-fine Prompt Embeddings), a training-free method to improve text-to-image alignment by progressively refining the input prompt in a coarse-to-fine-grained manner…

    Submitted 30 May, 2025; v1 submitted 22 March, 2025; originally announced March 2025.

    Comments: Accepted at CVPR 2025 workshops (AI4CC (oral) & GMCV (poster))

  2. arXiv:2503.06698  [pdf, other]

    cs.LG cs.CV

    What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization

    Authors: Xavier Thomas, Deepti Ghadiyaram

    Abstract: Domain Generalization aims to develop models that can generalize to novel and unseen data distributions. In this work, we study how model architectures and pre-training objectives impact feature richness and propose a method to effectively leverage them for domain generalization. Specifically, given a pre-trained feature space, we first discover latent domain structures, referred to as pseudo-doma…

    Submitted 28 April, 2025; v1 submitted 9 March, 2025; originally announced March 2025.

  3. arXiv:2411.16725  [pdf, ps, other]

    cs.CV

    $\textit{Revelio}$: Interpreting and leveraging semantic information in diffusion models

    Authors: Dahye Kim, Xavier Thomas, Deepti Ghadiyaram

    Abstract: We study $\textit{how}$ rich visual semantic information is represented within various layers and denoising timesteps of different diffusion architectures. We uncover monosemantic interpretable features by leveraging k-sparse autoencoders (k-SAE). We substantiate our mechanistic interpretations via transfer learning using light-weight classifiers on off-the-shelf diffusion models' features. On…

    Submitted 30 May, 2025; v1 submitted 22 November, 2024; originally announced November 2024.

    Comments: 15 pages, 14 figures

  4. arXiv:2212.11109  [pdf, other]

    cs.CV cs.CL cs.LG

    MAViC: Multimodal Active Learning for Video Captioning

    Authors: Gyanendra Das, Xavier Thomas, Anant Raj, Vikram Gupta

    Abstract: A large number of annotated video-caption pairs are required for training video captioning models, resulting in high annotation costs. Active learning can be instrumental in reducing these annotation requirements. However, active learning for video captioning is challenging because multiple semantically similar captions are valid for a video, resulting in high entropy outputs even for less-informa…

    Submitted 11 December, 2022; originally announced December 2022.

  5. arXiv:2205.10370  [pdf, other]

    cs.AI cs.LG

    Diversity vs. Recognizability: Human-like generalization in one-shot generative models

    Authors: Victor Boutin, Lakshya Singhal, Xavier Thomas, Thomas Serre

    Abstract: Robust generalization to new concepts has long remained a distinctive feature of human intelligence. However, recent progress in deep generative models has now led to neural architectures capable of synthesizing novel instances of unknown visual concepts from a single training example. Yet, a more precise comparison between these models and humans is not possible because existing performance metri…

    Submitted 7 October, 2022; v1 submitted 20 May, 2022; originally announced May 2022.

  6. arXiv:2112.04766  [pdf, other]

    cs.LG cs.CV

    Adaptive Methods for Aggregated Domain Generalization

    Authors: Xavier Thomas, Dhruv Mahajan, Alex Pentland, Abhimanyu Dubey

    Abstract: Domain generalization involves learning a classifier from a heterogeneous collection of training sources such that it generalizes to data drawn from similar unknown target domains, with applications in large-scale learning and personalized inference. In many settings, privacy concerns prohibit obtaining domain labels for the training data samples, and instead only have an aggregated collection of…

    Submitted 23 December, 2021; v1 submitted 9 December, 2021; originally announced December 2021.

  7. arXiv:2010.12798  [pdf]

    cs.IR cs.LG

    Content-Based Personalized Recommender System Using Entity Embeddings

    Authors: Xavier Thomas

    Abstract: Recommender systems are a class of machine learning algorithms that provide relevant recommendations to a user based on the user's interaction with similar items or based on the content of the item. In settings where the content of the item is to be preserved, a content-based approach would be beneficial. This paper aims to highlight the advantages of the content-based approach through learned emb…

    Submitted 24 October, 2020; originally announced October 2020.

    Comments: 2 pages, 1 figure