Showing 1–2 of 2 results for author: Berman, W

  1. arXiv:2406.18790  [pdf, other]

    cs.CV cs.AI

    MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data

    Authors: William Berman, Alexander Peysakhovich

    Abstract: We train a model to generate images from multimodal prompts of interleaved text and images such as "a <picture of a man> man and his <picture of a dog> dog in an <picture of a cartoon> animated style." We bootstrap a multimodal dataset by extracting semantically meaningful image crops corresponding to words in the image captions of synthetically generated and publicly available text-image data. Ou…

    Submitted 11 September, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2401.01808  [pdf, other]

    cs.CV

    aMUSEd: An Open MUSE Reproduction

    Authors: Suraj Patil, William Berman, Robin Rombach, Patrick von Platen

    Abstract: We present aMUSEd, an open-source, lightweight masked image model (MIM) for text-to-image generation based on MUSE. With 10 percent of MUSE's parameters, aMUSEd is focused on fast image generation. We believe MIM is under-explored compared to latent diffusion, the prevailing approach for text-to-image generation. Compared to latent diffusion, MIM requires fewer inference steps and is more interpre…

    Submitted 3 January, 2024; originally announced January 2024.