Showing 1–10 of 10 results for author: Kirstain, Y

Searching in archive cs.
  1. arXiv:2502.02492  [pdf, other]

    cs.CV

    VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models

    Authors: Hila Chefer, Uriel Singer, Amit Zohar, Yuval Kirstain, Adam Polyak, Yaniv Taigman, Lior Wolf, Shelly Sheynin

    Abstract: Despite tremendous recent progress, generative video models still struggle to capture real-world motion, dynamics, and physics. We show that this limitation arises from the conventional pixel reconstruction objective, which biases models toward appearance fidelity at the expense of motion coherence. To address this, we introduce VideoJAM, a novel framework that instills an effective motion prior t…

    Submitted 26 May, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

  2. arXiv:2501.03059  [pdf, other]

    cs.CV cs.AI cs.LG

    Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation

    Authors: Guy Yariv, Yuval Kirstain, Amit Zohar, Shelly Sheynin, Yaniv Taigman, Yossi Adi, Sagie Benaim, Adam Polyak

    Abstract: We consider the task of Image-to-Video (I2V) generation, which involves transforming static images into realistic video sequences based on a textual description. While recent advancements produce photorealistic outputs, they frequently struggle to create videos with accurate and consistent object motion, especially in multi-object scenarios. To address these limitations, we propose a two-stage com…

    Submitted 6 January, 2025; originally announced January 2025.

  3. arXiv:2410.13720  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Movie Gen: A Cast of Media Foundation Models

    Authors: Adam Polyak, Amit Zohar, Andrew Brown, Andros Tjandra, Animesh Sinha, Ann Lee, Apoorv Vyas, Bowen Shi, Chih-Yao Ma, Ching-Yao Chuang, David Yan, Dhruv Choudhary, Dingkang Wang, Geet Sethi, Guan Pang, Haoyu Ma, Ishan Misra, Ji Hou, Jialiang Wang, Kiran Jagadeesh, Kunpeng Li, Luxin Zhang, Mannat Singh, Mary Williamson, Matt Le, et al. (63 additional authors not shown)

    Abstract: We present Movie Gen, a cast of foundation models that generates high-quality, 1080p HD videos with different aspect ratios and synchronized audio. We also show additional capabilities such as precise instruction-based video editing and generation of personalized videos based on a user's image. Our models set a new state-of-the-art on multiple tasks: text-to-video synthesis, video personalization,…

    Submitted 26 February, 2025; v1 submitted 17 October, 2024; originally announced October 2024.

  4. arXiv:2403.09334  [pdf, other]

    cs.CV

    Video Editing via Factorized Diffusion Distillation

    Authors: Uriel Singer, Amit Zohar, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Devi Parikh, Yaniv Taigman

    Abstract: We introduce Emu Video Edit (EVE), a model that establishes a new state-of-the-art in video editing without relying on any supervised video editing data. To develop EVE, we separately train an image editing adapter and a video generation adapter, and attach both to the same text-to-image model. Then, to align the adapters towards video editing, we introduce a new unsupervised distillation procedure,…

    Submitted 24 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  5. arXiv:2311.10089  [pdf, other]

    cs.CV cs.AI cs.LG

    Emu Edit: Precise Image Editing via Recognition and Generation Tasks

    Authors: Shelly Sheynin, Adam Polyak, Uriel Singer, Yuval Kirstain, Amit Zohar, Oron Ashual, Devi Parikh, Yaniv Taigman

    Abstract: Instruction-based image editing holds immense potential for a variety of applications, as it enables users to perform any editing operation using a natural language instruction. However, current models in this domain often struggle with accurately executing user instructions. We present Emu Edit, a multi-task image editing model which sets state-of-the-art results in instruction-based image editin…

    Submitted 16 November, 2023; originally announced November 2023.

  6. arXiv:2305.01569  [pdf, other]

    cs.CV cs.AI

    Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation

    Authors: Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, Omer Levy

    Abstract: The ability to collect a large dataset of human preferences from text-to-image users is usually limited to companies, making such datasets inaccessible to the public. To address this issue, we create a web app that enables text-to-image users to generate images and specify their preferences. Using this web app we build Pick-a-Pic, a large, open dataset of text-to-image prompts and real users' pref…

    Submitted 23 November, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

  7. arXiv:2303.01000  [pdf, other]

    cs.CV cs.AI

    X&Fuse: Fusing Visual Information in Text-to-Image Generation

    Authors: Yuval Kirstain, Omer Levy, Adam Polyak

    Abstract: We introduce X&Fuse, a general approach for conditioning on visual information when generating images from text. We demonstrate the potential of X&Fuse in three different text-to-image generation scenarios. (i) When a bank of images is available, we retrieve and condition on a related image (Retrieve&Fuse), resulting in significant improvements on the MS-COCO benchmark, gaining a state-of-the-art…

    Submitted 2 March, 2023; originally announced March 2023.

  8. arXiv:2110.04374  [pdf, other]

    cs.CL

    A Few More Examples May Be Worth Billions of Parameters

    Authors: Yuval Kirstain, Patrick Lewis, Sebastian Riedel, Omer Levy

    Abstract: We investigate the dynamics of increasing the number of model parameters versus the number of labeled examples across a wide variety of tasks. Our exploration reveals that while scaling parameters consistently yields performance improvements, the contribution of additional examples highly depends on the task's format. Specifically, in open question answering tasks, enlarging the training set does…

    Submitted 8 October, 2021; originally announced October 2021.

  9. arXiv:2101.00438  [pdf, other]

    cs.CL

    Few-Shot Question Answering by Pretraining Span Selection

    Authors: Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy

    Abstract: In several question answering benchmarks, pretrained models have reached human parity through fine-tuning on an order of 100,000 annotated questions and answers. We explore the more realistic few-shot setting, where only a few hundred training examples are available, and observe that standard models perform poorly, highlighting the discrepancy between current pretraining objectives and question an…

    Submitted 2 June, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: Accepted to ACL 2021

  10. arXiv:2101.00434  [pdf, other]

    cs.CL

    Coreference Resolution without Span Representations

    Authors: Yuval Kirstain, Ori Ram, Omer Levy

    Abstract: The introduction of pretrained language models has reduced many complex task-specific NLP models to simple lightweight layers. An exception to this trend is coreference resolution, where a sophisticated task-specific model is appended to a pretrained transformer encoder. While highly effective, the model has a very large memory footprint -- primarily due to dynamically-constructed span and span-pa…

    Submitted 31 May, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: Accepted to ACL 2021