Showing 1–14 of 14 results for author: Chefer, H

  1. arXiv:2506.02138  [pdf, ps, other]

    cs.LG

    Revisiting LRP: Positional Attribution as the Missing Ingredient for Transformer Explainability

    Authors: Yarden Bakish, Itamar Zimerman, Hila Chefer, Lior Wolf

    Abstract: The development of effective explainability tools for Transformers is a crucial pursuit in deep learning research. One of the most promising approaches in this domain is Layer-wise Relevance Propagation (LRP), which propagates relevance scores backward through the network to the input space by redistributing activation values based on predefined rules. However, existing LRP-based methods for Trans…

    Submitted 2 June, 2025; originally announced June 2025.

    ACM Class: I.2.6; I.2.7

  2. arXiv:2506.01144  [pdf, ps, other]

    cs.CV

    FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation

    Authors: Ariel Shaulov, Itay Hazan, Lior Wolf, Hila Chefer

    Abstract: Text-to-video diffusion models are notoriously limited in their ability to model temporal aspects such as motion, physics, and dynamic interactions. Existing approaches address this limitation by retraining the model or introducing external conditioning signals to enforce temporal consistency. In this work, we explore whether a meaningful temporal representation can be extracted directly from the…

    Submitted 4 June, 2025; v1 submitted 1 June, 2025; originally announced June 2025.

  3. arXiv:2504.06800  [pdf, other]

    cs.CV

    A Meaningful Perturbation Metric for Evaluating Explainability Methods

    Authors: Danielle Cohen, Hila Chefer, Lior Wolf

    Abstract: Deep neural networks (DNNs) have demonstrated remarkable success, yet their wide adoption is often hindered by their opaque decision-making. To address this, attribution methods have been proposed to assign relevance values to each part of the input. However, different methods often produce entirely different relevance maps, necessitating the development of standardized metrics to evaluate them. T…

    Submitted 9 April, 2025; originally announced April 2025.

  4. arXiv:2502.02492  [pdf, other]

    cs.CV

    VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models

    Authors: Hila Chefer, Uriel Singer, Amit Zohar, Yuval Kirstain, Adam Polyak, Yaniv Taigman, Lior Wolf, Shelly Sheynin

    Abstract: Despite tremendous recent progress, generative video models still struggle to capture real-world motion, dynamics, and physics. We show that this limitation arises from the conventional pixel reconstruction objective, which biases models toward appearance fidelity at the expense of motion coherence. To address this, we introduce VideoJAM, a novel framework that instills an effective motion prior t…

    Submitted 26 May, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

  5. arXiv:2407.08674  [pdf, other]

    cs.CV

    Still-Moving: Customized Video Generation without Customized Video Data

    Authors: Hila Chefer, Shiran Zada, Roni Paiss, Ariel Ephrat, Omer Tov, Michael Rubinstein, Lior Wolf, Tali Dekel, Tomer Michaeli, Inbar Mosseri

    Abstract: Customizing text-to-image (T2I) models has seen tremendous progress recently, particularly in areas such as personalization, stylization, and conditional generation. However, expanding this progress to video generation is still in its infancy, primarily due to the lack of customized video data. In this work, we introduce Still-Moving, a novel generic framework for customizing a text-to-video (T2V)…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Webpage: https://still-moving.github.io/ | Video: https://www.youtube.com/watch?v=U7UuV_VIjnA

  6. arXiv:2401.12945  [pdf, other]

    cs.CV

    Lumiere: A Space-Time Diffusion Model for Video Generation

    Authors: Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Guanghui Liu, Amit Raj, Yuanzhen Li, Michael Rubinstein, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri

    Abstract: We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis. To this end, we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synth…

    Submitted 5 February, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: Webpage: https://lumiere-video.github.io/ | Video: https://www.youtube.com/watch?v=wxLr02Dz2Sc

  7. arXiv:2306.00966  [pdf, other]

    cs.CV

    The Hidden Language of Diffusion Models

    Authors: Hila Chefer, Oran Lang, Mor Geva, Volodymyr Polosukhin, Assaf Shocher, Michal Irani, Inbar Mosseri, Lior Wolf

    Abstract: Text-to-image diffusion models have demonstrated an unparalleled ability to generate high-quality, diverse images from a textual prompt. However, the internal representations learned by these models remain an enigma. In this work, we present Conceptor, a novel method to interpret the internal representation of a textual concept by a diffusion model. This interpretation is obtained by decomposing t…

    Submitted 5 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

  8. arXiv:2303.17155  [pdf, other]

    cs.CV cs.AI

    Discriminative Class Tokens for Text-to-Image Diffusion Models

    Authors: Idan Schwartz, Vésteinn Snæbjarnarson, Hila Chefer, Ryan Cotterell, Serge Belongie, Lior Wolf, Sagie Benaim

    Abstract: Recent advances in text-to-image diffusion models have enabled the generation of diverse and high-quality images. While impressive, the images often fall short of depicting subtle details and are susceptible to errors due to ambiguity in the input text. One way of alleviating these issues is to train diffusion models on class-labeled datasets. This approach has two disadvantages: (i) supervised da…

    Submitted 9 January, 2025; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: ICCV 2023

  9. arXiv:2301.13826  [pdf, other]

    cs.CV cs.CL cs.GR cs.LG

    Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models

    Authors: Hila Chefer, Yuval Alaluf, Yael Vinker, Lior Wolf, Daniel Cohen-Or

    Abstract: Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt. While revolutionary, current state-of-the-art diffusion models may still fail in generating images that fully convey the semantics in the given text prompt. We analyze the publicly available Stable Diffusion model and assess the existence of cata…

    Submitted 31 May, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

    Comments: Accepted to SIGGRAPH 2023; Project page available at https://yuval-alaluf.github.io/Attend-and-Excite/

  10. arXiv:2206.01161  [pdf, other]

    cs.CV

    Optimizing Relevance Maps of Vision Transformers Improves Robustness

    Authors: Hila Chefer, Idan Schwartz, Lior Wolf

    Abstract: It has been observed that visual classification models often rely mostly on the image background, neglecting the foreground, which hurts their robustness to distribution changes. To alleviate this shortcoming, we propose to monitor the model's relevancy signal and manipulate it such that the model is focused on the foreground object. This is done as a finetuning step, involving relatively few samp…

    Submitted 2 June, 2022; originally announced June 2022.

  11. arXiv:2204.04908  [pdf, other]

    cs.CV

    No Token Left Behind: Explainability-Aided Image Classification and Generation

    Authors: Roni Paiss, Hila Chefer, Lior Wolf

    Abstract: The application of zero-shot learning in computer vision has been revolutionized by the use of image-text matching models. The most notable example, CLIP, has been widely used for both zero-shot classification and guiding generative models with a text prompt. However, the zero-shot use of CLIP is unstable with respect to the phrasing of the input text, making it necessary to carefully engineer the…

    Submitted 6 August, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

  12. arXiv:2110.12427  [pdf, other]

    cs.CV

    Image-Based CLIP-Guided Essence Transfer

    Authors: Hila Chefer, Sagie Benaim, Roni Paiss, Lior Wolf

    Abstract: We make the distinction between (i) style transfer, in which a source image is manipulated to match the textures and colors of a target image, and (ii) essence transfer, in which one edits the source image to include high-level semantic attributes from the target. Crucially, the semantic attributes that constitute the essence of an image may differ from image to image. Our blending operator combin…

    Submitted 11 October, 2022; v1 submitted 24 October, 2021; originally announced October 2021.

    Comments: To appear in ECCV'22

  13. arXiv:2103.15679  [pdf, other]

    cs.CV cs.LG

    Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers

    Authors: Hila Chefer, Shir Gur, Lior Wolf

    Abstract: Transformers are increasingly dominating multi-modal reasoning tasks, such as visual question answering, achieving state-of-the-art results thanks to their ability to contextualize information using the self-attention and co-attention mechanisms. These attention modules also play a role in other computer vision tasks including object detection and image segmentation. Unlike Transformers that only…

    Submitted 29 March, 2021; originally announced March 2021.

  14. arXiv:2012.09838  [pdf, other]

    cs.CV

    Transformer Interpretability Beyond Attention Visualization

    Authors: Hila Chefer, Shir Gur, Lior Wolf

    Abstract: Self-attention techniques, and specifically Transformers, are dominating the field of text processing and are becoming increasingly popular in computer vision classification tasks. In order to visualize the parts of the image that led to a certain classification, existing methods either rely on the obtained attention maps or employ heuristic propagation along the attention graph. In this work, we…

    Submitted 5 April, 2021; v1 submitted 17 December, 2020; originally announced December 2020.