Search | arXiv e-print repository

InvFussion: Bridging Supervised and Zero-shot Diffusion for Inverse Problems

Authors: Noam Elata, Hyungjin Chung, Jong Chul Ye, Tomer Michaeli, Michael Elad

Abstract: Diffusion Models have demonstrated remarkable capabilities in handling inverse problems, offering high-quality posterior-sampling-based solutions. Despite significant advances, a fundamental trade-off persists, regarding the way the conditioned synthesis is employed: Training-based methods achieve high quality results, while zero-shot approaches trade this with flexibility. This work introduces a… ▽ More Diffusion Models have demonstrated remarkable capabilities in handling inverse problems, offering high-quality posterior-sampling-based solutions. Despite significant advances, a fundamental trade-off persists, regarding the way the conditioned synthesis is employed: Training-based methods achieve high quality results, while zero-shot approaches trade this with flexibility. This work introduces a framework that combines the best of both worlds -- the strong performance of supervised approaches and the flexibility of zero-shot methods. This is achieved through a novel architectural design that seamlessly integrates the degradation operator directly into the denoiser. In each block, our proposed architecture applies the degradation operator on the network activations and conditions the output using the attention mechanism, enabling adaptation to diverse degradation scenarios while maintaining high performance. Our work demonstrates the versatility of the proposed architecture, operating as a general MMSE estimator, a posterior sampler, or a Neural Posterior Principal Component estimator. This flexibility enables a wide range of downstream tasks, highlighting the broad applicability of our framework. The proposed modification of the denoiser network offers a versatile, accurate, and computationally efficient solution, demonstrating the advantages of dedicated network architectures for complex inverse problems. Experimental results on the FFHQ and ImageNet datasets demonstrate state-of-the-art posterior-sampling performance, surpassing both training-based and zero-shot alternatives. △ Less

Submitted 2 April, 2025; originally announced April 2025.

arXiv:2504.00873 [pdf, other]

Input to the ESPPU: The LUXE Experiment

Authors: H. Abramowicz, M. Almanza Soto, M. Altarelli, R. Aßmann, A. Athanassiadis, G. Avoni, T. Behnke, M. Benettoni, Y. Benhammou, J. Bhatt, T. Blackburn, C. Blanch, S. Bonaldo, S. Boogert, O. Borysov, M. Borysova, V. Boudry, D. Breton, R. Brinkmann, M. Bruschi, F. Burkart, K. Büßer, N. Cavanagh, F. Dal Corso, W. Decking , et al. (108 additional authors not shown)

Abstract: This document presents an overview of LUXE (Laser Und XFEL Experiment), an experiment that will combine the high-quality and high-energy electron beam of the European XFEL with a high-intensity laser, to explore the uncharted terrain of strong-field quantum electrodynamics. The scientific case, facility, and detector setup are presented together with an overview of the foreseen timeline and expect… ▽ More This document presents an overview of LUXE (Laser Und XFEL Experiment), an experiment that will combine the high-quality and high-energy electron beam of the European XFEL with a high-intensity laser, to explore the uncharted terrain of strong-field quantum electrodynamics. The scientific case, facility, and detector setup are presented together with an overview of the foreseen timeline and expected capital costs. △ Less

Submitted 1 April, 2025; originally announced April 2025.

arXiv:2502.01189 [pdf, other]

Compressed Image Generation with Denoising Diffusion Codebook Models

Authors: Guy Ohayon, Hila Manor, Tomer Michaeli, Michael Elad

Abstract: We present a novel generative approach based on Denoising Diffusion Models (DDMs), which produces high-quality image samples along with their losslessly compressed bit-stream representations. This is obtained by replacing the standard Gaussian noise sampling in the reverse diffusion with a selection of noise samples from pre-defined codebooks of fixed iid Gaussian vectors. Surprisingly, we find th… ▽ More We present a novel generative approach based on Denoising Diffusion Models (DDMs), which produces high-quality image samples along with their losslessly compressed bit-stream representations. This is obtained by replacing the standard Gaussian noise sampling in the reverse diffusion with a selection of noise samples from pre-defined codebooks of fixed iid Gaussian vectors. Surprisingly, we find that our method, termed Denoising Diffusion Codebook Model (DDCM), retains sample quality and diversity of standard DDMs, even for extremely small codebooks. We leverage DDCM and pick the noises from the codebooks that best match a given image, converting our generative model into a highly effective lossy image codec achieving state-of-the-art perceptual image compression results. More generally, by setting other noise selections rules, we extend our compression method to any conditional image generation task (e.g., image restoration), where the generated images are produced jointly with their condensed bit-stream representations. Our work is accompanied by a mathematical interpretation of the proposed compressed conditional generation schemes, establishing a connection with score-based approximations of posterior samplers for the tasks considered. △ Less

Submitted 10 February, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

Comments: Code and demo are available at https://ddcm-2025.github.io/

arXiv:2502.00180 [pdf, other]

Spectral Analysis of Diffusion Models with Application to Schedule Design

Authors: Roi Benita, Michael Elad, Joseph Keshet

Abstract: Diffusion models (DMs) have emerged as powerful tools for modeling complex data distributions and generating realistic new samples. Over the years, advanced architectures and sampling methods have been developed to make these models practically usable. However, certain synthesis process decisions still rely on heuristics without a solid theoretical foundation. In our work, we offer a novel analysi… ▽ More Diffusion models (DMs) have emerged as powerful tools for modeling complex data distributions and generating realistic new samples. Over the years, advanced architectures and sampling methods have been developed to make these models practically usable. However, certain synthesis process decisions still rely on heuristics without a solid theoretical foundation. In our work, we offer a novel analysis of the DM's inference process, introducing a comprehensive frequency response perspective. Specifically, by relying on Gaussianity assumption, we present the inference process as a closed-form spectral transfer function, capturing how the generated signal evolves in response to the initial noise. We demonstrate how the proposed analysis can be leveraged to design a noise schedule that aligns effectively with the characteristics of the data. The spectral perspective also provides insights into the underlying dynamics and sheds light on the relationship between spectral properties and noise schedule structure. Our results lead to scheduling curves that are dependent on the spectral content of the data, offering a theoretical justification for some of the heuristics taken by practitioners. △ Less

Submitted 31 May, 2025; v1 submitted 31 January, 2025; originally announced February 2025.

arXiv:2501.12102 [pdf, other]

Proxies for Distortion and Consistency with Applications for Real-World Image Restoration

Authors: Sean Man, Guy Ohayon, Ron Raphaeli, Michael Elad

Abstract: Real-world image restoration deals with the recovery of images suffering from an unknown degradation. This task is typically addressed while being given only degraded images, without their corresponding ground-truth versions. In this hard setting, designing and evaluating restoration algorithms becomes highly challenging. This paper offers a suite of tools that can serve both the design and assess… ▽ More Real-world image restoration deals with the recovery of images suffering from an unknown degradation. This task is typically addressed while being given only degraded images, without their corresponding ground-truth versions. In this hard setting, designing and evaluating restoration algorithms becomes highly challenging. This paper offers a suite of tools that can serve both the design and assessment of real-world image restoration algorithms. Our work starts by proposing a trained model that predicts the chain of degradations a given real-world measured input has gone through. We show how this estimator can be used to approximate the consistency -- the match between the measurements and any proposed recovered image. We also use this estimator as a guiding force for the design of a simple and highly-effective plug-and-play real-world image restoration algorithm, leveraging a pre-trained diffusion-based image prior. Furthermore, this work proposes no-reference proxy measures of MSE and LPIPS, which, without access to the ground-truth images, allow ranking of real-world image restoration algorithms according to their (approximate) MSE and LPIPS. The proposed suite provides a versatile, first of its kind framework for evaluating and comparing blind image restoration algorithms in real-world scenarios. △ Less

Submitted 21 January, 2025; originally announced January 2025.

Comments: Project page in https://man-sean.github.io/elad-website/

arXiv:2501.11746 [pdf, other]

SILO: Solving Inverse Problems with Latent Operators

Authors: Ron Raphaeli, Sean Man, Michael Elad

Abstract: Consistent improvement of image priors over the years has led to the development of better inverse problem solvers. Diffusion models are the newcomers to this arena, posing the strongest known prior to date. Recently, such models operating in a latent space have become increasingly predominant due to their efficiency. In recent works, these models have been applied to solve inverse problems. Worki… ▽ More Consistent improvement of image priors over the years has led to the development of better inverse problem solvers. Diffusion models are the newcomers to this arena, posing the strongest known prior to date. Recently, such models operating in a latent space have become increasingly predominant due to their efficiency. In recent works, these models have been applied to solve inverse problems. Working in the latent space typically requires multiple applications of an Autoencoder during the restoration process, which leads to both computational and restoration quality challenges. In this work, we propose a new approach for handling inverse problems with latent diffusion models, where a learned degradation function operates within the latent space, emulating a known image space degradation. Usage of the learned operator reduces the dependency on the Autoencoder to only the initial and final steps of the restoration process, facilitating faster sampling and superior restoration quality. We demonstrate the effectiveness of our method on a variety of image restoration tasks and datasets, achieving significant improvements over prior art. △ Less

Submitted 20 January, 2025; originally announced January 2025.

Comments: Project page in https://ronraphaeli.github.io/SILO-website/

arXiv:2501.07431 [pdf, other]

Novel Silicon and GaAs Sensors for Compact Sampling Calorimeters

Authors: H. Abramowicz, M. Almanza Soto, Y. Benhammou, W. Daniluk, M. Elad, M. Firlej, T. Fiutowski, V. Ghenescu, G. Grzelak, D. Horn, S. Huang, M. Idzik, A. Irles, J. Kotula, A. Levy, I. Levy, W. Lohmann, J. Morón, A. T. Neagu, D. Pietruch, P. M. Potlog, K. Świentek, A. F. Żarnecki, K. Zembaczyński

Abstract: Two samples of silicon pad sensors and two samples of GaAs sensors are studied in an electron beam with 5 GeV energy from the DESY-II test-beam facility. The sizes of the silicon and GaAs sensors are about 9$\times$9 cm$^2$ and 5$\times$8 cm$^2$, respectively. The thickness is 500 micrometer for both the silicon and GaAs sensors. The pad size is about 5$\times$5 mm$^2$. The sensors are foreseen to… ▽ More Two samples of silicon pad sensors and two samples of GaAs sensors are studied in an electron beam with 5 GeV energy from the DESY-II test-beam facility. The sizes of the silicon and GaAs sensors are about 9$\times$9 cm$^2$ and 5$\times$8 cm$^2$, respectively. The thickness is 500 micrometer for both the silicon and GaAs sensors. The pad size is about 5$\times$5 mm$^2$. The sensors are foreseen to be used in a compact electromagnetic sampling calorimeter. The readout of the pads is done by metal traces connected to the pads and the front-end ASICs at the edges of the sensors. For the silicon sensors, copper traces on a Kapton foil are connected to the sensor pads with conducting glue. The pads of the GaAs sensors are connected to bond-pads via aluminium traces on the sensor substrate. The readout is based on a dedicated front-end ASIC, called FLAME. Pre-processing of the raw data and deconvolution is performed with FPGAs. The whole system is orchestrated by a Trigger Logic Unit. Results are shown for the signal-to-noise ratio, the homogeneity of the response, edge effects on pads, and for signals due to the readout traces. △ Less

Submitted 13 January, 2025; originally announced January 2025.

Comments: 22 pages, 24 figures, submitted to The European Physical Journal C

arXiv:2410.00418 [pdf, other]

Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration

Authors: Guy Ohayon, Tomer Michaeli, Michael Elad

Abstract: Photo-realistic image restoration algorithms are typically evaluated by distortion measures (e.g., PSNR, SSIM) and by perceptual quality measures (e.g., FID, NIQE), where the desire is to attain the lowest possible distortion without compromising on perceptual quality. To achieve this goal, current methods commonly attempt to sample from the posterior distribution, or to optimize a weighted sum of… ▽ More Photo-realistic image restoration algorithms are typically evaluated by distortion measures (e.g., PSNR, SSIM) and by perceptual quality measures (e.g., FID, NIQE), where the desire is to attain the lowest possible distortion without compromising on perceptual quality. To achieve this goal, current methods commonly attempt to sample from the posterior distribution, or to optimize a weighted sum of a distortion loss (e.g., MSE) and a perceptual quality loss (e.g., GAN). Unlike previous works, this paper is concerned specifically with the optimal estimator that minimizes the MSE under a constraint of perfect perceptual index, namely where the distribution of the reconstructed images is equal to that of the ground-truth ones. A recent theoretical result shows that such an estimator can be constructed by optimally transporting the posterior mean prediction (MMSE estimate) to the distribution of the ground-truth images. Inspired by this result, we introduce Posterior-Mean Rectified Flow (PMRF), a simple yet highly effective algorithm that approximates this optimal estimator. In particular, PMRF first predicts the posterior mean, and then transports the result to a high-quality image using a rectified flow model that approximates the desired optimal transport map. We investigate the theoretical utility of PMRF and demonstrate that it consistently outperforms previous methods on a variety of image restoration tasks. △ Less

Submitted 4 February, 2025; v1 submitted 1 October, 2024; originally announced October 2024.

Comments: Accepted to ICLR 2025. Code and demo are available at https://pmrf-ml.github.io/

arXiv:2408.17046 [pdf, other]

Text-to-Image Generation Via Energy-Based CLIP

Authors: Roy Ganz, Michael Elad

Abstract: Joint Energy Models (JEMs), while drawing significant research attention, have not been successfully scaled to real-world, high-resolution datasets. We present EB-CLIP, a novel approach extending JEMs to the multimodal vision-language domain using CLIP, integrating both generative and discriminative objectives. For the generative objective, we introduce an image-text joint-energy function based on… ▽ More Joint Energy Models (JEMs), while drawing significant research attention, have not been successfully scaled to real-world, high-resolution datasets. We present EB-CLIP, a novel approach extending JEMs to the multimodal vision-language domain using CLIP, integrating both generative and discriminative objectives. For the generative objective, we introduce an image-text joint-energy function based on Cosine similarity in the CLIP space, training CLIP to assign low energy to real image-caption pairs and high energy otherwise. For the discriminative objective, we employ contrastive adversarial loss, extending the adversarial training objective to the multimodal domain. EB-CLIP not only generates realistic images from text but also achieves competitive results on the compositionality benchmark, outperforming leading methods with fewer parameters. Additionally, we demonstrate the superior guidance capability of EB-CLIP by enhancing CLIP-based generative frameworks and converting unconditional diffusion models to text-based ones. Lastly, we show that EB-CLIP can serve as a more robust evaluation metric for text-to-image generative tasks than CLIP. △ Less

Submitted 30 August, 2024; originally announced August 2024.

arXiv:2407.15153 [pdf, other]

Anchored Diffusion for Video Face Reenactment

Authors: Idan Kligvasser, Regev Cohen, George Leifman, Ehud Rivlin, Michael Elad

Abstract: Video generation has drawn significant interest recently, pushing the development of large-scale models capable of producing realistic videos with coherent motion. Due to memory constraints, these models typically generate short video segments that are then combined into long videos. The merging process poses a significant challenge, as it requires ensuring smooth transitions and overall consisten… ▽ More Video generation has drawn significant interest recently, pushing the development of large-scale models capable of producing realistic videos with coherent motion. Due to memory constraints, these models typically generate short video segments that are then combined into long videos. The merging process poses a significant challenge, as it requires ensuring smooth transitions and overall consistency. In this paper, we introduce Anchored Diffusion, a novel method for synthesizing relatively long and seamless videos. We extend Diffusion Transformers (DiTs) to incorporate temporal information, creating our sequence-DiT (sDiT) model for generating short video segments. Unlike previous works, we train our model on video sequences with random non-uniform temporal spacing and incorporate temporal information via external guidance, increasing flexibility and allowing it to capture both short and long-term relationships. Furthermore, during inference, we leverage the transformer architecture to modify the diffusion process, generating a batch of non-uniform sequences anchored to a common frame, ensuring consistency regardless of temporal distance. To demonstrate our method, we focus on face reenactment, the task of creating a video from a source image that replicates the facial expressions and movements from a driving video. Through comprehensive experiments, we show our approach outperforms current techniques in producing longer consistent high-quality videos while offering editing capabilities. △ Less

Submitted 21 July, 2024; originally announced July 2024.

arXiv:2407.09896 [pdf, other]

PSC: Posterior Sampling-Based Compression

Authors: Noam Elata, Tomer Michaeli, Michael Elad

Abstract: Diffusion models have transformed the landscape of image generation and now show remarkable potential for image compression. Most of the recent diffusion-based compression methods require training and are tailored for a specific bit-rate. In this work, we propose Posterior Sampling-based Compression (PSC) - a zero-shot compression method that leverages a pre-trained diffusion model as its sole neu… ▽ More Diffusion models have transformed the landscape of image generation and now show remarkable potential for image compression. Most of the recent diffusion-based compression methods require training and are tailored for a specific bit-rate. In this work, we propose Posterior Sampling-based Compression (PSC) - a zero-shot compression method that leverages a pre-trained diffusion model as its sole neural network component, thus enabling the use of diverse, publicly available models without additional training. Our approach is inspired by transform coding methods, which encode the image in some pre-chosen transform domain. However, PSC constructs a transform that is adaptive to the image. This is done by employing a zero-shot diffusion-based posterior sampler so as to progressively construct the rows of the transform matrix. Each new chunk of rows is chosen to reduce the uncertainty about the image given the quantized measurements collected thus far. Importantly, the same adaptive scheme can be replicated at the decoder, thus avoiding the need to encode the transform itself. We demonstrate that even with basic quantization and entropy coding, PSC's performance is comparable to established training-based methods in terms of rate, distortion, and perceptual quality. This is while providing greater flexibility, allowing to choose at inference time any desired rate or distortion. △ Less

Submitted 5 February, 2025; v1 submitted 13 July, 2024; originally announced July 2024.

arXiv:2407.08256 [pdf, other]

Adaptive Compressed Sensing with Diffusion-Based Posterior Sampling

Authors: Noam Elata, Tomer Michaeli, Michael Elad

Abstract: Compressed Sensing (CS) facilitates rapid image acquisition by selecting a small subset of measurements sufficient for high-fidelity reconstruction. Adaptive CS seeks to further enhance this process by dynamically choosing future measurements based on information gleaned from data that is already acquired. However, many existing frameworks are often tailored to specific tasks and require intricate… ▽ More Compressed Sensing (CS) facilitates rapid image acquisition by selecting a small subset of measurements sufficient for high-fidelity reconstruction. Adaptive CS seeks to further enhance this process by dynamically choosing future measurements based on information gleaned from data that is already acquired. However, many existing frameworks are often tailored to specific tasks and require intricate training procedures. We propose AdaSense, a novel Adaptive CS approach that leverages zero-shot posterior sampling with pre-trained diffusion models. By sequentially sampling from the posterior distribution, we can quantify the uncertainty of each possible future linear measurement throughout the acquisition process. AdaSense eliminates the need for additional training and boasts seamless adaptation to diverse domains with minimal tuning requirements. Our experiments demonstrate the effectiveness of AdaSense in reconstructing facial images from a small number of measurements. Furthermore, we apply AdaSense for active acquisition of medical images in the domains of magnetic resonance imaging (MRI) and computed tomography (CT), highlighting its potential for tangible real-world acceleration. △ Less

Submitted 11 July, 2024; originally announced July 2024.

Comments: Published in European Conference on Computer Vision (ECCV) 2024

arXiv:2405.16260 [pdf, other]

Enhancing Consistency-Based Image Generation via Adversarialy-Trained Classification and Energy-Based Discrimination

Authors: Shelly Golan, Roy Ganz, Michael Elad

Abstract: The recently introduced Consistency models pose an efficient alternative to diffusion algorithms, enabling rapid and good quality image synthesis. These methods overcome the slowness of diffusion models by directly mapping noise to data, while maintaining a (relatively) simpler training. Consistency models enable a fast one- or few-step generation, but they typically fall somewhat short in sample… ▽ More The recently introduced Consistency models pose an efficient alternative to diffusion algorithms, enabling rapid and good quality image synthesis. These methods overcome the slowness of diffusion models by directly mapping noise to data, while maintaining a (relatively) simpler training. Consistency models enable a fast one- or few-step generation, but they typically fall somewhat short in sample quality when compared to their diffusion origins. In this work we propose a novel and highly effective technique for post-processing Consistency-based generated images, enhancing their perceptual quality. Our approach utilizes a joint classifier-discriminator model, in which both portions are trained adversarially. While the classifier aims to grade an image based on its assignment to a designated class, the discriminator portion of the very same network leverages the softmax values to assess the proximity of the input image to the targeted data manifold, thereby serving as an Energy-based Model. By employing example-specific projected gradient iterations under the guidance of this joint machine, we refine synthesized images and achieve an improved FID scores on the ImageNet 64x64 dataset for both Consistency-Training and Consistency-Distillation techniques. △ Less

Submitted 28 November, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.13805 [pdf, other]

Perceptual Fairness in Image Restoration

Authors: Guy Ohayon, Michael Elad, Tomer Michaeli

Abstract: Fairness in image restoration tasks is the desire to treat different sub-groups of images equally well. Existing definitions of fairness in image restoration are highly restrictive. They consider a reconstruction to be a correct outcome for a group (e.g., women) only if it falls within the group's set of ground truth images (e.g., natural images of women); otherwise, it is considered entirely inco… ▽ More Fairness in image restoration tasks is the desire to treat different sub-groups of images equally well. Existing definitions of fairness in image restoration are highly restrictive. They consider a reconstruction to be a correct outcome for a group (e.g., women) only if it falls within the group's set of ground truth images (e.g., natural images of women); otherwise, it is considered entirely incorrect. Consequently, such definitions are prone to controversy, as errors in image restoration can manifest in various ways. In this work we offer an alternative approach towards fairness in image restoration, by considering the Group Perceptual Index (GPI), which we define as the statistical distance between the distribution of the group's ground truth images and the distribution of their reconstructions. We assess the fairness of an algorithm by comparing the GPI of different groups, and say that it achieves perfect Perceptual Fairness (PF) if the GPIs of all groups are identical. We motivate and theoretically study our new notion of fairness, draw its connection to previous ones, and demonstrate its utility on state-of-the-art face image restoration algorithms. △ Less

Submitted 12 October, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.11566 [pdf, other]

Uncertainty-Aware PPG-2-ECG for Enhanced Cardiovascular Diagnosis using Diffusion Models

Authors: Omer Belhasin, Idan Kligvasser, George Leifman, Regev Cohen, Erin Rainaldi, Li-Fang Cheng, Nishant Verma, Paul Varghese, Ehud Rivlin, Michael Elad

Abstract: Analyzing the cardiovascular system condition via Electrocardiography (ECG) is a common and highly effective approach, and it has been practiced and perfected over many decades. ECG sensing is non-invasive and relatively easy to acquire, and yet it is still cumbersome for holter monitoring tests that may span over hours and even days. A possible alternative in this context is Photoplethysmography… ▽ More Analyzing the cardiovascular system condition via Electrocardiography (ECG) is a common and highly effective approach, and it has been practiced and perfected over many decades. ECG sensing is non-invasive and relatively easy to acquire, and yet it is still cumbersome for holter monitoring tests that may span over hours and even days. A possible alternative in this context is Photoplethysmography (PPG): An optically-based signal that measures blood volume fluctuations, as typically sensed by conventional ``wearable devices''. While PPG presents clear advantages in acquisition, convenience, and cost-effectiveness, ECG provides more comprehensive information, allowing for a more precise detection of heart conditions. This implies that a conversion from PPG to ECG, as recently discussed in the literature, inherently involves an unavoidable level of uncertainty. In this paper we introduce a novel methodology for addressing the PPG-2-ECG conversion, and offer an enhanced classification of cardiovascular conditions using the given PPG, all while taking into account the uncertainties arising from the conversion process. We provide a mathematical justification for our proposed computational approach, and present empirical studies demonstrating its superior performance compared to state-of-the-art baseline methods. △ Less

Submitted 20 April, 2025; v1 submitted 19 May, 2024; originally announced May 2024.

arXiv:2402.00857 [pdf, other]

Early Time Classification with Accumulated Accuracy Gap Control

Authors: Liran Ringel, Regev Cohen, Daniel Freedman, Michael Elad, Yaniv Romano

Abstract: Early time classification algorithms aim to label a stream of features without processing the full input stream, while maintaining accuracy comparable to that achieved by applying the classifier to the entire input. In this paper, we introduce a statistical framework that can be applied to any sequential classifier, formulating a calibrated stopping rule. This data-driven rule attains finite-sampl… ▽ More Early time classification algorithms aim to label a stream of features without processing the full input stream, while maintaining accuracy comparable to that achieved by applying the classifier to the entire input. In this paper, we introduce a statistical framework that can be applied to any sequential classifier, formulating a calibrated stopping rule. This data-driven rule attains finite-sample, distribution-free control of the accuracy gap between full and early-time classification. We start by presenting a novel method that builds on the Learn-then-Test calibration framework to control this gap marginally, on average over i.i.d. instances. As this algorithm tends to yield an excessively high accuracy gap for early halt times, our main contribution is the proposal of a framework that controls a stronger notion of error, where the accuracy gap is controlled conditionally on the accumulated halt times. Numerical experiments demonstrate the effectiveness, applicability, and usefulness of our method. We show that our proposed early stopping mechanism reduces up to 94% of timesteps used for classification while achieving rigorous accuracy gap control. △ Less

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2311.09253 [pdf, other]

The Perception-Robustness Tradeoff in Deterministic Image Restoration

Authors: Guy Ohayon, Tomer Michaeli, Michael Elad

Abstract: We study the behavior of deterministic methods for solving inverse problems in imaging. These methods are commonly designed to achieve two goals: (1) attaining high perceptual quality, and (2) generating reconstructions that are consistent with the measurements. We provide a rigorous proof that the better a predictor satisfies these two requirements, the larger its Lipschitz constant must be, rega… ▽ More We study the behavior of deterministic methods for solving inverse problems in imaging. These methods are commonly designed to achieve two goals: (1) attaining high perceptual quality, and (2) generating reconstructions that are consistent with the measurements. We provide a rigorous proof that the better a predictor satisfies these two requirements, the larger its Lipschitz constant must be, regardless of the nature of the degradation involved. In particular, to approach perfect perceptual quality and perfect consistency, the Lipschitz constant of the model must grow to infinity. This implies that such methods are necessarily more susceptible to adversarial attacks. We demonstrate our theory on single image super-resolution algorithms, addressing both noisy and noiseless settings. We also show how this undesired behavior can be leveraged to explore the posterior distribution, thereby allowing the deterministic model to imitate stochastic methods. △ Less

Submitted 8 June, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

arXiv:2310.01381 [pdf, other]

DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation

Authors: Roi Benita, Michael Elad, Joseph Keshet

Abstract: Diffusion models have recently been shown to be relevant for high-quality speech generation. Most work has been focused on generating spectrograms, and as such, they further require a subsequent model to convert the spectrogram to a waveform (i.e., a vocoder). This work proposes a diffusion probabilistic end-to-end model for generating a raw speech waveform. The proposed model is autoregressive, g… ▽ More Diffusion models have recently been shown to be relevant for high-quality speech generation. Most work has been focused on generating spectrograms, and as such, they further require a subsequent model to convert the spectrogram to a waveform (i.e., a vocoder). This work proposes a diffusion probabilistic end-to-end model for generating a raw speech waveform. The proposed model is autoregressive, generating overlapping frames sequentially, where each frame is conditioned on a portion of the previously generated one. Hence, our model can effectively synthesize an unlimited speech duration while preserving high-fidelity synthesis and temporal coherence. We implemented the proposed model for unconditional and conditional speech generation, where the latter can be driven by an input sequence of phonemes, amplitudes, and pitch values. Working on the waveform directly has some empirical advantages. Specifically, it allows the creation of local acoustic behaviors, like vocal fry, which makes the overall waveform sounds more natural. Furthermore, the proposed diffusion model is stochastic and not deterministic; therefore, each inference generates a slightly different waveform variation, enabling abundance of valid realizations. Experiments show that the proposed model generates speech with superior quality compared with other state-of-the-art neural speech generation systems. △ Less

Submitted 10 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

arXiv:2308.00515 [pdf, other]

Technical Design Report for the LUXE Experiment

Authors: H. Abramowicz, M. Almanza Soto, M. Altarelli, R. Aßmann, A. Athanassiadis, G. Avoni, T. Behnke, M. Benettoni, Y. Benhammou, J. Bhatt, T. Blackburn, C. Blanch, S. Bonaldo, S. Boogert, O. Borysov, M. Borysova, V. Boudry, D. Breton, R. Brinkmann, M. Bruschi, F. Burkart, K. Büßer, N. Cavanagh, F. Dal Corso, W. Decking , et al. (109 additional authors not shown)

Abstract: This Technical Design Report presents a detailed description of all aspects of the LUXE (Laser Und XFEL Experiment), an experiment that will combine the high-quality and high-energy electron beam of the European XFEL with a high-intensity laser, to explore the uncharted terrain of strong-field quantum electrodynamics characterised by both high energy and high intensity, reaching the Schwinger fiel… ▽ More This Technical Design Report presents a detailed description of all aspects of the LUXE (Laser Und XFEL Experiment), an experiment that will combine the high-quality and high-energy electron beam of the European XFEL with a high-intensity laser, to explore the uncharted terrain of strong-field quantum electrodynamics characterised by both high energy and high intensity, reaching the Schwinger field and beyond. The further implications for the search of physics beyond the Standard Model are also discussed. △ Less

Submitted 2 August, 2023; v1 submitted 1 August, 2023; originally announced August 2023.

arXiv:2306.16805 [pdf, other]

CLIPAG: Towards Generator-Free Text-to-Image Generation

Authors: Roy Ganz, Michael Elad

Abstract: Perceptually Aligned Gradients (PAG) refer to an intriguing property observed in robust image classification models, wherein their input gradients align with human perception and pose semantic meanings. While this phenomenon has gained significant research attention, it was solely studied in the context of unimodal vision-only architectures. In this work, we extend the study of PAG to Vision-Langu… ▽ More Perceptually Aligned Gradients (PAG) refer to an intriguing property observed in robust image classification models, wherein their input gradients align with human perception and pose semantic meanings. While this phenomenon has gained significant research attention, it was solely studied in the context of unimodal vision-only architectures. In this work, we extend the study of PAG to Vision-Language architectures, which form the foundations for diverse image-text tasks and applications. Through an adversarial robustification finetuning of CLIP, we demonstrate that robust Vision-Language models exhibit PAG in contrast to their vanilla counterparts. This work reveals the merits of CLIP with PAG (CLIPAG) in several vision-language generative tasks. Notably, we show that seamlessly integrating CLIPAG in a "plug-n-play" manner leads to substantial improvements in vision-language generative applications. Furthermore, leveraging its PAG property, CLIPAG enables text-to-image generation without any generative model, which typically requires huge generators. △ Less

Submitted 1 September, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

arXiv:2306.02342 [pdf, other]

Deep Optimal Transport: A Practical Algorithm for Photo-realistic Image Restoration

Authors: Theo Adrai, Guy Ohayon, Tomer Michaeli, Michael Elad

Abstract: We propose an image restoration algorithm that can control the perceptual quality and/or the mean square error (MSE) of any pre-trained model, trading one over the other at test time. Our algorithm is few-shot: Given about a dozen images restored by the model, it can significantly improve the perceptual quality and/or the MSE of the model for newly restored images without further training. Our app… ▽ More We propose an image restoration algorithm that can control the perceptual quality and/or the mean square error (MSE) of any pre-trained model, trading one over the other at test time. Our algorithm is few-shot: Given about a dozen images restored by the model, it can significantly improve the perceptual quality and/or the MSE of the model for newly restored images without further training. Our approach is motivated by a recent theoretical result that links between the minimum MSE (MMSE) predictor and the predictor that minimizes the MSE under a perfect perceptual quality constraint. Specifically, it has been shown that the latter can be obtained by optimally transporting the output of the former, such that its distribution matches the source data. Thus, to improve the perceptual quality of a predictor that was originally trained to minimize MSE, we approximate the optimal transport by a linear transformation in the latent space of a variational auto-encoder, which we compute in closed-form using empirical means and covariances. Going beyond the theory, we find that applying the same procedure on models that were initially trained to achieve high perceptual quality, typically improves their perceptual quality even further. And by interpolating the results with the original output of the model, we can improve their MSE on the expense of perceptual quality. We illustrate our method on a variety of degradations applied to general content images of arbitrary dimensions. △ Less

Submitted 12 August, 2024; v1 submitted 4 June, 2023; originally announced June 2023.

arXiv:2305.19066 [pdf, other]

Nested Diffusion Processes for Anytime Image Generation

Authors: Noam Elata, Bahjat Kawar, Tomer Michaeli, Michael Elad

Abstract: Diffusion models are the current state-of-the-art in image generation, synthesizing high-quality images by breaking down the generation process into many fine-grained denoising steps. Despite their good performance, diffusion models are computationally expensive, requiring many neural function evaluations (NFEs). In this work, we propose an anytime diffusion-based method that can generate viable i… ▽ More Diffusion models are the current state-of-the-art in image generation, synthesizing high-quality images by breaking down the generation process into many fine-grained denoising steps. Despite their good performance, diffusion models are computationally expensive, requiring many neural function evaluations (NFEs). In this work, we propose an anytime diffusion-based method that can generate viable images when stopped at arbitrary times before completion. Using existing pretrained diffusion models, we show that the generation scheme can be recomposed as two nested diffusion processes, enabling fast iterative refinement of a generated image. In experiments on ImageNet and Stable Diffusion-based text-to-image generation, we show, both qualitatively and quantitatively, that our method's intermediate generation quality greatly exceeds that of the original diffusion model, while the final generation result remains comparable. We illustrate the applicability of Nested Diffusion in several settings, including for solving inverse problems, and for rapid text-based content creation by allowing user intervention throughout the sampling process. △ Less

Submitted 30 October, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

arXiv:2305.13128 [pdf, other]

GSURE-Based Diffusion Model Training with Corrupted Data

Authors: Bahjat Kawar, Noam Elata, Tomer Michaeli, Michael Elad

Abstract: Diffusion models have demonstrated impressive results in both data generation and downstream tasks such as inverse problems, text-based editing, classification, and more. However, training such models usually requires large amounts of clean signals which are often difficult or impossible to obtain. In this work, we propose a novel training technique for generative diffusion models based only on co… ▽ More Diffusion models have demonstrated impressive results in both data generation and downstream tasks such as inverse problems, text-based editing, classification, and more. However, training such models usually requires large amounts of clean signals which are often difficult or impossible to obtain. In this work, we propose a novel training technique for generative diffusion models based only on corrupted data. We introduce a loss function based on the Generalized Stein's Unbiased Risk Estimator (GSURE), and prove that under some conditions, it is equivalent to the training objective used in fully supervised diffusion models. We demonstrate our technique on face images as well as Magnetic Resonance Imaging (MRI), where the use of undersampled data significantly alleviates data collection costs. Our approach achieves generative performance comparable to its fully supervised counterpart without training on any clean signals. In addition, we deploy the resulting diffusion model in various downstream tasks beyond the degradation present in the training set, showcasing promising results. △ Less

Submitted 13 June, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: Code: https://github.com/bahjat-kawar/gsure-diffusion

arXiv:2305.10124 [pdf, other]

Principal Uncertainty Quantification with Spatial Correlation for Image Restoration Problems

Authors: Omer Belhasin, Yaniv Romano, Daniel Freedman, Ehud Rivlin, Michael Elad

Abstract: Uncertainty quantification for inverse problems in imaging has drawn much attention lately. Existing approaches towards this task define uncertainty regions based on probable values per pixel, while ignoring spatial correlations within the image, resulting in an exaggerated volume of uncertainty. In this paper, we propose PUQ (Principal Uncertainty Quantification) -- a novel definition and corresp… ▽ More Uncertainty quantification for inverse problems in imaging has drawn much attention lately. Existing approaches towards this task define uncertainty regions based on probable values per pixel, while ignoring spatial correlations within the image, resulting in an exaggerated volume of uncertainty. In this paper, we propose PUQ (Principal Uncertainty Quantification) -- a novel definition and corresponding analysis of uncertainty regions that takes into account spatial relationships within the image, thus providing reduced volume regions. Using recent advancements in generative models, we derive uncertainty intervals around principal components of the empirical posterior distribution, forming an ambiguity region that guarantees the inclusion of true unseen values with a user-defined confidence probability. To improve computational efficiency and interpretability, we also guarantee the recovery of true unseen values using only a few principal directions, resulting in more informative uncertainty regions. Our approach is verified through experiments on image colorization, super-resolution, and inpainting; its effectiveness is shown through comparison to baseline methods, demonstrating significantly tighter uncertainty regions. △ Less

Submitted 20 January, 2024; v1 submitted 17 May, 2023; originally announced May 2023.

arXiv:2305.10090 [pdf, other]

doi 10.1109/ICCVW60793.2023.00249

Semi-supervised Quality Evaluation of Colonoscopy Procedures

Authors: Idan Kligvasser, George Leifman, Roman Goldenberg, Ehud Rivlin, Michael Elad

Abstract: Colonoscopy is the standard of care technique for detecting and removing polyps for the prevention of colorectal cancer. Nevertheless, gastroenterologists (GI) routinely miss approximately 25% of polyps during colonoscopies. These misses are highly operator dependent, influenced by the physician skills, experience, vigilance, and fatigue. Standard quality metrics, such as Withdrawal Time or Cecal… ▽ More Colonoscopy is the standard of care technique for detecting and removing polyps for the prevention of colorectal cancer. Nevertheless, gastroenterologists (GI) routinely miss approximately 25% of polyps during colonoscopies. These misses are highly operator dependent, influenced by the physician skills, experience, vigilance, and fatigue. Standard quality metrics, such as Withdrawal Time or Cecal Intubation Rate, have been shown to be well correlated with Adenoma Detection Rate (ADR). However, those metrics are limited in their ability to assess the quality of a specific procedure, and they do not address quality aspects related to the style or technique of the examination. In this work we design novel online and offline quality metrics, based on visual appearance quality criteria learned by an ML model in an unsupervised way. Furthermore, we evaluate the likelihood of detecting an existing polyp as a function of quality and use it to demonstrate high correlation of the proposed metric to polyp detection sensitivity. The proposed online quality metric can be used to provide real time quality feedback to the performing GI. By integrating the local metric over the withdrawal phase, we build a global, offline quality metric, which is shown to be highly correlated to the standard Polyp Per Colonoscopy (PPC) quality metric. △ Less

Submitted 17 May, 2023; originally announced May 2023.

Journal ref: 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)

arXiv:2305.10026 [pdf, other]

doi 10.1007/978-3-031-45350-2_9

Colonoscopy Coverage Revisited: Identifying Scanning Gaps in Real-Time

Authors: G. Leifman, I. Kligvasser, R. Goldenberg, M. Elad, E. Rivlin

Abstract: Colonoscopy is the most widely used medical technique for preventing Colorectal Cancer, by detecting and removing polyps before they become malignant. Recent studies show that around one quarter of the existing polyps are routinely missed. While some of these do appear in the endoscopist's field of view, others are missed due to a partial coverage of the colon. The task of detecting and marking un… ▽ More Colonoscopy is the most widely used medical technique for preventing Colorectal Cancer, by detecting and removing polyps before they become malignant. Recent studies show that around one quarter of the existing polyps are routinely missed. While some of these do appear in the endoscopist's field of view, others are missed due to a partial coverage of the colon. The task of detecting and marking unseen regions of the colon has been addressed in recent work, where the common approach is based on dense 3D reconstruction, which proves to be challenging due to lack of 3D ground truth and periods with poor visual content. In this paper we propose a novel and complementary method to detect deficient local coverage in real-time for video segments where a reliable 3D reconstruction is impossible. Our method aims to identify skips along the colon caused by a drifted position of the endoscope during poor visibility time intervals. The proposed solution consists of two phases. During the first, time segments with good visibility of the colon and gaps between them are identified. During the second phase, a trained model operates on each gap, answering the question: Do you observe the same scene before and after the gap? If the answer is negative, the endoscopist is alerted and can be directed to the appropriate area in real-time. The second phase model is trained using a contrastive loss based on auto-generated examples. Our method evaluation on a dataset of 250 procedures annotated by trained physicians provides sensitivity of 0.75 with specificity of 0.9. △ Less

Submitted 17 May, 2023; originally announced May 2023.

Comments: 10 pages, 5 figures

Journal ref: MICCAI workshop on Cancer Prevention Through Early Detection, LNCS, Vol. 14295, 2023

arXiv:2303.15409 [pdf, other]

Class-Conditioned Transformation for Enhanced Robust Image Classification

Authors: Tsachi Blau, Roy Ganz, Chaim Baskin, Michael Elad, Alex M. Bronstein

Abstract: Robust classification methods predominantly concentrate on algorithms that address a specific threat model, resulting in ineffective defenses against other threat models. Real-world applications are exposed to this vulnerability, as malicious attackers might exploit alternative threat models. In this work, we propose a novel test-time threat model agnostic algorithm that enhances Adversarial-Train… ▽ More Robust classification methods predominantly concentrate on algorithms that address a specific threat model, resulting in ineffective defenses against other threat models. Real-world applications are exposed to this vulnerability, as malicious attackers might exploit alternative threat models. In this work, we propose a novel test-time threat model agnostic algorithm that enhances Adversarial-Trained (AT) models. Our method operates through COnditional image transformation and DIstance-based Prediction (CODIP) and includes two main steps: First, we transform the input image into each dataset class, where the input image might be either clean or attacked. Next, we make a prediction based on the shortest transformed distance. The conditional transformation utilizes the perceptually aligned gradients property possessed by AT models and, as a result, eliminates the need for additional models or additional training. Moreover, it allows users to choose the desired balance between clean and robust accuracy without training. The proposed method achieves state-of-the-art results demonstrated through extensive experiments on various models, AT methods, datasets, and attack types. Notably, applying CODIP leads to substantial robust accuracy improvement of up to $+23\%$, $+20\%$, $+26\%$, and $+22\%$ on CIFAR10, CIFAR100, ImageNet and Flowers datasets, respectively. △ Less

Submitted 4 November, 2024; v1 submitted 27 March, 2023; originally announced March 2023.

arXiv:2302.04064 [pdf, other]

Weakly-supervised Representation Learning for Video Alignment and Analysis

Authors: Guy Bar-Shalom, George Leifman, Michael Elad, Ehud Rivlin

Abstract: Many tasks in video analysis and understanding boil down to the need for frame-based feature learning, aiming to encapsulate the relevant visual content so as to enable simpler and easier subsequent processing. While supervised strategies for this learning task can be envisioned, self and weakly-supervised alternatives are preferred due to the difficulties in getting labeled data. This paper intro… ▽ More Many tasks in video analysis and understanding boil down to the need for frame-based feature learning, aiming to encapsulate the relevant visual content so as to enable simpler and easier subsequent processing. While supervised strategies for this learning task can be envisioned, self and weakly-supervised alternatives are preferred due to the difficulties in getting labeled data. This paper introduces LRProp -- a novel weakly-supervised representation learning approach, with an emphasis on the application of temporal alignment between pairs of videos of the same action category. The proposed approach uses a transformer encoder for extracting frame-level features, and employs the DTW algorithm within the training iterations in order to identify the alignment path between video pairs. Through a process referred to as ``pair-wise position propagation'', the probability distributions of these correspondences per location are matched with the similarity of the frame-level features via KL-divergence minimization. The proposed algorithm uses also a regularized SoftDTW loss for better tuning the learned features. Our novel representation learning paradigm consistently outperforms the state of the art on temporal alignment tasks, establishing a new performance bar over several downstream video analysis applications. △ Less

Submitted 8 February, 2023; originally announced February 2023.

arXiv:2301.03362 [pdf, other]

Image Denoising: The Deep Learning Revolution and Beyond -- A Survey Paper --

Authors: Michael Elad, Bahjat Kawar, Gregory Vaksman

Abstract: Image denoising (removal of additive white Gaussian noise from an image) is one of the oldest and most studied problems in image processing. An extensive work over several decades has led to thousands of papers on this subject, and to many well-performing algorithms for this task. Indeed, 10 years ago, these achievements have led some researchers to suspect that "Denoising is Dead", in the sense t… ▽ More Image denoising (removal of additive white Gaussian noise from an image) is one of the oldest and most studied problems in image processing. An extensive work over several decades has led to thousands of papers on this subject, and to many well-performing algorithms for this task. Indeed, 10 years ago, these achievements have led some researchers to suspect that "Denoising is Dead", in the sense that all that can be achieved in this domain has already been obtained. However, this turned out to be far from the truth, with the penetration of deep learning (DL) into image processing. The era of DL brought a revolution to image denoising, both by taking the lead in today's ability for noise removal in images, and by broadening the scope of denoising problems being treated. Our paper starts by describing this evolution, highlighting in particular the tension and synergy that exist between classical approaches and modern DL-based alternatives in design of image denoisers. The recent transitions in the field of image denoising go far beyond the ability to design better denoisers. In the 2nd part of this paper we focus on recently discovered abilities and prospects of image denoisers. We expose the possibility of using denoisers to serve other problems, such as regularizing general inverse problems and serving as the prime engine in diffusion-based image synthesis. We also unveil the idea that denoising and other inverse problems might not have a unique solution as common algorithms would have us believe. Instead, we describe constructive ways to produce randomized and diverse high quality results for inverse problems, all fueled by the progress that DL brought to image denoising. This survey paper aims to provide a broad view of the history of image denoising and closely related topics. Our aim is to give a better context to recent discoveries, and to the influence of DL in our domain. △ Less

Submitted 9 January, 2023; originally announced January 2023.

arXiv:2212.03235 [pdf, other]

Complex-valued Retrievals From Noisy Images Using Diffusion Models

Authors: Nadav Torem, Roi Ronen, Yoav Y. Schechner, Michael Elad

Abstract: In diverse microscopy modalities, sensors measure only real-valued intensities. Additionally, the sensor readouts are affected by Poissonian-distributed photon noise. Traditional restoration algorithms typically aim to minimize the mean squared error (MSE) between the original and recovered images. This often leads to blurry outcomes with poor perceptual quality. Recently, deep diffusion models (D… ▽ More In diverse microscopy modalities, sensors measure only real-valued intensities. Additionally, the sensor readouts are affected by Poissonian-distributed photon noise. Traditional restoration algorithms typically aim to minimize the mean squared error (MSE) between the original and recovered images. This often leads to blurry outcomes with poor perceptual quality. Recently, deep diffusion models (DDMs) have proven to be highly capable of sampling images from the a-posteriori probability of the sought variables, resulting in visually pleasing high-quality images. These models have mostly been suggested for real-valued images suffering from Gaussian noise. In this study, we generalize annealed Langevin Dynamics, a type of DDM, to tackle the fundamental challenges in optical imaging of complex-valued objects (and real images) affected by Poisson noise. We apply our algorithm to various optical scenarios, such as Fourier Ptychography, Phase Retrieval, and Poisson denoising. Our algorithm is evaluated on simulations and biological empirical data. △ Less

Submitted 28 July, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

Comments: 11 pages, 7figures

MSC Class: I2; I4 ACM Class: I.2; I.4

arXiv:2211.15211 [pdf, other]

What's Behind the Mask: Estimating Uncertainty in Image-to-Image Problems

Authors: Gilad Kutiel, Regev Cohen, Michael Elad, Daniel Freedman

Abstract: Estimating uncertainty in image-to-image networks is an important task, particularly as such networks are being increasingly deployed in the biological and medical imaging realms. In this paper, we introduce a new approach to this problem based on masking. Given an existing image-to-image network, our approach computes a mask such that the distance between the masked reconstructed image and the ma… ▽ More Estimating uncertainty in image-to-image networks is an important task, particularly as such networks are being increasingly deployed in the biological and medical imaging realms. In this paper, we introduce a new approach to this problem based on masking. Given an existing image-to-image network, our approach computes a mask such that the distance between the masked reconstructed image and the masked true image is guaranteed to be less than a specified threshold, with high probability. The mask thus identifies the more certain regions of the reconstructed image. Our approach is agnostic to the underlying image-to-image network, and only requires triples of the input (degraded), reconstructed and true images for training. Furthermore, our method is agnostic to the distance metric used. As a result, one can use $L_p$-style distances or perceptual distances like LPIPS, which contrasts with interval-based approaches to uncertainty. Our theoretical guarantees derive from a conformal calibration procedure. We evaluate our mask-based approach to uncertainty on image colorization, image completion, and super-resolution tasks, demonstrating high quality performance on each. △ Less

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2211.11827 [pdf, other]

High-Perceptual Quality JPEG Decoding via Posterior Sampling

Authors: Sean Man, Guy Ohayon, Theo Adrai, Michael Elad

Abstract: JPEG is arguably the most popular image coding format, achieving high compression ratios via lossy quantization that may create visual artifacts degradation. Numerous attempts to remove these artifacts were conceived over the years, and common to most of these is the use of deterministic post-processing algorithms that optimize some distortion measure (e.g., PSNR, SSIM). In this paper we propose a… ▽ More JPEG is arguably the most popular image coding format, achieving high compression ratios via lossy quantization that may create visual artifacts degradation. Numerous attempts to remove these artifacts were conceived over the years, and common to most of these is the use of deterministic post-processing algorithms that optimize some distortion measure (e.g., PSNR, SSIM). In this paper we propose a different paradigm for JPEG artifact correction: Our method is stochastic, and the objective we target is high perceptual quality -- striving to obtain sharp, detailed and visually pleasing reconstructed images, while being consistent with the compressed input. These goals are achieved by training a stochastic conditional generator (conditioned on the compressed input), accompanied by a theoretically well-founded loss term, resulting in a sampler from the posterior distribution. Our solution offers a diverse set of plausible and fast reconstructions for a given input with perfect consistency. We demonstrate our scheme's unique properties and its superiority to a variety of alternative methods on the FFHQ and ImageNet datasets. △ Less

Submitted 30 August, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: Presented in NTIRE workshop as part of CVPR 2023

arXiv:2211.09919 [pdf, other]

Patch-Craft Self-Supervised Training for Correlated Image Denoising

Authors: Gregory Vaksman, Michael Elad

Abstract: Supervised neural networks are known to achieve excellent results in various image restoration tasks. However, such training requires datasets composed of pairs of corrupted images and their corresponding ground truth targets. Unfortunately, such data is not available in many applications. For the task of image denoising in which the noise statistics is unknown, several self-supervised training me… ▽ More Supervised neural networks are known to achieve excellent results in various image restoration tasks. However, such training requires datasets composed of pairs of corrupted images and their corresponding ground truth targets. Unfortunately, such data is not available in many applications. For the task of image denoising in which the noise statistics is unknown, several self-supervised training methods have been proposed for overcoming this difficulty. Some of these require knowledge of the noise model, while others assume that the contaminating noise is uncorrelated, both assumptions are too limiting for many practical needs. This work proposes a novel self-supervised training technique suitable for the removal of unknown correlated noise. The proposed approach neither requires knowledge of the noise model nor access to ground truth targets. The input to our algorithm consists of easily captured bursts of noisy shots. Our algorithm constructs artificial patch-craft images from these bursts by patch matching and stitching, and the obtained crafted images are used as targets for the training. Our method does not require registration of the images within the burst. We evaluate the proposed framework through extensive experiments with synthetic and real image noise. △ Less

Submitted 17 November, 2022; originally announced November 2022.

arXiv:2211.08944 [pdf, other]

Reasons for the Superiority of Stochastic Estimators over Deterministic Ones: Robustness, Consistency and Perceptual Quality

Authors: Guy Ohayon, Theo Adrai, Michael Elad, Tomer Michaeli

Abstract: Stochastic restoration algorithms allow to explore the space of solutions that correspond to the degraded input. In this paper we reveal additional fundamental advantages of stochastic methods over deterministic ones, which further motivate their use. First, we prove that any restoration algorithm that attains perfect perceptual quality and whose outputs are consistent with the input must be a pos… ▽ More Stochastic restoration algorithms allow to explore the space of solutions that correspond to the degraded input. In this paper we reveal additional fundamental advantages of stochastic methods over deterministic ones, which further motivate their use. First, we prove that any restoration algorithm that attains perfect perceptual quality and whose outputs are consistent with the input must be a posterior sampler, and is thus required to be stochastic. Second, we illustrate that while deterministic restoration algorithms may attain high perceptual quality, this can be achieved only by filling up the space of all possible source images using an extremely sensitive mapping, which makes them highly vulnerable to adversarial attacks. Indeed, we show that enforcing deterministic models to be robust to such attacks profoundly hinders their perceptual quality, while robustifying stochastic models hardly influences their perceptual quality, and improves their output variability. These findings provide a motivation to foster progress in stochastic restoration methods, paving the way to better recovery algorithms. △ Less

Submitted 26 July, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:26474-26494, 2023

arXiv:2209.11888 [pdf, other]

JPEG Artifact Correction using Denoising Diffusion Restoration Models

Authors: Bahjat Kawar, Jiaming Song, Stefano Ermon, Michael Elad

Abstract: Diffusion models can be used as learned priors for solving various inverse problems. However, most existing approaches are restricted to linear inverse problems, limiting their applicability to more general cases. In this paper, we build upon Denoising Diffusion Restoration Models (DDRM) and propose a method for solving some non-linear inverse problems. We leverage the pseudo-inverse operator used… ▽ More Diffusion models can be used as learned priors for solving various inverse problems. However, most existing approaches are restricted to linear inverse problems, limiting their applicability to more general cases. In this paper, we build upon Denoising Diffusion Restoration Models (DDRM) and propose a method for solving some non-linear inverse problems. We leverage the pseudo-inverse operator used in DDRM and generalize this concept for other measurement operators, which allows us to use pre-trained unconditional diffusion models for applications such as JPEG artifact correction. We empirically demonstrate the effectiveness of our approach across various quality factors, attaining performance levels that are on par with state-of-the-art methods trained specifically for the JPEG restoration task. △ Less

Submitted 23 November, 2022; v1 submitted 23 September, 2022; originally announced September 2022.

Comments: Presented at NeurIPS 2022 Workshop on Score-Based Methods. Code: https://github.com/bahjat-kawar/ddrm-jpeg

arXiv:2208.08664 [pdf, other]

Enhancing Diffusion-Based Image Synthesis with Robust Classifier Guidance

Authors: Bahjat Kawar, Roy Ganz, Michael Elad

Abstract: Denoising diffusion probabilistic models (DDPMs) are a recent family of generative models that achieve state-of-the-art results. In order to obtain class-conditional generation, it was suggested to guide the diffusion process by gradients from a time-dependent classifier. While the idea is theoretically sound, deep learning-based classifiers are infamously susceptible to gradient-based adversarial… ▽ More Denoising diffusion probabilistic models (DDPMs) are a recent family of generative models that achieve state-of-the-art results. In order to obtain class-conditional generation, it was suggested to guide the diffusion process by gradients from a time-dependent classifier. While the idea is theoretically sound, deep learning-based classifiers are infamously susceptible to gradient-based adversarial attacks. Therefore, while traditional classifiers may achieve good accuracy scores, their gradients are possibly unreliable and might hinder the improvement of the generation results. Recent work discovered that adversarially robust classifiers exhibit gradients that are aligned with human perception, and these could better guide a generative process towards semantically meaningful images. We utilize this observation by defining and training a time-dependent adversarially robust classifier and use it as guidance for a generative diffusion model. In experiments on the highly challenging and diverse ImageNet dataset, our scheme introduces significantly more intelligible intermediate gradients, better alignment with theoretical findings, as well as improved generation results under several evaluation metrics. Furthermore, we conduct an opinion survey whose findings indicate that human raters prefer our method's results. △ Less

Submitted 15 March, 2023; v1 submitted 18 August, 2022; originally announced August 2022.

Comments: Accepted to TMLR

arXiv:2207.11378 [pdf, other]

Do Perceptually Aligned Gradients Imply Adversarial Robustness?

Authors: Roy Ganz, Bahjat Kawar, Michael Elad

Abstract: Adversarially robust classifiers possess a trait that non-robust models do not -- Perceptually Aligned Gradients (PAG). Their gradients with respect to the input align well with human perception. Several works have identified PAG as a byproduct of robust training, but none have considered it as a standalone phenomenon nor studied its own implications. In this work, we focus on this trait and test… ▽ More Adversarially robust classifiers possess a trait that non-robust models do not -- Perceptually Aligned Gradients (PAG). Their gradients with respect to the input align well with human perception. Several works have identified PAG as a byproduct of robust training, but none have considered it as a standalone phenomenon nor studied its own implications. In this work, we focus on this trait and test whether \emph{Perceptually Aligned Gradients imply Robustness}. To this end, we develop a novel objective to directly promote PAG in training classifiers and examine whether models with such gradients are more robust to adversarial attacks. Extensive experiments on multiple datasets and architectures validate that models with aligned gradients exhibit significant robustness, exposing the surprising bidirectional connection between PAG and robustness. Lastly, we show that better gradient alignment leads to increased robustness and harness this observation to boost the robustness of existing adversarial training techniques. △ Less

Submitted 9 August, 2023; v1 submitted 22 July, 2022; originally announced July 2022.

arXiv:2207.08089 [pdf, other]

Threat Model-Agnostic Adversarial Defense using Diffusion Models

Authors: Tsachi Blau, Roy Ganz, Bahjat Kawar, Alex Bronstein, Michael Elad

Abstract: Deep Neural Networks (DNNs) are highly sensitive to imperceptible malicious perturbations, known as adversarial attacks. Following the discovery of this vulnerability in real-world imaging and vision applications, the associated safety concerns have attracted vast research attention, and many defense techniques have been developed. Most of these defense methods rely on adversarial training (AT) --… ▽ More Deep Neural Networks (DNNs) are highly sensitive to imperceptible malicious perturbations, known as adversarial attacks. Following the discovery of this vulnerability in real-world imaging and vision applications, the associated safety concerns have attracted vast research attention, and many defense techniques have been developed. Most of these defense methods rely on adversarial training (AT) -- training the classification network on images perturbed according to a specific threat model, which defines the magnitude of the allowed modification. Although AT leads to promising results, training on a specific threat model fails to generalize to other types of perturbations. A different approach utilizes a preprocessing step to remove the adversarial perturbation from the attacked image. In this work, we follow the latter path and aim to develop a technique that leads to robust classifiers across various realizations of threat models. To this end, we harness the recent advances in stochastic generative modeling, and means to leverage these for sampling from conditional distributions. Our defense relies on an addition of Gaussian i.i.d noise to the attacked image, followed by a pretrained diffusion process -- an architecture that performs a stochastic iterative process over a denoising network, yielding a high perceptual quality denoised outcome. The obtained robustness with this stochastic preprocessing step is validated through extensive experiments on the CIFAR-10 dataset, showing that our method outperforms the leading defense methods under various threat models. △ Less

Submitted 17 July, 2022; originally announced July 2022.

arXiv:2201.11793 [pdf, other]

Denoising Diffusion Restoration Models

Authors: Bahjat Kawar, Michael Elad, Stefano Ermon, Jiaming Song

Abstract: Many interesting tasks in image restoration can be cast as linear inverse problems. A recent family of approaches for solving these problems uses stochastic algorithms that sample from the posterior distribution of natural images given the measurements. However, efficient solutions often require problem-specific supervised training to model the posterior, whereas unsupervised methods that are not… ▽ More Many interesting tasks in image restoration can be cast as linear inverse problems. A recent family of approaches for solving these problems uses stochastic algorithms that sample from the posterior distribution of natural images given the measurements. However, efficient solutions often require problem-specific supervised training to model the posterior, whereas unsupervised methods that are not problem-specific typically rely on inefficient iterative methods. This work addresses these issues by introducing Denoising Diffusion Restoration Models (DDRM), an efficient, unsupervised posterior sampling method. Motivated by variational inference, DDRM takes advantage of a pre-trained denoising diffusion generative model for solving any linear inverse problem. We demonstrate DDRM's versatility on several image datasets for super-resolution, deblurring, inpainting, and colorization under various amounts of measurement noise. DDRM outperforms the current leading unsupervised methods on the diverse ImageNet dataset in reconstruction quality, perceptual quality, and runtime, being 5x faster than the nearest competitor. DDRM also generalizes well for natural images out of the distribution of the observed ImageNet training set. △ Less

Submitted 12 October, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

Comments: Project page: https://ddrm-ml.github.io/

arXiv:2108.03702 [pdf, other]

BIGRoC: Boosting Image Generation via a Robust Classifier

Authors: Roy Ganz, Michael Elad

Abstract: The interest of the machine learning community in image synthesis has grown significantly in recent years, with the introduction of a wide range of deep generative models and means for training them. In this work, we propose a general model-agnostic technique for improving the image quality and the distribution fidelity of generated images obtained by any generative model. Our method, termed BIGRo… ▽ More The interest of the machine learning community in image synthesis has grown significantly in recent years, with the introduction of a wide range of deep generative models and means for training them. In this work, we propose a general model-agnostic technique for improving the image quality and the distribution fidelity of generated images obtained by any generative model. Our method, termed BIGRoC (Boosting Image Generation via a Robust Classifier), is based on a post-processing procedure via the guidance of a given robust classifier and without a need for additional training of the generative model. Given a synthesized image, we propose to update it through projected gradient steps over the robust classifier to refine its recognition. We demonstrate this post-processing algorithm on various image synthesis methods and show a significant quantitative and qualitative improvement on CIFAR-10 and ImageNet. Surprisingly, although BIGRoC is the first model agnostic among refinement approaches and requires much less information, it outperforms competitive methods. Specifically, BIGRoC improves the image synthesis best performing diffusion model on ImageNet 128x128 by 14.81%, attaining an FID score of 2.53, and on 256x256 by 7.87%, achieving an FID of 3.63. Moreover, we conduct an opinion survey, according to which humans significantly prefer our method's outputs. △ Less

Submitted 1 February, 2023; v1 submitted 8 August, 2021; originally announced August 2021.

Journal ref: Transactions on machine learning research, 2023

arXiv:2105.14951 [pdf, other]

SNIPS: Solving Noisy Inverse Problems Stochastically

Authors: Bahjat Kawar, Gregory Vaksman, Michael Elad

Abstract: In this work we introduce a novel stochastic algorithm dubbed SNIPS, which draws samples from the posterior distribution of any linear inverse problem, where the observation is assumed to be contaminated by additive white Gaussian noise. Our solution incorporates ideas from Langevin dynamics and Newton's method, and exploits a pre-trained minimum mean squared error (MMSE) Gaussian denoiser. The pr… ▽ More In this work we introduce a novel stochastic algorithm dubbed SNIPS, which draws samples from the posterior distribution of any linear inverse problem, where the observation is assumed to be contaminated by additive white Gaussian noise. Our solution incorporates ideas from Langevin dynamics and Newton's method, and exploits a pre-trained minimum mean squared error (MMSE) Gaussian denoiser. The proposed approach relies on an intricate derivation of the posterior score function that includes a singular value decomposition (SVD) of the degradation operator, in order to obtain a tractable iterative algorithm for the desired sampling. Due to its stochasticity, the algorithm can produce multiple high perceptual quality samples for the same noisy observation. We demonstrate the abilities of the proposed paradigm for image deblurring, super-resolution, and compressive sensing. We show that the samples produced are sharp, detailed and consistent with the given measurements, and their diversity exposes the inherent uncertainty in the inverse problem being solved. △ Less

Submitted 10 November, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

Comments: Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS), 2021

arXiv:2104.00464 [pdf, other]

Improved Image Generation via Sparse Modeling

Authors: Roy Ganz, Michael Elad

Abstract: The interest of the deep learning community in image synthesis has grown massively in recent years. Nowadays, deep generative methods, and especially Generative Adversarial Networks (GANs), are leading to state-of-the-art performance, capable of synthesizing images that appear realistic. While the efforts for improving the quality of the generated images are extensive, most attempts still consider… ▽ More The interest of the deep learning community in image synthesis has grown massively in recent years. Nowadays, deep generative methods, and especially Generative Adversarial Networks (GANs), are leading to state-of-the-art performance, capable of synthesizing images that appear realistic. While the efforts for improving the quality of the generated images are extensive, most attempts still consider the generator part as an uncorroborated "black-box". In this paper, we aim to provide a better understanding and design of the image generation process. We interpret existing generators as implicitly relying on sparsity-inspired models. More specifically, we show that generators can be viewed as manifestations of the Convolutional Sparse Coding (CSC) and its Multi-Layered version (ML-CSC) synthesis processes. We leverage this observation by explicitly enforcing a sparsifying regularization on appropriately chosen activation layers in the generator, and demonstrate that this leads to improved image synthesis. Furthermore, we show that the same rationale and benefits apply to generators serving inverse problems, demonstrated on the Deep Image Prior (DIP) method. △ Less

Submitted 13 May, 2022; v1 submitted 1 April, 2021; originally announced April 2021.

arXiv:2103.13767 [pdf, other]

Patch Craft: Video Denoising by Deep Modeling and Patch Matching

Authors: Gregory Vaksman, Michael Elad, Peyman Milanfar

Abstract: The non-local self-similarity property of natural images has been exploited extensively for solving various image processing problems. When it comes to video sequences, harnessing this force is even more beneficial due to the temporal redundancy. In the context of image and video denoising, many classically-oriented algorithms employ self-similarity, splitting the data into overlapping patches, ga… ▽ More The non-local self-similarity property of natural images has been exploited extensively for solving various image processing problems. When it comes to video sequences, harnessing this force is even more beneficial due to the temporal redundancy. In the context of image and video denoising, many classically-oriented algorithms employ self-similarity, splitting the data into overlapping patches, gathering groups of similar ones and processing these together somehow. With the emergence of convolutional neural networks (CNN), the patch-based framework has been abandoned. Most CNN denoisers operate on the whole image, leveraging non-local relations only implicitly by using a large receptive field. This work proposes a novel approach for leveraging self-similarity in the context of video denoising, while still relying on a regular convolutional architecture. We introduce a concept of patch-craft frames - artificial frames that are similar to the real ones, built by tiling matched patches. Our algorithm augments video sequences with patch-craft frames and feeds them to a CNN. We demonstrate the substantial boost in denoising performance obtained with the proposed approach. △ Less

Submitted 30 October, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

arXiv:2103.04192 [pdf, other]

doi 10.1109/ICCVW54120.2021.00207

High Perceptual Quality Image Denoising with a Posterior Sampling CGAN

Authors: Guy Ohayon, Theo Adrai, Gregory Vaksman, Michael Elad, Peyman Milanfar

Abstract: The vast work in Deep Learning (DL) has led to a leap in image denoising research. Most DL solutions for this task have chosen to put their efforts on the denoiser's architecture while maximizing distortion performance. However, distortion driven solutions lead to blurry results with sub-optimal perceptual quality, especially in immoderate noise levels. In this paper we propose a different perspec… ▽ More The vast work in Deep Learning (DL) has led to a leap in image denoising research. Most DL solutions for this task have chosen to put their efforts on the denoiser's architecture while maximizing distortion performance. However, distortion driven solutions lead to blurry results with sub-optimal perceptual quality, especially in immoderate noise levels. In this paper we propose a different perspective, aiming to produce sharp and visually pleasing denoised images that are still faithful to their clean sources. Formally, our goal is to achieve high perceptual quality with acceptable distortion. This is attained by a stochastic denoiser that samples from the posterior distribution, trained as a generator in the framework of conditional generative adversarial networks (CGAN). Contrary to distortion-based regularization terms that conflict with perceptual quality, we introduce to the CGAN objective a theoretically founded penalty term that does not force a distortion requirement on individual samples, but rather on their mean. We showcase our proposed method with a novel denoiser architecture that achieves the reformed denoising goal and produces vivid and diverse outcomes in immoderate noise levels. △ Less

Submitted 11 October, 2021; v1 submitted 6 March, 2021; originally announced March 2021.

Journal ref: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, BC, Canada, 2021, pp. 1805-1813

arXiv:2101.09552 [pdf, other]

Stochastic Image Denoising by Sampling from the Posterior Distribution

Authors: Bahjat Kawar, Gregory Vaksman, Michael Elad

Abstract: Image denoising is a well-known and well studied problem, commonly targeting a minimization of the mean squared error (MSE) between the outcome and the original image. Unfortunately, especially for severe noise levels, such Minimum MSE (MMSE) solutions may lead to blurry output images. In this work we propose a novel stochastic denoising approach that produces viable and high perceptual quality re… ▽ More Image denoising is a well-known and well studied problem, commonly targeting a minimization of the mean squared error (MSE) between the outcome and the original image. Unfortunately, especially for severe noise levels, such Minimum MSE (MMSE) solutions may lead to blurry output images. In this work we propose a novel stochastic denoising approach that produces viable and high perceptual quality results, while maintaining a small MSE. Our method employs Langevin dynamics that relies on a repeated application of any given MMSE denoiser, obtaining the reconstructed image by effectively sampling from the posterior distribution. Due to its stochasticity, the proposed algorithm can produce a variety of high-quality outputs for a given noisy input, all shown to be legitimate denoising results. In addition, we present an extension of our algorithm for handling the inpainting problem, recovering missing pixels while removing noise from partially given data. △ Less

Submitted 31 August, 2021; v1 submitted 23 January, 2021; originally announced January 2021.

arXiv:2010.07069 [pdf, other]

Learned Greedy Method (LGM): A Novel Neural Architecture for Sparse Coding and Beyond

Authors: Rajaei Khatib, Dror Simon, Michael Elad

Abstract: The fields of signal and image processing have been deeply influenced by the introduction of deep neural networks. These are successfully deployed in a wide range of real-world applications, obtaining state of the art results and surpassing well-known and well-established classical methods. Despite their impressive success, the architectures used in many of these neural networks come with no clear… ▽ More The fields of signal and image processing have been deeply influenced by the introduction of deep neural networks. These are successfully deployed in a wide range of real-world applications, obtaining state of the art results and surpassing well-known and well-established classical methods. Despite their impressive success, the architectures used in many of these neural networks come with no clear justification. As such, these are usually treated as "black box" machines that lack any kind of interpretability. A constructive remedy to this drawback is a systematic design of such networks by unfolding well-understood iterative algorithms. A popular representative of this approach is the Iterative Shrinkage-Thresholding Algorithm (ISTA) and its learned version -- LISTA, aiming for the sparse representations of the processed signals. In this paper we revisit this sparse coding task and propose an unfolded version of a greedy pursuit algorithm for the same goal. More specifically, we concentrate on the well-known Orthogonal-Matching-Pursuit (OMP) algorithm, and introduce its unfolded and learned version. Key features of our Learned Greedy Method (LGM) are the ability to accommodate a dynamic number of unfolded layers, and a stopping mechanism based on representation error, both adapted to the input. We develop several variants of the proposed LGM architecture and test some of them in various experiments, demonstrating their flexibility and efficiency. △ Less

Submitted 20 October, 2020; v1 submitted 14 October, 2020; originally announced October 2020.

arXiv:2008.00605 [pdf, other]

The Rate-Distortion-Accuracy Tradeoff: JPEG Case Study

Authors: Xiyang Luo, Hossein Talebi, Feng Yang, Michael Elad, Peyman Milanfar

Abstract: Handling digital images is almost always accompanied by a lossy compression in order to facilitate efficient transmission and storage. This introduces an unavoidable tension between the allocated bit-budget (rate) and the faithfulness of the resulting image to the original one (distortion). An additional complicating consideration is the effect of the compression on recognition performance by give… ▽ More Handling digital images is almost always accompanied by a lossy compression in order to facilitate efficient transmission and storage. This introduces an unavoidable tension between the allocated bit-budget (rate) and the faithfulness of the resulting image to the original one (distortion). An additional complicating consideration is the effect of the compression on recognition performance by given classifiers (accuracy). This work aims to explore this rate-distortion-accuracy tradeoff. As a case study, we focus on the design of the quantization tables in the JPEG compression standard. We offer a novel optimal tuning of these tables via continuous optimization, leveraging a differential implementation of both the JPEG encoder-decoder and an entropy estimator. This enables us to offer a unified framework that considers the interplay between rate, distortion and classification accuracy. In all these fronts, we report a substantial boost in performance by a simple and easily implemented modification of these tables. △ Less

Submitted 2 August, 2020; originally announced August 2020.

ACM Class: I.4.2; I.5.1

arXiv:2008.00226 [pdf, other]

Regularization by Denoising via Fixed-Point Projection (RED-PRO)

Authors: Regev Cohen, Michael Elad, Peyman Milanfar

Abstract: Inverse problems in image processing are typically cast as optimization tasks, consisting of data-fidelity and stabilizing regularization terms. A recent regularization strategy of great interest utilizes the power of denoising engines. Two such methods are the Plug-and-Play Prior (PnP) and Regularization by Denoising (RED). While both have shown state-of-the-art results in various recovery tasks,… ▽ More Inverse problems in image processing are typically cast as optimization tasks, consisting of data-fidelity and stabilizing regularization terms. A recent regularization strategy of great interest utilizes the power of denoising engines. Two such methods are the Plug-and-Play Prior (PnP) and Regularization by Denoising (RED). While both have shown state-of-the-art results in various recovery tasks, their theoretical justification is incomplete. In this paper, we aim to bridge between RED and PnP, enriching the understanding of both frameworks. Towards that end, we reformulate RED as a convex optimization problem utilizing a projection (RED-PRO) onto the fixed-point set of demicontractive denoisers. We offer a simple iterative solution to this problem, by which we show that PnP proximal gradient method is a special case of RED-PRO, while providing guarantees for the convergence of both frameworks to globally optimal solutions. In addition, we present relaxations of RED-PRO that allow for handling denoisers with limited fixed-point sets. Finally, we demonstrate RED-PRO for the tasks of image deblurring and super-resolution, showing improved results with respect to the original RED framework. △ Less

Submitted 28 October, 2020; v1 submitted 1 August, 2020; originally announced August 2020.

Comments: 33 Pages, 6 figures, 7 tables

arXiv:2006.15555 [pdf, other]

When and How Can Deep Generative Models be Inverted?

Authors: Aviad Aberdam, Dror Simon, Michael Elad

Abstract: Deep generative models (e.g. GANs and VAEs) have been developed quite extensively in recent years. Lately, there has been an increased interest in the inversion of such a model, i.e. given a (possibly corrupted) signal, we wish to recover the latent vector that generated it. Building upon sparse representation theory, we define conditions that are applicable to any inversion algorithm (gradient de… ▽ More Deep generative models (e.g. GANs and VAEs) have been developed quite extensively in recent years. Lately, there has been an increased interest in the inversion of such a model, i.e. given a (possibly corrupted) signal, we wish to recover the latent vector that generated it. Building upon sparse representation theory, we define conditions that are applicable to any inversion algorithm (gradient descent, deep encoder, etc.), under which such generative models are invertible with a unique solution. Importantly, the proposed analysis is applicable to any trained model, and does not depend on Gaussian i.i.d. weights. Furthermore, we introduce two layer-wise inversion pursuit algorithms for trained generative networks of arbitrary depth, and accompany these with recovery guarantees. Finally, we validate our theoretical results numerically and show that our method outperforms gradient descent when inverting such generators, both for clean and corrupted signals. △ Less

Submitted 28 June, 2020; originally announced June 2020.

arXiv:2002.00113 [pdf, other]

doi 10.1109/TIP.2021.3096085

Better Compression with Deep Pre-Editing

Authors: Hossein Talebi, Damien Kelly, Xiyang Luo, Ignacio Garcia Dorado, Feng Yang, Peyman Milanfar, Michael Elad

Abstract: Could we compress images via standard codecs while avoiding visible artifacts? The answer is obvious -- this is doable as long as the bit budget is generous enough. What if the allocated bit-rate for compression is insufficient? Then unfortunately, artifacts are a fact of life. Many attempts were made over the years to fight this phenomenon, with various degrees of success. In this work we aim to… ▽ More Could we compress images via standard codecs while avoiding visible artifacts? The answer is obvious -- this is doable as long as the bit budget is generous enough. What if the allocated bit-rate for compression is insufficient? Then unfortunately, artifacts are a fact of life. Many attempts were made over the years to fight this phenomenon, with various degrees of success. In this work we aim to break the unholy connection between bit-rate and image quality, and propose a way to circumvent compression artifacts by pre-editing the incoming image and modifying its content to fit the given bits. We design this editing operation as a learned convolutional neural network, and formulate an optimization problem for its training. Our loss takes into account a proximity between the original image and the edited one, a bit-budget penalty over the proposed image, and a no-reference image quality measure for forcing the outcome to be visually pleasing. The proposed approach is demonstrated on the popular JPEG compression, showing savings in bits and/or improvements in visual quality, obtained with intricate editing effects. △ Less

Submitted 23 July, 2021; v1 submitted 31 January, 2020; originally announced February 2020.

Showing 1–50 of 109 results for author: Elad, M