Search | arXiv e-print repository

CASteer: Steering Diffusion Models for Controllable Generation

Authors: Tatiana Gaintseva, Chengcheng Ma, Ziquan Liu, Martin Benning, Gregory Slabaugh, Jiankang Deng, Ismail Elezi

Abstract: Diffusion models have transformed image generation, yet controlling their outputs for diverse applications, including content moderation and creative customization, remains challenging. Existing approaches usually require task-specific training and struggle to generalize across both concrete (e.g., objects) and abstract (e.g., styles) concepts. We propose CASteer (Cross-Attention Steering), a training-free framework for controllable image generation that uses steering vectors to dynamically influence a diffusion model's hidden representations. CASteer computes these vectors offline by averaging activations from concept-specific generated images, then applies them during inference via a dynamic heuristic that activates modifications only when necessary, removing concepts from affected images or adding them to unaffected ones. This approach enables precise control over a wide range of tasks, including removing harmful content, adding desired attributes, replacing objects, or altering styles, all without model retraining. CASteer handles both concrete and abstract concepts, outperforming state-of-the-art techniques across multiple diffusion models while preserving unrelated content and minimizing unintended effects.

Submitted 11 March, 2025; originally announced March 2025.
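The abstract describes steering vectors obtained by averaging activations and applied only when needed. The NumPy sketch below shows one way this could look, under stated assumptions: the steering vector is taken to be the normalised difference between mean activations collected with and without a concept, and "only when necessary" is modelled as subtracting the concept direction only from activations with a positive projection onto it. The helper names (`steering_vector`, `remove_concept`, `add_concept`) and the strengths `alpha` and `beta` are hypothetical, and extracting cross-attention activations from a real diffusion model is not shown.

```python
import numpy as np


def steering_vector(acts_with, acts_without):
    """Hypothetical helper: unit-norm difference of mean activations.

    acts_with, acts_without: arrays of shape (n_samples, d) holding one layer's
    activations collected from images generated with / without the concept.
    """
    v = acts_with.mean(axis=0) - acts_without.mean(axis=0)
    return v / np.linalg.norm(v)


def remove_concept(h, v, beta=1.0):
    """Assumed gating rule: subtract the concept direction only where present.

    h: activations of shape (n_tokens, d); v: unit steering vector of shape (d,).
    Tokens whose projection onto v is <= 0 are left unchanged.
    """
    proj = h @ v                          # (n_tokens,) projection onto v
    gate = np.maximum(proj, 0.0)          # zero where the concept is absent
    return h - beta * gate[:, None] * v[None, :]


def add_concept(h, v, alpha=1.0):
    """Push every token towards the concept direction by a fixed amount alpha."""
    return h + alpha * v[None, :]
```

With `beta=1.0` and a unit-norm `v`, tokens that carried the concept end up with zero projection onto it, while all other tokens are untouched; this is the sense in which the modification is applied "only when necessary" in this sketch.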

arXiv:2501.19386 [pdf, ps, other]

Multi-Frame Blind Manifold Deconvolution for Rotating Synthetic Aperture Imaging

Authors: Dao Lin, Jian Zhang, Martin Benning

Abstract: A rotating synthetic aperture (RSA) imaging system captures images of the target scene at different rotation angles by rotating a rectangular aperture. Deblurring acquired RSA images plays a critical role in reconstructing a latent sharp image underlying the scene. In the past decade, the emergence of blind deconvolution technology has revolutionised this field through its ability to model complex features from acquired images. Most existing methods attempt to solve this ill-posed inverse problem by maximum a posteriori estimation. Despite this progress, researchers have paid limited attention to exploring low-dimensional manifold structures of the latent image within a high-dimensional ambient space. Here, we propose a novel method to process RSA images using manifold fitting and penalisation in the context of multi-frame blind deconvolution. We develop fast algorithms for implementing the proposed procedure. Simulation studies demonstrate that manifold-based deconvolution can outperform conventional deconvolution algorithms, generating a sharper estimate of the latent image in terms of both pixel intensities and preserved structural details.

Submitted 31 January, 2025; originally announced January 2025.

Comments: 39 pages, 9 figures

MSC Class: 62P30

arXiv:2410.23130 [pdf, other]

Compositional Segmentation of Cardiac Images Leveraging Metadata

Authors: Abbas Khan, Muhammad Asad, Martin Benning, Caroline Roney, Gregory Slabaugh

Abstract: Cardiac image segmentation is essential for automated cardiac function assessment and for monitoring changes in cardiac structures over time. Inspired by coarse-to-fine approaches in image analysis, we propose a novel multitask compositional segmentation approach that can simultaneously localize the heart in a cardiac image and perform part-based segmentation of different regions of interest. We demonstrate that this compositional approach achieves better results than direct segmentation of the anatomies. Further, we propose a novel Cross-Modal Feature Integration (CMFI) module to leverage the metadata related to cardiac imaging collected during image acquisition. We perform experiments on two different modalities, MRI and ultrasound, using the public Multi-Disease, Multi-View, and Multi-Centre (M&Ms-2) and Multi-structure Ultrasound Segmentation (CAMUS) datasets, to demonstrate the effectiveness of the proposed compositional segmentation method and of the CMFI module, which incorporates metadata within the compositional segmentation network. The source code is available at: https://github.com/kabbas570/CompSeg-MetaData.

Submitted 30 October, 2024; originally announced October 2024.

Comments: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025

arXiv:2408.08742 [pdf, other]

A lifted Bregman strategy for training unfolded proximal neural network Gaussian denoisers

Authors: Xiaoyu Wang, Martin Benning, Audrey Repetti

Abstract: Unfolded proximal neural networks (PNNs) form a family of methods that combine deep learning and proximal optimization approaches. They consist of designing a neural network for a specific task by unrolling a proximal algorithm for a fixed number of iterations, where the linear operators can be learned through a prior training procedure. PNNs have been shown to be more robust than traditional deep learning approaches while performing at least as well, in particular in computational imaging. However, training PNNs still depends on the efficiency of available training algorithms. In this work, we propose a lifted training formulation based on Bregman distances for unfolded PNNs. Leveraging the deterministic mini-batch block-coordinate forward-backward method, we design a bespoke computational strategy beyond traditional back-propagation methods for solving the resulting learning problem efficiently. We assess the behaviour of the proposed training approach for PNNs through numerical simulations on image denoising, considering a denoising PNN whose structure is based on dual proximal-gradient iterations.

Submitted 16 August, 2024; originally announced August 2024.

Comments: 2024 IEEE International Workshop on Machine Learning for Signal Processing, Sept. 22--25, 2024, London, UK

MSC Class: 65K10; 68T01

arXiv:2406.15035 [pdf, other]

Improving Interpretability and Robustness for the Detection of AI-Generated Images

Authors: Tatiana Gaintseva, Laida Kushnareva, German Magai, Irina Piontkovskaya, Sergey Nikolenko, Martin Benning, Serguei Barannikov, Gregory Slabaugh

Abstract: As generative models grow more capable, detecting artificial content becomes an increasingly important and difficult task. However, all popular approaches to this problem suffer from poor generalization across domains and generative models. In this work, we focus on the robustness of AI-generated image (AIGI) detectors. We analyze existing state-of-the-art AIGI detection methods based on frozen CLIP embeddings and show how to interpret them, shedding light on how images produced by various AI generators differ from real ones. Next, we propose two ways to improve robustness: removing harmful components of the embedding vector, and selecting the best-performing attention heads in the image encoder model. Our methods increase the mean out-of-distribution (OOD) classification score by up to 6% for cross-model transfer. We also propose a new dataset for AIGI detection and use it in our evaluation; we believe this dataset will help boost further research. The dataset and code are provided as a supplement.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.05786 [pdf, other]

CAMS: Convolution and Attention-Free Mamba-based Cardiac Image Segmentation

Authors: Abbas Khan, Muhammad Asad, Martin Benning, Caroline Roney, Gregory Slabaugh

Abstract: Convolutional Neural Networks (CNNs) and Transformer-based self-attention models have become the standard for medical image segmentation. This paper demonstrates that convolution and self-attention, while widely used, are not the only effective methods for segmentation. Breaking with convention, we present a Convolution and self-Attention-free Mamba-based semantic Segmentation Network named CAMS-Net. Specifically, we design a Mamba-based Channel Aggregator and Spatial Aggregator, which are applied independently in each encoder-decoder stage. The Channel Aggregator extracts information across different channels, and the Spatial Aggregator learns features across different spatial locations. We also propose a Linearly Interconnected Factorized Mamba (LIFM) block to reduce the computational complexity of a Mamba block and to enhance its decision function by introducing a non-linearity between two factorized Mamba blocks. Our model outperforms existing state-of-the-art CNN, self-attention, and Mamba-based methods on the CMR and M&Ms-2 cardiac segmentation datasets, showing how this convolution- and self-attention-free method can inspire further research beyond CNN and Transformer paradigms while achieving linear complexity and reducing the number of parameters. Source code and pre-trained models are available at: https://github.com/kabbas570/CAMS-Net.

Submitted 29 October, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

Comments: This paper has been accepted for the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025

arXiv:2404.16708 [pdf, other]

Multi-view Cardiac Image Segmentation via Trans-Dimensional Priors

Authors: Abbas Khan, Muhammad Asad, Martin Benning, Caroline Roney, Gregory Slabaugh

Abstract: We propose a novel multi-stage trans-dimensional architecture for multi-view cardiac image segmentation. Our method exploits the relationship between long-axis (2D) and short-axis (3D) magnetic resonance (MR) images to perform a sequential 3D-to-2D-to-3D segmentation of the long-axis and short-axis images. In the first stage, 3D segmentation is performed using the short-axis image, and the prediction is transformed to the long-axis view and used as a segmentation prior in the next stage. In the second stage, the heart region is localized and cropped around the segmentation prior using a Heart Localization and Cropping (HLC) module, focusing the subsequent model on the heart region of the image, where a 2D segmentation is performed. Similarly, we transform the long-axis prediction to the short-axis view, localize and crop the heart region, and again perform a 3D segmentation to refine the initial short-axis segmentation. We evaluate our proposed method on the Multi-Disease, Multi-View & Multi-Center Right Ventricular Segmentation in Cardiac MRI (M&Ms-2) dataset, where our method outperforms state-of-the-art methods in segmenting cardiac regions of interest in both short-axis and long-axis images. The pre-trained models, source code, and implementation details will be made publicly available.

Submitted 25 April, 2024; originally announced April 2024.
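The localization-and-cropping step can be illustrated with a simple bounding-box crop around a segmentation prior. This is a minimal sketch under our own assumption that cropping means taking the bounding box of the prior's nonzero voxels plus a fixed margin; the function name `crop_around_prior` and the `margin` parameter are hypothetical, and the actual HLC module may localize the heart differently.

```python
import numpy as np


def crop_around_prior(image, prior, margin=8):
    """Hypothetical helper: crop `image` to the bounding box of `prior > 0`.

    image, prior: arrays of the same shape (works for 2D slices or 3D volumes).
    The box is padded by `margin` voxels on every side and clipped at the borders.
    Returns the crop and the slices used, so a prediction on the crop can be
    pasted back into a full-size array.
    """
    coords = np.argwhere(prior > 0)               # (n_voxels, ndim) indices
    if coords.size == 0:
        raise ValueError("empty segmentation prior")
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + margin + 1, prior.shape)
    slices = tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))
    return image[slices], slices
```

Returning the slices alongside the crop keeps the step reversible: a segmentation predicted on the crop can be written back with `full[slices] = pred`.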

arXiv:2404.01889 [pdf, other]

RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement

Authors: Tatiana Gaintseva, Martin Benning, Gregory Slabaugh

Abstract: In this paper, we propose a novel modification of Contrastive Language-Image Pre-Training (CLIP) guidance for the task of unsupervised backlit image enhancement. Our work builds on the state-of-the-art CLIP-LIT approach, which learns a prompt pair by constraining the text-image similarity between a prompt (negative/positive sample) and a corresponding image (backlit image/well-lit image) in the CLIP embedding space. The learned prompts then guide an image enhancement network. Based on the CLIP-LIT framework, we propose two novel methods for CLIP guidance. First, we show that instead of tuning prompts in the space of text embeddings, it is possible to directly tune their embeddings in the latent space without any loss in quality. This accelerates training and potentially enables the use of other encoders that lack a text encoder. Second, we propose a novel approach that does not require any prompt tuning. Instead, based on CLIP embeddings of backlit and well-lit images from training data, we compute the residual vector in the embedding space as a simple difference between the mean embeddings of the well-lit and backlit images. This vector then guides the enhancement network during training, pushing a backlit image towards the space of well-lit images. This approach further reduces training time dramatically, stabilizes training, and produces high-quality enhanced images without artifacts, in both supervised and unsupervised training regimes.
Additionally, we show that residual vectors can be interpreted, revealing biases in training data and thereby enabling potential bias correction.

Submitted 20 July, 2024; v1 submitted 2 April, 2024; originally announced April 2024.
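The residual vector is stated to be the difference between the mean embeddings of well-lit and backlit images. The sketch below computes it from precomputed embedding arrays and pairs it with a guidance loss; the loss (negative mean cosine similarity between enhanced-image embeddings and the residual direction) is our own assumption for illustration, as are the function names, and no CLIP encoder call is shown.

```python
import numpy as np


def residual_vector(emb_well_lit, emb_backlit):
    """Residual = mean well-lit embedding minus mean backlit embedding.

    Both inputs: arrays of shape (n_images, d), e.g. precomputed image embeddings.
    """
    return emb_well_lit.mean(axis=0) - emb_backlit.mean(axis=0)


def residual_guidance_loss(emb_enhanced, r):
    """Assumed guidance loss: negative mean cosine similarity between the
    embeddings of enhanced images (n, d) and the residual direction r (d,).
    Lower is better; -1 means every embedding points along r.
    """
    e = emb_enhanced / np.linalg.norm(emb_enhanced, axis=1, keepdims=True)
    r_unit = r / np.linalg.norm(r)
    return float(-(e @ r_unit).mean())
```

Because the residual is a fixed vector computed once from training data, no prompt optimisation is needed before training the enhancement network, which is consistent with the reduced training time the abstract reports.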

arXiv:2402.09156 [pdf, other]

Crop and Couple: cardiac image segmentation using interlinked specialist networks

Authors: Abbas Khan, Muhammad Asad, Martin Benning, Caroline Roney, Gregory Slabaugh

Abstract: Diagnosis of cardiovascular disease using automated methods often relies on the critical task of cardiac image segmentation. We propose a novel strategy that performs segmentation using specialist networks that focus on a single anatomy (left ventricle, right ventricle, or myocardium). Given an input long-axis cardiac MR image, our method performs a ternary segmentation in the first stage to identify these anatomical regions, followed by cropping the original image to focus subsequent processing on the anatomical regions. The specialist networks are coupled through an attention mechanism that performs cross-attention to interlink features from different anatomies, serving as a soft relative shape prior. Central to our approach is an additive attention block (E-2A block), which is used throughout our architecture thanks to its efficiency.

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2303.01965 [pdf, other]

A Lifted Bregman Formulation for the Inversion of Deep Neural Networks

Authors: Xiaoyu Wang, Martin Benning

Abstract: We propose a novel framework for the regularised inversion of deep neural networks. The framework is based on the authors' recent work on training feed-forward neural networks without the differentiation of activation functions. The framework lifts the parameter space into a higher dimensional space by introducing auxiliary variables, and penalises these variables with tailored Bregman distances. We propose a family of variational regularisations based on these Bregman distances, present theoretical results and support their practical application with numerical examples. In particular, we present the first convergence result (to the best of our knowledge) for the regularised inversion of a single-layer perceptron that only assumes that the solution of the inverse problem is in the range of the regularisation operator, and that shows that the regularised inverse provably converges to the true inverse if measurement errors converge to zero.

Submitted 1 March, 2023; originally announced March 2023.

Comments: 21 pages, 9 figures

MSC Class: 47A52; 47J30; 65J20; 65J22; 65K10; 68T07; 94A08

arXiv:2212.07786 [pdf, other]

Convergent Data-driven Regularizations for CT Reconstruction

Authors: Samira Kabri, Alexander Auras, Danilo Riccio, Hartmut Bauermeister, Martin Benning, Michael Moeller, Martin Burger

Abstract: The reconstruction of images from their corresponding noisy Radon transform is a typical example of an ill-posed linear inverse problem, as it arises in computerized tomography (CT). As the (naive) solution does not depend continuously on the measured data, regularization is needed to re-establish a continuous dependence. In this work, we investigate simple yet provably convergent approaches to learning linear regularization methods from data. More specifically, we analyze two approaches: a generic linear regularization that learns how to manipulate the singular values of the linear operator, extending our previous work, and a tailored approach in the Fourier domain that is specific to CT reconstruction. We prove that such approaches are convergent regularization methods and that the reconstructions they provide are typically much smoother than the training data they were trained on. Finally, we compare the spectral and the Fourier-based approaches for CT reconstruction numerically, discuss their advantages and disadvantages, and investigate the effect of discretization errors at different resolutions.

Submitted 15 December, 2023; v1 submitted 14 December, 2022; originally announced December 2022.
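A linear regularization that "manipulates the singular values" can be written as x ≈ Σ_i g_i ⟨y, u_i⟩ v_i, where (u_i, s_i, v_i) is the SVD of the forward operator and g_i are filter coefficients. The sketch below fits each g_i by least squares from training pairs; this per-component fit is an illustrative data-driven filter of our own choosing, not necessarily the paper's formulation, and the small random matrix stands in for a discretised Radon transform.

```python
import numpy as np


def learn_spectral_filter(A, X, Y, eps=1e-12):
    """Fit filter coefficients g_i so that x ~ sum_i g_i <y, u_i> v_i.

    A: (m, n) forward operator; X: (n, N) ground-truth images as columns;
    Y: (m, N) corresponding measurements. Each g_i minimises
    sum_k (g_i <y_k, u_i> - <x_k, v_i>)^2 independently.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Yc = U.T @ Y                      # (r, N) data coefficients <y_k, u_i>
    Xc = Vt @ X                       # (r, N) image coefficients <x_k, v_i>
    g = (Yc * Xc).sum(axis=1) / ((Yc ** 2).sum(axis=1) + eps)
    return U, s, Vt, g


def reconstruct(U, Vt, g, y):
    """Apply the learned spectral filter to a measurement y."""
    return Vt.T @ (g * (U.T @ y))
```

With noise-free training data the fit recovers the plain inverse filter g_i = 1/s_i. With additive noise independent of the images, the least-squares coefficients shrink roughly towards s_i E[x_i²] / (s_i² E[x_i²] + E[n_i²]) for large training sets, damping the components with small singular values, which is the regularising effect.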

arXiv:2208.08772 [pdf, other]

Lifted Bregman Training of Neural Networks

Authors: Xiaoyu Wang, Martin Benning

Abstract: We introduce a novel mathematical formulation for the training of feed-forward neural networks with (potentially non-smooth) proximal maps as activation functions. This formulation is based on Bregman distances, and a key advantage is that its partial derivatives with respect to the network's parameters do not require the computation of derivatives of the network's activation functions. Instead of estimating the parameters with a combination of a first-order optimisation method and back-propagation (as is the state of the art), we propose the use of non-smooth first-order optimisation methods that exploit the specific structure of the novel formulation. We present several numerical results demonstrating that these training approaches can be equally well or even better suited for training neural-network-based classifiers and (denoising) autoencoders with sparse coding than more conventional training frameworks.

Submitted 18 August, 2022; originally announced August 2022.

Comments: 48 pages, 16 figures

MSC Class: 47A52; 65K10; 68T01; 68W15
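Why no activation derivatives appear can be checked with a short single-layer derivation. This is a sketch under our own assumptions: the activation σ is the proximal map of a proper, convex, lower semicontinuous function ψ, and the energy takes the Fenchel–Young form below. The symbols Φ and E are our notation, and the paper's lifted, multi-layer formulation may differ.

```latex
\begin{aligned}
\Phi(y) &= \tfrac12\|y\|^2 + \psi(y), \qquad
  \sigma = (I + \partial\psi)^{-1} = (\partial\Phi)^{-1} = \nabla\Phi^*, \\
E(y, z) &= \Phi(y) + \Phi^*(z) - \langle y, z\rangle \;\ge\; 0, \qquad
  E(y, z) = 0 \iff y = \sigma(z), \\
\nabla_z E(y, z) &= \nabla\Phi^*(z) - y = \sigma(z) - y, \\
z = Wx + b \;\Rightarrow\; \nabla_W E &= \bigl(\sigma(Wx + b) - y\bigr)\, x^\top, \qquad
  \nabla_b E = \sigma(Wx + b) - y .
\end{aligned}
```

Since Φ is 1-strongly convex, Φ* is differentiable with a 1-Lipschitz gradient, so these gradients exist even when σ itself is not differentiable; ReLU, for example, is the proximal map of the indicator function of the non-negative orthant.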

arXiv:2109.02096 [pdf, other]

Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks

Authors: Russell Sammut Bonnici, Charalampos Saitis, Martin Benning

Abstract: This research project investigates the application of deep learning to timbre transfer, where the timbre of source audio is converted to the timbre of target audio with minimal loss in quality. The adopted approach combines Variational Autoencoders with Generative Adversarial Networks to construct meaningful representations of the source audio and produce realistic generations of the target audio, and is applied to the Flickr 8k Audio dataset for transferring the vocal timbre between speakers and to the URMP dataset for transferring the musical timbre between instruments. Furthermore, variations of the adopted approach are trained, and their generalised performance is compared using the metrics SSIM (Structural Similarity Index) and FAD (Fréchet Audio Distance). It was found that a many-to-many approach outperforms a one-to-one approach in terms of reconstructive capabilities, and that a basic residual block design is more suitable than a bottleneck design for enriching content information about a latent space. It was also found that whether the cyclic loss follows a variational autoencoder or a vanilla autoencoder approach has no significant impact on the reconstructive and adversarial translation aspects of the model.

Submitted 10 October, 2021; v1 submitted 5 September, 2021; originally announced September 2021.

Comments: 12 pages, 3 main figures, 4 tables

arXiv:2012.03642 [pdf, other]

Generalised Perceptron Learning

Authors: Xiaoyu Wang, Martin Benning

Abstract: We present a generalisation of Rosenblatt's traditional perceptron learning algorithm to the class of proximal activation functions and demonstrate how this generalisation can be interpreted as an incremental gradient method applied to a novel energy function. This novel energy function is based on a generalised Bregman distance, for which the gradient with respect to the weights and biases does not require the differentiation of the activation function. The interpretation as an energy minimisation algorithm paves the way for many new algorithms, of which we explore a novel variant of the iterative soft-thresholding algorithm for the learning of sparse perceptrons.

Submitted 7 December, 2020; originally announced December 2020.

Comments: 8 pages, 2 figures, accepted at the 12th Annual Workshop on Optimization for Machine Learning

MSC Class: 68T07; 65K05; 49M37; 90C30
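The incremental update described in the abstract can be sketched for a one-layer perceptron whose activation is a proximal map. We assume the per-sample gradient has the Rosenblatt-like form (σ(Wx + b) − y) xᵀ, which involves no derivative of σ, and we add an L1 proximal step (soft-thresholding) on W to obtain sparse weights. This is our own minimal variant for illustration; the paper's soft-thresholding algorithm, step-size rules, and function names may differ.

```python
import numpy as np


def soft_threshold(w, tau):
    """Proximal map of tau * ||w||_1 (elementwise soft-thresholding)."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)


def relu(z):
    """ReLU: the proximal map of the indicator of the non-negative orthant."""
    return np.maximum(z, 0.0)


def train_sparse_perceptron(X, Y, act=relu, lr=0.05, lam=1e-3, epochs=100, seed=0):
    """Incremental proximal-gradient training of a one-layer perceptron.

    X: (N, n) inputs; Y: (N, m) targets. Each step uses the residual
    act(W x + b) - y (no derivative of `act` is needed), then applies
    soft-thresholding to W to promote sparsity. The bias is not penalised.
    """
    rng = np.random.default_rng(seed)
    N, n = X.shape
    m = Y.shape[1]
    W = np.zeros((m, n))
    b = np.zeros(m)
    for _ in range(epochs):
        for k in rng.permutation(N):            # one pass in random order
            x, y = X[k], Y[k]
            r = act(W @ x + b) - y              # residual, no act'(z) term
            W = soft_threshold(W - lr * np.outer(r, x), lr * lam)
            b = b - lr * r
    return W, b
```

Unlike a squared loss differentiated through ReLU, the residual stays nonzero for samples with y > 0 even when the pre-activation is negative, so units cannot get stuck in the flat region.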

arXiv:1906.08754 [pdf, other]

Learning the Sampling Pattern for MRI

Authors: Ferdia Sherry, Martin Benning, Juan Carlos De los Reyes, Martin J. Graves, Georg Maierhofer, Guy Williams, Carola-Bibiane Schönlieb, Matthias J. Ehrhardt

Abstract: The discovery of the theory of compressed sensing brought the realisation that many inverse problems can be solved even when measurements are "incomplete". This is particularly interesting in magnetic resonance imaging (MRI), where long acquisition times can limit its use. In this work, we consider the problem of learning a sparse sampling pattern that can be used to optimally balance acquisition time versus quality of the reconstructed image. We use a supervised learning approach, making the assumption that our training data is representative enough of new data acquisitions. We demonstrate that this is indeed the case, even if the training data consists of just 7 training pairs of measurements and ground-truth images; with a training set of brain images of size 192 by 192, for instance, one of the learned patterns samples only 35% of k-space yet results in reconstructions with mean SSIM 0.914 on a test set of similar images. The proposed framework is general enough to learn arbitrary sampling patterns, including common patterns such as Cartesian, spiral and radial sampling.

Submitted 21 June, 2020; v1 submitted 20 June, 2019; originally announced June 2019.

Comments: The main document is 12 pages, the supporting document is 2 pages and attached at the end of the main document
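What a sampling pattern does can be illustrated by applying a binary k-space mask to an image's 2D Fourier transform. The sketch below shows only masking and a zero-filled inverse FFT; the random mask is a stand-in, not a learned pattern, and the paper's variational reconstruction and pattern-learning procedure are not reproduced. Function names are hypothetical.

```python
import numpy as np


def undersample_and_reconstruct(image, mask):
    """Apply a binary k-space mask and return the zero-filled reconstruction.

    image: real 2D array; mask: boolean array of the same shape (True = sampled).
    The mask is in unshifted FFT order (DC at index [0, 0]); apply
    np.fft.ifftshift first if the pattern was designed with DC at the centre.
    """
    kspace = np.fft.fft2(image, norm="ortho")        # unitary 2D DFT
    return np.fft.ifft2(kspace * mask, norm="ortho")


def sampling_fraction(mask):
    """Fraction of k-space locations that are sampled."""
    return float(mask.mean())
```

For a 192 by 192 image, a mask with `sampling_fraction(mask) == 0.35` keeps about 35% of the 36,864 k-space samples; the learning problem described above is to choose which ones so that reconstruction quality stays high.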

arXiv:1904.05657 [pdf, other]

Deep learning as optimal control problems: models and numerical methods

Authors: Martin Benning, Elena Celledoni, Matthias J. Ehrhardt, Brynjulf Owren, Carola-Bibiane Schönlieb

Abstract: We consider recent work of Haber and Ruthotto 2017 and Chang et al. 2018, where deep neural networks have been interpreted as discretisations of an optimal control problem subject to an ordinary differential equation constraint. We review the first-order conditions for optimality, and the conditions ensuring optimality after discretisation. This leads to a class of algorithms for solving the discrete optimal control problem which guarantee that the corresponding discrete necessary conditions for optimality are fulfilled. The differential equation setting lends itself to learning additional parameters such as the time discretisation. We explore this extension alongside natural constraints (e.g. time steps lie in a simplex). We compare these deep learning algorithms numerically in terms of induced flow and generalisation ability.

Submitted 30 September, 2019; v1 submitted 11 April, 2019; originally announced April 2019.
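The ODE view of deep networks can be illustrated with a residual network written as a forward Euler discretisation, where each layer adds h_k σ(K_k x + b_k). The step sizes h_k are the time discretisation, and one way to keep them in a simplex (non-negative, summing to a final time T) is Euclidean projection. This is a minimal sketch: tanh as σ, the projection approach, and the function names are our assumptions, and the paper's optimality-preserving training algorithms are not reproduced.

```python
import numpy as np


def resnet_forward(x, Ks, bs, hs, act=np.tanh):
    """Forward Euler discretisation of x'(t) = act(K(t) x(t) + b(t)).

    Layer k performs x <- x + h_k * act(K_k x + b_k).
    """
    for K, b, h in zip(Ks, bs, hs):
        x = x + h * act(K @ x + b)
    return x


def project_to_simplex(h, T=1.0):
    """Euclidean projection of step sizes onto {h >= 0, sum(h) = T}.

    Standard sort-based algorithm: find the threshold theta such that
    max(h - theta, 0) sums to T.
    """
    u = np.sort(h)[::-1]                       # sort in decreasing order
    css = np.cumsum(u) - T
    idx = np.arange(1, len(h) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]  # last index still positive
    theta = css[rho] / (rho + 1)
    return np.maximum(h - theta, 0.0)
```

After each gradient update of the step sizes, calling `project_to_simplex(hs, T)` restores the constraint, so the network always integrates the ODE up to the same final time T.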

arXiv:1703.08001 [pdf, other]

Nonlinear Spectral Image Fusion

Authors: Martin Benning, Michael Möller, Raz Z. Nossek, Martin Burger, Daniel Cremers, Guy Gilboa, Carola-Bibiane Schönlieb

Abstract: In this paper, we demonstrate that the framework of nonlinear spectral decompositions based on total variation (TV) regularization is very well suited for image fusion as well as more general image manipulation tasks. The well-localized and edge-preserving spectral TV decomposition allows one to select frequencies of a certain image to transfer particular features, such as wrinkles in a face, from one image to another. We illustrate the effectiveness of the proposed approach in several numerical experiments, including a comparison to the competing techniques of Poisson image editing, linear osmosis, wavelet fusion and Laplacian pyramid fusion. We conclude that the proposed spectral TV image decomposition framework is a valuable tool for semi- and fully-automatic image editing and fusion.

Submitted 23 March, 2017; originally announced March 2017.

Comments: 13 pages, 9 figures, submitted to SSVM conference proceedings 2017

MSC Class: 35P30; 62H35; 65M70; 94A08 ACM Class: G.1.3; G.1.6; G.1.8; I.4.0; I.4.5

arXiv:1408.0173 [pdf, other]

doi 10.1109/TIP.2015.2479469

Variational Depth from Focus Reconstruction

Authors: Michael Moeller, Martin Benning, Carola Schönlieb, Daniel Cremers

Abstract: This paper deals with the problem of reconstructing a depth map from a sequence of differently focused images, also known as depth from focus or shape from focus. We propose to state the depth from focus problem as a variational problem including a smooth but nonconvex data fidelity term and a convex nonsmooth regularization, which makes the method robust to noise and leads to more realistic depth maps. Additionally, we propose to solve the nonconvex minimization problem with a linearized alternating directions method of multipliers (ADMM), which minimizes the energy very efficiently. A numerical comparison to classical methods on simulated as well as on real data is presented.

Submitted 5 November, 2014; v1 submitted 1 August, 2014; originally announced August 2014.

Showing 1–18 of 18 results for author: Benning, M