-
Unpaired Translation from Semantic Label Maps to Images by Leveraging Domain-Specific Simulations
Authors:
Lin Zhang,
Tiziano Portenier,
Orcun Goksel
Abstract:
Photorealistic image generation from simulated label maps are necessitated in several contexts, such as for medical training in virtual reality. With conventional deep learning methods, this task requires images that are paired with semantic annotations, which typically are unavailable. We introduce a contrastive learning framework for generating photorealistic images from simulated label maps, by…
▽ More
Photorealistic image generation from simulated label maps are necessitated in several contexts, such as for medical training in virtual reality. With conventional deep learning methods, this task requires images that are paired with semantic annotations, which typically are unavailable. We introduce a contrastive learning framework for generating photorealistic images from simulated label maps, by learning from unpaired sets of both. Due to potentially large scene differences between real images and label maps, existing unpaired image translation methods lead to artifacts of scene modification in synthesized images. We utilize simulated images as surrogate targets for a contrastive loss, while ensuring consistency by utilizing features from a reverse translation network. Our method enables bidirectional label-image translations, which is demonstrated in a variety of scenarios and datasets, including laparoscopy, ultrasound, and driving scenes. By comparing with state-of-the-art unpaired translation methods, our proposed method is shown to generate realistic and scene-accurate translations.
△ Less
Submitted 21 February, 2023;
originally announced February 2023.
-
Generative Feature-driven Image Replay for Continual Learning
Authors:
Kevin Thandiackal,
Tiziano Portenier,
Andrea Giovannini,
Maria Gabrani,
Orcun Goksel
Abstract:
Neural networks are prone to catastrophic forgetting when trained incrementally on different tasks. Popular incremental learning methods mitigate such forgetting by retaining a subset of previously seen samples and replaying them during the training on subsequent tasks. However, this is not always possible, e.g., due to data protection regulations. In such restricted scenarios, one can employ gene…
▽ More
Neural networks are prone to catastrophic forgetting when trained incrementally on different tasks. Popular incremental learning methods mitigate such forgetting by retaining a subset of previously seen samples and replaying them during the training on subsequent tasks. However, this is not always possible, e.g., due to data protection regulations. In such restricted scenarios, one can employ generative models to replay either artificial images or hidden features to a classifier. In this work, we propose Genifer (GENeratIve FEature-driven image Replay), where a generative model is trained to replay images that must induce the same hidden features as real samples when they are passed through the classifier. Our technique therefore incorporates the benefits of both image and feature replay, i.e.: (1) unlike conventional image replay, our generative model explicitly learns the distribution of features that are relevant for classification; (2) in contrast to feature replay, our entire classifier remains trainable; and (3) we can leverage image-space augmentations, which increase distillation performance while also mitigating overfitting during the training of the generative model. We show that Genifer substantially outperforms the previous state of the art for various settings on the CIFAR-100 and CUB-200 datasets.
△ Less
Submitted 6 December, 2022; v1 submitted 9 June, 2021;
originally announced June 2021.
-
Content-Preserving Unpaired Translation from Simulated to Realistic Ultrasound Images
Authors:
Devavrat Tomar,
Lin Zhang,
Tiziano Portenier,
Orcun Goksel
Abstract:
Interactive simulation of ultrasound imaging greatly facilitates sonography training. Although ray-tracing based methods have shown promising results, obtaining realistic images requires substantial modeling effort and manual parameter tuning. In addition, current techniques still result in a significant appearance gap between simulated images and real clinical scans. Herein we introduce a novel c…
▽ More
Interactive simulation of ultrasound imaging greatly facilitates sonography training. Although ray-tracing based methods have shown promising results, obtaining realistic images requires substantial modeling effort and manual parameter tuning. In addition, current techniques still result in a significant appearance gap between simulated images and real clinical scans. Herein we introduce a novel content-preserving image translation framework (ConPres) to bridge this appearance gap, while maintaining the simulated anatomical layout. We achieve this goal by leveraging both simulated images with semantic segmentations and unpaired in-vivo ultrasound scans. Our framework is based on recent contrastive unpaired translation techniques and we propose a regularization approach by learning an auxiliary segmentation-to-real image translation task, which encourages the disentanglement of content and style. In addition, we extend the generator to be class-conditional, which enables the incorporation of additional losses, in particular a cyclic consistency loss, to further improve the translation quality. Qualitative and quantitative comparisons against state-of-the-art unpaired translation methods demonstrate the superiority of our proposed framework.
△ Less
Submitted 30 September, 2021; v1 submitted 9 March, 2021;
originally announced March 2021.
-
Probabilistic Spatial Analysis in Quantitative Microscopy with Uncertainty-Aware Cell Detection using Deep Bayesian Regression of Density Maps
Authors:
Alvaro Gomariz,
Tiziano Portenier,
César Nombela-Arrieta,
Orcun Goksel
Abstract:
3D microscopy is key in the investigation of diverse biological systems, and the ever increasing availability of large datasets demands automatic cell identification methods that not only are accurate, but also can imply the uncertainty in their predictions to inform about potential errors and hence confidence in conclusions using them. While conventional deep learning methods often yield determin…
▽ More
3D microscopy is key in the investigation of diverse biological systems, and the ever increasing availability of large datasets demands automatic cell identification methods that not only are accurate, but also can imply the uncertainty in their predictions to inform about potential errors and hence confidence in conclusions using them. While conventional deep learning methods often yield deterministic results, advances in deep Bayesian learning allow for accurate predictions with a probabilistic interpretation in numerous image classification and segmentation tasks. It is however nontrivial to extend such Bayesian methods to cell detection, which requires specialized learning frameworks. In particular, regression of density maps is a popular successful approach for extracting cell coordinates from local peaks in a postprocessing step, which hinders any meaningful probabilistic output. We herein propose a deep learning-based cell detection framework that can operate on large microscopy images and outputs desired probabilistic predictions by (i) integrating Bayesian techniques for the regression of uncertainty-aware density maps, where peak detection can be applied to generate cell proposals, and (ii) learning a mapping from the numerous proposals to a probabilistic space that is calibrated, i.e. accurately represents the chances of a successful prediction. Utilizing such calibrated predictions, we propose a probabilistic spatial analysis with Monte-Carlo sampling. We demonstrate this in revising an existing description of the distribution of a mesenchymal stromal cell type within the bone marrow, where our proposed methods allow us to reveal spatial patterns that are otherwise undetectable. Introducing such probabilistic analysis in quantitative microscopy pipelines will allow for reporting confidence intervals for testing biological hypotheses of spatial distributions.
△ Less
Submitted 23 February, 2021;
originally announced February 2021.
-
Utilizing Uncertainty Estimation in Deep Learning Segmentation of Fluorescence Microscopy Images with Missing Markers
Authors:
Alvaro Gomariz,
Raphael Egli,
Tiziano Portenier,
César Nombela-Arrieta,
Orcun Goksel
Abstract:
Fluorescence microscopy images contain several channels, each indicating a marker staining the sample. Since many different marker combinations are utilized in practice, it has been challenging to apply deep learning based segmentation models, which expect a predefined channel combination for all training samples as well as at inference for future application. Recent work circumvents this problem…
▽ More
Fluorescence microscopy images contain several channels, each indicating a marker staining the sample. Since many different marker combinations are utilized in practice, it has been challenging to apply deep learning based segmentation models, which expect a predefined channel combination for all training samples as well as at inference for future application. Recent work circumvents this problem using a modality attention approach to be effective across any possible marker combination. However, for combinations that do not exist in a labeled training dataset, one cannot have any estimation of potential segmentation quality if that combination is encountered during inference. Without this, not only one lacks quality assurance but one also does not know where to put any additional imaging and labeling effort. We herein propose a method to estimate segmentation quality on unlabeled images by (i) estimating both aleatoric and epistemic uncertainties of convolutional neural networks for image segmentation, and (ii) training a Random Forest model for the interpretation of uncertainty features via regression to their corresponding segmentation metrics. Additionally, we demonstrate that including these uncertainty measures during training can provide an improvement on segmentation performance.
△ Less
Submitted 27 January, 2021;
originally announced January 2021.
-
Learning Ultrasound Rendering from Cross-Sectional Model Slices for Simulated Training
Authors:
Lin Zhang,
Tiziano Portenier,
Orcun Goksel
Abstract:
Purpose. Given the high level of expertise required for navigation and interpretation of ultrasound images, computational simulations can facilitate the training of such skills in virtual reality. With ray-tracing based simulations, realistic ultrasound images can be generated. However, due to computational constraints for interactivity, image quality typically needs to be compromised.
Methods.…
▽ More
Purpose. Given the high level of expertise required for navigation and interpretation of ultrasound images, computational simulations can facilitate the training of such skills in virtual reality. With ray-tracing based simulations, realistic ultrasound images can be generated. However, due to computational constraints for interactivity, image quality typically needs to be compromised.
Methods. We propose herein to bypass any rendering and simulation process at interactive time, by conducting such simulations during a non-time-critical offline stage and then learning image translation from cross-sectional model slices to such simulated frames. We use a generative adversarial framework with a dedicated generator architecture and input feeding scheme, which both substantially improve image quality without increase in network parameters. Integral attenuation maps derived from cross-sectional model slices, texture-friendly strided convolutions, providing stochastic noise and input maps to intermediate layers in order to preserve locality are all shown herein to greatly facilitate such translation task.
Results. Given several quality metrics, the proposed method with only tissue maps as input is shown to provide comparable or superior results to a state-of-the-art that uses additional images of low-quality ultrasound renderings. An extensive ablation study shows the need and benefits from the individual contributions utilized in this work, based on qualitative examples and quantitative ultrasound similarity metrics. To that end, a local histogram statistics based error metric is proposed and demonstrated for visualization of local dissimilarities between ultrasound images.
△ Less
Submitted 20 January, 2021;
originally announced January 2021.
-
Modality Attention and Sampling Enables Deep Learning with Heterogeneous Marker Combinations in Fluorescence Microscopy
Authors:
Alvaro Gomariz,
Tiziano Portenier,
Patrick M. Helbling,
Stephan Isringhausen,
Ute Suessbier,
César Nombela-Arrieta,
Orcun Goksel
Abstract:
Fluorescence microscopy allows for a detailed inspection of cells, cellular networks, and anatomical landmarks by staining with a variety of carefully-selected markers visualized as color channels. Quantitative characterization of structures in acquired images often relies on automatic image analysis methods. Despite the success of deep learning methods in other vision applications, their potentia…
▽ More
Fluorescence microscopy allows for a detailed inspection of cells, cellular networks, and anatomical landmarks by staining with a variety of carefully-selected markers visualized as color channels. Quantitative characterization of structures in acquired images often relies on automatic image analysis methods. Despite the success of deep learning methods in other vision applications, their potential for fluorescence image analysis remains underexploited. One reason lies in the considerable workload required to train accurate models, which are normally specific for a given combination of markers, and therefore applicable to a very restricted number of experimental settings. We herein propose Marker Sampling and Excite, a neural network approach with a modality sampling strategy and a novel attention module that together enable (i) flexible training with heterogeneous datasets with combinations of markers and (ii) successful utility of learned models on arbitrary subsets of markers prospectively. We show that our single neural network solution performs comparably to an upper bound scenario where an ensemble of many networks is naïvely trained for each possible marker combination separately. In addition, we demonstrate the feasibility of this framework in high-throughput biological analysis by revising a recent quantitative characterization of bone marrow vasculature in 3D confocal microscopy datasets and further confirm the validity of our approach on an additional, significantly different dataset of microvessels in fetal liver tissues. Not only can our work substantially ameliorate the use of deep learning in fluorescence microscopy analysis, but it can also be utilized in other fields with incomplete data acquisitions and missing modalities.
△ Less
Submitted 22 June, 2021; v1 submitted 27 August, 2020;
originally announced August 2020.
-
GramGAN: Deep 3D Texture Synthesis From 2D Exemplars
Authors:
Tiziano Portenier,
Siavash Bigdeli,
Orcun Goksel
Abstract:
We present a novel texture synthesis framework, enabling the generation of infinite, high-quality 3D textures given a 2D exemplar image. Inspired by recent advances in natural texture synthesis, we train deep neural models to generate textures by non-linearly combining learned noise frequencies. To achieve a highly realistic output conditioned on an exemplar patch, we propose a novel loss function…
▽ More
We present a novel texture synthesis framework, enabling the generation of infinite, high-quality 3D textures given a 2D exemplar image. Inspired by recent advances in natural texture synthesis, we train deep neural models to generate textures by non-linearly combining learned noise frequencies. To achieve a highly realistic output conditioned on an exemplar patch, we propose a novel loss function that combines ideas from both style transfer and generative adversarial networks. In particular, we train the synthesis network to match the Gram matrices of deep features from a discriminator network. In addition, we propose two architectural concepts and an extrapolation strategy that significantly improve generalization performance. In particular, we inject both model input and condition into hidden network layers by learning to scale and bias hidden activations. Quantitative and qualitative evaluations on a diverse set of exemplars motivate our design decisions and show that our system performs superior to previous state of the art. Finally, we conduct a user study that confirms the benefits of our framework.
△ Less
Submitted 30 June, 2020; v1 submitted 29 June, 2020;
originally announced June 2020.
-
Deep Image Translation for Enhancing Simulated Ultrasound Images
Authors:
Lin Zhang,
Tiziano Portenier,
Christoph Paulus,
Orcun Goksel
Abstract:
Ultrasound simulation based on ray tracing enables the synthesis of highly realistic images. It can provide an interactive environment for training sonographers as an educational tool. However, due to high computational demand, there is a trade-off between image quality and interactivity, potentially leading to sub-optimal results at interactive rates. In this work we introduce a deep learning app…
▽ More
Ultrasound simulation based on ray tracing enables the synthesis of highly realistic images. It can provide an interactive environment for training sonographers as an educational tool. However, due to high computational demand, there is a trade-off between image quality and interactivity, potentially leading to sub-optimal results at interactive rates. In this work we introduce a deep learning approach based on adversarial training that mitigates this trade-off by improving the quality of simulated images with constant computation time. An image-to-image translation framework is utilized to translate low quality images into high quality versions. To incorporate anatomical information potentially lost in low quality images, we additionally provide segmentation maps to image translation. Furthermore, we propose to leverage information from acoustic attenuation maps to better preserve acoustic shadows and directional artifacts, an invaluable feature for ultrasound image interpretation. The proposed method yields an improvement of 7.2% in Fréchet Inception Distance and 8.9% in patch-based Kullback-Leibler divergence.
△ Less
Submitted 18 June, 2020;
originally announced June 2020.
-
Learning Generative Models using Denoising Density Estimators
Authors:
Siavash A. Bigdeli,
Geng Lin,
Tiziano Portenier,
L. Andrea Dunbar,
Matthias Zwicker
Abstract:
Learning probabilistic models that can estimate the density of a given set of samples, and generate samples from that density, is one of the fundamental challenges in unsupervised machine learning. We introduce a new generative model based on denoising density estimators (DDEs), which are scalar functions parameterized by neural networks, that are efficiently trained to represent kernel density es…
▽ More
Learning probabilistic models that can estimate the density of a given set of samples, and generate samples from that density, is one of the fundamental challenges in unsupervised machine learning. We introduce a new generative model based on denoising density estimators (DDEs), which are scalar functions parameterized by neural networks, that are efficiently trained to represent kernel density estimators of the data. Leveraging DDEs, our main contribution is a novel technique to obtain generative models by minimizing the KL-divergence directly. We prove that our algorithm for obtaining generative models is guaranteed to converge to the correct solution. Our approach does not require specific network architecture as in normalizing flows, nor use ordinary differential equation solvers as in continuous normalizing flows. Experimental results demonstrate substantial improvement in density estimation and competitive performance in generative model training.
△ Less
Submitted 9 June, 2020; v1 submitted 8 January, 2020;
originally announced January 2020.
-
Smart, Deep Copy-Paste
Authors:
Tiziano Portenier,
Qiyang Hu,
Paolo Favaro,
Matthias Zwicker
Abstract:
In this work, we propose a novel system for smart copy-paste, enabling the synthesis of high-quality results given a masked source image content and a target image context as input. Our system naturally resolves both shading and geometric inconsistencies between source and target image, resulting in a merged result image that features the content from the pasted source image, seamlessly pasted int…
▽ More
In this work, we propose a novel system for smart copy-paste, enabling the synthesis of high-quality results given a masked source image content and a target image context as input. Our system naturally resolves both shading and geometric inconsistencies between source and target image, resulting in a merged result image that features the content from the pasted source image, seamlessly pasted into the target context. Our framework is based on a novel training image transformation procedure that allows to train a deep convolutional neural network end-to-end to automatically learn a representation that is suitable for copy-pasting. Our training procedure works with any image dataset without additional information such as labels, and we demonstrate the effectiveness of our system on two popular datasets, high-resolution face images and the more complex Cityscapes dataset. Our technique outperforms the current state of the art on face images, and we show promising results on the Cityscapes dataset, demonstrating that our system generalizes to much higher resolution than the training data.
△ Less
Submitted 15 March, 2019;
originally announced March 2019.
-
Learning to Take Directions One Step at a Time
Authors:
Qiyang Hu,
Adrian Wälchli,
Tiziano Portenier,
Matthias Zwicker,
Paolo Favaro
Abstract:
We present a method to generate a video sequence given a single image. Because items in an image can be animated in arbitrarily many different ways, we introduce as control signal a sequence of motion strokes. Such control signal can be automatically transferred from other videos, e.g., via bounding box tracking. Each motion stroke provides the direction to the moving object in the input image and…
▽ More
We present a method to generate a video sequence given a single image. Because items in an image can be animated in arbitrarily many different ways, we introduce as control signal a sequence of motion strokes. Such control signal can be automatically transferred from other videos, e.g., via bounding box tracking. Each motion stroke provides the direction to the moving object in the input image and we aim to train a network to generate an animation following a sequence of such directions. To address this task we design a novel recurrent architecture, which can be trained easily and effectively thanks to an explicit separation of past, future and current states. As we demonstrate in the experiments, our proposed architecture is capable of generating an arbitrary number of frames from a single image and a sequence of motion strokes. Key components of our architecture are an autoencoding constraint to ensure consistency with the past and a generative adversarial scheme to ensure that images look realistic and are temporally smooth. We demonstrate the effectiveness of our approach on the MNIST, KTH, Human3.6M, Push and Weizmann datasets.
△ Less
Submitted 14 August, 2020; v1 submitted 5 December, 2018;
originally announced December 2018.
-
Specular-to-Diffuse Translation for Multi-View Reconstruction
Authors:
Shihao Wu,
Hui Huang,
Tiziano Portenier,
Matan Sela,
Danny Cohen-Or,
Ron Kimmel,
Matthias Zwicker
Abstract:
Most multi-view 3D reconstruction algorithms, especially when shape-from-shading cues are used, assume that object appearance is predominantly diffuse. To alleviate this restriction, we introduce S2Dnet, a generative adversarial network for transferring multiple views of objects with specular reflection into diffuse ones, so that multi-view reconstruction methods can be applied more effectively. O…
▽ More
Most multi-view 3D reconstruction algorithms, especially when shape-from-shading cues are used, assume that object appearance is predominantly diffuse. To alleviate this restriction, we introduce S2Dnet, a generative adversarial network for transferring multiple views of objects with specular reflection into diffuse ones, so that multi-view reconstruction methods can be applied more effectively. Our network extends unsupervised image-to-image translation to multi-view "specular to diffuse" translation. To preserve object appearance across multiple views, we introduce a Multi-View Coherence loss (MVC) that evaluates the similarity and faithfulness of local patches after the view-transformation. Our MVC loss ensures that the similarity of local correspondences among multi-view images is preserved under the image-to-image translation. As a result, our network yields significantly better results than several single-view baseline techniques. In addition, we carefully design and generate a large synthetic training data set using physically-based rendering. During testing, our network takes only the raw glossy images as input, without extra information such as segmentation masks or lighting estimation. Results demonstrate that multi-view reconstruction can be significantly improved using the images filtered by our network. We also show promising performance on real world training and testing data.
△ Less
Submitted 30 July, 2018; v1 submitted 14 July, 2018;
originally announced July 2018.
-
FaceShop: Deep Sketch-based Face Image Editing
Authors:
Tiziano Portenier,
Qiyang Hu,
Attila Szabó,
Siavash Arjomand Bigdeli,
Paolo Favaro,
Matthias Zwicker
Abstract:
We present a novel system for sketch-based face image editing, enabling users to edit images intuitively by sketching a few strokes on a region of interest. Our interface features tools to express a desired image manipulation by providing both geometry and color constraints as user-drawn strokes. As an alternative to the direct user input, our proposed system naturally supports a copy-paste mode,…
▽ More
We present a novel system for sketch-based face image editing, enabling users to edit images intuitively by sketching a few strokes on a region of interest. Our interface features tools to express a desired image manipulation by providing both geometry and color constraints as user-drawn strokes. As an alternative to the direct user input, our proposed system naturally supports a copy-paste mode, which allows users to edit a given image region by using parts of another exemplar image without the need of hand-drawn sketching at all. The proposed interface runs in real-time and facilitates an interactive and iterative workflow to quickly express the intended edits. Our system is based on a novel sketch domain and a convolutional neural network trained end-to-end to automatically learn to render image regions corresponding to the input strokes. To achieve high quality and semantically consistent results we train our neural network on two simultaneous tasks, namely image completion and image translation. To the best of our knowledge, we are the first to combine these two tasks in a unified framework for interactive image editing. Our results show that the proposed sketch domain, network architecture, and training procedure generalize well to real user input and enable high quality synthesis results without additional post-processing.
△ Less
Submitted 7 June, 2018; v1 submitted 24 April, 2018;
originally announced April 2018.
-
Disentangling Factors of Variation by Mixing Them
Authors:
Qiyang Hu,
Attila Szabó,
Tiziano Portenier,
Matthias Zwicker,
Paolo Favaro
Abstract:
We propose an approach to learn image representations that consist of disentangled factors of variation without exploiting any manual labeling or data domain knowledge. A factor of variation corresponds to an image attribute that can be discerned consistently across a set of images, such as the pose or color of objects. Our disentangled representation consists of a concatenation of feature chunks,…
▽ More
We propose an approach to learn image representations that consist of disentangled factors of variation without exploiting any manual labeling or data domain knowledge. A factor of variation corresponds to an image attribute that can be discerned consistently across a set of images, such as the pose or color of objects. Our disentangled representation consists of a concatenation of feature chunks, each chunk representing a factor of variation. It supports applications such as transferring attributes from one image to another, by simply mixing and unmixing feature chunks, and classification or retrieval based on one or several attributes, by considering a user-specified subset of feature chunks. We learn our representation without any labeling or knowledge of the data domain, using an autoencoder architecture with two novel training objectives: first, we propose an invariance objective to encourage that encoding of each attribute, and decoding of each chunk, are invariant to changes in other attributes and chunks, respectively; second, we include a classification objective, which ensures that each chunk corresponds to a consistently discernible attribute in the represented image, hence avoiding degenerate feature mappings where some chunks are completely ignored. We demonstrate the effectiveness of our approach on the MNIST, Sprites, and CelebA datasets.
△ Less
Submitted 28 March, 2018; v1 submitted 20 November, 2017;
originally announced November 2017.
-
Challenges in Disentangling Independent Factors of Variation
Authors:
Attila Szabó,
Qiyang Hu,
Tiziano Portenier,
Matthias Zwicker,
Paolo Favaro
Abstract:
We study the problem of building models that disentangle independent factors of variation. Such models could be used to encode features that can efficiently be used for classification and to transfer attributes between different images in image synthesis. As data we use a weakly labeled training set. Our weak labels indicate what single factor has changed between two data samples, although the rel…
▽ More
We study the problem of building models that disentangle independent factors of variation. Such models could be used to encode features that can efficiently be used for classification and to transfer attributes between different images in image synthesis. As data we use a weakly labeled training set. Our weak labels indicate what single factor has changed between two data samples, although the relative value of the change is unknown. This labeling is of particular interest as it may be readily available without annotation costs. To make use of weak labels we introduce an autoencoder model and train it through constraints on image pairs and triplets. We formally prove that without additional knowledge there is no guarantee that two images with the same factor of variation will be mapped to the same feature. We call this issue the reference ambiguity. Moreover, we show the role of the feature dimensionality and adversarial training. We demonstrate experimentally that the proposed model can successfully transfer attributes on several datasets, but show also cases when the reference ambiguity occurs.
△ Less
Submitted 6 November, 2017;
originally announced November 2017.