-
Improving Wildlife Out-of-Distribution Detection: Africas Big Five
Authors:
Mufhumudzi Muthivhi,
Jiahao Huo,
Fredrik Gustafsson,
Terence L. van Zyl
Abstract:
Mitigating human-wildlife conflict seeks to resolve unwanted encounters between these parties. Computer Vision provides a solution to identifying individuals that might escalate into conflict, such as members of the Big Five African animals. However, environments often contain several varied species. The current state-of-the-art animal classification models are trained under a closed-world assumpt…
▽ More
Mitigating human-wildlife conflict seeks to resolve unwanted encounters between these parties. Computer Vision provides a solution to identifying individuals that might escalate into conflict, such as members of the Big Five African animals. However, environments often contain several varied species. The current state-of-the-art animal classification models are trained under a closed-world assumption. They almost always remain overconfident in their predictions even when presented with unknown classes. This study investigates out-of-distribution (OOD) detection of wildlife, specifically the Big Five. To this end, we select a parametric Nearest Class Mean (NCM) and a non-parametric contrastive learning approach as baselines to take advantage of pretrained and projected features from popular classification encoders. Moreover, we compare our baselines to various common OOD methods in the literature. The results show feature-based methods reflect stronger generalisation capability across varying classification thresholds. Specifically, NCM with ImageNet pre-trained features achieves a 2%, 4% and 22% improvement on AUPR-IN, AUPR-OUT and AUTC over the best OOD methods, respectively. The code can be found here https://github.com/pxpana/BIG5OOD
△ Less
Submitted 7 June, 2025;
originally announced June 2025.
-
Forward-only Diffusion Probabilistic Models
Authors:
Ziwei Luo,
Fredrik K. Gustafsson,
Jens Sjölund,
Thomas B. Schön
Abstract:
This work presents a forward-only diffusion (FoD) approach for generative modelling. In contrast to traditional diffusion models that rely on a coupled forward-backward diffusion scheme, FoD directly learns data generation through a single forward diffusion process, yielding a simple yet efficient generative framework. The core of FoD is a state-dependent linear stochastic differential equation th…
▽ More
This work presents a forward-only diffusion (FoD) approach for generative modelling. In contrast to traditional diffusion models that rely on a coupled forward-backward diffusion scheme, FoD directly learns data generation through a single forward diffusion process, yielding a simple yet efficient generative framework. The core of FoD is a state-dependent linear stochastic differential equation that involves a mean-reverting term in both the drift and diffusion functions. This mean-reversion property guarantees the convergence to clean data, naturally simulating a stochastic interpolation between source and target distributions. More importantly, FoD is analytically tractable and is trained using a simple stochastic flow matching objective, enabling a few-step non-Markov chain sampling during inference. The proposed FoD model, despite its simplicity, achieves competitive performance on various image-conditioned (e.g., image restoration) and unconditional generation tasks, demonstrating its effectiveness in generative modelling. Our code is available at https://github.com/Algolzw/FoD.
△ Less
Submitted 22 May, 2025;
originally announced May 2025.
-
Evaluating Computational Pathology Foundation Models for Prostate Cancer Grading under Distribution Shifts
Authors:
Fredrik K. Gustafsson,
Mattias Rantalainen
Abstract:
Foundation models have recently become a popular research direction within computational pathology. They are intended to be general-purpose feature extractors, promising to achieve good performance on a range of downstream tasks. Real-world pathology image data does however exhibit considerable variability. Foundation models should be robust to these variations and other distribution shifts which…
▽ More
Foundation models have recently become a popular research direction within computational pathology. They are intended to be general-purpose feature extractors, promising to achieve good performance on a range of downstream tasks. Real-world pathology image data does however exhibit considerable variability. Foundation models should be robust to these variations and other distribution shifts which might be encountered in practice. We evaluate two computational pathology foundation models: UNI (trained on more than 100,000 whole-slide images) and CONCH (trained on more than 1.1 million image-caption pairs), by utilizing them as feature extractors within prostate cancer grading models. We find that while UNI and CONCH perform well relative to baselines, the absolute performance can still be far from satisfactory in certain settings. The fact that foundation models have been trained on large and varied datasets does not guarantee that downstream models always will be robust to common distribution shifts.
△ Less
Submitted 9 October, 2024;
originally announced October 2024.
-
Evaluating Deep Regression Models for WSI-Based Gene-Expression Prediction
Authors:
Fredrik K. Gustafsson,
Mattias Rantalainen
Abstract:
Prediction of mRNA gene-expression profiles directly from routine whole-slide images (WSIs) using deep learning models could potentially offer cost-effective and widely accessible molecular phenotyping. While such WSI-based gene-expression prediction models have recently emerged within computational pathology, the high-dimensional nature of the corresponding regression problem offers numerous desi…
▽ More
Prediction of mRNA gene-expression profiles directly from routine whole-slide images (WSIs) using deep learning models could potentially offer cost-effective and widely accessible molecular phenotyping. While such WSI-based gene-expression prediction models have recently emerged within computational pathology, the high-dimensional nature of the corresponding regression problem offers numerous design choices which remain to be analyzed in detail. This study provides recommendations on how deep regression models should be trained for WSI-based gene-expression prediction. For example, we conclude that training a single model to simultaneously regress all 20530 genes is a computationally efficient yet very strong baseline.
△ Less
Submitted 1 October, 2024;
originally announced October 2024.
-
Taming Diffusion Models for Image Restoration: A Review
Authors:
Ziwei Luo,
Fredrik K. Gustafsson,
Zheng Zhao,
Jens Sjölund,
Thomas B. Schön
Abstract:
Diffusion models have achieved remarkable progress in generative modelling, particularly in enhancing image quality to conform to human preferences. Recently, these models have also been applied to low-level computer vision for photo-realistic image restoration (IR) in tasks such as image denoising, deblurring, dehazing, etc. In this review paper, we introduce key constructions in diffusion models…
▽ More
Diffusion models have achieved remarkable progress in generative modelling, particularly in enhancing image quality to conform to human preferences. Recently, these models have also been applied to low-level computer vision for photo-realistic image restoration (IR) in tasks such as image denoising, deblurring, dehazing, etc. In this review paper, we introduce key constructions in diffusion models and survey contemporary techniques that make use of diffusion models in solving general IR tasks. Furthermore, we point out the main challenges and limitations of existing diffusion-based IR frameworks and provide potential directions for future work.
△ Less
Submitted 22 October, 2024; v1 submitted 16 September, 2024;
originally announced September 2024.
-
Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models
Authors:
Ziwei Luo,
Fredrik K. Gustafsson,
Zheng Zhao,
Jens Sjölund,
Thomas B. Schön
Abstract:
Though diffusion models have been successfully applied to various image restoration (IR) tasks, their performance is sensitive to the choice of training datasets. Typically, diffusion models trained in specific datasets fail to recover images that have out-of-distribution degradations. To address this problem, this work leverages a capable vision-language model and a synthetic degradation pipeline…
▽ More
Though diffusion models have been successfully applied to various image restoration (IR) tasks, their performance is sensitive to the choice of training datasets. Typically, diffusion models trained in specific datasets fail to recover images that have out-of-distribution degradations. To address this problem, this work leverages a capable vision-language model and a synthetic degradation pipeline to learn image restoration in the wild (wild IR). More specifically, all low-quality images are simulated with a synthetic degradation pipeline that contains multiple common degradations such as blur, resize, noise, and JPEG compression. Then we introduce robust training for a degradation-aware CLIP model to extract enriched image content features to assist high-quality image restoration. Our base diffusion model is the image restoration SDE (IR-SDE). Built upon it, we further present a posterior sampling strategy for fast noise-free image generation. We evaluate our model on both synthetic and real-world degradation datasets. Moreover, experiments on the unified image restoration task illustrate that the proposed posterior sampling improves image generation quality for various degradations.
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
Extended target tracking utilizing machine-learning software -- with applications to animal classification
Authors:
Magnus Malmström,
Anton Kullberg,
Isaac Skog,
Daniel Axehill,
Fredrik Gustafsson
Abstract:
This paper considers the problem of detecting and tracking objects in a sequence of images. The problem is formulated in a filtering framework, using the output of object-detection algorithms as measurements. An extension to the filtering formulation is proposed that incorporates class information from the previous frame to robustify the classification, even if the object-detection algorithm outpu…
▽ More
This paper considers the problem of detecting and tracking objects in a sequence of images. The problem is formulated in a filtering framework, using the output of object-detection algorithms as measurements. An extension to the filtering formulation is proposed that incorporates class information from the previous frame to robustify the classification, even if the object-detection algorithm outputs an incorrect prediction. Further, the properties of the object-detection algorithm are exploited to quantify the uncertainty of the bounding box detection in each frame. The complete filtering method is evaluated on camera trap images of the four large Swedish carnivores, bear, lynx, wolf, and wolverine. The experiments show that the class tracking formulation leads to a more robust classification.
△ Less
Submitted 12 October, 2023;
originally announced October 2023.
-
Controlling Vision-Language Models for Multi-Task Image Restoration
Authors:
Ziwei Luo,
Fredrik K. Gustafsson,
Zheng Zhao,
Jens Sjölund,
Thomas B. Schön
Abstract:
Vision-language models such as CLIP have shown great impact on diverse downstream tasks for zero-shot or label-free predictions. However, when it comes to low-level vision such as image restoration their performance deteriorates dramatically due to corrupted inputs. In this paper, we present a degradation-aware vision-language model (DA-CLIP) to better transfer pretrained vision-language models to…
▽ More
Vision-language models such as CLIP have shown great impact on diverse downstream tasks for zero-shot or label-free predictions. However, when it comes to low-level vision such as image restoration their performance deteriorates dramatically due to corrupted inputs. In this paper, we present a degradation-aware vision-language model (DA-CLIP) to better transfer pretrained vision-language models to low-level vision tasks as a multi-task framework for image restoration. More specifically, DA-CLIP trains an additional controller that adapts the fixed CLIP image encoder to predict high-quality feature embeddings. By integrating the embedding into an image restoration network via cross-attention, we are able to pilot the model to learn a high-fidelity image reconstruction. The controller itself will also output a degradation feature that matches the real corruptions of the input, yielding a natural classifier for different degradation types. In addition, we construct a mixed degradation dataset with synthetic captions for DA-CLIP training. Our approach advances state-of-the-art performance on both \emph{degradation-specific} and \emph{unified} image restoration tasks, showing a promising direction of prompting image restoration with large-scale pretrained vision-language models. Our code is available at https://github.com/Algolzw/daclip-uir.
△ Less
Submitted 28 February, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
-
Refusion: Enabling Large-Size Realistic Image Restoration with Latent-Space Diffusion Models
Authors:
Ziwei Luo,
Fredrik K. Gustafsson,
Zheng Zhao,
Jens Sjölund,
Thomas B. Schön
Abstract:
This work aims to improve the applicability of diffusion models in realistic image restoration. Specifically, we enhance the diffusion model in several aspects such as network architecture, noise level, denoising steps, training image size, and optimizer/scheduler. We show that tuning these hyperparameters allows us to achieve better performance on both distortion and perceptual scores. We also pr…
▽ More
This work aims to improve the applicability of diffusion models in realistic image restoration. Specifically, we enhance the diffusion model in several aspects such as network architecture, noise level, denoising steps, training image size, and optimizer/scheduler. We show that tuning these hyperparameters allows us to achieve better performance on both distortion and perceptual scores. We also propose a U-Net based latent diffusion model which performs diffusion in a low-resolution latent space while preserving high-resolution information from the original input for the decoding process. Compared to the previous latent-diffusion model which trains a VAE-GAN to compress the image, our proposed U-Net compression strategy is significantly more stable and can recover highly accurate images without relying on adversarial optimization. Importantly, these modifications allow us to apply diffusion models to various image restoration tasks, including real-world shadow removal, HR non-homogeneous dehazing, stereo super-resolution, and bokeh effect transformation. By simply replacing the datasets and slightly changing the noise network, our model, named Refusion, is able to deal with large-size images (e.g., 6000 x 4000 x 3 in HR dehazing) and produces good results on all the above restoration problems. Our Refusion achieves the best perceptual performance in the NTIRE 2023 Image Shadow Removal Challenge and wins 2nd place overall.
△ Less
Submitted 17 April, 2023;
originally announced April 2023.
-
Uncertainty quantification in neural network classifiers -- a local linear approach
Authors:
Magnus Malmström,
Isaac Skog,
Daniel Axehill,
Fredrik Gustafsson
Abstract:
Classifiers based on neural networks (NN) often lack a measure of uncertainty in the predicted class. We propose a method to estimate the probability mass function (PMF) of the different classes, as well as the covariance of the estimated PMF. First, a local linear approach is used during the training phase to recursively compute the covariance of the parameters in the NN. Secondly, in the classif…
▽ More
Classifiers based on neural networks (NN) often lack a measure of uncertainty in the predicted class. We propose a method to estimate the probability mass function (PMF) of the different classes, as well as the covariance of the estimated PMF. First, a local linear approach is used during the training phase to recursively compute the covariance of the parameters in the NN. Secondly, in the classification phase another local linear approach is used to propagate the covariance of the learned NN parameters to the uncertainty in the output of the last layer of the NN. This allows for an efficient Monte Carlo (MC) approach for: (i) estimating the PMF; (ii) calculating the covariance of the estimated PMF; and (iii) proper risk assessment and fusion of multiple classifiers. Two classical image classification tasks, i.e., MNIST, and CFAR10, are used to demonstrate the efficiency the proposed method.
△ Less
Submitted 10 March, 2023;
originally announced March 2023.
-
How Reliable is Your Regression Model's Uncertainty Under Real-World Distribution Shifts?
Authors:
Fredrik K. Gustafsson,
Martin Danelljan,
Thomas B. Schön
Abstract:
Many important computer vision applications are naturally formulated as regression problems. Within medical imaging, accurate regression models have the potential to automate various tasks, helping to lower costs and improve patient outcomes. Such safety-critical deployment does however require reliable estimation of model uncertainty, also under the wide variety of distribution shifts that might…
▽ More
Many important computer vision applications are naturally formulated as regression problems. Within medical imaging, accurate regression models have the potential to automate various tasks, helping to lower costs and improve patient outcomes. Such safety-critical deployment does however require reliable estimation of model uncertainty, also under the wide variety of distribution shifts that might be encountered in practice. Motivated by this, we set out to investigate the reliability of regression uncertainty estimation methods under various real-world distribution shifts. To that end, we propose an extensive benchmark of 8 image-based regression datasets with different types of challenging distribution shifts. We then employ our benchmark to evaluate many of the most common uncertainty estimation methods, as well as two state-of-the-art uncertainty scores from the task of out-of-distribution detection. We find that while methods are well calibrated when there is no distribution shift, they all become highly overconfident on many of the benchmark datasets. This uncovers important limitations of current uncertainty estimation methods, and the proposed benchmark therefore serves as a challenge to the research community. We hope that our benchmark will spur more work on how to develop truly reliable regression uncertainty estimation methods. Code is available at https://github.com/fregu856/regression_uncertainty.
△ Less
Submitted 7 November, 2023; v1 submitted 7 February, 2023;
originally announced February 2023.
-
Image Restoration with Mean-Reverting Stochastic Differential Equations
Authors:
Ziwei Luo,
Fredrik K. Gustafsson,
Zheng Zhao,
Jens Sjölund,
Thomas B. Schön
Abstract:
This paper presents a stochastic differential equation (SDE) approach for general-purpose image restoration. The key construction consists in a mean-reverting SDE that transforms a high-quality image into a degraded counterpart as a mean state with fixed Gaussian noise. Then, by simulating the corresponding reverse-time SDE, we are able to restore the origin of the low-quality image without relyin…
▽ More
This paper presents a stochastic differential equation (SDE) approach for general-purpose image restoration. The key construction consists in a mean-reverting SDE that transforms a high-quality image into a degraded counterpart as a mean state with fixed Gaussian noise. Then, by simulating the corresponding reverse-time SDE, we are able to restore the origin of the low-quality image without relying on any task-specific prior knowledge. Crucially, the proposed mean-reverting SDE has a closed-form solution, allowing us to compute the ground truth time-dependent score and learn it with a neural network. Moreover, we propose a maximum likelihood objective to learn an optimal reverse trajectory that stabilizes the training and improves the restoration results. The experiments show that our proposed method achieves highly competitive performance in quantitative comparisons on image deraining, deblurring, and denoising, setting a new state-of-the-art on two deraining datasets. Finally, the general applicability of our approach is further demonstrated via qualitative results on image super-resolution, inpainting, and dehazing. Code is available at https://github.com/Algolzw/image-restoration-sde.
△ Less
Submitted 31 May, 2023; v1 submitted 27 January, 2023;
originally announced January 2023.
-
ECG-Based Electrolyte Prediction: Evaluating Regression and Probabilistic Methods
Authors:
Philipp Von Bachmann,
Daniel Gedon,
Fredrik K. Gustafsson,
Antônio H. Ribeiro,
Erik Lampa,
Stefan Gustafsson,
Johan Sundström,
Thomas B. Schön
Abstract:
Objective: Imbalances of the electrolyte concentration levels in the body can lead to catastrophic consequences, but accurate and accessible measurements could improve patient outcomes. While blood tests provide accurate measurements, they are invasive and the laboratory analysis can be slow or inaccessible. In contrast, an electrocardiogram (ECG) is a widely adopted tool which is quick and simple…
▽ More
Objective: Imbalances of the electrolyte concentration levels in the body can lead to catastrophic consequences, but accurate and accessible measurements could improve patient outcomes. While blood tests provide accurate measurements, they are invasive and the laboratory analysis can be slow or inaccessible. In contrast, an electrocardiogram (ECG) is a widely adopted tool which is quick and simple to acquire. However, the problem of estimating continuous electrolyte concentrations directly from ECGs is not well-studied. We therefore investigate if regression methods can be used for accurate ECG-based prediction of electrolyte concentrations. Methods: We explore the use of deep neural networks (DNNs) for this task. We analyze the regression performance across four electrolytes, utilizing a novel dataset containing over 290000 ECGs. For improved understanding, we also study the full spectrum from continuous predictions to binary classification of extreme concentration levels. To enhance clinical usefulness, we finally extend to a probabilistic regression approach and evaluate different uncertainty estimates. Results: We find that the performance varies significantly between different electrolytes, which is clinically justified in the interplay of electrolytes and their manifestation in the ECG. We also compare the regression accuracy with that of traditional machine learning models, demonstrating superior performance of DNNs. Conclusion: Discretization can lead to good classification performance, but does not help solve the original problem of predicting continuous concentration levels. While probabilistic regression demonstrates potential practical usefulness, the uncertainty estimates are not particularly well-calibrated. Significance: Our study is a first step towards accurate and reliable ECG-based prediction of electrolyte concentration levels.
△ Less
Submitted 21 December, 2022;
originally announced December 2022.
-
Learning Proposals for Practical Energy-Based Regression
Authors:
Fredrik K. Gustafsson,
Martin Danelljan,
Thomas B. Schön
Abstract:
Energy-based models (EBMs) have experienced a resurgence within machine learning in recent years, including as a promising alternative for probabilistic regression. However, energy-based regression requires a proposal distribution to be manually designed for training, and an initial estimate has to be provided at test-time. We address both of these issues by introducing a conceptually simple metho…
▽ More
Energy-based models (EBMs) have experienced a resurgence within machine learning in recent years, including as a promising alternative for probabilistic regression. However, energy-based regression requires a proposal distribution to be manually designed for training, and an initial estimate has to be provided at test-time. We address both of these issues by introducing a conceptually simple method to automatically learn an effective proposal distribution, which is parameterized by a separate network head. To this end, we derive a surprising result, leading to a unified training objective that jointly minimizes the KL divergence from the proposal to the EBM, and the negative log-likelihood of the EBM. At test-time, we can then employ importance sampling with the trained proposal to efficiently evaluate the learned EBM and produce stand-alone predictions. Furthermore, we utilize our derived training objective to learn mixture density networks (MDNs) with a jointly trained energy-based teacher, consistently outperforming conventional MDN training on four real-world regression tasks within computer vision. Code is available at https://github.com/fregu856/ebms_proposals.
△ Less
Submitted 7 November, 2023; v1 submitted 22 October, 2021;
originally announced October 2021.
-
Uncertainty-Aware Body Composition Analysis with Deep Regression Ensembles on UK Biobank MRI
Authors:
Taro Langner,
Fredrik K. Gustafsson,
Benny Avelin,
Robin Strand,
Håkan Ahlström,
Joel Kullberg
Abstract:
Along with rich health-related metadata, medical images have been acquired for over 40,000 male and female UK Biobank participants, aged 44-82, since 2014. Phenotypes derived from these images, such as measurements of body composition from MRI, can reveal new links between genetics, cardiovascular disease, and metabolic conditions. In this work, six measurements of body composition and adipose tis…
▽ More
Along with rich health-related metadata, medical images have been acquired for over 40,000 male and female UK Biobank participants, aged 44-82, since 2014. Phenotypes derived from these images, such as measurements of body composition from MRI, can reveal new links between genetics, cardiovascular disease, and metabolic conditions. In this work, six measurements of body composition and adipose tissues were automatically estimated by image-based, deep regression with ResNet50 neural networks from neck-to-knee body MRI. Despite the potential for high speed and accuracy, these networks produce no output segmentations that could indicate the reliability of individual measurements. The presented experiments therefore examine uncertainty quantification with mean-variance regression and ensembling to estimate individual measurement errors and thereby identify potential outliers, anomalies, and other failure cases automatically. In 10-fold cross-validation on data of about 8,500 subjects, mean-variance regression and ensembling showed complementary benefits, reducing the mean absolute error across all predictions by 12%. Both improved the calibration of uncertainties and their ability to identify high prediction errors. With intra-class correlation coefficients (ICC) above 0.97, all targets except the liver fat content yielded relative measurement errors below 5%. Testing on another 1,000 subjects showed consistent performance, and the method was finally deployed for inference to 30,000 subjects with missing reference values. The results indicate that deep regression ensembles could ultimately provide automated, uncertainty-aware measurements of body composition for more than 120,000 UK Biobank neck-to-knee body MRI that are to be acquired within the coming years.
△ Less
Submitted 15 September, 2021; v1 submitted 18 January, 2021;
originally announced January 2021.
-
Accurate 3D Object Detection using Energy-Based Models
Authors:
Fredrik K. Gustafsson,
Martin Danelljan,
Thomas B. Schön
Abstract:
Accurate 3D object detection (3DOD) is crucial for safe navigation of complex environments by autonomous robots. Regressing accurate 3D bounding boxes in cluttered environments based on sparse LiDAR data is however a highly challenging problem. We address this task by exploring recent advances in conditional energy-based models (EBMs) for probabilistic regression. While methods employing EBMs for…
▽ More
Accurate 3D object detection (3DOD) is crucial for safe navigation of complex environments by autonomous robots. Regressing accurate 3D bounding boxes in cluttered environments based on sparse LiDAR data is however a highly challenging problem. We address this task by exploring recent advances in conditional energy-based models (EBMs) for probabilistic regression. While methods employing EBMs for regression have demonstrated impressive performance on 2D object detection in images, these techniques are not directly applicable to 3D bounding boxes. In this work, we therefore design a differentiable pooling operator for 3D bounding boxes, serving as the core module of our EBM network. We further integrate this general approach into the state-of-the-art 3D object detector SA-SSD. On the KITTI dataset, our proposed approach consistently outperforms the SA-SSD baseline across all 3DOD metrics, demonstrating the potential of EBM-based regression for highly accurate 3DOD. Code is available at https://github.com/fregu856/ebms_3dod.
△ Less
Submitted 7 November, 2023; v1 submitted 8 December, 2020;
originally announced December 2020.
-
Deep Energy-Based NARX Models
Authors:
Johannes N. Hendriks,
Fredrik K. Gustafsson,
Antônio H. Ribeiro,
Adrian G. Wills,
Thomas B. Schön
Abstract:
This paper is directed towards the problem of learning nonlinear ARX models based on system input--output data. In particular, our interest is in learning a conditional distribution of the current output based on a finite window of past inputs and outputs. To achieve this, we consider the use of so-called energy-based models, which have been developed in allied fields for learning unknown distribu…
▽ More
This paper is directed towards the problem of learning nonlinear ARX models based on system input--output data. In particular, our interest is in learning a conditional distribution of the current output based on a finite window of past inputs and outputs. To achieve this, we consider the use of so-called energy-based models, which have been developed in allied fields for learning unknown distributions based on data. This energy-based model relies on a general function to describe the distribution, and here we consider a deep neural network for this purpose. The primary benefit of this approach is that it is capable of learning both simple and highly complex noise models, which we demonstrate on simulated and experimental data.
△ Less
Submitted 7 December, 2020;
originally announced December 2020.
-
How to Train Your Energy-Based Model for Regression
Authors:
Fredrik K. Gustafsson,
Martin Danelljan,
Radu Timofte,
Thomas B. Schön
Abstract:
Energy-based models (EBMs) have become increasingly popular within computer vision in recent years. While they are commonly employed for generative image modeling, recent work has applied EBMs also for regression tasks, achieving state-of-the-art performance on object detection and visual tracking. Training EBMs is however known to be challenging. While a variety of different techniques have been…
▽ More
Energy-based models (EBMs) have become increasingly popular within computer vision in recent years. While they are commonly employed for generative image modeling, recent work has applied EBMs also for regression tasks, achieving state-of-the-art performance on object detection and visual tracking. Training EBMs is however known to be challenging. While a variety of different techniques have been explored for generative modeling, the application of EBMs to regression is not a well-studied problem. How EBMs should be trained for best possible regression performance is thus currently unclear. We therefore accept the task of providing the first detailed study of this problem. To that end, we propose a simple yet highly effective extension of noise contrastive estimation, and carefully compare its performance to six popular methods from literature on the tasks of 1D regression and object detection. The results of this comparison suggest that our training method should be considered the go-to approach. We also apply our method to the visual tracking task, achieving state-of-the-art performance on five datasets. Notably, our tracker achieves 63.7% AUC on LaSOT and 78.7% Success on TrackingNet. Code is available at https://github.com/fregu856/ebms_regression.
△ Less
Submitted 14 August, 2020; v1 submitted 4 May, 2020;
originally announced May 2020.
-
Energy-Based Models for Deep Probabilistic Regression
Authors:
Fredrik K. Gustafsson,
Martin Danelljan,
Goutam Bhat,
Thomas B. Schön
Abstract:
While deep learning-based classification is generally tackled using standardized approaches, a wide variety of techniques are employed for regression. In computer vision, one particularly popular such technique is that of confidence-based regression, which entails predicting a confidence value for each input-target pair (x,y). While this approach has demonstrated impressive results, it requires im…
▽ More
While deep learning-based classification is generally tackled using standardized approaches, a wide variety of techniques are employed for regression. In computer vision, one particularly popular such technique is that of confidence-based regression, which entails predicting a confidence value for each input-target pair (x,y). While this approach has demonstrated impressive results, it requires important task-dependent design choices, and the predicted confidences lack a natural probabilistic meaning. We address these issues by proposing a general and conceptually simple regression method with a clear probabilistic interpretation. In our proposed approach, we create an energy-based model of the conditional target density p(y|x), using a deep neural network to predict the un-normalized density from (x,y). This model of p(y|x) is trained by directly minimizing the associated negative log-likelihood, approximated using Monte Carlo sampling. We perform comprehensive experiments on four computer vision regression tasks. Our approach outperforms direct regression, as well as other probabilistic and confidence-based methods. Notably, our model achieves a 2.2% AP improvement over Faster-RCNN for object detection on the COCO dataset, and sets a new state-of-the-art on visual tracking when applied for bounding box estimation. In contrast to confidence-based methods, our approach is also shown to be directly applicable to more general tasks such as age and head-pose estimation. Code is available at https://github.com/fregu856/ebms_regression.
△ Less
Submitted 19 July, 2020; v1 submitted 26 September, 2019;
originally announced September 2019.
-
Asynchronous Averaging of Gait Cycles for Classification of Gait and Device Modes
Authors:
Parinaz Kasebzadeh,
Gustaf Hendeby,
Fredrik Gustafsson
Abstract:
An approach for computing unique gait signature using measurements collected from body-worn inertial measurement units (IMUs) is proposed. The gait signature represents one full cycle of the human gait, and is suitable for off-line or on-line classification of the gait mode. The signature can also be used to jointly classify the gait mode and the device mode. The device mode identifies how the IMU…
▽ More
An approach for computing unique gait signature using measurements collected from body-worn inertial measurement units (IMUs) is proposed. The gait signature represents one full cycle of the human gait, and is suitable for off-line or on-line classification of the gait mode. The signature can also be used to jointly classify the gait mode and the device mode. The device mode identifies how the IMU-equipped device is being carried by the user. The method is based on precise segmentation and resampling of the measured IMU signal, as an initial step, further tuned by minimizing the variability of the obtained signature within each gait cycle. Finally, a Fourier series expansion of the gait signature is introduced which provides a low-dimensional feature vector well suited for classification purposes. The proposed method is evaluated on a large dataset involving several subjects, each one containing two different gait modes and four different device modes. The gait signatures enable a high classification rate for each step cycle.
△ Less
Submitted 5 July, 2019; v1 submitted 4 July, 2019;
originally announced July 2019.
-
Evaluating Scalable Bayesian Deep Learning Methods for Robust Computer Vision
Authors:
Fredrik K. Gustafsson,
Martin Danelljan,
Thomas B. Schön
Abstract:
While deep neural networks have become the go-to approach in computer vision, the vast majority of these models fail to properly capture the uncertainty inherent in their predictions. Estimating this predictive uncertainty can be crucial, for example in automotive applications. In Bayesian deep learning, predictive uncertainty is commonly decomposed into the distinct types of aleatoric and epistem…
▽ More
While deep neural networks have become the go-to approach in computer vision, the vast majority of these models fail to properly capture the uncertainty inherent in their predictions. Estimating this predictive uncertainty can be crucial, for example in automotive applications. In Bayesian deep learning, predictive uncertainty is commonly decomposed into the distinct types of aleatoric and epistemic uncertainty. The former can be estimated by letting a neural network output the parameters of a certain probability distribution. Epistemic uncertainty estimation is a more challenging problem, and while different scalable methods recently have emerged, no extensive comparison has been performed in a real-world setting. We therefore accept this task and propose a comprehensive evaluation framework for scalable epistemic uncertainty estimation methods in deep learning. Our proposed framework is specifically designed to test the robustness required in real-world computer vision applications. We also apply this framework to provide the first properly extensive and conclusive comparison of the two current state-of-the-art scalable methods: ensembling and MC-dropout. Our comparison demonstrates that ensembling consistently provides more reliable and practically useful uncertainty estimates. Code is available at https://github.com/fregu856/evaluating_bdl.
△ Less
Submitted 7 April, 2020; v1 submitted 4 June, 2019;
originally announced June 2019.
-
Bayesian Inference via Approximation of Log-likelihood for Priors in Exponential Family
Authors:
Tohid Ardeshiri,
Umut Orguner,
Fredrik Gustafsson
Abstract:
In this paper, a Bayesian inference technique based on Taylor series approximation of the logarithm of the likelihood function is presented. The proposed approximation is devised for the case, where the prior distribution belongs to the exponential family of distributions. The logarithm of the likelihood function is linearized with respect to the sufficient statistic of the prior distribution in e…
▽ More
In this paper, a Bayesian inference technique based on Taylor series approximation of the logarithm of the likelihood function is presented. The proposed approximation is devised for the case, where the prior distribution belongs to the exponential family of distributions. The logarithm of the likelihood function is linearized with respect to the sufficient statistic of the prior distribution in exponential family such that the posterior obtains the same exponential family form as the prior. Similarities between the proposed method and the extended Kalman filter for nonlinear filtering are illustrated. Furthermore, an extended target measurement update for target models where the target extent is represented by a random matrix having an inverse Wishart distribution is derived. The approximate update covers the important case where the spread of measurement is due to the target extent as well as the measurement noise in the sensor.
△ Less
Submitted 5 October, 2015;
originally announced October 2015.
-
Fundamental Bounds on Radio Localization Precision in the Far Field
Authors:
B. J. Dil,
F. Gustafsson,
B. J. Hoenders
Abstract:
This paper experimentally and theoretically investigates the fundamental bounds on radio localization precision of far-field Received Signal Strength (RSS) measurements. RSS measurements are proportional to power-flow measurements time-averaged over periods long compared to the coherence time of the radiation. Our experiments are performed in a novel localization setup using 2.4GHz quasi-monochrom…
▽ More
This paper experimentally and theoretically investigates the fundamental bounds on radio localization precision of far-field Received Signal Strength (RSS) measurements. RSS measurements are proportional to power-flow measurements time-averaged over periods long compared to the coherence time of the radiation. Our experiments are performed in a novel localization setup using 2.4GHz quasi-monochromatic radiation, which corresponds to a mean wavelength of 12.5cm. We experimentally and theoretically show that RSS measurements are cross-correlated over a minimum distance that approaches the diffraction limit, which equals half the mean wavelength of the radiation. Our experiments show that measuring RSS beyond a sampling density of one sample per half the mean wavelength does not increase localization precision, as the Root-Mean-Squared-Error (RMSE) converges asymptotically to roughly half the mean wavelength. This adds to the evidence that the diffraction limit determines (1) the lower bound on localization precision and (2) the sampling density that provides optimal localization precision. We experimentally validate the theoretical relations between Fisher information, Cramér-Rao Lower Bound (CRLB) and uncertainty, where uncertainty is lower bounded by diffraction as derived from coherence and speckle theory. When we reconcile Fisher Information with diffraction, the CRLB matches the experimental results with an accuracy of 97-98%.
△ Less
Submitted 23 February, 2016; v1 submitted 27 July, 2015;
originally announced July 2015.