
A Benchmarking Protocol for SAR Colorization: From Regression to Deep Learning Approaches

Authors: Kangqing Shen, Gemine Vivone, Xiaoyuan Yang, Simone Lolli, Michael Schmitt

Abstract: Synthetic aperture radar (SAR) images are widely used in remote sensing. Interpreting SAR images can be challenging due to their intrinsic speckle noise and grayscale nature. To address this issue, SAR colorization has emerged as a research direction to colorize grayscale SAR images while preserving the original spatial information and radiometric information. However, this research field is still in its early stages, and many limitations can be highlighted. In this paper, we propose a full research line for supervised learning-based approaches to SAR colorization. Our approach includes a protocol for generating synthetic color SAR images, several baselines, and an effective method based on the conditional generative adversarial network (cGAN) for SAR colorization. We also propose numerical assessment metrics for the problem at hand. To our knowledge, this is the first attempt to propose a research line for SAR colorization that includes a protocol, a benchmark, and a complete performance evaluation. Our extensive tests demonstrate the effectiveness of our proposed cGAN-based network for SAR colorization. The code will be made publicly available.

Submitted 12 October, 2023; originally announced October 2023.

Comments: 16 pages, 16 figures, 6 tables

arXiv:2308.07477 [pdf, other]

Probabilistic MIMO U-Net: Efficient and Accurate Uncertainty Estimation for Pixel-wise Regression

Authors: Anton Baumann, Thomas Roßberg, Michael Schmitt

Abstract: Uncertainty estimation in machine learning is paramount for enhancing the reliability and interpretability of predictive models, especially in high-stakes real-world scenarios. Despite the availability of numerous methods, they often pose a trade-off between the quality of uncertainty estimation and computational efficiency. Addressing this challenge, we present an adaptation of the Multiple-Input Multiple-Output (MIMO) framework -- an approach exploiting the overparameterization of deep neural networks -- for pixel-wise regression tasks. Our MIMO variant expands the applicability of the approach from simple image classification to broader computer vision domains. For that purpose, we adapted the U-Net architecture to train multiple subnetworks within a single model, harnessing the overparameterization in deep neural networks. Additionally, we introduce a novel procedure for synchronizing subnetwork performance within the MIMO framework. Our comprehensive evaluations of the resulting MIMO U-Net on two orthogonal datasets demonstrate comparable accuracy to existing models, superior calibration on in-distribution data, robust out-of-distribution detection capabilities, and considerable improvements in parameter size and inference time. Code available at github.com/antonbaumann/MIMO-Unet

Submitted 14 August, 2023; originally announced August 2023.

Comments: 8 pages (references do not count), Accepted at UnCV (Workshop on Uncertainty Quantification for Computer Vision at ICCV)

arXiv:2306.02800 [pdf]

doi 10.1016/j.jaad.2023.11.065

Using Multiple Dermoscopic Photographs of One Lesion Improves Melanoma Classification via Deep Learning: A Prognostic Diagnostic Accuracy Study

Authors: Achim Hekler, Roman C. Maron, Sarah Haggenmüller, Max Schmitt, Christoph Wies, Jochen S. Utikal, Friedegund Meier, Sarah Hobelsberger, Frank F. Gellrich, Mildred Sergon, Axel Hauschild, Lars E. French, Lucie Heinzerling, Justin G. Schlager, Kamran Ghoreschi, Max Schlaak, Franz J. Hilke, Gabriela Poch, Sören Korsing, Carola Berking, Markus V. Heppt, Michael Erdmann, Sebastian Haferkamp, Konstantin Drexler, Dirk Schadendorf, et al. (6 additional authors not shown)

Abstract: Background: Convolutional neural network (CNN)-based melanoma classifiers face several challenges that limit their usefulness in clinical practice. Objective: To investigate the impact of multiple real-world dermoscopic views of a single lesion of interest on a CNN-based melanoma classifier. Methods: This study evaluated 656 suspected melanoma lesions. Classifier performance was measured using area under the receiver operating characteristic curve (AUROC), expected calibration error (ECE) and maximum confidence change (MCC) for (I) a single-view scenario, (II) a multiview scenario using multiple artificially modified images per lesion and (III) a multiview scenario with multiple real-world images per lesion. Results: The multiview approach with real-world images significantly increased the AUROC from 0.905 (95% CI, 0.879-0.929) in the single-view approach to 0.930 (95% CI, 0.909-0.951). ECE and MCC also improved significantly from 0.131 (95% CI, 0.105-0.159) to 0.072 (95% CI, 0.052-0.093) and from 0.149 (95% CI, 0.125-0.171) to 0.115 (95% CI, 0.099-0.131), respectively. Comparing multiview real-world to artificially modified images showed comparable diagnostic accuracy and uncertainty estimation, but significantly worse robustness for the latter. Conclusion: Using multiple real-world images is an inexpensive method to positively impact the performance of a CNN-based melanoma classifier.
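
Note on the ECE metric named in the abstract above: the listing does not state how the authors binned their predictions, so the following is only a minimal sketch of one common equal-width-bin formulation of expected calibration error. The function name, the `n_bins` default, and the toy inputs are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| per bin.

    Assumption: equal-width bins on [0, 1]; the paper's exact binning
    scheme is not given in the abstract.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Right-inclusive bins, so a confidence of exactly 1.0 lands in the last bin.
    idx = np.clip(np.digitize(confidences, edges[1:-1], right=True), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # mask.mean() is the fraction of samples in the bin
    return ece
```

A lower value indicates that predicted confidences track observed accuracy more closely, which is the direction of improvement the abstract reports for the multiview setting.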

Submitted 5 June, 2023; originally announced June 2023.

arXiv:2304.05464 [pdf, other]

UnCRtainTS: Uncertainty Quantification for Cloud Removal in Optical Satellite Time Series

Authors: Patrick Ebel, Vivien Sainte Fare Garnot, Michael Schmitt, Jan Dirk Wegner, Xiao Xiang Zhu

Abstract: Clouds and haze often occlude optical satellite images, hindering continuous, dense monitoring of the Earth's surface. Although modern deep learning methods can implicitly learn to ignore such occlusions, explicit cloud removal as pre-processing enables manual interpretation and allows training models when only few annotations are available. Cloud removal is challenging due to the wide range of occlusion scenarios -- from scenes partially visible through haze, to completely opaque cloud coverage. Furthermore, integrating reconstructed images in downstream applications would greatly benefit from trustworthy quality assessment. In this paper, we introduce UnCRtainTS, a method for multi-temporal cloud removal combining a novel attention-based architecture, and a formulation for multivariate uncertainty prediction. These two components combined set a new state-of-the-art performance in terms of image reconstruction on two public cloud removal datasets. Additionally, we show how the well-calibrated predicted uncertainties enable a precise control of the reconstruction quality.

Submitted 11 April, 2023; originally announced April 2023.

arXiv:2203.07378 [pdf, other]

doi 10.1109/TPAMI.2023.3263585

Dawn of the transformer era in speech emotion recognition: closing the valence gap

Authors: Johannes Wagner, Andreas Triantafyllopoulos, Hagen Wierstorf, Maximilian Schmitt, Felix Burkhardt, Florian Eyben, Björn W. Schuller

Abstract: Recent advances in transformer-based architectures which are pre-trained in a self-supervised manner have shown great promise in several machine learning tasks. In the audio domain, such architectures have also been successfully utilised in the field of speech emotion recognition (SER). However, existing works have not evaluated the influence of model size and pre-training data on downstream performance, and have shown limited attention to generalisation, robustness, fairness, and efficiency. The present contribution conducts a thorough analysis of these aspects on several pre-trained variants of wav2vec 2.0 and HuBERT that we fine-tuned on the dimensions arousal, dominance, and valence of MSP-Podcast, while additionally using IEMOCAP and MOSI to test cross-corpus generalisation. To the best of our knowledge, we obtain the top performance for valence prediction without use of explicit linguistic information, with a concordance correlation coefficient (CCC) of .638 on MSP-Podcast. Furthermore, our investigations reveal that transformer-based architectures are more robust to small perturbations compared to a CNN-based baseline and fair with respect to biological sex groups, but not towards individual speakers. Finally, we are the first to show that their extraordinary success on valence is based on implicit linguistic information learnt during fine-tuning of the transformer layers, which explains why they perform on par with recent multimodal approaches that explicitly utilise textual information.
Our findings collectively paint the following picture: transformer-based architectures constitute the new state-of-the-art in SER, but further advances are needed to mitigate remaining robustness and individual speaker issues. To make our findings reproducible, we release the best performing model to the community.
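
Note on the CCC figure quoted in the abstract above: the concordance correlation coefficient is commonly defined as 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The sketch below follows that standard definition with population (biased) variances; the function name and toy inputs are assumptions, and the authors' evaluation code may differ in details.

```python
import numpy as np

def concordance_cc(y_true, y_pred):
    """Concordance correlation coefficient (Lin's CCC), population statistics.

    Undefined (division by zero) if both inputs are constant and equal in mean.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    # np.var defaults to the population variance (ddof=0), matching cov above.
    denom = y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2
    return 2.0 * cov / denom
```

Unlike Pearson correlation, CCC penalises a constant offset between predictions and labels: shifting perfectly correlated predictions by a fixed amount lowers the score below 1.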

Submitted 7 September, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

Journal ref: in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 9, pp. 10745-10759, 1 Sept. 2023

arXiv:2201.09613 [pdf, other]

doi 10.1109/TGRS.2022.3146246

SEN12MS-CR-TS: A Remote Sensing Data Set for Multi-modal Multi-temporal Cloud Removal

Authors: Patrick Ebel, Yajin Xu, Michael Schmitt, Xiaoxiang Zhu

Abstract: About half of all optical observations collected via spaceborne satellites are affected by haze or clouds. Consequently, cloud coverage affects the remote sensing practitioner's capabilities of a continuous and seamless monitoring of our planet. This work addresses the challenge of optical satellite image reconstruction and cloud removal by proposing a novel multi-modal and multi-temporal data set called SEN12MS-CR-TS. We propose two models highlighting the benefits and use cases of SEN12MS-CR-TS: First, a multi-modal multi-temporal 3D-Convolution Neural Network that predicts a cloud-free image from a sequence of cloudy optical and radar images. Second, a sequence-to-sequence translation model that predicts a cloud-free time series from a cloud-covered time series. Both approaches are evaluated experimentally, with their respective models trained and tested on SEN12MS-CR-TS. The conducted experiments highlight the contribution of our data set to the remote sensing community as well as the benefits of multi-modal and multi-temporal information to reconstruct noisy information. Our data set is available at https://patrickTUM.github.io/cloud_removal

Submitted 24 January, 2022; originally announced January 2022.

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, 2022

arXiv:2111.02061 [pdf, other]

Deep-Learning-Based Single-Image Height Reconstruction from Very-High-Resolution SAR Intensity Data

Authors: Michael Recla, Michael Schmitt

Abstract: Originally developed in fields such as robotics and autonomous driving with image-based navigation in mind, deep learning-based single-image depth estimation (SIDE) has found great interest in the wider image analysis community. Remote sensing is no exception, as the possibility to estimate height maps from single aerial or satellite imagery bears great potential in the context of topographic reconstruction. A few pioneering investigations have demonstrated the general feasibility of single image height prediction from optical remote sensing images and motivate further studies in that direction. With this paper, we present the first-ever demonstration of deep learning-based single image height prediction for the other important sensor modality in remote sensing: synthetic aperture radar (SAR) data. Besides the adaptation of a convolutional neural network (CNN) architecture for SAR intensity images, we present a workflow for the generation of training data, and extensive experimental results for different SAR imaging modes and test sites. Since we put a particular emphasis on transferability, we are able to confirm that deep learning-based single-image height estimation is not only possible, but also transfers quite well to unseen data, even if acquired by different imaging modes and imaging parameters.

Submitted 19 November, 2021; v1 submitted 3 November, 2021; originally announced November 2021.

Comments: 19 pages, 14 figures

arXiv:2105.11726 [pdf, other]

There is no data like more data -- current status of machine learning datasets in remote sensing

Authors: Michael Schmitt, Seyed Ali Ahmadi, Ronny Hänsch

Abstract: Annotated datasets have become one of the most crucial preconditions for the development and evaluation of machine learning-based methods designed for the automated interpretation of remote sensing data. In this paper, we review the historic development of such datasets, discuss their features based on a few selected examples, and address open issues for future developments.

Submitted 17 June, 2021; v1 submitted 25 May, 2021; originally announced May 2021.

Comments: accepted for the Proceedings of IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2021

arXiv:2104.00704 [pdf, other]

Remote Sensing Image Classification with the SEN12MS Dataset

Authors: Michael Schmitt, Yu-Lun Wu

Abstract: Image classification is one of the main drivers of the rapid developments in deep learning with convolutional neural networks for computer vision. So is the analogous task of scene classification in remote sensing. However, in contrast to the computer vision community that has long been using well-established, large-scale standard datasets to train and benchmark high-capacity models, the remote sensing community still largely relies on relatively small and often application-dependent datasets, thus lacking comparability. With this letter, we present a classification-oriented conversion of the SEN12MS dataset. Using that, we provide results for several baseline models based on two standard CNN architectures and different input data configurations. Our results support the benchmarking of remote sensing image classification and provide insights to the benefit of multi-spectral data and multi-sensor data fusion over conventional RGB imagery.

Submitted 1 April, 2021; originally announced April 2021.

Comments: accepted for publication in the ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences (online from July 2021)

arXiv:2011.11452 [pdf, other]

Multi-task Learning for Human Settlement Extent Regression and Local Climate Zone Classification

Authors: Chunping Qiu, Lukas Liebel, Lloyd H. Hughes, Michael Schmitt, Marco Körner, Xiao Xiang Zhu

Abstract: Human Settlement Extent (HSE) and Local Climate Zone (LCZ) maps are both essential sources, e.g., for sustainable urban development and Urban Heat Island (UHI) studies. Remote sensing (RS)- and deep learning (DL)-based classification approaches play a significant role by providing the potential for global mapping. However, most of the efforts only focus on one of the two schemes, usually on a specific scale. This leads to unnecessary redundancies, since the learned features could be leveraged for both of these related tasks. In this letter, the concept of multi-task learning (MTL) is introduced to HSE regression and LCZ classification for the first time. We propose an MTL framework and develop an end-to-end Convolutional Neural Network (CNN), which consists of a backbone network for shared feature learning, attention modules for task-specific feature learning, and a weighting strategy for balancing the two tasks. We additionally propose to exploit HSE predictions as a prior for LCZ classification to enhance the accuracy. The MTL approach was extensively tested with Sentinel-2 data of 13 cities across the world. The results demonstrate that the framework is able to provide a competitive solution for both tasks.

Submitted 23 November, 2020; originally announced November 2020.

Comments: This work has been accepted by IEEE GRSL for publication

arXiv:2009.07683 [pdf, other]

doi 10.1109/TGRS.2020.3024744

Multi-Sensor Data Fusion for Cloud Removal in Global and All-Season Sentinel-2 Imagery

Authors: Patrick Ebel, Andrea Meraner, Michael Schmitt, Xiaoxiang Zhu

Abstract: This work has been accepted by IEEE TGRS for publication. The majority of optical observations acquired via spaceborne earth imagery are affected by clouds. While there is numerous prior work on reconstructing cloud-covered information, previous studies are oftentimes confined to narrowly-defined regions of interest, raising the question of whether an approach can generalize to a diverse set of observations acquired at variable cloud coverage or in different regions and seasons. We target the challenge of generalization by curating a large novel data set for training new cloud removal approaches and evaluate on two recently proposed performance metrics of image quality and diversity. Our data set is the first publicly available to contain a global sample of co-registered radar and optical observations, cloudy as well as cloud-free. Based on the observation that cloud coverage varies widely between clear skies and absolute coverage, we propose a novel model that can deal with either extreme and evaluate its performance on our proposed data set. Finally, we demonstrate the superiority of training models on real over synthetic data, underlining the need for a carefully curated data set of real observations. To facilitate future research, our data set is made available online.

Submitted 16 September, 2020; originally announced September 2020.

Comments: This work has been accepted by IEEE TGRS for publication

arXiv:2009.06992 [pdf, other]

doi 10.1016/j.rse.2020.112096

Mapping horizontal and vertical urban densification in Denmark with Landsat time-series from 1985 to 2018: a semantic segmentation solution

Authors: Tzu-Hsin Karen Chen, Chunping Qiu, Michael Schmitt, Xiao Xiang Zhu, Clive E. Sabel, Alexander V. Prishchepov

Abstract: Landsat imagery is an unparalleled freely available data source that allows reconstructing horizontal and vertical urban form. This paper addresses the challenge of using Landsat data, particularly its 30m spatial resolution, for monitoring three-dimensional urban densification. We compare temporal and spatial transferability of an adapted DeepLab model with a simple fully convolutional network (FCN) and a texture-based random forest (RF) model to map urban density in the two morphological dimensions: horizontal (compact, open, sparse) and vertical (high rise, low rise). We test whether a model trained on the 2014 data can be applied to 2006 and 1995 for Denmark, and examine whether we could use the model trained on the Danish data to accurately map other European cities. Our results show that an implementation of deep networks and the inclusion of multi-scale contextual information greatly improve the classification and the model's ability to generalize across space and time. DeepLab provides more accurate horizontal and vertical classifications than FCN when sufficient training data is available. By using DeepLab, the F1 score can be increased by 4 and 10 percentage points for detecting vertical urban growth compared to FCN and RF for Denmark. For mapping the other European cities with training data from Denmark, DeepLab also shows an advantage of 6 percentage points over RF for both dimensions.
The resulting maps across the years 1985 to 2018 reveal different patterns of urban growth between Copenhagen and Aarhus, the two largest cities in Denmark, illustrating that those cities have used various planning policies in addressing population growth and housing supply challenges. In summary, we propose a transferable deep learning approach for automated, long-term mapping of urban form from Landsat images.

Submitted 21 September, 2020; v1 submitted 15 September, 2020; originally announced September 2020.

Comments: Accepted manuscript including appendix (supplementary file)

ACM Class: I.4.6; I.4.9; J.2; J.4

Journal ref: Remote Sensing of Environment, 2020, 251

arXiv:2005.07983 [pdf, other]

Multi-level Feature Fusion-based CNN for Local Climate Zone Classification from Sentinel-2 Images: Benchmark Results on the So2Sat LCZ42 Dataset

Authors: Chunping Qiu, Xiaochong Tong, Michael Schmitt, Benjamin Bechtel, Xiao Xiang Zhu

Abstract: As a unique classification scheme for urban forms and functions, the local climate zone (LCZ) system provides essential general information for any studies related to urban environments, especially on a large scale. Remote sensing data-based classification approaches are the key to large-scale mapping and monitoring of LCZs. The potential of deep learning-based approaches is not yet fully explored, even though advanced convolutional neural networks (CNNs) continue to push the frontiers for various computer vision tasks. One reason is that published studies are based on different datasets, usually at a regional scale, which makes it impossible to fairly and consistently compare the potential of different CNNs for real-world scenarios. This study is based on the big So2Sat LCZ42 benchmark dataset dedicated to LCZ classification. Using this dataset, we studied a range of CNNs of varying sizes. In addition, we proposed a CNN to classify LCZs from Sentinel-2 images, Sen2LCZ-Net. Using this base network, we propose fusing multi-level features using the extended Sen2LCZ-Net-MF. With this proposed simple network architecture and the highly competitive benchmark dataset, we obtain results that are better than those obtained by the state-of-the-art CNNs, while requiring less computation with fewer layers and parameters. Large-scale LCZ classification examples of completely unseen areas are presented, demonstrating the potential of our proposed Sen2LCZ-Net-MF as well as the So2Sat LCZ42 dataset.
We also intensively investigated the influence of network depth and width and the effectiveness of the design choices made for Sen2LCZ-Net-MF. Our work will provide important baselines for future CNN-based algorithm developments for both LCZ classification and other urban land cover land use classification.

Submitted 16 May, 2020; originally announced May 2020.

arXiv:1912.12171 [pdf, other]

So2Sat LCZ42: A Benchmark Dataset for Global Local Climate Zones Classification

Authors: Xiao Xiang Zhu, Jingliang Hu, Chunping Qiu, Yilei Shi, Jian Kang, Lichao Mou, Hossein Bagheri, Matthias Häberle, Yuansheng Hua, Rong Huang, Lloyd Hughes, Hao Li, Yao Sun, Guichen Zhang, Shiyao Han, Michael Schmitt, Yuanyuan Wang

Abstract: Access to labeled reference data is one of the grand challenges in supervised machine learning endeavors. This is especially true for an automated analysis of remote sensing images on a global scale, which enables us to address global challenges such as urbanization and climate change using state-of-the-art machine learning techniques. To meet these pressing needs, especially in urban research, we provide open access to a valuable benchmark dataset named "So2Sat LCZ42," which consists of local climate zone (LCZ) labels of about half a million Sentinel-1 and Sentinel-2 image patches in 42 urban agglomerations (plus 10 additional smaller areas) across the globe. This dataset was labeled by 15 domain experts following a carefully designed labeling workflow and evaluation process over a period of six months. As rarely done in other labeled remote sensing datasets, we conducted rigorous quality assessment by domain experts. The dataset achieved an overall confidence of 85%. We believe this LCZ dataset is a first step towards an unbiased globally distributed dataset for urban growth monitoring using machine learning methods, because LCZ provide a rather objective measure compared to many other semantic land use and land cover classifications. It provides measures of the morphology, compactness, and height of urban areas, which are less dependent on human and culture. This dataset can be accessed from http://doi.org/10.14459/2018mp1483140.

Submitted 19 December, 2019; originally announced December 2019.

Comments: Article submitted to IEEE Geoscience and Remote Sensing Magazine

arXiv:1901.01548 [pdf, other]

doi 10.1016/j.isprsjprs.2018.12.007

Potential of nonlocally filtered pursuit monostatic TanDEM-X data for coastline detection

Authors: Michael Schmitt, Gerald Baier, Xiao Xiang Zhu

Abstract: This article investigates the potential of nonlocally filtered pursuit monostatic TanDEM-X data for coastline detection in comparison to conventional TanDEM-X data, i.e. image pairs acquired in repeat-pass or bistatic mode. For this task, an unsupervised coastline detection procedure based on scale-space representations and K-medians clustering as well as morphological image post-processing is proposed. Since this procedure exploits a clear discriminability of "dark" and "bright" appearances of water and land surfaces, respectively, in both SAR amplitude and coherence imagery, TanDEM-X InSAR data acquired in pursuit monostatic mode is expected to provide a promising benefit. In addition, we investigate the benefit introduced by a utilization of a non-local InSAR filter for amplitude denoising and coherence estimation instead of a conventional box-car filter. Experiments carried out on real TanDEM-X pursuit monostatic data confirm our expectations and illustrate the advantage of the employed data configuration over conventional TanDEM-X products for automatic coastline detection.

Submitted 6 January, 2019; originally announced January 2019.

Journal ref: ISPRS Journal of Photogrammetry and Remote Sensing 148: 130-141

arXiv:1810.11415 [pdf, other]

doi 10.1016/j.isprsjprs.2018.07.007

Fusion of TanDEM-X and Cartosat-1 Elevation Data Supported by Neural Network-Predicted Weight Maps

Authors: Hossein Bagheri, Michael Schmitt, Xiao Xiang Zhu

Abstract: Recently, the bistatic SAR interferometry mission TanDEM-X provided a global terrain map with unprecedented accuracy. However, visual inspection and empirical assessment of TanDEM-X elevation data against high-resolution ground truth illustrate that the quality of the DEM decreases in urban areas because of SAR-inherent imaging properties. One possible solution for an enhancement of the TanDEM-X DEM quality is to fuse it with other elevation data derived from high-resolution optical stereoscopic imagery, such as that provided by the Cartosat-1 mission. This is usually done by Weighted Averaging (WA) of previously aligned DEM cells. The main contribution of this paper is to develop a method to efficiently predict weight maps in order to achieve optimized fusion results. The prediction is modeled using a fully connected Artificial Neural Network (ANN). The idea of this ANN is to extract suitable features from DEMs that relate to height residuals in training areas and then to automatically learn the pattern of the relationship between height errors and features. The results show the DEM fusion based on the ANN-predicted weights improves the qualities of the study DEMs. Apart from increasing the absolute accuracy of Cartosat-1 DEM by DEM fusion, the relative accuracy (respective to reference LiDAR data) of DEMs is improved by up to 50% in urban areas and 22% in non-urban areas while the improvement by them-based method does not exceed 20% and 10% in urban and non-urban areas respectively.

Submitted 26 October, 2018; originally announced October 2018.

Comments: This is the pre-acceptance version, to read the final version, please go to ISPRS Journal of Photogrammetry and Remote Sensing on ScienceDirect

Journal ref: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 144, October 2018, Pages 285-297

arXiv:1810.11413 [pdf, other]

A Framework for SAR-Optical Stereogrammetry over Urban Areas

Authors: Hossein Bagheri, Michael Schmitt, Pablo d'Angelo, Xiao Xiang Zhu

Abstract: Currently, numerous remote sensing satellites provide a huge volume of diverse earth observation data. As these data show different features regarding resolution, accuracy, coverage, and spectral imaging ability, fusion techniques are required to integrate the different properties of each sensor and produce useful information. For example, synthetic aperture radar (SAR) data can be fused with optical imagery to produce 3D information using stereogrammetric methods. The main focus of this study is to investigate the possibility of applying a stereogrammetry pipeline to very-high-resolution (VHR) SAR-optical image pairs. For this purpose, the applicability of semi-global matching is investigated in this unconventional multi-sensor setting. To support the image matching by reducing the search space and accelerating the identification of correct, reliable matches, the possibility of establishing an epipolarity constraint for VHR SAR-optical image pairs is investigated as well. In addition, it is shown that the absolute geolocation accuracy of VHR optical imagery with respect to VHR SAR imagery such as provided by TerraSAR-X can be improved by a multi-sensor block adjustment formulation based on rational polynomial coefficients. Finally, the feasibility of generating point clouds with a median accuracy of about 2m is demonstrated and confirms the potential of 3D reconstruction from SAR-optical image pairs over urban areas.

Submitted 26 October, 2018; originally announced October 2018.

Comments: This is the pre-acceptance version, to read the final version, please go to ISPRS Journal of Photogrammetry and Remote Sensing on ScienceDirect

Journal ref: ISPRS Journal of Photogrammetry and Remote Sensing, 2018

arXiv:1810.11314 [pdf, other]

Fusion of Urban TanDEM-X raw DEMs using variational models

Authors: Hossein Bagheri, Michael Schmitt, Xiao Xiang Zhu

Abstract: Recently, a new global Digital Elevation Model (DEM) with pixel spacing of 0.4 arcseconds and relative height accuracy finer than 2m for flat areas (slopes < 20%) and better than 4m for rugged terrain (slopes > 20%) was created through the TanDEM-X mission. One important step of the chain of global DEM generation is to mosaic and fuse multiple raw DEM tiles to reach the target height accuracy. Currently, Weighted Averaging (WA) is applied as a fast and simple method for TanDEM-X raw DEM fusion in which the weights are computed from height error maps delivered from the Interferometric TanDEM-X Processor (ITP). However, evaluations show that WA is not the perfect DEM fusion method for urban areas especially in confrontation with edges such as building outlines. The main focus of this paper is to investigate more advanced variational approaches such as TV-L1 and Huber models. Furthermore, we also assess the performance of variational models for fusing raw DEMs produced from data takes with different baseline configurations and height of ambiguities. The results illustrate the high efficiency of variational models for TanDEM-X raw DEM fusion in comparison to WA. Using variational models could improve the DEM quality by up to 2m particularly in inner-city subsets.

Submitted 26 October, 2018; originally announced October 2018.

Comments: This is the pre-acceptance version, to read the final version, please go to IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing on IEEE Xplore

Journal ref: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2018

arXiv:1802.09036 [pdf, other]

doi 10.1016/j.isprsjprs.2017.12.006

Towards Automatic SAR-Optical Stereogrammetry over Urban Areas using Very High Resolution Imagery

Authors: Chunping Qiu, Michael Schmitt, Xiao Xiang Zhu

Abstract: In this paper we discuss the potential and challenges regarding SAR-optical stereogrammetry for urban areas, using very-high-resolution (VHR) remote sensing imagery. Since we do this mainly from a geometrical point of view, we first analyze the height reconstruction accuracy to be expected for different stereogrammetric configurations. Then, we propose a strategy for simultaneous tie point matching and 3D reconstruction, which exploits an epipolar-like search window constraint. To drive the matching and ensure some robustness, we combine different established handcrafted similarity measures. For the experiments, we use real test data acquired by the Worldview-2, TerraSAR-X and MEMPHIS sensors. Our results show that SAR-optical stereogrammetry using VHR imagery is generally feasible with 3D positioning accuracies in the meter-domain, although the matching of these strongly heterogeneous multi-sensor data remains very challenging. Keywords: Synthetic Aperture Radar (SAR), optical images, remote sensing, data fusion, stereogrammetry

Submitted 25 February, 2018; originally announced February 2018.

arXiv:1801.08467 [pdf, other]

doi 10.1109/LGRS.2018.2799232

Identifying Corresponding Patches in SAR and Optical Images with a Pseudo-Siamese CNN

Authors: Lloyd H. Hughes, Michael Schmitt, Lichao Mou, Yuanyuan Wang, Xiao Xiang Zhu

Abstract: In this letter, we propose a pseudo-siamese convolutional neural network (CNN) architecture that makes it possible to solve the task of identifying corresponding patches in very-high-resolution (VHR) optical and synthetic aperture radar (SAR) remote sensing imagery. Using eight convolutional layers each in two parallel network streams, a fully connected layer for the fusion of the features learned in each stream, and a loss function based on binary cross-entropy, we achieve a one-hot indication if two patches correspond or not. The network is trained and tested on an automatically generated dataset that is based on a deterministic alignment of SAR and optical imagery via previously reconstructed and subsequently co-registered 3D point clouds. The satellite images, from which the patches comprising our dataset are extracted, show a complex urban scene containing many elevated objects (i.e. buildings), thus providing one of the most difficult experimental environments. The achieved results show that the network is able to predict corresponding patches with high accuracy, thus indicating great potential for further development towards a generalized multi-sensor key-point matching procedure. Index Terms: synthetic aperture radar (SAR), optical imagery, data fusion, deep learning, convolutional neural networks (CNN), image matching, deep matching

Submitted 25 January, 2018; originally announced January 2018.

arXiv:1801.07499 [pdf, other]

doi 10.1109/TGRS.2018.2790480

Object-based Multipass InSAR via Robust Low Rank Tensor Decomposition

Authors: Jian Kang, Yuanyuan Wang, Michael Schmitt, Xiao Xiang Zhu

Abstract: The most unique advantage of multipass SAR interferometry (InSAR) is the retrieval of long term geophysical parameters, e.g. linear deformation rates, over large areas. Recently, an object-based multipass InSAR framework has been proposed in [1], as an alternative to the typical single-pixel methods, e.g. Persistent Scatterer Interferometry (PSI), or pixel-cluster-based methods, e.g. SqueeSAR. This enables the exploitation of inherent properties of InSAR phase stacks on an object level. As a follow-on, this paper investigates the inherent low rank property of such phase tensors, and proposes a Robust Multipass InSAR technique via Object-based low rank tensor decomposition (RoMIO). We demonstrate that the filtered InSAR phase stacks can improve the accuracy of geophysical parameters estimated via conventional multipass InSAR techniques, e.g. PSI, by a factor of ten to thirty in typical settings. The proposed method is particularly effective against outliers, such as pixels with unmodeled phases. These merits in turn can effectively reduce the number of images required for a reliable estimation. The promising performance of the proposed method is demonstrated using high-resolution TerraSAR-X image stacks.

Submitted 23 January, 2018; originally announced January 2018.

arXiv:1703.02810 [pdf, other]

An Integrated and Scalable Platform for Proactive Event-Driven Traffic Management

Authors: Alain Kibangou, Alexander Artikis, Evangelos Michelioudakis, Georgios Paliouras, Marius Schmitt, John Lygeros, Chris Baber, Natan Morar, Fabiana Fournier, Inna Skarbovsky

Abstract: Traffic on freeways can be managed by means of ramp meters from Road Traffic Control rooms. Human operators cannot efficiently manage a network of ramp meters. To support them, we present an intelligent platform for traffic management which includes a new ramp metering coordination scheme in the decision making module, an efficient dashboard for interacting with human operators, machine learning tools for learning event definitions and Complex Event Processing tools able to deal with uncertainties inherent to the traffic use case. Unlike the usual approach, the devised event-driven platform is able to predict a congestion up to 4 minutes before it really happens. Proactive decision making can then be established leading to significant improvement of traffic conditions.

Submitted 8 March, 2017; originally announced March 2017.

arXiv:1603.09258 [pdf, other]

Distributed Learning in the Presence of Disturbances

Authors: Chithrupa Ramesh, Marius Schmitt, John Lygeros

Abstract: We consider a problem where multiple agents must learn an action profile that maximises the sum of their utilities in a distributed manner. The agents are assumed to have no knowledge of either the utility functions or the actions and payoffs of other agents. These assumptions arise when modelling the interactions in a complex system and communicating between various components of the system are both difficult. In [1], a distributed algorithm was proposed, which learnt Pareto-efficient solutions in this problem setting. However, the approach assumes that all agents can choose their actions, which precludes disturbances. In this paper, we show that a modified version of this distributed learning algorithm can learn Pareto-efficient solutions, even in the presence of disturbances from a finite set. We apply our approach to the problem of ramp coordination in traffic control for different demand profiles.

Submitted 30 March, 2016; originally announced March 2016.

Showing 1–23 of 23 results for author: Schmitt, M