-
SAM-Based Building Change Detection with Distribution-Aware Fourier Adaptation and Edge-Constrained Warping
Authors:
Yun-Cheng Li,
Sen Lei,
Yi-Tao Zhao,
Heng-Chao Li,
Jun Li,
Antonio Plaza
Abstract:
Building change detection remains challenging for urban development, disaster assessment, and military reconnaissance. While foundation models like Segment Anything Model (SAM) show strong segmentation capabilities, SAM is limited in the task of building change detection due to domain gap issues. Existing adapter-based fine-tuning approaches face challenges with imbalanced building distribution, r…
▽ More
Building change detection remains challenging for urban development, disaster assessment, and military reconnaissance. While foundation models like Segment Anything Model (SAM) show strong segmentation capabilities, SAM is limited in the task of building change detection due to domain gap issues. Existing adapter-based fine-tuning approaches face challenges with imbalanced building distribution, resulting in poor detection of subtle changes and inaccurate edge extraction. Additionally, bi-temporal misalignment in change detection, typically addressed by optical flow, remains vulnerable to background noises. This affects the detection of building changes and compromises both detection accuracy and edge recognition. To tackle these challenges, we propose a new SAM-Based Network with Distribution-Aware Fourier Adaptation and Edge-Constrained Warping (FAEWNet) for building change detection. FAEWNet utilizes the SAM encoder to extract rich visual features from remote sensing images. To guide SAM in focusing on specific ground objects in remote sensing scenes, we propose a Distribution-Aware Fourier Aggregated Adapter to aggregate task-oriented changed information. This adapter not only effectively addresses the domain gap issue, but also pays attention to the distribution of changed buildings. Furthermore, to mitigate noise interference and misalignment in height offset estimation, we design a novel flow module that refines building edge extraction and enhances the perception of changed buildings. Our state-of-the-art results on the LEVIR-CD, S2Looking and WHU-CD datasets highlight the effectiveness of FAEWNet. The code is available at https://github.com/SUPERMAN123000/FAEWNet.
△ Less
Submitted 16 April, 2025;
originally announced April 2025.
-
IGroupSS-Mamba: Interval Group Spatial-Spectral Mamba for Hyperspectral Image Classification
Authors:
Yan He,
Bing Tu,
Puzhao Jiang,
Bo Liu,
Jun Li,
Antonio Plaza
Abstract:
Hyperspectral image (HSI) classification has garnered substantial attention in remote sensing fields. Recent Mamba architectures built upon the Selective State Space Models (S6) have demonstrated enormous potential in long-range sequence modeling. However, the high dimensionality of hyperspectral data and information redundancy pose challenges to the application of Mamba in HSI classification, suf…
▽ More
Hyperspectral image (HSI) classification has garnered substantial attention in remote sensing fields. Recent Mamba architectures built upon the Selective State Space Models (S6) have demonstrated enormous potential in long-range sequence modeling. However, the high dimensionality of hyperspectral data and information redundancy pose challenges to the application of Mamba in HSI classification, suffering from suboptimal performance and computational efficiency. In light of this, this paper investigates a lightweight Interval Group Spatial-Spectral Mamba framework (IGroupSS-Mamba) for HSI classification, which allows for multi-directional and multi-scale global spatial-spectral information extraction in a grouping and hierarchical manner. Technically, an Interval Group S6 Mechanism (IGSM) is developed as the core component, which partitions high-dimensional features into multiple non-overlapping groups at intervals, and then integrates a unidirectional S6 for each group with a specific scanning direction to achieve non-redundant sequence modeling. Compared to conventional applying multi-directional scanning to all bands, this grouping strategy leverages the complementary strengths of different scanning directions while decreasing computational costs. To adequately capture the spatial-spectral contextual information, an Interval Group Spatial-Spectral Block (IGSSB) is introduced, in which two IGSM-based spatial and spectral operators are cascaded to characterize the global spatial-spectral relationship along the spatial and spectral dimensions, respectively. IGroupSS-Mamba is constructed as a hierarchical structure stacked by multiple IGSSB blocks, integrating a pixel aggregation-based downsampling strategy for multiscale spatial-spectral semantic learning from shallow to deep stages. Extensive experiments demonstrate that IGroupSS-Mamba outperforms the state-of-the-art methods.
△ Less
Submitted 7 October, 2024;
originally announced October 2024.
-
Hyperspectral Pansharpening: Critical Review, Tools and Future Perspectives
Authors:
Matteo Ciotola,
Giuseppe Guarino,
Gemine Vivone,
Giovanni Poggi,
Jocelyn Chanussot,
Antonio Plaza,
Giuseppe Scarpa
Abstract:
Hyperspectral pansharpening consists of fusing a high-resolution panchromatic band and a low-resolution hyperspectral image to obtain a new image with high resolution in both the spatial and spectral domains. These remote sensing products are valuable for a wide range of applications, driving ever growing research efforts. Nonetheless, results still do not meet application demands. In part, this c…
▽ More
Hyperspectral pansharpening consists of fusing a high-resolution panchromatic band and a low-resolution hyperspectral image to obtain a new image with high resolution in both the spatial and spectral domains. These remote sensing products are valuable for a wide range of applications, driving ever growing research efforts. Nonetheless, results still do not meet application demands. In part, this comes from the technical complexity of the task: compared to multispectral pansharpening, many more bands are involved, in a spectral range only partially covered by the panchromatic component and with overwhelming noise. However, another major limiting factor is the absence of a comprehensive framework for the rapid development and accurate evaluation of new methods. This paper attempts to address this issue.
We started by designing a dataset large and diverse enough to allow reliable training (for data-driven methods) and testing of new methods. Then, we selected a set of state-of-the-art methods, following different approaches, characterized by promising performance, and reimplemented them in a single PyTorch framework. Finally, we carried out a critical comparative analysis of all methods, using the most accredited quality indicators. The analysis highlights the main limitations of current solutions in terms of spectral/spatial quality and computational efficiency, and suggests promising research directions.
To ensure full reproducibility of the results and support future research, the framework (including codes, evaluation procedures and links to the dataset) is shared on https://github.com/matciotola/hyperspectral_pansharpening_toolbox, as a single Python-based reference benchmark toolbox.
△ Less
Submitted 27 December, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
3DSS-Mamba: 3D-Spectral-Spatial Mamba for Hyperspectral Image Classification
Authors:
Yan He,
Bing Tu,
Bo Liu,
Jun Li,
Antonio Plaza
Abstract:
Hyperspectral image (HSI) classification constitutes the fundamental research in remote sensing fields. Convolutional Neural Networks (CNNs) and Transformers have demonstrated impressive capability in capturing spectral-spatial contextual dependencies. However, these architectures suffer from limited receptive fields and quadratic computational complexity, respectively. Fortunately, recent Mamba a…
▽ More
Hyperspectral image (HSI) classification constitutes the fundamental research in remote sensing fields. Convolutional Neural Networks (CNNs) and Transformers have demonstrated impressive capability in capturing spectral-spatial contextual dependencies. However, these architectures suffer from limited receptive fields and quadratic computational complexity, respectively. Fortunately, recent Mamba architectures built upon the State Space Model integrate the advantages of long-range sequence modeling and linear computational efficiency, exhibiting substantial potential in low-dimensional scenarios. Motivated by this, we propose a novel 3D-Spectral-Spatial Mamba (3DSS-Mamba) framework for HSI classification, allowing for global spectral-spatial relationship modeling with greater computational efficiency. Technically, a spectral-spatial token generation (SSTG) module is designed to convert the HSI cube into a set of 3D spectral-spatial tokens. To overcome the limitations of traditional Mamba, which is confined to modeling causal sequences and inadaptable to high-dimensional scenarios, a 3D-Spectral-Spatial Selective Scanning (3DSS) mechanism is introduced, which performs pixel-wise selective scanning on 3D hyperspectral tokens along the spectral and spatial dimensions. Five scanning routes are constructed to investigate the impact of dimension prioritization. The 3DSS scanning mechanism combined with conventional mapping operations forms the 3D-spectral-spatial mamba block (3DMB), enabling the extraction of global spectral-spatial semantic representations. Experimental results and analysis demonstrate that the proposed method outperforms the state-of-the-art methods on HSI classification benchmarks.
△ Less
Submitted 8 August, 2024; v1 submitted 21 May, 2024;
originally announced May 2024.
-
Lightweight Change Detection in Heterogeneous Remote Sensing Images with Online All-Integer Pruning Training
Authors:
Chengyang Zhang,
Weiming Li,
Gang Li,
Huina Song,
Zhaohui Song,
Xueqian Wang,
Antonio Plaza
Abstract:
Detection of changes in heterogeneous remote sensing images is vital, especially in response to emergencies like earthquakes and floods. Current homogenous transformation-based change detection (CD) methods often suffer from high computation and memory costs, which are not friendly to edge-computation devices like onboard CD devices at satellites. To address this issue, this paper proposes a new l…
▽ More
Detection of changes in heterogeneous remote sensing images is vital, especially in response to emergencies like earthquakes and floods. Current homogenous transformation-based change detection (CD) methods often suffer from high computation and memory costs, which are not friendly to edge-computation devices like onboard CD devices at satellites. To address this issue, this paper proposes a new lightweight CD method for heterogeneous remote sensing images that employs the online all-integer pruning (OAIP) training strategy to efficiently fine-tune the CD network using the current test data. The proposed CD network consists of two visual geometry group (VGG) subnetworks as the backbone architecture. In the OAIP-based training process, all the weights, gradients, and intermediate data are quantized to integers to speed up training and reduce memory usage, where the per-layer block exponentiation scaling scheme is utilized to reduce the computation errors of network parameters caused by quantization. Second, an adaptive filter-level pruning method based on the L1-norm criterion is employed to further lighten the fine-tuning process of the CD network. Experimental results show that the proposed OAIP-based method attains similar detection performance (but with significantly reduced computation complexity and memory usage) in comparison with state-of-the-art CD methods.
△ Less
Submitted 3 May, 2024;
originally announced May 2024.
-
SpectralGPT: Spectral Remote Sensing Foundation Model
Authors:
Danfeng Hong,
Bing Zhang,
Xuyang Li,
Yuxuan Li,
Chenyu Li,
Jing Yao,
Naoto Yokoya,
Hao Li,
Pedram Ghamisi,
Xiuping Jia,
Antonio Plaza,
Paolo Gamba,
Jon Atli Benediktsson,
Jocelyn Chanussot
Abstract:
The foundation model has recently garnered significant attention due to its potential to revolutionize the field of visual representation learning in a self-supervised manner. While most foundation models are tailored to effectively process RGB images for various visual tasks, there is a noticeable gap in research focused on spectral data, which offers valuable information for scene understanding,…
▽ More
The foundation model has recently garnered significant attention due to its potential to revolutionize the field of visual representation learning in a self-supervised manner. While most foundation models are tailored to effectively process RGB images for various visual tasks, there is a noticeable gap in research focused on spectral data, which offers valuable information for scene understanding, especially in remote sensing (RS) applications. To fill this gap, we created for the first time a universal RS foundation model, named SpectralGPT, which is purpose-built to handle spectral RS images using a novel 3D generative pretrained transformer (GPT). Compared to existing foundation models, SpectralGPT 1) accommodates input images with varying sizes, resolutions, time series, and regions in a progressive training fashion, enabling full utilization of extensive RS big data; 2) leverages 3D token generation for spatial-spectral coupling; 3) captures spectrally sequential patterns via multi-target reconstruction; 4) trains on one million spectral RS images, yielding models with over 600 million parameters. Our evaluation highlights significant performance improvements with pretrained SpectralGPT models, signifying substantial potential in advancing spectral RS big data applications within the field of geoscience across four downstream tasks: single/multi-label scene classification, semantic segmentation, and change detection.
△ Less
Submitted 12 February, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
-
Hyperspectral Unmixing Based on Nonnegative Matrix Factorization: A Comprehensive Review
Authors:
Xin-Ru Feng,
Heng-Chao Li,
Rui Wang,
Qian Du,
Xiuping Jia,
Antonio Plaza
Abstract:
Hyperspectral unmixing has been an important technique that estimates a set of endmembers and their corresponding abundances from a hyperspectral image (HSI). Nonnegative matrix factorization (NMF) plays an increasingly significant role in solving this problem. In this article, we present a comprehensive survey of the NMF-based methods proposed for hyperspectral unmixing. Taking the NMF model as a…
▽ More
Hyperspectral unmixing has been an important technique that estimates a set of endmembers and their corresponding abundances from a hyperspectral image (HSI). Nonnegative matrix factorization (NMF) plays an increasingly significant role in solving this problem. In this article, we present a comprehensive survey of the NMF-based methods proposed for hyperspectral unmixing. Taking the NMF model as a baseline, we show how to improve NMF by utilizing the main properties of HSIs (e.g., spectral, spatial, and structural information). We categorize three important development directions including constrained NMF, structured NMF, and generalized NMF. Furthermore, several experiments are conducted to illustrate the effectiveness of associated algorithms. Finally, we conclude the article with possible future directions with the purposes of providing guidelines and inspiration to promote the development of hyperspectral unmixing.
△ Less
Submitted 19 May, 2022;
originally announced May 2022.
-
Optical Remote Sensing Image Understanding with Weak Supervision: Concepts, Methods, and Perspectives
Authors:
Jun Yue,
Leyuan Fang,
Pedram Ghamisi,
Weiying Xie,
Jun Li,
Jocelyn Chanussot,
Antonio J Plaza
Abstract:
In recent years, supervised learning has been widely used in various tasks of optical remote sensing image understanding, including remote sensing image classification, pixel-wise segmentation, change detection, and object detection. The methods based on supervised learning need a large amount of high-quality training data and their performance highly depends on the quality of the labels. However,…
▽ More
In recent years, supervised learning has been widely used in various tasks of optical remote sensing image understanding, including remote sensing image classification, pixel-wise segmentation, change detection, and object detection. The methods based on supervised learning need a large amount of high-quality training data and their performance highly depends on the quality of the labels. However, in practical remote sensing applications, it is often expensive and time-consuming to obtain large-scale data sets with high-quality labels, which leads to a lack of sufficient supervised information. In some cases, only coarse-grained labels can be obtained, resulting in the lack of exact supervision. In addition, the supervised information obtained manually may be wrong, resulting in a lack of accurate supervision. Therefore, remote sensing image understanding often faces the problems of incomplete, inexact, and inaccurate supervised information, which will affect the breadth and depth of remote sensing applications. In order to solve the above-mentioned problems, researchers have explored various tasks in remote sensing image understanding under weak supervision. This paper summarizes the research progress of weakly supervised learning in the field of remote sensing, including three typical weakly supervised paradigms: 1) Incomplete supervision, where only a subset of training data is labeled; 2) Inexact supervision, where only coarse-grained labels of training data are given; 3) Inaccurate supervision, where the labels given are not always true on the ground.
△ Less
Submitted 18 April, 2022;
originally announced April 2022.
-
A3CLNN: Spatial, Spectral and Multiscale Attention ConvLSTM Neural Network for Multisource Remote Sensing Data Classification
Authors:
Heng-Chao Li,
Wen-Shuai Hu,
Wei Li,
Jun Li,
Qian Du,
Antonio Plaza
Abstract:
The problem of effectively exploiting the information multiple data sources has become a relevant but challenging research topic in remote sensing. In this paper, we propose a new approach to exploit the complementarity of two data sources: hyperspectral images (HSIs) and light detection and ranging (LiDAR) data. Specifically, we develop a new dual-channel spatial, spectral and multiscale attentio…
▽ More
The problem of effectively exploiting the information multiple data sources has become a relevant but challenging research topic in remote sensing. In this paper, we propose a new approach to exploit the complementarity of two data sources: hyperspectral images (HSIs) and light detection and ranging (LiDAR) data. Specifically, we develop a new dual-channel spatial, spectral and multiscale attention convolutional long short-term memory neural network (called dual-channel A3CLNN) for feature extraction and classification of multisource remote sensing data. Spatial, spectral and multiscale attention mechanisms are first designed for HSI and LiDAR data in order to learn spectral- and spatial-enhanced feature representations, and to represent multiscale information for different classes. In the designed fusion network, a novel composite attention learning mechanism (combined with a three-level fusion strategy) is used to fully integrate the features in these two data sources. Finally, inspired by the idea of transfer learning, a novel stepwise training strategy is designed to yield a final classification result. Our experimental results, conducted on several multisource remote sensing data sets, demonstrate that the newly proposed dual-channel A3CLNN exhibits better feature representation ability (leading to more competitive classification performance) than other state-of-the-art methods.
△ Less
Submitted 9 April, 2022;
originally announced April 2022.
-
Multimodal Fusion Transformer for Remote Sensing Image Classification
Authors:
Swalpa Kumar Roy,
Ankur Deria,
Danfeng Hong,
Behnood Rasti,
Antonio Plaza,
Jocelyn Chanussot
Abstract:
Vision transformers (ViTs) have been trending in image classification tasks due to their promising performance when compared to convolutional neural networks (CNNs). As a result, many researchers have tried to incorporate ViTs in hyperspectral image (HSI) classification tasks. To achieve satisfactory performance, close to that of CNNs, transformers need fewer parameters. ViTs and other similar tra…
▽ More
Vision transformers (ViTs) have been trending in image classification tasks due to their promising performance when compared to convolutional neural networks (CNNs). As a result, many researchers have tried to incorporate ViTs in hyperspectral image (HSI) classification tasks. To achieve satisfactory performance, close to that of CNNs, transformers need fewer parameters. ViTs and other similar transformers use an external classification (CLS) token which is randomly initialized and often fails to generalize well, whereas other sources of multimodal datasets, such as light detection and ranging (LiDAR) offer the potential to improve these models by means of a CLS. In this paper, we introduce a new multimodal fusion transformer (MFT) network which comprises a multihead cross patch attention (mCrossPA) for HSI land-cover classification. Our mCrossPA utilizes other sources of complementary information in addition to the HSI in the transformer encoder to achieve better generalization. The concept of tokenization is used to generate CLS and HSI patch tokens, helping to learn a {distinctive representation} in a reduced and hierarchical feature space. Extensive experiments are carried out on {widely used benchmark} datasets {i.e.,} the University of Houston, Trento, University of Southern Mississippi Gulfpark (MUUFL), and Augsburg. We compare the results of the proposed MFT model with other state-of-the-art transformers, classical CNNs, and conventional classifiers models. The superior performance achieved by the proposed model is due to the use of multihead cross patch attention. The source code will be made available publicly at \url{https://github.com/AnkurDeria/MFT}.}
△ Less
Submitted 20 June, 2023; v1 submitted 31 March, 2022;
originally announced March 2022.
-
SpectralFormer: Rethinking Hyperspectral Image Classification with Transformers
Authors:
Danfeng Hong,
Zhu Han,
Jing Yao,
Lianru Gao,
Bing Zhang,
Antonio Plaza,
Jocelyn Chanussot
Abstract:
Hyperspectral (HS) images are characterized by approximately contiguous spectral information, enabling the fine identification of materials by capturing subtle spectral discrepancies. Owing to their excellent locally contextual modeling ability, convolutional neural networks (CNNs) have been proven to be a powerful feature extractor in HS image classification. However, CNNs fail to mine and repres…
▽ More
Hyperspectral (HS) images are characterized by approximately contiguous spectral information, enabling the fine identification of materials by capturing subtle spectral discrepancies. Owing to their excellent locally contextual modeling ability, convolutional neural networks (CNNs) have been proven to be a powerful feature extractor in HS image classification. However, CNNs fail to mine and represent the sequence attributes of spectral signatures well due to the limitations of their inherent network backbone. To solve this issue, we rethink HS image classification from a sequential perspective with transformers, and propose a novel backbone network called \ul{SpectralFormer}. Beyond band-wise representations in classic transformers, SpectralFormer is capable of learning spectrally local sequence information from neighboring bands of HS images, yielding group-wise spectral embeddings. More significantly, to reduce the possibility of losing valuable information in the layer-wise propagation process, we devise a cross-layer skip connection to convey memory-like components from shallow to deep layers by adaptively learning to fuse "soft" residuals across layers. It is worth noting that the proposed SpectralFormer is a highly flexible backbone network, which can be applicable to both pixel- and patch-wise inputs. We evaluate the classification performance of the proposed SpectralFormer on three HS datasets by conducting extensive experiments, showing the superiority over classic transformers and achieving a significant improvement in comparison with state-of-the-art backbone networks. The codes of this work will be available at https://github.com/danfenghong/IEEE_TGRS_SpectralFormer for the sake of reproducibility.
△ Less
Submitted 19 November, 2021; v1 submitted 6 July, 2021;
originally announced July 2021.
-
AngularGrad: A New Optimization Technique for Angular Convergence of Convolutional Neural Networks
Authors:
S. K. Roy,
M. E. Paoletti,
J. M. Haut,
S. R. Dubey,
P. Kar,
A. Plaza,
B. B. Chaudhuri
Abstract:
Convolutional neural networks (CNNs) are trained using stochastic gradient descent (SGD)-based optimizers. Recently, the adaptive moment estimation (Adam) optimizer has become very popular due to its adaptive momentum, which tackles the dying gradient problem of SGD. Nevertheless, existing optimizers are still unable to exploit the optimization curvature information efficiently. This paper propose…
▽ More
Convolutional neural networks (CNNs) are trained using stochastic gradient descent (SGD)-based optimizers. Recently, the adaptive moment estimation (Adam) optimizer has become very popular due to its adaptive momentum, which tackles the dying gradient problem of SGD. Nevertheless, existing optimizers are still unable to exploit the optimization curvature information efficiently. This paper proposes a new AngularGrad optimizer that considers the behavior of the direction/angle of consecutive gradients. This is the first attempt in the literature to exploit the gradient angular information apart from its magnitude. The proposed AngularGrad generates a score to control the step size based on the gradient angular information of previous iterations. Thus, the optimization steps become smoother as a more accurate step size of immediate past gradients is captured through the angular information. Two variants of AngularGrad are developed based on the use of Tangent or Cosine functions for computing the gradient angular information. Theoretically, AngularGrad exhibits the same regret bound as Adam for convergence purposes. Nevertheless, extensive experiments conducted on benchmark data sets against state-of-the-art methods reveal a superior performance of AngularGrad. The source code will be made publicly available at: https://github.com/mhaut/AngularGrad.
△ Less
Submitted 9 September, 2023; v1 submitted 21 May, 2021;
originally announced May 2021.
-
Graph Convolutional Networks for Hyperspectral Image Classification
Authors:
Danfeng Hong,
Lianru Gao,
Jing Yao,
Bing Zhang,
Antonio Plaza,
Jocelyn Chanussot
Abstract:
To read the final version please go to IEEE TGRS on IEEE Xplore. Convolutional neural networks (CNNs) have been attracting increasing attention in hyperspectral (HS) image classification, owing to their ability to capture spatial-spectral feature representations. Nevertheless, their ability in modeling relations between samples remains limited. Beyond the limitations of grid sampling, graph convol…
▽ More
To read the final version please go to IEEE TGRS on IEEE Xplore. Convolutional neural networks (CNNs) have been attracting increasing attention in hyperspectral (HS) image classification, owing to their ability to capture spatial-spectral feature representations. Nevertheless, their ability in modeling relations between samples remains limited. Beyond the limitations of grid sampling, graph convolutional networks (GCNs) have been recently proposed and successfully applied in irregular (or non-grid) data representation and analysis. In this paper, we thoroughly investigate CNNs and GCNs (qualitatively and quantitatively) in terms of HS image classification. Due to the construction of the adjacency matrix on all the data, traditional GCNs usually suffer from a huge computational cost, particularly in large-scale remote sensing (RS) problems. To this end, we develop a new mini-batch GCN (called miniGCN hereinafter) which allows to train large-scale GCNs in a mini-batch fashion. More significantly, our miniGCN is capable of inferring out-of-sample data without re-training networks and improving classification performance. Furthermore, as CNNs and GCNs can extract different types of HS features, an intuitive solution to break the performance bottleneck of a single model is to fuse them. Since miniGCNs can perform batch-wise network training (enabling the combination of CNNs and GCNs) we explore three fusion strategies: additive fusion, element-wise multiplicative fusion, and concatenation fusion to measure the obtained performance gain. Extensive experiments, conducted on three HS datasets, demonstrate the advantages of miniGCNs over GCNs and the superiority of the tested fusion strategies with regards to the single CNN or GCN models. The codes of this work will be available at https://github.com/danfenghong/IEEE_TGRS_GCN for the sake of reproducibility.
△ Less
Submitted 14 January, 2021; v1 submitted 6 August, 2020;
originally announced August 2020.
-
Image Segmentation Using Deep Learning: A Survey
Authors:
Shervin Minaee,
Yuri Boykov,
Fatih Porikli,
Antonio Plaza,
Nasser Kehtarnavaz,
Demetri Terzopoulos
Abstract:
Image segmentation is a key topic in image processing and computer vision with applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among many others. Various algorithms for image segmentation have been developed in the literature. Recently, due to the success of deep learning models in a wide range of v…
▽ More
Image segmentation is a key topic in image processing and computer vision with applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among many others. Various algorithms for image segmentation have been developed in the literature. Recently, due to the success of deep learning models in a wide range of vision applications, there has been a substantial amount of works aimed at developing image segmentation approaches using deep learning models. In this survey, we provide a comprehensive review of the literature at the time of this writing, covering a broad spectrum of pioneering works for semantic and instance-level segmentation, including fully convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. We investigate the similarity, strengths and challenges of these deep learning models, examine the most widely used datasets, report performances, and discuss promising future research directions in this area.
△ Less
Submitted 14 November, 2020; v1 submitted 15 January, 2020;
originally announced January 2020.
-
Naive Gabor Networks for Hyperspectral Image Classification
Authors:
Chenying Liu,
Jun Li,
Lin He,
Antonio J. Plaza,
Shutao Li,
Bo Li
Abstract:
Recently, many convolutional neural network (CNN) methods have been designed for hyperspectral image (HSI) classification since CNNs are able to produce good representations of data, which greatly benefits from a huge number of parameters. However, solving such a high-dimensional optimization problem often requires a large amount of training samples in order to avoid overfitting. Additionally, it…
▽ More
Recently, many convolutional neural network (CNN) methods have been designed for hyperspectral image (HSI) classification since CNNs are able to produce good representations of data, which greatly benefits from a huge number of parameters. However, solving such a high-dimensional optimization problem often requires a large amount of training samples in order to avoid overfitting. Additionally, it is a typical non-convex problem affected by many local minima and flat regions. To address these problems, in this paper, we introduce naive Gabor Networks or Gabor-Nets which, for the first time in the literature, design and learn CNN kernels strictly in the form of Gabor filters, aiming to reduce the number of involved parameters and constrain the solution space, and hence improve the performances of CNNs. Specifically, we develop an innovative phase-induced Gabor kernel, which is trickily designed to perform the Gabor feature learning via a linear combination of local low-frequency and high-frequency components of data controlled by the kernel phase. With the phase-induced Gabor kernel, the proposed Gabor-Nets gains the ability to automatically adapt to the local harmonic characteristics of the HSI data and thus yields more representative harmonic features. Also, this kernel can fulfill the traditional complex-valued Gabor filtering in a real-valued manner, hence making Gabor-Nets easily perform in a usual CNN thread. We evaluated our newly developed Gabor-Nets on three well-known HSIs, suggesting that our proposed Gabor-Nets can significantly improve the performance of CNNs, particularly with a small training set.
△ Less
Submitted 24 February, 2020; v1 submitted 9 December, 2019;
originally announced December 2019.
-
Integration of LiDAR and Hyperspectral Data for Land-cover Classification: A Case Study
Authors:
Pedram Ghamisi,
Gabriele Cavallaro,
Dan,
Wu,
Jon Atli Benediktsson,
Antonio Plaza
Abstract:
In this paper, an approach is proposed to fuse LiDAR and hyperspectral data, which considers both spectral and spatial information in a single framework. Here, an extended self-dual attribute profile (ESDAP) is investigated to extract spatial information from a hyperspectral data set. To extract spectral information, a few well-known classifiers have been used such as support vector machines (SVMs…
▽ More
In this paper, an approach is proposed to fuse LiDAR and hyperspectral data, which considers both spectral and spatial information in a single framework. Here, an extended self-dual attribute profile (ESDAP) is investigated to extract spatial information from a hyperspectral data set. To extract spectral information, a few well-known classifiers have been used such as support vector machines (SVMs), random forests (RFs), and artificial neural networks (ANNs). The proposed method accurately classify the relatively volumetric data set in a few CPU processing time in a real ill-posed situation where there is no balance between the number of training samples and the number of features. The classification part of the proposed approach is fully-automatic.
△ Less
Submitted 9 July, 2017;
originally announced July 2017.
-
Drawbacks and alternatives to the numerical calculation of the base inertial parameters expressions for low mobility mechanisms
Authors:
Xabier Iriarte,
Javier Ros,
Aitor Plaza,
Jokin Aginaga,
Vicente Mata
Abstract:
Base inertial parameters constitute a minimal inertial parametrization of mechanical systems that is of interest, for example, in parameter estimation and model reduction. Numerical and symbolic methods are available to determine their expressions. In this paper the problems associated with the numerical determination of the base inertial parameters expressions in the context of low mobility mecha…
▽ More
Base inertial parameters constitute a minimal inertial parametrization of mechanical systems that is of interest, for example, in parameter estimation and model reduction. Numerical and symbolic methods are available to determine their expressions. In this paper the problems associated with the numerical determination of the base inertial parameters expressions in the context of low mobility mechanisms are analyzed and discussed through and example. To circumvent these problems two alternatives are proposed: a variable precision arithmetic implementation of the customary numerical algorithm and the application of a general symbolic method. Finally, the advantages of both approaches compared to the numerical one are discussed in the context of the proposed low mobility example.
△ Less
Submitted 28 June, 2017;
originally announced June 2017.
-
Symbolic Multibody Methods for Real-Time Simulation of Railway Vehicles
Authors:
Javier Ros,
Aitor Plaza,
Xabier Iriarte,
Jesús María Pintor
Abstract:
In this work, recently developed state-of-the-art symbolic multibody methods are tested to accurately model a complex railway vehicle. The model is generated using a symbolic implementation of the principle of the virtual power. Creep forces are modeled using a direct symbolic implementation of the standard linear Kalker model. No simplifications, as base parameter reduction, partial-linearization…
▽ More
In this work, recently developed state-of-the-art symbolic multibody methods are tested to accurately model a complex railway vehicle. The model is generated using a symbolic implementation of the principle of the virtual power. Creep forces are modeled using a direct symbolic implementation of the standard linear Kalker model. No simplifications, as base parameter reduction, partial-linearization or look-up tables for contact kinematics, are used. An Implicit-Explicit integration scheme is proposed to efficiently deal with the stiff creep dynamics. Hard real-time performance is achieved: the CPU time required for a very stable 1 ms integration time step is 256 μs.
△ Less
Submitted 6 June, 2017;
originally announced June 2017.
-
Simplification of multibody models by parameter reduction
Authors:
Javier Ros,
Xabier Iriarte,
Aitor Plaza,
Vicente Mata
Abstract:
Model selection methods are used in different scientific contexts to represent a characteristic data set in terms of a reduced number of parameters. Apparently, these methods have not found their way into the literature on multibody systems dynamics. Multibody models can be considered parametric models in terms of their dynamic parameters, and model selection techniques can then be used to express…
▽ More
Model selection methods are used in different scientific contexts to represent a characteristic data set in terms of a reduced number of parameters. Apparently, these methods have not found their way into the literature on multibody systems dynamics. Multibody models can be considered parametric models in terms of their dynamic parameters, and model selection techniques can then be used to express these models in terms of a reduced number of parameters. These parameter-reduced models are expected to have a smaller computational complexity than the original one and still preserve the desired level of accuracy. They are also known to be good candidates for parameter estimation purposes.
In this work, simulations of the actual model are used to define a data set that is representative of the system's standard working conditions. A parameter-reduced model is chosen and its parameter values estimated so that they minimize the prediction error on these data. To that end, model selection heuristics and normalized error measures are proposed.
Using this methodology, two multibody systems with very different characteristic mobility are analyzed. Highly considerable reductions in the number of parameters and computational cost are obtained without compromising the accuracy of the reduced model too much. As an additional result, a generalization of the base parameter concept to the context of parameter-reduced models is proposed.
△ Less
Submitted 29 May, 2017;
originally announced May 2017.
-
Adaptive Deep Pyramid Matching for Remote Sensing Scene Classification
Authors:
Qingshan Liu,
Renlong Hang,
Huihui Song,
Fuping Zhu,
Javier Plaza,
Antonio Plaza
Abstract:
Convolutional neural networks (CNNs) have attracted increasing attention in the remote sensing community. Most CNNs only take the last fully-connected layers as features for the classification of remotely sensed images, discarding the other convolutional layer features which may also be helpful for classification purposes. In this paper, we propose a new adaptive deep pyramid matching (ADPM) model…
▽ More
Convolutional neural networks (CNNs) have attracted increasing attention in the remote sensing community. Most CNNs only take the last fully-connected layers as features for the classification of remotely sensed images, discarding the other convolutional layer features which may also be helpful for classification purposes. In this paper, we propose a new adaptive deep pyramid matching (ADPM) model that takes advantage of the features from all of the convolutional layers for remote sensing image classification. To this end, the optimal fusing weights for different convolutional layers are learned from the data itself. In remotely sensed scenes, the objects of interest exhibit different scales in distinct scenes, and even a single scene may contain objects with different sizes. To address this issue, we select the CNN with spatial pyramid pooling (SPP-net) as the basic deep network, and further construct a multi-scale ADPM model to learn complementary information from multi-scale images. Our experiments have been conducted using two widely used remote sensing image databases, and the results show that the proposed method significantly improves the performance when compared to other state-of-the-art methods.
△ Less
Submitted 11 November, 2016;
originally announced November 2016.