
Modular Transformer Architecture for Precision Agriculture Imaging

Authors: Brian Gopalan, Nathalia Nascimento, Vishal Monga

Abstract: This paper addresses the critical need for efficient and accurate weed segmentation from drone video in precision agriculture. A quality-aware modular deep-learning framework is proposed that addresses common image degradation by analyzing quality conditions, such as blur and noise, and routing inputs through specialized pre-processing and transformer models optimized for each degradation type. The system first analyzes drone images for noise and blur using Mean Absolute Deviation and the Laplacian. Data is then dynamically routed to one of three vision transformer models: a baseline for clean images, a modified transformer with Fisher Vector encoding for noise reduction, or another with an unrolled Lucy-Richardson decoder to correct blur. This novel routing strategy allows the system to outperform existing CNN-based methods in both segmentation quality and computational efficiency, demonstrating a significant advancement in deep-learning applications for agriculture.
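The quality-analysis and routing step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes grayscale frames given as lists of rows of floats, takes the mean absolute deviation of horizontal pixel differences as the noise score and the variance of a 4-neighbour Laplacian as the sharpness score, and uses hypothetical thresholds and branch names (`denoise_vit`, `deblur_vit`, `baseline_vit`).

```python
# Illustrative sketch only: scores, thresholds, and branch names are
# hypothetical stand-ins for the quality-analysis stage described above.

def noise_score(img):
    """Mean absolute deviation of horizontal first differences.
    Smooth regions contribute ~0; pixel-level noise raises the score."""
    diffs = [row[j + 1] - row[j] for row in img for j in range(len(row) - 1)]
    mean = sum(diffs) / len(diffs)
    return sum(abs(d - mean) for d in diffs) / len(diffs)


def laplacian_variance(img):
    """Variance of the 4-neighbour Laplacian over interior pixels.
    Low values mean little high-frequency content, i.e. likely blur."""
    vals = []
    for i in range(1, len(img) - 1):
        for j in range(1, len(img[0]) - 1):
            vals.append(img[i - 1][j] + img[i + 1][j] + img[i][j - 1]
                        + img[i][j + 1] - 4 * img[i][j])
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def route(img, noise_thresh=10.0, blur_thresh=50.0):
    """Pick one of three (hypothetical) model branches for a frame."""
    # Noise also inflates the Laplacian variance, so a noisy frame could be
    # mistaken for a sharp one; the noise check therefore runs first.
    if noise_score(img) > noise_thresh:
        return "denoise_vit"
    if laplacian_variance(img) < blur_thresh:
        return "deblur_vit"
    return "baseline_vit"
```

A frame with a single sharp vertical edge and no noise has a low noise score but a high Laplacian variance, so it is sent to the baseline branch; a featureless frame is treated as blurred.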

Submitted 7 August, 2025; v1 submitted 4 August, 2025; originally announced August 2025.

Comments: Preprint of paper submitted to IEEE-AIOT 2025

arXiv:2402.12872 [pdf, other]

Deep, convergent, unrolled half-quadratic splitting for image deconvolution

Authors: Yanan Zhao, Yuelong Li, Haichuan Zhang, Vishal Monga, Yonina C. Eldar

Abstract: In recent years, algorithm unrolling has emerged as a powerful technique for designing interpretable neural networks based on iterative algorithms. Imaging inverse problems have particularly benefited from unrolling-based deep network design since many traditional model-based approaches rely on iterative optimization. Despite exciting progress, typical unrolling approaches heuristically design layer-specific convolution weights to improve performance. Crucially, convergence properties of the underlying iterative algorithm are lost once layer-specific parameters are learned from training data. We propose an unrolling technique that breaks the trade-off between retaining algorithm properties and enhancing performance. We focus on image deblurring and unrolling the widely applied Half-Quadratic Splitting (HQS) algorithm. We develop a new parametrization scheme which forces layer-specific parameters to asymptotically approach certain fixed points. Through extensive experimental studies, we verify that our approach achieves competitive performance with state-of-the-art unrolled layer-specific learning and significantly improves over the traditional HQS algorithm. We further establish convergence of the proposed unrolled network as the number of layers approaches infinity, and characterize its convergence rate. Our experimental verification involves simulations that validate the analytical results as well as comparison with state-of-the-art non-blind deblurring techniques on benchmark datasets. The merits of the proposed convergent unrolled network are established over competing alternatives, especially in the regime of limited training.
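To make the unrolling idea concrete, the sketch below runs a fixed number of HQS "layers" for 1-D circular deconvolution. It is an illustration under stated assumptions, not the parametrization used in the paper: the prior is a simple sparsity (L1) penalty whose proximal step is soft-thresholding, the per-layer penalty parameter follows a hypothetical schedule beta_k = beta_inf + (beta_0 - beta_inf) * rho**k that approaches the fixed point beta_inf (in a trained network beta_0, beta_inf, and rho would be learned), and a naive O(n^2) DFT keeps the example dependency-free.

```python
import cmath


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(X):
    # Inputs here are spectra of real signals, so the imaginary part is round-off.
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def circ_conv(kernel, x):
    """Circular convolution y[t] = sum_m kernel[m] * x[(t - m) mod n]."""
    n = len(x)
    return [sum(kernel[m] * x[(t - m) % n] for m in range(len(kernel))) for t in range(n)]


def beta_schedule(layer, beta0, beta_inf, rho):
    """Hypothetical layer-wise penalty parameter that converges to beta_inf."""
    return beta_inf + (beta0 - beta_inf) * rho ** layer


def unrolled_hqs(y, kernel, num_layers=30, beta0=0.1, beta_inf=1.0, rho=0.8, lam=0.01):
    """HQS for min_x 0.5*||k * x - y||^2 + lam*||x||_1, one iteration per layer."""
    n = len(y)
    K = dft(list(kernel) + [0.0] * (n - len(kernel)))
    Y = dft(y)
    z = list(y)
    for layer in range(num_layers):
        beta = beta_schedule(layer, beta0, beta_inf, rho)
        Z = dft(z)
        # x-step: quadratic subproblem solved exactly in the Fourier domain.
        X = [(K[i].conjugate() * Y[i] + beta * Z[i]) / (abs(K[i]) ** 2 + beta)
             for i in range(n)]
        x = idft_real(X)
        # z-step: proximal operator of (lam / beta) * ||.||_1 (soft-thresholding).
        t = lam / beta
        z = [max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0) for v in x]
    return z
```

Because every layer uses the same update rule and only beta varies (and settles to beta_inf), the network behaves like ordinary HQS in its deep layers, which is the intuition behind retaining convergence.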

Submitted 25 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: Accepted with mandatory minor revisions by Transactions on Computational Imaging

arXiv:2308.14904 [pdf, other]

Maturity-Aware Active Learning for Semantic Segmentation with Hierarchically-Adaptive Sample Assessment

Authors: Amirsaeed Yazdani, Xuelu Li, Vishal Monga

Abstract: Active Learning (AL) for semantic segmentation is challenging due to heavy class imbalance and different ways of defining "sample" (pixels, areas, etc.), leaving the interpretation of the data distribution ambiguous. We propose "Maturity-Aware Distribution Breakdown-based Active Learning" (MADBAL), an AL method that benefits from a hierarchical approach to define a multiview data distribution, which jointly takes into account the different "sample" definitions and is hence able to select the most impactful segmentation pixels with a comprehensive understanding. MADBAL also features a novel uncertainty formulation, in which AL supporting modules are included to sense the features' maturity, whose weighted influence continuously contributes to the uncertainty detection. In this way, MADBAL makes significant performance leaps even in the early AL stage, significantly reducing the training burden. It outperforms state-of-the-art methods on the Cityscapes and PASCAL VOC datasets, as verified in our extensive experiments.

Submitted 28 August, 2023; originally announced August 2023.

Comments: Accepted to the 34th British Machine Vision Conference (BMVC 2023)

MSC Class: 68-06 ACM Class: I.4.6; I.5.1

arXiv:2203.15082 [pdf, other]

doi 10.1109/TGRS.2022.3162420

Iterative, Deep Synthetic Aperture Sonar Image Segmentation

Authors: Yung-Chen Sun, Isaac D. Gerg, Vishal Monga

Abstract: Synthetic aperture sonar (SAS) systems produce high-resolution images of the seabed environment. Moreover, deep learning has demonstrated superior ability in finding robust features for automating imagery analysis. However, the success of deep learning is contingent on having abundant labeled training data, and obtaining generous pixel-level annotations of SAS imagery is often practically infeasible. This challenge has thus far limited the adoption of deep learning methods for SAS segmentation. Algorithms exist to segment SAS imagery in an unsupervised manner, but they lack the benefit of state-of-the-art learning methods and the results present significant room for improvement. In view of the above, we propose a new iterative algorithm for unsupervised SAS image segmentation combining superpixel formation, deep learning, and traditional clustering methods. We call our method Iterative Deep Unsupervised Segmentation (IDUS). IDUS is an unsupervised learning framework that can be divided into four main steps: 1) A deep network estimates class assignments. 2) Low-level image features from the deep network are clustered into superpixels. 3) Superpixels are clustered into class assignments (which we call pseudo-labels) using k-means. 4) Resulting pseudo-labels are used for loss backpropagation of the deep network prediction. These four steps are performed iteratively until convergence. A comparison of IDUS to current state-of-the-art methods on a realistic benchmark dataset for SAS image segmentation demonstrates the benefits of our proposal even as IDUS incurs a much lower computational burden during inference (actual labeling of a test image). Finally, we also develop a semi-supervised (SS) extension of IDUS called IDSS and demonstrate experimentally that it can further enhance performance while outperforming supervised alternatives that exploit the same labeled training imagery.
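Step 3 of IDUS can be illustrated as follows: features are averaged within each superpixel, the superpixel means are clustered with plain k-means (Lloyd's algorithm), and each cluster index is broadcast back to the pixels as a pseudo-label. This sketch assumes per-pixel feature vectors and superpixel ids are already available (the deep network and superpixel formation of steps 1 and 2 are not shown), and the function names are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def superpixel_pseudo_labels(features, superpixel_ids, k):
    """Cluster superpixel-mean features, then give every pixel its superpixel's cluster."""
    groups = {}
    for f, s in zip(features, superpixel_ids):
        groups.setdefault(s, []).append(f)
    keys = sorted(groups)
    means = [[sum(col) / len(groups[s]) for col in zip(*groups[s])] for s in keys]
    cluster_of = dict(zip(keys, kmeans(means, k)))
    return [cluster_of[s] for s in superpixel_ids]
```

Clustering superpixel means rather than raw pixels makes the pseudo-labels spatially coherent, since all pixels of a superpixel necessarily share one label.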

Submitted 28 March, 2022; originally announced March 2022.

Comments: arXiv admin note: text overlap with arXiv:2107.14563

arXiv:2203.09580 [pdf, other]

Surface Defect Detection and Evaluation for Marine Vessels using Multi-Stage Deep Learning

Authors: Li Yu, Kareem Metwaly, James Z. Wang, Vishal Monga

Abstract: Detecting and evaluating surface coating defects is important for marine vessel maintenance. Currently, the assessment is carried out manually by qualified inspectors using international standards and their own experience. Automating this process is highly challenging because of the high level of variation in vessel type, paint surface, coatings, lighting conditions, weather conditions, paint colors, areas of the vessel, and time in service. We present a novel deep learning-based pipeline to detect and evaluate the percentage of corrosion, fouling, and delamination on the vessel surface from normal photographs. We propose a multi-stage image processing framework, including ship section segmentation, defect segmentation, and defect classification, to automatically recognize different types of defects and measure the coverage percentage on the ship surface. Experimental results demonstrate that our proposed pipeline can objectively perform an assessment similar to that of a qualified inspector.
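The final measurement stage reduces to a ratio of pixel counts. A minimal sketch, assuming binary masks (lists of rows of 0/1) for one ship section and for each defect class, with hypothetical names; the paper's exact definition of coverage may differ, for example in how overlapping defects are handled.

```python
def defect_coverage(section_mask, defect_masks):
    """Percentage of a ship section's pixels covered by each defect class.

    section_mask: 2-D list of 0/1 marking the section (e.g. from section segmentation).
    defect_masks: dict mapping a defect name to a 2-D 0/1 mask of the same size.
    Defect pixels outside the section are ignored.
    """
    section_pixels = sum(v for row in section_mask for v in row)
    coverage = {}
    for name, mask in defect_masks.items():
        covered = sum(1 for srow, drow in zip(section_mask, mask)
                      for s, d in zip(srow, drow) if s and d)
        coverage[name] = 100.0 * covered / section_pixels if section_pixels else 0.0
    return coverage
```

Restricting the count to the section mask is what lets the section-segmentation stage feed directly into a per-area percentage.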

Submitted 17 March, 2022; originally announced March 2022.

arXiv:2203.03079 [pdf, other]

GlideNet: Global, Local and Intrinsic based Dense Embedding NETwork for Multi-category Attributes Prediction

Authors: Kareem Metwaly, Aerin Kim, Elliot Branson, Vishal Monga

Abstract: Attaching attributes (such as color, shape, state, action) to object categories is an important computer vision problem. Attribute prediction has seen exciting recent progress and is often formulated as a multi-label classification problem. Yet significant challenges remain in: 1) predicting diverse attributes over multiple categories, 2) modeling attribute-category dependencies, 3) capturing both global and local scene context, and 4) predicting attributes of objects with low pixel-count. To address these issues, we propose a novel multi-category attribute prediction deep architecture named GlideNet, which contains three distinct feature extractors. A global feature extractor recognizes what objects are present in a scene, whereas a local one focuses on the area surrounding the object of interest. Meanwhile, an intrinsic feature extractor uses an extension of standard convolution dubbed Informed Convolution to retrieve features of objects with low pixel-count. GlideNet uses gating mechanisms with binary masks and its self-learned category embedding to combine the dense embeddings. Collectively, the Global-Local-Intrinsic blocks comprehend the scene's global context while attending to the characteristics of the local object of interest. Finally, using the combined features, an interpreter predicts the attributes, and the length of the output is determined by the category, thereby removing unnecessary attributes. GlideNet can achieve compelling results on two recent and challenging datasets -- VAW and CAR -- for large-scale attribute prediction. For instance, it obtains more than 5% gain over the state of the art in the mean recall (mR) metric. GlideNet's advantages are especially apparent when predicting attributes of objects with low pixel counts as well as attributes that demand global context understanding. Finally, we show that GlideNet excels in training-starved real-world scenarios.
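Mean recall (mR) averages recall over attribute classes, so rare attributes count as much as frequent ones, which makes it a natural metric for long-tailed attribute data. The sketch below assumes binary ground-truth and prediction matrices (one row per object, one column per attribute) and skips attributes with no positives; benchmark protocols may instead compute recall over top-k or thresholded scores, so this is illustrative rather than the exact evaluation used for VAW or CAR.

```python
def mean_recall(y_true, y_pred):
    """Average of per-attribute recall over attributes that have positives.

    y_true, y_pred: lists of per-object lists of 0/1 labels, one column per attribute.
    """
    num_attrs = len(y_true[0])
    recalls = []
    for a in range(num_attrs):
        positives = sum(row[a] for row in y_true)
        if positives == 0:
            continue  # recall is undefined for an attribute with no positives
        hits = sum(1 for t, p in zip(y_true, y_pred) if t[a] and p[a])
        recalls.append(hits / positives)
    return sum(recalls) / len(recalls)
```

For example, per-attribute recalls of 0.5, 1.0, and 0.0 give mR = 0.5, regardless of how many objects carry each attribute.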

Submitted 14 March, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

Comments: CVPR 2022, 16 pages (including supplementary), CAR Dataset, VAW Dataset, http://signal.ee.psu.edu/research/glidenet.html

arXiv:2111.08243 [pdf, other]

CAR -- Cityscapes Attributes Recognition A Multi-category Attributes Dataset for Autonomous Vehicles

Authors: Kareem Metwaly, Aerin Kim, Elliot Branson, Vishal Monga

Abstract: Self-driving vehicles are the future of transportation. With current advancements in this field, the world is getting closer to safe roads with an almost zero probability of accidents and with human error eliminated. However, plenty of research and development is still necessary to reach the required level of robustness. One important aspect is to understand a scene fully, including all details, as some characteristics (attributes) of objects in a scene (drivers' behavior, for instance) could be imperative for correct decision making. However, current algorithms suffer from a lack of high-quality datasets with such rich attributes. Therefore, in this paper, we present a new dataset for attribute recognition -- Cityscapes Attributes Recognition (CAR). The new dataset extends the well-known Cityscapes dataset by adding an additional yet important annotation layer of attributes of objects in each image. Currently, we have annotated more than 32k instances of various categories (Vehicles, Pedestrians, etc.). The dataset has a structured and tailored taxonomy where each category has its own set of possible attributes. The tailored taxonomy focuses on the attributes that are most beneficial for developing better self-driving algorithms that depend on accurate computer vision and scene comprehension. We have also created an API for the dataset to ease the usage of CAR. The API can be accessed through https://github.com/kareem-metwaly/CAR-API.

Submitted 16 November, 2021; originally announced November 2021.

arXiv:2108.06637 [pdf, other]

Deep Algorithm Unrolling for Biomedical Imaging

Authors: Yuelong Li, Or Bar-Shira, Vishal Monga, Yonina C. Eldar

Abstract: In this chapter, we review biomedical applications and breakthroughs enabled by algorithm unrolling, an important technique that bridges traditional iterative algorithms and modern deep learning techniques. To provide context, we start by tracing the origin of algorithm unrolling and providing a comprehensive tutorial on how to unroll iterative algorithms into deep networks. We then extensively cover algorithm unrolling in a wide variety of biomedical imaging modalities and delve into several representative recent works in detail. Indeed, there is a rich history of iterative algorithms for biomedical image synthesis, which makes the field ripe for unrolling techniques. In addition, we put algorithm unrolling into a broad perspective, in order to understand why it is particularly effective and discuss recent trends. Finally, we conclude the chapter by discussing open challenges and suggesting future research directions.
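A standard textbook instance of algorithm unrolling is ISTA for sparse coding, which solves min_x 0.5*||Ax - y||^2 + lam*||x||_1 by iterating x <- soft(W y + S x, theta) with W = A^T / L, S = I - A^T A / L, and theta = lam / L. Unrolling truncates this to a fixed number of layers and gives each layer its own (W, S, theta), which can then be learned from data. The sketch below only builds the ISTA initialization and runs the forward pass; no training is shown, the function names are hypothetical, and the squared Frobenius norm of A is used as a safe upper bound for L (the largest eigenvalue of A^T A).

```python
def soft(v, t):
    """Elementwise soft-thresholding, the proximal operator of t*||.||_1."""
    return [max(abs(x) - t, 0.0) * (1.0 if x >= 0 else -1.0) for x in v]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def ista_init_params(A, lam, num_layers):
    """Per-layer (W, S, theta) initialized to plain ISTA values."""
    m, n = len(A), len(A[0])
    L = sum(a * a for row in A for a in row)  # Frobenius bound >= max eigenvalue of A^T A
    At = [[A[i][j] for i in range(m)] for j in range(n)]
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    W = [[v / L for v in row] for row in At]
    S = [[(1.0 if i == j else 0.0) - AtA[i][j] / L for j in range(n)] for i in range(n)]
    # Independent copies per layer: in an unrolled network each would be trained separately.
    return [([r[:] for r in W], [r[:] for r in S], lam / L) for _ in range(num_layers)]


def unrolled_forward(y, params):
    """Forward pass: one ISTA iteration per layer, starting from x = 0."""
    x = [0.0] * len(params[0][1])
    for W, S, theta in params:
        x = soft([a + b for a, b in zip(matvec(W, y), matvec(S, x))], theta)
    return x
```

With untrained (ISTA-initialized) layers the network reproduces ISTA exactly; learning per-layer parameters typically lets far fewer layers reach a comparable solution.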

Submitted 14 August, 2021; originally announced August 2021.

arXiv:2107.14563 [pdf, other]

Iterative, Deep, and Unsupervised Synthetic Aperture Sonar Image Segmentation

Authors: Yung-Chen Sun, Isaac D. Gerg, Vishal Monga

Abstract: Deep learning has not been routinely employed for semantic segmentation of the seabed environment in synthetic aperture sonar (SAS) imagery because such methods necessitate abundant training data. Abundant training data, specifically pixel-level labels for all images, is usually not available for SAS imagery due to the complex logistics (e.g., diver survey, chase boat, precision position information) needed for obtaining accurate ground-truth. Many hand-crafted feature-based algorithms have been proposed to segment SAS imagery in an unsupervised fashion. However, there is still room for improvement as the feature extraction step of these methods is fixed. In this work, we present a new iterative unsupervised algorithm for learning deep features for SAS image segmentation. Our proposed algorithm alternates between clustering superpixels and updating the parameters of a convolutional neural network (CNN) so that the feature extraction for image segmentation can be optimized. We demonstrate the efficacy of our method on a realistic benchmark dataset. Our results show that the performance of our proposed method is considerably better than current state-of-the-art methods in SAS image segmentation.

Submitted 30 July, 2021; originally announced July 2021.

Comments: IEEE OCEANS 2021

arXiv:2105.02209 [pdf, other]

Physically Inspired Dense Fusion Networks for Relighting

Authors: Amirsaeed Yazdani, Tiantong Guo, Vishal Monga

Abstract: Image relighting has emerged as a problem of significant research interest inspired by augmented reality applications. Physics-based traditional methods, as well as black box deep learning models, have been developed. The existing deep networks have exploited training to achieve a new state of the art; however, they may perform poorly when training is limited or does not represent problem phenomenology, such as the addition or removal of dense shadows. We propose a model which enriches neural networks with physical insight. More precisely, our method generates the relit image with new illumination settings via two different strategies and subsequently fuses the two results using a weight map (w). In the first strategy, our model predicts the material reflectance parameters (albedo) and illumination/geometry parameters of the scene (shading) for the relit image (we refer to this strategy as intrinsic image decomposition (IID)). The second strategy is solely based on the black box approach, where the model optimizes its weights based on the ground-truth images and the loss terms in the training stage and generates the relit output directly (we refer to this strategy as direct). While our proposed method applies to both one-to-one and any-to-any relighting problems, for each case we introduce problem-specific components that improve model performance: 1) For one-to-one relighting we incorporate normal vectors of the surfaces in the scene to adjust gloss and shadows accordingly in the image. 2) For any-to-any relighting, we propose an additional multiscale block to the architecture to enhance feature extraction. Experimental results on the VIDIT 2020 dataset and the VIDIT 2021 dataset (used in the NTIRE 2021 relighting challenge) reveal that our proposal can outperform many state-of-the-art methods in terms of well-known fidelity metrics and perceptual loss.
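The fusion step combines the two candidate relit images pixel by pixel. A minimal sketch, assuming grayscale images as lists of rows, an IID branch that recomposes the image as albedo times shading, and a weight map w in [0, 1] that weights the IID output; which branch w multiplies is an assumption, since the abstract only states that a weight map fuses the two results. In the model, albedo, shading, the direct output, and w are all predicted by the network; here they are plain inputs.

```python
def recompose(albedo, shading):
    """Intrinsic image model: image = albedo * shading (per pixel)."""
    return [[a * s for a, s in zip(ra, rs)] for ra, rs in zip(albedo, shading)]


def fuse(iid_img, direct_img, w):
    """Per-pixel convex combination: w * IID + (1 - w) * direct (w assumed in [0, 1])."""
    return [[wi * a + (1.0 - wi) * b for a, b, wi in zip(ra, rb, rw)]
            for ra, rb, rw in zip(iid_img, direct_img, w)]
```

A per-pixel map, rather than a single scalar, lets the model lean on the physically grounded IID branch in some regions (for example, around shadows) and on the direct branch elsewhere.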

Submitted 5 May, 2021; originally announced May 2021.

Comments: Ranked second in the NTIRE 2021 One-to-one depth guided image relighting challenge, accepted by CVPRW 2021

arXiv:2104.14713 [pdf, other]

Simultaneous Denoising and Localization Network for Photoacoustic Target Localization

Authors: Amirsaeed Yazdani, Sumit Agrawal, Kerrick Johnstonbaugh, Sri-Rajasekhar Kothapalli, Vishal Monga

Abstract: A significant research problem of recent interest is the localization of targets like vessels, surgical needles, and tumors in photoacoustic (PA) images. To achieve accurate localization, a high photoacoustic signal-to-noise ratio (SNR) is required. However, this is not guaranteed for deep targets, as optical scattering causes an exponential decay in optical fluence with respect to tissue depth. To address this, we develop a novel deep learning method designed to explicitly exhibit robustness to noise present in photoacoustic radio-frequency (RF) data. More precisely, we describe and evaluate a deep neural network architecture consisting of a shared encoder and two parallel decoders. One decoder extracts the target coordinates from the input RF data while the other boosts the SNR and estimates clean RF data. The joint optimization of the shared encoder and dual decoders lends significant noise robustness to the features extracted by the encoder, which in turn enables the network to contain detailed information about deep targets that may be obscured by noise. Additional custom layers and newly proposed regularizers in the training loss function (designed based on observed RF data signal and noise behavior) serve to increase the SNR in the cleaned RF output and improve model performance. To account for depth-dependent strong optical scattering, our network was trained with simulated photoacoustic datasets of targets embedded at different depths inside tissue media of different scattering levels. The network trained on this novel dataset accurately locates targets in experimental PA data that is clinically relevant with respect to the localization of vessels, needles, or brachytherapy seeds. We verify the merits of the proposed architecture by outperforming the state of the art on both simulated and experimental datasets.

Submitted 29 April, 2021; originally announced April 2021.

Comments: Accepted by IEEE Transactions on Medical Imaging

arXiv:2104.10705 [pdf, other]

Multi-Class Micro-CT Image Segmentation Using Sparse Regularized Deep Networks

Authors: Amirsaeed Yazdani, Yung-Chen Sun, Nicholas B. Stephens, Timothy Ryan, Vishal Monga

Abstract: It is common in anthropology and paleontology to address questions about extant and extinct species through the quantification of osteological features observable in micro-computed tomographic (micro-CT) scans. In cases where remains were buried, the grey values present in these scans may be classified as belonging to air, dirt, or bone. While various intensity-based methods have been proposed to segment scans into these classes, it is often the case that intensity values for dirt and bone are nearly indistinguishable. In these instances, scientists resort to laborious manual segmentation, which does not scale well in practice when a large number of scans are to be analyzed. Here we present a new domain-enriched network for three-class image segmentation, which utilizes the domain knowledge of experts familiar with manually segmenting bone and dirt structures. More precisely, our novel structure consists of two components: 1) a representation network trained on special samples based on newly designed custom loss terms, which extracts discriminative bone and dirt features, and 2) a segmentation network that leverages these extracted discriminative features. These two parts are jointly trained in order to optimize the segmentation performance. A comparison of our network to the current state-of-the-art U-Nets demonstrates the benefits of our proposal, particularly when the number of labeled training images is limited, which is invariably the case for micro-CT segmentation.

Submitted 21 April, 2021; originally announced April 2021.

Comments: 5 pages, 6 figures, accepted in 2020 54th Asilomar Conference on Signals, Systems, and Computers

arXiv:2103.10312 [pdf, other]

Real-Time, Deep Synthetic Aperture Sonar (SAS) Autofocus

Authors: Isaac D. Gerg, Vishal Monga

Abstract: Synthetic aperture sonar (SAS) requires precise time-of-flight measurements of the transmitted/received waveform to produce well-focused imagery. Errors in these measurements are not uncommon and result in image defocusing. To overcome this, an autofocus algorithm is employed as a post-processing step after image reconstruction to improve image focus. A particular class of these algorithms can be framed as a sharpness/contrast metric-based optimization. To improve convergence, a hand-crafted weighting function to remove "bad" areas of the image is sometimes applied to the image-under-test before the optimization procedure. Additionally, dozens of iterations are necessary for convergence, which is a large compute burden for low size, weight, and power (SWaP) systems. We propose a deep learning technique to overcome these limitations and implicitly learn the weighting function in a data-driven manner. Our proposed method, which we call Deep Autofocus, uses features from the single-look-complex (SLC) image to estimate the phase correction which is applied in k-space. Furthermore, we train our algorithm on batches of training imagery so that during deployment, only a single iteration of our method is sufficient to autofocus. We show results demonstrating the robustness of our technique by comparing our results to four commonly used image sharpness metrics. Our results demonstrate Deep Autofocus can produce imagery perceptually better than common iterative techniques but at a lower computational cost. We conclude that Deep Autofocus can provide a more favorable cost-quality trade-off than alternatives with significant potential for future research.
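The correction step can be illustrated on a single 1-D line of complex image data: transform to k-space, multiply by exp(-j*phi), and transform back. This sketch assumes the phase error phi is known (in Deep Autofocus it is what the network estimates), treats the problem as 1-D, uses a naive DFT to stay dependency-free, and scores focus with a normalized intensity-squared sharpness, sum(|I|^4) / (sum(|I|^2))^2, chosen here for illustration rather than taken from the paper. The quadratic phase error below is a hypothetical test case.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def apply_kspace_phase(line, phi):
    """Multiply the spectrum of a complex line by exp(1j * phi[k])."""
    return idft([Xk * cmath.exp(1j * p) for Xk, p in zip(dft(line), phi)])


def sharpness(line):
    """Normalized intensity-squared sharpness; equals 1.0 only for a single bright pixel."""
    intensity = [abs(v) ** 2 for v in line]
    return sum(i * i for i in intensity) / sum(intensity) ** 2


n = 16
point = [1.0 + 0j] + [0j] * (n - 1)  # ideal point scatterer
# Hypothetical quadratic phase error (a common defocus model) with a 3*pi peak.
phi_err = [3 * math.pi * ((k - n / 2) / (n / 2)) ** 2 for k in range(n)]
defocused = apply_kspace_phase(point, phi_err)
corrected = apply_kspace_phase(defocused, [-p for p in phi_err])
```

The phase error spreads the point's energy over many samples and lowers the sharpness score; applying the negated phase in k-space restores a single bright sample. A learned estimator replaces the oracle `phi_err` with a single forward pass, which is where the non-iterative speed-up comes from.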

Submitted 1 June, 2021; v1 submitted 18 March, 2021; originally announced March 2021.

Comments: Four pages. Accepted to IGARSS 2021. Fixed Eq 9

arXiv:2010.15687

Deep Autofocus for Synthetic Aperture Sonar

Authors: Isaac Gerg, Vishal Monga

Abstract: Synthetic aperture sonar (SAS) requires precise positional and environmental information to produce well-focused output during the image reconstruction step. However, errors in these measurements are commonly present, resulting in defocused imagery. To overcome these issues, an autofocus algorithm is employed as a post-processing step after image reconstruction for the purpose of improving image quality using the image content itself. These algorithms are usually iterative and metric-based in that they seek to optimize an image sharpness metric. In this letter, we demonstrate the potential of machine learning, specifically deep learning, to address the autofocus problem. We formulate the problem as a self-supervised, phase error estimation task using a deep network we call Deep Autofocus. Our formulation has the advantages of being non-iterative (and thus fast) and not requiring ground truth focused-defocused image pairs as often required by other deblurring deep learning methods. We compare our technique against a set of common sharpness metrics optimized using gradient descent over a real-world dataset. Our results demonstrate Deep Autofocus can produce imagery that is perceptually as good as benchmark iterative techniques but at a substantially lower computational cost. We conclude that our proposed Deep Autofocus can provide a more favorable cost-quality trade-off than state-of-the-art alternatives with significant potential for future research.

Submitted 30 July, 2021; v1 submitted 29 October, 2020; originally announced October 2020.

Comments: superseded by another work

arXiv:2010.13317 [pdf, other]

Structural Prior Driven Regularized Deep Learning for Sonar Image Classification

Authors: Isaac D. Gerg, Vishal Monga

Abstract: Deep learning has been recently shown to improve performance in the domain of synthetic aperture sonar (SAS) image classification. Given the constant resolution with range of a SAS, it is no surprise that deep learning techniques perform so well. Despite deep learning's recent success, there are still compelling open challenges in reducing the high false alarm rate and enabling success when training imagery is limited, which is a practical challenge that distinguishes the SAS classification problem from standard image classification set-ups where training imagery may be abundant. We address these challenges by exploiting prior knowledge that humans use to grasp the scene. These include unconscious elimination of the image speckle and localization of objects in the scene. We introduce a new deep learning architecture which incorporates these priors with the goal of improving automatic target recognition (ATR) from SAS imagery. Our proposal -- called SPDRDL, Structural Prior Driven Regularized Deep Learning -- incorporates the previously mentioned priors in a multi-task convolutional neural network (CNN) and requires no additional training data when compared to traditional SAS ATR methods. Two structural priors are enforced via regularization terms in the learning of the network: (1) structural similarity prior -- enhanced imagery (often through despeckling) aids human interpretation and is semantically similar to the original imagery and (2) structural scene context priors -- learned features ideally encapsulate target centering information; hence learning may be enhanced via a regularization that encourages fidelity against known ground truth target shifts (relative target position from scene center). Experiments on a challenging real-world dataset reveal that SPDRDL outperforms state-of-the-art deep learning and other competing methods for SAS image classification.

Submitted 26 October, 2020; originally announced October 2020.

Comments: To appear in TGRS, 2021
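The two structural priors enter training as regularization terms added to the classification loss. The sketch below shows the general shape of such a multi-task objective. It is not the published SPDRDL loss, and it rests on several assumptions: a single-window (global) SSIM stands in for the structural similarity prior, a squared error on the predicted target shift stands in for the scene-context prior, and the weights `lam_ssim` and `lam_shift` and all function names are hypothetical.

```python
def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """Single-window SSIM of two equal-length flat lists with values in [0, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx * mx + my * my + c1) * (vx + vy + c2)
    return num / den


def regularized_loss(cls_loss, original, enhanced, shift_pred, shift_true,
                     lam_ssim=0.1, lam_shift=0.01):
    """Classification loss plus two hypothetical prior-driven penalty terms."""
    # Structural similarity prior: the enhanced (e.g. despeckled) image should stay similar to the input.
    ssim_penalty = 1.0 - global_ssim(original, enhanced)
    # Scene-context prior: the predicted target offset from scene center should match ground truth.
    shift_penalty = sum((p - t) ** 2 for p, t in zip(shift_pred, shift_true))
    return cls_loss + lam_ssim * ssim_penalty + lam_shift * shift_penalty


img = [0.2, 0.4, 0.6, 0.8]
print(regularized_loss(0.5, img, img, (1.0, -2.0), (1.0, -2.0)))  # 0.5
```

When the enhanced image equals the input and the predicted shift is exact, both penalties vanish and only the classification loss remains. Any structural change or shift error increases the objective.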

arXiv:2005.03457 [pdf, other]

NTIRE 2020 Challenge on NonHomogeneous Dehazing

Authors: Codruta O. Ancuti, Cosmin Ancuti, Florin-Alexandru Vasluianu, Radu Timofte, Jing Liu, Haiyan Wu, Yuan Xie, Yanyun Qu, Lizhuang Ma, Ziling Huang, Qili Deng, Ju-Chin Chao, Tsung-Shan Yang, Peng-Wen Chen, Po-Min Hsu, Tzu-Yi Liao, Chung-En Sun, Pei-Yuan Wu, Jeonghyeok Do, Jongmin Park, Munchurl Kim, Kareem Metwaly, Xuelu Li, Tiantong Guo, Vishal Monga, et al. (27 additional authors not shown)

Abstract: This paper reviews the NTIRE 2020 Challenge on NonHomogeneous Dehazing of images (restoration of rich details in hazy images). We focus on the proposed solutions and their results evaluated on NH-Haze, a novel dataset consisting of 55 pairs of real haze-free and nonhomogeneous hazy images recorded outdoors. NH-Haze is the first realistic nonhomogeneous haze dataset that provides ground truth images. The nonhomogeneous haze has been produced using a professional haze generator that imitates the real conditions of haze scenes. A total of 168 participants registered in the challenge, and 27 teams competed in the final testing phase. The proposed solutions gauge the state of the art in image dehazing.

Submitted 7 May, 2020; originally announced May 2020.

Comments: CVPR Workshops Proceedings 2020

arXiv:2004.01817 [pdf, other]

Group Based Deep Shared Feature Learning for Fine-grained Image Classification

Authors: Xuelu Li, Vishal Monga

Abstract: Fine-grained image classification has emerged as a significant challenge because objects in such images have small inter-class visual differences but large variations in pose, lighting, viewpoint, etc. Most existing work focuses on highly customized feature extraction via deep network architectures, which have been shown to deliver state-of-the-art performance. Given that images from distinct classes in fine-grained classification share significant features of interest, we present a new deep network architecture that explicitly models shared features and removes their effect to achieve enhanced classification results. Our modeling of shared features is based on a new group-based learning scheme wherein existing classes are divided into groups and multiple shared feature patterns are discovered (learned). We call this framework Group based deep Shared Feature Learning (GSFL) and the resulting learned network GSFL-Net. Specifically, the proposed GSFL-Net develops a specially designed autoencoder which is constrained by a newly proposed Feature Expression Loss to decompose a set of features into their constituent shared and discriminative components. During inference, only the discriminative feature component is used to accomplish the classification task. A key benefit of our specialized autoencoder is that it is versatile and can be combined with state-of-the-art fine-grained feature extraction models and trained together with them to improve their performance directly.
Experiments on benchmark datasets show that GSFL-Net can enhance classification accuracy over the state of the art with a more interpretable architecture.

Submitted 3 April, 2020; originally announced April 2020.

arXiv:1912.10557 [pdf, other]

Algorithm Unrolling: Interpretable, Efficient Deep Learning for Signal and Image Processing

Authors: Vishal Monga, Yuelong Li, Yonina C. Eldar

Abstract: Deep neural networks provide unprecedented performance gains in many real-world problems in signal and image processing. Despite these gains, future development and practical deployment of deep networks are hindered by their black-box nature, i.e., lack of interpretability, and by the need for very large training sets. An emerging technique called algorithm unrolling, or unfolding, offers promise in eliminating these issues by providing a concrete and systematic connection between iterative algorithms that are used widely in signal processing and deep neural networks. Unrolling methods were first proposed to develop fast neural network approximations for sparse coding. More recently, this direction has attracted enormous attention and is rapidly growing in both theoretical investigations and practical applications. The growing popularity of unrolled deep networks is due in part to their potential for developing efficient, high-performance, yet interpretable network architectures from reasonably sized training sets. In this article, we review algorithm unrolling for signal and image processing. We extensively cover popular techniques for algorithm unrolling in various domains of signal and image processing, including imaging, vision and recognition, and speech processing. By reviewing previous works, we reveal the connections between iterative algorithms and neural networks and present recent theoretical results. Finally, we discuss current limitations of unrolling and suggest possible future research directions.

Submitted 7 August, 2020; v1 submitted 22 December, 2019; originally announced December 2019.
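The review notes that unrolling was first proposed to build fast neural network approximations for sparse coding. The sketch below illustrates that starting point and assumes the classic iterative shrinkage-thresholding algorithm (ISTA) for l1-regularized least squares. Each loop iteration plays the role of one network "layer". In a learned unrolled network, the matrices and the threshold in each layer would become trainable parameters. The function names and toy data are illustrative only.

```python
import math


def soft_threshold(v, t):
    """Elementwise shrinkage: sign(a) * max(|a| - t, 0)."""
    return [math.copysign(max(abs(a) - t, 0.0), a) for a in v]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def unrolled_ista(A, y, lam, step, num_layers):
    """Run num_layers ISTA iterations for min_x 0.5*||y - A x||^2 + lam*||x||_1.

    `step` should satisfy step <= 1 / L, where L is the largest eigenvalue of A^T A.
    Unrolling fixes the number of iterations (layers) in advance.
    """
    At = transpose(A)
    x = [0.0] * len(At)
    for _ in range(num_layers):
        residual = [yi - ri for yi, ri in zip(y, matvec(A, x))]
        grad_step = [xi + step * gi for xi, gi in zip(x, matvec(At, residual))]
        x = soft_threshold(grad_step, step * lam)
    return x


A = [[1.0, 0.0], [0.0, 1.0]]
print(unrolled_ista(A, [3.0, 0.5], lam=1.0, step=1.0, num_layers=5))  # [2.0, 0.0]
```

With an identity dictionary, the fixed point is simply the soft-thresholded observation. The small entry 0.5 is zeroed out, which is the sparsity-promoting behavior that learned variants try to reach in fewer layers.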

arXiv:1910.10908 [pdf, other]

Thresholded Non-Uniform Fourier Frame-Based Reconstruction for Stripmap SAR

Authors: John McKay, Anne Gelb, Suren Jayasuriya, Vishal Monga

Abstract: Fourier domain methods are fast algorithms for SAR imaging. They typically involve an interpolation in the frequency domain to re-grid non-uniform data so inverse fast Fourier transforms can be performed. In this paper, we apply a frame reconstruction algorithm, extending the non-uniform fast Fourier transform, to stripmap SAR data. Further, we present an improved thresholded frame reconstruction algorithm for robust performance and improved computational efficiency. We demonstrate compelling results on real stripmap SAR data.

Submitted 24 October, 2019; originally announced October 2019.

arXiv:1909.09175 [pdf, other]

doi 10.1109/TIP.2019.2946078

Deep Retinal Image Segmentation with Regularization Under Geometric Priors

Authors: Venkateswararao Cherukuri, Vijay Kumar BG, Raja Bala, Vishal Monga

Abstract: Vessel segmentation of retinal images is a key diagnostic capability in ophthalmology. This problem faces several challenges, including low contrast, variable vessel size and thickness, and the presence of interfering pathology such as micro-aneurysms and hemorrhages. Early approaches to this problem employed hand-crafted filters to capture vessel structures, accompanied by morphological post-processing. More recently, deep learning techniques have been employed with significantly enhanced segmentation accuracy. We propose a novel domain-enriched deep network that consists of two components: 1) a representation network that learns geometric features specific to retinal images, and 2) a custom-designed, computationally efficient residual task network that utilizes the features obtained from the representation layer to perform pixel-level segmentation. The representation and task networks are jointly learned for any given training set. To obtain physically meaningful and practically effective representation filters, we propose two new constraints inspired by the expected prior structure of these filters: 1) an orientation constraint that promotes geometric diversity of curvilinear features, and 2) a data-adaptive noise regularizer that penalizes false positives. Multi-scale extensions are developed to enable accurate detection of thin vessels.
Experiments performed on three challenging benchmark databases under a variety of training scenarios show that the proposed prior-guided deep network outperforms state-of-the-art alternatives as measured by common evaluation metrics, while being more economical in network size and inference time.

Submitted 19 September, 2019; originally announced September 2019.

Comments: Accepted to IEEE TIP

arXiv:1909.04572 [pdf, other]

doi 10.1109/TIP.2019.2942510

Deep MR Brain Image Super-Resolution Using Spatio-Structural Priors

Authors: Venkateswararao Cherukuri, Tiantong Guo, Steven J. Schiff, Vishal Monga

Abstract: High-resolution Magnetic Resonance (MR) images are desired for accurate diagnostics. In practice, image resolution is restricted by factors like hardware and processing constraints. Recently, deep learning methods have been shown to produce compelling state-of-the-art results for image enhancement/super-resolution. Paying particular attention to desired high-resolution MR image structure, we propose a new regularized network that exploits image priors, namely a low-rank structure and a sharpness prior, to enhance deep MR image super-resolution (SR). Our contributions are incorporating these priors in an analytically tractable fashion and developing a novel prior-guided network architecture that accomplishes the super-resolution task. This is particularly challenging for the low-rank prior, since the rank is not a differentiable function of the image matrix (and hence of the network parameters), an issue we address by pursuing differentiable approximations of the rank. Sharpness is emphasized by the variance of the Laplacian, which we show can be implemented by a fixed feedback layer at the output of the network. As a key extension, we modify the fixed feedback (Laplacian) layer by learning a new set of training-data-driven filters that are optimized for enhanced sharpness. Experiments performed on publicly available MR brain image databases and comparisons against existing state-of-the-art methods show that the proposed prior-guided network offers significant practical gains in terms of improved SNR/image quality measures.
Because our priors are on output images, the proposed method is versatile and can be combined with a wide variety of existing network architectures to further enhance their performance.

Submitted 10 September, 2019; originally announced September 2019.

Comments: Accepted to IEEE Transactions on Image Processing
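The sharpness prior above measures sharpness as the variance of the Laplacian of the output image. The sketch below computes that quantity. It assumes the common 3x3 four-neighbor Laplacian kernel evaluated only at interior pixels (no padding). The kernel choice and the function name `laplacian_variance` are assumptions, not details taken from the paper. In the network, a fixed filter of this kind corresponds to what the abstract calls the fixed feedback layer, and the key extension replaces it with learned filters.

```python
def laplacian_variance(img):
    """Variance of the 4-neighbor Laplacian response over interior pixels of a 2-D list."""
    rows, cols = len(img), len(img[0])
    if rows < 3 or cols < 3:
        raise ValueError("image must be at least 3x3")
    responses = []
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
            lap = (img[i - 1][j] + img[i + 1][j] + img[i][j - 1] + img[i][j + 1]
                   - 4.0 * img[i][j])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


flat = [[0.5] * 4 for _ in range(4)]
edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
print(laplacian_variance(flat))  # 0.0
print(laplacian_variance(edge))  # 1.0
```

A flat image has no second-derivative response, while a sharp edge produces strong positive and negative responses and therefore a large variance. Penalizing a low value of this quantity pushes the network's output toward crisper structure.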

arXiv:1904.10082 [pdf, other]

doi 10.1109/TIP.2019.2913500

Adaptive Transform Domain Image Super-resolution Via Orthogonally Regularized Deep Networks

Authors: Tiantong Guo, Hojjat S. Mousavi, Vishal Monga

Abstract: Deep learning methods, in particular trained Convolutional Neural Networks (CNNs), have recently been shown to produce compelling results for single image Super-Resolution (SR). Invariably, a CNN is learned to map the Low Resolution (LR) image to its corresponding High Resolution (HR) version in the spatial domain. We propose a novel network structure for learning the SR mapping function in an image transform domain, specifically the Discrete Cosine Transform (DCT). As the first contribution, we show that the DCT can be integrated into the network structure as a Convolutional DCT (CDCT) layer. With the CDCT layer, we construct the DCT Deep SR (DCT-DSR) network. We further extend the DCT-DSR to allow the CDCT layer to become trainable (i.e., optimizable). Because this layer represents an image transform, we enforce pairwise orthogonality constraints and newly formulated complexity order constraints on the individual basis functions/filters. This Orthogonally Regularized Deep SR network (ORDSR) simplifies the SR task by taking advantage of the image transform domain while adapting the design of the transform basis to the training image set. Experimental results show that ORDSR achieves state-of-the-art SR image quality with fewer parameters than most deep CNN methods. A particular success of ORDSR is in overcoming the artifacts introduced by bicubic interpolation.
A key burden of deep SR has been identified as the requirement of generous training LR and HR image pairs; ORDSR exhibits a much more graceful degradation as training size is reduced, with significant benefits in the regime of limited training. Analysis of memory and computation requirements confirms that ORDSR can allow for a more efficient network with faster inference.

Submitted 22 April, 2019; originally announced April 2019.
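ORDSR treats the DCT as a convolutional layer whose basis functions become trainable under pairwise orthogonality constraints. The sketch below builds the basis and checks the penalty, under two assumptions: an orthonormal 1-D DCT-II basis, and a simple sum-of-squared-inner-products penalty. The paper's exact constraint form and its complexity order constraints are not reproduced, and the function names are hypothetical. The penalty is zero for the exact DCT and grows once one basis vector drifts, as it could during training. A 2-D DCT filter is the outer product of two such 1-D vectors.

```python
import math


def dct_basis(n):
    """Rows are the orthonormal 1-D DCT-II basis vectors of length n."""
    basis = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return basis


def orthogonality_penalty(filters):
    """Sum of squared inner products over all distinct pairs of filters."""
    total = 0.0
    for a in range(len(filters)):
        for b in range(a + 1, len(filters)):
            dot = sum(u * v for u, v in zip(filters[a], filters[b]))
            total += dot * dot
    return total


B = dct_basis(4)
print(orthogonality_penalty(B) < 1e-20)  # True
B[1] = [v + 0.1 for v in B[1]]           # perturb one filter, as a training update might
print(orthogonality_penalty(B) > 0.0)    # True
```

In a trainable CDCT-style layer, a term like this would be added to the SR loss so that learned filters stay close to an orthogonal transform.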

arXiv:1904.07329 [pdf, other]

doi 10.1109/TSP.2019.2914884

Transmit MIMO Radar Beampattern Design Via Optimization on the Complex Circle Manifold

Authors: Khaled Alhujaili, Vishal Monga, Muralidhar Rangaswamy

Abstract: The ability of Multiple-Input Multiple-Output (MIMO) radar systems to adapt waveforms across antennas allows flexibility in the transmit beampattern design. In cognitive radar, a popular design criterion is to minimize the deviation from an idealized beampattern (which is arrived at with knowledge of the environment). The optimization of the transmit beampattern becomes particularly challenging in the presence of practical constraints on the transmit waveform. One of the hardest of such constraints is the non-convex constant modulus constraint, which has been the subject of much recent work. In a departure from most existing approaches, we develop a solution that involves direct optimization over the non-convex complex circle manifold. That is, we derive a new projection, descent, and retraction (PDR) update strategy that allows for monotonic cost function improvement while maintaining feasibility over the complex circle manifold (constant modulus set). For quadratic cost functions (as is the case with beampattern deviation), we provide analytical guarantees of monotonic cost function improvement along with proof of convergence to a local minimum. We evaluate the proposed PDR algorithm against other candidate MIMO beampattern design methods and show that PDR can outperform competing wideband beampattern design methods while being computationally less expensive. Finally, orthogonality across antennas is incorporated in the PDR framework by adding a penalty term to the beampattern cost function.
Robustness to target direction mismatch, enabled by orthogonal waveforms, is also demonstrated.

Submitted 15 April, 2019; originally announced April 2019.
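The projection, descent, retraction idea can be illustrated with the standard Riemannian operations on the complex circle manifold, the set of vectors whose entries all have unit modulus. The sketch below is a simplification, not the paper's PDR update. It assumes a toy quadratic cost f(x) = ||x - t||^2 with Euclidean gradient 2(x - t). Each step projects that gradient onto the tangent space at x, takes a fixed-size descent step, and retracts by normalizing every entry back to unit modulus. The target `t`, the step size, and the function names are hypothetical.

```python
import cmath


def project_to_tangent(x, g):
    """Remove the radial component of g at each unit-modulus entry of x."""
    return [gi - (gi * xi.conjugate()).real * xi for xi, gi in zip(x, g)]


def retract(v):
    """Normalize each entry back onto the unit circle (constant modulus)."""
    return [vi / abs(vi) for vi in v]


def cost(x, target):
    """Toy quadratic cost f(x) = ||x - target||^2."""
    return sum(abs(xi - ti) ** 2 for xi, ti in zip(x, target))


def pdr_step(x, target, step=0.1):
    """One projection, descent, retraction step on the complex circle manifold."""
    euclid_grad = [2.0 * (xi - ti) for xi, ti in zip(x, target)]
    riem_grad = project_to_tangent(x, euclid_grad)
    # A tangent step from a unit-modulus point never reaches zero, so retract is safe.
    return retract([xi - step * gi for xi, gi in zip(x, riem_grad)])


x0 = [cmath.exp(1j * phase) for phase in (0.0, 1.0, 2.0)]
t = [cmath.exp(0.5j)] * 3
x = x0
for _ in range(20):
    x = pdr_step(x, t)
print(cost(x, t) < cost(x0, t))                     # True
print(all(abs(abs(xi) - 1.0) < 1e-12 for xi in x))  # True
```

Every iterate stays feasible, because the retraction restores constant modulus after each step, while the cost decreases. This pairing of feasibility with descent is what PDR formalizes for beampattern deviation costs.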

arXiv:1904.04158 [pdf, other]

doi 10.1109/TIP.2019.2909800

Robust Alignment for Panoramic Stitching via an Exact Rank Constraint

Authors: Yuelong Li, Mohammad Tofighi, Vishal Monga

Abstract: We study the problem of image alignment for panoramic stitching. Unlike most existing approaches, which are feature-based, our algorithm works on pixels directly and accounts for errors across entire images globally. Technically, we formulate the alignment problem as rank-1 and sparse matrix decomposition over transformed images, and develop an efficient algorithm for solving this challenging non-convex optimization problem. The algorithm reduces to solving a sequence of subproblems, for which we analytically establish exact recovery conditions, convergence and optimality, together with convergence rate and complexity. We generalize it to simultaneously align multiple images and recover multiple homographies, extending its application scope to the vast majority of practical scenarios. Experimental results demonstrate that the proposed algorithm aligns images more accurately and generates higher-quality stitched images than state-of-the-art methods.

Submitted 1 April, 2019; originally announced April 2019.

Comments: Accepted for publication in IEEE Transactions on Image Processing

arXiv:1902.05399 [pdf, other]

An Algorithm Unrolling Approach to Deep Image Deblurring

Authors: Yuelong Li, Mohammad Tofighi, Vishal Monga, Yonina C. Eldar

Abstract: While neural networks have achieved vastly enhanced performance over traditional iterative methods in many cases, they are generally empirically designed, and the underlying structures are difficult to interpret. The algorithm unrolling approach has helped connect iterative algorithms to neural network architectures. However, such connections have not yet been made for blind image deblurring. In this paper, we propose a neural network architecture that advances this idea. We first present an iterative algorithm that may be considered a generalization of the traditional total-variation regularization method in the gradient domain, and subsequently unroll the half-quadratic splitting algorithm to construct a neural network. Our proposed deep network achieves significant practical performance gains while enjoying interpretability at the same time. Experimental results show that our approach outperforms many state-of-the-art methods.

Submitted 15 February, 2019; v1 submitted 9 February, 2019; originally announced February 2019.

Comments: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
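The network above unrolls half-quadratic splitting (HQS) applied to a total-variation style model. To show what one HQS iteration does, the sketch below applies HQS to 1-D total-variation denoising. This is a deliberate simplification: it assumes an identity blur kernel, so the quadratic subproblem becomes a tridiagonal solve rather than an FFT-based deconvolution. The splitting variable z stands for the signal gradient, and the penalty weight beta grows each iteration. The values of `beta` and `growth` and all function names are hypothetical choices, not the paper's learned parameters.

```python
import math


def diff(x):
    """Forward differences D x."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def diff_adjoint(z):
    """Adjoint D^T z of the forward-difference operator."""
    out = [0.0] * (len(z) + 1)
    for i, zi in enumerate(z):
        out[i] -= zi
        out[i + 1] += zi
    return out


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower and upper have length n - 1."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def hqs_tv_denoise(y, lam, beta=1.0, growth=2.0, iters=10):
    """HQS for min_x 0.5*||x - y||^2 + lam*||D x||_1 with the split z = D x."""
    if len(y) < 2:
        raise ValueError("signal must have at least two samples")
    n, x = len(y), list(y)
    for _ in range(iters):
        # z-step: soft-threshold the gradient of the current estimate.
        z = [math.copysign(max(abs(g) - lam / beta, 0.0), g) for g in diff(x)]
        # x-step: solve (I + beta * D^T D) x = y + beta * D^T z.
        diag = [1.0 + 2.0 * beta] * n
        diag[0] = diag[-1] = 1.0 + beta
        off = [-beta] * (n - 1)
        rhs = [yi + beta * gi for yi, gi in zip(y, diff_adjoint(z))]
        x = solve_tridiagonal(off, diag, off, rhs)
        beta *= growth  # tighten the coupling between z and D x
    return x


print([round(v, 2) for v in hqs_tv_denoise([0.1, -0.1, 0.05, 1.0, 0.9, 1.1], lam=0.1)])
```

Each pass of the loop alternates a cheap shrinkage step and a quadratic solve. An unrolled network fixes the number of passes and learns quantities such as the thresholds and penalty weights from training data.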

arXiv:1902.03493 [pdf, other]

Deep Algorithm Unrolling for Blind Image Deblurring

Authors: Yuelong Li, Mohammad Tofighi, Junyi Geng, Vishal Monga, Yonina C. Eldar

Abstract: Blind image deblurring remains a topic of enduring interest. Learning-based approaches, especially those that employ neural networks, have emerged to complement traditional model-based methods and in many cases achieve vastly enhanced performance. That said, neural network approaches are generally empirically designed, and the underlying structures are difficult to interpret. In recent years, a promising technique called algorithm unrolling has been developed that has helped connect iterative algorithms, such as those for sparse coding, to neural network architectures. However, such connections have not yet been made for blind image deblurring. In this paper, we propose a neural network architecture based on this idea. We first present an iterative algorithm that may be considered a generalization of the traditional total-variation regularization method in the gradient domain. We then unroll the algorithm to construct a neural network for image deblurring, which we refer to as Deep Unrolling for Blind Deblurring (DUBLID). Key algorithm parameters are learned with the help of training images. Our proposed deep network DUBLID achieves significant practical performance gains while enjoying interpretability at the same time. Extensive experimental results show that DUBLID outperforms many state-of-the-art methods and in addition is computationally faster.

Submitted 29 May, 2019; v1 submitted 9 February, 2019; originally announced February 2019.

arXiv:1901.07061 [pdf, other]

Prior Information Guided Regularized Deep Learning for Cell Nucleus Detection

Authors: Mohammad Tofighi, Tiantong Guo, Jairam K. P. Vanamala, Vishal Monga

Abstract: Cell nuclei detection is a challenging research topic because of limitations in cellular image quality and the diversity of nuclear morphology, i.e., varying nuclei shapes, sizes, and overlaps between multiple cell nuclei. This has been a topic of enduring interest, with promising recent success shown by deep learning methods. These methods train Convolutional Neural Networks (CNNs) with a training set of input images and known, labeled nuclei locations. Many such methods are supplemented by spatial or morphological processing. Using a set of canonical cell nuclei shapes, prepared with the help of a domain expert, we develop a new approach that we call Shape Priors with Convolutional Neural Networks (SP-CNN). We further extend the network to introduce a shape prior (SP) layer and then allow it to become trainable (i.e., optimizable). We call this network tunable SP-CNN (TSP-CNN). In summary, we present new network structures that can incorporate 'expected behavior' of nucleus shapes via two components: learnable layers that perform the nucleus detection and a fixed processing part that guides the learning with prior information. Analytically, we formulate two new regularization terms that are targeted at: 1) learning the shapes, and 2) reducing false positives while simultaneously encouraging detection inside the cell nucleus boundary. Experimental results on two challenging datasets reveal that the proposed SP-CNN and TSP-CNN can outperform state-of-the-art alternatives.

Submitted 21 January, 2019; originally announced January 2019.

Comments: Accepted for Publication

Journal ref: IEEE Transactions on Medical Imaging, January 2019

arXiv:1811.11627 [pdf, other]

Spatio-Spectral Radar Beampattern Design for Co-existence with Wireless Communication Systems

Authors: Bosung Kang, Omar Aldayel, Vishal Monga, Muralidhar Rangaswamy

Abstract: We address the problem of designing a transmit beampattern for multiple-input multiple-output (MIMO) radar considering co-existence with wireless communication systems. The designed beampattern is able to manage the transmit energy in spatial directions as well as in spectral frequency bands of interest by minimizing the deviation of the designed beampattern versus a desired one under a spectral constraint as well as the constant modulus constraint. While unconstrained beampattern design is straightforward, a key open challenge is jointly enforcing the spectral constraint in addition to the constant modulus constraint on the radar waveform. A new approach is proposed in our work, which involves solving a sequence of constrained quadratic programs such that constant modulus is achieved at convergence. Further, we show that each problem in the sequence has a closed form solution leading to analytical tractability. We evaluate the proposed beampattern with interference control (BIC) algorithm against the state-of-the-art MIMO beampattern design techniques and show that BIC achieves closeness to an idealized beampattern along with desired spectral shaping.

Submitted 28 November, 2018; originally announced November 2018.
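The design criterion above is the deviation between a designed beampattern and a desired one. The sketch below evaluates a beampattern and a squared-deviation cost against a desired pattern sampled on an angle grid. It is a simplified illustration resting on several assumptions: narrowband operation, a single phase-only weight vector on a uniform linear array with half-wavelength spacing, and no spectral constraints, rather than the paper's MIMO waveform model. The array model, the steering convention, and the function names are hypothetical.

```python
import cmath
import math


def steering(n_elems, theta):
    """ULA steering vector: half-wavelength spacing, theta in radians from broadside."""
    return [cmath.exp(1j * math.pi * n * math.sin(theta)) for n in range(n_elems)]


def beampattern(w, theta):
    """Transmit power |a(theta)^H w|^2 toward angle theta for weight vector w."""
    a = steering(len(w), theta)
    return abs(sum(ai.conjugate() * wi for ai, wi in zip(a, w))) ** 2


def deviation_cost(w, angles, desired):
    """Sum of squared differences between designed and desired beampattern samples."""
    return sum((beampattern(w, th) - d) ** 2 for th, d in zip(angles, desired))


n_elems, theta0 = 8, math.radians(20.0)
w = steering(n_elems, theta0)  # unit-modulus (constant modulus) weights steered to theta0
print(round(beampattern(w, theta0), 6))  # 64.0

# Hypothetical desired pattern: full power within 10 degrees of theta0, zero elsewhere.
angles = [math.radians(a) for a in range(-90, 91, 5)]
desired = [64.0 if abs(th - theta0) <= math.radians(10.0) else 0.0 for th in angles]
print(deviation_cost(w, angles, desired))
```

Steered constant-modulus weights reach the maximum N^2 = 64 at theta0 but cannot match a wide flat-top pattern exactly. Closing that gap while keeping constant modulus and satisfying the spectral constraint is the problem the BIC algorithm addresses.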

arXiv:1810.02812 [pdf, other]

Classifying Multi-channel UWB SAR Imagery via Tensor Sparsity Learning Techniques

Authors: Tiep Vu, Lam Nguyen, Vishal Monga

Abstract: Using low-frequency (UHF to L-band) ultra-wideband (UWB) synthetic aperture radar (SAR) technology for detecting buried and obscured targets, e.g., bombs or mines, has been successfully demonstrated recently. Despite promising recent progress, a significant open challenge is to distinguish obscured targets from other (natural and manmade) clutter sources in the scene. The problem is exacerbated in the presence of noisy responses from rough ground surfaces. In this paper, we present three novel sparsity-driven techniques, which not only exploit the subtle features of raw captured data but also take advantage of the polarization diversity and the aspect angle dependence information from multi-channel SAR data. First, the traditional sparse representation-based classification (SRC) is generalized to exploit shared information of classes and various sparsity structures of tensor coefficients for multi-channel data. Corresponding tensor dictionary learning models are then proposed to enhance classification accuracy. Lastly, a new tensor sparsity model is proposed to model responses from multiple consecutive looks of objects, which is a unique characteristic of the dataset we consider. Extensive experimental results on a high-fidelity electromagnetic simulated dataset and radar data collected from the U.S. Army Research Laboratory side-looking SAR demonstrate the advantages of the proposed tensor sparsity models.

Submitted 4 October, 2018; originally announced October 2018.
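The traditional sparse representation-based classification (SRC) that the paper generalizes works in two steps. It first computes a sparse code of the test sample over a dictionary of training atoms, then assigns the sample to the class whose atoms alone reconstruct it best. The minimal single-channel sketch below assumes the sparse code is found with a fixed number of ISTA iterations (any l1 solver could be substituted) and uses a toy dictionary. The tensor and multi-look extensions from the paper are not reproduced, and the names and data are hypothetical.

```python
import math


def ista(D, y, lam, step, iters=100):
    """Approximate argmin_x 0.5*||y - D x||^2 + lam*||x||_1 (D given as a list of rows)."""
    n_atoms = len(D[0])
    x = [0.0] * n_atoms
    for _ in range(iters):
        r = [yi - sum(D[i][j] * x[j] for j in range(n_atoms)) for i, yi in enumerate(y)]
        g = [sum(D[i][j] * r[i] for i in range(len(y))) for j in range(n_atoms)]
        x = [math.copysign(max(abs(xj + step * gj) - step * lam, 0.0), xj + step * gj)
             for xj, gj in zip(x, g)]
    return x


def src_classify(D, labels, y, lam=0.05, step=0.5):
    """Return the label whose atoms alone give the smallest reconstruction residual."""
    x = ista(D, y, lam, step)
    best, best_res = None, math.inf
    for c in sorted(set(labels)):
        xc = [xj if lab == c else 0.0 for xj, lab in zip(x, labels)]
        rec = [sum(row[j] * xc[j] for j in range(len(xc))) for row in D]
        res = math.sqrt(sum((yi - ri) ** 2 for yi, ri in zip(y, rec)))
        if res < best_res:
            best, best_res = c, res
    return best


# Columns are training atoms: two for class "target", one for class "clutter".
D = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
labels = ["target", "target", "clutter"]
print(src_classify(D, labels, [0.9, 0.4, 0.1]))  # target
```

The class-wise residual step is where the structure of the coefficients matters. The paper's tensor models are designed to exploit shared structure across channels and looks when forming those coefficients.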

arXiv:1809.03140 [pdf, other]

Deep MR Image Super-Resolution Using Structural Priors

Authors: Venkateswararao Cherukuri, Tiantong Guo, Steven J. Schiff, Vishal Monga

Abstract: High-resolution magnetic resonance (MR) images are desired for accurate diagnostics. In practice, image resolution is restricted by factors like hardware, cost and processing constraints. Recently, deep learning methods have been shown to produce compelling state-of-the-art results for image super-resolution. Paying particular attention to desired high-resolution MR image structure, we propose a new regularized network that exploits image priors, namely a low-rank structure and a sharpness prior, to enhance deep MR image super-resolution. Our contribution is to incorporate these priors in an analytically tractable fashion in the learning of a convolutional neural network (CNN) that accomplishes the super-resolution task. This is particularly challenging for the low-rank prior, since the rank is not a differentiable function of the image matrix (and hence of the network parameters), an issue we address by pursuing differentiable approximations of the rank. Sharpness is emphasized by the variance of the Laplacian, which we show can be implemented by a fixed feedback layer at the output of the network. Experiments performed on two publicly available MR brain image databases exhibit promising results, particularly when training imagery is limited.

Submitted 10 September, 2018; originally announced September 2018.

Comments: Accepted to IEEE ICIP 2018

arXiv:1807.03135 [pdf, other]

Deep Networks with Shape Priors for Nucleus Detection

Authors: Mohammad Tofighi, Tiantong Guo, Jairam K. P. Vanamala, Vishal Monga

Abstract: Detection of cell nuclei in microscopic images is a challenging research topic because of limitations in cellular image quality and the diversity of nuclear morphology, i.e., varying nuclei shapes, sizes, and overlaps between multiple cell nuclei. This has been a topic of enduring interest, with promising recent success shown by deep learning methods. These methods train, for example, convolutional neural networks (CNNs) with a training set of input images and known, labeled nuclei locations. Many of these methods are supplemented by spatial or morphological processing. We develop a new approach that we call Shape Priors with Convolutional Neural Networks (SP-CNN) to perform significantly enhanced nuclei detection. A set of canonical shapes is prepared with the help of a domain expert. Subsequently, we present a new network structure that can incorporate 'expected behavior' of nucleus shapes via two components: learnable layers that perform the nucleus detection and a fixed processing part that guides the learning with prior information. Analytically, we formulate a new regularization term that is targeted at penalizing false positives while simultaneously encouraging detection inside the cell nucleus boundary. Experimental results on a challenging dataset reveal that SP-CNN is competitive with or outperforms several state-of-the-art methods.

Submitted 29 June, 2018; originally announced July 2018.

Comments: Accepted paper to 2018 IEEE International Conference on Image Processing (ICIP 2018)

arXiv:1803.07220 [pdf, other]

Collaborative Sparse Priors for Infrared Image Multi-view ATR

Authors: Xuelu Li, Vishal Monga

Abstract: Feature extraction from infrared (IR) images remains a challenging task. Learning-based methods that can work on raw imagery/patches have therefore assumed significance. We propose a novel multi-task extension of the widely used sparse-representation-classification (SRC) method in both single- and multi-view set-ups. That is, the test sample could be a single IR image or images from different views. When expanded in terms of a training dictionary, the coefficient matrix in a multi-view scenario admits a sparse structure that is not easily captured by traditional sparsity-inducing measures such as the $l_0$-row pseudo norm. To that end, we employ collaborative spike and slab priors on the coefficient matrix, which can capture fairly general sparse structures. Our work involves joint parameter and sparse coefficient estimation (JPCEM), which alleviates the need to handpick prior parameters before classification. The experimental merits of JPCEM are substantiated through comparisons with other state-of-the-art methods on a challenging mid-wave IR (MWIR) image ATR database made available by the US Army Night Vision and Electronic Sensors Directorate.

Submitted 3 May, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

Comments: 4 pages, 3 figures, conference paper
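To make the underlying sparse-representation-classification (SRC) idea concrete, here is a minimal single-view sketch in Python with NumPy. It is not the JPCEM method: orthogonal matching pursuit is swapped in for the collaborative spike-and-slab inference, and the function names `omp` and `src_classify` are hypothetical. A test vector is coded over a dictionary whose columns are labeled training samples, then assigned to the class whose atoms reconstruct it with the smallest residual.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: greedily select up to k columns (atoms) of D to approximate y."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Refit all selected atoms jointly by least squares
        coef = np.linalg.lstsq(D[:, support], y, rcond=None)[0]
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def src_classify(D, labels, y, k=5):
    """Return the label whose atoms (with their coefficients) leave the smallest residual."""
    x = omp(D, y, k)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - D[:, labels == c] @ x[labels == c]) for c in classes]
    return classes[int(np.argmin(residuals))]

# Toy dictionary: 20 unit-norm training samples, 10 per class
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 20))
D /= np.linalg.norm(D, axis=0)
labels = np.array([0] * 10 + [1] * 10)
pred = src_classify(D, labels, D[:, 3])  # test sample equal to a class-0 training column
```

In the multi-view setting described above, the single coefficient vector would become a coefficient matrix with one column per view.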

arXiv:1802.02721 [pdf, other]

Deep Image Super Resolution via Natural Image Priors

Authors: Hojjat S. Mousavi, Tiantong Guo, Vishal Monga

Abstract: Single image super-resolution (SR) via deep learning has recently gained significant attention in the literature. Convolutional neural networks (CNNs) are typically learned to represent the mapping between low-resolution (LR) and high-resolution (HR) images/patches with the help of training examples. Most existing deep networks for SR produce high-quality results when training data is abundant. However, their performance degrades sharply when training is limited. We propose to regularize deep structures with prior knowledge about the images so that they can capture more structural information from the same limited data. In particular, we incorporate, in a tractable fashion within the CNN framework, natural image priors, which have recently shown much success in imaging and vision inverse problems. Experimental results show that the proposed deep network with natural image priors is particularly effective in training-starved regimes.

Submitted 8 February, 2018; originally announced February 2018.

arXiv:1802.02018 [pdf, other]

Orthogonally Regularized Deep Networks For Image Super-resolution

Authors: Tiantong Guo, Hojjat S. Mousavi, Vishal Monga

Abstract: Deep learning methods, in particular trained Convolutional Neural Networks (CNNs), have recently been shown to produce compelling state-of-the-art results for single image Super-Resolution (SR). Invariably, a CNN is learned to map the low resolution (LR) image to its corresponding high resolution (HR) version in the spatial domain. Aiming for faster inference and more efficient solutions than solving the SR problem in the spatial domain, we propose a novel network structure for learning the SR mapping function in an image transform domain, specifically the Discrete Cosine Transform (DCT). As a first contribution, we show that DCT can be integrated into the network structure as a Convolutional DCT (CDCT) layer. We further extend the network to allow the CDCT layer to become trainable (i.e., optimizable). Because this layer represents an image transform, we enforce pairwise orthogonality constraints on the individual basis functions/filters. This Orthogonally Regularized Deep SR network (ORDSR) simplifies the SR task by taking advantage of the image transform domain while adapting the design of the transform basis to the training image set.

Submitted 6 February, 2018; originally announced February 2018.
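The claim that a DCT can act as a convolutional layer, with orthogonality of its filters measured as a penalty, can be illustrated with a short NumPy sketch. This is a hypothetical illustration, not the ORDSR code: `dct_filters` builds the n x n orthonormal 2-D DCT-II basis as a filter bank, `dct_coeffs` takes the inner product of one n x n patch with each filter (what a stride-n convolution would output at one location), and `orthogonality_penalty` evaluates ||W W^T - I||_F^2, one common way to encourage pairwise orthogonality once the filters become trainable.

```python
import numpy as np

def dct_filters(n):
    """Return the n*n orthonormal 2-D DCT-II basis functions as an (n*n, n, n) filter bank."""
    k = np.arange(n)
    # 1-D orthonormal DCT-II matrix: C[u, x] = a(u) * cos(pi * (2x + 1) * u / (2n))
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] *= np.sqrt(1.0 / n)
    C[1:, :] *= np.sqrt(2.0 / n)
    # Each 2-D basis function is an outer product of two 1-D basis vectors
    return np.einsum('ux,vy->uvxy', C, C).reshape(n * n, n, n)

def dct_coeffs(patch, filters):
    """Inner product of an n x n patch with every filter (one output channel per filter)."""
    return np.tensordot(filters, patch, axes=([1, 2], [0, 1]))

def orthogonality_penalty(filters):
    """Squared Frobenius distance between the filters' Gram matrix and the identity."""
    W = filters.reshape(filters.shape[0], -1)
    G = W @ W.T
    return float(np.sum((G - np.eye(W.shape[0])) ** 2))

n = 8
filters = dct_filters(n)
patch = np.random.default_rng(0).random((n, n))
coeffs = dct_coeffs(patch, filters)
recon = np.tensordot(coeffs, filters, axes=1)  # inverse transform: weighted sum of basis filters
```

Because the fixed DCT filters are orthonormal, the penalty is zero at initialization and the patch is recovered exactly from its coefficients; during training the penalty would grow only as the learned filters drift from orthogonality.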

arXiv:1801.05458 [pdf, other]

Deep Network for Simultaneous Decomposition and Classification in UWB-SAR Imagery

Authors: Tiep Vu, Lam Nguyen, Tiantong Guo, Vishal Monga

Abstract: Classifying buried and obscured targets of interest from other natural and manmade clutter objects in the scene is an important problem for the U.S. Army. Targets of interest are often represented by signals captured using low-frequency (UHF to L-band) ultra-wideband (UWB) synthetic aperture radar (SAR) technology. This technology has been used in various applications, including ground penetration and sensing-through-the-wall. However, the technology still faces significant issues: low-resolution SAR imagery in this particular frequency band, low radar cross sections (RCS), small objects compared to radar signal wavelengths, and heavy interference. The classification problem was first, and partially, addressed by the sparse representation-based classification (SRC) method, which can extract noise from signals and exploit cross-channel information. Despite providing promising results, SRC-related methods have drawbacks in representing nonlinear relations and dealing with larger training sets. In this paper, we propose a Simultaneous Decomposition and Classification Network (SDCN) to alleviate noise interference and enhance classification accuracy. The network contains two jointly trained sub-networks: the decomposition sub-network handles denoising, while the classification sub-network discriminates targets from confusers. Experimental results show significant improvements over a network without decomposition and over SRC-related methods.

Submitted 22 February, 2018; v1 submitted 16 January, 2018; originally announced January 2018.

arXiv:1801.02548 [pdf, other]

Bridging the Gap: Simultaneous Fine Tuning for Data Re-Balancing

Authors: John McKay, Isaac Gerg, Vishal Monga

Abstract: There are many real-world classification problems wherein the issue of data imbalance (the case when a data set contains substantially more samples for one/many classes than the rest) is unavoidable. While under-sampling the problematic classes is a common solution, this is not a compelling option when the large data class is itself diverse and/or the limited data class is especially small. We suggest a strategy based on recent work concerning limited data problems which utilizes a supplemental set of images with similar properties to the limited data class to aid in the training of a neural network. We show results for our model against other typical methods on a real-world synthetic aperture sonar data set. Code can be found at github.com/JohnMcKay/dataImbalance.

Submitted 8 January, 2018; originally announced January 2018.

Comments: Submitted to IGARSS 2018, 4 Pages, 8 Figures

arXiv:1712.08227 [pdf, other]

Analysis-synthesis model learning with shared features: a new framework for histopathological image classification

Authors: Xuelu Li, Vishal Monga, U. K. Arvind Rao

Abstract: Automated histopathological image analysis offers exciting opportunities for the early diagnosis of several medical conditions, including cancer. There are, however, stiff practical challenges: 1) discriminative features for separating diseased vs. healthy classes are not readily apparent in such images, and 2) distinct classes, e.g., healthy vs. stages of disease, continue to share several geometric features. We propose a novel Analysis-synthesis model Learning with Shared Features algorithm (ALSF) for classifying such images more effectively. In ALSF, a joint analysis and synthesis learning model is introduced to learn the classifier and the feature extractor at the same time. In this way, the computational load in patch-level image classification can be much reduced. Crucially, we integrate into this framework the learning of a low-rank shared dictionary and a shared analysis operator, which more accurately represents both similarities and differences in histopathological images from distinct classes. ALSF is evaluated on two challenging databases: (1) kidney tissue images provided by the Animal Diagnosis Lab (ADL) at the Pennsylvania State University and (2) brain tumor images from The Cancer Genome Atlas (TCGA) database. Experimental results confirm that ALSF can offer benefits over state-of-the-art alternatives.

Submitted 21 December, 2017; originally announced December 2017.

Comments: 2018 ISBI conference accepted paper

arXiv:1712.03993 [pdf, other]

Learning Based Segmentation of CT Brain Images: Application to Post-Operative Hydrocephalic Scans

Authors: Venkateswararao Cherukuri, Peter Ssenyonga, Benjamin C. Warf, Abhaya V. Kulkarni, Vishal Monga, Steven J. Schiff

Abstract: Objective: Hydrocephalus is a medical condition in which there is an abnormal accumulation of cerebrospinal fluid (CSF) in the brain. Segmentation of brain imagery into brain tissue and CSF (before and after surgery, i.e., pre-op vs. post-op) plays a crucial role in evaluating surgical treatment. Segmentation of pre-op images is often a relatively straightforward problem and has been well researched. However, segmenting post-operative (post-op) computed tomography (CT) scans becomes more challenging due to distorted anatomy and subdural hematoma collections pressing on the brain. Most intensity- and feature-based segmentation methods fail to separate subdurals from brain and CSF, as subdural geometry varies greatly across patients and their intensity varies with time. We combat this problem with a learning approach that treats segmentation as supervised classification at the pixel level, i.e., a training set of CT scans with labeled pixel identities is employed. Methods: Our contributions include: 1) a dictionary learning framework that learns class (segment) specific dictionaries that can efficiently represent test samples from the same class while poorly representing corresponding samples from other classes, 2) quantification of the associated computation and memory footprint, and 3) a customized training and test procedure for segmenting post-op hydrocephalic CT images.
Results: Experiments performed on infant CT brain images acquired from the CURE Children's Hospital of Uganda reveal the success of our method against state-of-the-art alternatives. We also demonstrate that the proposed algorithm is computationally less burdensome and degrades gracefully as the number of training samples is reduced, enhancing its deployment potential.

Submitted 11 December, 2017; originally announced December 2017.

Comments: IEEE Transactions on Biomedical Engineering, 2018

Journal ref: IEEE Transactions on Biomedical Engineering 65.8 (2018): 1871-1884

arXiv:1712.01937 [pdf, other]

doi 10.1109/LSP.2017.2782570

Blind Image Deblurring Using Row-Column Sparse Representations

Authors: Mohammad Tofighi, Yuelong Li, Vishal Monga

Abstract: Blind image deblurring is a particularly challenging inverse problem where the blur kernel is unknown and must be estimated en route to recovering the deblurred image. The problem is of strong practical relevance since many imaging devices, such as cellphone cameras, must rely on deblurring algorithms to yield satisfactory image quality. Despite significant research effort, handling large motions remains an open problem. In this paper, we develop a new method called Blind Image Deblurring using Row-Column Sparsity (BD-RCS) to address this issue. Specifically, we model the outer product of kernel and image coefficients in certain transformation domains as a rank-one matrix, and recover it by solving a rank minimization problem. Our central contribution then includes solving two new optimization problems involving row and column sparsity to automatically determine blur kernel and image support sequentially. The kernel and image can then be recovered through a singular value decomposition (SVD). Experimental results on linear motion deblurring demonstrate that BD-RCS can yield better results than the state of the art, particularly when the blur is caused by large motion. This is confirmed both visually and through quantitative measures.

Submitted 5 December, 2017; originally announced December 2017.

Comments: Accepted to IEEE Signal Processing Letters, December 2017
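The final step described above, recovering kernel and image from a rank-one matrix via the SVD, can be sketched in a few lines of NumPy. This covers only the factorization step; the row/column sparsity problems and transform domains of BD-RCS are not reproduced. The assumption that the kernel sums to one is introduced here to resolve the scale and sign ambiguity inherent in any rank-one factorization, and the function names are hypothetical.

```python
import numpy as np

def rank_one_factors(M):
    """Leading singular pair of M, split so that M is approximately outer(u, v)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    scale = np.sqrt(s[0])
    return U[:, 0] * scale, Vt[0] * scale

def split_kernel_and_image(M):
    """Fix the scale/sign ambiguity by assuming (hypothetically) that the kernel sums to one."""
    u, v = rank_one_factors(M)
    c = u.sum()
    return u / c, v * c  # outer(u / c, v * c) equals outer(u, v)

rng = np.random.default_rng(0)
kernel = rng.random(7)
kernel /= kernel.sum()                 # nonnegative blur kernel summing to one
image_coeffs = rng.standard_normal(40)
M = np.outer(kernel, image_coeffs)     # noise-free rank-one matrix
k_hat, x_hat = split_kernel_and_image(M)
```

With a noisy M, the leading singular pair is the best rank-one approximation in the least-squares sense, so the same split yields an estimate rather than an exact recovery.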

arXiv:1707.02336 [pdf, other]

Fast Stochastic Hierarchical Bayesian MAP for Tomographic Imaging

Authors: John McKay, Raghu G. Raj, Vishal Monga

Abstract: Any image recovery algorithm attempts to achieve the highest quality reconstruction in a timely manner. The former can be achieved in several ways, including by incorporating Bayesian priors that exploit natural image tendencies to cue in on relevant phenomena. The Hierarchical Bayesian MAP (HB-MAP) is one such approach, which is known to produce compelling results, albeit at a substantial computational cost. We look to provide further analysis and insights into what makes the HB-MAP work. While retaining the proficient nature of HB-MAP's Type-I estimation, we propose a stochastic approximation-based approach to Type-II estimation. The resulting algorithm, fast stochastic HB-MAP (fsHBMAP), takes dramatically fewer operations while retaining high reconstruction quality. We employ our fsHBMAP scheme towards the problem of tomographic imaging and demonstrate that fsHBMAP furnishes promising results when compared to many competing methods.

Submitted 7 July, 2017; originally announced July 2017.

Comments: 5 Pages, 4 Figures, Conference (Accepted to Asilomar 2017)

arXiv:1706.09858 [pdf, other]

What's Mine is Yours: Pretrained CNNs for Limited Training Sonar ATR

Authors: John McKay, Isaac Gerg, Vishal Monga, Raghu Raj

Abstract: Finding mines in Sonar imagery is a significant problem with a great deal of relevance for seafaring military and commercial endeavors. Unfortunately, the lack of enormous Sonar image data sets has kept automatic target recognition (ATR) algorithms from benefiting from some of the advances seen in other computer vision fields. Namely, the boom in convolutional neural nets (CNNs), which have been able to achieve incredible results, even surpassing human performance, has not been an easily feasible route for many practitioners of Sonar ATR. We demonstrate the power of one avenue for incorporating CNNs into Sonar ATR: transfer learning. We first show how a straightforward, flexible CNN feature-extraction strategy can be used to obtain impressive, if not state-of-the-art, results. Secondly, we propose a way to utilize the powerful transfer learning approach towards multiple instance target detection and identification within a provided synthetic aperture Sonar data set.

Submitted 29 June, 2017; originally announced June 2017.

Comments: Accepted to OCEANS 2017 - Anchorage (Conference)

arXiv:1706.08590 [pdf, other]

doi 10.1109/TGRS.2017.2710040

Robust Sonar ATR Through Bayesian Pose Corrected Sparse Classification

Authors: John McKay, Vishal Monga, Raghu G. Raj

Abstract: Sonar imaging has seen vast improvements over the last few decades due in part to advances in synthetic aperture Sonar (SAS). Sophisticated classification techniques can now be used in Sonar automatic target recognition (ATR) to locate mines and other threatening objects. Among the most promising of these methods is sparse reconstruction-based classification (SRC), which has shown an impressive resiliency to noise, blur, and occlusion. We present a coherent strategy for expanding upon SRC for Sonar ATR that retains SRC's robustness while also being able to handle targets with diverse geometric arrangements, bothersome Rayleigh noise, and unavoidable background clutter. Our method, pose corrected sparsity (PCS), incorporates a novel interpretation of a spike and slab probability distribution towards use as a Bayesian prior for class-specific discrimination in combination with a dictionary learning scheme for localized patch extractions. Additionally, PCS offers the potential for anomaly detection in order to avoid false identifications of tested objects from outside the training set with no additional training required. Compelling results are shown using a database provided by the United States Naval Surface Warfare Center.

Submitted 26 June, 2017; originally announced June 2017.

Comments: 14 Pages, 16 Figures, Accepted to IEEE TGRS

arXiv:1706.08575 [pdf, other]

Using Frame Theoretic Convolutional Gridding for Robust Synthetic Aperture Sonar Imaging

Authors: John McKay, Anne Gelb, Vishal Monga, Raghu Raj

Abstract: Recent progress in synthetic aperture sonar (SAS) technology and processing has led to significant advances in underwater imaging, outperforming previously common approaches in both accuracy and efficiency. There are, however, inherent limitations to current SAS reconstruction methodology. In particular, popular and efficient Fourier domain SAS methods require a 2D interpolation which is often ill-conditioned and inaccurate, inevitably reducing robustness with regard to speckle and inaccurate sound-speed estimation. To overcome these issues, we propose using the frame theoretic convolution gridding (FTCG) algorithm to handle the non-uniform Fourier data. FTCG extends non-uniform fast Fourier transform (NUFFT) algorithms by casting the NUFFT as an approximation problem given Fourier frame data. The FTCG has been shown to yield improved accuracy at little additional computational cost. Using simulated data, we outline how the FTCG can be used to enhance current SAS processing.

Submitted 26 June, 2017; originally announced June 2017.

Comments: Accepted to OCEANS 2017 - Anchorage (Conference)

arXiv:1612.02761 [pdf, other]

doi 10.1109/TIP.2016.2642790

A Maximum A Posteriori Estimation Framework for Robust High Dynamic Range Video Synthesis

Authors: Yuelong Li, Chul Lee, Vishal Monga

Abstract: High dynamic range (HDR) image synthesis from multiple low dynamic range (LDR) exposures continues to be actively researched. The extension to HDR video synthesis is a topic of significant current interest due to potential cost benefits. For HDR video, a stiff practical challenge presents itself in the form of accurate correspondence estimation of objects between video frames. In particular, loss of data resulting from poor exposures and varying intensity makes conventional optical flow methods highly inaccurate. We avoid exact correspondence estimation by proposing a statistical approach via maximum a posteriori (MAP) estimation, and under appropriate statistical assumptions and choice of priors and models, we reduce it to an optimization problem of solving for the foreground and background of the target frame. We obtain the background through rank minimization and estimate the foreground via a novel multiscale adaptive kernel regression technique, which implicitly captures local structure and temporal motion by solving an unconstrained optimization problem. Extensive experimental results on both real and synthetic datasets demonstrate that our algorithm is more capable of delivering high-quality HDR videos than current state-of-the-art methods, under both subjective and objective assessments. Furthermore, a thorough complexity analysis reveals that our algorithm achieves a better complexity-performance trade-off than conventional methods.

Submitted 8 December, 2016; originally announced December 2016.
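Rank minimization for a static background is often relaxed to nuclear-norm minimization, whose proximal step is singular value thresholding (SVT). The NumPy sketch below shows only that generic building block, under the assumption that vectorized, aligned frames are stacked as columns; it is not the paper's MAP formulation, and `svt` is a hypothetical helper name. Shrinking the singular values removes the weaker component contributed by a moving object, leaving a rank-one, background-like estimate.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)  # soft-threshold the singular values
    return (U * s_shrunk) @ Vt           # scale columns of U, then recombine

rng = np.random.default_rng(0)
background = rng.random(64)                       # one vectorized static scene
frames = np.tile(background[:, None], (1, 5))     # 5 identical frames as columns
frames[10:14, 2] += 1.0                           # a small moving object in frame 2
s = np.linalg.svd(frames, compute_uv=False)
low_rank = svt(frames, tau=(s[0] + s[1]) / 2)     # threshold between the top two singular values
```

Here the threshold is chosen by hand from the singular values; an iterative rank-minimization solver would instead apply this step repeatedly inside a larger optimization.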

arXiv:1610.08606 [pdf, other]

doi 10.1109/TIP.2017.2729885

Fast Low-rank Shared Dictionary Learning for Image Classification

Authors: Tiep Vu, Vishal Monga

Abstract: Despite the fact that different objects possess distinct class-specific features, they also usually share common patterns. This observation has been exploited partially in a recently proposed dictionary learning framework by separating the particularity and the commonality (COPAR). Inspired by this, we propose a novel method to explicitly and simultaneously learn a set of common patterns as well as class-specific features for classification with more intuitive constraints. Our dictionary learning framework is hence characterized by both a shared dictionary and particular (class-specific) dictionaries. For the shared dictionary, we enforce a low-rank constraint, i.e., we require that its spanning subspace have low dimension and that the coefficients corresponding to this dictionary be similar. For the particular dictionaries, we impose on them the well-known constraints stated in Fisher discrimination dictionary learning (FDDL). Further, we develop new fast and accurate algorithms to solve the subproblems in the learning step, accelerating its convergence. These algorithms could also be applied to FDDL and its extensions. The efficiency of these algorithms is theoretically and experimentally verified by comparing their complexities and running times with those of other well-known dictionary learning methods. Experimental results on widely used image datasets establish the advantages of our method over state-of-the-art dictionary learning methods.

Submitted 15 July, 2017; v1 submitted 26 October, 2016; originally announced October 2016.

Comments: Accepted version

arXiv:1610.08495 [pdf, other]

Adaptive matching pursuit for sparse signal recovery

Authors: Tiep H. Vu, Hojjat S. Mousavi, Vishal Monga

Abstract: Spike and Slab priors have been of much recent interest in signal processing as a means of inducing sparsity in Bayesian inference. Application domains that benefit from the use of these priors include sparse recovery, regression and classification. It is well known that solving for the sparse coefficient vector to maximize these priors results in a hard non-convex and mixed integer programming problem. Most existing solutions to this optimization problem either involve simplifying assumptions/relaxations or are computationally expensive. We propose a new greedy and adaptive matching pursuit (AMP) algorithm to directly solve this hard problem. Essentially, in each step of the algorithm, the set of active elements is updated by either adding or removing one index, whichever results in the greater improvement. In addition, the intermediate steps of the algorithm are calculated via an inexpensive Cholesky decomposition, which makes the algorithm much faster. Results on simulated data sets as well as real-world image recovery challenges confirm the benefits of the proposed AMP, particularly in providing a superior cost-quality trade-off over existing alternatives.

Submitted 12 September, 2016; originally announced October 2016.

Comments: ICASSP
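The add-or-remove update described above can be sketched as a greedy search over supports. This is an illustrative NumPy sketch, not the published AMP: an l0-penalized least-squares cost stands in for the spike-and-slab objective, and each candidate support is refit with `np.linalg.lstsq` instead of the incremental Cholesky updates the abstract mentions. The names `support_cost` and `add_remove_pursuit` and the penalty weight `lam` are hypothetical.

```python
import numpy as np

def support_cost(A, y, support, lam):
    """Least-squares residual energy on a support plus lam per active index (stand-in objective)."""
    if not support:
        return float(y @ y)
    As = A[:, sorted(support)]
    coef = np.linalg.lstsq(As, y, rcond=None)[0]
    r = y - As @ coef
    return float(r @ r) + lam * len(support)

def add_remove_pursuit(A, y, lam=0.1, max_iter=100):
    """Each step adds or removes the single index that lowers the cost the most; stop when none helps."""
    support = set()
    best = support_cost(A, y, support, lam)
    for _ in range(max_iter):
        moves = [support | {j} for j in range(A.shape[1]) if j not in support]
        moves += [support - {j} for j in support]
        costs = [support_cost(A, y, s, lam) for s in moves]
        i = int(np.argmin(costs))
        if costs[i] >= best:
            break  # no single add/remove improves the cost
        support, best = moves[i], costs[i]
    x = np.zeros(A.shape[1])
    if support:
        idx = sorted(support)
        x[idx] = np.linalg.lstsq(A[:, idx], y, rcond=None)[0]
    return x

rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.standard_normal((30, 10)))  # 10 orthonormal columns
x_true = np.zeros(10)
x_true[[1, 4, 7]] = [1.5, -2.0, 1.0]
x_hat = add_remove_pursuit(A, A @ x_true, lam=0.1)
```

With orthonormal columns, as in the example, adding a true index lowers the cost by its squared coefficient minus `lam`, so the search recovers the support exactly; with general dictionaries a greedy search carries no such guarantee.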

arXiv:1610.01066 [pdf, other]

doi 10.1109/TIP.2017.2704443

Sparsity-based Color Image Super Resolution via Exploiting Cross Channel Constraints

Authors: Hojjat S. Mousavi, Vishal Monga

Abstract: Sparsity-constrained single image super-resolution (SR) has been of much recent interest. A typical approach involves sparsely representing patches in a low-resolution (LR) input image via a dictionary of example LR patches, and then using the coefficients of this representation to generate the high-resolution (HR) output via an analogous HR dictionary. However, most existing sparse representation methods for super-resolution focus on the luminance channel information and do not capture interactions between color channels. In this work, we extend sparsity-based super-resolution to multiple color channels by taking color information into account. Edge similarities amongst RGB color bands are exploited as cross-channel correlation constraints. These additional constraints lead to a new optimization problem which is not easily solvable; however, a tractable solution is proposed to solve it efficiently. Moreover, to fully exploit the complementary information among color channels, a dictionary learning method is also proposed specifically to learn color dictionaries that encourage edge similarities. Merits of the proposed method over the state of the art are demonstrated both visually and quantitatively using image quality metrics.

Submitted 4 October, 2016; originally announced October 2016.

arXiv:1602.05540 [pdf, other]

Robust Covariance Estimation under Imperfect Constraints using an Expected Likelihood Approach

Authors: Bosung Kang, Vishal Monga, Muralidhar Rangaswamy, Yuri I. Abramovich

Abstract: We address the problem of structured covariance matrix estimation for radar space-time adaptive processing (STAP). A priori knowledge of the interference environment has been exploited in many previous works to enable accurate estimators even when training is not generous. Specifically, recent work has shown that employing practical constraints such as the rank of the clutter subspace and the condition number of the disturbance covariance leads to powerful estimators that have closed-form solutions. While rank and the condition number are very effective constraints, practical non-idealities often make it difficult to know them precisely using physical models. Therefore, we propose a robust covariance estimation method for radar STAP via an expected likelihood (EL) approach. We analyze covariance estimation algorithms under three cases of imperfect constraints: 1) a rank constraint, 2) both rank and noise power constraints, and 3) a condition number constraint. In each case, we formulate precise constraint determination as an optimization problem using the EL criterion. For each of the three cases, we derive new analytical results which allow for computationally efficient, practical ways of setting these constraints. In particular, we prove formally that both the rank and condition number as determined by the EL criterion are unique. Through experimental results from a simulation model and the KASSPER data set, we show that the estimator with optimal constraints obtained by the EL approach outperforms state-of-the-art alternatives.

Submitted 15 February, 2016; originally announced February 2016.

Comments: arXiv admin note: substantial text overlap with arXiv:1602.05069
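The expected likelihood idea for the rank constraint can be illustrated with a small sketch. It assumes, as a simplification, that the noise power is known and normalized to 1, so a rank-r estimate keeps the sample-covariance eigenvectors and sets its eigenvalues to max(d_i, 1) for the top r modes and to 1 otherwise. Because such an estimate R shares eigenvectors with the sample covariance S (from K training snapshots, size M), the likelihood ratio LR(R) = [det(R^-1 S) e^M / e^tr(R^-1 S)]^K reduces to a product over eigenvalues. The parameter `lr_target` stands in for the median of the LR distribution under the true covariance, which would have to be obtained separately (for example by Monte Carlo). All function names are hypothetical; this is not the paper's code.

```python
import math

# Hypothetical sketch: expected-likelihood (EL) rank selection for a
# rank-constrained covariance estimate with noise power normalized to 1.


def rank_constrained_eigs(d, r):
    """Eigenvalues of the rank-r estimate; d = sample-covariance eigenvalues, descending."""
    return [max(di, 1.0) if i < r else 1.0 for i, di in enumerate(d)]


def likelihood_ratio(d, lam, K):
    """LR = [det(R^-1 S) e^M / e^tr(R^-1 S)]^K for R sharing eigenvectors with S.

    With shared eigenvectors, R^-1 S has eigenvalues x_i = d_i / lam_i, so
    log LR = K * sum(log(x_i) + 1 - x_i).
    """
    log_lr = sum(math.log(di / li) + 1.0 - di / li for di, li in zip(d, lam))
    return math.exp(K * log_lr)


def el_rank(d, K, lr_target):
    """Return the rank whose LR is closest to lr_target, plus the LR for every rank."""
    M = len(d)
    lrs = [likelihood_ratio(d, rank_constrained_eigs(d, r), K) for r in range(M + 1)]
    best = min(range(M + 1), key=lambda r: abs(lrs[r] - lr_target))
    return best, lrs


# Toy example: K = 10 training snapshots, M = 5 eigenvalues (made-up numbers).
d = [10.0, 5.0, 1.2, 0.9, 0.8]
rank, lrs = el_rank(d, K=10, lr_target=0.6)
print(rank)  # 2
```

Since each factor x e^(1-x) is at most 1, the LR never exceeds 1 and does not decrease as r grows. The EL rule therefore does not simply pick the best-fitting rank; it picks the rank whose fit to the data is about as good as the true covariance would typically achieve.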

arXiv:1602.00310 [pdf, other]

Learning a low-rank shared dictionary for object classification

Authors: Tiep H. Vu, Vishal Monga

Abstract: Despite the fact that different objects possess distinct class-specific features, they also usually share common patterns. Inspired by this observation, we propose a novel method to explicitly and simultaneously learn a set of common patterns as well as class-specific features for classification. Our dictionary learning framework is hence characterized by both a shared dictionary and particular (class-specific) dictionaries. For the shared dictionary, we enforce a low-rank constraint, i.e., we require that its spanning subspace have low dimension and that the coefficients corresponding to this dictionary be similar. For the particular dictionaries, we impose the well-known constraints of Fisher discrimination dictionary learning (FDDL). Further, we propose a new fast and accurate algorithm to solve the sparse coding problems in the learning step, accelerating its convergence. This algorithm can also be applied to FDDL and its extensions. Experimental results on widely used image databases establish the advantages of our method over state-of-the-art dictionary learning methods.

Submitted 17 May, 2016; v1 submitted 31 January, 2016; originally announced February 2016.

Comments: 4 pages + 1 reference page
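The FDDL constraints mentioned in this abstract include a Fisher discrimination term on the sparse coefficients: the trace of the within-class scatter minus the trace of the between-class scatter, plus an eta * ||X||_F^2 term that FDDL adds for convexity. The sketch below evaluates that term, together with a simple penalty on how far shared-dictionary coefficients stray from their mean, which is one possible way to express "the coefficients corresponding to this dictionary should be similar". It is a hypothetical illustration that only evaluates the penalties; it does not learn dictionaries, enforce the low-rank constraint, or reproduce the paper's fast sparse coding algorithm, and the function names are assumptions.

```python
# Hypothetical sketch of two coefficient penalties described in the abstract:
# an FDDL-style Fisher discrimination term on class-specific codes and a
# "similar coefficients" penalty on shared-dictionary codes.


def _mean(vectors):
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fisher_term(codes, labels, eta=1.0):
    """tr(S_W) - tr(S_B) + eta * ||X||_F^2, with codes given as a list of vectors."""
    overall = _mean(codes)
    within = between = 0.0
    for c in set(labels):
        members = [x for x, lab in zip(codes, labels) if lab == c]
        mc = _mean(members)
        within += sum(_sqdist(x, mc) for x in members)  # trace of within-class scatter
        between += len(members) * _sqdist(mc, overall)  # trace of between-class scatter
    frob = sum(v * v for x in codes for v in x)
    return within - between + eta * frob


def shared_similarity(shared_codes):
    """Sum of squared distances of shared-dictionary codes from their mean."""
    m = _mean(shared_codes)
    return sum(_sqdist(x, m) for x in shared_codes)


labels = [0, 0, 1, 1]
separated = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
mixed = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(fisher_term(separated, labels, eta=0.0))  # -2.0
print(fisher_term(mixed, labels, eta=0.0))      # 2.0
```

Codes that cluster by class score lower than codes that mix classes, which is the behavior the Fisher term rewards when it is minimized during learning.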

arXiv:1601.03323 [pdf, other]

Localized Dictionary design for Geometrically Robust Sonar ATR

Authors: John McKay, Vishal Monga, Raghu Raj

Abstract: Advancements in sonar image capture have opened the door to powerful classification schemes for automatic target recognition (ATR). Recent work has particularly seen the application of sparse reconstruction-based classification (SRC) to sonar ATR, which provides compelling accuracy rates even in the presence of noise and blur. Existing sparsity-based sonar ATR techniques, however, assume that the test images exhibit a geometric pose that is consistent with the training set. This work addresses the outstanding open challenge of handling test sonar images whose pose is inconsistent with that of the training images. We develop a new localized block-based dictionary design that enables geometric (i.e., pose) robustness. Further, a dictionary learning method is incorporated to increase performance and efficiency. The proposed SRC with Localized Pose Management (LPM) is shown to outperform the state-of-the-art SIFT feature and SVM approach, due to its power to discern background clutter in sonar images.

Submitted 13 January, 2016; originally announced January 2016.

Comments: Submitted to IGARSS 2016
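The baseline SRC decision rule that this work extends can be sketched as follows: sparsely code the test vector over all training atoms, then assign the class whose atoms alone reconstruct it with the smallest residual. The sketch assumes an l1-regularized least-squares code computed with plain ISTA (a stand-in chosen for brevity, not necessarily the solver used in the paper), unit-normalized atoms, and a step size of 1/||D||_F^2, which is conservative but valid because ||D||_2^2 <= ||D||_F^2. It does not implement the localized block-based dictionaries or the pose management of LPM, and all function names are hypothetical.

```python
import math

# Hypothetical sketch of the baseline SRC decision rule (not the LPM method):
# sparse-code y over all training atoms with ISTA, then pick the class whose
# atoms alone give the smallest reconstruction residual.


def _normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]


def ista_lasso(atoms, y, lam=0.01, iters=500):
    """Approximately minimize 0.5*||y - D x||^2 + lam*||x||_1; atoms are the columns of D."""
    step = 1.0 / sum(a * a for col in atoms for a in col)  # 1/||D||_F^2 <= 1/||D||_2^2
    x = [0.0] * len(atoms)
    for _ in range(iters):
        r = list(y)  # residual y - D x
        for xj, col in zip(x, atoms):
            for i, a in enumerate(col):
                r[i] -= a * xj
        # Gradient step on the data term, then soft-thresholding for the l1 term.
        z = [xj + step * sum(a * ri for a, ri in zip(col, r)) for xj, col in zip(x, atoms)]
        x = [math.copysign(max(abs(zj) - step * lam, 0.0), zj) for zj in z]
    return x


def src_classify(atoms, labels, y, lam=0.01, iters=500):
    """Return the predicted label and the per-class residuals."""
    atoms = [_normalize(col) for col in atoms]
    x = ista_lasso(atoms, y, lam, iters)
    residuals = {}
    for c in sorted(set(labels)):
        approx = [0.0] * len(y)
        for xj, col, lab in zip(x, atoms, labels):
            if lab == c:  # keep only this class's coefficients
                for i, a in enumerate(col):
                    approx[i] += a * xj
        residuals[c] = math.sqrt(sum((yi - ai) ** 2 for yi, ai in zip(y, approx)))
    return min(residuals, key=residuals.get), residuals


# Toy 2-D example with two training atoms per class (made-up data).
atoms = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["A", "A", "B", "B"]
label, res = src_classify(atoms, labels, [1.0, 0.05])
print(label)  # A
```

The abstract's point is that this rule implicitly assumes test and training images share a pose; when they do not, no single class's atoms reconstruct the test image well, which is the gap the localized dictionaries aim to close.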
