-
GANs Settle Scores!
Authors:
Siddarth Asokan,
Nishanth Shetty,
Aadithya Srikanth,
Chandra Sekhar Seelamantula
Abstract:
Generative adversarial networks (GANs) comprise a generator, trained to learn the underlying distribution of the desired data, and a discriminator, trained to distinguish real samples from those output by the generator. A majority of GAN literature focuses on understanding the optimality of the discriminator through integral probability metric (IPM) or divergence-based analysis. In this paper, we propose a unified approach to analyzing the generator optimization through a variational approach. In $f$-divergence-minimizing GANs, we show that the optimal generator is the one that matches the score of its output distribution with that of the data distribution, while in IPM GANs, we show that this optimal generator matches score-like functions involving the flow-field of the kernel associated with a chosen IPM constraint space. Further, the IPM-GAN optimization can be seen as one of smoothed score-matching, where the scores of the data and the generator distributions are convolved with the kernel associated with the constraint. The proposed approach serves to unify score-based training and existing GAN flavors, leveraging results from normalizing flows, while also providing explanations for empirical phenomena such as the stability of non-saturating GAN losses. Based on these results, we propose novel alternatives to $f$-GAN and IPM-GAN training based on score and flow matching, and discriminator-guided Langevin sampling.
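The score (the gradient of the log-density) is also the quantity that Langevin sampling consumes. The sketch below shows unadjusted Langevin dynamics and assumes a 1-D Gaussian target whose score is known in closed form. In the discriminator-guided variant mentioned in the abstract, a quantity derived from a trained discriminator would replace this analytic score. That part is not modelled here, and the step size and number of chains are arbitrary choices.

```python
import numpy as np

def gaussian_score(x, mu, sigma):
    """Score (gradient of the log-density) of N(mu, sigma^2), known in closed form."""
    return -(x - mu) / sigma**2

def langevin(score, x0, step=0.01, n_steps=1000, seed=0):
    """Unadjusted Langevin dynamics: x <- x + (step/2) * score(x) + sqrt(step) * noise."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x + 0.5 * step * score(x) + np.sqrt(step) * rng.standard_normal(x.shape)
    return x

# 5000 independent chains started at 0, driven only by the score of N(2, 1)
samples = langevin(lambda x: gaussian_score(x, 2.0, 1.0), np.zeros(5000))
print(samples.mean(), samples.std())  # should be close to 2 and 1
```

For a small step size, the chains settle near the target distribution even though they never evaluate the density itself.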
Submitted 2 June, 2023;
originally announced June 2023.
-
Data Interpolants -- That's What Discriminators in Higher-order Gradient-regularized GANs Are
Authors:
Siddarth Asokan,
Chandra Sekhar Seelamantula
Abstract:
We consider the problem of optimizing the discriminator in generative adversarial networks (GANs) subject to higher-order gradient regularization. We show analytically, via the least-squares (LSGAN) and Wasserstein (WGAN) GAN variants, that the discriminator optimization problem is one of interpolation in $n$ dimensions. The optimal discriminator, derived using variational calculus, turns out to be the solution to a partial differential equation involving the iterated Laplacian or the polyharmonic operator. The solution is implementable in closed form via polyharmonic radial basis function (RBF) interpolation. In view of the polyharmonic connection, we refer to the corresponding GANs as Poly-LSGAN and Poly-WGAN. Through experimental validation on multivariate Gaussians, we show that implementing the optimal RBF discriminator in closed form, with penalty orders $m \approx \lceil n/2 \rceil$, results in superior performance compared to training GANs with arbitrarily chosen discriminator architectures. We employ the Poly-WGAN discriminator to model the latent-space distribution of the data with encoder-decoder-based GAN flavors such as Wasserstein autoencoders.
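The sketch below shows generic polyharmonic RBF interpolation in $n$ dimensions. It assumes the kernel $r^k$ for odd $k$ and $r^k \log r$ for even $k$, and adds an affine polynomial term so the linear system is well posed. The order-$m$ kernel in $n$ dimensions is commonly taken with $k = 2m - n$. For $n = 2$ and $m = 2$ this gives $k = 2$, the thin-plate spline. The function names are hypothetical. The sketch does not reproduce how the paper places centres or assigns target values for the discriminator.

```python
import numpy as np

def phi(r, k):
    """Polyharmonic kernel: r^k for odd k, r^k log r for even k (defined as 0 at r = 0)."""
    if k % 2 == 1:
        return r ** k
    out = np.zeros_like(r)
    mask = r > 0
    out[mask] = r[mask] ** k * np.log(r[mask])
    return out

def fit_polyharmonic(centers, values, k):
    """Solve [[A, P], [P^T, 0]] [w; c] = [values; 0] with an affine polynomial block P."""
    n_pts, dim = centers.shape
    r = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    A = phi(r, k)
    P = np.hstack([np.ones((n_pts, 1)), centers])
    M = np.block([[A, P], [P.T, np.zeros((dim + 1, dim + 1))]])
    rhs = np.concatenate([values, np.zeros(dim + 1)])
    sol = np.linalg.solve(M, rhs)
    return sol[:n_pts], sol[n_pts:]          # RBF weights w, polynomial coefficients c

def evaluate(x, centers, w, c, k):
    """Evaluate the interpolant at query points x (shape: n_queries x dim)."""
    r = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
    return phi(r, k) @ w + c[0] + x @ c[1:]
```

By construction, the interpolant passes through the data at the centres. Because of the polynomial block, it also reproduces affine functions exactly.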
Submitted 1 June, 2023;
originally announced June 2023.
-
Spider GAN: Leveraging Friendly Neighbors to Accelerate GAN Training
Authors:
Siddarth Asokan,
Chandra Sekhar Seelamantula
Abstract:
Training generative adversarial networks (GANs) stably is a challenging task. The generator in a GAN transforms noise vectors, typically Gaussian distributed, into realistic data such as images. In this paper, we propose a novel approach for training GANs with images as inputs, but without enforcing any pairwise constraints. The intuition is that images are more structured than noise, which the generator can leverage to learn a more robust transformation. The process can be made efficient by identifying closely related datasets, or a "friendly neighborhood" of the target distribution, inspiring the moniker, Spider GAN. To define friendly neighborhoods leveraging proximity between datasets, we propose a new measure called the signed inception distance (SID), inspired by the polyharmonic kernel. We show that the Spider GAN formulation results in faster convergence, as the generator can discover correspondence even between seemingly unrelated datasets, for instance, between Tiny-ImageNet and CelebA faces. Further, we demonstrate cascading Spider GAN, where the output distribution from a pre-trained GAN generator is used as the input to the subsequent network. This effectively transports one distribution to another in a cascaded fashion until the target is learnt, a new flavor of transfer learning. We demonstrate the efficacy of the Spider approach on DCGAN, conditional GAN, PGGAN, StyleGAN2 and StyleGAN3. The proposed approach achieves state-of-the-art Fréchet inception distance (FID) values, with one-fifth of the training iterations, in comparison to their baseline counterparts on high-resolution small datasets such as MetFaces, Ukiyo-E Faces and AFHQ-Cats.
Submitted 12 May, 2023;
originally announced May 2023.
-
Wavelet Design in a Learning Framework
Authors:
Dhruv Jawali,
Abhishek Kumar,
Chandra Sekhar Seelamantula
Abstract:
Wavelets have proven to be highly successful in several signal and image processing applications. Wavelet design has been an active field of research for over two decades, with the problem often being approached from an analytical perspective. In this paper, we introduce a learning-based approach to wavelet design. We draw a parallel between convolutional autoencoders and wavelet multiresolution approximation, and show how the learning angle provides a coherent computational framework for addressing the design problem. We aim at designing data-independent wavelets by training filterbank autoencoders, which obviates the need for customized datasets. In fact, we use high-dimensional Gaussian vectors for training filterbank autoencoders, and show that a near-zero training loss implies that the learnt filters satisfy the perfect reconstruction property with very high probability. Properties of a wavelet such as orthogonality, compact support, smoothness, symmetry, and vanishing moments can be incorporated by designing the autoencoder architecture appropriately and by adding a suitable regularization term to the mean-squared error cost used in the learning process. Our approach not only recovers the well-known Daubechies family of orthogonal wavelets and the Cohen-Daubechies-Feauveau family of symmetric biorthogonal wavelets, but also learns wavelets outside these families.
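The perfect-reconstruction property that the filterbank autoencoder is trained towards can be checked numerically. The sketch below makes three assumptions:

- It uses a known orthogonal filter (the length-4 Daubechies lowpass filter) rather than a learned one.
- The highpass filter is a quadrature-mirror filter.
- Boundaries are handled circularly.

The one-level analysis bank acts as the encoder, its transpose acts as the decoder, and a Gaussian test vector is reconstructed. `analysis_matrix` is a hypothetical helper, not the paper's architecture.

```python
import numpy as np

# Daubechies length-4 orthogonal lowpass filter, used only as a known reference filter
s3 = np.sqrt(3.0)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))

def analysis_matrix(h, N):
    """One-level two-channel analysis filterbank (filter + downsample by 2) as an N x N
    matrix with circular boundaries. Rows 0..N/2-1 are lowpass, the rest highpass."""
    L = len(h)
    g = np.array([(-1) ** n * h[L - 1 - n] for n in range(L)])  # quadrature-mirror highpass
    W = np.zeros((N, N))
    for k in range(N // 2):
        for n in range(L):
            W[k, (2 * k + n) % N] += h[n]
            W[N // 2 + k, (2 * k + n) % N] += g[n]
    return W

W = analysis_matrix(h, 8)
x = np.random.default_rng(0).standard_normal(8)  # Gaussian "training" vector
coeffs = W @ x        # encoder: analysis filterbank
x_rec = W.T @ coeffs  # decoder: synthesis filterbank (transpose, since the bank is orthogonal)
print(np.allclose(x, x_rec))  # True: perfect reconstruction
```

A learned filter could be dropped into `analysis_matrix` in the same way and checked with the same `W @ W.T == I` test.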
Submitted 23 July, 2021;
originally announced July 2021.
-
Learning Generative Prior with Latent Space Sparsity Constraints
Authors:
Vinayak Killedar,
Praveen Kumar Pokala,
Chandra Sekhar Seelamantula
Abstract:
We address the problem of compressed sensing using a deep generative prior model and consider both linear and learned nonlinear sensing mechanisms, where the nonlinear one involves either a fully connected neural network or a convolutional neural network. Recently, it has been argued that the distribution of natural images does not lie on a single manifold but rather on a union of several submanifolds. We propose a sparsity-driven latent space sampling (SDLSS) framework and develop a proximal meta-learning (PML) algorithm to enforce sparsity in the latent space. SDLSS allows the range-space of the generator to be considered as a union of submanifolds. We also derive the sample complexity bounds within the SDLSS framework for the linear measurement model. The results demonstrate that for a higher degree of compression, the SDLSS method is more efficient than the state-of-the-art method. We first compare linear and nonlinear sensing mechanisms on the Fashion-MNIST dataset and show that the learned nonlinear version is superior to the linear one. Subsequent comparisons with the deep compressive sensing (DCS) framework proposed in the literature are reported. We also consider the effect of the dimension of the latent space and the sparsity factor in validating the SDLSS framework. Performance quantification is carried out by employing three objective metrics: peak signal-to-noise ratio (PSNR), structural similarity index metric (SSIM), and reconstruction error (RE).
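Two of the three metrics listed above are simple to compute. The sketch assumes images scaled to $[0, 1]$. It also assumes the common definition of reconstruction error as the relative $\ell_2$ error, since the abstract does not state the paper's exact normalization. SSIM is omitted because it involves windowed local statistics.

```python
import numpy as np

def psnr(x, x_hat, peak=1.0):
    """Peak signal-to-noise ratio in dB, assuming intensities in [0, peak] and x != x_hat."""
    mse = np.mean((np.asarray(x, float) - np.asarray(x_hat, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def relative_error(x, x_hat):
    """Reconstruction error as ||x - x_hat||_2 / ||x||_2 (one common definition)."""
    x = np.asarray(x, float)
    return np.linalg.norm(x - np.asarray(x_hat, float)) / np.linalg.norm(x)

x = np.full((4, 4), 0.5)
x_hat = x + 0.1                          # uniform error of 0.1, i.e. MSE = 0.01
print(psnr(x, x_hat), relative_error(x, x_hat))  # about 20 dB and 0.2
```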
Submitted 25 May, 2021;
originally announced May 2021.
-
Quantized Proximal Averaging Network for Analysis Sparse Coding
Authors:
Kartheek Kumar Reddy Nareddy,
Mani Madhoolika Bulusu,
Praveen Kumar Pokala,
Chandra Sekhar Seelamantula
Abstract:
We solve the analysis sparse coding problem considering a combination of convex and non-convex sparsity-promoting penalties. The multi-penalty formulation results in an iterative algorithm involving proximal averaging. We then unfold the iterative algorithm into a trainable network that facilitates learning the sparsity prior. We also consider quantization of the network weights. Quantization makes neural networks efficient both in terms of memory and computation during inference, and also renders them compatible with low-precision hardware deployment. Our learning algorithm is based on a variant of the ADAM optimizer in which the quantizer is part of the forward pass and the gradients of the loss function are evaluated at the quantized weights, while a copy of the high-precision weights is maintained for the updates. We demonstrate applications to compressed image recovery and magnetic resonance image reconstruction. The proposed approach offers higher reconstruction accuracy and quality than state-of-the-art unfolding techniques, and the performance degradation is minimal even when the weights are subjected to extreme quantization.
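The training scheme described above can be sketched on a toy least-squares problem:

- the quantizer sits in the forward pass;
- gradients are evaluated at the quantized weights;
- updates are accumulated in a high-precision copy.

The sketch uses plain Adam with a symmetric uniform quantizer, and the bit-width and range are arbitrary. The paper's ADAM variant and quantizer are not reproduced, and `quantize` and `train_quantized` are hypothetical names.

```python
import numpy as np

def quantize(w, n_bits=3, w_max=0.75):
    """Symmetric uniform quantizer with 2**(n_bits-1) - 1 positive levels up to w_max."""
    levels = 2 ** (n_bits - 1) - 1
    step = w_max / levels
    return np.clip(np.round(w / step), -levels, levels) * step

def train_quantized(X, y, n_iters=500, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    w = np.zeros(X.shape[1])                  # high-precision (book-keeping) weights
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, n_iters + 1):
        wq = quantize(w)                      # quantizer is part of the forward pass
        grad = X.T @ (X @ wq - y) / len(y)    # gradient of 0.5 * MSE, evaluated at wq
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)  # update lands on the high-precision copy
    return w, quantize(w)
```

Small gradient steps accumulate in the high-precision copy even when they are too small to change the quantized weights. This is why that copy is kept.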
Submitted 13 May, 2021;
originally announced May 2021.
-
NuSPAN: A Proximal Average Network for Nonuniform Sparse Model -- Application to Seismic Reflectivity Inversion
Authors:
Swapnil Mache,
Praveen Kumar Pokala,
Kusala Rajendran,
Chandra Sekhar Seelamantula
Abstract:
We solve the problem of sparse signal deconvolution in the context of seismic reflectivity inversion, which pertains to high-resolution recovery of the subsurface reflection coefficients. Our formulation employs a nonuniform, non-convex synthesis sparse model comprising a combination of convex and non-convex regularizers, which results in accurate approximations of the $\ell_0$ pseudo-norm. The resulting iterative algorithm relies on the proximal average strategy. When unfolded, the iterations give rise to a learnable proximal average network architecture that can be optimized in a data-driven fashion. We demonstrate the efficacy of the proposed approach through numerical experiments on synthetic 1-D seismic traces and 2-D wedge models in comparison with the benchmark techniques. We also present validations considering the simulated Marmousi2 model as well as real 3-D seismic volume data acquired from the Penobscot 3D survey off the coast of Nova Scotia, Canada.
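In the proximal average strategy, the proximal step for the combined regularizer is taken as a weighted average of the individual proximal operators. This is exact for the proximal average of convex penalties. The sketch makes these assumptions:

- The convex component is the $\ell_1$ penalty (soft thresholding).
- The non-convex component is the minimax-concave penalty (firm thresholding).
- The step size is unit and the two weights are equal.

In an unfolded network, quantities such as the thresholds would be learned rather than fixed as they are here.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of lam * |x| (convex l1 penalty)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def firm_threshold(x, lam, gamma):
    """Proximal operator of the minimax-concave penalty (gamma > 1), unit step size."""
    ax = np.abs(x)
    mid = np.sign(x) * (ax - lam) / (1.0 - 1.0 / gamma)
    return np.where(ax <= lam, 0.0, np.where(ax <= gamma * lam, mid, x))

def prox_average(x, lam, gamma, alpha=0.5):
    """Weighted average of the two proximal operators."""
    return alpha * soft_threshold(x, lam) + (1 - alpha) * firm_threshold(x, lam, gamma)

out = prox_average(np.array([0.5, 1.5, 5.0]), lam=1.0, gamma=2.0)
print(out)  # values 0, 0.75, 4.5
```

Small inputs are zeroed. Large inputs are shrunk less than by soft thresholding alone, which reflects the reduced bias of the non-convex component.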
Submitted 16 September, 2021; v1 submitted 1 May, 2021;
originally announced May 2021.
-
DuRIN: A Deep-unfolded Sparse Seismic Reflectivity Inversion Network
Authors:
Swapnil Mache,
Praveen Kumar Pokala,
Kusala Rajendran,
Chandra Sekhar Seelamantula
Abstract:
We consider the reflection seismology problem of recovering the locations of interfaces and the amplitudes of reflection coefficients from seismic data, which are vital for estimating the subsurface structure. The reflectivity inversion problem is typically solved using greedy algorithms and iterative techniques. The sparse Bayesian learning framework and, more recently, deep learning techniques have shown the potential of data-driven approaches to solve the problem. In this paper, we propose a weighted minimax-concave penalty-regularized reflectivity inversion formulation and solve it through a model-based neural network. The network is referred to as the deep-unfolded reflectivity inversion network (DuRIN). We demonstrate the efficacy of the proposed approach over the benchmark techniques by testing on synthetic 1-D seismic traces and 2-D wedge models, and by validating on the simulated 2-D Marmousi2 model and on real data from the Penobscot 3D survey off the coast of Nova Scotia, Canada.
Submitted 16 September, 2021; v1 submitted 10 April, 2021;
originally announced April 2021.
-
Robust Segmentation of Optic Disc and Cup from Fundus Images Using Deep Neural Networks
Authors:
Aniketh Manjunath,
Subramanya Jois,
Chandra Sekhar Seelamantula
Abstract:
Optic disc (OD) and optic cup (OC) are regions of prominent clinical interest in a retinal fundus image. They are the primary indicators of a glaucomatous condition. With the advent and success of deep learning for healthcare research, several approaches have been proposed for the segmentation of important features in retinal fundus images. We propose a novel approach for the simultaneous segmentation of the OD and OC using a residual encoder-decoder network (REDNet) based regional convolutional neural network (RCNN). The RED-RCNN is motivated by the Mask RCNN (MRCNN). Performance comparisons with the state-of-the-art techniques and extensive validations on standard publicly available fundus image datasets show that RED-RCNN has superior performance compared with MRCNN. RED-RCNN results in Sensitivity, Specificity, Accuracy, Precision, Dice and Jaccard indices of 95.64%, 99.9%, 99.82%, 95.68%, 95.64%, 91.65%, respectively, for OD segmentation, and 91.44%, 99.87%, 99.83%, 85.67%, 87.48%, 78.09%, respectively, for OC segmentation. Further, we perform two-stage glaucoma severity grading using the cup-to-disc ratio (CDR) computed based on the obtained OD/OC segmentation. The superior segmentation performance of RED-RCNN over MRCNN translates to higher accuracy in glaucoma severity grading.
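The Dice and Jaccard indices and the cup-to-disc ratio mentioned above can be computed directly from binary masks. The sketch assumes the vertical CDR, the ratio of the vertical extents of the cup and the disc, which is a common choice. The abstract does not state which CDR definition or grading thresholds the paper uses.

```python
import numpy as np

def dice(a, b):
    """Dice index 2|A & B| / (|A| + |B|) of two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()

def vertical_cdr(cup_mask, disc_mask):
    """Cup-to-disc ratio from the vertical extents (occupied rows) of binary masks."""
    def height(m):
        rows = np.flatnonzero(m.any(axis=1))
        return rows[-1] - rows[0] + 1
    return height(cup_mask) / height(disc_mask)

disc = np.zeros((40, 40), bool); disc[10:30, 10:30] = True  # 20 rows tall
cup = np.zeros((40, 40), bool); cup[15:25, 15:25] = True    # 10 rows tall
print(vertical_cdr(cup, disc))  # 0.5
```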
Submitted 13 December, 2020;
originally announced December 2020.
-
Teaching a GAN What Not to Learn
Authors:
Siddarth Asokan,
Chandra Sekhar Seelamantula
Abstract:
Generative adversarial networks (GANs) were originally envisioned as unsupervised generative models that learn to follow a target distribution. Variants such as conditional GANs and auxiliary-classifier GANs (ACGANs) project GANs onto supervised and semi-supervised learning frameworks by providing labelled data and using multi-class discriminators. In this paper, we approach the supervised GAN problem from a different perspective, one that is motivated by the philosophy of the famous Persian poet Rumi, who said, "The art of knowing is knowing what to ignore." In the GAN framework, we not only provide the GAN with positive data that it must learn to model, but also present it with so-called negative samples that it must learn to avoid. We call this "The Rumi Framework." This formulation allows the discriminator to represent the underlying target distribution better by learning to penalize generated samples that are undesirable, and we show that this capability accelerates the learning process of the generator. We present a reformulation of the standard GAN (SGAN) and least-squares GAN (LSGAN) within the Rumi setting. The advantage of the reformulation is demonstrated by means of experiments conducted on MNIST, Fashion MNIST, CelebA, and CIFAR-10 datasets. Finally, we consider an application of the proposed formulation to address the important problem of learning an under-represented class in an unbalanced dataset. The Rumi approach results in substantially lower FID scores than the standard GAN frameworks while possessing better generalization capability.
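One way to write an SGAN-style discriminator objective is sketched below. It pushes positive data towards "real" and both negative samples and generated samples towards "fake". The weights `alpha_pos` and `alpha_neg`, and the exact form of the loss, are assumptions for illustration, not the paper's reformulation. The function name is hypothetical.

```python
import numpy as np

def rumi_sgan_d_loss(d_pos, d_neg, d_gen, alpha_pos=0.5, alpha_neg=0.5, tiny=1e-12):
    """Hypothetical Rumi-style SGAN discriminator loss (to be minimized).
    d_pos, d_neg, d_gen: discriminator outputs in (0, 1) on positive data,
    negative data and generated samples. Positives are pushed towards 1;
    negatives and generated samples towards 0."""
    return (-alpha_pos * np.mean(np.log(d_pos + tiny))
            - alpha_neg * np.mean(np.log(1.0 - d_neg + tiny))
            - np.mean(np.log(1.0 - d_gen + tiny)))

half = np.full(4, 0.5)
print(rumi_sgan_d_loss(half, half, half))  # (0.5 + 0.5 + 1) * log 2 for an undecided discriminator
```

A discriminator that confidently separates positives from both negatives and generated samples attains a lower loss. This is the mechanism by which negative samples shape what the generator is steered away from.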
Submitted 29 October, 2020;
originally announced October 2020.
-
Quantization-Aware Phase Retrieval
Authors:
Subhadip Mukherjee,
Chandra Sekhar Seelamantula
Abstract:
We address the problem of phase retrieval (PR) from quantized measurements. The goal is to reconstruct a signal from quadratic measurements encoded with finite precision, which is indeed the case in many practical applications. We develop a rank-1 projection algorithm that recovers the signal subject to ensuring consistency with the measurements, that is, the recovered signal, when encoded, must yield the same set of measurements that one started with. The rank-1 projection stems from the idea of lifting, originally proposed in the context of PhaseLift. The consistency criterion is enforced using a one-sided quadratic cost. We also determine the probability with which different vectors lead to the same set of quantized measurements, which makes it impossible to resolve them. Naturally, this probability depends on how correlated such vectors are, and how coarsely or finely the measurements are quantized. The proposed algorithm is also capable of incorporating a sparsity constraint on the signal. An analysis of the cost function reveals that it is bounded, both above and below, by functions that depend on how well the estimate is correlated with the ground truth. We also derive the Cramér-Rao lower bound (CRB) on the achievable reconstruction accuracy. A comparison with the state-of-the-art algorithms shows that the proposed algorithm has a higher reconstruction accuracy and is about 2 to 3 dB away from the CRB. The edge over the competing algorithms, in terms of the reconstruction signal-to-noise ratio, is higher (about 5 to 6 dB) when the quantization is coarse.
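The consistency criterion with a one-sided quadratic cost can be sketched as follows. The sketch assumes a uniform quantizer with step `delta`, so each measurement is known only up to its bin. The cost vanishes for the true signal. It also vanishes for the negated signal, which reflects the inherent sign ambiguity of real-valued phase retrieval. The paper's lifting and rank-1 projection steps are not implemented here, and the function names are hypothetical.

```python
import numpy as np

def quantize_intervals(y, delta):
    """Uniform quantizer: each measurement is only known to lie in [lower, lower + delta)."""
    lower = np.floor(y / delta) * delta
    return lower, lower + delta

def consistency_cost(x, A, lower, upper):
    """One-sided quadratic penalty: zero iff every |a_i^T x|^2 lies inside its quantization bin."""
    z = (A @ x) ** 2
    return np.sum(np.maximum(lower - z, 0.0) ** 2 + np.maximum(z - upper, 0.0) ** 2)

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5))
x_true = rng.standard_normal(5)
lo, up = quantize_intervals((A @ x_true) ** 2, delta=0.25)
print(consistency_cost(x_true, A, lo, up), consistency_cost(-x_true, A, lo, up))  # both 0
```

Only measurements that leave their bin are penalized. Any signal consistent with the quantized data is therefore a global minimizer.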
Submitted 2 October, 2018;
originally announced October 2018.
-
Epoch-Synchronous Overlap-Add (ESOLA) for Time- and Pitch-Scale Modification of Speech Signals
Authors:
Sunil Rudresh,
Aditya Vasisht,
Karthika Vijayan,
Chandra Sekhar Seelamantula
Abstract:
Time- and pitch-scale modifications of speech signals find important applications in speech synthesis, playback systems, voice conversion, learning/hearing aids, etc. Such applications require computationally efficient, real-time implementable algorithms. In this paper, we propose a high-quality and computationally efficient time- and pitch-scaling methodology based on the glottal closure instants (GCIs), or epochs, in speech signals. The proposed algorithm, termed epoch-synchronous overlap-add time/pitch-scaling (ESOLA-TS/PS), segments speech signals into overlapping short-time frames, aligns adjacent frames with respect to the epochs, and overlap-adds them to synthesize time-scale modified speech. Pitch scaling is achieved by resampling the time-scaled speech by a desired sampling factor. We also propose a concept of epoch embedding into speech signals, which facilitates identifying and time-stamping the samples corresponding to epochs and reusing them for time/pitch-scaling to multiple scaling factors whenever desired, thereby contributing to a faster and more efficient implementation. The results of perceptual evaluation tests reported in this paper indicate the superiority of ESOLA over state-of-the-art techniques. ESOLA significantly outperforms the conventional pitch-synchronous overlap-add (PSOLA) techniques in terms of perceptual quality and intelligibility of the modified speech. Unlike the waveform similarity overlap-add (WSOLA) or synchronous overlap-add (SOLA) techniques, ESOLA can perform exact time-scaling of speech with high quality for any desired modification factor within a range of 0.5 to 2. Compared to synchronous overlap-add with fixed synthesis (SOLAFS), ESOLA is computationally advantageous and at least three times faster.
Submitted 19 January, 2018;
originally announced January 2018.
-
PROSE: Perceptual Risk Optimization for Speech Enhancement
Authors:
Jishnu Sadasivan,
Chandra Sekhar Seelamantula,
Nagarjuna Reddy Muraka
Abstract:
The goal in speech enhancement is to obtain an estimate of clean speech starting from the noisy signal by minimizing a chosen distortion measure, which results in an estimate that depends on the unknown clean signal or its statistics. Since access to such prior knowledge is limited or not possible in practice, one has to estimate the clean signal statistics. In this paper, we develop a new risk minimization framework for speech enhancement, in which one optimizes an unbiased estimate of the distortion/risk instead of the actual risk. The estimated risk is expressed solely as a function of the noisy observations. We consider several perceptually relevant distortion measures and develop corresponding unbiased estimates under realistic assumptions on the noise distribution and a priori signal-to-noise ratio (SNR). Minimizing the risk estimates gives rise to the corresponding denoisers, which are nonlinear functions of the a posteriori SNR. Perceptual evaluation of speech quality (PESQ), average segmental SNR (SSNR) computations, and listening tests show that the proposed risk optimization approach employing Itakura-Saito and weighted hyperbolic cosine distortions gives better performance than the other distortion measures. For SNRs greater than 5 dB, the proposed approach gives superior denoising performance over the benchmark techniques based on the Wiener filter, log-MMSE minimization, and Bayesian nonnegative matrix factorization.
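The idea of optimizing an unbiased estimate of the risk can be illustrated in its simplest instance. This is Stein's unbiased risk estimate (SURE) of the mean-squared error, under additive white Gaussian noise of known variance, for a linear-gain denoiser. The sketch covers only these assumptions. The paper's perceptual distortions, such as Itakura-Saito, and its noise assumptions lead to different estimates. The estimate uses only the noisy signal, yet it tracks the actual error, which requires the clean signal.

```python
import numpy as np

def sure_linear_gain(y, gain, sigma):
    """SURE of ||gain * y - x||^2 for y = x + N(0, sigma^2 I), computed from y alone:
    ||f(y) - y||^2 + 2 sigma^2 div f(y) - n sigma^2, where div f = n * gain."""
    n = y.size
    return np.sum((gain * y - y) ** 2) + 2 * sigma**2 * n * gain - n * sigma**2

rng = np.random.default_rng(0)
x = np.full(1000, 2.0)                  # clean signal (unknown to the estimator)
y = x + rng.standard_normal(1000)       # noisy observation, sigma = 1
est = sure_linear_gain(y, 0.5, 1.0)     # risk estimate from y only
actual = np.sum((0.5 * y - x) ** 2)     # true squared error (requires x)
print(est, actual)                      # both near the expected risk of 1250
```

Minimizing such an estimate over the denoiser parameters, here the single `gain`, stands in for minimizing the unobservable true risk.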
Submitted 11 October, 2017;
originally announced October 2017.
-
A Non-Convex Optimization Technique for Sparse Blind Deconvolution -- Initialization Aspects and Error Reduction Properties
Authors:
Aniruddha Adiga,
Chandra Sekhar Seelamantula
Abstract:
Sparse blind deconvolution is the problem of estimating the blur kernel and sparse excitation, both of which are unknown. Considering a linear convolution model, as opposed to the standard circular convolution model, we derive a sufficient condition for stable deconvolution. The columns of the linear convolution matrix form a Riesz basis with the tightness of the Riesz bounds determined by the autocorrelation of the blur kernel. Employing a Bayesian framework results in a non-convex, non-smooth cost function consisting of an $\ell_2$ data-fidelity term and a sparsity-promoting $\ell_p$-norm ($0 \le p \le 1$) regularizer. Since the $\ell_p$-norm is not differentiable at the origin, we employ an $\varepsilon$-regularized $\ell_p$-norm as a surrogate. The data term is also non-convex in both the blur kernel and excitation. An iterative scheme termed the alternating minimization (Alt. Min.) $\ell_p-\ell_2$ projections algorithm (ALPA) is developed for optimization of the $\varepsilon$-regularized cost function. Further, we demonstrate that, in every iteration, the $\varepsilon$-regularized cost function is non-increasing and, more importantly, bounds the original $\ell_p$-norm-based cost. Due to the non-convexity of the cost, the accuracy of estimation is largely influenced by the initialization. Considering the regularized least-squares estimate as the initialization, we analyze how the initialization errors are concentrated, first in Gaussian noise, and then in bounded noise, the latter case resulting in tighter bounds. Comparisons with state-of-the-art blind deconvolution algorithms show that the deconvolution accuracy is higher in the case of ALPA. In the context of natural speech signals, ALPA results in accurate deconvolution of a voiced speech segment into a sparse excitation and smooth vocal tract response.
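Two ingredients above are easy to check numerically.

- The Gram matrix $H^\top H$ of a full linear-convolution matrix is Toeplitz, with entries given by the autocorrelation of the kernel. Its eigenvalues (the Riesz bounds, in the squared-norm sense) therefore lie between the minimum and maximum of $|H(e^{j\omega})|^2$.
- The $\varepsilon$-regularized surrogate $\sum_i (x_i^2 + \varepsilon)^{p/2}$ upper-bounds $\|x\|_p^p$ for $p > 0$.

The sketch uses an arbitrary two-tap kernel, the helper names are hypothetical, and ALPA's alternating updates are not implemented.

```python
import numpy as np

def conv_matrix(h, n):
    """Full linear-convolution matrix of size (n + len(h) - 1) x n, so H @ x == np.convolve(h, x)."""
    H = np.zeros((n + len(h) - 1, n))
    for j in range(n):
        H[j:j + len(h), j] = h
    return H

def eps_lp(x, p, eps):
    """epsilon-regularized l_p surrogate sum_i (x_i^2 + eps)^(p/2); differentiable at 0 for eps > 0."""
    return np.sum((x ** 2 + eps) ** (p / 2))

h = np.array([1.0, 0.5])
H = conv_matrix(h, 32)
eig = np.linalg.eigvalsh(H.T @ H)                       # squared Riesz bounds of the columns
omega = np.linspace(0, np.pi, 4097)
spec = np.abs(np.polyval(h[::-1], np.exp(-1j * omega))) ** 2   # |H(e^{jw})|^2 = 1.25 + cos(w)
print(eig.min(), eig.max(), spec.min(), spec.max())     # eigenvalues lie inside [0.25, 2.25]
```

A kernel whose spectrum comes close to zero gives a small lower Riesz bound, and hence a less stable deconvolution.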
Submitted 11 October, 2017; v1 submitted 24 August, 2017;
originally announced August 2017.
-
Phase Retrieval From Binary Measurements
Authors:
Subhadip Mukherjee,
Chandra Sekhar Seelamantula
Abstract:
We consider the problem of signal reconstruction from quadratic measurements that are encoded as +1 or -1 depending on whether or not they exceed a predetermined positive threshold. Binary measurements are fast to acquire and inexpensive in terms of hardware. We formulate the problem of signal reconstruction using a consistency criterion, wherein one seeks a signal that is in agreement with the measurements. To enforce consistency, we construct a convex cost using a one-sided quadratic penalty and minimize it using an iterative accelerated projected gradient-descent (APGD) technique. The projected gradient-descent (PGD) scheme reduces the cost function in each iteration, whereas incorporating momentum into PGD, notwithstanding the lack of such a descent property, empirically exhibits faster convergence than PGD. We refer to the resulting algorithm as binary phase retrieval (BPR). Considering additive white noise contamination prior to quantization, we also derive the Cramér-Rao bound (CRB) for the binary encoding model. Experimental results demonstrate that the BPR algorithm yields a signal-to-reconstruction error ratio (SRER) of approximately 25 dB in the absence of noise. In the presence of noise prior to quantization, the SRER is within 2 to 3 dB of the CRB.
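A convex version of the one-sided quadratic consistency cost can be obtained by lifting, that is, by working with $X \approx x x^\top$ so that each measurement becomes linear in $X$. The abstract does not state the variable in which its cost is convex. The lifting, the projection onto the positive-semidefinite cone, and the Lipschitz step size are therefore assumptions of this sketch. The accelerated iteration uses FISTA-style momentum. A signal estimate, up to a global sign, is read off from the leading eigenvector. The function names are hypothetical.

```python
import numpy as np

def bpr_cost_grad(X, A, b, tau):
    """Convex consistency cost in the lifted variable X:
    f(X) = sum_i max(0, -b_i (a_i^T X a_i - tau))^2, and its gradient."""
    z = np.einsum('ij,jk,ik->i', A, X, A) - tau
    r = np.maximum(-b * z, 0.0)                   # violation of measurement i
    grad = np.einsum('i,ij,ik->jk', -2.0 * b * r, A, A)
    return np.sum(r ** 2), grad

def project_psd(X):
    """Euclidean projection onto the positive-semidefinite cone."""
    w, V = np.linalg.eigh((X + X.T) / 2)
    return (V * np.maximum(w, 0.0)) @ V.T

def bpr_apgd(A, b, tau, n_iters=200):
    """Accelerated projected gradient descent with Nesterov/FISTA momentum."""
    n = A.shape[1]
    L = 2.0 * np.sum(np.sum(A ** 2, axis=1) ** 2)   # Lipschitz bound on the gradient
    X = np.zeros((n, n))
    Y = X.copy()
    t = 1.0
    for _ in range(n_iters):
        _, g = bpr_cost_grad(Y, A, b, tau)
        X_new = project_psd(Y - g / L)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        Y = X_new + ((t - 1) / t_new) * (X_new - X)
        X, t = X_new, t_new
    w, V = np.linalg.eigh(X)
    return np.sqrt(max(w[-1], 0.0)) * V[:, -1], X   # leading rank-1 factor (sign ambiguous)

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 5))
x_true = rng.standard_normal(5)
tau = 1.0
b = np.where((A @ x_true) ** 2 > tau, 1.0, -1.0)     # binary measurements
x_hat, X_hat = bpr_apgd(A, b, tau)
```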
Submitted 16 November, 2017; v1 submitted 2 August, 2017;
originally announced August 2017.
-
Online Reweighted Least Squares Algorithm for Sparse Recovery and Application to Short-Wave Infrared Imaging
Authors:
Subhadip Mukherjee,
Deepak R.,
Huaijin Chen,
Ashok Veeraraghavan,
Chandra Sekhar Seelamantula
Abstract:
We address the problem of sparse recovery in an online setting, where random linear measurements of a sparse signal are revealed sequentially and the objective is to recover the underlying signal. We propose a reweighted least squares (RLS) algorithm to solve the problem of online sparse reconstruction, wherein a system of linear equations is solved using conjugate gradient with the arrival of every new measurement. The proposed online algorithm is useful in a setting where one seeks to design a progressive decoding strategy to reconstruct a sparse signal from linear measurements so that one does not have to wait until all measurements are acquired. Moreover, the proposed algorithm is also useful in applications where it is infeasible to process all the measurements using a batch algorithm, owing to computational and storage constraints. One need not collect a fixed number of measurements a priori; rather, one can keep collecting measurements until the quality of reconstruction is satisfactory and stop once the reconstruction is sufficiently accurate. We provide a proof of concept by comparing the performance of our algorithm with the RLS-based batch reconstruction strategy, known as iteratively reweighted least squares (IRLS), on natural images. Experiments on a recently proposed focal plane array-based imaging setup show up to 1 dB improvement in output peak signal-to-noise ratio as compared with the total variation-based reconstruction.
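The sketch below illustrates the online idea. Measurements arrive one at a time. After each arrival, a few reweighted least-squares steps solve the weighted minimum-norm problem $x = W^{-1}A^\top (A W^{-1} A^\top)^{-1} y$ by conjugate gradient, warm-started from the previous multipliers. Several details are common IRLS choices assumed here, not taken from the paper:

- the weight rule $(x_i^2+\varepsilon)^{p/2-1}$;
- the $\varepsilon$ decay schedule;
- the iteration counts.

```python
import numpy as np

def conjugate_gradient(apply_M, b, x0, max_iters=200, rtol=1e-12):
    """Plain CG for a symmetric positive-definite operator apply_M."""
    x = x0.copy()
    r = b - apply_M(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iters):
        if np.sqrt(rs) <= rtol * np.linalg.norm(b):
            break
        Mp = apply_M(p)
        alpha = rs / (p @ Mp)
        x += alpha * p
        r -= alpha * Mp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def online_rls(rows, ys, p=1.0, eps=1.0, inner=3, eps_min=1e-6):
    """Process measurements one at a time; after each arrival, run a few reweighted
    least-squares steps x = W^{-1} A^T (A W^{-1} A^T)^{-1} y, solved by CG."""
    x = np.zeros(rows.shape[1])
    lam = np.zeros(0)
    for k in range(1, len(ys) + 1):
        A, y = rows[:k], ys[:k]
        lam = np.append(lam, 0.0)                    # warm start: previous multipliers + a zero
        for _ in range(inner):
            w_inv = (x ** 2 + eps) ** (1 - p / 2)    # inverse IRLS weights
            lam = conjugate_gradient(lambda v: A @ (w_inv * (A.T @ v)), y, lam)
            x = w_inv * (A.T @ lam)                  # weighted minimum-norm solution
        eps = max(0.7 * eps, eps_min)
    return x
```

Each intermediate estimate is consistent with all measurements seen so far. The loop could therefore be stopped as soon as the reconstruction is judged good enough.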
Submitted 29 June, 2017;
originally announced June 2017.
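The online reweighted least squares idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes $\ell_1$-type reweighting $d_i = \sqrt{x_i^2 + \epsilon}$ and re-solves the weighted minimum-norm problem $x = DA^\top z$, $(ADA^\top)z = y$ with plain conjugate gradient after every new measurement. The names `online_irls`, `conjugate_gradient`, `eps`, and `inner_iters` are hypothetical.

```python
import numpy as np


def conjugate_gradient(M, b, tol=1e-10, max_iter=1000):
    """Solve M z = b for a symmetric positive-definite M with plain CG."""
    z = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs_old = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= tol * b_norm:
            break
        Mp = M @ p
        alpha = rs_old / (p @ Mp)
        z += alpha * p
        r -= alpha * Mp
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return z


def online_irls(rows, values, n, eps=1e-4, inner_iters=3):
    """Update a sparse estimate each time a measurement (a_t, y_t) arrives.

    Hypothetical sketch: assumed l1-type IRLS weights d_i = sqrt(x_i^2 + eps)
    and a weighted minimum-norm solve x = D A^T z with (A D A^T) z = y.
    """
    A = np.empty((0, n))
    y = np.empty(0)
    x = np.zeros(n)
    for a_t, y_t in zip(rows, values):
        A = np.vstack([A, a_t])       # append the newly revealed measurement
        y = np.append(y, y_t)
        for _ in range(inner_iters):  # a few reweighting passes per arrival
            d = np.sqrt(x**2 + eps)   # weights warm-started from the last estimate
            M = (A * d) @ A.T         # A D A^T, symmetric positive definite
            z = conjugate_gradient(M, y)
            x = d * (A.T @ z)         # x = D A^T z satisfies A x = y
    return x
```

Because the weights carry over from one arrival to the next, an estimate `x` is available after every measurement, so acquisition can stop as soon as it is accurate enough. Rebuilding `A` with `np.vstack` at each step is chosen for clarity, not efficiency.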
-
Deep Sparse Coding Using Optimized Linear Expansion of Thresholds
Authors:
Debabrata Mahapatra,
Subhadip Mukherjee,
Chandra Sekhar Seelamantula
Abstract:
We address the problem of reconstructing sparse signals from noisy and compressive measurements using a feed-forward deep neural network (DNN) with an architecture motivated by the iterative shrinkage-thresholding algorithm (ISTA). We maintain the weights and biases of the network links as prescribed by ISTA and model the nonlinear activation function using a linear expansion of thresholds (LET), which has been very successful in image denoising and deconvolution. The optimal set of coefficients of the parametrized activation is learned over a training dataset containing pairs of measurements and sparse signals, corresponding to a fixed sensing matrix. For training, we develop an efficient second-order algorithm, which requires only matrix-vector product computations in every training epoch (Hessian-free optimization) and offers better convergence than gradient-descent optimization. Subsequently, we derive an improved network architecture inspired by FISTA, a faster version of ISTA, which achieves similar signal estimation performance with about half as many layers. The resulting architecture turns out to be a deep residual network, a class of networks recently shown to exhibit superior performance in several visual recognition tasks. Numerical experiments demonstrate that the proposed DNN architectures yield a 3 to 4 dB improvement in the reconstruction signal-to-noise ratio (SNR) compared with state-of-the-art sparse coding algorithms.
Submitted 20 May, 2017;
originally announced May 2017.
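A minimal sketch of the ISTA-structured forward pass described above. It assumes an LET basis of the form $\eta(v) = \sum_{k} c_k\, v\, e^{-k v^2/(2\tau^2)}$ applied elementwise and keeps the ISTA weights $W = A^\top/L$ and $S = I - A^\top A/L$ fixed. The coefficients `coeffs` would be learned (for example with the Hessian-free training mentioned in the abstract); here they are simply passed in, and all function names are hypothetical.

```python
import numpy as np


def let_activation(v, coeffs, tau):
    """Assumed LET basis: sum_k c_k * v * exp(-k * v^2 / (2 tau^2)), k = 0, 1, ..."""
    out = np.zeros_like(v, dtype=float)
    for k, c in enumerate(coeffs):
        out += c * v * np.exp(-k * v**2 / (2.0 * tau**2))
    return out


def unfolded_ista(A, y, coeffs, tau, n_layers):
    """Forward pass of an ISTA-structured network with fixed ISTA weights."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the data-term gradient
    W = A.T / L                              # input weight prescribed by ISTA
    S = np.eye(A.shape[1]) - A.T @ A / L     # layer-to-layer weight prescribed by ISTA
    x = np.zeros(A.shape[1])
    for _ in range(n_layers):
        x = let_activation(W @ y + S @ x, coeffs, tau)  # same LET activation in every layer
    return x
```

With `coeffs=[1, 0, 0]` the activation is the identity and the network reduces to Landweber iterations, which converge to the minimum-norm least-squares solution. With `coeffs=[1, -1]` it acts as a smooth shrinkage that nearly zeroes small inputs and passes large ones almost unchanged.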
-
Super-Resolution From Binary Measurements With Unknown Threshold
Authors:
Subhadip Mukherjee,
Anjany Kumar Sekuboyina,
Chandra Sekhar Seelamantula
Abstract:
We address the problem of super-resolution of point sources from binary measurements, where random projections of the blurred measurement of the actual signal are encoded using only the sign information. The threshold used for binary quantization is not known to the decoder. We develop an algorithm that iteratively solves convex programs and achieves signal recovery. The proposed algorithm, which we refer to as the binary super-resolution (BSR) algorithm, recovers point sources with reasonable accuracy, albeit only up to a scale factor. We show through simulations that the BSR algorithm is successful in recovering the locations and amplitudes of the point sources, even in the presence of a significant amount of blurring. We also propose a framework for handling noisy measurements and demonstrate that BSR gives a reliable reconstruction, with a reconstruction signal-to-noise ratio (SNR) of about 22 dB for a measurement SNR of 15 dB.
Submitted 13 May, 2016;
originally announced June 2016.
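The measurement model above can be made concrete with a short simulation. It assumes a circular Gaussian blur, a Gaussian random projection matrix `Phi`, and a single scalar threshold `tau` hidden from the decoder. The function names are hypothetical, and the BSR reconstruction itself is not shown.

```python
import numpy as np


def gaussian_blur_1d(x, sigma):
    """Assumed blur: circular convolution with a normalized Gaussian of width sigma."""
    n = x.size
    d = np.minimum(np.arange(n), n - np.arange(n))  # circular distance from sample 0
    h = np.exp(-d**2 / (2.0 * sigma**2))
    h /= h.sum()
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))


def binary_measurements(x, Phi, sigma, tau):
    """One-bit measurements sign(Phi @ blur(x) - tau); tau is unknown to the decoder."""
    return np.sign(Phi @ gaussian_blur_1d(x, sigma) - tau)
```

Scaling the signal and the threshold by the same positive constant leaves every bit unchanged, which is why a decoder working from such measurements can recover the point sources only up to a scale factor.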
-
Risk Estimation Without Using Stein's Lemma -- Application to Image Denoising
Authors:
Sagar Venkatesh Gubbi,
Chandra Sekhar Seelamantula
Abstract:
We address the problem of image denoising in additive white noise without placing restrictive assumptions on its statistical distribution. In the recent literature, specific noise distributions have been considered and, correspondingly, optimal denoising techniques have been developed. One of the successful approaches for denoising relies on the notion of unbiased risk estimation, which enables one to obtain a useful substitute for the mean-square error. For additive white Gaussian noise, the risk estimation procedure relies on Stein's lemma. Sophisticated wavelet-based denoising techniques, which are essentially nonlinear, have been developed with the help of the lemma. We show that, for linear, shift-invariant denoisers, it is possible to obtain unbiased risk estimates of the mean-square error without using Stein's lemma. An interesting consequence of this development is that the unbiased risk estimator becomes agnostic to the statistical distribution of the noise. As a proof of principle, we show how the new methodology can be used to optimize the parameters of a simple Gaussian smoother. By locally adapting the parameters of the Gaussian smoother, we obtain a shift-variant smoother whose denoising performance (quantified by the improvement in peak signal-to-noise ratio (PSNR)) is competitive with far more sophisticated methods reported in the literature. The proposed solution exhibits considerable parallelism, which we exploit in a Graphics Processing Unit (GPU) implementation.
Submitted 27 January, 2015; v1 submitted 6 December, 2014;
originally announced December 2014.
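For a linear, shift-invariant denoiser $H$ and $y = x + w$ with white, zero-mean noise of variance $\sigma^2$, expanding the square gives $\mathbb{E}\|Hy - x\|^2 = \mathbb{E}\|Hy - y\|^2 - N\sigma^2 + 2\sigma^2\,\mathrm{tr}(H)$, which uses only the first two moments of the noise. A minimal 1-D sketch follows, assuming a circular Gaussian smoother implemented with the FFT (so that $\mathrm{tr}(H) = N h[0]$); the function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(n, s):
    """Circular, normalized Gaussian impulse response of width s on n samples."""
    d = np.minimum(np.arange(n), n - np.arange(n))
    h = np.exp(-d**2 / (2.0 * s**2))
    return h / h.sum()


def smooth(y, h):
    """Linear shift-invariant (circular) filtering via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(y) * np.fft.fft(h)))


def risk_estimate(y, h, sigma2):
    """Unbiased estimate of ||h * y - x||^2 / n for y = x + w.

    Assumes only that w is white and zero-mean with variance sigma2;
    no Gaussianity (and hence no Stein's lemma) is needed.
    """
    n = y.size
    residual = smooth(y, h) - y
    trace_H = n * h[0]  # every diagonal entry of a circulant matrix equals h[0]
    return (residual @ residual - n * sigma2 + 2.0 * sigma2 * trace_H) / n
```

Tuning the smoother then amounts to evaluating `risk_estimate` over a grid of candidate widths and keeping the minimizer. Only `y`, the filter, and $\sigma^2$ are required, not the clean signal.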
-
Directional Bilateral Filters
Authors:
Manasij Venkatesh,
Chandra Sekhar Seelamantula
Abstract:
We propose a bilateral filter with a locally controlled domain kernel for directional edge-preserving smoothing. Traditional bilateral filters use a range kernel, which is responsible for edge preservation, and a fixed domain kernel that performs smoothing. Our intuition is that the orientation and anisotropy of image structures should be incorporated into the domain kernel while smoothing. For this purpose, we employ an oriented Gaussian domain kernel locally controlled by a structure tensor. The oriented domain kernel combined with a range kernel forms the directional bilateral filter. The two kernels assist each other in effectively suppressing the influence of outliers while smoothing. To find the optimal parameters of the directional bilateral filter, we propose the use of Stein's unbiased risk estimate (SURE). We test the capabilities of the kernels separately as well as together, first on synthetic images and then on real endoscopic images. The directional bilateral filter achieves a higher peak signal-to-noise ratio (PSNR) than the Gaussian bilateral filter at various noise levels.
Submitted 27 October, 2014;
originally announced October 2014.
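A minimal sketch of a directional bilateral filter in the spirit described above: the structure tensor at each pixel orients an anisotropic Gaussian domain kernel, and a Gaussian range kernel handles edge preservation. The mapping from anisotropy to kernel width (`1 - 0.8 * anisotropy`), the zero-padded tensor smoothing, and all parameter values are assumptions for illustration; SURE-based parameter tuning is omitted.

```python
import numpy as np


def gaussian_smooth(img, sigma):
    """Separable Gaussian smoothing with zero-padded borders (image sides must exceed 6*sigma)."""
    r = int(3 * sigma)
    g = np.exp(-np.arange(-r, r + 1) ** 2 / (2.0 * sigma**2))
    g /= g.sum()
    tmp = np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 0, img)
    return np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 1, tmp)


def directional_bilateral(img, sigma_d=2.0, sigma_r=0.1, rho=1.5, radius=4):
    img = img.astype(float)
    gy, gx = np.gradient(img)                      # derivatives along rows (y) and columns (x)
    jxx = gaussian_smooth(gx * gx, rho)            # smoothed structure-tensor entries
    jxy = gaussian_smooth(gx * gy, rho)
    jyy = gaussian_smooth(gy * gy, rho)
    padded = np.pad(img, radius, mode="edge")
    dy, dx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            J = np.array([[jxx[i, j], jxy[i, j]], [jxy[i, j], jyy[i, j]]])
            evals, evecs = np.linalg.eigh(J)        # ascending eigenvalues
            vx, vy = evecs[:, 1]                    # dominant gradient direction
            anisotropy = (evals[1] - evals[0]) / (evals[1] + evals[0] + 1e-12)
            # Assumption: narrow the kernel across edges as anisotropy grows.
            sigma_across = sigma_d * (1.0 - 0.8 * anisotropy)
            u = dx * vx + dy * vy                   # offset across the edge
            w = -dx * vy + dy * vx                  # offset along the edge
            domain = np.exp(-u**2 / (2 * sigma_across**2) - w**2 / (2 * sigma_d**2))
            patch = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            range_k = np.exp(-(patch - img[i, j]) ** 2 / (2 * sigma_r**2))
            weights = domain * range_k
            out[i, j] = np.sum(weights * patch) / np.sum(weights)
    return out
```

The per-pixel loop favors readability over speed; the two kernels are multiplied before normalization, so either one can suppress a neighbor.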
-
A Risk Minimization Framework for Channel Estimation in OFDM Systems
Authors:
Karthik Upadhya,
Chandra Sekhar Seelamantula,
K. V. S. Hari
Abstract:
We address the problem of channel estimation for cyclic-prefix (CP) Orthogonal Frequency Division Multiplexing (OFDM) systems. We model the channel as a vector of unknown deterministic constants and hence do not require prior knowledge of the channel statistics. Since the mean-square error (MSE) cannot be computed in practice in such a scenario, we propose a novel technique that uses Stein's lemma to obtain an unbiased estimate of the MSE, namely Stein's unbiased risk estimate (SURE). We obtain an estimate of the channel from noisy observations using linear and nonlinear denoising functions whose parameters are chosen to minimize SURE. Based on computer simulations, we show that using the SURE-based channel estimate for equalization offers an improvement in signal-to-noise ratio of around 2.25 dB over the maximum-likelihood channel estimate in practical channel scenarios, without assuming prior knowledge of the channel statistics.
Submitted 22 October, 2014;
originally announced October 2014.
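The SURE-based parameter selection can be sketched for one simple nonlinear denoiser, soft thresholding of the maximum-likelihood (least-squares) channel-tap estimate. This simplified illustration assumes real-valued taps and white Gaussian noise of known variance (Stein's lemma requires Gaussianity), whereas OFDM channel taps are complex-valued; the function names are hypothetical. For soft thresholding with threshold $t$, SURE has the closed form $\sum_i \min(y_i^2, t^2) + 2\sigma^2\,\#\{i : |y_i| > t\} - N\sigma^2$.

```python
import numpy as np


def soft_threshold(y, t):
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)


def sure_soft(y, t, sigma2):
    """SURE for ||soft(y, t) - h||^2, assuming y = h + N(0, sigma2 I) with real entries."""
    return (np.sum(np.minimum(y**2, t**2))
            + 2.0 * sigma2 * np.sum(np.abs(y) > t)
            - y.size * sigma2)


def sure_channel_estimate(h_ml, sigma2, thresholds):
    """Pick the threshold that minimizes SURE and denoise the ML estimate with it."""
    risks = [sure_soft(h_ml, t, sigma2) for t in thresholds]
    t_best = thresholds[int(np.argmin(risks))]
    return soft_threshold(h_ml, t_best), t_best
```

At $t = 0$ the estimator is the ML estimate itself and SURE reduces to $N\sigma^2$, its exact risk; larger thresholds trade bias on the significant taps for removing noise on the many near-zero ones.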
-
$\ell_1$-K-SVD: A Robust Dictionary Learning Algorithm With Simultaneous Update
Authors:
Subhadip Mukherjee,
Rupam Basu,
Chandra Sekhar Seelamantula
Abstract:
We develop a dictionary learning algorithm by minimizing the $\ell_1$ distortion metric on the data term, which is known to be robust to non-Gaussian noise contamination. The proposed algorithm exploits the idea of iterative minimization of a weighted $\ell_2$ error. We refer to this algorithm as $\ell_1$-K-SVD, where the dictionary atoms and the corresponding sparse coefficients are simultaneously updated to minimize the $\ell_1$ objective, resulting in noise robustness. We demonstrate through experiments that the $\ell_1$-K-SVD algorithm yields a higher atom recovery rate than both K-SVD and the robust dictionary learning (RDL) algorithm proposed by Lu et al., in Gaussian as well as non-Gaussian noise conditions. We also show that, for fixed values of sparsity, number of dictionary atoms, and data dimension, the $\ell_1$-K-SVD algorithm outperforms the K-SVD and RDL algorithms when the available training set is small. We apply the proposed algorithm to denoising natural images corrupted by additive Gaussian and Laplacian noise. The images denoised using $\ell_1$-K-SVD have slightly higher peak signal-to-noise ratio (PSNR) than those obtained with K-SVD for Laplacian noise, but the improvement in structural similarity index (SSIM) is significant (approximately $0.1$) for lower values of input PSNR, indicating the efficacy of the $\ell_1$ metric.
Submitted 2 March, 2015; v1 submitted 26 August, 2014;
originally announced October 2014.
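The weighted-$\ell_2$ idea behind the $\ell_1$ data term can be sketched on a single step: fitting the coefficients of a signal over a fixed set of atoms via $\min_a \|y - Da\|_1$, solved by iteratively reweighted least squares with weights $1/|r_i|$. This sketches the building block only, not the full $\ell_1$-K-SVD simultaneous atom and coefficient update; `irls_l1_fit`, `n_iter`, and `eps` are hypothetical.

```python
import numpy as np


def irls_l1_fit(D, y, n_iter=100, eps=1e-8):
    """Approximately minimize ||y - D a||_1 by iteratively reweighted least squares.

    Each pass solves a weighted l2 problem with weights 1 / max(|residual|, eps).
    """
    a = np.linalg.lstsq(D, y, rcond=None)[0]      # ordinary l2 fit as initialization
    for _ in range(n_iter):
        r = y - D @ a
        w = 1.0 / np.maximum(np.abs(r), eps)      # large residuals get small weights
        sw = np.sqrt(w)
        # Weighted least squares: scale each row of D and y by sqrt(w_i).
        a = np.linalg.lstsq(D * sw[:, None], y * sw, rcond=None)[0]
    return a
```

An ordinary least-squares fit is pulled toward gross errors; the reweighting drives their weights down so that the fit is determined by the inliers.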
-
A Split-and-Merge Dictionary Learning Algorithm for Sparse Representation
Authors:
Subhadip Mukherjee,
Chandra Sekhar Seelamantula
Abstract:
In big data image/video analytics, we encounter the problem of learning an overcomplete dictionary for sparse representation from a large training dataset that cannot be processed at once because of storage and computational constraints. To tackle dictionary learning in such scenarios, we propose an algorithm for parallel dictionary learning. The fundamental idea behind the algorithm is to learn a sparse representation in two phases. In the first phase, the whole training dataset is partitioned into small non-overlapping subsets, and a dictionary is trained independently on each subset. In the second phase, the dictionaries are merged to form a global dictionary. We show that the proposed algorithm is efficient in terms of memory usage and computational complexity, and performs on par with the standard learning strategy that operates on the entire dataset at once. As an application, we consider the problem of image denoising. We present a comparative analysis of our algorithm and standard learning techniques that use the entire database at once, in terms of training and denoising performance. We observe that the split-and-merge algorithm yields a remarkable reduction in training time without significantly affecting the denoising performance.
Submitted 19 March, 2014;
originally announced March 2014.
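The two-phase structure can be sketched as follows. The local learner here is the method of optimal directions (MOD) with orthogonal matching pursuit (OMP) coding, swapped in for brevity, and the merge rule (relearning a dictionary with the pooled local atoms as training data) is a hypothetical choice; neither is claimed to be the paper's procedure, and all names are illustrative.

```python
import numpy as np


def omp(D, y, k):
    """Orthogonal matching pursuit: k-sparse code of y over unit-norm atoms D."""
    support, coef, residual = [], np.zeros(0), y.copy()
    for _ in range(k):
        if np.linalg.norm(residual) < 1e-12:
            break
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef = np.linalg.lstsq(D[:, support], y, rcond=None)[0]
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x


def normalize_atoms(D, Y, rng):
    """Unit-normalize atoms, re-seeding unused (all-zero) atoms with data columns."""
    dead = np.linalg.norm(D, axis=0) < 1e-10
    D[:, dead] = Y[:, rng.choice(Y.shape[1], int(dead.sum()))]
    return D / np.linalg.norm(D, axis=0)


def mod_learn(Y, n_atoms, k, n_iter=10, seed=0):
    """MOD (swapped-in learner): alternate OMP coding with a least-squares atom update."""
    rng = np.random.default_rng(seed)
    init = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].astype(float)
    D = normalize_atoms(init, Y, rng)
    for _ in range(n_iter):
        X = np.column_stack([omp(D, y, k) for y in Y.T])
        D = normalize_atoms(Y @ np.linalg.pinv(X), Y, rng)
    return D


def split_and_merge(Y, n_splits, n_atoms, k, seed=0):
    """Phase 1: one dictionary per disjoint subset. Phase 2: merge into one dictionary."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(Y.shape[1]), n_splits)
    local = [mod_learn(Y[:, idx], n_atoms, k, seed=seed) for idx in parts]
    pooled = np.hstack(local)  # all local atoms side by side
    # Hypothetical merge rule: relearn n_atoms atoms using the pooled atoms as data.
    return mod_learn(pooled, n_atoms, k, seed=seed)
```

Each local call to `mod_learn` touches only its own subset, so the first phase can run in parallel or out of core; only the pooled atoms, not the training data, reach the merge phase.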
-
Template-Based Active Contours
Authors:
Jayanth Krishna Mogali,
Adithya Kumar Pediredla,
Chandra Sekhar Seelamantula
Abstract:
We develop a generalized active contour formalism for image segmentation based on shape templates. The shape template is subjected to a restricted affine transformation (RAT) in order to segment the object of interest. RAT allows for translation, rotation, and scaling, which together provide five degrees of freedom. The proposed active contour comprises a pair of closed, concentric inner and outer contours. The active contour energy is a contrast function defined on the intensities of the pixels that lie inside the inner contour and those that lie in the annulus between the inner and outer contours. We show that the contrast energy functional is optimal under certain conditions. The optimal RAT parameters are computed by maximizing the contrast function using a gradient descent optimizer. We show that the calculations are made efficient through the use of Green's theorem. The proposed formalism can handle a variety of shapes because, for a chosen template, the optimization is carried out with respect to the RAT parameters only. The proposed formalism is validated on multiple images to demonstrate robustness to Gaussian and Poisson noise, to initialization, and to partial loss of structure in the object to be segmented.
Submitted 3 December, 2013;
originally announced December 2013.
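Two ingredients of the formalism above can be sketched directly: applying a RAT to a polygonal template, and using Green's theorem to turn region integrals into sums over boundary vertices, so that the cost depends only on the contour. The split of the five degrees of freedom into two translations, one rotation, and two axis scalings, the polygonal template, and the function names are assumptions; a contrast energy would combine such region integrals over the inner region and the annulus.

```python
import numpy as np


def rat(points, tx, ty, theta, sx, sy):
    """Assumed RAT parametrization: axis scaling, then rotation, then translation."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return (points * np.array([sx, sy])) @ R.T + np.array([tx, ty])


def green_area(poly):
    """Area of a counter-clockwise polygon via Green's theorem (shoelace formula)."""
    x, y = poly[:, 0], poly[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)
    return 0.5 * np.sum(x * yn - xn * y)


def green_x_moment(poly):
    """Integral of f(x, y) = x over the polygon, as the boundary integral of (x^2 / 2) dy."""
    x, y = poly[:, 0], poly[:, 1]
    dx, dy = np.roll(x, -1) - x, np.roll(y, -1) - y
    # Exact for straight edges: with x(t) = x0 + t dx, int_0^1 x(t)^2 dt = x0^2 + x0 dx + dx^2 / 3.
    return 0.5 * np.sum(dy * (x**2 + x * dx + dx**2 / 3.0))
```

Because positive axis scalings and a proper rotation preserve orientation, a counter-clockwise template stays counter-clockwise after the transform, and its enclosed area is multiplied by `sx * sy`.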