-
Optimizing the Efficiency of First-Order Methods for Decreasing the Gradient of Smooth Convex Functions
Authors:
Donghwan Kim,
Jeffrey A. Fessler
Abstract:
This paper optimizes the step coefficients of first-order methods for smooth convex minimization in terms of the worst-case convergence bound (i.e., efficiency) of the decrease in the gradient norm. This work is based on the performance estimation problem approach. The worst-case gradient bound of the resulting method is optimal up to a constant for large-dimensional smooth convex minimization problems, under the initial bounded condition on the cost function value. This paper then illustrates that the proposed method has a computationally efficient form that is similar to the optimized gradient method.
Submitted 27 October, 2020; v1 submitted 18 March, 2018;
originally announced March 2018.
-
Deep BCD-Net Using Identical Encoding-Decoding CNN Structures for Iterative Image Recovery
Authors:
Il Yong Chun,
Jeffrey A. Fessler
Abstract:
In "extreme" computational imaging that collects extremely undersampled or noisy measurements, obtaining an accurate image within a reasonable computing time is challenging. Incorporating image mapping convolutional neural networks (CNN) into iterative image recovery has great potential to resolve this issue. This paper 1) incorporates image mapping CNN using identical convolutional kernels in both encoders and decoders into a block coordinate descent (BCD) signal recovery method and 2) applies alternating direction method of multipliers to train the aforementioned image mapping CNN. We refer to the proposed recurrent network as BCD-Net using identical encoding-decoding CNN structures. Numerical experiments show that, for a) denoising low signal-to-noise-ratio images and b) extremely undersampled magnetic resonance imaging, the proposed BCD-Net achieves significantly more accurate image recovery, compared to BCD-Net using distinct encoding-decoding structures and/or the conventional image recovery model using both wavelets and total variation.
Submitted 28 April, 2018; v1 submitted 20 February, 2018;
originally announced February 2018.
-
Convolutional Analysis Operator Learning: Acceleration and Convergence
Authors:
Il Yong Chun,
Jeffrey A. Fessler
Abstract:
Convolutional operator learning is gaining attention in many signal processing and computer vision applications. Learning kernels has mostly relied on so-called patch-domain approaches that extract and store many overlapping patches across training signals. Due to memory demands, patch-domain methods have limitations when learning kernels from large datasets -- particularly with multi-layered structures, e.g., convolutional neural networks -- or when applying the learned kernels to high-dimensional signal recovery problems. The so-called convolution approach does not store many overlapping patches, and thus overcomes the memory problems particularly with careful algorithmic designs; it has been studied within the "synthesis" signal model, e.g., convolutional dictionary learning. This paper proposes a new convolutional analysis operator learning (CAOL) framework that learns an analysis sparsifying regularizer with the convolution perspective, and develops a new convergent Block Proximal Extrapolated Gradient method using a Majorizer (BPEG-M) to solve the corresponding block multi-nonconvex problems. To learn diverse filters within the CAOL framework, this paper introduces an orthogonality constraint that enforces a tight-frame filter condition, and a regularizer that promotes diversity between filters. Numerical experiments show that, with sharp majorizers, BPEG-M significantly accelerates the CAOL convergence rate compared to the state-of-the-art block proximal gradient (BPG) method. Numerical experiments for sparse-view computational tomography show that a convolutional sparsifying regularizer learned via CAOL significantly improves reconstruction quality compared to a conventional edge-preserving regularizer. Using more and wider kernels in a learned regularizer better preserves edges in reconstructed images.
Submitted 11 September, 2019; v1 submitted 15 February, 2018;
originally announced February 2018.
-
Statistical Image Reconstruction Using Mixed Poisson-Gaussian Noise Model for X-Ray CT
Authors:
Qiaoqiao Ding,
Yong Long,
Xiaoqun Zhang,
Jeffrey A. Fessler
Abstract:
Statistical image reconstruction (SIR) methods for X-ray CT produce high-quality and accurate images, while greatly reducing patient exposure to radiation. When further reducing X-ray dose to an ultra-low level by lowering the tube current, photon starvation happens and electronic noise starts to dominate, which introduces negative or zero values into the raw measurements. These non-positive values pose challenges to post-log SIR methods that require taking the logarithm of the raw data, and cause artifacts in the reconstructed images if simple correction methods are used to process these non-positive raw measurements. The raw data at ultra-low dose deviates significantly from Poisson or shifted Poisson statistics for pre-log data and from Gaussian statistics for post-log data. This paper proposes a novel SIR method called MPG (mixed Poisson-Gaussian). MPG models the raw noisy measurements using a mixed Poisson-Gaussian distribution that accounts for both the quantum noise and electronic noise. MPG is able to directly use the negative and zero values in raw data without any pre-processing. The MPG cost function contains a reweighted least-squares data-fit term, an edge-preserving regularization term and a non-negativity constraint term. We use the Alternating Direction Method of Multipliers (ADMM) to separate the MPG optimization problem into several sub-problems that are easier to solve. Our results on a 3D simulated cone-beam data set and a synthetic helical data set generated from clinical data indicate that the proposed MPG method reduces noise and decreases bias in the reconstructed images, compared with the conventional filtered back projection (FBP), penalized weighted least-squares (PWLS) and shifted Poisson (SP) methods for ultra-low dose CT (ULDCT) imaging.
Submitted 19 January, 2018;
originally announced January 2018.
-
Sparse-View X-Ray CT Reconstruction Using $\ell_1$ Prior with Learned Transform
Authors:
Xuehang Zheng,
Il Yong Chun,
Zhipeng Li,
Yong Long,
Jeffrey A. Fessler
Abstract:
A major challenge in X-ray computed tomography (CT) is reducing radiation dose while maintaining high quality of reconstructed images. To reduce the radiation dose, one can reduce the number of projection views (sparse-view CT); however, it becomes difficult to achieve high-quality image reconstruction as the number of projection views decreases. Researchers have applied the concept of learning sparse representations from (high-quality) CT image datasets to sparse-view CT reconstruction. We propose a new statistical CT reconstruction model that combines penalized weighted-least squares (PWLS) and $\ell_1$ prior with learned sparsifying transform (PWLS-ST-$\ell_1$), and a corresponding efficient algorithm based on the Alternating Direction Method of Multipliers (ADMM). To moderate the difficulty of tuning ADMM parameters, we propose a new ADMM parameter selection scheme based on approximated condition numbers. We interpret the proposed model by analyzing the minimum mean square error of its ($\ell_2$-norm relaxed) image update estimator. Our results with the extended cardiac-torso (XCAT) phantom data and clinical chest data show that, for sparse-view 2D fan-beam CT and 3D axial cone-beam CT, PWLS-ST-$\ell_1$ improves the quality of reconstructed images compared to the CT reconstruction methods using an edge-preserving regularizer and an $\ell_2$ prior with learned ST. These results also show that, for sparse-view 2D fan-beam CT, PWLS-ST-$\ell_1$ achieves comparable or better image quality and requires much shorter runtime than PWLS-DL using a learned overcomplete dictionary. Our results with clinical chest data show that methods using the unsupervised learned prior generalize better than a state-of-the-art deep "denoising" neural network that does not use a physical imaging model.
Submitted 15 September, 2019; v1 submitted 2 November, 2017;
originally announced November 2017.
-
Dictionary-Free MRI PERK: Parameter Estimation via Regression with Kernels
Authors:
Gopal Nataraj,
Jon-Fredrik Nielsen,
Clayton Scott,
Jeffrey A. Fessler
Abstract:
This paper introduces a fast, general method for dictionary-free parameter estimation in quantitative magnetic resonance imaging (QMRI) via regression with kernels (PERK). PERK first uses prior distributions and the nonlinear MR signal model to simulate many parameter-measurement pairs. Inspired by machine learning, PERK then takes these parameter-measurement pairs as labeled training points and learns from them a nonlinear regression function using kernel functions and convex optimization. PERK admits a simple implementation as per-voxel nonlinear lifting of MRI measurements followed by linear minimum mean-squared error regression. We demonstrate PERK for $T_1,T_2$ estimation, a well-studied application where it is simple to compare PERK estimates against dictionary-based grid search estimates. Numerical simulations as well as single-slice phantom and in vivo experiments demonstrate that PERK and grid search produce comparable $T_1,T_2$ estimates in white and gray matter, but PERK is consistently at least $23\times$ faster. This acceleration factor will increase by several orders of magnitude for full-volume QMRI estimation problems involving more latent parameters per voxel.
Submitted 6 October, 2017;
originally announced October 2017.
-
Medical image reconstruction: a brief overview of past milestones and future directions
Authors:
Jeffrey A. Fessler
Abstract:
This paper briefly reviews past milestones in the field of medical image reconstruction and describes some future directions. It is part of an overview paper on "open problems in signal processing" that will appear in IEEE Signal Processing Magazine, but presented here with citations and equations.
Submitted 18 July, 2017;
originally announced July 2017.
-
Low Dose CT Image Reconstruction With Learned Sparsifying Transform
Authors:
Xuehang Zheng,
Zening Lu,
Saiprasad Ravishankar,
Yong Long,
Jeffrey A. Fessler
Abstract:
A major challenge in computed tomography (CT) is to reduce X-ray dose to a low or even ultra-low level while maintaining the high quality of reconstructed images. We propose a new method for CT reconstruction that combines penalized weighted-least squares reconstruction (PWLS) with regularization based on a sparsifying transform (PWLS-ST) learned from a dataset of numerous CT images. We adopt an alternating algorithm to optimize the PWLS-ST cost function that alternates between a CT image update step and a sparse coding step. We adopt a relaxed linearized augmented Lagrangian method with ordered-subsets (relaxed OS-LALM) to accelerate the CT image update step by reducing the number of forward and backward projections. Numerical experiments on the XCAT phantom show that for low dose levels, the proposed PWLS-ST method dramatically improves the quality of reconstructed images compared to PWLS reconstruction with a nonadaptive edge-preserving regularizer (PWLS-EP).
Submitted 10 July, 2017;
originally announced July 2017.
-
Convolutional Dictionary Learning: Acceleration and Convergence
Authors:
Il Yong Chun,
Jeffrey A. Fessler
Abstract:
Convolutional dictionary learning (CDL or sparsifying CDL) has many applications in image processing and computer vision. There has been growing interest in developing efficient algorithms for CDL, mostly relying on the augmented Lagrangian (AL) method or the variant alternating direction method of multipliers (ADMM). When their parameters are properly tuned, AL methods have shown fast convergence in CDL. However, the parameter tuning process is not trivial due to its data dependence and, in practice, the convergence of AL methods depends on the AL parameters for nonconvex CDL problems. To moderate these problems, this paper proposes a new practically feasible and convergent Block Proximal Gradient method using a Majorizer (BPG-M) for CDL. The BPG-M-based CDL is investigated with different block updating schemes and majorization matrix designs, and further accelerated by incorporating some momentum coefficient formulas and restarting techniques. All of the methods investigated incorporate a boundary artifacts removal (or, more generally, sampling) operator in the learning model. Numerical experiments show that, without needing any parameter tuning process, the proposed BPG-M approach converges more stably to desirable solutions of lower objective values than the existing state-of-the-art ADMM algorithm and its memory-efficient variant do. Compared to the ADMM approaches, the BPG-M method using a multi-block updating scheme is particularly useful in single-threaded CDL algorithm handling large datasets, due to its lower memory requirement and no polynomial computational complexity. Image denoising experiments show that, for relatively strong additive white Gaussian noise, the filters learned by BPG-M-based CDL outperform those trained by the ADMM approach.
Submitted 25 August, 2017; v1 submitted 2 July, 2017;
originally announced July 2017.
-
PWLS-ULTRA: An Efficient Clustering and Learning-Based Approach for Low-Dose 3D CT Image Reconstruction
Authors:
Xuehang Zheng,
Saiprasad Ravishankar,
Yong Long,
Jeffrey A. Fessler
Abstract:
The development of computed tomography (CT) image reconstruction methods that significantly reduce patient radiation exposure while maintaining high image quality is an important area of research in low-dose CT (LDCT) imaging. We propose a new penalized weighted least squares (PWLS) reconstruction method that exploits regularization based on an efficient Union of Learned TRAnsforms (PWLS-ULTRA). The union of square transforms is pre-learned from numerous image patches extracted from a dataset of CT images or volumes. The proposed PWLS-based cost function is optimized by alternating between a CT image reconstruction step, and a sparse coding and clustering step. The CT image reconstruction step is accelerated by a relaxed linearized augmented Lagrangian method with ordered-subsets that reduces the number of forward and back projections. Simulations with 2-D and 3-D axial CT scans of the extended cardiac-torso phantom and 3D helical chest and abdomen scans show that for both normal-dose and low-dose levels, the proposed method significantly improves the quality of reconstructed images compared to PWLS reconstruction with a nonadaptive edge-preserving regularizer (PWLS-EP). PWLS with regularization based on a union of learned transforms leads to better image reconstructions than using a single learned square transform. We also incorporate patch-based weights in PWLS-ULTRA that enhance image quality and help improve image resolution uniformity. The proposed approach achieves comparable or better image quality compared to learned overcomplete synthesis dictionaries, but importantly, is much faster (computationally more efficient).
Submitted 1 June, 2018; v1 submitted 27 March, 2017;
originally announced March 2017.
-
Asymptotic performance of PCA for high-dimensional heteroscedastic data
Authors:
David Hong,
Laura Balzano,
Jeffrey A. Fessler
Abstract:
Principal Component Analysis (PCA) is a classical method for reducing the dimensionality of data by projecting them onto a subspace that captures most of their variation. Effective use of PCA in modern applications requires understanding its performance for data that are both high-dimensional and heteroscedastic. This paper analyzes the statistical performance of PCA in this setting, i.e., for high-dimensional data drawn from a low-dimensional subspace and degraded by heteroscedastic noise. We provide simplified expressions for the asymptotic PCA recovery of the underlying subspace, subspace amplitudes and subspace coefficients; the expressions enable both easy and efficient calculation and reasoning about the performance of PCA. We exploit the structure of these expressions to show that, for a fixed average noise variance, the asymptotic recovery of PCA for heteroscedastic data is always worse than that for homoscedastic data (i.e., for noise variances that are equal across samples). Hence, while average noise variance is often a practically convenient measure for the overall quality of data, it gives an overly optimistic estimate of the performance of PCA for heteroscedastic data.
Submitted 23 June, 2018; v1 submitted 20 March, 2017;
originally announced March 2017.
-
Adaptive Restart of the Optimized Gradient Method for Convex Optimization
Authors:
Donghwan Kim,
Jeffrey A. Fessler
Abstract:
First-order methods with momentum such as Nesterov's fast gradient method are very useful for convex optimization problems, but can exhibit undesirable oscillations yielding slow convergence rates for some applications. An adaptive restarting scheme can improve the convergence rate of the fast gradient method, when the parameter of a strongly convex cost function is unknown or when the iterates of the algorithm enter a locally strongly convex region. Recently, we introduced the optimized gradient method, a first-order algorithm that has an inexpensive per-iteration computational cost similar to that of the fast gradient method, yet has a worst-case cost function rate that is twice as fast as that of the fast gradient method and that is optimal for large-dimensional smooth convex problems. Building upon the success of accelerating the fast gradient method using adaptive restart, this paper investigates similar heuristic acceleration of the optimized gradient method. We first derive a new first-order method that resembles the optimized gradient method for strongly convex quadratic problems with known function parameters, yielding a linear convergence rate that is faster than that of the analogous version of the fast gradient method. We then provide a heuristic analysis and numerical experiments that illustrate that adaptive restart can accelerate the convergence of the optimized gradient method. Numerical results also illustrate that adaptive restart is helpful for a proximal version of the optimized gradient method for nonsmooth composite convex functions.
Submitted 27 November, 2017; v1 submitted 14 March, 2017;
originally announced March 2017.
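As an illustration of the adaptive restart idea mentioned in the abstract above, here is a minimal sketch of a gradient-based restart applied to the fast gradient method (not to the optimized gradient method, and not the paper's own algorithm). It assumes the commonly used restart test that resets the momentum whenever the gradient at the extrapolated point makes a positive inner product with the latest step. The function name `fgm_gradient_restart` and its parameters are hypothetical.

```python
import math

def fgm_gradient_restart(grad, x0, L, n_iter=400):
    """Fast gradient method with a gradient-based momentum restart (sketch).

    grad: callable returning the gradient (list of floats) at a point
    L:    assumed Lipschitz constant of the gradient
    """
    x = list(x0)   # gradient-step iterates
    y = list(x0)   # extrapolated (momentum) point
    t = 1.0
    for _ in range(n_iter):
        g = grad(y)
        x_new = [yi - gi / L for yi, gi in zip(y, g)]
        step = [xn - xo for xn, xo in zip(x_new, x)]
        if sum(gi * si for gi, si in zip(g, step)) > 0.0:
            # The step points uphill with respect to grad(y): drop the momentum.
            t = 1.0
            y = x_new[:]
        else:
            # Usual momentum update.
            t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
            y = [xn + ((t - 1.0) / t_new) * s for xn, s in zip(x_new, step)]
            t = t_new
        x = x_new
    return x

# Usage on an ill-conditioned quadratic f(x) = 0.5 * (100 * x[0]**2 + x[1]**2).
x_final = fgm_gradient_restart(lambda v: [100.0 * v[0], v[1]], [1.0, 1.0], L=100.0)
```

The restart test needs only quantities that the method already computes, so it adds essentially no per-iteration cost.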
-
Low-rank and Adaptive Sparse Signal (LASSI) Models for Highly Accelerated Dynamic Imaging
Authors:
Saiprasad Ravishankar,
Brian E. Moore,
Raj Rao Nadakuditi,
Jeffrey A. Fessler
Abstract:
Sparsity-based approaches have been popular in many applications in image processing and imaging. Compressed sensing exploits the sparsity of images in a transform domain or dictionary to improve image recovery from undersampled measurements. In the context of inverse problems in dynamic imaging, recent research has demonstrated the promise of sparsity and low-rank techniques. For example, the patches of the underlying data are modeled as sparse in an adaptive dictionary domain, and the resulting image and dictionary estimation from undersampled measurements is called dictionary-blind compressed sensing, or the dynamic image sequence is modeled as a sum of low-rank and sparse (in some transform domain) components (L+S model) that are estimated from limited measurements. In this work, we investigate a data-adaptive extension of the L+S model, dubbed LASSI, where the temporal image sequence is decomposed into a low-rank component and a component whose spatiotemporal (3D) patches are sparse in some adaptive dictionary domain. We investigate various formulations and efficient methods for jointly estimating the underlying dynamic signal components and the spatiotemporal dictionary from limited measurements. We also obtain efficient sparsity penalized dictionary-blind compressed sensing methods as special cases of our LASSI approaches. Our numerical experiments demonstrate the promising performance of LASSI schemes for dynamic magnetic resonance image reconstruction from limited k-t space data compared to recent methods such as k-t SLR and L+S, and compared to the proposed dictionary-blind compressed sensing method.
Submitted 9 January, 2017; v1 submitted 12 November, 2016;
originally announced November 2016.
-
Towards a Theoretical Analysis of PCA for Heteroscedastic Data
Authors:
David Hong,
Laura Balzano,
Jeffrey A. Fessler
Abstract:
Principal Component Analysis (PCA) is a method for estimating a subspace given noisy samples. It is useful in a variety of problems ranging from dimensionality reduction to anomaly detection and the visualization of high dimensional data. PCA performs well in the presence of moderate noise and even with missing data, but is also sensitive to outliers. PCA is also known to have a phase transition when noise is independent and identically distributed; recovery of the subspace sharply declines at a threshold noise variance. Effective use of PCA requires a rigorous understanding of these behaviors. This paper provides a step towards an analysis of PCA for samples with heteroscedastic noise, that is, samples that have non-uniform noise variances and so are no longer identically distributed. In particular, we provide a simple asymptotic prediction of the recovery of a one-dimensional subspace from noisy heteroscedastic samples. The prediction enables: a) easy and efficient calculation of the asymptotic performance, and b) qualitative reasoning to understand how PCA is impacted by heteroscedasticity (such as outliers).
Submitted 12 October, 2016;
originally announced October 2016.
-
Fast dual proximal gradient algorithms with rate $O(1/k^{1.5})$ for convex minimization
Authors:
Donghwan Kim,
Jeffrey A. Fessler
Abstract:
We consider minimizing the composite function that consists of a strongly convex function and a convex function. The fast dual proximal gradient (FDPG) method decreases the dual function with a rate $O(1/k^2)$, leading to a rate $O(1/k)$ for decreasing the primal function. We propose a generalized FDPG method that guarantees an $O(1/k^{1.5})$ rate for the dual proximal gradient norm decrease. By relating this to the primal function decrease, the proposed approach decreases the primal function with the improved $O(1/k^{1.5})$ rate.
Submitted 29 September, 2016;
originally announced September 2016.
-
Another look at the fast iterative shrinkage/thresholding algorithm (FISTA)
Authors:
Donghwan Kim,
Jeffrey A. Fessler
Abstract:
This paper provides a new way of developing the fast iterative shrinkage/thresholding algorithm (FISTA) that is widely used for minimizing composite convex functions with a nonsmooth term such as the $\ell_1$ regularizer. In particular, this paper shows that FISTA corresponds to an optimized approach to accelerating the proximal gradient method with respect to a worst-case bound of the cost function. This paper then proposes a new algorithm that is derived by instead optimizing the step coefficients of the proximal gradient method with respect to a worst-case bound of the composite gradient mapping. The proof is based on the worst-case analysis called Performance Estimation Problem.
Submitted 22 January, 2018; v1 submitted 12 August, 2016;
originally announced August 2016.
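For readers unfamiliar with FISTA, the following is a minimal sketch of the standard algorithm for the problem 0.5 * ||A x - b||^2 + lam * ||x||_1, where the proximal step is elementwise soft thresholding. It shows the baseline method that the abstract above reinterprets, not the new algorithm the paper proposes. The names `fista` and `soft_threshold` and the dense list-of-rows matrix format are assumptions made for illustration.

```python
import math

def soft_threshold(v, t):
    """Elementwise proximal operator of t * |.| (soft thresholding)."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]

def fista(A, b, lam, L, n_iter=200):
    """FISTA sketch for 0.5 * ||A x - b||^2 + lam * ||x||_1.

    A is a list of rows; L is an assumed upper bound on the largest
    eigenvalue of A^T A (the Lipschitz constant of the smooth gradient).
    """
    n = len(A[0])
    x = [0.0] * n      # current iterate
    z = x[:]           # extrapolated point
    t = 1.0
    for _ in range(n_iter):
        # Residual r = A z - b and gradient g = A^T r of the smooth term.
        r = [sum(a * zj for a, zj in zip(row, z)) - bi for row, bi in zip(A, b)]
        g = [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(n)]
        # Proximal gradient step with step size 1 / L.
        x_new = soft_threshold([zj - gj / L for zj, gj in zip(z, g)], lam / L)
        # Momentum update.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x

# Usage on a tiny diagonal problem.
x_hat = fista([[2.0, 0.0], [0.0, 1.0]], [4.0, 0.5], lam=1.0, L=4.0)
```

On this diagonal example each coordinate can be minimized by hand, which gives 1.75 for the first coordinate and 0 for the second.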
-
Generalizing the optimized gradient method for smooth convex minimization
Authors:
Donghwan Kim,
Jeffrey A. Fessler
Abstract:
This paper generalizes the optimized gradient method (OGM) that achieves the optimal worst-case cost function bound of first-order methods for smooth convex minimization. Specifically, this paper studies a generalized formulation of OGM and analyzes its worst-case rates in terms of both the function value and the norm of the function gradient. This paper also develops a new algorithm called OGM-OG that is in the generalized family of OGM and that has the best known analytical worst-case bound with rate $O(1/N^{1.5})$ on the decrease of the gradient norm among fixed-step first-order methods. This paper also proves that Nesterov's fast gradient method has an $O(1/N^{1.5})$ worst-case gradient norm rate but with a constant larger than that of OGM-OG. The proof is based on the worst-case analysis called Performance Estimation Problem.
Submitted 1 April, 2018; v1 submitted 22 July, 2016;
originally announced July 2016.
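The sketch below shows the basic (non-generalized) OGM iteration in the form usually stated for it: a gradient step followed by two momentum terms, with a modified momentum parameter on the final iteration. It is written from the commonly cited form of the method rather than from this listing, so treat the details as assumptions. The function name `ogm` and its signature are hypothetical, and the generalized family and OGM-OG from the abstract above are not implemented here.

```python
import math

def ogm(grad, x0, L, n_iter):
    """Optimized gradient method sketch for an L-smooth convex function.

    Returns the final iterate and the final momentum parameter theta.
    """
    x = list(x0)   # points at which gradients are evaluated
    y = list(x0)   # outputs of the plain gradient steps
    theta = 1.0
    for i in range(n_iter):
        g = grad(x)
        y_new = [xi - gi / L for xi, gi in zip(x, g)]
        # The final iteration uses the factor 8 instead of 4.
        c = 8.0 if i == n_iter - 1 else 4.0
        theta_new = (1.0 + math.sqrt(1.0 + c * theta * theta)) / 2.0
        x = [yn + ((theta - 1.0) / theta_new) * (yn - yo)
                + (theta / theta_new) * (yn - xi)
             for yn, yo, xi in zip(y_new, y, x)]
        y, theta = y_new, theta_new
    return x, theta

# Usage on a simple quadratic f(x) = 0.5 * (x[0]**2 + 0.1 * x[1]**2), L = 1.
x_N, theta_N = ogm(lambda v: [v[0], 0.1 * v[1]], [1.0, 1.0], L=1.0, n_iter=10)
```

Under these assumptions, the worst-case bound usually quoted for this iteration is f(x_N) - f* <= L * ||x_0 - x*||^2 / (2 * theta_N^2), which is why the sketch also returns theta.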
-
Relaxed Linearized Algorithms for Faster X-Ray CT Image Reconstruction
Authors:
Hung Nien,
Jeffrey A. Fessler
Abstract:
Statistical image reconstruction (SIR) methods are studied extensively for X-ray computed tomography (CT) due to the potential of acquiring CT scans with reduced X-ray dose while maintaining image quality. However, the longer reconstruction time of SIR methods hinders their use in X-ray CT in practice. To accelerate statistical methods, many optimization techniques have been investigated. Over-relaxation is a common technique to speed up convergence of iterative algorithms. For instance, using a relaxation parameter that is close to two in alternating direction method of multipliers (ADMM) has been shown to speed up convergence significantly. This paper proposes a relaxed linearized augmented Lagrangian (AL) method that shows a theoretically faster convergence rate with over-relaxation and applies the proposed relaxed linearized AL method to X-ray CT image reconstruction problems. Experimental results with both simulated and real CT scan data show that the proposed relaxed algorithm (with ordered-subsets [OS] acceleration) is about twice as fast as the existing unrelaxed fast algorithms, with negligible computation and memory overhead.
Submitted 14 December, 2015;
originally announced December 2015.
-
Efficient Sum of Outer Products Dictionary Learning (SOUP-DIL) - The $\ell_0$ Method
Authors:
Saiprasad Ravishankar,
Raj Rao Nadakuditi,
Jeffrey A. Fessler
Abstract:
The sparsity of natural signals and images in a transform domain or dictionary has been extensively exploited in several applications such as compression, denoising and inverse problems. More recently, data-driven adaptation of synthesis dictionaries has shown promise in many applications compared to fixed or analytical dictionary models. However, dictionary learning problems are typically non-convex and NP-hard, and the usual alternating minimization approaches for these problems are often computationally expensive, with the computations dominated by the NP-hard synthesis sparse coding step. In this work, we investigate an efficient method for $\ell_{0}$ "norm"-based dictionary learning by first approximating the training data set with a sum of sparse rank-one matrices and then using a block coordinate descent approach to estimate the unknowns. The proposed block coordinate descent algorithm involves efficient closed-form solutions. In particular, the sparse coding step involves a simple form of thresholding. We provide a convergence analysis for the proposed block coordinate descent approach. Our numerical experiments show the promising performance and significant speed-ups provided by our method over the classical K-SVD scheme in sparse signal representation and image denoising.
Submitted 20 April, 2017; v1 submitted 27 November, 2015;
originally announced November 2015.
-
Efficient Sum of Outer Products Dictionary Learning (SOUP-DIL) and Its Application to Inverse Problems
Authors:
Saiprasad Ravishankar,
Raj Rao Nadakuditi,
Jeffrey A. Fessler
Abstract:
The sparsity of signals in a transform domain or dictionary has been exploited in applications such as compression, denoising and inverse problems. More recently, data-driven adaptation of synthesis dictionaries has shown promise compared to analytical dictionary models. However, dictionary learning problems are typically non-convex and NP-hard, and the usual alternating minimization approaches for these problems are often computationally expensive, with the computations dominated by the NP-hard synthesis sparse coding step. This paper exploits the ideas that drive algorithms such as K-SVD, and investigates in detail efficient methods for aggregate sparsity penalized dictionary learning by first approximating the data with a sum of sparse rank-one matrices (outer products) and then using a block coordinate descent approach to estimate the unknowns. The resulting block coordinate descent algorithms involve efficient closed-form solutions. Furthermore, we consider the problem of dictionary-blind image reconstruction, and propose novel and efficient algorithms for adaptive image reconstruction using block coordinate descent and sum of outer products methodologies. We provide a convergence study of the algorithms for dictionary learning and dictionary-blind image reconstruction. Our numerical experiments show the promising performance and speed-ups provided by the proposed methods over previous schemes in sparse data representation and compressed sensing-based image reconstruction.
Submitted 20 April, 2017; v1 submitted 19 November, 2015;
originally announced November 2015.
-
On the convergence analysis of the optimized gradient method
Authors:
Donghwan Kim,
Jeffrey A. Fessler
Abstract:
This paper considers the problem of unconstrained minimization of smooth convex functions having Lipschitz continuous gradients with known Lipschitz constant. We recently proposed an optimized gradient method (OGM) for this problem and showed that it has a worst-case convergence bound for the cost function decrease that is twice as small as that of Nesterov's fast gradient method (FGM), yet has a similarly efficient practical implementation. Drori showed recently that OGM has optimal complexity over the general class of first-order methods. This optimality makes it important to study fully the convergence properties of OGM. The previous worst-case convergence bound for OGM was derived for only the last iterate of a secondary sequence. This paper provides an analytic convergence bound for the primary sequence generated by OGM. We then discuss additional convergence properties of OGM, including the interesting fact that OGM has two types of worst-case functions: a piecewise affine-quadratic function and a quadratic function. These results help complete the theory of optimal first-order methods for smooth convex minimization.
Submitted 27 June, 2016; v1 submitted 29 October, 2015;
originally announced October 2015.
-
Algorithmic Design of Majorizers for Large-Scale Inverse Problems
Authors:
Madison G. McGaffin,
Jeffrey A. Fessler
Abstract:
Iterative majorize-minimize (MM) (also called optimization transfer) algorithms solve challenging numerical optimization problems by solving a series of "easier" optimization problems that are constructed to guarantee monotonic descent of the cost function. Many MM algorithms replace a computationally expensive Hessian matrix with another more computationally convenient majorizing matrix. These majorizing matrices are often generated using various matrix inequalities, and consequently the set of available majorizers is limited to structures for which these matrix inequalities can be efficiently applied. In this paper, we present a technique to algorithmically design matrix majorizers with wide varieties of structures. We use a novel duality-based approach to avoid the high computational and memory costs of standard semidefinite programming techniques. We present some preliminary results for 2D X-ray CT reconstruction that indicate these more exotic regularizers may significantly accelerate MM algorithms.
Submitted 21 October, 2015; v1 submitted 12 August, 2015;
originally announced August 2015.
-
Undersampled Phase Retrieval with Outliers
Authors:
Daniel S. Weller,
Ayelet Pnueli,
Gilad Divon,
Ori Radzyner,
Yonina C. Eldar,
Jeffrey A. Fessler
Abstract:
We propose a general framework for reconstructing transform-sparse images from undersampled (squared)-magnitude data corrupted with outliers. This framework is implemented using a multi-layered approach, combining multiple initializations (to address the nonconvexity of the phase retrieval problem), repeated minimization of a convex majorizer (surrogate for a nonconvex objective function), and iterative optimization using the alternating directions method of multipliers. Exploiting the generality of this framework, we investigate using a Laplace measurement noise model better adapted to outliers present in the data than the conventional Gaussian noise model. Using simulations, we explore the sensitivity of the method to both the regularization and penalty parameters. We include 1D Monte Carlo and 2D image reconstruction comparisons with alternative phase retrieval algorithms. The results suggest the proposed method, with the Laplace noise model, both increases the likelihood of correct support recovery and reduces the mean squared error from measurements containing outliers. We also describe exciting extensions made possible by the generality of the proposed framework, including regularization using analysis-form sparsity priors that are incompatible with many existing approaches.
Submitted 8 November, 2014;
originally announced November 2014.
-
Optimized first-order methods for smooth convex minimization
Authors:
Donghwan Kim,
Jeffrey A. Fessler
Abstract:
We introduce new optimized first-order methods for smooth unconstrained convex minimization. Drori and Teboulle recently described a numerical method for computing the $N$-iteration optimal step coefficients in a class of first-order algorithms that includes gradient methods, heavy-ball methods, and Nesterov's fast gradient methods. However, Drori and Teboulle's numerical method is computationally expensive for large $N$, and the corresponding numerically optimized first-order algorithm requires impractical memory and computation for large-scale optimization problems. In this paper, we propose optimized first-order algorithms that achieve a convergence bound that is two times smaller than for Nesterov's fast gradient methods; our bound is found analytically and refines the numerical bound. Furthermore, the proposed optimized first-order methods have efficient recursive forms that are remarkably similar to Nesterov's fast gradient methods.
Submitted 11 February, 2016; v1 submitted 20 June, 2014;
originally announced June 2014.
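The recursive form mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the commonly stated OGM recursion (a gradient step, a momentum parameter update with a modified final step, and two momentum terms); the function name `ogm`, its arguments, and the toy diagonal quadratic in the usage note are illustrative assumptions and are not taken from this listing.

```python
import math

def ogm(grad, x0, L, N):
    """Run N iterations of an OGM-style recursion.

    grad: callable returning the gradient (list of floats) at a point
    x0:   starting point (list of floats)
    L:    Lipschitz constant of the gradient (assumed known)
    Returns (x_N, theta_N).
    """
    x = list(x0)
    y = list(x0)      # secondary sequence, y_0 = x_0
    theta = 1.0       # theta_0 = 1
    for i in range(N):
        g = grad(x)
        # Plain gradient step from the primary iterate.
        y_new = [xj - gj / L for xj, gj in zip(x, g)]
        # The last iteration uses the factor 8 instead of 4.
        c = 8.0 if i == N - 1 else 4.0
        theta_new = (1.0 + math.sqrt(1.0 + c * theta * theta)) / 2.0
        # Nesterov-like momentum on y plus an extra term in (y_new - x).
        x = [
            yn + (theta - 1.0) / theta_new * (yn - yo)
            + theta / theta_new * (yn - xj)
            for yn, yo, xj in zip(y_new, y, x)
        ]
        y, theta = y_new, theta_new
    return x, theta
```

As a usage example on a hypothetical problem, take $f(x) = \frac{1}{2}\sum_j d_j x_j^2$ with `d = [1.0, 0.1, 0.01]`, so that `grad = lambda x: [dj * xj for dj, xj in zip(d, x)]` and `L = 1.0`. Calling `ogm(grad, [1.0, 1.0, 1.0], 1.0, 20)` returns the final iterate and `theta_N`, and the final cost can be compared with the quantity $L\|x_0 - x_*\|^2 / (2\theta_N^2)$, which is the form of the analytical bound usually quoted for OGM.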
-
Fast X-ray CT image reconstruction using the linearized augmented Lagrangian method with ordered subsets
Authors:
Hung Nien,
Jeffrey A. Fessler
Abstract:
The augmented Lagrangian (AL) method that solves convex optimization problems with linear constraints has drawn more attention recently in imaging applications due to its decomposable structure for composite cost functions and empirical fast convergence rate under weak conditions. However, for problems such as X-ray computed tomography (CT) image reconstruction and large-scale sparse regression with "big data", where there is no efficient way to solve the inner least-squares problem, the AL method can be slow due to the inevitable iterative inner updates. In this paper, we focus on solving regularized (weighted) least-squares problems using a linearized variant of the AL method that replaces the quadratic AL penalty term in the scaled augmented Lagrangian with its separable quadratic surrogate (SQS) function, thus leading to a much simpler ordered-subsets (OS) accelerable splitting-based algorithm, OS-LALM, for X-ray CT image reconstruction. To further accelerate the proposed algorithm, we use a second-order recursive system analysis to design a deterministic downward continuation approach that avoids tedious parameter tuning and provides fast convergence. Experimental results show that the proposed algorithm significantly accelerates the "convergence" of X-ray CT image reconstruction with negligible overhead and greatly reduces the OS artifacts in the reconstructed image when using many subsets for OS acceleration.
Submitted 18 February, 2014;
originally announced February 2014.
-
A convergence proof of the split Bregman method for regularized least-squares problems
Authors:
Hung Nien,
Jeffrey A. Fessler
Abstract:
The split Bregman (SB) method [T. Goldstein and S. Osher, SIAM J. Imaging Sci., 2 (2009), pp. 323-43] is a fast splitting-based algorithm that solves image reconstruction problems with general l1, e.g., total-variation (TV) and compressed sensing (CS), regularizations by introducing a single variable split to decouple the data-fitting term and the regularization term, yielding simple subproblems that are separable (or partially separable) and easy to minimize. Several convergence proofs have been proposed, and these proofs either impose a "full column rank" assumption on the split or assume exact updates in all subproblems. However, these assumptions are impractical in many applications such as the X-ray computed tomography (CT) image reconstructions, where the inner least-squares problem usually cannot be solved efficiently due to the highly shift-variant Hessian. In this paper, we show that when the data-fitting term is quadratic, the SB method is a convergent alternating direction method of multipliers (ADMM), and a straightforward convergence proof with inexact updates is given using [J. Eckstein and D. P. Bertsekas, Mathematical Programming, 55 (1992), pp. 293-318, Theorem 8]. Furthermore, since the SB method is just a special case of an ADMM algorithm, it seems likely that the ADMM algorithm will be faster than the SB method if the augmented Lagrangian (AL) penalty parameters are selected appropriately. To have a concrete example, we conduct a convergence rate analysis of the ADMM algorithm using two splits for image restoration problems with quadratic data-fitting term and regularization term. According to our analysis, we can show that the two-split ADMM algorithm can be faster than the SB method if the AL penalty parameter of the SB method is suboptimal. Numerical experiments were conducted to verify our analysis.
Submitted 18 February, 2014;
originally announced February 2014.
-
Electron spin detection in the frequency domain under the interrupted Oscillating Cantilever-driven Adiabatic Reversal (iOSCAR) Protocol
Authors:
M. Ting,
A. O. Hero,
D. Rugar,
C. Y. Yip,
J. A. Fessler
Abstract:
Magnetic Resonance Force Microscopy (MRFM) is an emergent technology for measuring spin-induced attonewton forces using a micromachined cantilever. In the interrupted Oscillating Cantilever-driven Adiabatic Reversal (iOSCAR) method, small ensembles of electron spins are manipulated by an external radio frequency (RF) magnetic field to produce small periodic deviations in the resonant frequency of the cantilever. These deviations can be detected by frequency demodulation, followed by conventional amplitude or energy detection. In this paper, we develop optimal detectors for several signal models that have been hypothesized for measurements induced by iOSCAR spin manipulation. We show that two simple variants of the energy detector--the filtered energy detector and a hybrid filtered energy/amplitude/energy detector--are approximately asymptotically optimal for the Discrete-Time (D-T) random telegraph signal model assuming White Gaussian Noise (WGN). For the D-T random walk signal model, the filtered energy detector performs close to the optimal Likelihood Ratio Test (LRT) when the transition probabilities are symmetric.
Submitted 14 January, 2004; v1 submitted 15 December, 2003;
originally announced December 2003.
-
Baseband Detection of Bistatic Electron Spin Signals in Magnetic Resonance Force Microscopy (MRFM)
Authors:
Chun-yu Yip,
Alfred O. Hero,
Daniel Rugar,
Jeffrey A. Fessler
Abstract:
In single spin Magnetic Resonance Force Microscopy (MRFM), the objective is to detect the presence of an electron (or nuclear) spin in a sample volume by measuring spin-induced attonewton forces using a micromachined cantilever. In the OSCAR method of single spin MRFM, the spins are manipulated by an external rf field to produce small periodic deviations in the resonant frequency of the cantilever. These deviations can be detected by frequency demodulation followed by conventional amplitude or energy detection. In this paper, we present an alternative to these detection methods, based on optimal detection theory and Gibbs sampling. On the basis of simulations, we show that our detector outperforms the conventional amplitude and energy detectors for realistic MRFM operating conditions. For example, to achieve a 10% false alarm rate and an 80% correct detection rate our detector has an 8 dB SNR advantage as compared with the conventional amplitude or energy detectors. Furthermore, at these detection rates it comes within 4 dB of the omniscient matched-filter lower bound.
Submitted 20 August, 2003; v1 submitted 7 July, 2003;
originally announced July 2003.