
Optimizing the Efficiency of First-Order Methods for Decreasing the Gradient of Smooth Convex Functions

Authors: Donghwan Kim, Jeffrey A. Fessler

Abstract: This paper optimizes the step coefficients of first-order methods for smooth convex minimization in terms of the worst-case convergence bound (i.e., efficiency) of the decrease in the gradient norm. This work is based on the performance estimation problem approach. The worst-case gradient bound of the resulting method is optimal up to a constant for large-dimensional smooth convex minimization problems, under the initial bounded condition on the cost function value. This paper then illustrates that the proposed method has a computationally efficient form that is similar to the optimized gradient method.

Submitted 27 October, 2020; v1 submitted 18 March, 2018; originally announced March 2018.

arXiv:1802.07129 [pdf, other]

doi 10.1109/IVMSPW.2018.8448694

Deep BCD-Net Using Identical Encoding-Decoding CNN Structures for Iterative Image Recovery

Authors: Il Yong Chun, Jeffrey A. Fessler

Abstract: In "extreme" computational imaging that collects extremely undersampled or noisy measurements, obtaining an accurate image within a reasonable computing time is challenging. Incorporating image mapping convolutional neural networks (CNN) into iterative image recovery has great potential to resolve this issue. This paper 1) incorporates image mapping CNN using identical convolutional kernels in both encoders and decoders into a block coordinate descent (BCD) signal recovery method and 2) applies alternating direction method of multipliers to train the aforementioned image mapping CNN. We refer to the proposed recurrent network as BCD-Net using identical encoding-decoding CNN structures. Numerical experiments show that, for a) denoising low signal-to-noise-ratio images and b) extremely undersampled magnetic resonance imaging, the proposed BCD-Net achieves significantly more accurate image recovery, compared to BCD-Net using distinct encoding-decoding structures and/or the conventional image recovery model using both wavelets and total variation.

Submitted 28 April, 2018; v1 submitted 20 February, 2018; originally announced February 2018.

Comments: 5 pages, 3 figures

Journal ref: Proc. IEEE Image, Video, and Multidim. Signal Process. (IVMSP) Workshop, pp. 1-5, Apr. 2018

arXiv:1802.05584 [pdf, other]

doi 10.1109/TIP.2019.2937734

Convolutional Analysis Operator Learning: Acceleration and Convergence

Authors: Il Yong Chun, Jeffrey A. Fessler

Abstract: Convolutional operator learning is gaining attention in many signal processing and computer vision applications. Learning kernels has mostly relied on so-called patch-domain approaches that extract and store many overlapping patches across training signals. Due to memory demands, patch-domain methods have limitations when learning kernels from large datasets -- particularly with multi-layered structures, e.g., convolutional neural networks -- or when applying the learned kernels to high-dimensional signal recovery problems. The so-called convolution approach does not store many overlapping patches, and thus overcomes the memory problems particularly with careful algorithmic designs; it has been studied within the "synthesis" signal model, e.g., convolutional dictionary learning. This paper proposes a new convolutional analysis operator learning (CAOL) framework that learns an analysis sparsifying regularizer with the convolution perspective, and develops a new convergent Block Proximal Extrapolated Gradient method using a Majorizer (BPEG-M) to solve the corresponding block multi-nonconvex problems. To learn diverse filters within the CAOL framework, this paper introduces an orthogonality constraint that enforces a tight-frame filter condition, and a regularizer that promotes diversity between filters. Numerical experiments show that, with sharp majorizers, BPEG-M significantly accelerates the CAOL convergence rate compared to the state-of-the-art block proximal gradient (BPG) method. Numerical experiments for sparse-view computational tomography show that a convolutional sparsifying regularizer learned via CAOL significantly improves reconstruction quality compared to a conventional edge-preserving regularizer. Using more and wider kernels in a learned regularizer better preserves edges in reconstructed images.

Submitted 11 September, 2019; v1 submitted 15 February, 2018; originally announced February 2018.

Comments: 22 pages, 11 figures, fixed incorrect math theorem numbers in fig. 3

Journal ref: IEEE Trans. Image Process., 29:2108-2122, 2020

arXiv:1801.09533 [pdf, ps, other]

Statistical Image Reconstruction Using Mixed Poisson-Gaussian Noise Model for X-Ray CT

Authors: Qiaoqiao Ding, Yong Long, Xiaoqun Zhang, Jeffrey A. Fessler

Abstract: Statistical image reconstruction (SIR) methods for X-ray CT produce high-quality and accurate images, while greatly reducing patient exposure to radiation. When further reducing X-ray dose to an ultra-low level by lowering the tube current, photon starvation happens and electronic noise starts to dominate, which introduces negative or zero values into the raw measurements. These non-positive values pose challenges to post-log SIR methods that require taking the logarithm of the raw data, and causes artifacts in the reconstructed images if simple correction methods are used to process these non-positive raw measurements. The raw data at ultra-low dose deviates significantly from Poisson or shifted Poisson statistics for pre-log data and from Gaussian statistics for post-log data. This paper proposes a novel SIR method called MPG (mixed Poisson-Gaussian). MPG models the raw noisy measurements using a mixed Poisson-Gaussian distribution that accounts for both the quantum noise and electronic noise. MPG is able to directly use the negative and zero values in raw data without any pre-processing. MPG cost function contains a reweighted least square data-fit term, an edge preserving regularization term and a non-negativity constraint term. We use Alternating Direction Method of Multipliers (ADMM) to separate the MPG optimization problem into several sub-problems that are easier to solve. Our results on 3D simulated cone-beam data set and synthetic helical data set generated from clinical data indicate that the proposed MPG method reduces noise and decreases bias in the reconstructed images, comparing with the conventional filtered back projection (FBP), penalized weighted least-square (PWLS) and shift Poisson (SP) method for ultra-low dose CT (ULDCT) imaging.

Submitted 19 January, 2018; originally announced January 2018.

Comments: 11 pages, 6 figures

arXiv:1711.00905 [pdf, other]

Sparse-View X-Ray CT Reconstruction Using $\ell_1$ Prior with Learned Transform

Authors: Xuehang Zheng, Il Yong Chun, Zhipeng Li, Yong Long, Jeffrey A. Fessler

Abstract: A major challenge in X-ray computed tomography (CT) is reducing radiation dose while maintaining high quality of reconstructed images. To reduce the radiation dose, one can reduce the number of projection views (sparse-view CT); however, it becomes difficult to achieve high-quality image reconstruction as the number of projection views decreases. Researchers have applied the concept of learning sparse representations from (high-quality) CT image dataset to the sparse-view CT reconstruction. We propose a new statistical CT reconstruction model that combines penalized weighted-least squares (PWLS) and $\ell_1$ prior with learned sparsifying transform (PWLS-ST-$\ell_1$), and a corresponding efficient algorithm based on Alternating Direction Method of Multipliers (ADMM). To moderate the difficulty of tuning ADMM parameters, we propose a new ADMM parameter selection scheme based on approximated condition numbers. We interpret the proposed model by analyzing the minimum mean square error of its ($\ell_2$-norm relaxed) image update estimator. Our results with the extended cardiac-torso (XCAT) phantom data and clinical chest data show that, for sparse-view 2D fan-beam CT and 3D axial cone-beam CT, PWLS-ST-$\ell_1$ improves the quality of reconstructed images compared to the CT reconstruction methods using edge-preserving regularizer and $\ell_2$ prior with learned ST. These results also show that, for sparse-view 2D fan-beam CT, PWLS-ST-$\ell_1$ achieves comparable or better image quality and requires much shorter runtime than PWLS-DL using a learned overcomplete dictionary. Our results with clinical chest data show that, methods using the unsupervised learned prior generalize better than a state-of-the-art deep "denoising" neural network that does not use a physical imaging model.

Submitted 15 September, 2019; v1 submitted 2 November, 2017; originally announced November 2017.

Comments: The first two authors contributed equally to this work

arXiv:1710.02441 [pdf, ps, other]

doi 10.1109/TMI.2018.2817547

Dictionary-Free MRI PERK: Parameter Estimation via Regression with Kernels

Authors: Gopal Nataraj, Jon-Fredrik Nielsen, Clayton Scott, Jeffrey A. Fessler

Abstract: This paper introduces a fast, general method for dictionary-free parameter estimation in quantitative magnetic resonance imaging (QMRI) via regression with kernels (PERK). PERK first uses prior distributions and the nonlinear MR signal model to simulate many parameter-measurement pairs. Inspired by machine learning, PERK then takes these parameter-measurement pairs as labeled training points and learns from them a nonlinear regression function using kernel functions and convex optimization. PERK admits a simple implementation as per-voxel nonlinear lifting of MRI measurements followed by linear minimum mean-squared error regression. We demonstrate PERK for $T_1,T_2$ estimation, a well-studied application where it is simple to compare PERK estimates against dictionary-based grid search estimates. Numerical simulations as well as single-slice phantom and in vivo experiments demonstrate that PERK and grid search produce comparable $T_1,T_2$ estimates in white and gray matter, but PERK is consistently at least $23\times$ faster. This acceleration factor will increase by several orders of magnitude for full-volume QMRI estimation problems involving more latent parameters per voxel.

Submitted 6 October, 2017; originally announced October 2017.

Comments: submitted to IEEE Transactions on Medical Imaging

Journal ref: IEEE Transactions on Medical Imaging 37(9):2103-14 Sep 2018

arXiv:1707.05927 [pdf, ps, other]

Medical image reconstruction: a brief overview of past milestones and future directions

Authors: Jeffrey A. Fessler

Abstract: This paper briefly reviews past milestones in the field of medical image reconstruction and describes some future directions. It is part of an overview paper on "open problems in signal processing" that will appear in IEEE Signal Processing Magazine, but presented here with citations and equations.

Submitted 18 July, 2017; originally announced July 2017.

Comments: Part of a submission to IEEE Signal Processing Magazine

arXiv:1707.02914 [pdf, other]

doi 10.1109/IVMSPW.2016.7528219

Low Dose CT Image Reconstruction With Learned Sparsifying Transform

Authors: Xuehang Zheng, Zening Lu, Saiprasad Ravishankar, Yong Long, Jeffrey A. Fessler

Abstract: A major challenge in computed tomography (CT) is to reduce X-ray dose to a low or even ultra-low level while maintaining the high quality of reconstructed images. We propose a new method for CT reconstruction that combines penalized weighted-least squares reconstruction (PWLS) with regularization based on a sparsifying transform (PWLS-ST) learned from a dataset of numerous CT images. We adopt an alternating algorithm to optimize the PWLS-ST cost function that alternates between a CT image update step and a sparse coding step. We adopt a relaxed linearized augmented Lagrangian method with ordered-subsets (relaxed OS-LALM) to accelerate the CT image update step by reducing the number of forward and backward projections. Numerical experiments on the XCAT phantom show that for low dose levels, the proposed PWLS-ST method dramatically improves the quality of reconstructed images compared to PWLS reconstruction with a nonadaptive edge-preserving regularizer (PWLS-EP).

Submitted 10 July, 2017; originally announced July 2017.

Comments: This is a revised and corrected version of the IEEE IVMSP Workshop paper DOI: 10.1109/IVMSPW.2016.7528219

arXiv:1707.00389 [pdf, other]

doi 10.1109/TIP.2017.2761545

Convolutional Dictionary Learning: Acceleration and Convergence

Authors: Il Yong Chun, Jeffrey A. Fessler

Abstract: Convolutional dictionary learning (CDL or sparsifying CDL) has many applications in image processing and computer vision. There has been growing interest in developing efficient algorithms for CDL, mostly relying on the augmented Lagrangian (AL) method or the variant alternating direction method of multipliers (ADMM). When their parameters are properly tuned, AL methods have shown fast convergence in CDL. However, the parameter tuning process is not trivial due to its data dependence and, in practice, the convergence of AL methods depends on the AL parameters for nonconvex CDL problems. To moderate these problems, this paper proposes a new practically feasible and convergent Block Proximal Gradient method using a Majorizer (BPG-M) for CDL. The BPG-M-based CDL is investigated with different block updating schemes and majorization matrix designs, and further accelerated by incorporating some momentum coefficient formulas and restarting techniques. All of the methods investigated incorporate a boundary artifacts removal (or, more generally, sampling) operator in the learning model. Numerical experiments show that, without needing any parameter tuning process, the proposed BPG-M approach converges more stably to desirable solutions of lower objective values than the existing state-of-the-art ADMM algorithm and its memory-efficient variant do. Compared to the ADMM approaches, the BPG-M method using a multi-block updating scheme is particularly useful in single-threaded CDL algorithm handling large datasets, due to its lower memory requirement and no polynomial computational complexity. Image denoising experiments show that, for relatively strong additive white Gaussian noise, the filters learned by BPG-M-based CDL outperform those trained by the ADMM approach.

Submitted 25 August, 2017; v1 submitted 2 July, 2017; originally announced July 2017.

Comments: 21 pages, 7 figures, submitted to IEEE Transactions on Image Processing

Journal ref: IEEE Trans. Image Process., 27(4):1697-1712, Apr. 2018

arXiv:1703.09165 [pdf, other]

doi 10.1109/TMI.2018.2832007

PWLS-ULTRA: An Efficient Clustering and Learning-Based Approach for Low-Dose 3D CT Image Reconstruction

Authors: Xuehang Zheng, Saiprasad Ravishankar, Yong Long, Jeffrey A. Fessler

Abstract: The development of computed tomography (CT) image reconstruction methods that significantly reduce patient radiation exposure while maintaining high image quality is an important area of research in low-dose CT (LDCT) imaging. We propose a new penalized weighted least squares (PWLS) reconstruction method that exploits regularization based on an efficient Union of Learned TRAnsforms (PWLS-ULTRA). The union of square transforms is pre-learned from numerous image patches extracted from a dataset of CT images or volumes. The proposed PWLS-based cost function is optimized by alternating between a CT image reconstruction step, and a sparse coding and clustering step. The CT image reconstruction step is accelerated by a relaxed linearized augmented Lagrangian method with ordered-subsets that reduces the number of forward and back projections. Simulations with 2-D and 3-D axial CT scans of the extended cardiac-torso phantom and 3D helical chest and abdomen scans show that for both normal-dose and low-dose levels, the proposed method significantly improves the quality of reconstructed images compared to PWLS reconstruction with a nonadaptive edge-preserving regularizer (PWLS-EP). PWLS with regularization based on a union of learned transforms leads to better image reconstructions than using a single learned square transform. We also incorporate patch-based weights in PWLS-ULTRA that enhance image quality and help improve image resolution uniformity. The proposed approach achieves comparable or better image quality compared to learned overcomplete synthesis dictionaries, but importantly, is much faster (computationally more efficient).

Submitted 1 June, 2018; v1 submitted 27 March, 2017; originally announced March 2017.

Comments: Accepted to IEEE Transaction on Medical Imaging

Journal ref: IEEE Transaction on Medical Imaging 37(6):1498-510 Jun 2018

arXiv:1703.06610 [pdf, other]

doi 10.1016/j.jmva.2018.06.002

Asymptotic performance of PCA for high-dimensional heteroscedastic data

Authors: David Hong, Laura Balzano, Jeffrey A. Fessler

Abstract: Principal Component Analysis (PCA) is a classical method for reducing the dimensionality of data by projecting them onto a subspace that captures most of their variation. Effective use of PCA in modern applications requires understanding its performance for data that are both high-dimensional and heteroscedastic. This paper analyzes the statistical performance of PCA in this setting, i.e., for high-dimensional data drawn from a low-dimensional subspace and degraded by heteroscedastic noise. We provide simplified expressions for the asymptotic PCA recovery of the underlying subspace, subspace amplitudes and subspace coefficients; the expressions enable both easy and efficient calculation and reasoning about the performance of PCA. We exploit the structure of these expressions to show that, for a fixed average noise variance, the asymptotic recovery of PCA for heteroscedastic data is always worse than that for homoscedastic data (i.e., for noise variances that are equal across samples). Hence, while average noise variance is often a practically convenient measure for the overall quality of data, it gives an overly optimistic estimate of the performance of PCA for heteroscedastic data.

Submitted 23 June, 2018; v1 submitted 20 March, 2017; originally announced March 2017.

Comments: 34 pages (including supplement), 17 figures

MSC Class: 62H25; 62H12; 62F12

Journal ref: J. Multivariate Analysis 167:435-52 Sep 2018

arXiv:1703.04641 [pdf, ps, other]

doi 10.1007/s10957-018-1287-4

Adaptive Restart of the Optimized Gradient Method for Convex Optimization

Authors: Donghwan Kim, Jeffrey A. Fessler

Abstract: First-order methods with momentum such as Nesterov's fast gradient method are very useful for convex optimization problems, but can exhibit undesirable oscillations yielding slow convergence rates for some applications. An adaptive restarting scheme can improve the convergence rate of the fast gradient method, when the parameter of a strongly convex cost function is unknown or when the iterates of the algorithm enter a locally strongly convex region. Recently, we introduced the optimized gradient method, a first-order algorithm that has an inexpensive per-iteration computational cost similar to that of the fast gradient method, yet has a worst-case cost function rate that is twice faster than that of the fast gradient method and that is optimal for large-dimensional smooth convex problems. Building upon the success of accelerating the fast gradient method using adaptive restart, this paper investigates similar heuristic acceleration of the optimized gradient method. We first derive a new first-order method that resembles the optimized gradient method for strongly convex quadratic problems with known function parameters, yielding a linear convergence rate that is faster than that of the analogous version of the fast gradient method. We then provide a heuristic analysis and numerical experiments that illustrate that adaptive restart can accelerate the convergence of the optimized gradient method. Numerical results also illustrate that adaptive restart is helpful for a proximal version of the optimized gradient method for nonsmooth composite convex functions.

Submitted 27 November, 2017; v1 submitted 14 March, 2017; originally announced March 2017.

Journal ref: JOTA 178:240-63 Jul 2018

arXiv:1611.04069 [pdf, other]

doi 10.1109/TMI.2017.2650960

Low-rank and Adaptive Sparse Signal (LASSI) Models for Highly Accelerated Dynamic Imaging

Authors: Saiprasad Ravishankar, Brian E. Moore, Raj Rao Nadakuditi, Jeffrey A. Fessler

Abstract: Sparsity-based approaches have been popular in many applications in image processing and imaging. Compressed sensing exploits the sparsity of images in a transform domain or dictionary to improve image recovery from undersampled measurements. In the context of inverse problems in dynamic imaging, recent research has demonstrated the promise of sparsity and low-rank techniques. For example, the patches of the underlying data are modeled as sparse in an adaptive dictionary domain, and the resulting image and dictionary estimation from undersampled measurements is called dictionary-blind compressed sensing, or the dynamic image sequence is modeled as a sum of low-rank and sparse (in some transform domain) components (L+S model) that are estimated from limited measurements. In this work, we investigate a data-adaptive extension of the L+S model, dubbed LASSI, where the temporal image sequence is decomposed into a low-rank component and a component whose spatiotemporal (3D) patches are sparse in some adaptive dictionary domain. We investigate various formulations and efficient methods for jointly estimating the underlying dynamic signal components and the spatiotemporal dictionary from limited measurements. We also obtain efficient sparsity penalized dictionary-blind compressed sensing methods as special cases of our LASSI approaches. Our numerical experiments demonstrate the promising performance of LASSI schemes for dynamic magnetic resonance image reconstruction from limited k-t space data compared to recent methods such as k-t SLR and L+S, and compared to the proposed dictionary-blind compressed sensing method.

Submitted 9 January, 2017; v1 submitted 12 November, 2016; originally announced November 2016.

Journal ref: IEEE Tr. Med. Imaging 36(5):1116-28 May 2017

arXiv:1610.03595 [pdf, other]

doi 10.1109/ALLERTON.2016.7852272

Towards a Theoretical Analysis of PCA for Heteroscedastic Data

Authors: David Hong, Laura Balzano, Jeffrey A. Fessler

Abstract: Principal Component Analysis (PCA) is a method for estimating a subspace given noisy samples. It is useful in a variety of problems ranging from dimensionality reduction to anomaly detection and the visualization of high dimensional data. PCA performs well in the presence of moderate noise and even with missing data, but is also sensitive to outliers. PCA is also known to have a phase transition when noise is independent and identically distributed; recovery of the subspace sharply declines at a threshold noise variance. Effective use of PCA requires a rigorous understanding of these behaviors. This paper provides a step towards an analysis of PCA for samples with heteroscedastic noise, that is, samples that have non-uniform noise variances and so are no longer identically distributed. In particular, we provide a simple asymptotic prediction of the recovery of a one-dimensional subspace from noisy heteroscedastic samples. The prediction enables: a) easy and efficient calculation of the asymptotic performance, and b) qualitative reasoning to understand how PCA is impacted by heteroscedasticity (such as outliers).

Submitted 12 October, 2016; originally announced October 2016.

Comments: Presented at 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton)

arXiv:1609.09441 [pdf, ps, other]

Fast dual proximal gradient algorithms with rate $O(1/k^{1.5})$ for convex minimization

Authors: Donghwan Kim, Jeffrey A. Fessler

Abstract: We consider minimizing the composite function that consists of a strongly convex function and a convex function. The fast dual proximal gradient (FDPG) method decreases the dual function with a rate $O(1/k^2)$, leading to a rate $O(1/k)$ for decreasing the primal function. We propose a generalized FDPG method that guarantees an $O(1/k^{1.5})$ rate for the dual proximal gradient norm decrease. By relating this to the primal function decrease, the proposed approach decreases the primal function with the improved $O(1/k^{1.5})$ rate.

Submitted 29 September, 2016; originally announced September 2016.

arXiv:1608.03861 [pdf, ps, other]

doi 10.1137/16M108940X

Another look at the fast iterative shrinkage/thresholding algorithm (FISTA)

Authors: Donghwan Kim, Jeffrey A. Fessler

Abstract: This paper provides a new way of developing the fast iterative shrinkage/thresholding algorithm (FISTA) that is widely used for minimizing composite convex functions with a nonsmooth term such as the $\ell_1$ regularizer. In particular, this paper shows that FISTA corresponds to an optimized approach to accelerating the proximal gradient method with respect to a worst-case bound of the cost function. This paper then proposes a new algorithm that is derived by instead optimizing the step coefficients of the proximal gradient method with respect to a worst-case bound of the composite gradient mapping. The proof is based on the worst-case analysis called Performance Estimation Problem.

Submitted 22 January, 2018; v1 submitted 12 August, 2016; originally announced August 2016.

Comments: minor modification in the title

Journal ref: SIAM J. Optim. 28(1):223-50 2018

arXiv:1607.06764 [pdf, ps, other]

doi 10.1137/17m112124x

Generalizing the optimized gradient method for smooth convex minimization

Authors: Donghwan Kim, Jeffrey A. Fessler

Abstract: This paper generalizes the optimized gradient method (OGM) that achieves the optimal worst-case cost function bound of first-order methods for smooth convex minimization. Specifically, this paper studies a generalized formulation of OGM and analyzes its worst-case rates in terms of both the function value and the norm of the function gradient. This paper also develops a new algorithm called OGM-OG that is in the generalized family of OGM and that has the best known analytical worst-case bound with rate $O(1/N^{1.5})$ on the decrease of the gradient norm among fixed-step first-order methods. This paper also proves that Nesterov's fast gradient method has an $O(1/N^{1.5})$ worst-case gradient norm rate but with constant larger than OGM-OG. The proof is based on the worst-case analysis called Performance Estimation Problem.

Submitted 1 April, 2018; v1 submitted 22 July, 2016; originally announced July 2016.

Journal ref: SIAM J. Optim. 28(2):1920-50 2018

arXiv:1512.04564 [pdf, ps, other]

doi 10.1109/TMI.2015.2508780

Relaxed Linearized Algorithms for Faster X-Ray CT Image Reconstruction

Authors: Hung Nien, Jeffrey A. Fessler

Abstract: Statistical image reconstruction (SIR) methods are studied extensively for X-ray computed tomography (CT) due to the potential of acquiring CT scans with reduced X-ray dose while maintaining image quality. However, the longer reconstruction time of SIR methods hinders their use in X-ray CT in practice. To accelerate statistical methods, many optimization techniques have been investigated. Over-relaxation is a common technique to speed up convergence of iterative algorithms. For instance, using a relaxation parameter that is close to two in alternating direction method of multipliers (ADMM) has been shown to speed up convergence significantly. This paper proposes a relaxed linearized augmented Lagrangian (AL) method that shows theoretical faster convergence rate with over-relaxation and applies the proposed relaxed linearized AL method to X-ray CT image reconstruction problems. Experimental results with both simulated and real CT scan data show that the proposed relaxed algorithm (with ordered-subsets [OS] acceleration) is about twice as fast as the existing unrelaxed fast algorithms, with negligible computation and memory overhead.

Submitted 14 December, 2015; originally announced December 2015.

Comments: Submitted to IEEE Transactions on Medical Imaging

Journal ref: IEEE Transactions on Medical Imaging 35(4):1090-8 Apr 2016

arXiv:1511.08842 [pdf, ps, other]

Efficient Sum of Outer Products Dictionary Learning (SOUP-DIL) - The $\ell_0$ Method

Authors: Saiprasad Ravishankar, Raj Rao Nadakuditi, Jeffrey A. Fessler

Abstract: The sparsity of natural signals and images in a transform domain or dictionary has been extensively exploited in several applications such as compression, denoising and inverse problems. More recently, data-driven adaptation of synthesis dictionaries has shown promise in many applications compared to fixed or analytical dictionary models. However, dictionary learning problems are typically non-convex and NP-hard, and the usual alternating minimization approaches for these problems are often computationally expensive, with the computations dominated by the NP-hard synthesis sparse coding step. In this work, we investigate an efficient method for $\ell_{0}$ "norm"-based dictionary learning by first approximating the training data set with a sum of sparse rank-one matrices and then using a block coordinate descent approach to estimate the unknowns. The proposed block coordinate descent algorithm involves efficient closed-form solutions. In particular, the sparse coding step involves a simple form of thresholding. We provide a convergence analysis for the proposed block coordinate descent approach. Our numerical experiments show the promising performance and significant speed-ups provided by our method over the classical K-SVD scheme in sparse signal representation and image denoising.

Submitted 20 April, 2017; v1 submitted 27 November, 2015; originally announced November 2015.

Comments: This work is cited by the IEEE Transactions on Computational Imaging Paper arXiv:1511.06333 (DOI: 10.1109/TCI.2017.2697206)

arXiv:1511.06333 [pdf, other]

doi 10.1109/TCI.2017.2697206

Efficient Sum of Outer Products Dictionary Learning (SOUP-DIL) and Its Application to Inverse Problems

Authors: Saiprasad Ravishankar, Raj Rao Nadakuditi, Jeffrey A. Fessler

Abstract: The sparsity of signals in a transform domain or dictionary has been exploited in applications such as compression, denoising and inverse problems. More recently, data-driven adaptation of synthesis dictionaries has shown promise compared to analytical dictionary models. However, dictionary learning problems are typically non-convex and NP-hard, and the usual alternating minimization approaches for these problems are often computationally expensive, with the computations dominated by the NP-hard synthesis sparse coding step. This paper exploits the ideas that drive algorithms such as K-SVD, and investigates in detail efficient methods for aggregate sparsity penalized dictionary learning by first approximating the data with a sum of sparse rank-one matrices (outer products) and then using a block coordinate descent approach to estimate the unknowns. The resulting block coordinate descent algorithms involve efficient closed-form solutions. Furthermore, we consider the problem of dictionary-blind image reconstruction, and propose novel and efficient algorithms for adaptive image reconstruction using block coordinate descent and sum of outer products methodologies. We provide a convergence study of the algorithms for dictionary learning and dictionary-blind image reconstruction. Our numerical experiments show the promising performance and speed-ups provided by the proposed methods over previous schemes in sparse data representation and compressed sensing-based image reconstruction.

Submitted 20 April, 2017; v1 submitted 19 November, 2015; originally announced November 2015.

Comments: Accepted to IEEE Transactions on Computational Imaging. This paper also cites experimental results reported in arXiv:1511.08842

Journal ref: IEEE Transactions on Computational Imaging, 3(4):694-709 Dec 2017

arXiv:1510.08573 [pdf, ps, other]

doi 10.1007/s10957-016-1018-7

On the convergence analysis of the optimized gradient method

Authors: Donghwan Kim, Jeffrey A. Fessler

Abstract: This paper considers the problem of unconstrained minimization of smooth convex functions having Lipschitz continuous gradients with known Lipschitz constant. We recently proposed an optimized gradient method (OGM) for this problem and showed that it has a worst-case convergence bound for the cost function decrease that is twice as small as that of Nesterov's fast gradient method (FGM), yet has a similarly efficient practical implementation. Drori showed recently that OGM has optimal complexity over the general class of first-order methods. This optimality makes it important to study fully the convergence properties of OGM. The previous worst-case convergence bound for OGM was derived for only the last iterate of a secondary sequence. This paper provides an analytic convergence bound for the primary sequence generated by OGM. We then discuss additional convergence properties of OGM, including the interesting fact that OGM has two types of worst-case functions: a piecewise affine-quadratic function and a quadratic function. These results help complete the theory of optimal first-order methods for smooth convex minimization.

Submitted 27 June, 2016; v1 submitted 29 October, 2015; originally announced October 2015.

Report number: JOTA 172:187-205 Jan 2017

arXiv:1508.02958 [pdf, other]

Algorithmic Design of Majorizers for Large-Scale Inverse Problems

Authors: Madison G. McGaffin, Jeffrey A. Fessler

Abstract: Iterative majorize-minimize (MM) (also called optimization transfer) algorithms solve challenging numerical optimization problems by solving a series of "easier" optimization problems that are constructed to guarantee monotonic descent of the cost function. Many MM algorithms replace a computationally expensive Hessian matrix with another more computationally convenient majorizing matrix. These majorizing matrices are often generated using various matrix inequalities, and consequently the set of available majorizers is limited to structures for which these matrix inequalities can be efficiently applied. In this paper, we present a technique to algorithmically design matrix majorizers with wide varieties of structures. We use a novel duality-based approach to avoid the high computational and memory costs of standard semidefinite programming techniques. We present some preliminary results for 2D X-ray CT reconstruction that indicate these more exotic regularizers may significantly accelerate MM algorithms.

Submitted 21 October, 2015; v1 submitted 12 August, 2015; originally announced August 2015.

Comments: 10 pages, 4 figures

ACM Class: G.1.6; I.4.4

arXiv:1411.2183 [pdf, ps, other]

doi 10.1109/TCI.2015.2498402

Undersampled Phase Retrieval with Outliers

Authors: Daniel S. Weller, Ayelet Pnueli, Gilad Divon, Ori Radzyner, Yonina C. Eldar, Jeffrey A. Fessler

Abstract: We propose a general framework for reconstructing transform-sparse images from undersampled (squared)-magnitude data corrupted with outliers. This framework is implemented using a multi-layered approach, combining multiple initializations (to address the nonconvexity of the phase retrieval problem), repeated minimization of a convex majorizer (surrogate for a nonconvex objective function), and iterative optimization using the alternating directions method of multipliers. Exploiting the generality of this framework, we investigate using a Laplace measurement noise model better adapted to outliers present in the data than the conventional Gaussian noise model. Using simulations, we explore the sensitivity of the method to both the regularization and penalty parameters. We include 1D Monte Carlo and 2D image reconstruction comparisons with alternative phase retrieval algorithms. The results suggest the proposed method, with the Laplace noise model, both increases the likelihood of correct support recovery and reduces the mean squared error from measurements containing outliers. We also describe exciting extensions made possible by the generality of the proposed framework, including regularization using analysis-form sparsity priors that are incompatible with many existing approaches.

Submitted 8 November, 2014; originally announced November 2014.

Comments: 11 pages, 9 figures

arXiv:1406.5468 [pdf, ps, other]

doi 10.1007/s10107-015-0949-3

Optimized first-order methods for smooth convex minimization

Authors: Donghwan Kim, Jeffrey A. Fessler

Abstract: We introduce new optimized first-order methods for smooth unconstrained convex minimization. Drori and Teboulle recently described a numerical method for computing the $N$-iteration optimal step coefficients in a class of first-order algorithms that includes gradient methods, heavy-ball methods, and Nesterov's fast gradient methods. However, Drori and Teboulle's numerical method is computationally expensive for large $N$, and the corresponding numerically optimized first-order algorithm requires impractical memory and computation for large-scale optimization problems. In this paper, we propose optimized first-order algorithms that achieve a convergence bound that is two times smaller than for Nesterov's fast gradient methods; our bound is found analytically and refines the numerical bound. Furthermore, the proposed optimized first-order methods have efficient recursive forms that are remarkably similar to Nesterov's fast gradient methods.

Submitted 11 February, 2016; v1 submitted 20 June, 2014; originally announced June 2014.

Journal ref: Math. Prog. 159(1):81-107, Sep. 2016

arXiv:1402.4381 [pdf, ps, other]

doi 10.1109/TMI.2014.2358499

Fast X-ray CT image reconstruction using the linearized augmented Lagrangian method with ordered subsets

Authors: Hung Nien, Jeffrey A. Fessler

Abstract: The augmented Lagrangian (AL) method that solves convex optimization problems with linear constraints has drawn more attention recently in imaging applications due to its decomposable structure for composite cost functions and empirical fast convergence rate under weak conditions. However, for problems such as X-ray computed tomography (CT) image reconstruction and large-scale sparse regression with "big data", where there is no efficient way to solve the inner least-squares problem, the AL method can be slow due to the inevitable iterative inner updates. In this paper, we focus on solving regularized (weighted) least-squares problems using a linearized variant of the AL method that replaces the quadratic AL penalty term in the scaled augmented Lagrangian with its separable quadratic surrogate (SQS) function, thus leading to a much simpler ordered-subsets (OS) accelerable splitting-based algorithm, OS-LALM, for X-ray CT image reconstruction. To further accelerate the proposed algorithm, we use a second-order recursive system analysis to design a deterministic downward continuation approach that avoids tedious parameter tuning and provides fast convergence. Experimental results show that the proposed algorithm significantly accelerates the "convergence" of X-ray CT image reconstruction with negligible overhead and greatly reduces the OS artifacts in the reconstructed image when using many subsets for OS acceleration.

Submitted 18 February, 2014; originally announced February 2014.

Comments: 21 pages (including the supplementary material), 12 figures, submitted to IEEE Trans. Med. Imag

Journal ref: IEEE Trans. Medical Imaging, 34(2):388-99, Feb. 2015

arXiv:1402.4371 [pdf, ps, other]

A convergence proof of the split Bregman method for regularized least-squares problems

Authors: Hung Nien, Jeffrey A. Fessler

Abstract: The split Bregman (SB) method [T. Goldstein and S. Osher, SIAM J. Imaging Sci., 2 (2009), pp. 323-43] is a fast splitting-based algorithm that solves image reconstruction problems with general l1, e.g., total-variation (TV) and compressed sensing (CS), regularizations by introducing a single variable split to decouple the data-fitting term and the regularization term, yielding simple subproblems that are separable (or partially separable) and easy to minimize. Several convergence proofs have been proposed, and these proofs either impose a "full column rank" assumption to the split or assume exact updates in all subproblems. However, these assumptions are impractical in many applications such as the X-ray computed tomography (CT) image reconstructions, where the inner least-squares problem usually cannot be solved efficiently due to the highly shift-variant Hessian. In this paper, we show that when the data-fitting term is quadratic, the SB method is a convergent alternating direction method of multipliers (ADMM), and a straightforward convergence proof with inexact updates is given using [J. Eckstein and D. P. Bertsekas, Mathematical Programming, 55 (1992), pp. 293-318, Theorem 8]. Furthermore, since the SB method is just a special case of an ADMM algorithm, it seems likely that the ADMM algorithm will be faster than the SB method if the augmented Lagrangian (AL) penalty parameters are selected appropriately. To have a concrete example, we conduct a convergence rate analysis of the ADMM algorithm using two splits for image restoration problems with quadratic data-fitting term and regularization term. According to our analysis, we can show that the two-split ADMM algorithm can be faster than the SB method if the AL penalty parameter of the SB method is suboptimal. Numerical experiments were conducted to verify our analysis.

Submitted 18 February, 2014; originally announced February 2014.

Comments: 11 pages, 3 figures, submitted to SIAM J. Imaging Sci

arXiv:quant-ph/0312139 [pdf, ps, other]

Electron spin detection in the frequency domain under the interrupted Oscillating Cantilever-driven Adiabatic Reversal (iOSCAR) Protocol

Authors: M. Ting, A. O. Hero, D. Rugar, C. Y. Yip, J. A. Fessler

Abstract: Magnetic Resonance Force Microscopy (MRFM) is an emergent technology for measuring spin-induced attonewton forces using a micromachined cantilever. In the interrupted Oscillating Cantilever-driven Adiabatic Reversal (iOSCAR) method, small ensembles of electron spins are manipulated by an external radio frequency (RF) magnetic field to produce small periodic deviations in the resonant frequency of the cantilever. These deviations can be detected by frequency demodulation, followed by conventional amplitude or energy detection. In this paper, we develop optimal detectors for several signal models that have been hypothesized for measurements induced by iOSCAR spin manipulation. We show that two simple variants of the energy detector--the filtered energy detector and a hybrid filtered energy/amplitude/energy detector--are approximately asymptotically optimal for the Discrete-Time (D-T) random telegraph signal model assuming White Gaussian Noise (WGN). For the D-T random walk signal model, the filtered energy detector performs close to the optimal Likelihood Ratio Test (LRT) when the transition probabilities are symmetric.

Submitted 14 January, 2004; v1 submitted 15 December, 2003; originally announced December 2003.

Comments: 11 pages, Version 2: Removed extraneous author information, Version 3: Corrected references, Version 4: Re-arranged references and some cosmetic changes

arXiv:quant-ph/0307042 [pdf, ps, other]

Baseband Detection of Bistatic Electron Spin Signals in Magnetic Resonance Force Microscopy (MRFM)

Authors: Chun-yu Yip, Alfred O. Hero, Daniel Rugar, Jeffrey A. Fessler

Abstract: In single spin Magnetic Resonance Force Microscopy (MRFM), the objective is to detect the presence of an electron (or nuclear) spin in a sample volume by measuring spin-induced attonewton forces using a micromachined cantilever. In the OSCAR method of single spin MRFM, the spins are manipulated by an external rf field to produce small periodic deviations in the resonant frequency of the cantilever. These deviations can be detected by frequency demodulation followed by conventional amplitude or energy detection. In this paper, we present an alternative to these detection methods, based on optimal detection theory and Gibbs sampling. On the basis of simulations, we show that our detector outperforms the conventional amplitude and energy detectors for realistic MRFM operating conditions. For example, to achieve a 10% false alarm rate and an 80% correct detection rate our detector has an 8 dB SNR advantage as compared with the conventional amplitude or energy detectors. Furthermore, at these detection rates it comes within 4 dB of the omniscient matched-filter lower bound.

Submitted 20 August, 2003; v1 submitted 7 July, 2003; originally announced July 2003.

Comments: 8 pages, 9 figures, revision of paper contains correction to a typo on the first page (introduction section)

Showing 51–78 of 78 results for author: Fessler, A