-
Stochastic Natural Thresholding Algorithms
Authors:
Rachel Grotheer,
Shuang Li,
Anna Ma,
Deanna Needell,
Jing Qin
Abstract:
Sparse signal recovery is one of the most fundamental problems in various applications, including medical imaging and remote sensing. Many greedy algorithms based on the family of hard thresholding operators have been developed to solve the sparse signal recovery problem. More recently, Natural Thresholding (NT) has been proposed with improved computational efficiency. This paper proposes stochastic natural thresholding (StoNT) algorithms and discusses their convergence guarantees, extending NT from the deterministic version with linear measurements to the stochastic version with a general objective function. We also conduct various numerical experiments on linear and nonlinear measurements to demonstrate the performance of StoNT.
Submitted 7 June, 2023;
originally announced June 2023.
-
Iterative Singular Tube Hard Thresholding Algorithms for Tensor Recovery
Authors:
Rachel Grotheer,
Shuang Li,
Anna Ma,
Deanna Needell,
Jing Qin
Abstract:
Due to the explosive growth of large-scale data sets, tensors have been a vital tool to analyze and process high-dimensional data. Different from the matrix case, tensor decomposition has been defined in various formats, which can be further used to define the best low-rank approximation of a tensor to significantly reduce the dimensionality for signal compression and recovery. In this paper, we consider the low-rank tensor recovery problem when the tubal rank of the underlying tensor is given or estimated a priori. We propose a novel class of iterative singular tube hard thresholding algorithms for tensor recovery based on the low-tubal-rank tensor approximation, including basic, accelerated deterministic and stochastic versions. Convergence guarantees are provided along with the special case when the measurements are linear. Numerical experiments on tensor compressive sensing and color image inpainting are conducted to demonstrate convergence and computational efficiency in practice.
Submitted 26 December, 2023; v1 submitted 10 April, 2023;
originally announced April 2023.
-
Automatic Infectious Disease Classification Analysis with Concept Discovery
Authors:
Elena Sizikova,
Joshua Vendrow,
Xu Cao,
Rachel Grotheer,
Jamie Haddock,
Lara Kassab,
Alona Kryshchenko,
Thomas Merkh,
R. W. M. A. Madushani,
Kenny Moise,
Annie Ulichney,
Huy V. Vo,
Chuntian Wang,
Megan Coffee,
Kathryn Leonard,
Deanna Needell
Abstract:
Automatic infectious disease classification from images can facilitate needed medical diagnoses. Such an approach can identify diseases, like tuberculosis, which remain under-diagnosed due to resource constraints and also novel and emerging diseases, like monkeypox, which clinicians have little experience or acumen in diagnosing. Avoiding missed or delayed diagnoses would prevent further transmission and improve clinical outcomes. In order to understand and trust neural network predictions, analysis of learned representations is necessary. In this work, we argue that automatic discovery of concepts, i.e., human interpretable attributes, allows for a deep understanding of learned information in medical image analysis tasks, generalizing beyond the training labels or protocols. We provide an overview of existing concept discovery approaches in medical image and computer vision communities, and evaluate representative methods on tuberculosis (TB) prediction and monkeypox prediction tasks. Finally, we propose NMFx, a general NMF formulation of interpretability by concept discovery that works in a unified way in unsupervised, weakly supervised, and supervised scenarios.
Submitted 14 November, 2022; v1 submitted 28 August, 2022;
originally announced September 2022.
-
Semi-supervised Nonnegative Matrix Factorization for Document Classification
Authors:
Jamie Haddock,
Lara Kassab,
Sixian Li,
Alona Kryshchenko,
Rachel Grotheer,
Elena Sizikova,
Chuntian Wang,
Thomas Merkh,
RWMA Madushani,
Miju Ahn,
Deanna Needell,
Kathryn Leonard
Abstract:
We propose new semi-supervised nonnegative matrix factorization (SSNMF) models for document classification and provide motivation for these models as maximum likelihood estimators. The proposed SSNMF models simultaneously provide both a topic model and a model for classification, thereby offering highly interpretable classification results. We derive training methods using multiplicative updates for each new model, and demonstrate the application of these models to single-label and multi-label document classification, although the models are flexible to other supervised learning tasks such as regression. We illustrate the promise of these models and training methods on document classification datasets (e.g., 20 Newsgroups, Reuters).
Submitted 28 February, 2022;
originally announced March 2022.
-
Semi-supervised NMF Models for Topic Modeling in Learning Tasks
Authors:
Jamie Haddock,
Lara Kassab,
Sixian Li,
Alona Kryshchenko,
Rachel Grotheer,
Elena Sizikova,
Chuntian Wang,
Thomas Merkh,
R. W. M. A. Madushani,
Miju Ahn,
Deanna Needell,
Kathryn Leonard
Abstract:
We propose several new models for semi-supervised nonnegative matrix factorization (SSNMF) and provide motivation for SSNMF models as maximum likelihood estimators given specific distributions of uncertainty. We present multiplicative updates training methods for each new model, and demonstrate the application of these models to classification, although they are flexible to other supervised learning tasks. We illustrate the promise of these models and training methods on both synthetic and real data, and achieve high classification accuracy on the 20 Newsgroups dataset.
Submitted 15 October, 2020;
originally announced October 2020.
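For readers unfamiliar with multiplicative updates, the following is a minimal sketch of the classical updates for plain (unsupervised) NMF with a Frobenius-norm loss. It is not the SSNMF training method from the entries above, which add supervision terms; the function names, the random initialization, and the `eps` safeguard are illustrative assumptions.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(X, W, H):
    """Frobenius norm of the residual X - W H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry)) ** 0.5

def scale(M, num, den, eps):
    """Entrywise multiplicative update M * num / (den + eps)."""
    return [[m * a / (b + eps) for m, a, b in zip(rm, ra, rb)]
            for rm, ra, rb in zip(M, num, den)]

def nmf_multiplicative(X, rank, iters=200, seed=0, eps=1e-12):
    """Approximate a nonnegative X (m x n) by W (m x rank) times H (rank x n)."""
    rng = random.Random(seed)
    # Strictly positive initialization (an assumption) keeps every factor nonnegative.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in X]
    H = [[rng.random() + 0.1 for _ in X[0]] for _ in range(rank)]
    for _ in range(iters):
        Wt = transpose(W)
        H = scale(H, matmul(Wt, X), matmul(matmul(Wt, W), H), eps)  # H <- H * (W^T X) / (W^T W H)
        Ht = transpose(H)
        W = scale(W, matmul(X, Ht), matmul(W, matmul(H, Ht)), eps)  # W <- W * (X H^T) / (W H H^T)
    return W, H

X = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0], [3.0, 2.0, 5.0]]
W, H = nmf_multiplicative(X, rank=2)
print(frobenius_error(X, W, H))
```

Because each update multiplies by a ratio of nonnegative quantities, the factors stay nonnegative without any explicit projection step.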
-
COVID-19 Literature Topic-Based Search via Hierarchical NMF
Authors:
Rachel Grotheer,
Yihuan Huang,
Pengyu Li,
Elizaveta Rebrova,
Deanna Needell,
Longxiu Huang,
Alona Kryshchenko,
Xia Li,
Kyung Ha,
Oleksandr Kryshchenko
Abstract:
A dataset of COVID-19-related scientific literature is compiled, combining the articles from several online libraries and selecting those with open access and full text available. Then, hierarchical nonnegative matrix factorization is used to organize literature related to the novel coronavirus into a tree structure that allows researchers to search for relevant literature based on detected topics. We discover eight major latent topics and 52 granular subtopics in the body of literature, related to vaccines, genetic structure and modeling of the disease and patient studies, as well as related diseases and virology. In order that our tool may help current researchers, an interactive website is created that organizes available literature using this hierarchical structure.
Submitted 7 September, 2020;
originally announced September 2020.
-
Stochastic Iterative Hard Thresholding for Low-Tucker-Rank Tensor Recovery
Authors:
Rachel Grotheer,
Shuang Li,
Anna Ma,
Deanna Needell,
Jing Qin
Abstract:
Low-rank tensor recovery problems have been widely studied in many applications of signal processing and machine learning. Tucker decomposition is known as one of the most popular decompositions in the tensor framework. In recent years, researchers have developed many state-of-the-art algorithms to address the problem of low-Tucker-rank tensor recovery. Motivated by the favorable properties of the stochastic algorithms, such as stochastic gradient descent and stochastic iterative hard thresholding, we aim to extend the well-known stochastic iterative hard thresholding algorithm to the tensor framework in order to address the problem of recovering a low-Tucker-rank tensor from its linear measurements. We have also developed linear convergence analysis for the proposed method and conducted a series of experiments with both synthetic and real data to illustrate the performance of the proposed method.
Submitted 16 July, 2020; v1 submitted 22 September, 2019;
originally announced September 2019.
-
Iterative Hard Thresholding for Low CP-rank Tensor Models
Authors:
Rachel Grotheer,
Shuang Li,
Anna Ma,
Deanna Needell,
Jing Qin
Abstract:
Recovery of low-rank matrices from a small number of linear measurements is now well known to be possible under various model assumptions on the measurements. Such results demonstrate robustness and are backed with provable theoretical guarantees. However, extensions to tensor recovery have only recently begun to be studied and developed, despite an abundance of practical tensor applications. Recently, a tensor variant of the Iterative Hard Thresholding method was proposed and theoretical results were obtained that guarantee exact recovery of tensors with low Tucker rank. In this paper, we utilize the same tensor version of the Restricted Isometry Property (RIP) to extend these results for tensors with low CANDECOMP/PARAFAC (CP) rank. In doing so, we leverage recent results on efficient approximations of CP decompositions that remove the need for challenging assumptions in prior works. We complement our theoretical findings with empirical results that showcase the potential of the approach.
Submitted 22 August, 2019;
originally announced August 2019.
-
Alternatives for Generating a Reduced Basis to Solve the Hyperspectral Diffuse Optical Tomography Model
Authors:
Rachel Grotheer,
Thilo Strauss,
Phil Gralla,
Taufiquar Khan
Abstract:
The Reduced Basis Method (RBM) is a model reduction technique used to solve parametric PDEs that relies upon a basis set of solutions to the PDE at specific parameter values. To generate this reduced basis, the set of a small number of parameter values must be strategically chosen. We apply a Metropolis algorithm and a gradient algorithm to find the set of parameters and compare them to the standard greedy algorithm most commonly used in the RBM. We test our methods by using the RBM to solve a simplified version of the governing partial differential equation for hyperspectral diffuse optical tomography (hyDOT). The governing equation for hyDOT is an elliptic PDE parameterized by the wavelength of the laser source. For this one-dimensional problem, we find that both the Metropolis and gradient algorithms are potentially superior alternatives to the greedy algorithm in that they generate a reduced basis which produces solutions with a smaller relative error with respect to solutions found using the finite element method and in less time.
Submitted 2 March, 2018;
originally announced March 2018.
-
Compressed Anomaly Detection with Multiple Mixed Observations
Authors:
Natalie Durgin,
Rachel Grotheer,
Chenxi Huang,
Shuang Li,
Anna Ma,
Deanna Needell,
Jing Qin
Abstract:
We consider a collection of independent random variables that are identically distributed, except for a small subset which follows a different, anomalous distribution. We study the problem of detecting which random variables in the collection are governed by the anomalous distribution. Recent work proposes to solve this problem by conducting hypothesis tests based on mixed observations (e.g. linear combinations) of the random variables. Recognizing the connection between taking mixed observations and compressed sensing, we view the problem as recovering the "support" (index set) of the anomalous random variables from multiple measurement vectors (MMVs). Many algorithms have been developed for recovering jointly sparse signals and their support from MMVs. We establish the theoretical and empirical effectiveness of these algorithms at detecting anomalies. We also extend the LASSO algorithm to an MMV version for our purpose. Further, we perform experiments on synthetic data, consisting of samples from the random variables, to explore the trade-off between the number of mixed observations per sample and the number of samples required to detect anomalies.
Submitted 19 June, 2018; v1 submitted 30 January, 2018;
originally announced January 2018.
-
Sparse Randomized Kaczmarz for Support Recovery of Jointly Sparse Corrupted Multiple Measurement Vectors
Authors:
Natalie Durgin,
Rachel Grotheer,
Chenxi Huang,
Shuang Li,
Anna Ma,
Deanna Needell,
Jing Qin
Abstract:
While single measurement vector (SMV) models have been widely studied in signal processing, there is a surging interest in addressing the multiple measurement vectors (MMV) problem. In the MMV setting, more than one measurement vector is available and the multiple signals to be recovered share some commonalities such as a common support. Applications in which MMV is a naturally occurring phenomenon include online streaming, medical imaging, and video recovery. This work presents a stochastic iterative algorithm for the support recovery of jointly sparse corrupted MMV. We present a variant of the Sparse Randomized Kaczmarz algorithm for corrupted MMV and compare our proposed method with an existing Kaczmarz type algorithm for MMV problems. We also showcase the usefulness of our approach in the online (streaming) setting and provide empirical evidence that suggests the robustness of the proposed method to the distribution of the corruption and the number of corruptions occurring.
Submitted 14 June, 2018; v1 submitted 7 November, 2017;
originally announced November 2017.
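As background for the Kaczmarz-type method named in the entry above, here is a minimal sketch of the standard randomized Kaczmarz iteration for a consistent linear system Ax = b. It is the basic single-vector method, not the sparse, corruption-aware MMV variant the paper proposes; the function name, iteration count, and seed are illustrative assumptions.

```python
import random

def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Approximate a solution of A x = b by random row projections.

    A is a list of rows and b a list of right-hand sides. Rows are sampled
    with probability proportional to their squared Euclidean norm.
    """
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    sq_norms = [sum(a * a for a in row) for row in A]
    rows = [i for i, s in enumerate(sq_norms) if s > 0]  # skip all-zero rows
    weights = [sq_norms[i] for i in rows]
    for _ in range(iters):
        i = rng.choices(rows, weights=weights)[0]
        row = A[i]
        residual = b[i] - sum(a * xj for a, xj in zip(row, x))
        step = residual / sq_norms[i]
        # Project x onto the hyperplane {z : <row, z> = b[i]}.
        x = [xj + step * a for xj, a in zip(x, row)]
    return x

A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
b = [0.0, -5.0, 3.0]  # consistent with x = [1, -2]
print(randomized_kaczmarz(A, b))
```

Each iteration touches a single row of A, which is why methods of this family suit streaming settings in which measurements arrive one at a time.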
-
Stochastic Greedy Algorithms For Multiple Measurement Vectors
Authors:
Jing Qin,
Shuang Li,
Deanna Needell,
Anna Ma,
Rachel Grotheer,
Chenxi Huang,
Natalie Durgin
Abstract:
Sparse representation of a single measurement vector (SMV) has been explored in a variety of compressive sensing applications. Recently, SMV models have been extended to solve multiple measurement vectors (MMV) problems, where the underlying signal is assumed to have joint sparse structures. To circumvent the NP-hardness of the $\ell_0$ minimization problem, many deterministic MMV algorithms solve the convex relaxed models with limited efficiency. In this paper, we develop stochastic greedy algorithms for solving the joint sparse MMV reconstruction problem. In particular, we propose the MMV Stochastic Iterative Hard Thresholding (MStoIHT) and MMV Stochastic Gradient Matching Pursuit (MStoGradMP) algorithms, and we also utilize the mini-batching technique to further improve their performance. Convergence analysis indicates that the proposed algorithms are able to converge faster than their SMV counterparts, i.e., concatenated StoIHT and StoGradMP, under certain conditions. Numerical experiments have illustrated the superior effectiveness of the proposed algorithms over their SMV counterparts.
Submitted 22 August, 2020; v1 submitted 4 November, 2017;
originally announced November 2017.
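To make the thresholding step behind the StoIHT-type methods named above concrete, the sketch below shows the hard thresholding operator (keep the k largest-magnitude entries) and one stochastic gradient step followed by thresholding, for a single measurement vector with linear measurements. Single-row blocks, uniform row sampling, and the names `hard_threshold` and `stoiht_step` are assumptions made for illustration; the MMV algorithms in the paper threshold jointly across signals and are not reproduced here.

```python
import random

def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    if k <= 0:
        return [0.0] * len(x)
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]

def stoiht_step(x, A, y, k, step, rng):
    """One stochastic gradient step on a random measurement, then thresholding."""
    i = rng.randrange(len(A))  # uniform row sampling (assumption)
    row = A[i]
    residual = y[i] - sum(a * xj for a, xj in zip(row, x))
    # Gradient step on the loss 0.5 * (y[i] - <row, x>)^2.
    proxy = [xj + step * residual * a for xj, a in zip(x, row)]
    return hard_threshold(proxy, k)

print(hard_threshold([0.5, -3.0, 2.0, 0.1], 2))  # [0.0, -3.0, 2.0, 0.0]

rng = random.Random(0)
A = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
y = [2.0, 1.0]
x = stoiht_step([0.0, 0.0, 0.0], A, y, k=1, step=1.0, rng=rng)
print(x)
```

Every iterate is k-sparse by construction, so the per-iteration cost is dominated by the sampled measurement block rather than by the full measurement matrix.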