-
Stochastic Natural Thresholding Algorithms
Authors: Rachel Grotheer, Shuang Li, Anna Ma, Deanna Needell, Jing Qin
Abstract:
Sparse signal recovery is one of the most fundamental problems in many applications, including medical imaging and remote sensing. Many greedy algorithms based on the family of hard thresholding operators have been developed to solve the sparse signal recovery problem. More recently, Natural Thresholding (NT) has been proposed with improved computational efficiency. This paper proposes stochastic natural thresholding (StoNT) algorithms and discusses their convergence guarantees, extending NT from the deterministic setting with linear measurements to a stochastic setting with a general objective function. We also conduct various numerical experiments on linear and nonlinear measurements to demonstrate the performance of StoNT.
Submitted 7 June, 2023;
originally announced June 2023.
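For orientation, the hard thresholding family mentioned in the abstract can be sketched in a few lines. The example below is classical iterative hard thresholding (IHT) for linear measurements y = Ax, not the paper's NT or StoNT algorithm, whose support-selection step differs; the function names `hard_threshold` and `iht`, the unit step size, and the plain-list matrices are illustrative assumptions.

```python
def hard_threshold(x, s):
    """H_s: keep the s entries of x with largest magnitude, zero the rest."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:s])
    return [xi if i in keep else 0.0 for i, xi in enumerate(x)]


def iht(A, y, s, step=1.0, iters=100):
    """Classical IHT baseline (assumption: not NT/StoNT):
    x <- H_s(x + step * A^T (y - A x))."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = y - A x
        r = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # A^T r is the negative gradient of 0.5 * ||y - A x||^2
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step, then project onto s-sparse vectors
        x = hard_threshold([x[j] + step * g[j] for j in range(n)], s)
    return x
```

For example, `hard_threshold([3, -5, 1, 0.5], 2)` keeps the two largest-magnitude entries and returns `[3, -5, 0.0, 0.0]`. The step size should be small enough (roughly at most 1/‖A‖₂²) for the iteration to be stable.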
-
Automatic Infectious Disease Classification Analysis with Concept Discovery
Authors: Elena Sizikova, Joshua Vendrow, Xu Cao, Rachel Grotheer, Jamie Haddock, Lara Kassab, Alona Kryshchenko, Thomas Merkh, R. W. M. A. Madushani, Kenny Moise, Annie Ulichney, Huy V. Vo, Chuntian Wang, Megan Coffee, Kathryn Leonard, Deanna Needell
Abstract:
Automatic infectious disease classification from images can facilitate needed medical diagnoses. Such an approach can identify diseases such as tuberculosis that remain under-diagnosed due to resource constraints, as well as novel and emerging diseases such as monkeypox, which clinicians have little experience or acumen in diagnosing. Avoiding missed or delayed diagnoses would prevent further transmission and improve clinical outcomes. To understand and trust neural network predictions, it is necessary to analyze the learned representations. In this work, we argue that automatic discovery of concepts, i.e., human-interpretable attributes, allows for a deep understanding of learned information in medical image analysis tasks, generalizing beyond the training labels or protocols. We provide an overview of existing concept discovery approaches in the medical imaging and computer vision communities, and evaluate representative methods on tuberculosis (TB) and monkeypox prediction tasks. Finally, we propose NMFx, a general NMF formulation of interpretability by concept discovery that works in a unified way in unsupervised, weakly supervised, and supervised scenarios.
Submitted 14 November, 2022; v1 submitted 28 August, 2022;
originally announced September 2022.
-
Semi-supervised Nonnegative Matrix Factorization for Document Classification
Authors: Jamie Haddock, Lara Kassab, Sixian Li, Alona Kryshchenko, Rachel Grotheer, Elena Sizikova, Chuntian Wang, Thomas Merkh, RWMA Madushani, Miju Ahn, Deanna Needell, Kathryn Leonard
Abstract:
We propose new semi-supervised nonnegative matrix factorization (SSNMF) models for document classification and provide motivation for these models as maximum likelihood estimators. The proposed SSNMF models simultaneously provide both a topic model and a model for classification, thereby offering highly interpretable classification results. We derive multiplicative-update training methods for each new model and demonstrate the application of these models to single-label and multi-label document classification, although the models are flexible enough to apply to other supervised learning tasks such as regression. We illustrate the promise of these models and training methods on document classification datasets (e.g., 20 Newsgroups, Reuters).
Submitted 28 February, 2022;
originally announced March 2022.
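As a rough illustration of what a multiplicative-update SSNMF solver looks like, the sketch below uses the standard Frobenius-norm SSNMF objective, min ‖X − AS‖²_F + λ‖Y − BS‖²_F over nonnegative A, B, S, with Lee-Seung-style updates. Here X is a features-by-documents matrix and Y a labels-by-documents matrix sharing the representation S. It does not reproduce the new models proposed in the paper; the helper names, random initialization, and iteration count are assumptions.

```python
import random


def matmul(P, Q):
    """Dense product of list-of-lists matrices P (r x k) and Q (k x c)."""
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(row) for row in zip(*P)]


def add(P, Q, lam=1.0):
    """Elementwise P + lam * Q."""
    return [[p + lam * q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def mult_update(M, numer, denom, eps=1e-12):
    """Elementwise M * numer / (denom + eps); nonnegative entries stay nonnegative."""
    return [[M[i][j] * numer[i][j] / (denom[i][j] + eps) for j in range(len(M[0]))]
            for i in range(len(M))]


def ssnmf(X, Y, k, lam=1.0, iters=200, seed=0):
    """Frobenius SSNMF sketch (assumed baseline): X ~ A S, Y ~ B S, S shared."""
    rng = random.Random(seed)
    rand = lambda r, c: [[rng.random() + 0.1 for _ in range(c)] for _ in range(r)]
    A, B, S = rand(len(X), k), rand(len(Y), k), rand(k, len(X[0]))
    for _ in range(iters):
        St = transpose(S)
        SSt = matmul(S, St)
        A = mult_update(A, matmul(X, St), matmul(A, SSt))  # A <- A * XS^T / ASS^T
        B = mult_update(B, matmul(Y, St), matmul(B, SSt))  # B <- B * YS^T / BSS^T
        At, Bt = transpose(A), transpose(B)
        # S <- S * (A^T X + lam B^T Y) / (A^T A S + lam B^T B S)
        numer = add(matmul(At, X), matmul(Bt, Y), lam)
        denom = add(matmul(matmul(At, A), S), matmul(matmul(Bt, B), S), lam)
        S = mult_update(S, numer, denom)
    return A, B, S


def ssnmf_loss(X, Y, A, B, S, lam=1.0):
    sq = lambda M, N: sum((m - n) ** 2 for rm, rn in zip(M, N) for m, n in zip(rm, rn))
    return sq(X, matmul(A, S)) + lam * sq(Y, matmul(B, S))


def predict_labels(B, S):
    """Label of each training document = row index of the largest entry in its column of B S."""
    BS = matmul(B, S)
    return [max(range(len(BS)), key=lambda c: BS[c][j]) for j in range(len(BS[0]))]
```

Because the S update is an ordinary NMF update on the stacked matrix [X; √λ·Y] ≈ [A; √λ·B]S, the objective is nonincreasing across iterations, and `predict_labels(B, S)` reads off a class for each training document.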
-
Semi-supervised NMF Models for Topic Modeling in Learning Tasks
Authors: Jamie Haddock, Lara Kassab, Sixian Li, Alona Kryshchenko, Rachel Grotheer, Elena Sizikova, Chuntian Wang, Thomas Merkh, R. W. M. A. Madushani, Miju Ahn, Deanna Needell, Kathryn Leonard
Abstract:
We propose several new models for semi-supervised nonnegative matrix factorization (SSNMF) and provide motivation for SSNMF models as maximum likelihood estimators given specific distributions of uncertainty. We present multiplicative-update training methods for each new model and demonstrate the application of these models to classification, although they are flexible enough to apply to other supervised learning tasks. We illustrate the promise of these models and training methods on both synthetic and real data, and achieve high classification accuracy on the 20 Newsgroups dataset.
Submitted 15 October, 2020;
originally announced October 2020.
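The link between loss functions and noise models behind the "maximum likelihood" motivation is a standard NMF fact. The display below assumes independent Gaussian noise on X (variance σ_X²) and on Y (variance σ_Y²), which recovers the familiar weighted Frobenius SSNMF objective, and, for comparison, Poisson noise on X, which gives the generalized KL divergence. The specific distributions paired with each model in the paper are not reproduced here.

```latex
% Gaussian: X_{ij} ~ N((AS)_{ij}, \sigma_X^2), Y_{ij} ~ N((BS)_{ij}, \sigma_Y^2), independent
\begin{aligned}
-\log p(X, Y \mid A, B, S)
  &= \frac{1}{2\sigma_X^2}\lVert X - AS\rVert_F^2
   + \frac{1}{2\sigma_Y^2}\lVert Y - BS\rVert_F^2 + \text{const} \\
  &= \frac{1}{2\sigma_X^2}\Bigl(\lVert X - AS\rVert_F^2
   + \lambda\,\lVert Y - BS\rVert_F^2\Bigr) + \text{const},
  \qquad \lambda = \frac{\sigma_X^2}{\sigma_Y^2}.
\end{aligned}

% Poisson: X_{ij} ~ Poisson((AS)_{ij}); the A,S-independent terms are absorbed into const
-\log p(X \mid A, S)
  = \sum_{i,j}\Bigl(X_{ij}\log\frac{X_{ij}}{(AS)_{ij}} - X_{ij} + (AS)_{ij}\Bigr) + \text{const}
  = D_{\mathrm{KL}}(X \,\Vert\, AS) + \text{const}.
```

Minimizing either negative log-likelihood over nonnegative factors is therefore the same as minimizing the corresponding SSNMF loss, with the trade-off weight λ set by the relative noise levels.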
-
COVID-19 Literature Topic-Based Search via Hierarchical NMF
Authors: Rachel Grotheer, Yihuan Huang, Pengyu Li, Elizaveta Rebrova, Deanna Needell, Longxiu Huang, Alona Kryshchenko, Xia Li, Kyung Ha, Oleksandr Kryshchenko
Abstract:
A dataset of COVID-19-related scientific literature is compiled by combining articles from several online libraries and selecting those that are open access with full text available. Hierarchical nonnegative matrix factorization is then used to organize the literature related to the novel coronavirus into a tree structure that allows researchers to search for relevant literature based on detected topics. We discover eight major latent topics and 52 granular subtopics in the body of literature, related to vaccines, the genetic structure and modeling of the disease, patient studies, related diseases, and virology. To make our tool useful to current researchers, we create an interactive website that organizes the available literature using this hierarchical structure.
Submitted 7 September, 2020;
originally announced September 2020.
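One simple way to grow a topic tree with hierarchical NMF, assumed here for illustration (the paper's exact procedure, ranks per level, and stopping rules may differ), is to factor the term-document matrix, assign each document to its dominant topic, and recurse on each group. The names `nmf`, `hierarchical_nmf`, and `leaves`, the rank k = 2, and the depth limit are hypothetical.

```python
import random


def nmf(X, k, iters=300, seed=0, eps=1e-12):
    """Plain Lee-Seung NMF with Frobenius loss: X (terms x docs) ~ W H."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H)
        WtX = [[sum(W[i][a] * X[i][j] for i in range(m)) for j in range(n)] for a in range(k)]
        WtW = [[sum(W[i][a] * W[i][b] for i in range(m)) for b in range(k)] for a in range(k)]
        H = [[H[a][j] * WtX[a][j] / (sum(WtW[a][b] * H[b][j] for b in range(k)) + eps)
              for j in range(n)] for a in range(k)]
        # W <- W * (X H^T) / (W H H^T)
        XHt = [[sum(X[i][j] * H[a][j] for j in range(n)) for a in range(k)] for i in range(m)]
        HHt = [[sum(H[a][j] * H[b][j] for j in range(n)) for b in range(k)] for a in range(k)]
        W = [[W[i][a] * XHt[i][a] / (sum(W[i][b] * HHt[b][a] for b in range(k)) + eps)
              for a in range(k)] for i in range(m)]
    return W, H


def hierarchical_nmf(X, docs, k=2, depth=2, min_docs=2, seed=0):
    """Recursively split the document columns `docs` of X into a topic tree."""
    if depth == 0 or len(docs) < max(min_docs, k):
        return {"docs": docs, "children": []}
    sub = [[row[j] for j in docs] for row in X]  # restrict to this node's documents
    _, H = nmf(sub, k, seed=seed)
    groups = [[] for _ in range(k)]
    for col, d in enumerate(docs):
        topic = max(range(k), key=lambda a: H[a][col])  # dominant topic of document d
        groups[topic].append(d)
    children = [hierarchical_nmf(X, g, k, depth - 1, min_docs, seed) for g in groups if g]
    return {"docs": docs, "children": children}


def leaves(node):
    """Document groups at the leaves (the finest subtopics)."""
    if not node["children"]:
        return [node["docs"]]
    return [leaf for child in node["children"] for leaf in leaves(child)]
```

Each leaf holds the column indices of the documents in one fine-grained subtopic; the top-weighted terms in each node's W factor would typically be used to label the nodes.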
-
Compressed Anomaly Detection with Multiple Mixed Observations
Authors: Natalie Durgin, Rachel Grotheer, Chenxi Huang, Shuang Li, Anna Ma, Deanna Needell, Jing Qin
Abstract:
We consider a collection of independent random variables that are identically distributed, except for a small subset that follows a different, anomalous distribution. We study the problem of detecting which random variables in the collection are governed by the anomalous distribution. Recent work proposes to solve this problem by conducting hypothesis tests based on mixed observations (e.g., linear combinations) of the random variables. Recognizing the connection between taking mixed observations and compressed sensing, we view the problem as recovering the "support" (index set) of the anomalous random variables from multiple measurement vectors (MMVs). Many algorithms have been developed for recovering jointly sparse signals and their support from MMVs. We establish the theoretical and empirical effectiveness of these algorithms at detecting anomalies. We also extend the LASSO algorithm to an MMV version for our purpose. Further, we perform experiments on synthetic data, consisting of samples from the random variables, to explore the trade-off between the number of mixed observations per sample and the number of samples required to detect anomalies.
Submitted 19 June, 2018; v1 submitted 30 January, 2018;
originally announced January 2018.
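A common way to extend LASSO to MMV, assumed here and not necessarily the paper's exact formulation, is to replace the ℓ1 penalty with the row-wise ℓ2,1 norm: min over X of ½‖Y − AX‖²_F + λ Σ_j ‖X_{j,:}‖₂, which encourages all columns to share one support. Rows of X are taken to index the random variables, so rows left nonzero are flagged as anomalous. The sketch solves the problem with proximal gradient steps (ISTA); `mmv_lasso`, `row_shrink`, and `detect_support` are hypothetical names, and `step` should be at most 1/‖A‖₂² for convergence.

```python
import math


def row_shrink(X, tau):
    """Proximal operator of tau * sum_j ||X_j||_2: shrink each row's l2 norm by tau."""
    out = []
    for row in X:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out


def mmv_lasso(A, Y, lam, step, iters=500):
    """ISTA for 0.5 * ||Y - A X||_F^2 + lam * ||X||_{2,1} (assumed MMV-LASSO form)."""
    m, n, L = len(A), len(A[0]), len(Y[0])
    X = [[0.0] * L for _ in range(n)]
    for _ in range(iters):
        # Residual R = A X - Y
        R = [[sum(A[i][j] * X[j][l] for j in range(n)) - Y[i][l] for l in range(L)]
             for i in range(m)]
        # Gradient of the data term: G = A^T R
        G = [[sum(A[i][j] * R[i][l] for i in range(m)) for l in range(L)] for j in range(n)]
        # Gradient step followed by row-wise soft thresholding
        X = row_shrink([[X[j][l] - step * G[j][l] for l in range(L)] for j in range(n)],
                       step * lam)
    return X


def detect_support(X, tol=1e-8):
    """Indices of rows (random variables) whose estimated row norm is nonzero."""
    return [j for j, row in enumerate(X) if math.sqrt(sum(v * v for v in row)) > tol]
```

With A equal to the identity, the solution is simply the row-shrunk Y: rows whose norm falls below λ are zeroed, and the remaining rows form the detected support.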
-
Sparse Randomized Kaczmarz for Support Recovery of Jointly Sparse Corrupted Multiple Measurement Vectors
Authors: Natalie Durgin, Rachel Grotheer, Chenxi Huang, Shuang Li, Anna Ma, Deanna Needell, Jing Qin
Abstract:
While single measurement vector (SMV) models have been widely studied in signal processing, there is surging interest in addressing the multiple measurement vectors (MMV) problem. In the MMV setting, more than one measurement vector is available, and the multiple signals to be recovered share some commonalities, such as a common support. Applications in which MMV arises naturally include online streaming, medical imaging, and video recovery. This work presents a stochastic iterative algorithm for the support recovery of jointly sparse corrupted MMV. We present a variant of the Sparse Randomized Kaczmarz algorithm for corrupted MMV and compare the proposed method with an existing Kaczmarz-type algorithm for MMV problems. We also showcase the usefulness of our approach in the online (streaming) setting and provide empirical evidence suggesting that the proposed method is robust to the distribution of the corruption and to the number of corruptions.
Submitted 14 June, 2018; v1 submitted 7 November, 2017;
originally announced November 2017.
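The paper's algorithm is a variant of Sparse Randomized Kaczmarz tailored to corrupted MMV; its support weighting and corruption handling are not reproduced here. As a baseline only, the sketch applies the standard randomized Kaczmarz projection (row i sampled with probability proportional to ‖a_i‖²) to every column of Y = AX at once, then estimates the joint support from the row norms of X. The names `randomized_kaczmarz_mmv` and `top_rows` are hypothetical.

```python
import random


def randomized_kaczmarz_mmv(A, Y, iters=2000, seed=0):
    """Standard randomized Kaczmarz applied jointly to all columns of Y = A X (baseline)."""
    rng = random.Random(seed)
    m, n, L = len(A), len(A[0]), len(Y[0])
    row_norms2 = [sum(a * a for a in row) for row in A]
    X = [[0.0] * L for _ in range(n)]
    for _ in range(iters):
        # Sample row i with probability ||a_i||^2 / ||A||_F^2
        i = rng.choices(range(m), weights=row_norms2)[0]
        a = A[i]
        for l in range(L):
            # Project column l of X onto the hyperplane <a_i, x> = Y[i][l]
            resid = Y[i][l] - sum(a[j] * X[j][l] for j in range(n))
            for j in range(n):
                X[j][l] += resid / row_norms2[i] * a[j]
    return X


def top_rows(X, s):
    """Estimated joint support: indices of the s rows of X with largest l2 norm."""
    norms = [sum(v * v for v in row) for row in X]
    return sorted(sorted(range(len(X)), key=lambda j: norms[j], reverse=True)[:s])
```

On a consistent system the iterates converge to the exact solution; with corrupted entries in Y, plain randomized Kaczmarz only reaches a neighborhood of the solution whose size depends on the corruption.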
-
Stochastic Greedy Algorithms For Multiple Measurement Vectors
Authors: Jing Qin, Shuang Li, Deanna Needell, Anna Ma, Rachel Grotheer, Chenxi Huang, Natalie Durgin
Abstract:
Sparse representation of a single measurement vector (SMV) has been explored in a variety of compressive sensing applications. Recently, SMV models have been extended to solve multiple measurement vectors (MMV) problems, where the underlying signal is assumed to have a joint sparse structure. To circumvent the NP-hardness of the $\ell_0$ minimization problem, many deterministic MMV algorithms solve convex relaxations, with limited efficiency. In this paper, we develop stochastic greedy algorithms for solving the joint sparse MMV reconstruction problem. In particular, we propose the MMV Stochastic Iterative Hard Thresholding (MStoIHT) and MMV Stochastic Gradient Matching Pursuit (MStoGradMP) algorithms, and we also use mini-batching to further improve their performance. Convergence analysis indicates that, under certain conditions, the proposed algorithms converge faster than their SMV counterparts, i.e., concatenated StoIHT and StoGradMP. Numerical experiments illustrate the superior effectiveness of the proposed algorithms over their SMV counterparts.
Submitted 22 August, 2020; v1 submitted 4 November, 2017;
originally announced November 2017.
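A minimal sketch of the MStoIHT idea, assuming a least-squares objective F(X) = (1/M) Σ_i ½‖Y_i − A_i X‖²_F split into M row blocks with uniform sampling (the paper treats more general objectives, sampling probabilities, and a mini-batch variant not shown here): take a stochastic gradient step on one block, then keep the s rows of X with largest ℓ2 norm, which enforces a common support across all measurement vectors. The names `row_hard_threshold` and `mstoiht`, the interleaved block partition, and the default step are hypothetical.

```python
import random


def row_hard_threshold(X, s):
    """Keep the s rows of X with largest l2 norm (joint sparsity), zero the rest."""
    norms = [sum(v * v for v in row) for row in X]
    keep = set(sorted(range(len(X)), key=lambda j: norms[j], reverse=True)[:s])
    return [list(row) if j in keep else [0.0] * len(row) for j, row in enumerate(X)]


def mstoiht(A, Y, s, num_blocks, step=1.0, iters=300, seed=0):
    """Stochastic IHT for MMV on a least-squares objective split into row blocks."""
    rng = random.Random(seed)
    m, n, L = len(A), len(A[0]), len(Y[0])
    blocks = [list(range(b, m, num_blocks)) for b in range(num_blocks)]  # interleaved rows
    X = [[0.0] * L for _ in range(n)]
    for _ in range(iters):
        rows = blocks[rng.randrange(num_blocks)]  # uniform sampling, p(i) = 1/M
        # Block residual R_i = Y_i - A_i X
        R = {r: [Y[r][l] - sum(A[r][j] * X[j][l] for j in range(n)) for l in range(L)]
             for r in rows}
        # Step along -grad f_i(X) = A_i^T R_i; with p(i) = 1/M the weight gamma/(M p(i)) is gamma
        B = [[X[j][l] + step * sum(A[r][j] * R[r][l] for r in rows) for l in range(L)]
             for j in range(n)]
        X = row_hard_threshold(B, s)
    return X
```

Thresholding whole rows rather than individual entries is what distinguishes the MMV version from running SMV StoIHT on each column separately: every column of the estimate is forced onto the same s-element support.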