-
CoCA: Cooperative Component Analysis
Authors:
Daisy Yi Ding,
Alden Green,
Min Woo Sun,
Robert Tibshirani
Abstract:
We propose Cooperative Component Analysis (CoCA), a new method for unsupervised multi-view analysis: it identifies the component that simultaneously captures significant within-view variance and exhibits strong cross-view correlation. The challenge of integrating multi-view data is particularly important in biology and medicine, where various types of "-omic" data, ranging from genomics to proteomics, are measured on the same set of samples. The goal is to uncover important, shared signals that represent underlying biological mechanisms. CoCA combines an approximation error loss to preserve information within data views and an "agreement penalty" to encourage alignment across data views. By balancing the trade-off between these two key components in the objective, CoCA has the property of interpolating between the commonly-used principal component analysis (PCA) and canonical correlation analysis (CCA) as special cases at the two ends of the solution path. CoCA chooses the degree of agreement in a data-adaptive manner, using a validation set or cross-validation to estimate test error. Furthermore, we propose a sparse variant of CoCA that incorporates the Lasso penalty to yield feature sparsity, facilitating the identification of key features driving the observed patterns. We demonstrate the effectiveness of CoCA on simulated data and two real multiomics studies of COVID-19 and ductal carcinoma in situ of breast. In both real data applications, CoCA successfully integrates multiomics data, extracting components that are not only consistently present across different data views but also more informative and predictive of disease progression. CoCA offers a powerful framework for discovering important shared signals in multi-view data, with the potential to uncover novel insights in an increasingly multi-view data world.
Submitted 23 July, 2024;
originally announced July 2024.
-
One-Shot Safety Alignment for Large Language Models via Optimal Dualization
Authors:
Xinmeng Huang,
Shuo Li,
Edgar Dobriban,
Osbert Bastani,
Hamed Hassani,
Dongsheng Ding
Abstract:
The growing safety concerns surrounding large language models raise an urgent need to align them with diverse human preferences to simultaneously enhance their helpfulness and safety. A promising approach is to enforce safety constraints through Reinforcement Learning from Human Feedback (RLHF). For such constrained RLHF, typical Lagrangian-based primal-dual policy optimization methods are computationally expensive and often unstable. This paper presents a perspective of dualization that reduces constrained alignment to an equivalent unconstrained alignment problem. We do so by pre-optimizing a smooth and convex dual function that has a closed form. This shortcut eliminates the need for cumbersome primal-dual policy iterations, greatly reducing the computational burden and improving training stability. Our strategy leads to two practical algorithms in model-based and preference-based settings (MoCAN and PeCAN, respectively). A broad range of experiments demonstrate the effectiveness and merits of our algorithms.
Submitted 22 November, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
-
An Efficient 1 Iteration Learning Algorithm for Gaussian Mixture Model And Gaussian Mixture Embedding For Neural Network
Authors:
Weiguo Lu,
Xuan Wu,
Deng Ding,
Gangnan Yuan
Abstract:
We propose a Gaussian Mixture Model (GMM) learning algorithm based on our previous work on the GMM expansion idea. The new algorithm is more robust and simpler than the classic Expectation Maximization (EM) algorithm. It also improves accuracy and takes only 1 iteration to learn. We prove theoretically that the new algorithm is guaranteed to converge regardless of the parameter initialisation. We compare our GMM expansion method with classic probability layers in neural networks and find that it is demonstrably better at overcoming data uncertainty and the inverse problem. Finally, we test a GMM-based generator, which shows potential for building further applications that use random sampling from the distribution for stochastic variation as well as variation control.
Submitted 6 September, 2023; v1 submitted 18 August, 2023;
originally announced August 2023.
-
Semi-supervised Cooperative Learning for Multiomics Data Fusion
Authors:
Daisy Yi Ding,
Xiaotao Shen,
Michael Snyder,
Robert Tibshirani
Abstract:
Multiomics data fusion integrates diverse data modalities, ranging from transcriptomics to proteomics, to gain a comprehensive understanding of biological systems and enhance predictions on outcomes of interest related to disease phenotypes and treatment responses. Cooperative learning, a recently proposed method, unifies the commonly-used fusion approaches, including early and late fusion, and offers a systematic framework for leveraging the shared underlying relationships across omics to strengthen signals. However, the challenge of acquiring large-scale labeled data remains, and there are cases where multiomics data are available but annotated labels are absent. To harness the potential of unlabeled multiomics data, we introduce semi-supervised cooperative learning. By utilizing an "agreement penalty", our method incorporates the additional unlabeled data in the learning process and achieves consistently superior predictive performance on simulated data and a real multiomics study of aging. It offers an effective solution to multiomics data fusion in settings with both labeled and unlabeled data and maximizes the utility of available data resources, with the potential of significantly improving predictive models for diagnostics and therapeutics in an increasingly multiomics world.
Submitted 2 August, 2023;
originally announced August 2023.
-
Cooperative learning for multiview analysis
Authors:
Daisy Yi Ding,
Shuangning Li,
Balasubramanian Narasimhan,
Robert Tibshirani
Abstract:
We propose a new method for supervised learning with multiple sets of features ("views"). The multiview problem is especially important in biology and medicine, where "-omics" data such as genomics, proteomics and radiomics are measured on a common set of samples. Cooperative learning combines the usual squared error loss of predictions with an "agreement" penalty to encourage the predictions from different data views to agree. By varying the weight of the agreement penalty, we get a continuum of solutions that include the well-known early and late fusion approaches. Cooperative learning chooses the degree of agreement (or fusion) in an adaptive manner, using a validation set or cross-validation to estimate test set prediction error. One version of our fitting procedure is modular, where one can choose different fitting mechanisms (e.g. lasso, random forests, boosting, neural networks) appropriate for different data views. In the setting of cooperative regularized linear regression, the method combines the lasso penalty with the agreement penalty, yielding feature sparsity. The method can be especially powerful when the different data views share some underlying relationship in their signals that can be exploited to boost the signals. We show that cooperative learning achieves higher predictive accuracy on simulated data and a real multiomics example of labor onset prediction. Leveraging aligned signals and allowing flexible fitting mechanisms for different modalities, cooperative learning offers a powerful approach to multiomics data fusion.
Submitted 3 September, 2022; v1 submitted 22 December, 2021;
originally announced December 2021.
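The trade-off described in the abstract above, a squared error loss plus a weighted "agreement" penalty, can be sketched for two views as follows. This is a minimal illustration only: the function name, the argument names, the factor of 1/2, and the symbol `rho` for the penalty weight are assumptions made here, not details given in the listing.

```python
def cooperative_objective(y, pred_x, pred_z, rho):
    """Squared error of the summed view predictions plus an agreement penalty.

    y, pred_x, pred_z are equal-length lists of floats; rho >= 0 is the
    (assumed) weight on the agreement penalty.
    """
    # Fit term: how well the two views' predictions jointly explain y.
    fit = sum((yi - px - pz) ** 2 for yi, px, pz in zip(y, pred_x, pred_z))
    # Agreement term: how much the two views' predictions disagree.
    agreement = sum((px - pz) ** 2 for px, pz in zip(pred_x, pred_z))
    return 0.5 * fit + 0.5 * rho * agreement


y = [1.0, 2.0]
pred_x = [0.5, 1.0]
pred_z = [0.5, 0.5]
loss_no_penalty = cooperative_objective(y, pred_x, pred_z, rho=0.0)
loss_with_penalty = cooperative_objective(y, pred_x, pred_z, rho=1.0)
```

With `rho = 0` the penalty vanishes and only the squared error remains; larger values of `rho` push the per-view predictions toward each other.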
-
Handling Missing Data with Graph Representation Learning
Authors:
Jiaxuan You,
Xiaobai Ma,
Daisy Yi Ding,
Mykel Kochenderfer,
Jure Leskovec
Abstract:
Machine learning with missing data has been approached in two different ways, including feature imputation where missing feature values are estimated based on observed values, and label prediction where downstream labels are learned directly from incomplete data. However, existing imputation models tend to have strong prior assumptions and cannot learn from downstream tasks, while models targeting label prediction often involve heuristics and can encounter scalability issues. Here we propose GRAPE, a graph-based framework for feature imputation as well as label prediction. GRAPE tackles the missing data problem using a graph representation, where the observations and features are viewed as two types of nodes in a bipartite graph, and the observed feature values as edges. Under the GRAPE framework, the feature imputation is formulated as an edge-level prediction task and the label prediction as a node-level prediction task. These tasks are then solved with Graph Neural Networks. Experimental results on nine benchmark datasets show that GRAPE yields 20% lower mean absolute error for imputation tasks and 10% lower for label prediction tasks, compared with existing state-of-the-art methods.
Submitted 30 October, 2020;
originally announced October 2020.
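The graph representation described in the abstract above (observations and features as two node types, observed values as edges) can be sketched in a few lines. This is only an illustration of the data layout, assuming missing entries are encoded as `None`; the function name is hypothetical, and the Graph Neural Network that GRAPE trains on this graph is not shown.

```python
def build_bipartite_edges(matrix):
    """Turn a data matrix with missing entries into a bipartite edge list.

    Each row index is an observation node, each column index is a feature
    node, and every observed (non-None) value becomes one weighted edge.
    """
    edges = []
    for obs_idx, row in enumerate(matrix):
        for feat_idx, value in enumerate(row):
            if value is not None:  # missing entries produce no edge
                edges.append((obs_idx, feat_idx, value))
    return edges


data = [
    [0.3, None, 1.2],
    [None, 0.7, None],
]
edges = build_bipartite_edges(data)
```

In this layout, imputing a missing value corresponds to predicting the weight of an absent edge, and label prediction corresponds to predicting a value at an observation node.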
-
Dirichlet Process Mixture Models with Shrinkage Prior
Authors:
Dawei Ding,
George Karabatsos
Abstract:
We propose Dirichlet Process Mixture (DPM) models for prediction and cluster-wise variable selection, based on two choices of shrinkage baseline prior distributions for the linear regression coefficients, namely the Horseshoe prior and Normal-Gamma prior. We show in a simulation study that each of the two proposed DPM models tends to outperform the standard DPM model based on the non-shrinkage normal prior, in terms of predictive, variable selection, and clustering accuracy. This is especially true for the Horseshoe model, and when the number of covariates exceeds the within-cluster sample size. A real data set is analyzed to illustrate the proposed modeling methodology, where both proposed DPM models again attained better predictive accuracy.
Submitted 25 February, 2021; v1 submitted 21 October, 2020;
originally announced October 2020.
-
Fast and Secure Distributed Nonnegative Matrix Factorization
Authors:
Yuqiu Qian,
Conghui Tan,
Danhao Ding,
Hui Li,
Nikos Mamoulis
Abstract:
Nonnegative matrix factorization (NMF) has been successfully applied in several data mining tasks. Recently, there has been increasing interest in the acceleration of NMF, due to its high cost on large matrices. On the other hand, the privacy issue of NMF over federated data is worthy of attention, since NMF is prevalently applied in image and text analysis, which may involve leveraging private data (e.g., medical images and records) across several parties (e.g., hospitals). In this paper, we study the acceleration and security problems of distributed NMF. Firstly, we propose a distributed sketched alternating nonnegative least squares (DSANLS) framework for NMF, which utilizes a matrix sketching technique to reduce the size of nonnegative least squares subproblems with a convergence guarantee. For the second problem, we show that DSANLS with modification can be adapted to the security setting, but only for one or limited iterations. Consequently, we propose four efficient distributed NMF methods in both synchronous and asynchronous settings with a security guarantee. We conduct extensive experiments on several real datasets to show the superiority of our proposed methods. The implementation of our methods is available at https://github.com/qianyuqiu79/DSANLS.
Submitted 6 September, 2020;
originally announced September 2020.
-
Provably Efficient Safe Exploration via Primal-Dual Policy Optimization
Authors:
Dongsheng Ding,
Xiaohan Wei,
Zhuoran Yang,
Zhaoran Wang,
Mihailo R. Jovanović
Abstract:
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation in which an agent aims to maximize the expected total reward subject to a safety constraint on the expected total value of a utility function. We focus on an episodic setting with the function approximation where the Markov transition kernels have a linear structure but do not impose any additional assumptions on the sampling model. Designing SRL algorithms with provable computational and statistical efficiency is particularly challenging under this setting because of the need to incorporate both the safety constraint and the function approximation into the fundamental exploitation/exploration tradeoff. To this end, we present an Optimistic Primal-Dual Proximal Policy OPtimization (OPDOP) algorithm where the value function is estimated by combining the least-squares policy evaluation and an additional bonus term for safe exploration. We prove that the proposed algorithm achieves an $\tilde{O}(d H^{2.5}\sqrt{T})$ regret and an $\tilde{O}(d H^{2.5}\sqrt{T})$ constraint violation, where $d$ is the dimension of the feature mapping, $H$ is the horizon of each episode, and $T$ is the total number of steps. These bounds hold when the reward/utility functions are fixed but the feedback after each episode is bandit. Our bounds depend on the capacity of the state-action space only through the dimension of the feature mapping and thus our results hold even when the number of states goes to infinity. To the best of our knowledge, we provide the first provably efficient online policy optimization algorithm for CMDP with safe exploration in the function approximation setting.
Submitted 25 October, 2020; v1 submitted 1 March, 2020;
originally announced March 2020.
-
Adapting Behaviour for Learning Progress
Authors:
Tom Schaul,
Diana Borsa,
David Ding,
David Szepesvari,
Georg Ostrovski,
Will Dabney,
Simon Osindero
Abstract:
Determining what experience to generate to best facilitate learning (i.e. exploration) is one of the distinguishing features and open challenges in reinforcement learning. The advent of distributed agents that interact with parallel instances of the environment has enabled larger scales and greater flexibility, but has not removed the need to tune exploration to the task, because the ideal data for the learning algorithm necessarily depends on its process of learning. We propose to dynamically adapt the data generation by using a non-stationary multi-armed bandit to optimize a proxy of the learning progress. The data distribution is controlled by modulating multiple parameters of the policy (such as stochasticity, consistency or optimism) without significant overhead. The adaptation speed of the bandit can be increased by exploiting the factored modulation structure. We demonstrate on a suite of Atari 2600 games how this unified approach produces results comparable to per-task tuning at a fraction of the cost.
Submitted 14 December, 2019;
originally announced December 2019.
-
Missingness as Stability: Understanding the Structure of Missingness in Longitudinal EHR data and its Impact on Reinforcement Learning in Healthcare
Authors:
Scott L. Fleming,
Kuhan Jeyapragasan,
Tony Duan,
Daisy Ding,
Saurabh Gombar,
Nigam Shah,
Emma Brunskill
Abstract:
There is an emerging trend in the reinforcement learning for healthcare literature. In order to prepare longitudinal, irregularly sampled, clinical datasets for reinforcement learning algorithms, many researchers will resample the time series data to short, regular intervals and use last-observation-carried-forward (LOCF) imputation to fill in these gaps. Typically, they will not maintain any explicit information about which values were imputed. In this work, we (1) call attention to this practice and discuss its potential implications; (2) propose an alternative representation of the patient state that addresses some of these issues; and (3) demonstrate in a novel but representative clinical dataset that our alternative representation yields consistently better results for achieving optimal control, as measured by off-policy policy evaluation, compared to representations that do not incorporate missingness information.
Submitted 16 November, 2019;
originally announced November 2019.
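The practice discussed in the abstract above, last-observation-carried-forward (LOCF) imputation without a record of which values were imputed, can be contrasted with a version that keeps that record. The sketch below is only an illustration: it assumes a regularly resampled series with `None` marking gaps, the function name is hypothetical, and it is not the alternative patient-state representation proposed in the paper, which the abstract does not specify.

```python
def locf_with_mask(series):
    """Fill gaps by carrying the last observed value forward.

    Returns the filled series together with a mask that is True where the
    value was actually observed and False where it was imputed (or where
    no earlier observation exists, in which case the gap stays None).
    """
    filled, observed = [], []
    last = None
    for value in series:
        if value is None:
            filled.append(last)      # carry the previous observation forward
            observed.append(False)   # remember that this entry was not measured
        else:
            last = value
            filled.append(value)
            observed.append(True)
    return filled, observed


heart_rate = [None, 72.0, None, None, 80.0]
filled, observed = locf_with_mask(heart_rate)
```

Passing `observed` along with `filled` to a downstream model is one simple way to retain missingness information that plain LOCF discards.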
-
NGBoost: Natural Gradient Boosting for Probabilistic Prediction
Authors:
Tony Duan,
Anand Avati,
Daisy Yi Ding,
Khanh K. Thai,
Sanjay Basu,
Andrew Y. Ng,
Alejandro Schuler
Abstract:
We present Natural Gradient Boosting (NGBoost), an algorithm for generic probabilistic prediction via gradient boosting. Typical regression models return a point estimate, conditional on covariates, but probabilistic regression models output a full probability distribution over the outcome space, conditional on the covariates. This allows for predictive uncertainty estimation -- crucial in applications like healthcare and weather forecasting. NGBoost generalizes gradient boosting to probabilistic regression by treating the parameters of the conditional distribution as targets for a multiparameter boosting algorithm. Furthermore, we show how the Natural Gradient is required to correct the training dynamics of our multiparameter boosting approach. NGBoost can be used with any base learner, any family of distributions with continuous parameters, and any scoring rule. NGBoost matches or exceeds the performance of existing methods for probabilistic prediction while offering additional benefits in flexibility, scalability, and usability. An open-source implementation is available at github.com/stanfordmlgroup/ngboost.
Submitted 9 June, 2020; v1 submitted 8 October, 2019;
originally announced October 2019.
-
Counterfactual Reasoning for Fair Clinical Risk Prediction
Authors:
Stephen Pfohl,
Tony Duan,
Daisy Yi Ding,
Nigam H. Shah
Abstract:
The use of machine learning systems to support decision making in healthcare raises questions as to what extent these systems may introduce or exacerbate disparities in care for historically underrepresented and mistreated groups, due to biases implicitly embedded in observational data in electronic health records. To address this problem in the context of clinical risk prediction models, we develop an augmented counterfactual fairness criterion to extend the group fairness criterion of equalized odds to an individual level. We do so by requiring that the same prediction be made for a patient, and a counterfactual patient resulting from changing a sensitive attribute, if the factual and counterfactual outcomes do not differ. We investigate the extent to which the augmented counterfactual fairness criterion may be applied to develop fair models for prolonged inpatient length of stay and mortality with observational electronic health records data. As the fairness criterion is ill-defined without knowledge of the data generating process, we use a variational autoencoder to perform counterfactual inference in the context of an assumed causal graph. While our technique provides a means to trade off maintenance of fairness with reduction in predictive performance in the context of a learned generative model, further work is needed to assess the generality of this approach.
Submitted 14 July, 2019;
originally announced July 2019.
-
Tree-based Particle Smoothing Algorithms in a Hidden Markov Model
Authors:
Dong Ding,
Axel Gandy
Abstract:
We provide a new strategy built on the divide-and-conquer approach by Lindsten et al. (2017) to investigate the smoothing problem in a hidden Markov model. We employ this approach to decompose a hidden Markov model into sub-models with intermediate target distributions based on an auxiliary tree structure and produce independent samples from the sub-models at the leaf nodes towards the original model of interest at the root. We review the target distribution in the sub-models suggested by Lindsten et al. and propose two new classes of target distributions, which are the estimates of the (joint) filtering distributions and the (joint) smoothing distributions. The first proposed type is straightforwardly constructible by running a filtering algorithm in advance. The algorithm using the second type of target distributions has the advantage of keeping the marginals of all random variables roughly invariant at all levels of the tree, at the cost of approximating the marginal smoothing distributions in advance. We further propose the constructions of these target distributions using pre-generated Monte Carlo samples. We show empirically that the algorithms with the proposed intermediate target distributions give stable results comparable to those of conventional smoothing methods in a linear Gaussian model and a non-linear model.
Submitted 25 August, 2018;
originally announced August 2018.
-
The Effectiveness of Multitask Learning for Phenotyping with Electronic Health Records Data
Authors:
Daisy Yi Ding,
Chloé Simpson,
Stephen Pfohl,
Dave C. Kale,
Kenneth Jung,
Nigam H. Shah
Abstract:
Electronic phenotyping is the task of ascertaining whether an individual has a medical condition of interest by analyzing their medical record and is foundational in clinical informatics. Increasingly, electronic phenotyping is performed via supervised learning. We investigate the effectiveness of multitask learning for phenotyping using electronic health records (EHR) data. Multitask learning aims to improve model performance on a target task by jointly learning additional auxiliary tasks and has been used in disparate areas of machine learning. However, its utility when applied to EHR data has not been established, and prior work suggests that its benefits are inconsistent. We present experiments that elucidate when multitask learning with neural nets improves performance for phenotyping using EHR data relative to neural nets trained for a single phenotype and to well-tuned logistic regression baselines. We find that multitask neural nets consistently outperform single-task neural nets for rare phenotypes but underperform for relatively more common phenotypes. The effect size increases as more auxiliary tasks are added. Moreover, multitask learning reduces the sensitivity of neural nets to hyperparameter settings for rare phenotypes. Last, we quantify phenotype complexity and find that neural nets trained with or without multitask learning do not improve on simple baselines unless the phenotypes are sufficiently complex.
Submitted 5 January, 2019; v1 submitted 9 August, 2018;
originally announced August 2018.
-
Theoretical Analysis of Image-to-Image Translation with Adversarial Learning
Authors:
Xudong Pan,
Mi Zhang,
Daizong Ding
Abstract:
Recently, a unified model for image-to-image translation tasks within the adversarial learning framework has aroused widespread research interest among computer vision practitioners. Its reported empirical success, however, lacks a solid theoretical interpretation of its inherent mechanism. In this paper, we reformulate the model from a new geometrical perspective and arrive at a full interpretation of some interesting but unclear empirical phenomena from the original experiments. Furthermore, by extending the definition of generalization for generative adversarial nets to a broader sense, we derive a condition that controls the generalization capability of the model. Based on this condition, we also propose several practical suggestions on model design and dataset construction as guidance for further empirical research.
Submitted 18 June, 2018;
originally announced June 2018.
-
CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning
Authors:
Pranav Rajpurkar,
Jeremy Irvin,
Kaylie Zhu,
Brandon Yang,
Hershel Mehta,
Tony Duan,
Daisy Ding,
Aarti Bagul,
Curtis Langlotz,
Katie Shpanskaya,
Matthew P. Lungren,
Andrew Y. Ng
Abstract:
We develop an algorithm that can detect pneumonia from chest X-rays at a level exceeding practicing radiologists. Our algorithm, CheXNet, is a 121-layer convolutional neural network trained on ChestX-ray14, currently the largest publicly available chest X-ray dataset, containing over 100,000 frontal-view X-ray images with 14 diseases. Four practicing academic radiologists annotate a test set, on which we compare the performance of CheXNet to that of radiologists. We find that CheXNet exceeds average radiologist performance on the F1 metric. We extend CheXNet to detect all 14 diseases in ChestX-ray14 and achieve state of the art results on all 14 diseases.
Submitted 25 December, 2017; v1 submitted 14 November, 2017;
originally announced November 2017.
-
Deep Lattice Networks and Partial Monotonic Functions
Authors:
Seungil You,
David Ding,
Kevin Canini,
Jan Pfeifer,
Maya Gupta
Abstract:
We propose learning deep models that are monotonic with respect to a user-specified set of inputs by alternating layers of linear embeddings, ensembles of lattices, and calibrators (piecewise linear functions), with appropriate constraints for monotonicity, and jointly training the resulting network. We implement the layers and projections with new computational graph nodes in TensorFlow and use the ADAM optimizer and batched stochastic gradients. Experiments on benchmark and real-world datasets show that six-layer monotonic deep lattice networks achieve state-of-the-art performance for classification and regression with monotonicity guarantees.
Submitted 19 September, 2017;
originally announced September 2017.
-
Implementing Monte Carlo Tests with P-value Buckets
Authors:
Axel Gandy,
Georg Hahn,
Dong Ding
Abstract:
Software packages usually report the results of statistical tests using p-values. Users often interpret these by comparing them to standard thresholds, e.g. 0.1%, 1% and 5%, which is sometimes reinforced by a star rating (***, **, *). We consider an arbitrary statistical test whose p-value p is not available explicitly, but can be approximated by Monte Carlo samples, e.g. by bootstrap or permutation tests. The standard implementation of such tests usually draws a fixed number of samples to approximate p. However, the probability that the exact and the approximated p-value lie on different sides of a threshold (the resampling risk) can be high, particularly for p-values close to a threshold. We present a method to overcome this. We consider a finite set of user-specified intervals which cover [0,1] and which can be overlapping. We call these p-value buckets. We present algorithms that, with arbitrarily high probability, return a p-value bucket containing p. We prove that for both a bounded resampling risk and a finite runtime, overlapping buckets need to be employed, and that our methods both bound the resampling risk and guarantee a finite runtime for such overlapping buckets. To interpret decisions with overlapping buckets, we propose an extension of the star rating system. We demonstrate that our methods are suitable for use in standard software, including for low p-value thresholds occurring in multiple testing settings, and that they can be computationally more efficient than standard implementations.
Submitted 4 November, 2019; v1 submitted 27 March, 2017;
originally announced March 2017.
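The resampling risk described in this abstract can be made concrete with a small calculation. The following is a minimal sketch, not the authors' method: it assumes the common fixed-sample estimate (S + 1) / (n + 1), where S is the number of Monte Carlo exceedances out of n draws and S follows a Binomial(n, p) distribution, and it computes exactly how often that estimate falls on the other side of a threshold from the true p. The function names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def resampling_risk(p, alpha, n):
    """Probability that the fixed-sample estimate (S + 1) / (n + 1)
    lies on a different side of the threshold alpha than the true p.
    Assumes S ~ Binomial(n, p); illustrative only."""
    true_below = p <= alpha
    risk = 0.0
    for s in range(n + 1):
        p_hat = (s + 1) / (n + 1)
        if (p_hat <= alpha) != true_below:
            risk += binom_pmf(s, n, p)
    return risk

# A p-value just under the 5% threshold is misclassified often;
# one far from the threshold almost never is.
near = resampling_risk(0.049, 0.05, 1000)
far = resampling_risk(0.001, 0.05, 1000)
```

With 1,000 draws, a true p-value of 0.049 ends up on the wrong side of the 5% threshold in a large fraction of runs, which is the situation the p-value buckets are meant to address.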
-
A simple method for implementing Monte Carlo tests
Authors:
Dong Ding,
Axel Gandy,
Georg Hahn
Abstract:
We consider a statistical test whose p-value can only be approximated using Monte Carlo simulations. We are interested in deciding whether the p-value for an observed data set lies above or below a given threshold such as 5%. We want to ensure that the resampling risk, the probability of the (Monte Carlo) decision being different from the true decision, is uniformly bounded. This article introduces a simple open-ended method with this property, the confidence sequence method (CSM). We compare our approach to another algorithm, SIMCTEST, which also guarantees an (asymptotic) uniform bound on the resampling risk, as well as to other Monte Carlo procedures without a uniform bound. CSM is free of tuning parameters and conservative. It has the same theoretical guarantee as SIMCTEST and, in many settings, similar stopping boundaries. As it is much simpler than other methods, CSM is a useful method for practical applications.
Submitted 9 October, 2019; v1 submitted 5 November, 2016;
originally announced November 2016.
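An open-ended stopping rule of the kind this abstract describes might look like the sketch below. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the stopping condition, so the code assumes a Robbins-style confidence sequence in which sampling stops once (n + 1) times the Binomial(n, alpha) probability of the observed exceedance count drops to eps or below. The function names, the `max_draws` cap, and the default `eps` are hypothetical.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability of k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def csm_decide(draw_exceedance, alpha=0.05, eps=0.001, max_draws=1_000_000):
    """Draw Monte Carlo samples one at a time until the threshold alpha
    leaves the assumed confidence sequence, then report on which side
    of alpha the p-value appears to lie.

    draw_exceedance: callable returning True when a simulated statistic
    is at least as extreme as the observed one (assumed interface).
    Returns (decision, number_of_draws)."""
    s = 0  # running count of exceedances
    for n in range(1, max_draws + 1):
        s += 1 if draw_exceedance() else 0
        # Stop once (n + 1) * b(s; n, alpha) <= eps, checked on the log scale.
        if math.log(n + 1) + log_binom_pmf(s, n, alpha) <= math.log(eps):
            return ("above" if s / n > alpha else "below"), n
    return "undecided", max_draws

# Extreme cases: every draw exceeds, or none does.
always = csm_decide(lambda: True)
never = csm_decide(lambda: False)
```

The rule has no tuning parameter beyond the error bound eps, and it needs more draws the closer the true p-value is to the threshold; the cap on draws is only a practical safeguard added for this sketch.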