Search | arXiv e-print repository

A comparison of Bayesian sampling algorithms for high-dimensional particle physics and cosmology applications

Authors: Joshua Albert, Csaba Balazs, Andrew Fowlie, Will Handley, Nicholas Hunt-Smith, Roberto Ruiz de Austri, Martin White

Abstract: For several decades now, Bayesian inference techniques have been applied to theories of particle physics, cosmology and astrophysics to obtain the probability density functions of their free parameters. In this study, we review and compare a wide range of Markov Chain Monte Carlo (MCMC) and nested sampling techniques to determine their relative efficacy on functions that resemble those encountered… ▽ More For several decades now, Bayesian inference techniques have been applied to theories of particle physics, cosmology and astrophysics to obtain the probability density functions of their free parameters. In this study, we review and compare a wide range of Markov Chain Monte Carlo (MCMC) and nested sampling techniques to determine their relative efficacy on functions that resemble those encountered most frequently in the particle astrophysics literature. Our first series of tests explores a series of high-dimensional analytic test functions that exemplify particular challenges, for example highly multimodal posteriors or posteriors with curving degeneracies. We then investigate two real physics examples, the first being a global fit of the $Λ$CDM model using cosmic microwave background data from the Planck experiment, and the second being a global fit of the Minimal Supersymmetric Standard Model using a wide variety of collider and astrophysics data. We show that several examples widely thought to be most easily solved using nested sampling approaches can in fact be more efficiently solved using modern MCMC algorithms, but the details of the implementation matter. Furthermore, we also provide a series of useful insights for practitioners of particle astrophysics and cosmology. △ Less

Submitted 24 November, 2024; v1 submitted 27 September, 2024; originally announced September 2024.

Comments: 45 pages, 8 figures, 20 tables

arXiv:2406.16241 [pdf, other]

Position: Benchmarking is Limited in Reinforcement Learning Research

Authors: Scott M. Jordan, Adam White, Bruno Castro da Silva, Martha White, Philip S. Thomas

Abstract: Novel reinforcement learning algorithms, or improvements on existing ones, are commonly justified by evaluating their performance on benchmark environments and are compared to an ever-changing set of standard algorithms. However, despite numerous calls for improvements, experimental practices continue to produce misleading or unsupported claims. One reason for the ongoing substandard practices is… ▽ More Novel reinforcement learning algorithms, or improvements on existing ones, are commonly justified by evaluating their performance on benchmark environments and are compared to an ever-changing set of standard algorithms. However, despite numerous calls for improvements, experimental practices continue to produce misleading or unsupported claims. One reason for the ongoing substandard practices is that conducting rigorous benchmarking experiments requires substantial computational time. This work investigates the sources of increased computation costs in rigorous experiment designs. We show that conducting rigorous performance benchmarks will likely have computational costs that are often prohibitive. As a result, we argue for using an additional experimentation paradigm to overcome the limitations of benchmarking. △ Less

Submitted 23 June, 2024; originally announced June 2024.

Comments: 19 pages, 13 figures, The Forty-first International Conference on Machine Learning (ICML 2024)

arXiv:2402.13425 [pdf, other]

Investigating the Histogram Loss in Regression

Authors: Ehsan Imani, Kai Luedemann, Sam Scholnick-Hughes, Esraa Elelimy, Martha White

Abstract: It is becoming increasingly common in regression to train neural networks that model the entire distribution even if only the mean is required for prediction. This additional modeling often comes with performance gain and the reasons behind the improvement are not fully known. This paper investigates a recent approach to regression, the Histogram Loss, which involves learning the conditional distr… ▽ More It is becoming increasingly common in regression to train neural networks that model the entire distribution even if only the mean is required for prediction. This additional modeling often comes with performance gain and the reasons behind the improvement are not fully known. This paper investigates a recent approach to regression, the Histogram Loss, which involves learning the conditional distribution of the target variable by minimizing the cross-entropy between a target distribution and a flexible histogram prediction. We design theoretical and empirical analyses to determine why and when this performance gain appears, and how different components of the loss contribute to it. Our results suggest that the benefits of learning distributions in this setup come from improvements in optimization rather than modelling extra information. We then demonstrate the viability of the Histogram Loss in common deep learning applications without a need for costly hyperparameter tuning. △ Less

Submitted 19 October, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: 52 pages

arXiv:2312.02953 [pdf]

Longitudinal Assessment of Seasonal Impacts and Depression Associations on Circadian Rhythm Using Multimodal Wearable Sensing

Authors: Yuezhou Zhang, Amos A Folarin, Shaoxiong Sun, Nicholas Cummins, Yatharth Ranjan, Zulqarnain Rashid, Callum Stewart, Pauline Conde, Heet Sankesara, Petroula Laiou, Faith Matcham, Katie M White, Carolin Oetzmann, Femke Lamers, Sara Siddi, Sara Simblett, Srinivasan Vairavan, Inez Myin-Germeys, David C. Mohr, Til Wykes, Josep Maria Haro, Peter Annas, Brenda WJH Penninx, Vaibhav A Narayan, Matthew Hotopf , et al. (2 additional authors not shown)

Abstract: Objective: This study aimed to explore the associations between depression severity and wearable-measured circadian rhythms, accounting for seasonal impacts and quantifying seasonal changes in circadian rhythms.Materials and Methods: Data used in this study came from a large longitudinal mobile health study. Depression severity (measured biweekly using the 8-item Patient Health Questionnaire [PHQ-… ▽ More Objective: This study aimed to explore the associations between depression severity and wearable-measured circadian rhythms, accounting for seasonal impacts and quantifying seasonal changes in circadian rhythms.Materials and Methods: Data used in this study came from a large longitudinal mobile health study. Depression severity (measured biweekly using the 8-item Patient Health Questionnaire [PHQ-8]) and behaviors (monitored by Fitbit) were tracked for up to two years. Twelve features were extracted from Fitbit recordings to approximate circadian rhythms. Three nested linear mixed-effects models were employed for each feature: (1) incorporating the PHQ-8 score as an independent variable; (2) adding the season variable; and (3) adding an interaction term between season and the PHQ-8 score. Results: This study analyzed 10,018 PHQ-8 records with Fitbit data from 543 participants. Upon adjusting for seasonal effects, higher PHQ-8 scores were associated with reduced activity, irregular behaviors, and delayed rhythms. Notably, the negative association with daily step counts was stronger in summer and spring than in winter, and the positive association with the onset of the most active continuous 10-hour period was significant only during summer. Furthermore, participants had shorter and later sleep, more activity, and delayed circadian rhythms in summer compared to winter. Discussion and Conclusions: Our findings underscore the significant seasonal impacts on human circadian rhythms and their associations with depression and indicate that wearable-measured circadian rhythms have the potential to be the digital biomarkers of depression. △ Less

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2309.01454 [pdf, other]

Accelerating Markov Chain Monte Carlo sampling with diffusion models

Authors: N. T. Hunt-Smith, W. Melnitchouk, F. Ringer, N. Sato, A. W Thomas, M. J. White

Abstract: Global fits of physics models require efficient methods for exploring high-dimensional and/or multimodal posterior functions. We introduce a novel method for accelerating Markov Chain Monte Carlo (MCMC) sampling by pairing a Metropolis-Hastings algorithm with a diffusion model that can draw global samples with the aim of approximating the posterior. We briefly review diffusion models in the contex… ▽ More Global fits of physics models require efficient methods for exploring high-dimensional and/or multimodal posterior functions. We introduce a novel method for accelerating Markov Chain Monte Carlo (MCMC) sampling by pairing a Metropolis-Hastings algorithm with a diffusion model that can draw global samples with the aim of approximating the posterior. We briefly review diffusion models in the context of image synthesis before providing a streamlined diffusion model tailored towards low-dimensional data arrays. We then present our adapted Metropolis-Hastings algorithm which combines local proposals with global proposals taken from a diffusion model that is regularly trained on the samples produced during the MCMC run. Our approach leads to a significant reduction in the number of likelihood evaluations required to obtain an accurate representation of the Bayesian posterior across several analytic functions, as well as for a physical example based on a global analysis of parton distribution functions. Our method is extensible to other MCMC techniques, and we briefly compare our method to similar approaches based on normalizing flows. A code implementation can be found at https://github.com/NickHunt-Smith/MCMC-diffusion. △ Less

Submitted 4 September, 2023; originally announced September 2023.

Comments: 21 pages, 8 figures, 1 table

arXiv:2303.04871 [pdf, other]

doi 10.1007/s41109-023-00598-9

Discovering a change point and piecewise linear structure in a time series of organoid networks via the iso-mirror

Authors: Tianyi Chen, Youngser Park, Ali Saad-Eldin, Zachary Lubberts, Avanti Athreya, Benjamin D. Pedigo, Joshua T. Vogelstein, Francesca Puppo, Gabriel A. Silva, Alysson R. Muotri, Weiwei Yang, Christopher M. White, Carey E. Priebe

Abstract: Recent advancements have been made in the development of cell-based in-vitro neuronal networks, or organoids. In order to better understand the network structure of these organoids, a super-selective algorithm has been proposed for inferring the effective connectivity networks from multi-electrode array data. In this paper, we apply a novel statistical method called spectral mirror estimation to t… ▽ More Recent advancements have been made in the development of cell-based in-vitro neuronal networks, or organoids. In order to better understand the network structure of these organoids, a super-selective algorithm has been proposed for inferring the effective connectivity networks from multi-electrode array data. In this paper, we apply a novel statistical method called spectral mirror estimation to the time series of inferred effective connectivity organoid networks. This method produces a one-dimensional iso-mirror representation of the dynamics of the time series of the networks which exhibits a piecewise linear structure. A classical change point algorithm is then applied to this representation, which successfully detects a change point coinciding with the neuroscientifically significant time inhibitory neurons start appearing and the percentage of astrocytes increases dramatically. This finding demonstrates the potential utility of applying the iso-mirror dynamic structure discovery method to inferred effective connectivity time series of organoid networks. △ Less

Submitted 12 April, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

Journal ref: Appl Netw Sci 8, 75 (2023)

arXiv:2203.11992 [pdf, other]

Resonance in Weight Space: Covariate Shift Can Drive Divergence of SGD with Momentum

Authors: Kirby Banman, Liam Peet-Pare, Nidhi Hegde, Alona Fyshe, Martha White

Abstract: Most convergence guarantees for stochastic gradient descent with momentum (SGDm) rely on iid sampling. Yet, SGDm is often used outside this regime, in settings with temporally correlated input samples such as continual learning and reinforcement learning. Existing work has shown that SGDm with a decaying step-size can converge under Markovian temporal correlation. In this work, we show that SGDm u… ▽ More Most convergence guarantees for stochastic gradient descent with momentum (SGDm) rely on iid sampling. Yet, SGDm is often used outside this regime, in settings with temporally correlated input samples such as continual learning and reinforcement learning. Existing work has shown that SGDm with a decaying step-size can converge under Markovian temporal correlation. In this work, we show that SGDm under covariate shift with a fixed step-size can be unstable and diverge. In particular, we show SGDm under covariate shift is a parametric oscillator, and so can suffer from a phenomenon known as resonance. We approximate the learning system as a time varying system of ordinary differential equations, and leverage existing theory to characterize the system's divergence/convergence as resonant/nonresonant modes. The theoretical result is limited to the linear setting with periodic covariate shift, so we empirically supplement this result to show that resonance phenomena persist even under non-periodic covariate shift, nonlinear dynamics with neural networks, and optimizers other than SGDm. △ Less

Submitted 22 March, 2022; originally announced March 2022.

Comments: In International Conference on Learning Representations. 2021

arXiv:2108.13637 [pdf, other]

When are Deep Networks really better than Decision Forests at small sample sizes, and how?

Authors: Haoyin Xu, Kaleab A. Kinfu, Will LeVine, Sambit Panda, Jayanta Dey, Michael Ainsworth, Yu-Chung Peng, Madi Kusmanov, Florian Engert, Christopher M. White, Joshua T. Vogelstein, Carey E. Priebe

Abstract: Deep networks and decision forests (such as random forests and gradient boosted trees) are the leading machine learning methods for structured and tabular data, respectively. Many papers have empirically compared large numbers of classifiers on one or two different domains (e.g., on 100 different tabular data settings). However, a careful conceptual and empirical comparison of these two strategies… ▽ More Deep networks and decision forests (such as random forests and gradient boosted trees) are the leading machine learning methods for structured and tabular data, respectively. Many papers have empirically compared large numbers of classifiers on one or two different domains (e.g., on 100 different tabular data settings). However, a careful conceptual and empirical comparison of these two strategies using the most contemporary best practices has yet to be performed. Conceptually, we illustrate that both can be profitably viewed as "partition and vote" schemes. Specifically, the representation space that they both learn is a partitioning of feature space into a union of convex polytopes. For inference, each decides on the basis of votes from the activated nodes. This formulation allows for a unified basic understanding of the relationship between these methods. Empirically, we compare these two strategies on hundreds of tabular data settings, as well as several vision and auditory settings. Our focus is on datasets with at most 10,000 samples, which represent a large fraction of scientific and biomedical datasets. In general, we found forests to excel at tabular and structured data (vision and audition) with small sample sizes, whereas deep nets performed better on structured data with larger sample sizes. This suggests that further gains in both scenarios may be realized via further combining aspects of forests and networks. We will continue revising this technical report in the coming months with updated results. △ Less

Submitted 2 November, 2021; v1 submitted 31 August, 2021; originally announced August 2021.

arXiv:2106.12621 [pdf, other]

Leveraging semantically similar queries for ranking via combining representations

Authors: Hayden S. Helm, Marah Abdin, Benjamin D. Pedigo, Shweti Mahajan, Vince Lyzinski, Youngser Park, Amitabh Basu, Piali~Choudhury, Christopher M. White, Weiwei Yang, Carey E. Priebe

Abstract: In modern ranking problems, different and disparate representations of the items to be ranked are often available. It is sensible, then, to try to combine these representations to improve ranking. Indeed, learning to rank via combining representations is both principled and practical for learning a ranking function for a particular query. In extremely data-scarce settings, however, the amount of l… ▽ More In modern ranking problems, different and disparate representations of the items to be ranked are often available. It is sensible, then, to try to combine these representations to improve ranking. Indeed, learning to rank via combining representations is both principled and practical for learning a ranking function for a particular query. In extremely data-scarce settings, however, the amount of labeled data available for a particular query can lead to a highly variable and ineffective ranking function. One way to mitigate the effect of the small amount of data is to leverage information from semantically similar queries. Indeed, as we demonstrate in simulation settings and real data examples, when semantically similar queries are available it is possible to gainfully use them when ranking with respect to a particular query. We describe and explore this phenomenon in the context of the bias-variance trade off and apply it to the data-scarce settings of a Bing navigational graph and the Drosophila larva connectome. △ Less

Submitted 23 June, 2021; originally announced June 2021.

arXiv:2105.14027 [pdf, other]

doi 10.21468/SciPostPhys.12.1.043

The Dark Machines Anomaly Score Challenge: Benchmark Data and Model Independent Event Classification for the Large Hadron Collider

Authors: T. Aarrestad, M. van Beekveld, M. Bona, A. Boveia, S. Caron, J. Davies, A. De Simone, C. Doglioni, J. M. Duarte, A. Farbin, H. Gupta, L. Hendriks, L. Heinrich, J. Howarth, P. Jawahar, A. Jueid, J. Lastow, A. Leinweber, J. Mamuzic, E. Merényi, A. Morandini, P. Moskvitina, C. Nellist, J. Ngadiuba, B. Ostdiek , et al. (14 additional authors not shown)

Abstract: We describe the outcome of a data challenge conducted as part of the Dark Machines Initiative and the Les Houches 2019 workshop on Physics at TeV colliders. The challenged aims at detecting signals of new physics at the LHC using unsupervised machine learning algorithms. First, we propose how an anomaly score could be implemented to define model-independent signal regions in LHC searches. We defin… ▽ More We describe the outcome of a data challenge conducted as part of the Dark Machines Initiative and the Les Houches 2019 workshop on Physics at TeV colliders. The challenged aims at detecting signals of new physics at the LHC using unsupervised machine learning algorithms. First, we propose how an anomaly score could be implemented to define model-independent signal regions in LHC searches. We define and describe a large benchmark dataset, consisting of >1 Billion simulated LHC events corresponding to $10~\rm{fb}^{-1}$ of proton-proton collisions at a center-of-mass energy of 13 TeV. We then review a wide range of anomaly detection and density estimation algorithms, developed in the context of the data challenge, and we measure their performance in a set of realistic analysis environments. We draw a number of useful conclusions that will aid the development of unsupervised new physics searches during the third run of the LHC, and provide our benchmark dataset for future studies at https://www.phenoMLdata.org. Code to reproduce the analysis is provided at https://github.com/bostdiek/DarkMachines-UnsupervisedChallenge. △ Less

Submitted 9 December, 2021; v1 submitted 28 May, 2021; originally announced May 2021.

Comments: v1: 54 pages, 24 figures. v2: 56 pages, citations added, extend discussion of look-elsewhere-effect, results unchanged; v3. minor typos and updated references

Journal ref: SciPost Phys. 12, 043 (2022)

arXiv:2011.06557 [pdf, other]

A partition-based similarity for classification distributions

Authors: Hayden S. Helm, Ronak D. Mehta, Brandon Duderstadt, Weiwei Yang, Christoper M. White, Ali Geisa, Joshua T. Vogelstein, Carey E. Priebe

Abstract: Herein we define a measure of similarity between classification distributions that is both principled from the perspective of statistical pattern recognition and useful from the perspective of machine learning practitioners. In particular, we propose a novel similarity on classification distributions, dubbed task similarity, that quantifies how an optimally-transformed optimal representation for a… ▽ More Herein we define a measure of similarity between classification distributions that is both principled from the perspective of statistical pattern recognition and useful from the perspective of machine learning practitioners. In particular, we propose a novel similarity on classification distributions, dubbed task similarity, that quantifies how an optimally-transformed optimal representation for a source distribution performs when applied to inference related to a target distribution. The definition of task similarity allows for natural definitions of adversarial and orthogonal distributions. We highlight limiting properties of representations induced by (universally) consistent decision rules and demonstrate in simulation that an empirical estimate of task similarity is a function of the decision rule deployed for inference. We demonstrate that for a given target distribution, both transfer efficiency and semantic similarity of candidate source distributions correlate with empirical task similarity. △ Less

Submitted 12 November, 2020; originally announced November 2020.

arXiv:2007.06414 [pdf, other]

Epidemic modelling of bovine tuberculosis in cattle herds and badgers in Ireland

Authors: L. M. White, G. E. Kelly

Abstract: Bovine tuberculosis, a disease that affects cattle and badgers in Ireland, was studied via stochastic epidemic modeling using incidence data from the Four Area Project (Griffin et al., 2005). The Four Area Project was a large scale field trial conducted in four diverse farming regions of Ireland over a five-year period (1997-2002) to evaluate the impact of badger culling on bovine tuberculosis inc… ▽ More Bovine tuberculosis, a disease that affects cattle and badgers in Ireland, was studied via stochastic epidemic modeling using incidence data from the Four Area Project (Griffin et al., 2005). The Four Area Project was a large scale field trial conducted in four diverse farming regions of Ireland over a five-year period (1997-2002) to evaluate the impact of badger culling on bovine tuberculosis incidence in cattle herds. Based on the comparison of several models, the model with no between-herd transmission and badger-to-herd transmission proportional to the total number of infected badgers culled was best supported by the data. Detailed model validation was conducted via model prediction, identifiability checks and sensitivity analysis. The results suggest that badger-to-cattle transmission is of more importance than between-herd transmission and that if there was no badger-to-herd transmission, levels of bovine tuberculosis in cattle herds in Ireland could decrease considerably. △ Less

Submitted 13 July, 2020; originally announced July 2020.

Comments: 32 pages, 2 figures

arXiv:2007.03807 [pdf, other]

Towards a practical measure of interference for reinforcement learning

Authors: Vincent Liu, Adam White, Hengshuai Yao, Martha White

Abstract: Catastrophic interference is common in many network-based learning systems, and many proposals exist for mitigating it. But, before we overcome interference we must understand it better. In this work, we provide a definition of interference for control in reinforcement learning. We systematically evaluate our new measures, by assessing correlation with several measures of learning performance, inc… ▽ More Catastrophic interference is common in many network-based learning systems, and many proposals exist for mitigating it. But, before we overcome interference we must understand it better. In this work, we provide a definition of interference for control in reinforcement learning. We systematically evaluate our new measures, by assessing correlation with several measures of learning performance, including stability, sample efficiency, and online and offline control performance across a variety of learning architectures. Our new interference measure allows us to ask novel scientific questions about commonly used deep learning architectures. In particular we show that target network frequency is a dominating factor for interference, and that updates on the last layer result in significantly higher interference than updates internal to the network. This new measure can be expensive to compute; we conclude with motivation for an efficient proxy measure and empirically demonstrate it is correlated with our definition of interference. △ Less

Submitted 7 July, 2020; originally announced July 2020.

Comments: 18 pages

arXiv:2007.02418 [pdf, other]

Selective Dyna-style Planning Under Limited Model Capacity

Authors: Zaheer Abbas, Samuel Sokota, Erin J. Talvitie, Martha White

Abstract: In model-based reinforcement learning, planning with an imperfect model of the environment has the potential to harm learning progress. But even when a model is imperfect, it may still contain information that is useful for planning. In this paper, we investigate the idea of using an imperfect model selectively. The agent should plan in parts of the state space where the model would be helpful but… ▽ More In model-based reinforcement learning, planning with an imperfect model of the environment has the potential to harm learning progress. But even when a model is imperfect, it may still contain information that is useful for planning. In this paper, we investigate the idea of using an imperfect model selectively. The agent should plan in parts of the state space where the model would be helpful but refrain from using the model where it would be harmful. An effective selective planning mechanism requires estimating predictive uncertainty, which arises out of aleatoric uncertainty, parameter uncertainty, and model inadequacy, among other sources. Prior work has focused on parameter uncertainty for selective planning. In this work, we emphasize the importance of model inadequacy. We show that heteroscedastic regression can signal predictive uncertainty arising from model inadequacy that is complementary to that which is detected by methods designed for parameter uncertainty, indicating that considering both parameter uncertainty and model inadequacy may be a more promising direction for effective selective planning than either in isolation. △ Less

Submitted 7 March, 2021; v1 submitted 5 July, 2020; originally announced July 2020.

Comments: Accepted at ICML 2020

arXiv:2007.00611 [pdf, other]

Gradient Temporal-Difference Learning with Regularized Corrections

Authors: Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White

Abstract: It is still common to use Q-learning and temporal difference (TD) learning-even though they have divergence issues and sound Gradient TD alternatives exist-because divergence seems rare and they typically perform well. However, recent work with large neural network learning systems reveals that instability is more common than previously thought. Practitioners face a difficult dilemma: choose an ea… ▽ More It is still common to use Q-learning and temporal difference (TD) learning-even though they have divergence issues and sound Gradient TD alternatives exist-because divergence seems rare and they typically perform well. However, recent work with large neural network learning systems reveals that instability is more common than previously thought. Practitioners face a difficult dilemma: choose an easy to use and performant TD method, or a more complex algorithm that is more sound but harder to tune and all but unexplored with non-linear function approximation or control. In this paper, we introduce a new method called TD with Regularized Corrections (TDRC), that attempts to balance ease of use, soundness, and performance. It behaves as well as TD, when TD performs well, but is sound in cases where TD diverges. We empirically investigate TDRC across a range of problems, for both prediction and control, and for both linear and non-linear function approximation, and show, potentially for the first time, that gradient TD methods could be a better alternative to TD and Q-learning. △ Less

Submitted 17 September, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

Comments: Appeared in Proceedings of the 37th International Conference on Machine Learning (ICML2020)

arXiv:2006.07461 [pdf, other]

Learning Causal Models Online

Authors: Khurram Javed, Martha White, Yoshua Bengio

Abstract: Predictive models -- learned from observational data not covering the complete data distribution -- can rely on spurious correlations in the data for making predictions. These correlations make the models brittle and hinder generalization. One solution for achieving strong generalization is to incorporate causal structures in the models; such structures constrain learning by ignoring correlations… ▽ More Predictive models -- learned from observational data not covering the complete data distribution -- can rely on spurious correlations in the data for making predictions. These correlations make the models brittle and hinder generalization. One solution for achieving strong generalization is to incorporate causal structures in the models; such structures constrain learning by ignoring correlations that contradict them. However, learning these structures is a hard problem in itself. Moreover, it's not clear how to incorporate the machinery of causality with online continual learning. In this work, we take an indirect approach to discovering causal models. Instead of searching for the true causal model directly, we propose an online algorithm that continually detects and removes spurious features. Our algorithm works on the idea that the correlation of a spurious feature with a target is not constant over-time. As a result, the weight associated with that feature is constantly changing. We show that by continually removing such features, our method converges to solutions that have strong generalization. Moreover, our method combined with random search can also discover non-spurious features from raw sensory data. Finally, our work highlights that the information present in the temporal structure of the problem -- destroyed by shuffling the data -- is essential for detecting spurious features online. △ Less

Submitted 12 June, 2020; originally announced June 2020.

Comments: Spurious features, causal models, online learning, random search, non-iid

arXiv:2006.04363 [pdf, other]

Hallucinating Value: A Pitfall of Dyna-style Planning with Imperfect Environment Models

Authors: Taher Jafferjee, Ehsan Imani, Erin Talvitie, Martha White, Micheal Bowling

Abstract: Dyna-style reinforcement learning (RL) agents improve sample efficiency over model-free RL agents by updating the value function with simulated experience generated by an environment model. However, it is often difficult to learn accurate models of environment dynamics, and even small errors may result in failure of Dyna agents. In this paper, we investigate one type of model error: hallucinated s… ▽ More Dyna-style reinforcement learning (RL) agents improve sample efficiency over model-free RL agents by updating the value function with simulated experience generated by an environment model. However, it is often difficult to learn accurate models of environment dynamics, and even small errors may result in failure of Dyna agents. In this paper, we investigate one type of model error: hallucinated states. These are states generated by the model, but that are not real states of the environment. We present the Hallucinated Value Hypothesis (HVH): updating values of real states towards values of hallucinated states results in misleading state-action values which adversely affect the control policy. We discuss and evaluate four Dyna variants; three which update real states toward simulated -- and therefore potentially hallucinated -- states and one which does not. The experimental results provide evidence for the HVH thus suggesting a fruitful direction toward developing Dyna algorithms robust to model error. △ Less

Submitted 8 June, 2020; originally announced June 2020.

Comments: 9 pages, 7 figures,

arXiv:2005.08158 [pdf, other]

Optimizing for the Future in Non-Stationary MDPs

Authors: Yash Chandak, Georgios Theocharous, Shiv Shankar, Martha White, Sridhar Mahadevan, Philip S. Thomas

Abstract: Most reinforcement learning methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, the underlying Markov decision process is stationary. However, in many real-world applications, this assumption is violated, and using existing algorithms may result in a performance lag. To proactively search for a good future policy, we present a policy grad… ▽ More Most reinforcement learning methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, the underlying Markov decision process is stationary. However, in many real-world applications, this assumption is violated, and using existing algorithms may result in a performance lag. To proactively search for a good future policy, we present a policy gradient algorithm that maximizes a forecast of future performance. This forecast is obtained by fitting a curve to the counter-factual estimates of policy performance over time, without explicitly modeling the underlying non-stationarity. The resulting algorithm amounts to a non-uniform reweighting of past data, and we observe that minimizing performance over some of the data from past episodes can be beneficial when searching for a policy that maximizes future performance. We show that our algorithm, called Prognosticator, is more robust to non-stationarity than two online adaptation techniques, on three simulated problems motivated by real-world applications. △ Less

Submitted 21 September, 2020; v1 submitted 16 May, 2020; originally announced May 2020.

Comments: Thirty-seventh International Conference on Machine Learning (ICML 2020)

arXiv:2004.12908 [pdf, other]

Simple Lifelong Learning Machines

Authors: Jayanta Dey, Joshua T. Vogelstein, Hayden S. Helm, Will LeVine, Ronak D. Mehta, Tyler M. Tomita, Haoyin Xu, Ali Geisa, Qingyang Wang, Gido M. van de Ven, Chenyu Gao, Bryan Tower, Jonathan Larson, Christopher M. White, Carey E. Priebe

Abstract: In lifelong learning, data are used to improve performance not only on the present task, but also on past and future (unencountered) tasks. While typical transfer learning algorithms can improve performance on future tasks, their performance on prior tasks degrades upon learning new tasks (called forgetting). Many recent approaches for continual or lifelong learning have attempted to maintain perf… ▽ More In lifelong learning, data are used to improve performance not only on the present task, but also on past and future (unencountered) tasks. While typical transfer learning algorithms can improve performance on future tasks, their performance on prior tasks degrades upon learning new tasks (called forgetting). Many recent approaches for continual or lifelong learning have attempted to maintain performance on old tasks given new tasks. But striving to avoid forgetting sets the goal unnecessarily low. The goal of lifelong learning should be to use data to improve performance on both future tasks (forward transfer) and past tasks (backward transfer). In this paper, we show that a simple approach -- representation ensembling -- demonstrates both forward and backward transfer in a variety of simulated and benchmark data scenarios, including tabular, vision (CIFAR-100, 5-dataset, Split Mini-Imagenet, and Food1k), and speech (spoken digit), in contrast to various reference algorithms, which typically failed to transfer either forward or backward, or both. Moreover, our proposed approach can flexibly operate with or without a computational budget. △ Less

Submitted 20 April, 2025; v1 submitted 27 April, 2020; originally announced April 2020.

arXiv:2002.06195 [pdf, other]

An implicit function learning approach for parametric modal regression

Authors: Yangchen Pan, Ehsan Imani, Martha White, Amir-massoud Farahmand

Abstract: For multi-valued functions---such as when the conditional distribution on targets given the inputs is multi-modal---standard regression approaches are not always desirable because they provide the conditional mean. Modal regression algorithms address this issue by instead finding the conditional mode(s). Most, however, are nonparametric approaches and so can be difficult to scale. Further, paramet… ▽ More For multi-valued functions---such as when the conditional distribution on targets given the inputs is multi-modal---standard regression approaches are not always desirable because they provide the conditional mean. Modal regression algorithms address this issue by instead finding the conditional mode(s). Most, however, are nonparametric approaches and so can be difficult to scale. Further, parametric approximators, like neural networks, facilitate learning complex relationships between inputs and targets. In this work, we propose a parametric modal regression algorithm. We use the implicit function theorem to develop an objective, for learning a joint function over inputs and targets. We empirically demonstrate on several synthetic problems that our method (i) can learn multi-valued functions and produce the conditional modes, (ii) scales well to high-dimensional inputs, and (iii) can even be more effective for certain uni-modal problems, particularly for high-frequency functions. We demonstrate that our method is competitive in a real-world modal regression problem and two regular regression datasets. △ Less

Submitted 29 October, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

Comments: Accepted to NeurIPS 2020

arXiv:1910.01705 [pdf, other]

Is Fast Adaptation All You Need?

Authors: Khurram Javed, Hengshuai Yao, Martha White

Abstract: Gradient-based meta-learning has proven to be highly effective at learning model initializations, representations, and update rules that allow fast adaptation from a few samples. The core idea behind these approaches is to use fast adaptation and generalization -- two second-order metrics -- as training signals on a meta-training dataset. However, little attention has been given to other possible… ▽ More Gradient-based meta-learning has proven to be highly effective at learning model initializations, representations, and update rules that allow fast adaptation from a few samples. The core idea behind these approaches is to use fast adaptation and generalization -- two second-order metrics -- as training signals on a meta-training dataset. However, little attention has been given to other possible second-order metrics. In this paper, we investigate a different training signal -- robustness to catastrophic interference -- and demonstrate that representations learned by directing minimizing interference are more conducive to incremental learning than those learned by just maximizing fast adaptation. △ Less

Submitted 3 October, 2019; originally announced October 2019.

Comments: Meta Learning Workshop, NeurIPS 2019, 2 figures, MRCL, MAML

arXiv:1909.01936 [pdf, other]

State Drug Policy Effectiveness: Comparative Policy Analysis of Drug Overdose Mortality

Authors: Jarrod Olson, Po-Hsu Allen Chen, Marissa White, Nicole Brennan, Ning Gong

Abstract: Opioid overdose rates have reached an epidemic level and state-level policy innovations have followed suit in an effort to prevent overdose deaths. State-level drug law is a set of policies that may reinforce or undermine each other, and analysts have a limited set of tools for handling the policy collinearity using statistical methods. This paper uses a machine learning method called hierarchical… ▽ More Opioid overdose rates have reached an epidemic level and state-level policy innovations have followed suit in an effort to prevent overdose deaths. State-level drug law is a set of policies that may reinforce or undermine each other, and analysts have a limited set of tools for handling the policy collinearity using statistical methods. This paper uses a machine learning method called hierarchical clustering to empirically generate "policy bundles" by grouping states with similar sets of policies in force at a given time together for analysis in a 50-state, 10-year interrupted time series regression with drug overdose deaths as the dependent variable. Policy clusters were generated from 138 binomial variables observed by state and year from the Prescription Drug Abuse Policy System. Clustering reduced the policies to a set of 10 bundles. The approach allows for ranking of the relative effect of different bundles and is a tool to recommend those most likely to succeed. This study shows that a set of policies balancing Medication Assisted Treatment, Naloxone Access, Good Samaritan Laws, Medication Assisted Treatment, Prescription Drug Monitoring Programs and legalization of medical marijuana leads to a reduced number of overdose deaths, but not until its second year in force. △ Less

Submitted 5 October, 2020; v1 submitted 3 September, 2019; originally announced September 2019.

MSC Class: 62H12

arXiv:1907.07751 [pdf, other]

Meta-descent for Online, Continual Prediction

Authors: Andrew Jacobsen, Matthew Schlegel, Cameron Linke, Thomas Degris, Adam White, Martha White

Abstract: This paper investigates different vector step-size adaptation approaches for non-stationary online, continual prediction problems. Vanilla stochastic gradient descent can be considerably improved by scaling the update with a vector of appropriately chosen step-sizes. Many methods, including AdaGrad, RMSProp, and AMSGrad, keep statistics about the learning process to approximate a second order upda… ▽ More This paper investigates different vector step-size adaptation approaches for non-stationary online, continual prediction problems. Vanilla stochastic gradient descent can be considerably improved by scaling the update with a vector of appropriately chosen step-sizes. Many methods, including AdaGrad, RMSProp, and AMSGrad, keep statistics about the learning process to approximate a second order update---a vector approximation of the inverse Hessian. Another family of approaches use meta-gradient descent to adapt the step-size parameters to minimize prediction error. These meta-descent strategies are promising for non-stationary problems, but have not been as extensively explored as quasi-second order methods. We first derive a general, incremental meta-descent algorithm, called AdaGain, designed to be applicable to a much broader range of algorithms, including those with semi-gradient updates or even those with accelerations, such as RMSProp. We provide an empirical comparison of methods from both families. We conclude that methods from both families can perform well, but in non-stationary prediction problems the meta-descent methods exhibit advantages. Our method is particularly robust across several prediction problems, and is competitive with the state-of-the-art method on a large-scale, time-series prediction problem on real data from a mobile robot. △ Less

Submitted 13 December, 2019; v1 submitted 17 July, 2019; originally announced July 2019.

Comments: AAAI Conference on Artificial Intelligence 2019. v2: Correction to Baird's counterexample. A bug in the code lead to results being reported for AMSGrad in this experiment, when they were actually results for Adam

arXiv:1906.07865 [pdf, other]

Adapting Behaviour via Intrinsic Reward: A Survey and Empirical Study

Authors: Cam Linke, Nadia M. Ady, Martha White, Thomas Degris, Adam White

Abstract: Learning about many things can provide numerous benefits to a reinforcement learning system. For example, learning many auxiliary value functions, in addition to optimizing the environmental reward, appears to improve both exploration and representation learning. The question we tackle in this paper is how to sculpt the stream of experience---how to adapt the learning system's behavior---to optimi… ▽ More Learning about many things can provide numerous benefits to a reinforcement learning system. For example, learning many auxiliary value functions, in addition to optimizing the environmental reward, appears to improve both exploration and representation learning. The question we tackle in this paper is how to sculpt the stream of experience---how to adapt the learning system's behavior---to optimize the learning of a collection of value functions. A simple answer is to compute an intrinsic reward based on the statistics of each auxiliary learner, and use reinforcement learning to maximize that intrinsic reward. Unfortunately, implementing this simple idea has proven difficult, and thus has been the focus of decades of study. It remains unclear which of the many possible measures of learning would work well in a parallel learning setting where environmental reward is extremely sparse or absent. In this paper, we investigate and compare different intrinsic reward mechanisms in a new bandit-like parallel-learning testbed. We discuss the interaction between reward and prediction learners and highlight the importance of introspective prediction learners: those that increase their rate of learning when progress is possible, and decrease when it is not. We provide a comprehensive empirical comparison of 14 different rewards, including well-known ideas from reinforcement learning and active learning. Our results highlight a simple but seemingly powerful principle: intrinsic rewards based on the amount of learning can generate useful behavior, if each individual learner is introspective. △ Less

Submitted 21 August, 2020; v1 submitted 18 June, 2019; originally announced June 2019.

arXiv:1906.07791 [pdf, other]

Hill Climbing on Value Estimates for Search-control in Dyna

Authors: Yangchen Pan, Hengshuai Yao, Amir-massoud Farahmand, Martha White

Abstract: Dyna is an architecture for model-based reinforcement learning (RL), where simulated experience from a model is used to update policies or value functions. A key component of Dyna is search-control, the mechanism to generate the state and action from which the agent queries the model, which remains largely unexplored. In this work, we propose to generate such states by using the trajectory obtaine… ▽ More Dyna is an architecture for model-based reinforcement learning (RL), where simulated experience from a model is used to update policies or value functions. A key component of Dyna is search-control, the mechanism to generate the state and action from which the agent queries the model, which remains largely unexplored. In this work, we propose to generate such states by using the trajectory obtained from Hill Climbing (HC) the current estimate of the value function. This has the effect of propagating value from high-value regions and of preemptively updating value estimates of the regions that the agent is likely to visit next. We derive a noisy projected natural gradient algorithm for hill climbing, and highlight a connection to Langevin dynamics. We provide an empirical demonstration on four classical domains that our algorithm, HC-Dyna, can obtain significant sample efficiency improvements. We study the properties of different sampling distributions for search-control, and find that there appears to be a benefit specifically from using the samples generated by climbing on current value estimates from low-value to high-value region. △ Less

Submitted 4 July, 2019; v1 submitted 18 June, 2019; originally announced June 2019.

Comments: IJCAI 2019

arXiv:1906.04328 [pdf, other]

Importance Resampling for Off-policy Prediction

Authors: Matthew Schlegel, Wesley Chung, Daniel Graves, Jian Qian, Martha White

Abstract: Importance sampling (IS) is a common reweighting strategy for off-policy prediction in reinforcement learning. While it is consistent and unbiased, it can result in high variance updates to the weights for the value function. In this work, we explore a resampling strategy as an alternative to reweighting. We propose Importance Resampling (IR) for off-policy prediction, which resamples experience f… ▽ More Importance sampling (IS) is a common reweighting strategy for off-policy prediction in reinforcement learning. While it is consistent and unbiased, it can result in high variance updates to the weights for the value function. In this work, we explore a resampling strategy as an alternative to reweighting. We propose Importance Resampling (IR) for off-policy prediction, which resamples experience from a replay buffer and applies standard on-policy updates. The approach avoids using importance sampling ratios in the update, instead correcting the distribution before the update. We characterize the bias and consistency of IR, particularly compared to Weighted IS (WIS). We demonstrate in several microworlds that IR has improved sample efficiency and lower variance updates, as compared to IS and several variance-reduced IS strategies, including variants of WIS and V-trace which clips IS ratios. We also provide a demonstration showing IR improves over IS for learning a value function from images in a racing car simulator. △ Less

Submitted 13 November, 2019; v1 submitted 10 June, 2019; originally announced June 2019.

Comments: Recently published in NeurIPS 2019

arXiv:1905.12588 [pdf, other]

Meta-Learning Representations for Continual Learning

Authors: Khurram Javed, Martha White

Abstract: A continual learning agent should be able to build on top of existing knowledge to learn on new data quickly while minimizing forgetting. Current intelligent systems based on neural network function approximators arguably do the opposite---they are highly prone to forgetting and rarely trained to facilitate future learning. One reason for this poor behavior is that they learn from a representation… ▽ More A continual learning agent should be able to build on top of existing knowledge to learn on new data quickly while minimizing forgetting. Current intelligent systems based on neural network function approximators arguably do the opposite---they are highly prone to forgetting and rarely trained to facilitate future learning. One reason for this poor behavior is that they learn from a representation that is not explicitly trained for these two goals. In this paper, we propose OML, an objective that directly minimizes catastrophic interference by learning representations that accelerate future learning and are robust to forgetting under online updates in continual learning. We show that it is possible to learn naturally sparse representations that are more effective for online updating. Moreover, our algorithm is complementary to existing continual learning strategies, such as MER and GEM. Finally, we demonstrate that a basic online updating strategy on representations learned by OML is competitive with rehearsal based methods for continual learning. We release an implementation of our method at https://github.com/khurramjaved96/mrcl . △ Less

Submitted 30 October, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

Comments: Accepted at NeurIPS19, 15 pages, 10 figures, open-source, representation learning, continual learning, online learning

arXiv:1904.01191 [pdf, other]

Planning with Expectation Models

Authors: Yi Wan, Zaheer Abbas, Adam White, Martha White, Richard S. Sutton

Abstract: Distribution and sample models are two popular model choices in model-based reinforcement learning (MBRL). However, learning these models can be intractable, particularly when the state and action spaces are large. Expectation models, on the other hand, are relatively easier to learn due to their compactness and have also been widely used for deterministic environments. For stochastic environments… ▽ More Distribution and sample models are two popular model choices in model-based reinforcement learning (MBRL). However, learning these models can be intractable, particularly when the state and action spaces are large. Expectation models, on the other hand, are relatively easier to learn due to their compactness and have also been widely used for deterministic environments. For stochastic environments, it is not obvious how expectation models can be used for planning as they only partially characterize a distribution. In this paper, we propose a sound way of using approximate expectation models for MBRL. In particular, we 1) show that planning with an expectation model is equivalent to planning with a distribution model if the state value function is linear in state features, 2) analyze two common parametrization choices for approximating the expectation: linear and non-linear expectation models, 3) propose a sound model-based policy evaluation algorithm and present its convergence results, and 4) empirically demonstrate the effectiveness of the proposed planning algorithm. △ Less

Submitted 29 July, 2020; v1 submitted 1 April, 2019; originally announced April 2019.

arXiv:1812.00914 [pdf, other]

Accelerating Large Scale Knowledge Distillation via Dynamic Importance Sampling

Authors: Minghan Li, Tanli Zuo, Ruicheng Li, Martha White, Weishi Zheng

Abstract: Knowledge distillation is an effective technique that transfers knowledge from a large teacher model to a shallow student. However, just like massive classification, large scale knowledge distillation also imposes heavy computational costs on training models of deep neural networks, as the softmax activations at the last layer involve computing probabilities over numerous classes. In this work, we… ▽ More Knowledge distillation is an effective technique that transfers knowledge from a large teacher model to a shallow student. However, just like massive classification, large scale knowledge distillation also imposes heavy computational costs on training models of deep neural networks, as the softmax activations at the last layer involve computing probabilities over numerous classes. In this work, we apply the idea of importance sampling which is often used in Neural Machine Translation on large scale knowledge distillation. We present a method called dynamic importance sampling, where ranked classes are sampled from a dynamic distribution derived from the interaction between the teacher and student in full distillation. We highlight the utility of our proposal prior which helps the student capture the main information in the loss function. Our approach manages to reduce the computational cost at training time while maintaining the competitive performance on CIFAR-100 and Market-1501 person re-identification datasets. △ Less

Submitted 3 December, 2018; originally announced December 2018.

arXiv:1811.09013 [pdf, other]

An Off-policy Policy Gradient Theorem Using Emphatic Weightings

Authors: Ehsan Imani, Eric Graves, Martha White

Abstract: Policy gradient methods are widely used for control in reinforcement learning, particularly for the continuous action setting. There have been a host of theoretically sound algorithms proposed for the on-policy setting, due to the existence of the policy gradient theorem which provides a simplified form for the gradient. In off-policy learning, however, where the behaviour policy is not necessaril… ▽ More Policy gradient methods are widely used for control in reinforcement learning, particularly for the continuous action setting. There have been a host of theoretically sound algorithms proposed for the on-policy setting, due to the existence of the policy gradient theorem which provides a simplified form for the gradient. In off-policy learning, however, where the behaviour policy is not necessarily attempting to learn and follow the optimal policy for the given task, the existence of such a theorem has been elusive. In this work, we solve this open problem by providing the first off-policy policy gradient theorem. The key to the derivation is the use of $emphatic$ $weightings$. We develop a new actor-critic algorithm$\unicode{x2014}$called Actor Critic with Emphatic weightings (ACE)$\unicode{x2014}$that approximates the simplified gradients provided by the theorem. We demonstrate in a simple counterexample that previous off-policy policy gradient methods$\unicode{x2014}$particularly OffPAC and DPG$\unicode{x2014}$converge to the wrong solution whereas ACE finds the optimal solution. △ Less

Submitted 20 June, 2019; v1 submitted 21 November, 2018; originally announced November 2018.

Comments: Updated to final NeurIPS version

arXiv:1811.06626 [pdf, other]

The Utility of Sparse Representations for Control in Reinforcement Learning

Authors: Vincent Liu, Raksha Kumaraswamy, Lei Le, Martha White

Abstract: We investigate sparse representations for control in reinforcement learning. While these representations are widely used in computer vision, their prevalence in reinforcement learning is limited to sparse coding where extracting representations for new data can be computationally intensive. Here, we begin by demonstrating that learning a control policy incrementally with a representation from a st… ▽ More We investigate sparse representations for control in reinforcement learning. While these representations are widely used in computer vision, their prevalence in reinforcement learning is limited to sparse coding where extracting representations for new data can be computationally intensive. Here, we begin by demonstrating that learning a control policy incrementally with a representation from a standard neural network fails in classic control domains, whereas learning with a representation obtained from a neural network that has sparsity properties enforced is effective. We provide evidence that the reason for this is that the sparse representation provides locality, and so avoids catastrophic interference, and particularly keeps consistent, stable values for bootstrapping. We then discuss how to learn such sparse representations. We explore the idea of Distributional Regularizers, where the activation of hidden nodes is encouraged to match a particular distribution that results in sparse activation across time. We identify a simple but effective way to obtain sparse representations, not afforded by previously proposed strategies, making it more practical for further investigation into sparse representations for reinforcement learning. △ Less

Submitted 15 November, 2018; originally announced November 2018.

Comments: Association for the Advancement of Artificial Intelligence 2019

arXiv:1811.02597 [pdf, other]

Online Off-policy Prediction

Authors: Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White

Abstract: This paper investigates the problem of online prediction learning, where learning proceeds continuously as the agent interacts with an environment. The predictions made by the agent are contingent on a particular way of behaving, represented as a value function. However, the behavior used to select actions and generate the behavior data might be different from the one used to define the prediction… ▽ More This paper investigates the problem of online prediction learning, where learning proceeds continuously as the agent interacts with an environment. The predictions made by the agent are contingent on a particular way of behaving, represented as a value function. However, the behavior used to select actions and generate the behavior data might be different from the one used to define the predictions, and thus the samples are generated off-policy. The ability to learn behavior-contingent predictions online and off-policy has long been advocated as a key capability of predictive-knowledge learning systems but remained an open algorithmic challenge for decades. The issue lies with the temporal difference (TD) learning update at the heart of most prediction algorithms: combining bootstrapping, off-policy sampling and function approximation may cause the value estimate to diverge. A breakthrough came with the development of a new objective function that admitted stochastic gradient descent variants of TD. Since then, many sound online off-policy prediction algorithms have been developed, but there has been limited empirical work investigating the relative merits of all the variants. This paper aims to fill these empirical gaps and provide clarity on the key ideas behind each method. We summarize the large body of literature on off-policy learning, focusing on 1- methods that use computation linear in the number of features and are convergent under off-policy sampling, and 2- other methods which have proven useful with non-fixed, nonlinear function approximation. We provide an empirical study of off-policy prediction methods in two challenging microworlds. We report each method's parameter sensitivity, empirical convergence rate, and final performance, providing new insights that should enable practitioners to successfully extend these new methods to large-scale applications.[Abridged abstract] △ Less

Submitted 6 November, 2018; originally announced November 2018.

Comments: 68 pages

arXiv:1810.09103 [pdf, other]

Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement

Authors: Samuel Neumann, Sungsu Lim, Ajin Joseph, Yangchen Pan, Adam White, Martha White

Abstract: Many policy gradient methods are variants of Actor-Critic (AC), where a value function (critic) is learned to facilitate updating the parameterized policy (actor). The update to the actor involves a log-likelihood update weighted by the action-values, with the addition of entropy regularization for soft variants. In this work, we explore an alternative update for the actor, based on an extension o… ▽ More Many policy gradient methods are variants of Actor-Critic (AC), where a value function (critic) is learned to facilitate updating the parameterized policy (actor). The update to the actor involves a log-likelihood update weighted by the action-values, with the addition of entropy regularization for soft variants. In this work, we explore an alternative update for the actor, based on an extension of the cross entropy method (CEM) to condition on inputs (states). The idea is to start with a broader policy and slowly concentrate around maximal actions, using a maximum likelihood update towards actions in the top percentile per state. The speed of this concentration is controlled by a proposal policy, that concentrates at a slower rate than the actor. We first provide a policy improvement result in an idealized setting, and then prove that our conditional CEM (CCEM) strategy tracks a CEM update per state, even with changing action-values. We empirically show that our Greedy AC algorithm, that uses CCEM for the actor update, performs better than Soft Actor-Critic and is much less sensitive to entropy-regularization. △ Less

Submitted 28 February, 2023; v1 submitted 22 October, 2018; originally announced October 2018.

Comments: 27 pages, 8 figures

arXiv:1808.09127 [pdf, other]

High-confidence error estimates for learned value functions

Authors: Touqir Sajed, Wesley Chung, Martha White

Abstract: Estimating the value function for a fixed policy is a fundamental problem in reinforcement learning. Policy evaluation algorithms---to estimate value functions---continue to be developed, to improve convergence rates, improve stability and handle variability, particularly for off-policy learning. To understand the properties of these algorithms, the experimenter needs high-confidence estimates of… ▽ More Estimating the value function for a fixed policy is a fundamental problem in reinforcement learning. Policy evaluation algorithms---to estimate value functions---continue to be developed, to improve convergence rates, improve stability and handle variability, particularly for off-policy learning. To understand the properties of these algorithms, the experimenter needs high-confidence estimates of the accuracy of the learned value functions. For environments with small, finite state-spaces, like chains, the true value function can be easily computed, to compute accuracy. For large, or continuous state-spaces, however, this is no longer feasible. In this paper, we address the largely open problem of how to obtain these high-confidence estimates, for general state-spaces. We provide a high-confidence bound on an empirical estimate of the value error to the true value error. We use this bound to design an offline sampling algorithm, which stores the required quantities to repeatedly compute value error estimates for any learned value function. We provide experiments investigating the number of samples required by this offline algorithm in simple benchmark reinforcement learning domains, and highlight that there are still many open questions to be solved for this important problem. △ Less

Submitted 28 August, 2018; originally announced August 2018.

Comments: Presented at (UAI) Uncertainty in Artificial Intelligence 2018

arXiv:1807.06763 [pdf, other]

doi 10.1613/jair.1.12105

General Value Function Networks

Authors: Matthew Schlegel, Andrew Jacobsen, Zaheer Abbas, Andrew Patterson, Adam White, Martha White

Abstract: State construction is important for learning in partially observable environments. A general purpose strategy for state construction is to learn the state update using a Recurrent Neural Network (RNN), which updates the internal state using the current internal state and the most recent observation. This internal state provides a summary of the observed sequence, to facilitate accurate predictions… ▽ More State construction is important for learning in partially observable environments. A general purpose strategy for state construction is to learn the state update using a Recurrent Neural Network (RNN), which updates the internal state using the current internal state and the most recent observation. This internal state provides a summary of the observed sequence, to facilitate accurate predictions and decision-making. At the same time, specifying and training RNNs is notoriously tricky, particularly as the common strategy to approximate gradients back in time, called truncated Back-prop Through Time (BPTT), can be sensitive to the truncation window. Further, domain-expertise--which can usually help constrain the function class and so improve trainability--can be difficult to incorporate into complex recurrent units used within RNNs. In this work, we explore how to use multi-step predictions to constrain the RNN and incorporate prior knowledge. In particular, we revisit the idea of using predictions to construct state and ask: does constraining (parts of) the state to consist of predictions about the future improve RNN trainability? We formulate a novel RNN architecture, called a General Value Function Network (GVFN), where each internal state component corresponds to a prediction about the future represented as a value function. We first provide an objective for optimizing GVFNs, and derive several algorithms to optimize this objective. We then show that GVFNs are more robust to the truncation level, in many cases only requiring one-step gradient updates. △ Less

Submitted 2 February, 2021; v1 submitted 17 July, 2018; originally announced July 2018.

Comments: Published in the Journal of Artificial Intelligence Research

Journal ref: Journal of Artificial Intelligence Research, 70, 497-543 (2021)

arXiv:1806.06931 [pdf, other]

Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control

Authors: Yangchen Pan, Amir-massoud Farahmand, Martha White, Saleh Nabi, Piyush Grover, Daniel Nikovski

Abstract: Recent work has shown that reinforcement learning (RL) is a promising approach to control dynamical systems described by partial differential equations (PDE). This paper shows how to use RL to tackle more general PDE control problems that have continuous high-dimensional action spaces with spatial relationship among action dimensions. In particular, we propose the concept of action descriptors, wh… ▽ More Recent work has shown that reinforcement learning (RL) is a promising approach to control dynamical systems described by partial differential equations (PDE). This paper shows how to use RL to tackle more general PDE control problems that have continuous high-dimensional action spaces with spatial relationship among action dimensions. In particular, we propose the concept of action descriptors, which encode regularities among spatially-extended action dimensions and enable the agent to control high-dimensional action PDEs. We provide theoretical evidence suggesting that this approach can be more sample efficient compared to a conventional approach that treats each action dimension separately and does not explicitly exploit the spatial regularity of the action space. The action descriptor approach is then used within the deep deterministic policy gradient algorithm. Experiments on two PDE control problems, with up to 256-dimensional continuous actions, show the advantage of the proposed approach over the conventional one. △ Less

Submitted 12 June, 2018; originally announced June 2018.

Comments: ICML2018

arXiv:1806.04613 [pdf, other]

Improving Regression Performance with Distributional Losses

Authors: Ehsan Imani, Martha White

Abstract: There is growing evidence that converting targets to soft targets in supervised learning can provide considerable gains in performance. Much of this work has considered classification, converting hard zero-one values to soft labels---such as by adding label noise, incorporating label ambiguity or using distillation. In parallel, there is some evidence from a regression setting in reinforcement lea… ▽ More There is growing evidence that converting targets to soft targets in supervised learning can provide considerable gains in performance. Much of this work has considered classification, converting hard zero-one values to soft labels---such as by adding label noise, incorporating label ambiguity or using distillation. In parallel, there is some evidence from a regression setting in reinforcement learning that learning distributions can improve performance. In this work, we investigate the reasons for this improvement, in a regression setting. We introduce a novel distributional regression loss, and similarly find it significantly improves prediction accuracy. We investigate several common hypotheses, around reducing overfitting and improved representations. We instead find evidence for an alternative hypothesis: this loss is easier to optimize, with better behaved gradients, resulting in improved generalization. We provide theoretical support for this alternative hypothesis, by characterizing the norm of the gradients of this loss. △ Less

Submitted 12 June, 2018; originally announced June 2018.

Comments: 12 pages, 4 figures. To appear in Proceedings of the 35th International Conference on Machine Learning, 2018

arXiv:1707.08316 [pdf, other]

Learning Sparse Representations in Reinforcement Learning with Sparse Coding

Authors: Lei Le, Raksha Kumaraswamy, Martha White

Abstract: A variety of representation learning approaches have been investigated for reinforcement learning; much less attention, however, has been given to investigating the utility of sparse coding. Outside of reinforcement learning, sparse coding representations have been widely used, with non-convex objectives that result in discriminative representations. In this work, we develop a supervised sparse co… ▽ More A variety of representation learning approaches have been investigated for reinforcement learning; much less attention, however, has been given to investigating the utility of sparse coding. Outside of reinforcement learning, sparse coding representations have been widely used, with non-convex objectives that result in discriminative representations. In this work, we develop a supervised sparse coding objective for policy evaluation. Despite the non-convexity of this objective, we prove that all local minima are global minima, making the approach amenable to simple optimization strategies. We empirically show that it is key to use a supervised objective, rather than the more straightforward unsupervised sparse coding approach. We compare the learned representations to a canonical fixed sparse representation, called tile-coding, demonstrating that the sparse coding representation outperforms a wide variety of tilecoding representations. △ Less

Submitted 26 July, 2017; originally announced July 2017.

Comments: 6(+1) pages, 2 figures, International Joint Conference on Artificial Intelligence 2017

arXiv:1702.00518 [pdf, other]

Recovering True Classifier Performance in Positive-Unlabeled Learning

Authors: Shantanu Jain, Martha White, Predrag Radivojac

Abstract: A common approach in positive-unlabeled learning is to train a classification model between labeled and unlabeled data. This strategy is in fact known to give an optimal classifier under mild conditions; however, it results in biased empirical estimates of the classifier performance. In this work, we show that the typically used performance measures such as the receiver operating characteristic cu… ▽ More A common approach in positive-unlabeled learning is to train a classification model between labeled and unlabeled data. This strategy is in fact known to give an optimal classifier under mild conditions; however, it results in biased empirical estimates of the classifier performance. In this work, we show that the typically used performance measures such as the receiver operating characteristic curve, or the precision-recall curve obtained on such data can be corrected with the knowledge of class priors; i.e., the proportions of the positive and negative examples in the unlabeled data. We extend the results to a noisy setting where some of the examples labeled positive are in fact negative and show that the correction also requires the knowledge of the proportion of noisy examples in the labeled positives. Using state-of-the-art algorithms to estimate the positive class prior and the proportion of noise, we experimentally evaluate two correction approaches and demonstrate their efficacy on real-life data. △ Less

Submitted 1 February, 2017; originally announced February 2017.

Comments: Full paper with supplement

arXiv:1611.09328 [pdf, other]

Accelerated Gradient Temporal Difference Learning

Authors: Yangchen Pan, Adam White, Martha White

Abstract: The family of temporal difference (TD) methods span a spectrum from computationally frugal linear methods like TD(λ) to data efficient least squares methods. Least square methods make the best use of available data directly computing the TD solution and thus do not require tuning a typically highly sensitive learning rate parameter, but require quadratic computation and storage. Recent algorithmic… ▽ More The family of temporal difference (TD) methods span a spectrum from computationally frugal linear methods like TD(λ) to data efficient least squares methods. Least square methods make the best use of available data directly computing the TD solution and thus do not require tuning a typically highly sensitive learning rate parameter, but require quadratic computation and storage. Recent algorithmic developments have yielded several sub-quadratic methods that use an approximation to the least squares TD solution, but incur bias. In this paper, we propose a new family of accelerated gradient TD (ATD) methods that (1) provide similar data efficiency benefits to least-squares methods, at a fraction of the computation and storage (2) significantly reduce parameter sensitivity compared to linear TD methods, and (3) are asymptotically unbiased. We illustrate these claims with a proof of convergence in expectation and experiments on several benchmark domains and a large-scale industrial energy allocation domain. △ Less

Submitted 9 March, 2017; v1 submitted 28 November, 2016; originally announced November 2016.

Comments: AAAI Conference on Artificial Intelligence (AAAI), 2017

arXiv:1607.00446 [pdf, other]

A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning

Authors: Martha White, Adam White

Abstract: One of the main obstacles to broad application of reinforcement learning methods is the parameter sensitivity of our core learning algorithms. In many large-scale applications, online computation and function approximation represent key strategies in scaling up reinforcement learning algorithms. In this setting, we have effective and reasonably well understood algorithms for adapting the learning-… ▽ More One of the main obstacles to broad application of reinforcement learning methods is the parameter sensitivity of our core learning algorithms. In many large-scale applications, online computation and function approximation represent key strategies in scaling up reinforcement learning algorithms. In this setting, we have effective and reasonably well understood algorithms for adapting the learning-rate parameter, online during learning. Such meta-learning approaches can improve robustness of learning and enable specialization to current task, improving learning speed. For temporal-difference learning algorithms which we study here, there is yet another parameter, $λ$, that similarly impacts learning speed and stability in practice. Unfortunately, unlike the learning-rate parameter, $λ$ parametrizes the objective function that temporal-difference methods optimize. Different choices of $λ$ produce different fixed-point solutions, and thus adapting $λ$ online and characterizing the optimization is substantially more complex than adapting the learning-rate parameter. There are no meta-learning method for $λ$ that can achieve (1) incremental updating, (2) compatibility with function approximation, and (3) maintain stability of learning under both on and off-policy sampling. In this paper we contribute a novel objective function for optimizing $λ$ as a function of state rather than time. We derive a new incremental, linear complexity $λ$-adaption algorithm that does not require offline batch updating or access to a model of the world, and present a suite of experiments illustrating the practicality of our new algorithm in three different settings. Taken together, our contributions represent a concrete step towards black-box application of temporal-difference learning methods in real world problems. △ Less

Submitted 24 October, 2016; v1 submitted 1 July, 2016; originally announced July 2016.

arXiv:1606.08561 [pdf, other]

Estimating the class prior and posterior from noisy positives and unlabeled data

Authors: Shantanu Jain, Martha White, Predrag Radivojac

Abstract: We develop a classification algorithm for estimating posterior distributions from positive-unlabeled data, that is robust to noise in the positive labels and effective for high-dimensional data. In recent years, several algorithms have been proposed to learn from positive-unlabeled data; however, many of these contributions remain theoretical, performing poorly on real high-dimensional data that i… ▽ More We develop a classification algorithm for estimating posterior distributions from positive-unlabeled data, that is robust to noise in the positive labels and effective for high-dimensional data. In recent years, several algorithms have been proposed to learn from positive-unlabeled data; however, many of these contributions remain theoretical, performing poorly on real high-dimensional data that is typically contaminated with noise. We build on this previous work to develop two practical classification algorithms that explicitly model the noise in the positive labels and utilize univariate transforms built on discriminative classifiers. We prove that these univariate transforms preserve the class prior, enabling estimation in the univariate space and avoiding kernel density estimation for high-dimensional data. The theoretical development and both parametric and nonparametric algorithms proposed here constitutes an important step towards wide-spread use of robust classification algorithms for positive-unlabeled data. △ Less

Submitted 31 January, 2017; v1 submitted 28 June, 2016; originally announced June 2016.

Comments: Fixed a typo in the MSGMM update equations in the appendix. Other minor changes

arXiv:1604.04942 [pdf, other]

Identifying global optimality for dictionary learning

Authors: Lei Le, Martha White

Abstract: Learning new representations of input observations in machine learning is often tackled using a factorization of the data. For many such problems, including sparse coding and matrix completion, learning these factorizations can be difficult, in terms of efficiency and to guarantee that the solution is a global minimum. Recently, a general class of objectives have been introduced-which we term indu… ▽ More Learning new representations of input observations in machine learning is often tackled using a factorization of the data. For many such problems, including sparse coding and matrix completion, learning these factorizations can be difficult, in terms of efficiency and to guarantee that the solution is a global minimum. Recently, a general class of objectives have been introduced-which we term induced dictionary learning models (DLMs)-that have an induced convex form that enables global optimization. Though attractive theoretically, this induced form is impractical, particularly for large or growing datasets. In this work, we investigate the use of practical alternating minimization algorithms for induced DLMs, that ensure convergence to global optima. We characterize the stationary points of these models, and, using these insights, highlight practical choices for the objectives. We then provide theoretical and empirical evidence that alternating minimization, from a random initialization, converges to global minima for a large subclass of induced DLMs. In particular, we take advantage of the existence of the (potentially unknown) convex induced form, to identify when stationary points are global minima for the dictionary learning objective. We then provide an empirical investigation into practical optimization choices for using alternating minimization for induced DLMs, for both batch and stochastic gradient descent. △ Less

Submitted 6 August, 2017; v1 submitted 17 April, 2016; originally announced April 2016.

Comments: Updates to previous version include a small modification to Proposition 2, to only use normed regularizers, and a modification to the main theorem (previously Theorem 13) to focus on the overcomplete, full rank setting and to better characterize non-differentiable induced regularizers. The theory has been significantly modified since version 1

arXiv:1602.08771 [pdf, other]

Investigating practical linear temporal difference learning

Authors: Adam White, Martha White

Abstract: Off-policy reinforcement learning has many applications including: learning from demonstration, learning multiple goal seeking policies in parallel, and representing predictive knowledge. Recently there has been an proliferation of new policy-evaluation algorithms that fill a longstanding algorithmic void in reinforcement learning: combining robustness to off-policy sampling, function approximatio… ▽ More Off-policy reinforcement learning has many applications including: learning from demonstration, learning multiple goal seeking policies in parallel, and representing predictive knowledge. Recently there has been an proliferation of new policy-evaluation algorithms that fill a longstanding algorithmic void in reinforcement learning: combining robustness to off-policy sampling, function approximation, linear complexity, and temporal difference (TD) updates. This paper contains two main contributions. First, we derive two new hybrid TD policy-evaluation algorithms, which fill a gap in this collection of algorithms. Second, we perform an empirical comparison to elicit which of these new linear TD methods should be preferred in different situations, and make concrete suggestions about practical use. △ Less

Submitted 30 March, 2016; v1 submitted 28 February, 2016; originally announced February 2016.

Comments: Autonomous Agents and Multi-agent Systems, 2016

arXiv:1601.01944 [pdf, other]

Nonparametric semi-supervised learning of class proportions

Authors: Shantanu Jain, Martha White, Michael W. Trosset, Predrag Radivojac

Abstract: The problem of developing binary classifiers from positive and unlabeled data is often encountered in machine learning. A common requirement in this setting is to approximate posterior probabilities of positive and negative classes for a previously unseen data point. This problem can be decomposed into two steps: (i) the development of accurate predictors that discriminate between positive and unl… ▽ More The problem of developing binary classifiers from positive and unlabeled data is often encountered in machine learning. A common requirement in this setting is to approximate posterior probabilities of positive and negative classes for a previously unseen data point. This problem can be decomposed into two steps: (i) the development of accurate predictors that discriminate between positive and unlabeled data, and (ii) the accurate estimation of the prior probabilities of positive and negative examples. In this work we primarily focus on the latter subproblem. We study nonparametric class prior estimation and formulate this problem as an estimation of mixing proportions in two-component mixture models, given a sample from one of the components and another sample from the mixture itself. We show that estimation of mixing proportions is generally ill-defined and propose a canonical form to obtain identifiability while maintaining the flexibility to model any distribution. We use insights from this theory to elucidate the optimization surface of the class priors and propose an algorithm for estimating them. To address the problems of high-dimensional density estimation, we provide practical transformations to low-dimensional spaces that preserve class priors. Finally, we demonstrate the efficacy of our method on univariate and multivariate data. △ Less

Submitted 8 January, 2016; originally announced January 2016.

arXiv:1512.01241 [pdf, other]

doi 10.1093/mnras/stw1042

Estimating sparse precision matrices

Authors: Nikhil Padmanabhan, Martin White, Harrison H. Zhou, Ross O'Connell

Abstract: We apply a method recently introduced to the statistical literature to directly estimate the precision matrix from an ensemble of samples drawn from a corresponding Gaussian distribution. Motivated by the observation that cosmological precision matrices are often approximately sparse, the method allows one to exploit this sparsity of the precision matrix to more quickly converge to an asymptotic 1… ▽ More We apply a method recently introduced to the statistical literature to directly estimate the precision matrix from an ensemble of samples drawn from a corresponding Gaussian distribution. Motivated by the observation that cosmological precision matrices are often approximately sparse, the method allows one to exploit this sparsity of the precision matrix to more quickly converge to an asymptotic 1/sqrt(Nsim) rate while simultaneously providing an error model for all of the terms. Such an estimate can be used as the starting point for further regularization efforts which can improve upon the 1/sqrt(Nsim) limit above, and incorporating such additional steps is straightforward within this framework. We demonstrate the technique with toy models and with an example motivated by large-scale structure two-point analysis, showing significant improvements in the rate of convergence.For the large-scale structure example we find errors on the precision matrix which are factors of 5 smaller than for the sample precision matrix for thousands of simulations or, alternatively, convergence to the same error level with more than an order of magnitude fewer simulations. △ Less

Submitted 3 December, 2015; originally announced December 2015.

Comments: 11 pages, 14 figures, submitted to MNRAS

arXiv:1401.1640 [pdf, ps, other]

doi 10.1214/13-AOAS669

Quantifying intrinsic and extrinsic noise in gene transcription using the linear noise approximation: An application to single cell data

Authors: Bärbel Finkenstädt, Dan J. Woodcock, Michal Komorowski, Claire V. Harper, Julian R. E. Davis, Mike R. H. White, David A. Rand

Abstract: A central challenge in computational modeling of dynamic biological systems is parameter inference from experimental time course measurements. However, one would not only like to infer kinetic parameters but also study their variability from cell to cell. Here we focus on the case where single-cell fluorescent protein imaging time series data are available for a population of cells. Based on van K… ▽ More A central challenge in computational modeling of dynamic biological systems is parameter inference from experimental time course measurements. However, one would not only like to infer kinetic parameters but also study their variability from cell to cell. Here we focus on the case where single-cell fluorescent protein imaging time series data are available for a population of cells. Based on van Kampen's linear noise approximation, we derive a dynamic state space model for molecular populations which is then extended to a hierarchical model. This model has potential to address the sources of variability relevant to single-cell data, namely, intrinsic noise due to the stochastic nature of the birth and death processes involved in reactions and extrinsic noise arising from the cell-to-cell variation of kinetic parameters. In order to infer such a model from experimental data, one must also quantify the measurement process where one has to allow for nonmeasurable molecular species as well as measurement noise of unknown level and variance. The availability of multiple single-cell time series data here provides a unique testbed to fit such a model and quantify these different sources of variation from experimental data. △ Less

Submitted 8 January, 2014; originally announced January 2014.

Comments: Published in at http://dx.doi.org/10.1214/13-AOAS669 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS669

Journal ref: Annals of Applied Statistics 2013, Vol. 7, No. 4, 1960-1982

arXiv:1211.0587 [pdf, other]

Partition Tree Weighting

Authors: Joel Veness, Martha White, Michael Bowling, András György

Abstract: This paper introduces the Partition Tree Weighting technique, an efficient meta-algorithm for piecewise stationary sources. The technique works by performing Bayesian model averaging over a large class of possible partitions of the data into locally stationary segments. It uses a prior, closely related to the Context Tree Weighting technique of Willems, that is well suited to data compression appl… ▽ More This paper introduces the Partition Tree Weighting technique, an efficient meta-algorithm for piecewise stationary sources. The technique works by performing Bayesian model averaging over a large class of possible partitions of the data into locally stationary segments. It uses a prior, closely related to the Context Tree Weighting technique of Willems, that is well suited to data compression applications. Our technique can be applied to any coding distribution at an additional time and space cost only logarithmic in the sequence length. We provide a competitive analysis of the redundancy of our method, and explore its application in a variety of settings. The order of the redundancy and the complexity of our algorithm matches those of the best competitors available in the literature, and the new algorithm exhibits a superior complexity-performance trade-off in our experiments. △ Less

Submitted 21 November, 2012; v1 submitted 2 November, 2012; originally announced November 2012.

arXiv:1208.4066 [pdf, other]

Reverse Engineering Gene Interaction Networks Using the Phi-Mixing Coefficient

Authors: Nitin Kumar Singh, M. Eren Ahsen, Shiva Mankala, Hyun-Seok Kim, Michael A. White, M. Vidyasagar

Abstract: Constructing gene interaction networks (GINs) from high-throughput gene expression data is an important and challenging problem in systems biology. Existing algorithms produce networks that either have undirected and unweighted edges, or else are constrained to contain no cycles, both of which are biologically unrealistic. In the present paper we propose a new algorithm, based on a concept from pr… ▽ More Constructing gene interaction networks (GINs) from high-throughput gene expression data is an important and challenging problem in systems biology. Existing algorithms produce networks that either have undirected and unweighted edges, or else are constrained to contain no cycles, both of which are biologically unrealistic. In the present paper we propose a new algorithm, based on a concept from probability theory known as the phi-mixing coefficient, that produces networks whose edges are weighted and directed, and are permitted to contain cycles. Because there is no "ground truth" for genome-wide networks on a human scale, we analyzed the outcomes of several experiments on lung cancer, and matched the predictions from the inferred networks with experimental results. Specifically, we inferred three networks (NSCLC, Neuro-endocrine NSCLC plus SCLC, and normal) from the gene expression measurements of 157 lung cancer and 59 normal cell lines, compared with the outcomes of siRNA screening of 19,000+ genes on 11 NSCLC cell lines, and analyzed data from a ChIP-Seq experiment to determine putative downstream targets of the lineage specific oncogenic transcription factor ASCL1. The inferred networks displayed a scale-free or power law behavior between the degree of a node and the number of nodes with that degree. There was a strong correlation between the degree of a gene in the inferred NSCLC network and its essentiality for the survival of the cells. The inferred downstream neighborhood genes of ASCL1 in the SCLC network were significantly enriched by ChIP-Seq determined putative target genes, while no such enrichment was found in the inferred NSCLC network. △ Less

Submitted 12 March, 2016; v1 submitted 20 August, 2012; originally announced August 2012.

Comments: 19 pages, 6 figures

MSC Class: 62P10; 92B15

arXiv:0911.3944 [pdf, other]

doi 10.1109/JSTSP.2010.2076050

Likelihood-based semi-supervised model selection with applications to speech processing

Authors: Christopher M. White, Sanjeev P. Khudanpur, Patrick J. Wolfe

Abstract: In conventional supervised pattern recognition tasks, model selection is typically accomplished by minimizing the classification error rate on a set of so-called development data, subject to ground-truth labeling by human experts or some other means. In the context of speech processing systems and other large-scale practical applications, however, such labeled development data are typically cost… ▽ More In conventional supervised pattern recognition tasks, model selection is typically accomplished by minimizing the classification error rate on a set of so-called development data, subject to ground-truth labeling by human experts or some other means. In the context of speech processing systems and other large-scale practical applications, however, such labeled development data are typically costly and difficult to obtain. This article proposes an alternative semi-supervised framework for likelihood-based model selection that leverages unlabeled data by using trained classifiers representing each model to automatically generate putative labels. The errors that result from this automatic labeling are shown to be amenable to results from robust statistics, which in turn provide for minimax-optimal censored likelihood ratio tests that recover the nonparametric sign test as a limiting case. This approach is then validated experimentally using a state-of-the-art automatic speech recognition system to select between candidate word pronunciations using unlabeled speech data that only potentially contain instances of the words under test. Results provide supporting evidence for the utility of this approach, and suggest that it may also find use in other applications of machine learning. △ Less

Submitted 19 November, 2009; originally announced November 2009.

Comments: 11 pages, 2 figures; submitted for publication

Journal ref: IEEE Journal of Selected Topics in Signal Processing, vol. 4, pp. 1016-1026, 2010

Showing 1–50 of 50 results for author: White, M