-
Hamiltonian Monte Carlo on ReLU Neural Networks is Inefficient
Authors:
Vu C. Dinh,
Lam Si Tung Ho,
Cuong V. Nguyen
Abstract:
We analyze the error rates of the Hamiltonian Monte Carlo algorithm with leapfrog integrator for Bayesian neural network inference. We show that due to the non-differentiability of activation functions in the ReLU family, leapfrog HMC for networks with these activation functions has a large local error rate of $Ω(ε)$ rather than the classical error rate of $O(ε^3)$. This leads to a higher rejection rate of the proposals, making the method inefficient. We then verify our theoretical findings through empirical simulations as well as experiments on a real-world dataset that highlight the inefficiency of HMC inference on ReLU-based neural networks compared to analytical networks.
Submitted 29 October, 2024;
originally announced October 2024.
-
Nonparametric Inference Framework for Time-dependent Epidemic Models
Authors:
Son Luu,
Edward Susko,
Lam Si Tung Ho
Abstract:
Compartmental models, especially the Susceptible-Infected-Removed (SIR) model, have long been used to understand the behaviour of various diseases. Allowing parameters, such as the transmission rate, to be time-dependent functions makes it possible to adjust for and make inferences about changes in the process due to mitigation strategies or evolutionary changes of the infectious agent. In this article, we attempt to build a nonparametric inference framework for stochastic SIR models with a time-dependent infection rate. The framework includes three main steps: likelihood approximation, parameter estimation, and confidence interval construction. The likelihood function of the stochastic SIR model, which is often intractable, can be approximated using methods such as diffusion approximation or tau leaping. The infection rate is modelled by a B-spline basis whose knot locations and number of knots are determined by a fast knot placement method followed by a criterion-based model selection procedure. Finally, a point-wise confidence interval is built using a parametric bootstrap procedure. The performance of the framework is examined in various settings for different epidemic patterns. The model is then applied to the Ontario COVID-19 data across multiple waves.
Submitted 26 September, 2024;
originally announced September 2024.
-
Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task
Authors:
Siavash Golkar,
Alberto Bietti,
Mariel Pettee,
Michael Eickenberg,
Miles Cranmer,
Keiya Hirashima,
Geraud Krawezik,
Nicholas Lourie,
Michael McCabe,
Rudy Morel,
Ruben Ohana,
Liam Holden Parker,
Bruno Régaldo-Saint Blancard,
Kyunghyun Cho,
Shirley Ho
Abstract:
Transformers have revolutionized machine learning across diverse domains, yet understanding their behavior remains crucial, particularly in high-stakes applications. This paper introduces the contextual counting task, a novel toy problem aimed at enhancing our understanding of Transformers in quantitative and scientific contexts. This task requires precise localization and computation within datasets, akin to object detection or region-based scientific analysis. We present theoretical and empirical analysis using both causal and non-causal Transformer architectures, investigating the influence of various positional encodings on performance and interpretability. In particular, we find that causal attention is much better suited for the task, and that no positional embeddings lead to the best accuracy, though rotary embeddings are competitive and easier to train. We also show that out-of-distribution performance is tightly linked to which tokens the model uses as a bias term.
Submitted 30 May, 2024;
originally announced June 2024.
-
Detection of evolutionary shifts in variance under an Ornstein-Uhlenbeck model
Authors:
Wensha Zhang,
Lam Si Tung Ho,
Toby Kenney
Abstract:
Abrupt environmental changes can lead to evolutionary shifts in not only the optimal trait value, but also the rate of adaptation and the diffusion variance in trait evolution. While several methods exist for detecting shifts in optimal values, few explicitly model shifts in both evolutionary variance and adaptation rates. We use a multi-optima and multi-variance Ornstein-Uhlenbeck (OU) process model to describe trait evolution with shifts in both optimal value and diffusion variance and analyze how covariance between species is affected when shifts in variance occur along the phylogeny. We propose a new method that simultaneously detects shifts in both variance and optimal values by formulating the problem as a variable selection task using an L1-penalized loss function. Our method is implemented in the R package ShiVa (Detection of evolutionary Shifts in Variance). Through simulations, we compare ShiVa with methods that only consider shifts in optimal values (l1ou; PhylogeneticEM), and PCMFit. Our method demonstrates improved predictive ability and significantly reduces false positives in detecting optimal value shifts when variance shifts are present. When only shifts in optimal value occur, our method performs comparably to existing approaches. Applying ShiVa to empirical data from cordylid lizards, we find that it outperforms l1ou and PhylogeneticEM, achieving the highest log-likelihood and lowest BIC.
Submitted 31 March, 2025; v1 submitted 29 December, 2023;
originally announced December 2023.
-
Simple Transferability Estimation for Regression Tasks
Authors:
Cuong N. Nguyen,
Phong Tran,
Lam Si Tung Ho,
Vu Dinh,
Anh T. Tran,
Tal Hassner,
Cuong V. Nguyen
Abstract:
We consider transferability estimation, the problem of estimating how well deep learning models transfer from a source to a target task. We focus on regression tasks, which received little previous attention, and propose two simple and computationally efficient approaches that estimate transferability based on the negative regularized mean squared error of a linear regression model. We prove novel theoretical results connecting our approaches to the actual transferability of the optimal target models obtained from the transfer learning process. Despite their simplicity, our approaches significantly outperform existing state-of-the-art regression transferability estimators in both accuracy and efficiency. On two large-scale keypoint regression benchmarks, our approaches yield 12% to 36% better results on average while being at least 27% faster than previous state-of-the-art methods.
Submitted 3 December, 2023; v1 submitted 1 December, 2023;
originally announced December 2023.
-
A Generalization Bound of Deep Neural Networks for Dependent Data
Authors:
Quan Huu Do,
Binh T. Nguyen,
Lam Si Tung Ho
Abstract:
Existing generalization bounds for deep neural networks require data to be independent and identically distributed (iid). This assumption may not hold in real-life applications such as evolutionary biology, infectious disease epidemiology, and stock price prediction. This work establishes a generalization bound of feed-forward neural networks for non-stationary $φ$-mixing data.
Submitted 9 October, 2023;
originally announced October 2023.
-
Multiple Physics Pretraining for Physical Surrogate Models
Authors:
Michael McCabe,
Bruno Régaldo-Saint Blancard,
Liam Holden Parker,
Ruben Ohana,
Miles Cranmer,
Alberto Bietti,
Michael Eickenberg,
Siavash Golkar,
Geraud Krawezik,
Francois Lanusse,
Mariel Pettee,
Tiberiu Tesileanu,
Kyunghyun Cho,
Shirley Ho
Abstract:
We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling of spatiotemporal systems with transformers. In MPP, rather than training one model on a specific physical system, we train a backbone model to predict the dynamics of multiple heterogeneous physical systems simultaneously in order to learn features that are broadly useful across systems and facilitate transfer. In order to learn effectively in this setting, we introduce a shared embedding and normalization strategy that projects the fields of multiple systems into a shared embedding space. We validate the efficacy of our approach on both pretraining and downstream tasks over a broad fluid mechanics-oriented benchmark. We show that a single MPP-pretrained transformer is able to match or outperform task-specific baselines on all pretraining sub-tasks without the need for finetuning. For downstream tasks, we demonstrate that finetuning MPP-trained models results in more accurate predictions across multiple time-steps on systems with previously unseen physical components or higher dimensional systems compared to training from scratch or finetuning pretrained video foundation models. We open-source our code and model weights trained at multiple scales for reproducibility.
Submitted 10 December, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.
-
xVal: A Continuous Numerical Tokenization for Scientific Language Models
Authors:
Siavash Golkar,
Mariel Pettee,
Michael Eickenberg,
Alberto Bietti,
Miles Cranmer,
Geraud Krawezik,
Francois Lanusse,
Michael McCabe,
Ruben Ohana,
Liam Parker,
Bruno Régaldo-Saint Blancard,
Tiberiu Tesileanu,
Kyunghyun Cho,
Shirley Ho
Abstract:
Due in part to their discontinuous and discrete default encodings for numbers, Large Language Models (LLMs) have not yet been commonly used to process numerically-dense scientific datasets. Rendering datasets as text, however, could help aggregate diverse and multi-modal scientific data into a single training corpus, thereby potentially facilitating the development of foundation models for science. In this work, we introduce xVal, a strategy for continuously tokenizing numbers within language models that results in a more appropriate inductive bias for scientific applications. By training specially-modified language models from scratch on a variety of scientific datasets formatted as text, we find that xVal generally outperforms other common numerical tokenization strategies on metrics including out-of-distribution generalization and computational efficiency.
Submitted 15 December, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Wavelet Moments for Cosmological Parameter Estimation
Authors:
Michael Eickenberg,
Erwan Allys,
Azadeh Moradinezhad Dizgah,
Pablo Lemos,
Elena Massara,
Muntazir Abidi,
ChangHoon Hahn,
Sultan Hassan,
Bruno Regaldo-Saint Blancard,
Shirley Ho,
Stephane Mallat,
Joakim Andén,
Francisco Villaescusa-Navarro
Abstract:
Extracting non-Gaussian information from the non-linear regime of structure formation is key to fully exploiting the rich data from upcoming cosmological surveys probing the large-scale structure of the universe. However, due to theoretical and computational complexities, this remains one of the main challenges in analyzing observational data. We present a set of summary statistics for cosmological matter fields based on 3D wavelets to tackle this challenge. These statistics are computed as the spatial average of the complex modulus of the 3D wavelet transform raised to a power $q$ and are therefore known as invariant wavelet moments. The 3D wavelets are constructed to be radially band-limited and separable on a spherical polar grid and come in three types: isotropic, oriented, and harmonic. In the Fisher forecast framework, we evaluate the performance of these summary statistics on matter fields from the Quijote suite, where they are shown to reach state-of-the-art parameter constraints on the base $Λ$CDM parameters, as well as the sum of neutrino masses. We show that we can improve constraints by a factor of 5 to 10 in all parameters with respect to the power spectrum baseline.
Submitted 15 April, 2022;
originally announced April 2022.
-
Evolutionary shift detection with ensemble variable selection
Authors:
Wensha Zhang,
Toby Kenney,
Lam Si Tung Ho
Abstract:
1. Abrupt environmental changes can lead to evolutionary shifts in trait evolution. Identifying these shifts is an important step in understanding the evolutionary history of phenotypes.
2. We propose an ensemble variable selection method (R package ELPASO) for the evolutionary shift detection task and compare it with existing methods (R packages l1ou and PhylogeneticEM) under several scenarios.
3. The performance of the methods is highly dependent on the selection criterion. When the signal sizes are small, the methods using the Bayesian information criterion (BIC) perform better. When the signal sizes are large enough, the methods using the phylogenetic Bayesian information criterion (pBIC) (Khabbazian et al., 2016) perform better. Moreover, the performance is heavily impacted by measurement error and tree reconstruction error.
4. Ensemble method + pBIC tends to perform less conservatively than l1ou + pBIC, and Ensemble method + BIC is more conservative than l1ou + BIC. PhylogeneticEM is even more conservative with small signal sizes and falls between l1ou + pBIC and Ensemble method + BIC with large signal sizes. The results can differ between the methods, but none clearly outperforms the others. By applying multiple methods to a single dataset, we can assess the robustness of each detected shift, based on the agreement among methods.
Submitted 12 April, 2022;
originally announced April 2022.
-
Searching for Minimal Optimal Neural Networks
Authors:
Lam Si Tung Ho,
Vu Dinh
Abstract:
Large neural network models have high predictive power but may suffer from overfitting if the training set is not large enough. Therefore, it is desirable to select an appropriate size for neural networks. The destructive approach, which starts with a large architecture and then reduces the size using a Lasso-type penalty, has been used extensively for this task. Despite its popularity, there is no theoretical guarantee for this technique. Based on the notion of minimal neural networks, we posit a rigorous mathematical framework for studying the asymptotic theory of the destructive technique. We prove that Adaptive group Lasso is consistent and can reconstruct the correct number of hidden nodes of one-hidden-layer feedforward networks with high probability. To the best of our knowledge, this is the first theoretical result established for the destructive technique.
Submitted 27 September, 2021;
originally announced September 2021.
-
Ancestral state reconstruction with large numbers of sequences and edge-length estimation
Authors:
Lam Si Tung Ho,
Edward Susko
Abstract:
Likelihood-based methods are widely considered the best approaches for reconstructing ancestral states. Although much effort has been made to study properties of these methods, previous works often assume that both the tree topology and edge lengths are known. In some scenarios the tree topology might be reasonably well known for the taxa under study. When sequence length is much smaller than the number of species, however, edge lengths are not likely to be accurately estimated. We study the consistency of the maximum likelihood and empirical Bayes estimators of ancestral state of discrete traits in such settings under a star tree. We prove that the likelihood-based reconstruction is consistent under symmetric models but can be inconsistent under non-symmetric models. We show, however, that a simple consistent estimator for the ancestral states is available under non-symmetric models. The results illustrate that likelihood methods can unexpectedly have undesirable properties as the number of sequences considered gets very large. Broader implications of the results are discussed.
Submitted 31 March, 2021;
originally announced April 2021.
-
A Bayesian neural network predicts the dissolution of compact planetary systems
Authors:
Miles Cranmer,
Daniel Tamayo,
Hanno Rein,
Peter Battaglia,
Samuel Hadden,
Philip J. Armitage,
Shirley Ho,
David N. Spergel
Abstract:
Despite over three hundred years of effort, no solutions exist for predicting when a general planetary configuration will become unstable. We introduce a deep learning architecture to push forward this problem for compact systems. While current machine learning algorithms in this area rely on scientist-derived instability metrics, our new technique learns its own metrics from scratch, enabled by a novel internal structure inspired by dynamics theory. Our Bayesian neural network model can accurately predict not only if, but also when a compact planetary system with three or more planets will go unstable. Our model, trained directly from short N-body time series of raw orbital elements, is more than two orders of magnitude more accurate at predicting instability times than analytical estimators, while also reducing the bias of existing machine learning algorithms by nearly a factor of three. Despite being trained on compact resonant and near-resonant three-planet configurations, the model demonstrates robust generalization to both non-resonant and higher multiplicity configurations, in the latter case outperforming models fit to that specific set of integrations. The model computes instability estimates up to five orders of magnitude faster than a numerical integrator, and unlike previous efforts provides confidence intervals on its predictions. Our inference model is publicly available in the SPOCK package, with training code open-sourced.
Submitted 11 January, 2021;
originally announced January 2021.
-
Consistent Feature Selection for Analytic Deep Neural Networks
Authors:
Vu Dinh,
Lam Si Tung Ho
Abstract:
One of the most important steps toward interpretability and explainability of neural network models is feature selection, which aims to identify the subset of relevant features. Theoretical results in the field have mostly focused on the prediction aspect of the problem with virtually no work on feature selection consistency for deep neural networks due to the model's severe nonlinearity and unidentifiability. This lack of theoretical foundation casts doubt on the applicability of deep learning to contexts where correct interpretations of the features play a central role.
In this work, we investigate the problem of feature selection for analytic deep networks. We prove that for a wide class of networks, including deep feed-forward neural networks, convolutional neural networks, and a major sub-class of residual neural networks, the Adaptive Group Lasso selection procedure with Group Lasso as the base estimator is selection-consistent. The work provides further evidence that Group Lasso might be inefficient for feature selection with neural networks and advocates the use of Adaptive Group Lasso over the popular Group Lasso.
Submitted 15 October, 2020;
originally announced October 2020.
-
Meta-Learning for One-Class Classification with Few Examples using Order-Equivariant Network
Authors:
Ademola Oladosu,
Tony Xu,
Philip Ekfeldt,
Brian A. Kelly,
Miles Cranmer,
Shirley Ho,
Adrian M. Price-Whelan,
Gabriella Contardo
Abstract:
This paper presents a meta-learning framework for few-shot One-Class Classification (OCC) at test-time, a setting where labeled examples are only available for the positive class, and no supervision is given for the negative examples. We consider a set of 'one-class classification' objective-tasks with only a small set of positive examples available for each task, and a set of training tasks with full supervision (i.e. highly imbalanced classification). We propose an approach using order-equivariant networks to learn a 'meta' binary-classifier. The model takes as input an example to classify from a given task, as well as the corresponding supervised set of positive examples for this OCC task. Thus, the output of the model is 'conditioned' on the available positive examples of a given task, allowing it to predict on new tasks and new examples without labeled negative examples. In this paper, we are motivated by an astronomy application. Our goal is to identify whether stars belong to a specific stellar group (the 'one-class' for a given task), called stellar streams, where each stellar stream is a different OCC-task. We show that our method transfers well to unseen (test) synthetic streams, and outperforms the baselines even though it is not retrained and accesses a much smaller part of the data per task to predict (only positive supervision). We see, however, that it doesn't transfer as well to the real stream GD-1. This could come from intrinsic differences between the synthetic and real streams, highlighting the need for consistency in the 'nature' of the task for this method. However, light fine-tuning improves performance and outperforms our baselines. Our experiments show encouraging results to further explore meta-learning methods for OCC tasks.
Submitted 21 May, 2021; v1 submitted 8 July, 2020;
originally announced July 2020.
-
Discovering Symbolic Models from Deep Learning with Inductive Biases
Authors:
Miles Cranmer,
Alvaro Sanchez-Gonzalez,
Peter Battaglia,
Rui Xu,
Kyle Cranmer,
David Spergel,
Shirley Ho
Abstract:
We develop a general approach to distill symbolic representations of a learned deep model by introducing strong inductive biases. We focus on Graph Neural Networks (GNNs). The technique works as follows: we first encourage sparse latent representations when we train a GNN in a supervised setting, then we apply symbolic regression to components of the learned model to extract explicit physical relations. We find the correct known equations, including force laws and Hamiltonians, can be extracted from the neural network. We then apply our method to a non-trivial cosmology example, a detailed dark matter simulation, and discover a new analytic formula which can predict the concentration of dark matter from the mass distribution of nearby cosmic structures. The symbolic expressions extracted from the GNN using our technique also generalized to out-of-distribution data better than the GNN itself. Our approach offers alternative directions for interpreting neural networks and discovering novel physical principles from the representations they learn.
Submitted 17 November, 2020; v1 submitted 19 June, 2020;
originally announced June 2020.
-
Consistent feature selection for neural networks via Adaptive Group Lasso
Authors:
Vu Dinh,
Lam Si Tung Ho
Abstract:
One main obstacle for the wide use of deep learning in medical and engineering sciences is its interpretability. While neural network models are strong tools for making predictions, they often provide little information about which features play significant roles in influencing the prediction accuracy. To overcome this issue, many regularization procedures for learning with neural networks have been proposed for dropping non-significant features. Unfortunately, the lack of theoretical results casts doubt on the applicability of such pipelines. In this work, we propose and establish a theoretical guarantee for the use of the adaptive group lasso for selecting important features of neural networks. Specifically, we show that our feature selection method is consistent for single-output feed-forward neural networks with one hidden layer and hyperbolic tangent activation function. We demonstrate its applicability using both simulation and data analysis.
Submitted 2 December, 2021; v1 submitted 30 May, 2020;
originally announced June 2020.
-
Efficient Bayesian Inference of General Gaussian Models on Large Phylogenetic Trees
Authors:
Paul Bastide,
Lam Si Tung Ho,
Guy Baele,
Philippe Lemey,
Marc A Suchard
Abstract:
Phylogenetic comparative methods correct for shared evolutionary history among a set of non-independent organisms by modeling sample traits as arising from a diffusion process along the branches of a possibly unknown history. To incorporate such uncertainty, we present a scalable Bayesian inference framework under a general Gaussian trait evolution model that exploits Hamiltonian Monte Carlo (HMC). HMC enables efficient sampling of the constrained model parameters and takes advantage of the tree structure for fast likelihood and gradient computations, yielding algorithmic complexity linear in the number of observations. This approach encompasses a wide family of stochastic processes, including the general Ornstein-Uhlenbeck (OU) process, with possible missing data and measurement errors. We implement inference tools for a biologically relevant subset of all these models into the BEAST phylogenetic software package and develop model comparison through marginal likelihood estimation. We apply our approach to study the morphological evolution in the superfamily of Musteloidea (including weasels and allies) as well as the heritability of HIV virulence. This second problem furnishes a new measure of evolutionary heritability that demonstrates its utility through a targeted simulation study.
Submitted 29 September, 2020; v1 submitted 23 March, 2020;
originally announced March 2020.
-
Lagrangian Neural Networks
Authors:
Miles Cranmer,
Sam Greydanus,
Stephan Hoyer,
Peter Battaglia,
David Spergel,
Shirley Ho
Abstract:
Accurate models of the world are built upon notions of its underlying symmetries. In physics, these symmetries correspond to conservation laws, such as for energy and momentum. Yet even though neural network models see increasing use in the physical sciences, they struggle to learn these symmetries. In this paper, we propose Lagrangian Neural Networks (LNNs), which can parameterize arbitrary Lagrangians using neural networks. In contrast to models that learn Hamiltonians, LNNs do not require canonical coordinates, and thus perform well in situations where canonical momenta are unknown or difficult to compute. Unlike previous approaches, our method does not restrict the functional form of learned energies and will produce energy-conserving models for a variety of tasks. We test our approach on a double pendulum and a relativistic particle, demonstrating energy conservation where a baseline approach incurs dissipation and modeling relativity without canonical coordinates where a Hamiltonian approach fails. Finally, we show how this model can be applied to graphs and continuous systems using a Lagrangian Graph Network, and demonstrate it on the 1D wave equation.
Submitted 30 July, 2020; v1 submitted 10 March, 2020;
originally announced March 2020.
-
Learning Symbolic Physics with Graph Networks
Authors:
Miles D. Cranmer,
Rui Xu,
Peter Battaglia,
Shirley Ho
Abstract:
We introduce an approach for imposing physically motivated inductive biases on graph networks to learn interpretable representations and improved zero-shot generalization. Our experiments show that our graph network models, which implement this inductive bias, can learn message representations equivalent to the true force vector when trained on n-body gravitational and spring-like simulations. We use symbolic regression to fit explicit algebraic equations to our trained model's message function and recover the symbolic form of Newton's law of gravitation without prior knowledge. We also show that our model generalizes better at inference time to systems with more bodies than had been experienced during training. Our approach is extensible, in principle, to any unknown interaction law learned by a graph network, and offers a valuable technique for interpreting and inferring explicit causal theories about the world from implicit knowledge captured by deep learning.
Submitted 1 November, 2019; v1 submitted 12 September, 2019;
originally announced September 2019.
-
Modeling the Gaia Color-Magnitude Diagram with Bayesian Neural Flows to Constrain Distance Estimates
Authors:
Miles D. Cranmer,
Richard Galvez,
Lauren Anderson,
David N. Spergel,
Shirley Ho
Abstract:
We demonstrate an algorithm for learning a flexible color-magnitude diagram from noisy parallax and photometry measurements using a normalizing flow, a deep neural network capable of learning an arbitrary multi-dimensional probability distribution. We present a catalog of 640M photometric distance posteriors to nearby stars derived from this data-driven model using Gaia DR2 photometry and parallaxes. Dust estimation and dereddening are done iteratively inside the model and without prior distance information, using the Bayestar map. The signal-to-noise (precision) of distance measurements improves on average by more than 48% over the raw Gaia data, and we also demonstrate how the accuracy of distances has improved over other models, especially in the noisy-parallax regime. Applications are discussed, including significantly improved Milky Way disk separation and substructure detection. We conclude with a discussion of future work, which exploits the normalizing flow architecture to allow us to exactly marginalize over missing photometry, enabling the inclusion of many surveys without losing coverage.
Submitted 21 August, 2019;
originally announced August 2019.
-
Inferring phenotypic trait evolution on large trees with many incomplete measurements
Authors:
Gabriel Hassler,
Max R. Tolkoff,
William L. Allen,
Lam Si Tung Ho,
Philippe Lemey,
Marc A. Suchard
Abstract:
Comparative biologists are often interested in inferring covariation between multiple biological traits sampled across numerous related taxa. To properly study these relationships, we must control for the shared evolutionary history of the taxa to avoid spurious inference. Existing control techniques almost universally scale poorly as the number of taxa increases. An additional challenge arises as obtaining a full suite of measurements becomes increasingly difficult with increasing taxa. This typically necessitates data imputation or integration that further exacerbates scalability. We propose an inference technique that integrates out missing measurements analytically and scales linearly with the number of taxa by using a post-order traversal algorithm under a multivariate Brownian diffusion (MBD) model to characterize trait evolution. We further exploit this technique to extend the MBD model to account for sampling error or non-heritable residual variance. We test these methods to examine mammalian life history traits, prokaryotic genomic and phenotypic traits, and HIV infection traits. We find computational efficiency increases that top two orders-of-magnitude over current best practices. While we focus on the utility of this algorithm in phylogenetic comparative methods, our approach generalizes to solve long-standing challenges in computing the likelihood for matrix-normal and multivariate normal distributions with missing data at scale.
Submitted 7 June, 2019;
originally announced June 2019.
-
Bayesian Active Learning With Abstention Feedbacks
Authors:
Cuong V. Nguyen,
Lam Si Tung Ho,
Huan Xu,
Vu Dinh,
Binh Nguyen
Abstract:
We study pool-based active learning with abstention feedbacks where a labeler can abstain from labeling a queried example with some unknown abstention rate. This is an important problem with many useful applications. We take a Bayesian approach to the problem and develop two new greedy algorithms that learn both the classification problem and the unknown abstention rate at the same time. These are achieved by simply incorporating the estimated average abstention rate into the greedy criteria. We prove that both algorithms have near-optimality guarantees: they respectively achieve a ${(1-\frac{1}{e})}$ constant factor approximation of the optimal expected or worst-case value of a useful utility function. Our experiments show the algorithms perform well in various practical scenarios.
Submitted 30 December, 2020; v1 submitted 4 June, 2019;
originally announced June 2019.
-
Recovery guarantees for polynomial approximation from dependent data with outliers
Authors:
Lam Si Tung Ho,
Hayden Schaeffer,
Giang Tran,
Rachel Ward
Abstract:
Learning non-linear systems from noisy, limited, and/or dependent data is an important task across various scientific fields including statistics, engineering, computer science, mathematics, and many more. In general, this learning task is ill-posed; however, additional information about the data's structure or on the behavior of the unknown function can make the task well-posed. In this work, we study the problem of learning nonlinear functions from corrupted and dependent data. The learning problem is recast as a sparse robust linear regression problem where we incorporate both the unknown coefficients and the corruptions in a basis pursuit framework. The main contribution of our paper is to provide a reconstruction guarantee for the associated $\ell_1$-optimization problem where the sampling matrix is formed from dependent data. Specifically, we prove that the sampling matrix satisfies the null space property and the stable null space property, provided that the data is compact and satisfies a suitable concentration inequality. We show that our recovery results are applicable to various types of dependent data such as exponentially strongly $α$-mixing data, geometrically $\mathcal{C}$-mixing data, and uniformly ergodic Markov chains. Our theoretical results are verified via several numerical simulations.
Submitted 25 November, 2018;
originally announced November 2018.
-
Detecting Galaxy-Filament Alignments in the Sloan Digital Sky Survey III
Authors:
Yen-Chi Chen,
Shirley Ho,
Jonathan Blazek,
Siyu He,
Rachel Mandelbaum,
Peter Melchior,
Sukhdeep Singh
Abstract:
Previous studies have shown the filamentary structures in the cosmic web influence the alignments of nearby galaxies. We study this effect in the LOWZ sample of the Sloan Digital Sky Survey using the "Cosmic Web Reconstruction" filament catalogue. We find that LOWZ galaxies exhibit a small but statistically significant alignment in the direction parallel to the orientation of nearby filaments. This effect is detectable even in the absence of nearby galaxy clusters, which suggests it is an effect from the matter distribution in the filament. A nonparametric regression model suggests that the alignment effect with filaments extends over separations of 30-40 Mpc. We find that galaxies that are bright and early-forming align more strongly with the directions of nearby filaments than those that are faint and late-forming; however, trends with stellar mass are less statistically significant, within the narrow range of stellar mass of this sample.
Submitted 21 February, 2019; v1 submitted 30 April, 2018;
originally announced May 2018.
-
Estimating Cosmological Parameters from the Dark Matter Distribution
Authors:
Siamak Ravanbakhsh,
Junier Oliva,
Sebastien Fromenteau,
Layne C. Price,
Shirley Ho,
Jeff Schneider,
Barnabas Poczos
Abstract:
A grand challenge of 21st-century cosmology is to accurately estimate the cosmological parameters of our Universe. A major approach to estimating the cosmological parameters is to use the large-scale matter distribution of the Universe. Galaxy surveys provide the means to map out cosmic large-scale structure in three dimensions. Information about galaxy locations is typically summarized in a "single" function of scale, such as the galaxy correlation function or power-spectrum. We show that it is possible to estimate these cosmological parameters directly from the distribution of matter. This paper presents the application of deep 3D convolutional networks to volumetric representation of dark-matter simulations as well as the results obtained using a recently proposed distribution regression framework, showing that machine learning techniques are comparable to, and can sometimes outperform, maximum-likelihood point estimates using "cosmological models". This opens the way to estimating the parameters of our Universe with higher accuracy.
Submitted 6 November, 2017;
originally announced November 2017.
-
Retrosynthetic reaction prediction using neural sequence-to-sequence models
Authors:
Bowen Liu,
Bharath Ramsundar,
Prasad Kawthekar,
Jade Shi,
Joseph Gomes,
Quang Luu Nguyen,
Stephen Ho,
Jack Sloane,
Paul Wender,
Vijay Pande
Abstract:
We describe a fully data-driven model that learns to perform a retrosynthetic reaction prediction task, which is treated as a sequence-to-sequence mapping problem. The end-to-end trained model has an encoder-decoder architecture that consists of two recurrent neural networks, which has previously shown great success in solving other sequence-to-sequence prediction tasks such as machine translation. The model is trained on 50,000 experimental reaction examples from the United States patent literature, which span 10 broad reaction types that are commonly used by medicinal chemists. We find that our model performs comparably with a rule-based expert system baseline model, and also overcomes certain limitations associated with rule-based expert systems and with any machine learning approach that contains a rule-based expert system component. Our model provides an important first step towards solving the challenging problem of computational retrosynthetic analysis.
Submitted 6 June, 2017;
originally announced June 2017.
-
Bayesian Pool-based Active Learning With Abstention Feedbacks
Authors:
Cuong V. Nguyen,
Lam Si Tung Ho,
Huan Xu,
Vu Dinh,
Binh Nguyen
Abstract:
We study pool-based active learning with abstention feedbacks, where a labeler can abstain from labeling a queried example with some unknown abstention rate. This is an important problem with many useful applications. We take a Bayesian approach to the problem and develop two new greedy algorithms that learn both the classification problem and the unknown abstention rate at the same time. These are achieved by simply incorporating the estimated abstention rate into the greedy criteria. We prove that both of our algorithms have near-optimality guarantees: they respectively achieve a ${(1-\frac{1}{e})}$ constant factor approximation of the optimal expected or worst-case value of a useful utility function. Our experiments show the algorithms perform well in various practical scenarios.
Submitted 2 January, 2021; v1 submitted 23 May, 2017;
originally announced May 2017.
-
Fast learning rates with heavy-tailed losses
Authors:
Vu Dinh,
Lam Si Tung Ho,
Duy Nguyen,
Binh T. Nguyen
Abstract:
We study fast learning rates when the losses are not necessarily bounded and may have a distribution with heavy tails. To enable such analyses, we introduce two new conditions: (i) the envelope function $\sup_{f \in \mathcal{F}}|\ell \circ f|$, where $\ell$ is the loss function and $\mathcal{F}$ is the hypothesis class, exists and is $L^r$-integrable, and (ii) $\ell$ satisfies the multi-scale Bernstein's condition on $\mathcal{F}$. Under these assumptions, we prove that learning rates faster than $O(n^{-1/2})$ can be obtained and, depending on $r$ and the multi-scale Bernstein's powers, can be arbitrarily close to $O(n^{-1})$. We then verify these assumptions and derive fast learning rates for the problem of vector quantization by $k$-means clustering with heavy-tailed distributions. The analyses enable us to obtain novel learning rates that extend and complement existing results in the literature from both theoretical and practical viewpoints.
Submitted 29 September, 2016;
originally announced September 2016.
-
Direct likelihood-based inference for discretely observed stochastic compartmental models of infectious disease
Authors:
Lam Si Tung Ho,
Forrest W. Crawford,
Marc A. Suchard
Abstract:
Stochastic compartmental models are important tools for understanding the course of infectious disease epidemics in populations and in prospective evaluation of intervention policies. However, calculating the likelihood for discretely observed data from even simple models -- such as the ubiquitous susceptible-infectious-removed (SIR) model -- has been considered computationally intractable since its formulation almost a century ago. Recently, researchers have proposed methods to circumvent this limitation through data augmentation or approximation, but these approaches often suffer from high computational cost or loss of accuracy. We develop the mathematical foundation and an efficient algorithm to compute the likelihood for discretely observed data from a broad class of stochastic compartmental models. We also give expressions for the derivatives of the transition probabilities using the same technique, making possible inference via Hamiltonian Monte Carlo (HMC). We use the 17th century plague in Eyam, a classic example of the SIR model, to compare our recursion method to sequential Monte Carlo, analyze using HMC, and assess the model assumptions. We also apply our direct likelihood evaluation to perform Bayesian inference for the 2014-2015 Ebola outbreak in Guinea. The results suggest that the epidemic infectious rates have decreased since October 2014 in the Southeast region of Guinea, while rates remain the same in other regions, facilitating understanding of the outbreak and the effectiveness of Ebola control interventions.
Submitted 25 July, 2018; v1 submitted 24 August, 2016;
originally announced August 2016.
-
Birth/birth-death processes and their computable transition probabilities with biological applications
Authors:
Lam Si Tung Ho,
Jason Xu,
Forrest W. Crawford,
Vladimir N. Minin,
Marc A. Suchard
Abstract:
Birth-death processes track the size of a univariate population, but many biological systems involve interaction between populations, necessitating models for two or more populations simultaneously. A lack of efficient methods for evaluating finite-time transition probabilities of bivariate processes, however, has restricted statistical inference in these models. Researchers rely on computationally expensive methods such as matrix exponentiation or Monte Carlo approximation, restricting likelihood-based inference to small systems, or indirect methods such as approximate Bayesian computation. In this paper, we introduce the birth(death)/birth-death process, a tractable bivariate extension of the birth-death process. We develop an efficient and robust algorithm to calculate the transition probabilities of birth(death)/birth-death processes using a continued fraction representation of their Laplace transforms. Next, we identify several exemplary models arising in molecular epidemiology, macro-parasite evolution, and infectious disease modeling that fall within this class, and demonstrate advantages of our proposed method over existing approaches to inference in these models. Notably, the ubiquitous stochastic susceptible-infectious-removed (SIR) model falls within this class, and we emphasize that computable transition probabilities newly enable direct inference of parameters in the SIR model. We also propose a very fast method for approximating the transition probabilities under the SIR model via a novel branching process simplification, and compare it to the continued fraction representation method with application to the 17th century plague in Eyam. Although the two methods produce similar maximum a posteriori estimates, the branching process approximation fails to capture the correlation structure in the joint posterior distribution.
Submitted 7 August, 2017; v1 submitted 11 March, 2016;
originally announced March 2016.
-
A Relaxed Drift Diffusion Model for Phylogenetic Trait Evolution
Authors:
Mandev S. Gill,
Lam Si Tung Ho,
Guy Baele,
Philippe Lemey,
Marc A. Suchard
Abstract:
Understanding the processes that give rise to quantitative measurements associated with molecular sequence data remains an important issue in statistical phylogenetics. Examples of such measurements include geographic coordinates in the context of phylogeography and phenotypic traits in the context of comparative studies. A popular approach is to model the evolution of continuously varying traits as a Brownian diffusion process. However, standard Brownian diffusion is quite restrictive and may not accurately characterize certain trait evolutionary processes. Here, we relax one of the major restrictions of standard Brownian diffusion by incorporating a nontrivial estimable drift into the process. We introduce a relaxed drift diffusion model for the evolution of multivariate continuously varying traits along a phylogenetic tree via Brownian diffusion with drift. Notably, the relaxed drift model accommodates branch-specific variation of drift rates while preserving model identifiability. We implement the relaxed drift model in a Bayesian inference framework to simultaneously reconstruct the evolutionary histories of molecular sequence data and associated multivariate continuous trait data, and provide tools to visualize evolutionary reconstructions. We illustrate our approach in three viral examples. In the first two, we examine the spatiotemporal spread of HIV-1 in central Africa and West Nile virus in North America and show that a relaxed drift approach uncovers a clearer, more detailed picture of the dynamics of viral dispersal than standard Brownian diffusion. Finally, we study antigenic evolution in the context of HIV-1 resistance to three broadly neutralizing antibodies. Our analysis reveals evidence of a continuous drift at the HIV-1 population level towards enhanced resistance to neutralization by the VRC01 monoclonal antibody over the course of the epidemic.
Submitted 29 December, 2015; v1 submitted 24 December, 2015;
originally announced December 2015.
-
Cosmic Web Reconstruction through Density Ridges: Catalogue
Authors:
Yen-Chi Chen,
Shirley Ho,
Jon Brinkmann,
Peter E. Freeman,
Christopher R. Genovese,
Donald P. Schneider,
Larry Wasserman
Abstract:
We construct a catalogue for filaments using a novel approach called SCMS (subspace constrained mean shift; Ozertem & Erdogmus 2011; Chen et al. 2015). SCMS is a gradient-based method that detects filaments through density ridges (smooth curves tracing high-density regions). A great advantage of SCMS is its uncertainty measure, which allows an evaluation of the errors for the detected filaments. To detect filaments, we use data from the Sloan Digital Sky Survey, which consist of three galaxy samples: the NYU main galaxy sample (MGS), the LOWZ sample and the CMASS sample. Each of the three datasets covers a different redshift region, so that the combined sample allows detection of filaments up to z = 0.7. Our filament catalogue consists of a sequence of two-dimensional filament maps at different redshifts that provide several useful statistics on the evolution of the cosmic web. To construct the maps, we select spectroscopically confirmed galaxies within 0.050 < z < 0.700 and partition them into 130 bins. For each bin, we ignore the redshift, treating the galaxy observations as 2-D data, and detect filaments using SCMS. The filament catalogue consists of 130 individual 2-D filament maps, and each map comprises points on the detected filaments that describe the filamentary structures at a particular redshift. We also apply our filament catalogue to investigate galaxy luminosity and its relation with distance to filament. Using a volume-limited sample, we find strong evidence (6.1$σ$ - 12.3$σ$) that galaxies close to filaments are generally brighter than those at significant distance from filaments.
Submitted 21 September, 2015;
originally announced September 2015.
-
Detecting Effects of Filaments on Galaxy Properties in the Sloan Digital Sky Survey III
Authors:
Yen-Chi Chen,
Shirley Ho,
Rachel Mandelbaum,
Neta A. Bahcall,
Joel R. Brownstein,
Peter E. Freeman,
Christopher R. Genovese,
Donald P. Schneider,
Larry Wasserman
Abstract:
We study the effects of filaments on galaxy properties in the Sloan Digital Sky Survey (SDSS) Data Release 12 using filaments from the `Cosmic Web Reconstruction' catalogue (Chen et al. 2016), a publicly available filament catalogue for SDSS. Since filaments are tracers of medium-to-high density regions, we expect that galaxy properties associated with the environment are dependent on the distance to the nearest filament. Our analysis demonstrates that a red galaxy or a high-mass galaxy tends to reside closer to filaments than a blue or low-mass galaxy. After adjusting the effect from stellar mass, on average, early-forming galaxies or large galaxies have a shorter distance to filaments than late-forming galaxies or small galaxies. For the Main galaxy sample (MGS), all signals are very significant ($>6σ$). For the LOWZ and CMASS samples, the stellar mass and size are significant ($>2 σ$). The filament effects we observe persist until $z = 0.7$ (the edge of the CMASS sample). Comparing our results to those using the galaxy distances from redMaPPer galaxy clusters as a reference, we find a similar result between filaments and clusters. Moreover, we find that the effect of clusters on the stellar mass of nearby galaxies depends on the galaxy's filamentary environment. Our findings illustrate the strong correlation of galaxy properties with proximity to density ridges, strongly supporting the claim that density ridges are good tracers of filaments.
Submitted 12 January, 2017; v1 submitted 21 September, 2015;
originally announced September 2015.
-
Investigating Galaxy-Filament Alignments in Hydrodynamic Simulations using Density Ridges
Authors:
Yen-Chi Chen,
Shirley Ho,
Ananth Tenneti,
Rachel Mandelbaum,
Rupert Croft,
Tiziana DiMatteo,
Peter E. Freeman,
Christopher R. Genovese,
Larry Wasserman
Abstract:
In this paper, we study the filamentary structures and the galaxy alignment along filaments at redshift $z=0.06$ in the MassiveBlack-II simulation, a state-of-the-art, high-resolution hydrodynamical cosmological simulation which includes stellar and AGN feedback in a volume of (100 Mpc$/h$)$^3$. The filaments are constructed using the subspace constrained mean shift (SCMS; Ozertem & Erdogmus (2011) and Chen et al. (2015a)). First, we show that reconstructed filaments using galaxies and reconstructed filaments using dark matter particles are similar to each other; over $50\%$ of the points on the galaxy filaments have a corresponding point on the dark matter filaments within distance $0.13$ Mpc$/h$ (and vice versa) and this distance is even smaller at high-density regions. Second, we observe the alignment of the major principal axis of a galaxy with respect to the orientation of its nearest filament and detect a $2.5$ Mpc$/h$ critical radius for filament's influence on the alignment when the subhalo mass of this galaxy is between $10^9M_\odot/h$ and $10^{12}M_\odot/h$. Moreover, we find the alignment signal to increase significantly with the subhalo mass. Third, when a galaxy is close to filaments (less than $0.25$ Mpc$/h$), the galaxy alignment toward the nearest galaxy group depends on the galaxy subhalo mass. Finally, we find that galaxies close to filaments or groups tend to be rounder than those away from filaments or groups.
Submitted 17 August, 2015;
originally announced August 2015.
-
Optimal Ridge Detection using Coverage Risk
Authors:
Yen-Chi Chen,
Christopher R. Genovese,
Shirley Ho,
Larry Wasserman
Abstract:
We introduce the concept of coverage risk as an error measure for density ridge estimation. The coverage risk generalizes the mean integrated squared error to set estimation. We propose two risk estimators for the coverage risk and we show that we can select tuning parameters by minimizing the estimated risk. We study the rate of convergence for coverage risk and prove consistency of the risk estimators. We apply our method to three simulated datasets and to cosmology data. In all the examples, the proposed method successfully recovers the underlying density structure.
Submitted 7 June, 2015;
originally announced June 2015.
-
Cosmic Web Reconstruction through Density Ridges: Method and Algorithm
Authors:
Yen-Chi Chen,
Shirley Ho,
Peter E. Freeman,
Christopher R. Genovese,
Larry Wasserman
Abstract:
The detection and characterization of filamentary structures in the cosmic web allow cosmologists to constrain parameters that dictate the evolution of the Universe. While many filament estimators have been proposed, they generally lack estimates of uncertainty, reducing their inferential power. In this paper, we demonstrate how one may apply the Subspace Constrained Mean Shift (SCMS) algorithm (Ozertem and Erdogmus (2011); Genovese et al. (2012)) to uncover filamentary structure in galaxy data. The SCMS algorithm is a gradient ascent method that models filaments as density ridges, one-dimensional smooth curves that trace high-density regions within the point cloud. We also demonstrate how augmenting the SCMS algorithm with bootstrap-based methods of uncertainty estimation allows one to place uncertainty bands around putative filaments. We apply the SCMS method to datasets sampled from the P3M N-body simulation, with galaxy number densities consistent with SDSS and WFIRST-AFTA, and to LOWZ and CMASS data from the Baryon Oscillation Spectroscopic Survey (BOSS). To further assess the efficacy of SCMS, we compare the relative locations of BOSS filaments with galaxy clusters in the redMaPPer catalog, and find that redMaPPer clusters are significantly closer (with p-values $< 10^{-9}$) to SCMS-detected filaments than to randomly selected galaxies.
Submitted 27 August, 2015; v1 submitted 21 January, 2015;
originally announced January 2015.
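The abstract above describes SCMS as a gradient ascent method that models filaments as density ridges. A minimal sketch of one common formulation follows; it is not the authors' implementation. It assumes a Gaussian kernel density estimate with a fixed bandwidth `h`, uses the Hessian of the density itself (some variants use the Hessian of the log-density instead), and omits the bootstrap uncertainty bands. The function name `scms`, its parameters, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def scms(points, data, h, n_iter=200, tol=1e-7):
    """Move each row of `points` toward a 1-D ridge of a Gaussian KDE of `data`.

    Illustrative sketch only: fixed bandwidth h, Hessian of the density.
    """
    pts = np.array(points, dtype=float)  # copy so the caller's array is untouched
    d = data.shape[1]
    for k in range(len(pts)):
        x = pts[k]
        for _ in range(n_iter):
            diff = data - x                                   # X_i - x, shape (n, d)
            w = np.exp(-0.5 * np.sum(diff**2, axis=1) / h**2)  # Gaussian kernel weights
            shift = w @ diff / w.sum()                        # ordinary mean-shift vector
            # KDE Hessian up to a positive constant:
            # sum_i w_i * (diff_i diff_i^T / h^2 - I)
            hess = (diff * w[:, None]).T @ diff / h**2 - w.sum() * np.eye(d)
            _, eigvecs = np.linalg.eigh(hess)                 # eigenvalues in ascending order
            normal = eigvecs[:, : d - 1]                      # d-1 most negative curvature directions
            step = normal @ (normal.T @ shift)                # project the shift onto that subspace
            x = x + step
            if np.linalg.norm(step) < tol:
                break
        pts[k] = x
    return pts

# Toy example (assumed data): noisy points scattered around the line y = 0.
rng = np.random.default_rng(0)
data = np.column_stack([rng.uniform(-2, 2, 500), rng.normal(0, 0.1, 500)])
starts = np.array([[-1.0, 0.15], [0.0, -0.12], [1.0, 0.10]])
ridge = scms(starts, data, h=0.3)
print(ridge)
```

In this toy setting the starting points should be pulled toward the line the data were scattered around, while moving very little along it, since only the component of the mean-shift vector orthogonal to the estimated ridge direction is applied.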
-
Learning From Non-iid Data: Fast Rates for the One-vs-All Multiclass Plug-in Classifiers
Authors:
Vu Dinh,
Lam Si Tung Ho,
Nguyen Viet Cuong,
Duy Nguyen,
Binh T. Nguyen
Abstract:
We prove new fast learning rates for the one-vs-all multiclass plug-in classifiers trained either from exponentially strongly mixing data or from data generated by a converging drifting distribution. These are two typical scenarios where training data are not iid. The learning rates are obtained under a multiclass version of Tsybakov's margin assumption, a type of low-noise assumption, and do not depend on the number of classes. Our results are general and include a previous result for binary-class plug-in classifiers with iid data as a special case. In contrast to previous works for least squares SVMs under the binary-class setting, our results retain the optimal learning rate in the iid case.
Submitted 24 January, 2015; v1 submitted 12 August, 2014;
originally announced August 2014.
-
Generalization and Robustness of Batched Weighted Average Algorithm with V-geometrically Ergodic Markov Data
Authors:
Nguyen Viet Cuong,
Lam Si Tung Ho,
Vu Dinh
Abstract:
We analyze the generalization and robustness of the batched weighted average algorithm for V-geometrically ergodic Markov data. This algorithm is a good alternative to the empirical risk minimization algorithm when the latter suffers from overfitting or when optimizing the empirical risk is hard. For the generalization of the algorithm, we prove a PAC-style bound on the training sample size for the expected $L_1$-loss to converge to the optimal loss when training data are V-geometrically ergodic Markov chains. For the robustness, we show that if the training target variable's values contain bounded noise, then the generalization bound of the algorithm deviates at most by the range of the noise. Our results can be applied to the regression problem, the classification problem, and the case where there exists an unknown deterministic target hypothesis.
Submitted 12 August, 2014; v1 submitted 12 June, 2014;
originally announced June 2014.
-
On the Detection of Concept Changes in Time-Varying Data Stream by Testing Exchangeability
Authors:
Shen-Shyang Ho,
Harry Wechsler
Abstract:
A martingale framework for concept change detection based on testing data exchangeability was recently proposed (Ho, 2005). In this paper, we describe the proposed change-detection test based on Doob's Maximal Inequality and show that it is an approximation of the sequential probability ratio test (SPRT). The relationship between the threshold value used in the proposed test and its size and power is deduced from the approximation. The mean delay time before a change is detected is estimated using the average sample number of an SPRT. The performance of the test using various threshold values is examined on five different data stream scenarios simulated using two synthetic data sets. Finally, experimental results show that the test is effective in detecting changes in time-varying data streams simulated using three benchmark data sets.
Submitted 4 July, 2012;
originally announced July 2012.
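The abstract above describes a change-detection test built on a martingale and Doob's Maximal Inequality. The sketch below shows one way such a test can be set up; it is not the authors' code. It assumes a randomized power martingale over p-values, a strangeness measure equal to the distance from the running mean, and hypothetical defaults `epsilon=0.92` and `threshold=100.0`. Under exchangeability the martingale has expectation 1, so the maximal inequality bounds the false-alarm probability by `1 / threshold`.

```python
import numpy as np

def martingale_change_detector(stream, epsilon=0.92, threshold=100.0, seed=0):
    """Return the index at which the martingale first reaches `threshold`, else None.

    Illustrative sketch: power martingale M_n = prod_i epsilon * p_i**(epsilon - 1),
    tracked in log space for numerical stability.
    """
    rng = np.random.default_rng(seed)
    log_m = 0.0
    seen = []
    for i, x in enumerate(stream):
        seen.append(float(x))
        pts = np.asarray(seen)
        alpha = np.abs(pts - pts.mean())        # strangeness: distance to the running mean
        theta = rng.random()                    # random tie-breaker in [0, 1)
        # Randomized p-value of the newest point among all points seen so far.
        p = (np.sum(alpha > alpha[-1]) + theta * np.sum(alpha == alpha[-1])) / len(pts)
        p = max(p, 1e-12)                       # guard against log(0)
        log_m += np.log(epsilon) + (epsilon - 1.0) * np.log(p)
        if log_m >= np.log(threshold):
            return i                            # reject exchangeability: change detected
    return None

# Toy stream (assumed data): the mean jumps from 0 to 4 at index 200.
rng = np.random.default_rng(1)
stream = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(4.0, 1.0, 200)])
alarm = martingale_change_detector(stream)
print("change flagged at index", alarm)
```

Raising `threshold` lowers the false-alarm bound at the cost of a longer detection delay, which is the trade-off between size, power, and mean delay time that the abstract refers to.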