Showing 1–40 of 40 results for author: Ho, S

  1. arXiv:2410.22065  [pdf, other]

    stat.ML cs.LG

    Hamiltonian Monte Carlo on ReLU Neural Networks is Inefficient

    Authors: Vu C. Dinh, Lam Si Tung Ho, Cuong V. Nguyen

    Abstract: We analyze the error rates of the Hamiltonian Monte Carlo algorithm with leapfrog integrator for Bayesian neural network inference. We show that due to the non-differentiability of activation functions in the ReLU family, leapfrog HMC for networks with these activation functions has a large local error rate of $Ω(ε)$ rather than the classical error rate of $O(ε^3)$. This leads to a higher rejectio…

    Submitted 29 October, 2024; originally announced October 2024.

    Comments: Paper published at NeurIPS 2024

  2. arXiv:2409.17968  [pdf, other]

    stat.ME

    Nonparametric Inference Framework for Time-dependent Epidemic Models

    Authors: Son Luu, Edward Susko, Lam Si Tung Ho

    Abstract: Compartmental models, especially the Susceptible-Infected-Removed (SIR) model, have long been used to understand the behaviour of various diseases. Allowing parameters, such as the transmission rate, to be time-dependent functions makes it possible to adjust for and make inferences about changes in the process due to mitigation strategies or evolutionary changes of the infectious agent. In this ar…

    Submitted 26 September, 2024; originally announced September 2024.

  3. arXiv:2406.02585  [pdf, other]

    cs.LG cs.AI stat.ML

    Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task

    Authors: Siavash Golkar, Alberto Bietti, Mariel Pettee, Michael Eickenberg, Miles Cranmer, Keiya Hirashima, Geraud Krawezik, Nicholas Lourie, Michael McCabe, Rudy Morel, Ruben Ohana, Liam Holden Parker, Bruno Régaldo-Saint Blancard, Kyunghyun Cho, Shirley Ho

    Abstract: Transformers have revolutionized machine learning across diverse domains, yet understanding their behavior remains crucial, particularly in high-stakes applications. This paper introduces the contextual counting task, a novel toy problem aimed at enhancing our understanding of Transformers in quantitative and scientific contexts. This task requires precise localization and computation within datas…

    Submitted 30 May, 2024; originally announced June 2024.

  4. arXiv:2312.17480  [pdf, other]

    q-bio.PE stat.ME

    Detection of evolutionary shifts in variance under an Ornstein-Uhlenbeck model

    Authors: Wensha Zhang, Lam Si Tung Ho, Toby Kenney

    Abstract: Abrupt environmental changes can lead to evolutionary shifts in not only the optimal trait value, but also the rate of adaptation and the diffusion variance in trait evolution. While several methods exist for detecting shifts in optimal values, few explicitly model shifts in both evolutionary variance and adaptation rates. We use a multi-optima and multi-variance Ornstein-Uhlenbeck (OU) process mo…

    Submitted 31 March, 2025; v1 submitted 29 December, 2023; originally announced December 2023.

  5. arXiv:2312.00656  [pdf, other]

    cs.LG cs.AI stat.ML

    Simple Transferability Estimation for Regression Tasks

    Authors: Cuong N. Nguyen, Phong Tran, Lam Si Tung Ho, Vu Dinh, Anh T. Tran, Tal Hassner, Cuong V. Nguyen

    Abstract: We consider transferability estimation, the problem of estimating how well deep learning models transfer from a source to a target task. We focus on regression tasks, which received little previous attention, and propose two simple and computationally efficient approaches that estimate transferability based on the negative regularized mean squared error of a linear regression model. We prove novel…

    Submitted 3 December, 2023; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: Paper published at The 39th Conference on Uncertainty in Artificial Intelligence (UAI) 2023

  6. arXiv:2310.05892  [pdf, ps, other]

    stat.ML cs.LG

    A Generalization Bound of Deep Neural Networks for Dependent Data

    Authors: Quan Huu Do, Binh T. Nguyen, Lam Si Tung Ho

    Abstract: Existing generalization bounds for deep neural networks require data to be independent and identically distributed (iid). This assumption may not hold in real-life applications such as evolutionary biology, infectious disease epidemiology, and stock price prediction. This work establishes a generalization bound of feed-forward neural networks for non-stationary $φ$-mixing data.

    Submitted 9 October, 2023; originally announced October 2023.

  7. arXiv:2310.02994  [pdf, other]

    cs.LG cs.AI stat.ML

    Multiple Physics Pretraining for Physical Surrogate Models

    Authors: Michael McCabe, Bruno Régaldo-Saint Blancard, Liam Holden Parker, Ruben Ohana, Miles Cranmer, Alberto Bietti, Michael Eickenberg, Siavash Golkar, Geraud Krawezik, Francois Lanusse, Mariel Pettee, Tiberiu Tesileanu, Kyunghyun Cho, Shirley Ho

    Abstract: We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling of spatiotemporal systems with transformers. In MPP, rather than training one model on a specific physical system, we train a backbone model to predict the dynamics of multiple heterogeneous physical systems simultaneously in order to learn features that are broadly…

    Submitted 10 December, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

  8. arXiv:2310.02989  [pdf, other]

    stat.ML cs.AI cs.CL cs.LG

    xVal: A Continuous Numerical Tokenization for Scientific Language Models

    Authors: Siavash Golkar, Mariel Pettee, Michael Eickenberg, Alberto Bietti, Miles Cranmer, Geraud Krawezik, Francois Lanusse, Michael McCabe, Ruben Ohana, Liam Parker, Bruno Régaldo-Saint Blancard, Tiberiu Tesileanu, Kyunghyun Cho, Shirley Ho

    Abstract: Due in part to their discontinuous and discrete default encodings for numbers, Large Language Models (LLMs) have not yet been commonly used to process numerically-dense scientific datasets. Rendering datasets as text, however, could help aggregate diverse and multi-modal scientific data into a single training corpus, thereby potentially facilitating the development of foundation models for science…

    Submitted 15 December, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: 15 pages, 12 figures. Appendix: 8 pages, 2 figures. Accepted contribution at the NeurIPS Workshop on ML for the Physical Sciences

  9. arXiv:2204.07646  [pdf, other]

    astro-ph.CO stat.AP

    Wavelet Moments for Cosmological Parameter Estimation

    Authors: Michael Eickenberg, Erwan Allys, Azadeh Moradinezhad Dizgah, Pablo Lemos, Elena Massara, Muntazir Abidi, ChangHoon Hahn, Sultan Hassan, Bruno Regaldo-Saint Blancard, Shirley Ho, Stephane Mallat, Joakim Andén, Francisco Villaescusa-Navarro

    Abstract: Extracting non-Gaussian information from the non-linear regime of structure formation is key to fully exploiting the rich data from upcoming cosmological surveys probing the large-scale structure of the universe. However, due to theoretical and computational complexities, this remains one of the main challenges in analyzing observational data. We present a set of summary statistics for cosmologica…

    Submitted 15 April, 2022; originally announced April 2022.

  10. arXiv:2204.06032  [pdf, other]

    q-bio.PE stat.ME

    Evolutionary shift detection with ensemble variable selection

    Authors: Wensha Zhang, Toby Kenney, Lam Si Tung Ho

    Abstract: 1. Abrupt environmental changes can lead to evolutionary shifts in trait evolution. Identifying these shifts is an important step in understanding the evolutionary history of phenotypes. 2. We propose an ensemble variable selection method (R package ELPASO) for the evolutionary shift detection task and compare it with existing methods (R packages l1ou and PhylogeneticEM) under several scenarios.…

    Submitted 12 April, 2022; originally announced April 2022.

  11. arXiv:2109.13061  [pdf, other]

    cs.LG stat.ML

    Searching for Minimal Optimal Neural Networks

    Authors: Lam Si Tung Ho, Vu Dinh

    Abstract: Large neural network models have high predictive power but may suffer from overfitting if the training set is not large enough. Therefore, it is desirable to select an appropriate size for neural networks. The destructive approach, which starts with a large architecture and then reduces the size using a Lasso-type penalty, has been used extensively for this task. Despite its popularity, there is n…

    Submitted 27 September, 2021; originally announced September 2021.

  12. arXiv:2104.00151  [pdf, other]

    stat.ME q-bio.PE

    Ancestral state reconstruction with large numbers of sequences and edge-length estimation

    Authors: Lam Si Tung Ho, Edward Susko

    Abstract: Likelihood-based methods are widely considered the best approaches for reconstructing ancestral states. Although much effort has been made to study properties of these methods, previous works often assume that both the tree topology and edge lengths are known. In some scenarios the tree topology might be reasonably well known for the taxa under study. When sequence length is much smaller than the…

    Submitted 31 March, 2021; originally announced April 2021.

  13. arXiv:2101.04117  [pdf, other]

    astro-ph.EP astro-ph.IM cs.AI cs.LG stat.ML

    A Bayesian neural network predicts the dissolution of compact planetary systems

    Authors: Miles Cranmer, Daniel Tamayo, Hanno Rein, Peter Battaglia, Samuel Hadden, Philip J. Armitage, Shirley Ho, David N. Spergel

    Abstract: Despite over three hundred years of effort, no solutions exist for predicting when a general planetary configuration will become unstable. We introduce a deep learning architecture to push forward this problem for compact systems. While current machine learning algorithms in this area rely on scientist-derived instability metrics, our new technique learns its own metrics from scratch, enabled by a…

    Submitted 11 January, 2021; originally announced January 2021.

    Comments: 8 content pages, 7 appendix and references. 8 figures. Source code at: https://github.com/MilesCranmer/bnn_chaos_model inference code at https://github.com/dtamayo/spock

  14. arXiv:2010.08097  [pdf, other]

    cs.LG math.ST stat.ML

    Consistent Feature Selection for Analytic Deep Neural Networks

    Authors: Vu Dinh, Lam Si Tung Ho

    Abstract: One of the most important steps toward interpretability and explainability of neural network models is feature selection, which aims to identify the subset of relevant features. Theoretical results in the field have mostly focused on the prediction aspect of the problem with virtually no work on feature selection consistency for deep neural networks due to the model's severe nonlinearity and unide…

    Submitted 15 October, 2020; originally announced October 2020.

  15. arXiv:2007.04459  [pdf, other]

    cs.LG astro-ph.GA stat.ML

    Meta-Learning for One-Class Classification with Few Examples using Order-Equivariant Network

    Authors: Ademola Oladosu, Tony Xu, Philip Ekfeldt, Brian A. Kelly, Miles Cranmer, Shirley Ho, Adrian M. Price-Whelan, Gabriella Contardo

    Abstract: This paper presents a meta-learning framework for few-shots One-Class Classification (OCC) at test-time, a setting where labeled examples are only available for the positive class, and no supervision is given for the negative example. We consider that we have a set of `one-class classification' objective-tasks with only a small set of positive examples available for each task, and a set of trainin…

    Submitted 21 May, 2021; v1 submitted 8 July, 2020; originally announced July 2020.

  16. arXiv:2006.11287  [pdf, other]

    cs.LG astro-ph.CO astro-ph.IM physics.comp-ph stat.ML

    Discovering Symbolic Models from Deep Learning with Inductive Biases

    Authors: Miles Cranmer, Alvaro Sanchez-Gonzalez, Peter Battaglia, Rui Xu, Kyle Cranmer, David Spergel, Shirley Ho

    Abstract: We develop a general approach to distill symbolic representations of a learned deep model by introducing strong inductive biases. We focus on Graph Neural Networks (GNNs). The technique works as follows: we first encourage sparse latent representations when we train a GNN in a supervised setting, then we apply symbolic regression to components of the learned model to extract explicit physical rela…

    Submitted 17 November, 2020; v1 submitted 19 June, 2020; originally announced June 2020.

    Comments: Accepted to NeurIPS 2020. 9 pages content + 16 pages appendix/references. Supporting code found at https://github.com/MilesCranmer/symbolic_deep_learning

  17. arXiv:2006.00334  [pdf, other]

    stat.ML cs.LG math.ST

    Consistent feature selection for neural networks via Adaptive Group Lasso

    Authors: Vu Dinh, Lam Si Tung Ho

    Abstract: One main obstacle for the wide use of deep learning in medical and engineering sciences is its interpretability. While neural network models are strong tools for making predictions, they often provide little information about which features play significant roles in influencing the prediction accuracy. To overcome this issue, many regularization procedures for learning with neural networks have be…

    Submitted 2 December, 2021; v1 submitted 30 May, 2020; originally announced June 2020.

  18. arXiv:2003.10336  [pdf, other]

    stat.AP q-bio.PE

    Efficient Bayesian Inference of General Gaussian Models on Large Phylogenetic Trees

    Authors: Paul Bastide, Lam Si Tung Ho, Guy Baele, Philippe Lemey, Marc A Suchard

    Abstract: Phylogenetic comparative methods correct for shared evolutionary history among a set of non-independent organisms by modeling sample traits as arising from a diffusion process along the branches of a possibly unknown history. To incorporate such uncertainty, we present a scalable Bayesian inference framework under a general Gaussian trait evolution model that exploits Hamiltonian Monte Carlo (H…

    Submitted 29 September, 2020; v1 submitted 23 March, 2020; originally announced March 2020.

  19. arXiv:2003.04630  [pdf, other]

    cs.LG math.DS physics.comp-ph physics.data-an stat.ML

    Lagrangian Neural Networks

    Authors: Miles Cranmer, Sam Greydanus, Stephan Hoyer, Peter Battaglia, David Spergel, Shirley Ho

    Abstract: Accurate models of the world are built upon notions of its underlying symmetries. In physics, these symmetries correspond to conservation laws, such as for energy and momentum. Yet even though neural network models see increasing use in the physical sciences, they struggle to learn these symmetries. In this paper, we propose Lagrangian Neural Networks (LNNs), which can parameterize arbitrary Lagra…

    Submitted 30 July, 2020; v1 submitted 10 March, 2020; originally announced March 2020.

    Comments: 7 pages (+2 appendix). Published in ICLR 2020 Deep Differential Equations Workshop. Code at github.com/MilesCranmer/lagrangian_nns

  20. arXiv:1909.05862  [pdf, other]

    cs.LG astro-ph.IM physics.comp-ph stat.ML

    Learning Symbolic Physics with Graph Networks

    Authors: Miles D. Cranmer, Rui Xu, Peter Battaglia, Shirley Ho

    Abstract: We introduce an approach for imposing physically motivated inductive biases on graph networks to learn interpretable representations and improved zero-shot generalization. Our experiments show that our graph network models, which implement this inductive bias, can learn message representations equivalent to the true force vector when trained on n-body gravitational and spring-like simulations. We…

    Submitted 1 November, 2019; v1 submitted 12 September, 2019; originally announced September 2019.

    Comments: 6 pages; references added + improvements to writing and clarity; accepted for an oral presentation at Machine Learning and the Physical Sciences Workshop @ NeurIPS 2019

  21. arXiv:1908.08045  [pdf, other]

    astro-ph.IM astro-ph.GA cs.LG stat.ML

    Modeling the Gaia Color-Magnitude Diagram with Bayesian Neural Flows to Constrain Distance Estimates

    Authors: Miles D. Cranmer, Richard Galvez, Lauren Anderson, David N. Spergel, Shirley Ho

    Abstract: We demonstrate an algorithm for learning a flexible color-magnitude diagram from noisy parallax and photometry measurements using a normalizing flow, a deep neural network capable of learning an arbitrary multi-dimensional probability distribution. We present a catalog of 640M photometric distance posteriors to nearby stars derived from this data-driven model using Gaia DR2 photometry and parallax…

    Submitted 21 August, 2019; originally announced August 2019.

    Comments: 15 pages, 8 figures

  22. arXiv:1906.03222  [pdf, other]

    stat.ME stat.CO

    Inferring phenotypic trait evolution on large trees with many incomplete measurements

    Authors: Gabriel Hassler, Max R. Tolkoff, William L. Allen, Lam Si Tung Ho, Philippe Lemey, Marc A. Suchard

    Abstract: Comparative biologists are often interested in inferring covariation between multiple biological traits sampled across numerous related taxa. To properly study these relationships, we must control for the shared evolutionary history of the taxa to avoid spurious inference. Existing control techniques almost universally scale poorly as the number of taxa increases. An additional challenge arises as…

    Submitted 7 June, 2019; originally announced June 2019.

    Comments: 29 pages, 7 figures, 2 tables, 3 supplementary sections

  23. arXiv:1906.02179  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Bayesian Active Learning With Abstention Feedbacks

    Authors: Cuong V. Nguyen, Lam Si Tung Ho, Huan Xu, Vu Dinh, Binh Nguyen

    Abstract: We study pool-based active learning with abstention feedbacks where a labeler can abstain from labeling a queried example with some unknown abstention rate. This is an important problem with many useful applications. We take a Bayesian approach to the problem and develop two new greedy algorithms that learn both the classification problem and the unknown abstention rate at the same time. These are…

    Submitted 30 December, 2020; v1 submitted 4 June, 2019; originally announced June 2019.

    Comments: Poster presented at 2019 ICML Workshop on Human in the Loop Learning 2019 (non-archival). arXiv admin note: substantial text overlap with arXiv:1705.08481

  24. arXiv:1811.10115  [pdf, other]

    cs.IT cs.LG stat.ML

    Recovery guarantees for polynomial approximation from dependent data with outliers

    Authors: Lam Si Tung Ho, Hayden Schaeffer, Giang Tran, Rachel Ward

    Abstract: Learning non-linear systems from noisy, limited, and/or dependent data is an important task across various scientific fields including statistics, engineering, computer science, mathematics, and many more. In general, this learning task is ill-posed; however, additional information about the data's structure or on the behavior of the unknown function can make the task well-posed. In this work, we…

    Submitted 25 November, 2018; originally announced November 2018.

    Comments: 17 pages, 1 figure

    MSC Class: 68T05; 41A10; 60F05; 68Q32; 62G08; 94A15; 65K10

  25. arXiv:1805.00159  [pdf, other]

    astro-ph.CO stat.AP

    Detecting Galaxy-Filament Alignments in the Sloan Digital Sky Survey III

    Authors: Yen-Chi Chen, Shirley Ho, Jonathan Blazek, Siyu He, Rachel Mandelbaum, Peter Melchior, Sukhdeep Singh

    Abstract: Previous studies have shown the filamentary structures in the cosmic web influence the alignments of nearby galaxies. We study this effect in the LOWZ sample of the Sloan Digital Sky Survey using the "Cosmic Web Reconstruction" filament catalogue. We find that LOWZ galaxies exhibit a small but statistically significant alignment in the direction parallel to the orientation of nearby filaments. Thi…

    Submitted 21 February, 2019; v1 submitted 30 April, 2018; originally announced May 2018.

    Comments: 14 pages, 13 figures. Accepted to the MNRAS

  26. arXiv:1711.02033  [pdf, other]

    astro-ph.CO cs.LG stat.ML

    Estimating Cosmological Parameters from the Dark Matter Distribution

    Authors: Siamak Ravanbakhsh, Junier Oliva, Sebastien Fromenteau, Layne C. Price, Shirley Ho, Jeff Schneider, Barnabas Poczos

    Abstract: A grand challenge of 21st-century cosmology is to accurately estimate the cosmological parameters of our Universe. A major approach to estimating the cosmological parameters is to use the large-scale matter distribution of the Universe. Galaxy surveys provide the means to map out cosmic large-scale structure in three dimensions. Information about galaxy locations is typically summarized in a "…

    Submitted 6 November, 2017; originally announced November 2017.

    Comments: ICML 2016

  27. arXiv:1706.01643  [pdf]

    cs.LG q-bio.QM stat.ML

    Retrosynthetic reaction prediction using neural sequence-to-sequence models

    Authors: Bowen Liu, Bharath Ramsundar, Prasad Kawthekar, Jade Shi, Joseph Gomes, Quang Luu Nguyen, Stephen Ho, Jack Sloane, Paul Wender, Vijay Pande

    Abstract: We describe a fully data driven model that learns to perform a retrosynthetic reaction prediction task, which is treated as a sequence-to-sequence mapping problem. The end-to-end trained model has an encoder-decoder architecture that consists of two recurrent neural networks, which has previously shown great success in solving other sequence-to-sequence prediction tasks such as machine translation…

    Submitted 6 June, 2017; originally announced June 2017.

  28. arXiv:1705.08481   

    stat.ML cs.LG

    Bayesian Pool-based Active Learning With Abstention Feedbacks

    Authors: Cuong V. Nguyen, Lam Si Tung Ho, Huan Xu, Vu Dinh, Binh Nguyen

    Abstract: We study pool-based active learning with abstention feedbacks, where a labeler can abstain from labeling a queried example with some unknown abstention rate. This is an important problem with many useful applications. We take a Bayesian approach to the problem and develop two new greedy algorithms that learn both the classification problem and the unknown abstention rate at the same time. These ar…

    Submitted 2 January, 2021; v1 submitted 23 May, 2017; originally announced May 2017.

    Comments: There is a new version at arXiv:1906.02179

  29. arXiv:1609.09481  [pdf, ps, other]

    stat.ML cs.LG

    Fast learning rates with heavy-tailed losses

    Authors: Vu Dinh, Lam Si Tung Ho, Duy Nguyen, Binh T. Nguyen

    Abstract: We study fast learning rates when the losses are not necessarily bounded and may have a distribution with heavy tails. To enable such analyses, we introduce two new conditions: (i) the envelope function $\sup_{f \in \mathcal{F}}|\ell \circ f|$, where $\ell$ is the loss function and $\mathcal{F}$ is the hypothesis class, exists and is $L^r$-integrable, and (ii) $\ell$ satisfies the multi-scale Bern…

    Submitted 29 September, 2016; originally announced September 2016.

    Comments: Advances in Neural Information Processing Systems (NIPS 2016): 11 pages

  30. arXiv:1608.06769  [pdf, other]

    stat.CO q-bio.PE

    Direct likelihood-based inference for discretely observed stochastic compartmental models of infectious disease

    Authors: Lam Si Tung Ho, Forrest W. Crawford, Marc A. Suchard

    Abstract: Stochastic compartmental models are important tools for understanding the course of infectious disease epidemics in populations and in prospective evaluation of intervention policies. However, calculating the likelihood for discretely observed data from even simple models -- such as the ubiquitous susceptible-infectious-removed (SIR) model -- has been considered computationally intractable, since…

    Submitted 25 July, 2018; v1 submitted 24 August, 2016; originally announced August 2016.

  31. arXiv:1603.03819  [pdf, other]

    stat.CO

    Birth/birth-death processes and their computable transition probabilities with biological applications

    Authors: Lam Si Tung Ho, Jason Xu, Forrest W. Crawford, Vladimir N. Minin, Marc A. Suchard

    Abstract: Birth-death processes track the size of a univariate population, but many biological systems involve interaction between populations, necessitating models for two or more populations simultaneously. A lack of efficient methods for evaluating finite-time transition probabilities of bivariate processes, however, has restricted statistical inference in these models. Researchers rely on computationall…

    Submitted 7 August, 2017; v1 submitted 11 March, 2016; originally announced March 2016.

  32. arXiv:1512.07948  [pdf, other]

    q-bio.PE stat.ME

    A Relaxed Drift Diffusion Model for Phylogenetic Trait Evolution

    Authors: Mandev S. Gill, Lam Si Tung Ho, Guy Baele, Philippe Lemey, Marc A. Suchard

    Abstract: Understanding the processes that give rise to quantitative measurements associated with molecular sequence data remains an important issue in statistical phylogenetics. Examples of such measurements include geographic coordinates in the context of phylogeography and phenotypic traits in the context of comparative studies. A popular approach is to model the evolution of continuously varying traits…

    Submitted 29 December, 2015; v1 submitted 24 December, 2015; originally announced December 2015.

    Comments: 35 pages, 3 figures, 5 tables. Changed from double-spaced to single-spaced

  33. arXiv:1509.06443  [pdf, other]

    astro-ph.CO stat.AP

    Cosmic Web Reconstruction through Density Ridges: Catalogue

    Authors: Yen-Chi Chen, Shirley Ho, Jon Brinkmann, Peter E. Freeman, Christopher R. Genovese, Donald P. Schneider, Larry Wasserman

    Abstract: We construct a catalogue for filaments using a novel approach called SCMS (subspace constrained mean shift; Ozertem & Erdogmus 2011; Chen et al. 2015). SCMS is a gradient-based method that detects filaments through density ridges (smooth curves tracing high-density regions). A great advantage of SCMS is its uncertainty measure, which allows an evaluation of the errors for the detected filaments. T…

    Submitted 21 September, 2015; originally announced September 2015.

    Comments: 14 pages, 12 figures, 4 tables

  34. arXiv:1509.06376  [pdf, other]

    astro-ph.GA astro-ph.CO stat.AP

    Detecting Effects of Filaments on Galaxy Properties in the Sloan Digital Sky Survey III

    Authors: Yen-Chi Chen, Shirley Ho, Rachel Mandelbaum, Neta A. Bahcall, Joel R. Brownstein, Peter E. Freeman, Christopher R. Genovese, Donald P. Schneider, Larry Wasserman

    Abstract: We study the effects of filaments on galaxy properties in the Sloan Digital Sky Survey (SDSS) Data Release 12 using filaments from the `Cosmic Web Reconstruction' catalogue (Chen et al. 2016), a publicly available filament catalogue for SDSS. Since filaments are tracers of medium-to-high density regions, we expect that galaxy properties associated with the environment are dependent on the distance…

    Submitted 12 January, 2017; v1 submitted 21 September, 2015; originally announced September 2015.

    Comments: To appear in MNRAS

  35. arXiv:1508.04149  [pdf, other]

    astro-ph.CO stat.AP

    Investigating Galaxy-Filament Alignments in Hydrodynamic Simulations using Density Ridges

    Authors: Yen-Chi Chen, Shirley Ho, Ananth Tenneti, Rachel Mandelbaum, Rupert Croft, Tiziana DiMatteo, Peter E. Freeman, Christopher R. Genovese, Larry Wasserman

    Abstract: In this paper, we study the filamentary structures and the galaxy alignment along filaments at redshift $z=0.06$ in the MassiveBlack-II simulation, a state-of-the-art, high-resolution hydrodynamical cosmological simulation which includes stellar and AGN feedback in a volume of (100 Mpc$/h$)$^3$. The filaments are constructed using the subspace constrained mean shift (SCMS; Ozertem & Erdogmus (2011…

    Submitted 17 August, 2015; originally announced August 2015.

    Comments: 11 pages, 10 figures

  36. arXiv:1506.02278  [pdf, other]

    stat.ME stat.ML

    Optimal Ridge Detection using Coverage Risk

    Authors: Yen-Chi Chen, Christopher R. Genovese, Shirley Ho, Larry Wasserman

    Abstract: We introduce the concept of coverage risk as an error measure for density ridge estimation. The coverage risk generalizes the mean integrated square error to set estimation. We propose two risk estimators for the coverage risk and we show that we can select tuning parameters by minimizing the estimated risk. We study the rate of convergence for coverage risk and prove consistency of the risk estim…

    Submitted 7 June, 2015; originally announced June 2015.

    Comments: 16 pages, 4 figures

  37. arXiv:1501.05303  [pdf, other]

    astro-ph.CO stat.AP

    Cosmic Web Reconstruction through Density Ridges: Method and Algorithm

    Authors: Yen-Chi Chen, Shirley Ho, Peter E. Freeman, Christopher R. Genovese, Larry Wasserman

    Abstract: The detection and characterization of filamentary structures in the cosmic web allows cosmologists to constrain parameters that dictate the evolution of the Universe. While many filament estimators have been proposed, they generally lack estimates of uncertainty, reducing their inferential power. In this paper, we demonstrate how one may apply the Subspace Constrained Mean Shift (SCMS) algorithm…

    Submitted 27 August, 2015; v1 submitted 21 January, 2015; originally announced January 2015.

    Comments: To appear in MNRAS. 18 pages, 19 figures, 1 table

  38. arXiv:1408.2714  [pdf, ps, other]

    stat.ML

    Learning From Non-iid Data: Fast Rates for the One-vs-All Multiclass Plug-in Classifiers

    Authors: Vu Dinh, Lam Si Tung Ho, Nguyen Viet Cuong, Duy Nguyen, Binh T. Nguyen

    Abstract: We prove new fast learning rates for the one-vs-all multiclass plug-in classifiers trained either from exponentially strongly mixing data or from data generated by a converging drifting distribution. These are two typical scenarios where training data are not iid. The learning rates are obtained under a multiclass version of Tsybakov's margin assumption, a type of low-noise assumption, and do not…

    Submitted 24 January, 2015; v1 submitted 12 August, 2014; originally announced August 2014.

    Comments: 12th Annual Conference on Theory and Applications of Models of Computation (TAMC 2015)

  39. arXiv:1406.3166  [pdf, ps, other]

    stat.ML

    Generalization and Robustness of Batched Weighted Average Algorithm with V-geometrically Ergodic Markov Data

    Authors: Nguyen Viet Cuong, Lam Si Tung Ho, Vu Dinh

    Abstract: We analyze the generalization and robustness of the batched weighted average algorithm for V-geometrically ergodic Markov data. This algorithm is a good alternative to the empirical risk minimization algorithm when the latter suffers from overfitting or when optimizing the empirical risk is hard. For the generalization of the algorithm, we prove a PAC-style bound on the training sample size for th…

    Submitted 12 August, 2014; v1 submitted 12 June, 2014; originally announced June 2014.

    Comments: This article was published in Proceedings of the 24th International Conference on Algorithmic Learning Theory (ALT 2013). This is the accepted version. The final publication is available at link.springer.com

  40. arXiv:1207.1379  [pdf]

    cs.LG stat.ML

    On the Detection of Concept Changes in Time-Varying Data Stream by Testing Exchangeability

    Authors: Shen-Shyang Ho, Harry Wechsler

    Abstract: A martingale framework for concept change detection based on testing data exchangeability was recently proposed (Ho, 2005). In this paper, we describe the proposed change-detection test based on Doob's Maximal Inequality and show that it is an approximation of the sequential probability ratio test (SPRT). The relationship between the threshold value used in the proposed test and its size and p…

    Submitted 4 July, 2012; originally announced July 2012.

    Comments: Appears in Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI2005)

    Report number: UAI-P-2005-PG-267-274