-
Incorporating circuit theory into a dynamic model for crowd-sourced observations of migratory birds
Authors:
Michael F. Christensen,
Peter D. Hoff
Abstract:
While the overarching pattern of biannual avian migration is well understood, there are significant questions pertaining to this phenomenon that invite further study. Necessary to any analysis of these questions is an understanding of how a given species' spatial distribution evolves in time. While studies of animal movement are often conducted using telemetry data, the collection of such data can be time- and resource-intensive, frequently resulting in small sample sizes. Ecological surveys of animal populations are also indicative of species distribution trends, but may be constrained to a limited spatial domain. Within this article we utilize crowd-sourced observations from the eBird database to model the abundance of migratory bird species in space and time. While crowd-sourced observations are individually less reliable than those produced by experts, the sheer size and spatial coverage of the eBird database make it attractive for use in this setting. We introduce a hidden Markov model for observed bird counts utilizing a novel transition structure developed using principles from circuit theory. After illustrating model properties, we fit it to observations of Baltimore orioles and yellow-rumped warblers within the eastern United States and discuss the insight it provides into the migratory patterns of these species.
Submitted 2 July, 2024;
originally announced July 2024.
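One standard bridge between circuit theory and Markov chains is the random walk on an electrical network: from node i the walker moves to node j with probability proportional to the conductance c_ij. The sketch below illustrates that general idea under stated assumptions (a hypothetical four-cell path graph with made-up conductances, including self-loop conductances so a bird may stay put). It builds the transition matrix and propagates a spatial distribution forward in time; it is not the transition structure defined in the paper.

```python
def transition_matrix(conductance):
    """Row-normalize a symmetric conductance matrix: P[i][j] = c_ij / sum_k c_ik."""
    P = []
    for row in conductance:
        total = sum(row)
        P.append([c / total for c in row])
    return P


def step(dist, P):
    """One step of the chain: dist_next[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


# Hypothetical path graph 0-1-2-3; diagonal entries are self-loop conductances.
C = [
    [1.0, 2.0, 0.0, 0.0],
    [2.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 3.0],
    [0.0, 0.0, 3.0, 1.0],
]
P = transition_matrix(C)

dist = [1.0, 0.0, 0.0, 0.0]  # all abundance starts in cell 0
for _ in range(3):
    dist = step(dist, P)
print([round(p, 4) for p in dist])
```

Because the conductances are symmetric, this chain is reversible and its stationary distribution is proportional to each node's total conductance (here 3, 4, 5 and 4 out of 16), which is one reason conductance-based transitions are convenient to interpret.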
-
A dimension reduction approach to edge weight estimation for use in spatial models
Authors:
Michael F. Christensen,
Jo Eidsvik
Abstract:
Models for areal data are traditionally defined using the neighborhood structure of the regions on which data are observed. The unweighted adjacency matrix of a graph is commonly used to characterize the relationships between locations, resulting in the implicit assumption that all pairs of neighboring regions interact similarly, an assumption which may not be true in practice. It has been shown that more complex spatial relationships between graph nodes may be represented when edge weights are allowed to vary. Christensen and Hoff (2023) introduced a covariance model for data observed on graphs which is more flexible than traditional alternatives, parameterizing covariance as a function of an unknown edge weights matrix. A potential issue with their approach is that each edge weight is treated as a unique parameter, resulting in increasingly challenging parameter estimation as graph size increases. Within this article we propose a framework for estimating edge weight matrices that reduces their effective dimension via a basis function representation of the edge weights. We show that this method may be used to enhance the performance and flexibility of covariance models parameterized by such matrices in a series of illustrations, simulations and data examples.
Submitted 2 July, 2024;
originally announced July 2024.
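To make the dimension-reduction idea concrete, here is a minimal sketch in which every edge weight is a function of a small number of basis coefficients rather than a free parameter. The polynomial basis, the per-edge covariate z_e, and the log link w_e = exp(sum_k beta_k phi_k(z_e)) are all assumptions chosen for illustration; the abstract does not specify which basis functions or link the paper uses.

```python
import math


def polynomial_basis(z, K):
    """Hypothetical basis: phi_k(z) = z**k for k = 0..K-1."""
    return [z ** k for k in range(K)]


def edge_weights(edges, edge_covariate, beta):
    """Map K basis coefficients to one positive weight per edge.

    w_e = exp(sum_k beta_k * phi_k(z_e)); the exponential keeps weights positive.
    """
    K = len(beta)
    weights = {}
    for e in edges:
        phi = polynomial_basis(edge_covariate[e], K)
        weights[e] = math.exp(sum(b * p for b, p in zip(beta, phi)))
    return weights


def weighted_adjacency(n, weights):
    """Symmetric n x n matrix with W[i][j] = W[j][i] = w_(i,j); zero off-graph."""
    W = [[0.0] * n for _ in range(n)]
    for (i, j), w in weights.items():
        W[i][j] = W[j][i] = w
    return W


# Cycle graph on 6 nodes: 6 edge weights, but only 2 coefficients to estimate.
n = 6
edges = [(i, (i + 1) % n) for i in range(n)]
z = {e: e[0] / (n - 1) for e in edges}  # made-up edge covariate in [0, 1]
beta = [0.0, 1.5]                       # made-up coefficients
W = weighted_adjacency(n, edge_weights(edges, z, beta))
```

The number of unknowns is len(beta) regardless of how many edges the graph has, which is the property that keeps estimation tractable as the graph grows.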
-
Joint Multivariate and Functional Modeling for Plant Traits and Reflectances
Authors:
Philip A. White,
Michael F. Christensen,
Henry Frye,
Alan E. Gelfand,
John A. Silander Jr
Abstract:
The investigation of leaf-level traits in response to varying environmental conditions has immense importance for understanding plant ecology. Remote sensing technology enables measurement of the reflectance of plants to make inferences about underlying traits along environmental gradients. While much focus has been placed on understanding how reflectance and traits are related at the leaf level, the challenge of modeling the dependence of this relationship along environmental gradients has limited this line of inquiry. Here, we take up the problem of jointly modeling traits and reflectance given environment. Our objective is to assess not only response to environmental regressors but also dependence between trait levels and the reflectance spectrum in the context of this regression. This leads to joint modeling of a response vector of traits with reflectance arising as a functional response over the wavelength spectrum. To conduct this investigation, we employ a dataset from a global biodiversity hotspot, the Greater Cape Floristic Region in South Africa.
Submitted 1 October, 2022;
originally announced October 2022.
-
A flexible and interpretable spatial covariance model for data on graphs
Authors:
Michael F. Christensen,
Peter D. Hoff
Abstract:
Spatial models for areal data are often constructed such that all pairs of adjacent regions are assumed to have near-identical spatial autocorrelation. In practice, data can exhibit dependence structures more complicated than can be represented under this assumption. In this article we develop a new model for spatially correlated data observed on graphs, which can flexibly represent many types of spatial dependence patterns while retaining aspects of the original graph geometry. Our method implies an embedding of the graph into Euclidean space wherein covariance can be modeled using traditional covariance functions, such as those from the Matérn family. We parameterize our model using a class of graph metrics compatible with such covariance functions, and which characterize distance in terms of network flow, a property useful for understanding proximity in many ecological settings. By estimating the parameters underlying these metrics, we recover the "intrinsic distances" between graph nodes, which assist in the interpretation of the estimated covariance and allow us to better understand the relationship between the observed process and spatial domain. We compare our model to existing methods for spatially dependent graph data, primarily conditional autoregressive models and their variants, and illustrate advantages of our method over traditional approaches. We fit our model to bird abundance data for several species in North Carolina, and show how it provides insight into the interactions between species-specific spatial distributions and geography.
Submitted 2 July, 2024; v1 submitted 21 July, 2022;
originally announced July 2022.
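The abstract does not name its flow-based metric, so the sketch below assumes resistance distance, a well-known graph metric obtained by treating the graph as an electrical network with edge weights as conductances. With L the graph Laplacian and L^+ its pseudoinverse, R_ij = L^+_ii + L^+_jj - 2 L^+_ij; this R is a squared Euclidean distance between rows of (L^+)^(1/2), so its square root can be passed to a Matérn covariance (here the exponential case, smoothness 1/2). The graph, sigma2 and phi values are made up for illustration.

```python
import numpy as np


def resistance_distance(W):
    """Effective resistance between all node pairs of a connected graph.

    W is a symmetric matrix of edge conductances. With L = D - W and L^+ its
    Moore-Penrose pseudoinverse, R_ij = L^+_ii + L^+_jj - 2 L^+_ij.
    """
    L = np.diag(W.sum(axis=1)) - W
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2.0 * Lp


def exponential_covariance(D, sigma2=1.0, phi=1.0):
    """Matern covariance with smoothness 1/2: sigma2 * exp(-D / phi)."""
    return sigma2 * np.exp(-D / phi)


# Path graph 0-1-2-3 with unit conductances: resistances add in series.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
R = resistance_distance(W)

# R is a squared Euclidean distance in the embedding, so take its square root
# (clipping tiny negative round-off on the diagonal) before applying the kernel.
Sigma = exponential_covariance(np.sqrt(np.maximum(R, 0.0)), phi=2.0)
```

Raising the conductance on an edge lowers the resistance across it and therefore raises the modeled covariance between its endpoints, which is how edge-level parameters can reshape the implied "intrinsic distances".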
-
Spatial Functional Data Modeling of Plant Reflectances
Authors:
Philip A. White,
Henry Frye,
Michael F. Christensen,
Alan E. Gelfand,
John A. Silander Jr
Abstract:
Plant reflectance spectra, the profile of light reflected by leaves across different wavelengths, supply the spectral signature for a species at a spatial location, enabling estimation of functional and taxonomic diversity for plants. We consider leaf spectra as "responses" to be explained spatially. These spectra/reflectances are functions over a wavelength band that respond to the environment.
Our motivating data are gathered for several families from the Cape Floristic Region (CFR) in South Africa and lead us to develop rich, novel spatial models that can explain spectra for genera within families. Wavelength responses for an individual leaf are viewed as a function of wavelength, leading to functional data modeling. Local environmental features become covariates. We introduce wavelength-covariate interaction because the response to environmental regressors, and likewise the variance, may vary with wavelength. Formal spatial modeling enables prediction of reflectances for genera at unobserved locations with known environmental features. We incorporate spatial dependence, wavelength dependence, and space-wavelength interaction (in the spirit of space-time interaction). We implement out-of-sample validation to select a best model, discovering that the model features listed above are all informative for the functional data analysis. We then supply interpretation of the results under the selected model.
Submitted 25 March, 2021; v1 submitted 5 February, 2021;
originally announced February 2021.
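A wavelength-covariate interaction means the effect of an environmental covariate x on reflectance is itself a function of wavelength. A minimal way to express this, sketched below under assumptions, is a varying-coefficient mean mu(lam, x) = sum_b phi_b(lam) * (alpha_b + gamma_b * x). The Fourier basis, the 400 to 2400 nm wavelength range, and all coefficient values are hypothetical; the paper's actual basis, priors, and spatial terms are not reproduced here.

```python
import math


def fourier_basis(lam, lam_min=400.0, lam_max=2400.0, n_pairs=2):
    """Hypothetical wavelength basis: a constant plus sine/cosine pairs on
    the rescaled wavelength t in [0, 1] (wavelengths in nm, range assumed)."""
    t = (lam - lam_min) / (lam_max - lam_min)
    phi = [1.0]
    for m in range(1, n_pairs + 1):
        phi += [math.sin(2 * math.pi * m * t), math.cos(2 * math.pi * m * t)]
    return phi


def mean_reflectance(lam, x, alpha, gamma):
    """mu(lam, x) = sum_b phi_b(lam) * (alpha_b + gamma_b * x).

    The covariate effect d mu / d x = sum_b gamma_b * phi_b(lam) changes with
    wavelength, i.e. a wavelength-covariate interaction.
    """
    phi = fourier_basis(lam)
    return sum(p * (a + g * x) for p, a, g in zip(phi, alpha, gamma))


alpha = [0.30, 0.05, -0.10, 0.02, 0.01]  # made-up baseline coefficients
gamma = [0.01, 0.04, 0.00, -0.02, 0.00]  # made-up interaction coefficients


def effect_at(lam):
    """Change in mean reflectance at wavelength lam when x goes from 0 to 1."""
    return mean_reflectance(lam, 1.0, alpha, gamma) - mean_reflectance(lam, 0.0, alpha, gamma)


print(round(effect_at(400.0), 4), round(effect_at(900.0), 4))
```

Setting all gamma_b except the constant one to zero would collapse this to a single covariate effect shared by every wavelength, which is the no-interaction special case.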
-
Olympus: a benchmarking framework for noisy optimization and experiment planning
Authors:
Florian Häse,
Matteo Aldeghi,
Riley J. Hickman,
Loïc M. Roch,
Melodie Christensen,
Elena Liles,
Jason E. Hein,
Alán Aspuru-Guzik
Abstract:
Research challenges encountered across science, engineering, and economics can frequently be formulated as optimization tasks. In chemistry and materials science, recent growth in laboratory digitization and automation has sparked interest in optimization-guided autonomous discovery and closed-loop experimentation. Experiment planning strategies based on off-the-shelf optimization algorithms can be employed in fully autonomous research platforms to achieve desired experimentation goals with the minimum number of trials. However, the experiment planning strategy that is most suitable to a scientific discovery task is a priori unknown, while rigorous comparisons of different strategies are highly time- and resource-demanding. As optimization algorithms are typically benchmarked on low-dimensional synthetic functions, it is unclear how their performance would translate to noisy, higher-dimensional experimental tasks encountered in chemistry and materials science. We introduce Olympus, a software package that provides a consistent and easy-to-use framework for benchmarking optimization algorithms against realistic experiments emulated via probabilistic deep-learning models. Olympus includes a collection of experimentally derived benchmark sets from chemistry and materials science and a suite of experiment planning strategies that can be easily accessed via a user-friendly Python interface. Furthermore, Olympus facilitates the integration, testing, and sharing of custom algorithms and user-defined datasets. In brief, Olympus mitigates the barriers associated with benchmarking optimization algorithms on realistic experimental scenarios, promoting data sharing and the creation of a standard framework for evaluating the performance of experiment planning strategies.
Submitted 30 March, 2021; v1 submitted 8 October, 2020;
originally announced October 2020.
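The benchmarking workflow described above (run several planners against the same noisy emulated experiment, repeat with different seeds, compare results at a fixed trial budget) can be sketched generically. Everything below is hypothetical and does not use the Olympus API: the "emulator" is a made-up noisy quadratic standing in for a probabilistic deep-learning emulator, and the two planners are plain random and grid search.

```python
import random


def noisy_emulator(x, rng, noise_sd=0.05):
    """Hypothetical stand-in for an emulated experiment (to be minimized):
    a 2-D quadratic with optimum at (0.3, 0.7) plus Gaussian noise."""
    clean = (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2
    return clean + rng.gauss(0.0, noise_sd)


def random_search(rng, budget):
    """Propose `budget` points uniformly at random in the unit square."""
    for _ in range(budget):
        yield (rng.random(), rng.random())


def grid_search(rng, budget):
    """Propose cell centres of a regular grid (rng unused, kept for a common signature)."""
    side = max(1, int(budget ** 0.5))
    pts = [((i + 0.5) / side, (j + 0.5) / side) for i in range(side) for j in range(side)]
    yield from pts[:budget]


def benchmark(planner, budget=25, repeats=20, seed=0):
    """Mean of the best observed (noisy) value after `budget` trials, over seeded repeats."""
    finals = []
    for r in range(repeats):
        rng = random.Random(seed + r)
        best = float("inf")
        for x in planner(rng, budget):
            best = min(best, noisy_emulator(x, rng))
        finals.append(best)
    return sum(finals) / len(finals)


for name, planner in [("random", random_search), ("grid", grid_search)]:
    print(name, round(benchmark(planner), 4))
```

Fixing the seeds makes runs reproducible so that differences between planners are not confounded with different noise draws; note that the best observed noisy value is biased downward, which a real comparison would need to account for.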
-
Detecting anthropogenic cloud perturbations with deep learning
Authors:
Duncan Watson-Parris,
Samuel Sutherland,
Matthew Christensen,
Anthony Caterini,
Dino Sejdinovic,
Philip Stier
Abstract:
One of the most pressing questions in climate science is that of the effect of anthropogenic aerosol on the Earth's energy balance. Aerosols provide the 'seeds' on which cloud droplets form, and changes in the amount of aerosol available to a cloud can change its brightness and other physical properties such as optical thickness and spatial extent. Clouds play a critical role in moderating global temperatures, and small perturbations can lead to significant amounts of cooling or warming. Uncertainty in this effect is so large that it is not currently known whether it is negligible or provides enough cooling to largely negate present-day warming by CO2. This work uses deep convolutional neural networks to look for two particular perturbations in clouds due to anthropogenic aerosol and to assess their properties and prevalence, providing valuable insights into their climatic effects.
Submitted 29 November, 2019;
originally announced November 2019.
-
Machine Learning for Stochastic Parameterization: Generative Adversarial Networks in the Lorenz '96 Model
Authors:
David John Gagne II,
Hannah M. Christensen,
Aneesh C. Subramanian,
Adam H. Monahan
Abstract:
Stochastic parameterizations account for uncertainty in the representation of unresolved sub-grid processes by sampling from the distribution of possible sub-grid forcings. Some existing stochastic parameterizations utilize data-driven approaches to characterize uncertainty, but these approaches require significant structural assumptions that can limit their scalability. Machine learning models, including neural networks, are able to represent a wide range of distributions and build optimized mappings between a large number of inputs and sub-grid forcings. Recent research on machine learning parameterizations has focused only on deterministic parameterizations. In this study, we develop a stochastic parameterization using the generative adversarial network (GAN) machine learning framework. The GAN stochastic parameterization is trained and evaluated on output from the Lorenz '96 model, which is a common baseline model for evaluating both parameterization and data assimilation techniques. We evaluate different ways of characterizing the input noise for the model and perform model runs with the GAN parameterization at weather and climate timescales. Some of the GAN configurations perform better than a baseline bespoke parameterization at both timescales, and the networks closely reproduce the spatio-temporal correlations and regimes of the Lorenz '96 system. We also find that in general those models which produce skillful forecasts are also associated with the best climate simulations.
Submitted 10 September, 2019;
originally announced September 2019.
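For readers unfamiliar with the test bed, the resolved variables of the Lorenz '96 model evolve as dX_k/dt = X_{k-1}(X_{k+1} - X_{k-2}) - X_k + F + U_k with cyclic indices, where U_k is the sub-grid forcing that a parameterization must supply. The sketch below integrates this with fourth-order Runge-Kutta and shows where a stochastic parameterization plugs in. The placeholder_parameterization function is hypothetical: a linear term plus AR(1) noise standing in for a trained GAN generator, with made-up settings (K = 8, F = 20, dt = 0.005).

```python
import random


def l96_tendency(X, F, U):
    """dX_k/dt = X_{k-1}(X_{k+1} - X_{k-2}) - X_k + F + U_k, indices cyclic in k."""
    K = len(X)
    return [X[k - 1] * (X[(k + 1) % K] - X[k - 2]) - X[k] + F + U[k] for k in range(K)]


def rk4_step(X, dt, F, U):
    """Fourth-order Runge-Kutta step, holding the sub-grid forcing U fixed."""
    k1 = l96_tendency(X, F, U)
    k2 = l96_tendency([x + 0.5 * dt * d for x, d in zip(X, k1)], F, U)
    k3 = l96_tendency([x + 0.5 * dt * d for x, d in zip(X, k2)], F, U)
    k4 = l96_tendency([x + dt * d for x, d in zip(X, k3)], F, U)
    return [x + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(X, k1, k2, k3, k4)]


def placeholder_parameterization(X, noise, rng, phi=0.9, sd=0.5, slope=-0.5):
    """Hypothetical stand-in for a GAN generator: slope * X_k plus AR(1) noise
    with lag-1 correlation phi and stationary standard deviation sd.
    Returns (U, updated noise state)."""
    noise = [phi * e + sd * (1 - phi ** 2) ** 0.5 * rng.gauss(0.0, 1.0) for e in noise]
    U = [slope * x + e for x, e in zip(X, noise)]
    return U, noise


rng = random.Random(42)
K, F, dt = 8, 20.0, 0.005
X = [F] * K
X[0] += 0.01          # small perturbation to leave the unstable fixed point
noise = [0.0] * K
for _ in range(2000):
    U, noise = placeholder_parameterization(X, noise, rng)
    X = rk4_step(X, dt, F, U)
```

Swapping the placeholder for a learned sampler that maps (X, noise state) to U leaves the integrator untouched, which is what makes the model a convenient baseline for comparing parameterizations at both weather and climate timescales.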
-
Optimal Sup-norm Rates and Uniform Inference on Nonlinear Functionals of Nonparametric IV Regression
Authors:
Xiaohong Chen,
Timothy M. Christensen
Abstract:
This paper makes several important contributions to the literature about nonparametric instrumental variables (NPIV) estimation and inference on a structural function $h_0$ and its functionals. First, we derive sup-norm convergence rates for computationally simple sieve NPIV (series 2SLS) estimators of $h_0$ and its derivatives. Second, we derive a lower bound that describes the best possible (minimax) sup-norm rates of estimating $h_0$ and its derivatives, and show that the sieve NPIV estimator can attain the minimax rates when $h_0$ is approximated via a spline or wavelet sieve. Our optimal sup-norm rates surprisingly coincide with the optimal root-mean-squared rates for severely ill-posed problems, and are only a logarithmic factor slower than the optimal root-mean-squared rates for mildly ill-posed problems. Third, we use our sup-norm rates to establish the uniform Gaussian process strong approximations and the score bootstrap uniform confidence bands (UCBs) for collections of nonlinear functionals of $h_0$ under primitive conditions, allowing for mildly and severely ill-posed problems. Fourth, as applications, we obtain the first asymptotic pointwise and uniform inference results for plug-in sieve t-statistics of exact consumer surplus (CS) and deadweight loss (DL) welfare functionals under low-level conditions when demand is estimated via sieve NPIV. Empiricists may be interested in our real-data application of UCBs for exact CS and DL functionals of gasoline demand, which reveals interesting patterns and is applicable to other markets.
Submitted 29 April, 2017; v1 submitted 13 August, 2015;
originally announced August 2015.
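The series 2SLS estimator named above regresses the outcome on a sieve basis of the endogenous regressor, psi(x), using a sieve basis of the instrument, b(w), as instruments: with Psi and B the stacked basis matrices and P_B = B (B'B)^- B', the coefficients are c_hat = (Psi' P_B Psi)^- Psi' P_B y and h_hat(x) = psi(x)' c_hat. The sketch below implements that formula on simulated data. It assumes a plain polynomial sieve for brevity (the abstract's minimax results concern spline or wavelet sieves), and the data-generating process and structural function h0 are made up.

```python
import numpy as np


def poly_basis(v, J):
    """Hypothetical sieve: columns 1, v, ..., v**(J-1)."""
    return np.column_stack([v ** j for j in range(J)])


def sieve_npiv(x, w, y, J=4, K=6):
    """Series 2SLS: c_hat = (Psi' P_B Psi)^- Psi' P_B y, P_B = B (B'B)^- B'.

    Psi = psi(x) has J columns, B = b(w) has K >= J columns.
    Returns a function evaluating h_hat(x_new) = psi(x_new)' c_hat.
    """
    Psi = poly_basis(x, J)
    B = poly_basis(w, K)
    PB = B @ np.linalg.pinv(B.T @ B) @ B.T
    c_hat = np.linalg.pinv(Psi.T @ PB @ Psi) @ (Psi.T @ PB @ y)
    return lambda x_new: poly_basis(np.asarray(x_new, dtype=float), J) @ c_hat


def h0(t):
    """Made-up structural function for the simulation."""
    return t ** 2 - 0.5 * t


# Endogenous design: x depends on the error u, but the instrument w does not,
# so least squares of y on psi(x) would be biased while 2SLS is not.
rng = np.random.default_rng(0)
n = 5000
w = rng.uniform(-1.0, 1.0, n)
u = rng.normal(0.0, 0.3, n)
x = w + 0.5 * u + rng.normal(0.0, 0.1, n)
y = h0(x) + u

h_hat = sieve_npiv(x, w, y)
grid = np.linspace(-0.5, 0.5, 5)
print(np.round(h_hat(grid) - h0(grid), 3))
```

The projection P_B is formed as an n x n matrix only because n is small here; for larger samples one would instead compute B'Psi and B'y directly to avoid the quadratic memory cost.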