Search | arXiv e-print repository

Generative Principal Component Regression via Variational Inference

Authors: Austin Talbot, Corey J Keller, David E Carlson, Alex V Kotlar

Abstract: The ability to manipulate complex systems, such as the brain, to modify specific outcomes has far-reaching implications, particularly in the treatment of psychiatric disorders. One approach to designing appropriate manipulations is to target key features of predictive models. While generative latent variable models, such as probabilistic principal component analysis (PPCA), are powerful tools for identifying targets, they struggle to incorporate information relevant to low-variance outcomes into the latent space. When stimulation targets are designed on the latent space in such a scenario, the intervention can be suboptimal with minimal efficacy. To address this problem, we develop a novel objective based on supervised variational autoencoders (SVAEs) that ensures such information is represented in the latent space. The novel objective can be used with linear models, such as PPCA, which we refer to as generative principal component regression (gPCR). We show in simulations that gPCR dramatically improves target selection in manipulation as compared to standard PCR and SVAEs. As part of these simulations, we develop a metric for detecting when relevant information is not properly incorporated into the loadings. We then show in two neural datasets related to stress and social behavior that gPCR dramatically outperforms PCR in predictive performance and that SVAEs exhibit low incorporation of relevant information into the loadings. Overall, this work suggests that our method significantly improves target selection for manipulation using latent variable models over competitor inference schemes.

Submitted 3 September, 2024; originally announced September 2024.

arXiv:2108.02164 [pdf, other]

doi 10.1016/j.advwatres.2021.104010

Investigating the Pilot Point Ensemble Kalman Filter for geostatistical inversion and data assimilation

Authors: Johannes Keller, Harrie-Jan Hendricks Franssen, Wolfgang Nowak

Abstract: Parameter estimation has a high importance in the geosciences. The ensemble Kalman filter (EnKF) allows parameter estimation for large, time-dependent systems. For large systems, the EnKF is applied using small ensembles, which may lead to spurious correlations and, ultimately, to filter divergence. We present a thorough evaluation of the pilot point ensemble Kalman filter (PP-EnKF), a variant of the ensemble Kalman filter for parameter estimation. In this evaluation, we explicitly state the update equations of the PP-EnKF, discuss the differences of this update equation compared to the update equations of similar EnKF methods, and perform an extensive performance comparison. The performance of the PP-EnKF is tested and compared to the performance of seven other EnKF methods in two model setups, a tracer setup and a well setup. In both setups, the PP-EnKF performs well, ranking better than the classical EnKF. For the tracer setup, the PP-EnKF ranks third out of eight methods. At the same time, the PP-EnKF yields estimates of the ensemble variance that are close to EnKF results from a very large-ensemble reference, suggesting that it is not affected by underestimation of the ensemble variance. In a comparison of the ensemble variances, the PP-EnKF ranks first and third out of eight methods. Additionally, for the well model and ensemble size 50, the PP-EnKF yields correlation structures significantly closer to a reference than the classical EnKF, an indication of the method's skill to suppress spurious correlations for small ensemble sizes.

Submitted 4 August, 2021; originally announced August 2021.

Journal ref: Advances in Water Resources, 2021

arXiv:2107.10118 [pdf, other]

doi 10.1002/sim.9382

Tracking the Transmission Dynamics of COVID-19 with a Time-Varying Coefficient State-Space Model

Authors: Joshua P. Keller, Tianjian Zhou, Andee Kaplan, G. Brooke Anderson, Wen Zhou

Abstract: The spread of COVID-19 has been greatly impacted by regulatory policies and behavior patterns that vary across counties, states, and countries. Population-level dynamics of COVID-19 can generally be described using a set of ordinary differential equations, but these deterministic equations are insufficient for modeling the observed case rates, which can vary due to local testing and case reporting policies and non-homogeneous behavior among individuals. To assess the impact of population mobility on the spread of COVID-19, we have developed a novel Bayesian time-varying coefficient state-space model for infectious disease transmission. The foundation of this model is a time-varying coefficient compartment model to recapitulate the dynamics among susceptible, exposed, undetected infectious, detected infectious, undetected removed, detected non-infectious, detected recovered, and detected deceased individuals. The infectiousness and detection parameters are modeled to vary by time, and the infectiousness component in the model incorporates information on multiple sources of population mobility. Along with this compartment model, a multiplicative process model is introduced to allow for deviation from the deterministic dynamics. We apply this model to observed COVID-19 cases and deaths in several US states and Colorado counties. We find that population mobility measures are highly correlated with transmission rates and can explain complicated temporal variation in infectiousness in these regions. Additionally, the inferred connections between mobility and epidemiological parameters, varying across locations, have revealed the heterogeneous effects of different policies on the dynamics of COVID-19.

Submitted 21 July, 2021; originally announced July 2021.

arXiv:1909.11161 [pdf, other]

doi 10.1111/rssa.12556

Selecting a Scale for Spatial Confounding Adjustment

Authors: Joshua P. Keller, Adam A. Szpiro

Abstract: Unmeasured, spatially-structured factors can confound associations between spatial environmental exposures and health outcomes. Adding flexible splines to a regression model is a simple approach for spatial confounding adjustment, but the spline degrees of freedom do not provide an easily interpretable spatial scale. We describe a method for quantifying the extent of spatial confounding adjustment in terms of the Euclidean distance at which variation is removed. We develop this approach for confounding adjustment with splines and using Fourier and wavelet filtering. We demonstrate differences in the spatial scales these bases can represent and provide a comparison of methods for selecting the amount of confounding adjustment. We find the best performance for selecting the amount of adjustment using an information criterion evaluated on an outcome model without exposure. We apply this method to spatial adjustment in an analysis of particulate matter and blood pressure in a cohort of United States women.

Submitted 24 September, 2019; originally announced September 2019.

Comments: 22 pages, 6 figures

Journal ref: Journal of the Royal Statistical Society: Series A (2020) 183, Part 3, 1121-1143

arXiv:1908.05340 [pdf, other]

A hierarchical model for estimating exposure-response curves from multiple studies

Authors: Joshua P. Keller, Joanne Katz, Amid K. Pokhrel, Michael N. Bates, James Tielsch, Scott L. Zeger

Abstract: Cookstove replacement trials have found mixed results on their impact on respiratory health. The limited range of concentrations and small sample sizes of individual studies are important factors that may be limiting their statistical power. We present a hierarchical approach to modeling exposure concentrations and pooling data from multiple studies in order to estimate a common exposure-response curve. The exposure concentration model accommodates temporally sparse, clustered longitudinal observations. The exposure-response curve model provides a flexible, semi-parametric estimate of the exposure-response relationship while accommodating heterogeneous clustered data. We apply this model to data from three studies of cookstoves and respiratory infections in children in Nepal, which represent three study types: crossover trial, parallel trial, and case-control study. We find evidence of increased odds of disease for particulate matter concentrations between 50 and 200 $μ$g/m$^3$ and a flattening of the exposure-response curve for higher exposure concentrations. The model we present can incorporate additional studies and be applied to other settings.

Submitted 14 August, 2019; originally announced August 2019.

Comments: 22 pages, 8 figures

arXiv:1904.01014 [pdf, other]

doi 10.1117/12.2519484

Comparison of Possibilistic Fuzzy Local Information C-Means and Possibilistic K-Nearest Neighbors for Synthetic Aperture Sonar Image Segmentation

Authors: Joshua Peeples, Matthew Cook, Daniel Suen, Alina Zare, James Keller

Abstract: Synthetic aperture sonar (SAS) imagery can generate high-resolution images of the seafloor. Thus, segmentation algorithms can be used to partition the images into different seafloor environments. In this paper, we compare two possibilistic segmentation approaches. Possibilistic approaches can detect novel or outlier environments as well as well-known classes. The Possibilistic Fuzzy Local Information C-Means (PFLICM) algorithm has been previously applied to segment SAS imagery. Additionally, the Possibilistic K-Nearest Neighbors (PKNN) algorithm has been used in other domains such as landmine detection and hyperspectral imagery. In this paper, we compare the segmentation performance of a semi-supervised approach using PFLICM and a supervised method using Possibilistic K-NN. We include final segmentation results on multiple SAS images and a quantitative assessment of each algorithm.

Submitted 1 April, 2019; originally announced April 2019.

Journal ref: Proc. SPIE 110120, Detection and Sensing of Mines, Explosive Objects, and Obscured Targets XXIV (10 May 2019)

arXiv:1808.07843 [pdf, other]

doi 10.1029/2018WR023374

Comparing seven variants of the Ensemble Kalman Filter: How many synthetic experiments are needed?

Authors: Johannes Keller, Harrie-Jan Hendricks Franssen, Gabriele Marquart

Abstract: The Ensemble Kalman Filter (EnKF) is a popular estimation technique in the geosciences. It is used as a numerical tool for state vector prognosis and parameter estimation. The EnKF can, for example, help to evaluate the geothermal potential of an aquifer. In such applications, the EnKF is often used with small or medium ensemble sizes. It is therefore of interest to characterize the EnKF behavior for these ensemble sizes. For seven ensemble sizes (50, 70, 100, 250, 500, 1000, 2000) and seven EnKF-variants (Damped, Iterative, Local, Hybrid, Dual, Normal Score and Classical EnKF), we computed 1000 synthetic parameter estimation experiments for two set-ups: a 2D tracer transport problem and a 2D flow problem with one injection well. For each model, the only difference among synthetic experiments was the generated set of random permeability fields. The 1000 synthetic experiments make it possible to calculate the pdf of the RMSE of the characterization of the permeability field. Comparing mean RMSEs for different EnKF-variants, ensemble sizes and flow/transport set-ups suggests that multiple synthetic experiments are needed for a solid performance comparison. In this work, 10 synthetic experiments were needed to correctly distinguish RMSE differences between EnKF-variants smaller than 10%. For detecting RMSE differences smaller than 2%, 100 synthetic experiments were needed for ensemble sizes 50, 70, 100 and 250. The overall ranking of the EnKF-variants is strongly dependent on the physical model set-up and the ensemble size.

Submitted 23 August, 2018; originally announced August 2018.

Journal ref: Water Resources Research, 2018

Showing 1–7 of 7 results for author: Keller, J