-
Strategies for Machine Learning Applied to Noisy HEP Datasets: Modular Solid State Detectors from SuperCDMS
Authors:
P. B. Cushman,
M. C. Fritts,
A. D. Chambers,
A. Roy,
T. Li
Abstract:
Background reduction in the SuperCDMS dark matter experiment depends on removing surface events within individual detectors by identifying the location of each incident particle interaction. Position reconstruction is achieved by combining pulse shape information over multiple phonon channels, a task well-suited to machine learning techniques. Data from an Am-241 scan of a SuperCDMS SNOLAB detector was used to study a selection of statistical approaches, including linear regression, artificial neural networks, and symbolic regression. Our results showed that simpler linear regression models were better able than artificial neural networks to generalize on such a noisy and minimal data set, but there are indications that certain architectures and training configurations can counter overfitting tendencies. This study will be repeated on a more complete SuperCDMS data set (in progress) to explore the interplay between data quality and the application of neural networks.
Submitted 16 April, 2024;
originally announced April 2024.
-
Epileptic seizure forecasting with long short-term memory (LSTM) neural networks
Authors:
Daniel E. Payne,
Jordan D. Chambers,
Anthony Burkitt,
Mark J. Cook,
Levin Kuhlman,
Dean R. Freestone,
David B. Grayden
Abstract:
Objective: Forecasting epileptic seizures can reduce uncertainty for patients and allow preventative actions. While many models can predict the occurrence of seizures from features of the EEG, few models incorporate changes in features over time. Long Short-Term Memory (LSTM) neural networks are a machine learning architecture that can display temporal dynamics due to the recurrent connections. In this paper, we used LSTMs to monitor changes in EEG features over time to improve the accuracy of seizure forecasts and to alter the time window of the forecast. Methods: Long-term intracranial EEG recordings from eight patients from the NeuroVista dataset were used. A Fourier transform of 1-minute segments of EEG was fed into a Convolutional Neural Network (CNN). The outputs from the CNN were input to three different LSTM models at different time intervals: 1 minute, 1 hour and 1 day. The LSTM model outputs were used to predict seizure onset within a time window. The prediction and start of the time window were separated by the same length of time as the window. Window sizes tested included 2, 4, 10, 20 and 40 minutes. Results and Conclusion: Our model forecast seizure onsets well above a random predictor. Compared to other models using the same dataset, our model performed better for some patients and worse for others. Monitoring the change in EEG features over time allowed our model to produce good results over a range of different window sizes, which is an improvement on previous models and raises the possibility of altering the forecast to meet individual patient needs. Furthermore, a window size of 40 minutes provides a potential intervention time of 40 minutes, which is the first time an intervention time of more than 5 minutes has been forecast using long-term EEG recordings.
Submitted 18 September, 2023;
originally announced September 2023.
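The timing rule stated in the seizure-forecasting abstract above (the gap between the prediction and the start of the window equals the window length) can be sketched as simple arithmetic. The function name and the use of minutes as the unit are illustrative assumptions, not taken from the paper.

```python
def forecast_window(prediction_time_min, window_min):
    """Return (start, end), in minutes, of the window a forecast covers.

    Assumption for illustration: the window starts one window-length after
    the prediction is issued and lasts one window-length.
    """
    start = prediction_time_min + window_min
    end = start + window_min
    return start, end


# Window sizes listed in the abstract, with a prediction issued at minute 0.
for w in (2, 4, 10, 20, 40):
    start, end = forecast_window(0, w)
    print(f"window {w} min: covers minutes {start} to {end}, lead time {start} min")
```

Under this reading, the lead time before the window opens equals the window size, which matches the abstract's statement that a 40-minute window gives a potential intervention time of 40 minutes.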
-
Clutter distributions for tomographic image standardization in ground-penetrating radar
Authors:
Brian M. Worthmann,
David H. Chambers,
David S. Perlmutter,
Jeffrey E. Mast,
David W. Paglieroni,
Christian T. Pechard,
Garrett A. Stevenson,
Steven W. Bond
Abstract:
Multistatic ground-penetrating radar (GPR) signals can be imaged tomographically to produce three-dimensional distributions of image intensities. In the absence of objects of interest, these intensities can be considered to be estimates of clutter. These clutter intensities spatially vary over several orders of magnitude, and vary across different arrays, which makes direct comparison of these raw intensities difficult. However, by gathering statistics on these intensities and their spatial variation, a variety of metrics can be determined. In this study, the clutter distribution is found to fit better to a two-parameter Weibull distribution than Gaussian or lognormal distributions. Based upon the spatial variation of the two Weibull parameters, scale and shape, more information may be gleaned from these data. How well the GPR array is illuminating various parts of the ground, in depth and cross-track, may be determined from the spatial variation of the Weibull scale parameter, which may in turn be used to estimate an effective attenuation coefficient in the soil. The transition in depth from clutter-limited to noise-limited conditions (which is one possible definition of GPR penetration depth) can be estimated from the spatial variation of the Weibull shape parameter. Finally, the underlying clutter distributions also provide an opportunity to standardize image intensities to determine when a statistically significant deviation from background (clutter) has occurred, which is convenient for buried threat detection algorithm development which needs to be robust across multiple different arrays.
Submitted 21 January, 2021;
originally announced January 2021.
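One way to picture the standardization idea in the ground-penetrating radar abstract above: if clutter intensity at a location follows a two-parameter Weibull distribution with scale λ and shape k, then (x/λ)^k follows a unit exponential distribution, so the transformed value is comparable across locations and arrays. The sketch below uses that property. The function names and the significance threshold `alpha` are assumptions for illustration, and the paper's actual standardization procedure may differ.

```python
import math


def standardized_intensity(x, scale, shape):
    """Map a raw intensity to (x/scale)**shape.

    If x is Weibull(scale, shape) distributed, the result is Exp(1)
    distributed regardless of the local scale and shape values.
    """
    return (x / scale) ** shape


def clutter_tail_probability(x, scale, shape):
    """Probability that pure clutter produces an intensity of at least x."""
    if x <= 0:
        return 1.0
    return math.exp(-standardized_intensity(x, scale, shape))


def is_significant(x, scale, shape, alpha=1e-3):
    """Flag an intensity whose clutter tail probability is below alpha (hypothetical threshold)."""
    return clutter_tail_probability(x, scale, shape) < alpha


# The same raw intensity can be ordinary at one location and unusual at another.
print(is_significant(5.0, scale=4.0, shape=1.2))   # strong local clutter
print(is_significant(5.0, scale=0.5, shape=1.2))   # weak local clutter
```

In practice the scale and shape would be estimated per location from object-free data; here they are passed in directly to keep the sketch self-contained.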
-
SIRNet: Understanding Social Distancing Measures with Hybrid Neural Network Model for COVID-19 Infectious Spread
Authors:
Nicholas Soures,
David Chambers,
Zachariah Carmichael,
Anurag Daram,
Dimpy P. Shah,
Kal Clark,
Lloyd Potter,
Dhireesha Kudithipudi
Abstract:
The SARS-CoV-2 infectious outbreak has rapidly spread across the globe and precipitated varying policies to effectuate physical distancing to ameliorate its impact. In this study, we propose a new hybrid machine learning model, SIRNet, for forecasting the spread of the COVID-19 pandemic that couples with the epidemiological models. We use categorized spatiotemporally explicit cellphone mobility data as surrogate markers for physical distancing, along with population weighted density and other local data points. We demonstrate at varying geographical granularity that the spectrum of physical distancing options currently being discussed among policy leaders has epidemiologically significant differences in consequences, ranging from viral extinction to near complete population prevalence. The current mobility inflection points vary across geographical regions. Experimental results from SIRNet establish preliminary bounds on such localized mobility that asymptotically induce containment. The model can support the study of non-pharmacological interventions and approaches that minimize societal collateral damage and control mechanisms for an extended period of time.
Submitted 21 April, 2020;
originally announced April 2020.
-
Causal Effects of Prenatal Drug Exposure on Birth Defects with Missing by Terathanasia
Authors:
Andrew Ying,
Ronghui Xu,
Christina D. Chambers,
Kenneth Lyons Jones
Abstract:
A recent cohort study revealed a positive correlation between major structural birth defects in infants and a certain medication taken by pregnant women. To draw valid causal inference, an outstanding problem to overcome was the missing birth defect outcomes among pregnancy losses resulting from spontaneous abortion. This led to data missing not at random since, according to the theory of "terathanasia", a fetus with defects is more likely to be spontaneously aborted. Other complications in the data included left truncation, right censoring, observational nature, and rare events. In addition, the previous analysis stratified on live birth against spontaneous abortion, which was itself a post-exposure variable and hence did not lead to a causal interpretation of the stratified results. In this paper we aim to estimate and provide inference for the causal parameters of scientific interest, including the principal effects, making use of the missing data mechanism informed by "terathanasia". The rare events with missing outcomes led to multiple sensitivity analyses where the causal parameters can be estimated with better confidence in each setting. Our findings should shed light on how studies on causal effects of medication or other exposures during pregnancy may be analyzed using state-of-the-art methodologies.
Submitted 7 June, 2022; v1 submitted 17 April, 2020;
originally announced April 2020.
-
Topological Symmetry Groups of the Petersen Graph
Authors:
D. Chambers,
E. Flapan,
D. Heath,
E. Davie Lawrence,
C. Thatcher,
R. Vanderpool
Abstract:
We characterize all groups which can occur as the topological symmetry group or the orientation preserving topological symmetry group of some embedding of the Petersen graph in S^3.
Submitted 5 October, 2017;
originally announced October 2017.
-
Semiparametric Sieve Maximum Likelihood Estimation Under Cure Model with Partly Interval Censored and Left Truncated Data for Application to Spontaneous Abortion Data
Authors:
Yuan Wu,
Christina D. Chambers,
Ronghui Xu
Abstract:
This work was motivated by observational studies in pregnancy with spontaneous abortion (SAB) as outcome. Clearly some women experience the SAB event but the rest do not. In addition, the data are left truncated due to the way pregnant women are recruited into these studies. For those women who do experience SAB, their exact event times are sometimes unknown. Finally, a small percentage of the women are lost to follow-up during their pregnancy. All these give rise to data that are left truncated, partly interval-censored and right-censored, and with a clearly defined cured portion. We consider the non-mixture Cox regression cure rate model and adopt the semiparametric spline-based sieve maximum likelihood approach to analyze such data. Using modern empirical process theory we show that both the parametric and the nonparametric parts of the sieve estimator are consistent, and we establish the asymptotic normality for both parts. Simulation studies are conducted to establish the finite sample performance. Finally, we apply our method to a database of observational studies on spontaneous abortion.
Submitted 22 August, 2017;
originally announced August 2017.
-
Personal Food Computer: A new device for controlled-environment agriculture
Authors:
Eduardo Castelló Ferrer,
Jake Rye,
Gordon Brander,
Tim Savas,
Douglas Chambers,
Hildreth England,
Caleb Harper
Abstract:
Due to their interdisciplinary nature, devices for controlled-environment agriculture have the possibility to turn into ideal tools not only to conduct research on plant phenology but also to create curricula in a wide range of disciplines. Controlled-environment devices are increasing their functionalities as well as improving their accessibility. Traditionally, building one of these devices from scratch implies knowledge in fields such as mechanical engineering, digital electronics, programming, and energy management. However, the requirements of an effective controlled-environment device for personal use bring new constraints and challenges. This paper presents the OpenAg Personal Food Computer (PFC), a low-cost, desktop-size platform that targets not only plant phenology researchers but also hobbyists, makers, and teachers from elementary to high-school levels (K-12). The PFC is completely open-source and is intended to become a tool that can be used for collective data sharing and plant growth analysis. Thanks to its modular design, the PFC can be used in a large spectrum of activities.
Submitted 24 June, 2017; v1 submitted 15 June, 2017;
originally announced June 2017.
-
The Impact of Confounder Selection in Propensity Scores for Rare Events Data - with Applications to Birth Defects
Authors:
Ronghui Xu,
Jue Hou,
Christina D. Chambers
Abstract:
Our work was motivated by a recent study on birth defects of infants born to pregnant women exposed to a certain medication for treating chronic diseases. Outcomes such as birth defects are rare events in the general population, which often translate to very small numbers of events in the unexposed group. As drug safety studies in pregnancy are typically observational in nature, we control for confounding in this rare events setting using propensity scores (PS). Using our empirical data, we noticed that the estimated odds ratio for birth defects due to exposure varied drastically depending on the specific approach used. The commonly used approaches with PS are matching, stratification, inverse probability weighting (IPW) and regression adjustment. The extremely rare events setting renders the matching or stratification infeasible. In addition, the PS itself may be formed via different approaches to select confounders from a relatively long list of potential confounders. We carried out simulation experiments to compare different combinations of approaches: IPW or regression adjustment, with 1) including all potential confounders without selection, 2) selection based on univariate association between the candidate variable and the outcome, 3) selection based on change in effects (CIE). The simulation showed that IPW without selection leads to extremely large variances in the estimated odds ratio, which helps to explain the empirical data analysis results that we had observed. The simulation also showed that IPW with selection based on univariate association with the outcome is preferred over IPW with CIE. Regression adjustment has small variances of the estimated odds ratio regardless of the selection methods used.
Submitted 22 February, 2017;
originally announced February 2017.
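As a rough illustration of the inverse probability weighting (IPW) approach named in the propensity-score abstract above, the sketch below computes a weighted odds ratio from exposure indicators, binary outcomes, and already-estimated propensity scores. The estimator form (normalized weighted risks in each group) and all names are assumptions made for illustration; the abstract does not specify the exact estimator the authors used.

```python
def ipw_odds_ratio(exposed, outcome, propensity):
    """Odds ratio from inverse-probability-weighted risks.

    exposed:    iterable of 0/1 exposure indicators
    outcome:    iterable of 0/1 outcomes (e.g. birth defect yes/no)
    propensity: iterable of estimated P(exposed = 1 | confounders), each in (0, 1)
    """
    w1 = y1 = w0 = y0 = 0.0
    for a, y, p in zip(exposed, outcome, propensity):
        if a:
            w = 1.0 / p            # exposed subjects weighted by 1 / PS
            w1 += w
            y1 += w * y
        else:
            w = 1.0 / (1.0 - p)    # unexposed subjects weighted by 1 / (1 - PS)
            w0 += w
            y0 += w * y
    risk1 = y1 / w1                # weighted risk among the exposed
    risk0 = y0 / w0                # weighted risk among the unexposed
    return (risk1 / (1.0 - risk1)) / (risk0 / (1.0 - risk0))


# Toy data, invented for illustration: constant propensity of 0.5.
exposed = [1, 1, 1, 1, 0, 0, 0, 0]
outcome = [1, 1, 0, 0, 1, 0, 0, 0]
propensity = [0.5] * 8
print(ipw_odds_ratio(exposed, outcome, propensity))
```

A propensity score close to 0 or 1 gives a single subject a very large weight, which is one intuitive reason a weighted estimate can become unstable when events are rare.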
-
A Nonparametric Maximum Likelihood Approach for Partially Observed Cured Data with Left Truncation and Right-Censoring
Authors:
Jue Hou,
Christina D. Chambers,
Ronghui Xu
Abstract:
Partially observed cured data occur in the analysis of spontaneous abortion (SAB) in observational studies in pregnancy. In contrast to the traditional cured data, such data have an observable 'cured' portion, namely women who do not abort spontaneously. The data are also subject to left truncation in addition to right-censoring because women may enter or withdraw from a study any time during their pregnancy. Left truncation in particular causes unique bias in the presence of a cured portion. In this paper, we study a cure rate model and develop a conditional nonparametric maximum likelihood approach. To tackle the computational challenge we adopt an EM algorithm making use of "ghost copies" of the data, and a closed form variance estimator is derived. Under suitable assumptions, we prove the consistency of the resulting estimator involving an unbounded cumulative baseline hazard function, as well as the asymptotic normality. Simulation studies are carried out to evaluate the finite sample performance. We present the analysis of the motivating SAB study to illustrate the power of our model addressing both occurrence and timing of SAB, as compared to existing approaches in practice.
Submitted 31 August, 2016;
originally announced September 2016.
-
Earthquake Forecasting Using Hidden Markov Models
Authors:
Daniel W. Chambers,
Jenny A. Baglivo,
John E. Ebel,
Alan L. Kafka
Abstract:
This paper develops a novel method, based on hidden Markov models, to forecast earthquakes and applies the method to mainshock seismic activity in southern California and western Nevada. The forecasts are of the probability of a mainshock within one, five, and ten days in the entire study region or in specific subregions and are based on the observations available at the forecast time, namely the inter-event times and locations of the previous mainshocks and the elapsed time since the most recent one. Hidden Markov models have been applied to many problems, including earthquake classification; this is the first application to earthquake forecasting.
Submitted 20 November, 2014;
originally announced November 2014.
-
Topological Symmetry Groups for Small Complete Graphs
Authors:
Dwayne Chambers,
Erica Flapan
Abstract:
For each $n\leq 6$, we characterize all the groups which can occur as either the orientation preserving topological symmetry group or the topological symmetry group of some embedding of $K_n$ in $S^3$.
Submitted 13 February, 2014; v1 submitted 24 December, 2012;
originally announced December 2012.
-
The two sample problem: Exact distributions, numerical solutions, simulations
Authors:
D. E. Chambers
Abstract:
The work presented in this article suggests a solution to the two sample problem. Keywords: Two sample problem, Welch-Aspin solution, Fisher-Behrens problem, nuisance parameter, similarity, the Linnik phenomenon.
Submitted 11 July, 2010;
originally announced July 2010.
-
Topological Symmetry Groups of K_{4r+3}
Authors:
Dwayne Chambers,
Erica Flapan,
John D. O'Brien
Abstract:
We present the concept of the topological symmetry group as a way to analyze the symmetries of non-rigid molecules. Then we characterize all of the groups which can occur as the topological symmetry group of an embedding of the complete graph K_{4r+3} in S^3.
Submitted 12 March, 2011; v1 submitted 14 November, 2009;
originally announced November 2009.