-
Quantifying Diagnostic Signal Decay in Dementia: A National Study of Medicare Hospitalization Data
Authors:
Federica Spoto,
Jiazi Tian,
Jonas Hügel,
Daniel T. Ortega,
Christine S. Ritchie,
Deborah Blacker,
Francesca Dominici,
Chirag J. Patel,
Daniel Mork,
Hossein Estiri
Abstract:
Background: Artificial intelligence (AI) models in healthcare depend on the fidelity of diagnostic data, yet the quality of such data is often compromised by variability in clinical documentation practices. In dementia, a condition already prone to diagnostic ambiguity, this variability may introduce systematic distortion into claims-based research and AI model development.
Methods: We analyzed Medicare Part A hospitalization data from 2016-2018 to examine patterns of dementia-related ICD-10 code utilization across more than 3,000 U.S. counties. Using a clinically informed classification of 17 ICD-10 codes grouped into five diagnostic categories, we applied the transitive Sequential Pattern Mining (tSPM+) algorithm to model temporal usage structures. We then used matrix similarity methods to compare local diagnostic patterns to national norms and fit multivariable linear regressions to identify county-level demographic and structural correlates of divergence.
Findings: We found substantial geographic and demographic variation in dementia-related diagnostic code usage. Non-specific codes were dominant nationwide, while Alzheimer's disease and vascular dementia codes showed pronounced variability. Temporal sequence analysis revealed consistent transitions from specific to non-specific codes, which suggest degradation of diagnostic specificity over time. Counties with higher proportions of rural residents, Medicaid-eligible patients, and Black or Hispanic dementia patients demonstrated significantly lower similarity to national usage patterns. Our model explained 38% of the variation in local-to-national diagnostic alignment.
Submitted 17 June, 2025;
originally announced June 2025.
-
A structural nested rate model for estimating the effects of time-varying exposure on recurrent event outcomes in the presence of death
Authors:
Daniel Mork,
Robert L. Strawderman,
Michelle Audirac,
Francesca Dominici,
Ashkan Ertefaie
Abstract:
Assessing the causal effect of time-varying exposures on recurrent event processes is challenging in the presence of a terminating event. Our objective is to estimate both the short-term and delayed marginal causal effects of exposures on recurrent events while addressing the bias of a potentially correlated terminal event. Existing estimators based on marginal structural models and proportional rate models are unsuitable for estimating delayed marginal causal effects for many reasons, and furthermore, they do not account for competing risks associated with a terminating event. To address these limitations, we propose a class of semiparametric structural nested recurrent event models and two estimators of short-term and delayed marginal causal effects of exposures. We establish the asymptotic linearity of these two estimators under regularity conditions through the novel use of modern empirical process and semiparametric efficiency theory. We examine the performance of these estimators via simulation and provide an R package sncure to apply our methods in real data scenarios. Finally, we present the utility of our methods in the context of a large epidemiological study of 299,661 Medicare beneficiaries, where we estimate the effects of fine particulate matter air pollution on recurrent hospitalizations for cardiovascular disease.
Submitted 9 June, 2025;
originally announced June 2025.
-
Structured Bayesian Regression Tree Models for Estimating Distributed Lag Effects: The R Package dlmtree
Authors:
Seongwon Im,
Ander Wilson,
Daniel Mork
Abstract:
When examining the relationship between an exposure and an outcome, there is often a time lag between exposure and the observed effect on the outcome. A common statistical approach for estimating the relationship between the outcome and lagged measurements of exposure is a distributed lag model (DLM). Because repeated measurements are often autocorrelated, the lagged effects are typically constrained to vary smoothly over time. A recent statistical development on the smoothing constraint is a tree structured DLM framework. We present an R package dlmtree, available on CRAN, that integrates tree structured DLM and extensions into a comprehensive software package with user-friendly implementation. A conceptual background on tree structured DLMs and demonstration of the fitting process of each model using simulated data are provided. We also demonstrate inference and interpretation using the fitted models, including summary and visualization. Additionally, a built-in shiny app for heterogeneity analysis is included.
Submitted 25 April, 2025;
originally announced April 2025.
-
COOL-LAMPS VIII: Known wide-separation lensed quasars and their host galaxies reveal a lack of evolution in $M_{\rm{BH}}/M_\star$ since $z\sim 3$
Authors:
Aidan P. Cloonan,
Gourav Khullar,
Kate A. Napier,
Michael D. Gladders,
Håkon Dahle,
Riley Rosener,
Jamar Sullivan Jr.,
Matthew B. Bayliss,
Nathalie Chicoine,
Isaiah Escapa,
Diego Garza,
Josh Garza,
Rowen Glusman,
Katya Gozman,
Gabriela Horwath,
Andi Kisare,
Benjamin C. Levine,
Olina Liang,
Natalie Malagon,
Michael N. Martinez,
Alexandra Masegian,
Owen S. Matthews Acuña,
Simon D. Mork,
Kunwanhui Niu,
M. Riley Owens, et al. (14 additional authors not shown)
Abstract:
Wide-separation lensed quasars (WSLQs) are a rare class of strongly lensed quasars, magnified by foreground massive galaxy clusters, with typically large magnifications of the multiple quasar images. They are a relatively unexplored opportunity for detailed study of quasar host galaxies. The current small sample of known WSLQs has a median redshift of $z\approx 2.1$, larger than most other samples of quasar host galaxies studied to date. Here, we derive precise constraints on the properties of six WSLQs and their host galaxies, using parametric surface brightness fitting, measurements of quasar emission lines, and stellar population synthesis of host galaxies in six WSLQ systems. Our results, with significant uncertainty, indicate that these six hosts are a mixture of star-forming and quiescent galaxies. To probe for co-evolution between AGNs and host galaxies, we model the offset from the `local' ($z=0$) $M_{\rm{BH}}$–$M_\star$ relation as a simple power-law in redshift. Accounting for selection effects, a WSLQ-based model for evolution in the $M_{\rm{BH}}$–$M_\star$ relation has a power-law index of $γ_M=-0.42\pm0.31$, consistent with no evolution. Compared to several literature samples, which mostly probe unlensed quasars at $z<2$, the WSLQ sample shows less evolution from the local relation, at $\sim 4σ$. We find that selection effects and choices of $M_{\rm{BH}}$ calibration are the most important systematics in these comparisons. Given that we resolve host galaxy flux confidently even from the ground in some instances, our work demonstrates that WSLQs and highly magnified AGNs are exceptional systems for future AGN–host co-evolution studies.
Submitted 6 August, 2024;
originally announced August 2024.
-
Cryogenic optical beam steering for superconducting device calibration
Authors:
K. Stifter,
H. Magoon,
A. J. Anderson,
D. J. Temples,
N. A. Kurinsky,
C. Stoughton,
I. Hernandez,
A. Nuñez,
K. Anyang,
R. Linehan,
M. R. Young,
P. Barry,
D. Baxter,
D. Bowring,
G. Cancelo,
A. Chou,
K. R. Dibert,
E. Figueroa-Feliciano,
L. Hsu,
R. Khatiwada,
S. D. Mork,
L. Stefanazzi,
N. Tabassum,
S. Uemura,
B. A. Young
Abstract:
We have developed a calibration system based on a micro-electromechanical systems (MEMS) mirror that is capable of delivering an optical beam over a wavelength range of 180 -- 2000 nm (0.62 -- 6.89 eV) in a sub-Kelvin environment. This portable, integrated system can steer the beam over a $\sim$3 cm $\times$ 3 cm area on the surface of any sensor with a precision of $\sim$100 $μ$m, enabling characterization of device response as a function of position. This fills a critical need in the landscape of calibration tools for sub-Kelvin devices, including those used for dark matter detection and quantum computing. These communities have a shared goal of understanding the impact of ionizing radiation on device performance, which can be pursued with our system. This paper describes the design of the first-generation calibration system and the results from successfully testing its performance at room temperature and 20 mK.
Submitted 3 May, 2024;
originally announced May 2024.
-
COOL-LAMPS. VII. Quantifying Strong-lens Scaling Relations with 177 Cluster-scale Strong Gravitational Lenses in DECaLS
Authors:
Simon D. Mork,
Michael D. Gladders,
Gourav Khullar,
Keren Sharon,
Nathalie Chicoine,
Aidan P. Cloonan,
Håkon Dahle,
Diego Garza,
Rowen Glusman,
Katya Gozman,
Gabriela Horwath,
Benjamin C. Levine,
Olina Liang,
Daniel Mahronic,
Viraj Manwadkar,
Michael N. Martinez,
Alexandra Masegian,
Owen S. Matthews Acuña,
Kaiya Merz,
Yue Pan,
Jorge A. Sanchez,
Isaac Sierra,
Daniel J. Kavin Stein,
Ezra Sukay,
Marcos Tamargo-Arizmendi, et al. (5 additional authors not shown)
Abstract:
We estimate the Einstein-radius-enclosed total mass for 177 cluster-scale strong gravitational lenses identified by the ChicagO Optically selected Lenses Located At the Margins of Public Surveys (COOL-LAMPS) collaboration with lens redshifts ranging from $0.2 \lessapprox z \lessapprox 1.0$ using the brightest-cluster-galaxy (BCG) redshift and an observable proxy for the Einstein radius. We constrain the Einstein-radius-enclosed luminosity and stellar mass by fitting parametric spectral energy distributions to aperture photometry from the Dark Energy Camera Legacy Survey in the $g$-, $r$-, and $z$-band Dark Energy Camera filters. We find that the BCG redshift, enclosed total mass, and enclosed luminosity are strongly correlated and well described by a planar relationship in 3D space. We find that the enclosed total mass and stellar mass are correlated with a logarithmic slope of $0.500^{+0.029}_{-0.031}$, and the enclosed total mass and stellar-to-total mass fraction are correlated with a logarithmic slope of $-0.495^{+0.032}_{-0.033}$. In tandem with the small radii within which these slopes are constrained, this may suggest invariance in baryon conversion efficiency and feedback strength as a function of cluster-centric radii in galaxy clusters. Additionally, the correlations described here should have utility in ranking strong-lensing candidates in upcoming imaging surveys -- such as Rubin/Legacy Survey of Space and Time -- in which an algorithmic treatment of strong lenses will be needed due to the sheer volume of data these surveys will produce.
Submitted 27 January, 2025; v1 submitted 16 January, 2024;
originally announced January 2024.
-
Methods for Estimating the Exposure-Response Curve to Inform the New Safety Standards for Fine Particulate Matter
Authors:
Michael Cork,
Daniel Mork,
Francesca Dominici
Abstract:
Exposure to fine particulate matter ($PM_{2.5}$) poses significant health risks, and accurately determining the shape of the relationship between $PM_{2.5}$ and health outcomes has crucial policy ramifications. While various statistical methods exist to estimate this exposure-response curve (ERC), few studies have compared their performance under plausible data-generating scenarios. This study compares seven commonly used ERC estimators across 72 exposure-response and confounding scenarios via simulation. Additionally, we apply these methods to estimate the ERC between long-term $PM_{2.5}$ exposure and all-cause mortality using data from over 68 million Medicare beneficiaries in the United States. Our simulation indicates that regression methods not placed within a causal inference framework are unsuitable when anticipating heterogeneous exposure effects. Under the setting of a large sample size and unknown ERC functional form, we recommend utilizing causal inference methods that allow for nonlinear ERCs. In our data application, we observe a nonlinear relationship between annual average $PM_{2.5}$ and all-cause mortality in the Medicare population, with a sharp increase in relative mortality at low $PM_{2.5}$ concentrations. Our findings suggest that stricter $PM_{2.5}$ limits could avert numerous premature deaths. To facilitate the utilization of our results, we provide publicly available, reproducible code on GitHub for every step of the analysis.
Submitted 5 June, 2023;
originally announced June 2023.
-
Incorporating prior information into distributed lag nonlinear models with zero-inflated monotone regression trees
Authors:
Daniel Mork,
Ander Wilson
Abstract:
In environmental health research there is often interest in the effect of an exposure on a health outcome assessed on the same day and several subsequent days or lags. Distributed lag nonlinear models (DLNM) are a well-established statistical framework for estimating an exposure-lag-response function. We propose methods to allow for prior information to be incorporated into DLNMs. First, we impose a monotonicity constraint in the exposure-response at lagged time periods which matches with knowledge on how biological mechanisms respond to increased levels of exposures. Second, we introduce variable selection into the DLNM to identify lagged periods of susceptibility with respect to the outcome of interest. The variable selection approach allows for direct application of informative priors on which lags have nonzero association with the outcome. We propose a tree-of-trees model that uses two layers of trees: one for splitting the exposure time frame and one for fitting exposure-response functions over different time periods. We introduce a zero-inflated alternative to the tree splitting prior in Bayesian additive regression trees to allow for lag selection and the addition of informative priors. We develop a computational approach for efficient posterior sampling and perform a comprehensive simulation study to compare our method to existing DLNM approaches. We apply our method to estimate time-lagged extreme temperature relationships with mortality during summer or winter in Chicago, IL.
Submitted 4 October, 2023; v1 submitted 30 January, 2023;
originally announced January 2023.
-
Heterogeneous Distributed Lag Models to Estimate Personalized Effects of Maternal Exposures to Air Pollution
Authors:
Daniel Mork,
Marianthi-Anna Kioumourtzoglou,
Marc Weisskopf,
Brent A Coull,
Ander Wilson
Abstract:
Children's health studies support an association between maternal environmental exposures and children's birth outcomes. A common goal is to identify critical windows of susceptibility--periods during gestation with increased association between maternal exposures and a future outcome. The timing of the critical windows and magnitude of the associations are likely heterogeneous across different levels of individual, family, and neighborhood characteristics. Using an administrative Colorado birth cohort we estimate the individualized relationship between weekly exposures to fine particulate matter (PM$_{2.5}$) during gestation and birth weight. To achieve this goal, we propose a statistical learning method combining distributed lag models and Bayesian additive regression trees to estimate critical windows at the individual level and identify characteristics that induce heterogeneity from a high-dimensional set of potential modifying factors. We find evidence of heterogeneity in the PM$_{2.5}$-birth weight relationship, with some mother-child dyads showing a 3 times larger decrease in birth weight for an IQR increase in exposure (5.9 to 8.5 $μg/m^3$ PM$_{2.5}$) compared to the population average. Specifically, we find increased susceptibility for non-Hispanic mothers who are either younger, have higher body mass index or lower educational attainment. Our case study is the first precision health study of critical windows.
Submitted 30 June, 2023; v1 submitted 28 September, 2021;
originally announced September 2021.
-
Estimating Perinatal Critical Windows of Susceptibility to Environmental Mixtures via Structured Bayesian Regression Tree Pairs
Authors:
Daniel Mork,
Ander Wilson
Abstract:
Maternal exposure to environmental chemicals during pregnancy can alter birth and children's health outcomes. Research seeks to identify critical windows, time periods when the exposures can change future health outcomes, and estimate the exposure-response relationship. Existing statistical approaches focus on estimation of the association between maternal exposure to a single environmental chemical observed at high temporal resolution, such as weekly throughout pregnancy, and children's health outcomes. Extending to multiple chemicals observed at high temporal resolution poses a dimensionality problem and statistical methods are lacking. We propose a tree-based model for mixtures of exposures that are observed at high temporal resolution. The proposed approach uses an additive ensemble of structured tree-pairs that define structured main effects and interactions between time-resolved predictors, together with variable selection to remove from the model predictors not correlated with the outcome. We apply our method in a simulation and the analysis of the relationship between five exposures measured weekly throughout pregnancy and resulting birth weight in a Denver, Colorado birth cohort. We identified critical windows during which fine particulate matter, sulfur dioxide, and temperature are negatively associated with birth weight and an interaction between fine particulate matter and temperature. Software is made available in the R package dlmtree.
Submitted 1 July, 2021; v1 submitted 17 February, 2021;
originally announced February 2021.
-
Treed distributed lag nonlinear models
Authors:
Daniel Mork,
Ander Wilson
Abstract:
In studies of maternal exposure to air pollution, a children's health outcome is regressed on exposures observed during pregnancy. The distributed lag nonlinear model (DLNM) is a statistical method commonly implemented to estimate an exposure-time-response function when it is postulated that the exposure effect is nonlinear. Previous implementations of the DLNM estimate an exposure-time-response surface parameterized with a bivariate basis expansion. However, basis functions such as splines assume smoothness across the entire exposure-time-response surface, which may be unrealistic in settings where the exposure is associated with the outcome only in a specific time window. We propose a framework for estimating the DLNM based on Bayesian additive regression trees. Our method operates using a set of regression trees that each assume piecewise constant relationships across the exposure-time space. In a simulation, we show that our model outperforms spline-based models when the exposure-time surface is not smooth, while both methods perform similarly in settings where the true surface is smooth. Importantly, the proposed approach has lower variance and more precisely identifies critical windows during which exposure is associated with a future health outcome. We apply our method to estimate the association between maternal exposure to PM$_{2.5}$ and birth weight in a Colorado USA birth cohort.
Submitted 15 June, 2021; v1 submitted 12 October, 2020;
originally announced October 2020.