-
A tutorial on optimal dynamic treatment regimes
Authors:
Chunyu Wang,
Brian DM Tom
Abstract:
A dynamic treatment regime is a sequence of treatment decision rules tailored to an individual's evolving status over time. In precision medicine, much focus has been placed on finding an optimal dynamic treatment regime which, if followed by everyone in the population, would yield the best outcome on average; and extensive investigation has been conducted from both methodological and applications standpoints. The aim of this tutorial is to provide readers who are interested in optimal dynamic treatment regimes with a systematic, detailed but accessible introduction, including the formal definition and formulation of this topic within the framework of causal inference, identification assumptions required to link the causal quantity of interest to the observed data, existing statistical models and estimation methods to learn the optimal regime from data, and application of these methods to both simulated and real data.
Submitted 24 February, 2025;
originally announced February 2025.
-
Estimating the duration of RT-PCR positivity for SARS-CoV-2 from doubly interval censored data with undetected infections
Authors:
Joshua Blake,
Paul Birrell,
A. Sarah Walker,
Koen B. Pouwels,
Thomas House,
Brian D. M. Tom,
Theodore Kypraios,
Daniela De Angelis
Abstract:
Monitoring the incidence of new infections during a pandemic is critical for an effective public health response. General population prevalence surveys for SARS-CoV-2 can provide high-quality data to estimate incidence. However, estimation relies on understanding the distribution of the duration that infections remain detectable. This study addresses this need using data from the Coronavirus Infection Survey (CIS), a long-term, longitudinal, general population survey conducted in the UK. Analyzing these data presents unique challenges, such as doubly interval censoring, undetected infections, and false negatives. We propose a Bayesian nonparametric survival analysis approach, estimating a discrete-time distribution of durations and integrating prior information derived from a complementary study. Our methodology is validated through a simulation study, including its resilience to model misspecification, and then applied to the CIS dataset. This results in the first estimate of the full duration distribution in a general population, as well as methodology that could be transferred to new contexts.
Submitted 7 February, 2025;
originally announced February 2025.
-
Dynamic factor analysis for sparse and irregular longitudinal data: an application to metabolite measurements in a COVID-19 study
Authors:
Jiachen Cai,
Robert J. B. Goudie,
Brian D. M. Tom
Abstract:
It is of scientific interest to identify essential biomarkers in biological processes underlying diseases to facilitate precision medicine. Factor analysis (FA) has long been used to address this goal: by assuming latent biological pathways drive the activity of measurable biomarkers, a biomarker is more influential if its absolute factor loading is larger. Although correlation between biomarkers has been properly handled under this framework, correlation between latent pathways is often overlooked, as one classical assumption in FA is the independence between factors. However, this assumption may not be realistic in the context of pathways, as existing biological knowledge suggests that pathways interact with one another rather than functioning independently. Motivated by sparsely and irregularly collected longitudinal measurements of metabolites in a COVID-19 study of large sample size, we propose a dynamic factor analysis model that can account for the potential cross-correlations between pathways, through a multi-output Gaussian process (MOGP) prior on the factor trajectories. To mitigate overfitting caused by sparsity of longitudinal measurements, we introduce a roughness penalty upon MOGP hyperparameters and allow for non-zero mean functions. To estimate these hyperparameters, we develop a stochastic expectation maximization (StEM) algorithm that scales well to the large sample size. In our simulation studies, across all sample sizes considered, StEM yields more accurate and stable estimates of the MOGP hyperparameters than a comparator algorithm used in previous research. Application to the motivating example identifies a kynurenine pathway that affects the clinical severity of patients with COVID-19. In particular, a novel biomarker taurine is discovered, which has been receiving increased attention clinically, yet its role was overlooked in a previous analysis.
Submitted 16 August, 2024;
originally announced August 2024.
-
Identifying treatment response subgroups in observational time-to-event data
Authors:
Vincent Jeanselme,
Chang Ho Yoon,
Fabian Falck,
Brian Tom,
Jessica Barrett
Abstract:
Identifying patient subgroups with different treatment responses is an important task to inform medical recommendations, guidelines, and the design of future clinical trials. Existing approaches for treatment effect estimation primarily rely on Randomised Controlled Trials (RCTs), which are often limited by insufficient power, multiple comparisons, and unbalanced covariates. In addition, RCTs tend to feature more homogeneous patient groups, making them less relevant for uncovering subgroups in the population encountered in real-world clinical practice. Subgroup analyses established for RCTs suffer from significant statistical biases when applied to observational studies, which benefit from larger and more representative populations. Our work introduces a novel, outcome-guided, subgroup analysis strategy for identifying subgroups of treatment response in both RCTs and observational studies alike. It hence positions itself in-between individualised and average treatment effect estimation to uncover patient subgroups with distinct treatment responses, critical for actionable insights that may influence treatment guidelines. In experiments, our approach significantly outperforms the current state-of-the-art method for subgroup analysis in both randomised and observational treatment regimes.
Submitted 23 February, 2025; v1 submitted 6 August, 2024;
originally announced August 2024.
-
Dynamic Factor Analysis with Dependent Gaussian Processes for High-Dimensional Gene Expression Trajectories
Authors:
Jiachen Cai,
Robert J. B. Goudie,
Colin Starr,
Brian D. M. Tom
Abstract:
The increasing availability of high-dimensional, longitudinal measures of gene expression can facilitate understanding of biological mechanisms, as required for precision medicine. Biological knowledge suggests that it may be best to describe complex diseases at the level of underlying pathways, which may interact with one another. We propose a Bayesian approach that allows for characterising such correlation among different pathways through Dependent Gaussian Processes (DGP) and mapping the observed high-dimensional gene expression trajectories into unobserved low-dimensional pathway expression trajectories via Bayesian Sparse Factor Analysis. Our proposal is the first attempt to relax the classical assumption of independent factors for longitudinal data and has demonstrated a superior performance in recovering the shape of pathway expression trajectories, revealing the relationships between genes and pathways, and predicting gene expressions (closer point estimates and narrower predictive intervals), as demonstrated through simulations and real data analysis. To fit the model, we propose a Monte Carlo Expectation Maximization (MCEM) scheme that can be implemented conveniently by combining a standard Markov Chain Monte Carlo sampler and an R package GPFDA (Konzen and others, 2021), which returns the maximum likelihood estimates of DGP hyperparameters. The modular structure of MCEM makes it generalizable to other complex models involving the DGP model component. Our R package DGP4LCF that implements the proposed approach is available on CRAN.
Submitted 22 July, 2024; v1 submitted 6 July, 2023;
originally announced July 2023.
-
Exploring the big data paradox for various estimands using vaccination data from the global COVID-19 Trends and Impact Survey (CTIS)
Authors:
Youqi Yang,
Walter Dempsey,
Peisong Han,
Yashwant Deshmukh,
Sylvia Richardson,
Brian Tom,
Bhramar Mukherjee
Abstract:
Selection bias poses a challenge to statistical inference validity in non-probability surveys. This study compared estimates of the first-dose COVID-19 vaccination rates among Indian adults in 2021 from a large non-probability survey, COVID-19 Trends and Impact Survey (CTIS), and a small probability survey, the Center for Voting Options and Trends in Election Research (CVoter), against benchmark data from the COVID Vaccine Intelligence Network (CoWIN). Notably, CTIS exhibits a larger estimation error (0.39) compared to CVoter (0.16). Additionally, we investigated the estimation accuracy of the CTIS when using a relative scale and found a significant increase in the effective sample size by altering the estimand from the overall vaccination rate. These results suggest that the big data paradox can manifest in countries beyond the US and it may not apply to every estimand of interest.
Submitted 26 June, 2023;
originally announced June 2023.
-
Neural Fine-Gray: Monotonic neural networks for competing risks
Authors:
Vincent Jeanselme,
Chang Ho Yoon,
Brian Tom,
Jessica Barrett
Abstract:
Time-to-event modelling, known as survival analysis, differs from standard regression as it addresses censoring in patients who do not experience the event of interest. Despite competitive performances in tackling this problem, machine learning methods often ignore other competing risks that preclude the event of interest. This practice biases the survival estimation. Extensions to address this challenge often rely on parametric assumptions or numerical estimations leading to sub-optimal survival approximations. This paper leverages constrained monotonic neural networks to model each competing survival distribution. This modelling choice ensures the exact likelihood maximisation at a reduced computational cost by using automatic differentiation. The effectiveness of the solution is demonstrated on one synthetic and three medical datasets. Finally, we discuss the implications of considering competing risks when developing risk scores for medical practice.
Submitted 11 May, 2023;
originally announced May 2023.
-
Patient stratification in multi-arm trials: a two-stage procedure with Bayesian profile regression
Authors:
Yuejia Xu,
Angela M. Wood,
Brian D. M. Tom
Abstract:
Precision medicine is an emerging field that takes into account individual heterogeneity to inform better clinical practice. In clinical trials, the evaluation of treatment effect heterogeneity is an important component, and recently, many statistical methods have been proposed for stratifying patients into different subgroups based on such heterogeneity. However, the majority of existing methods developed for this purpose focus on the case with a dichotomous treatment and are not directly applicable to multi-arm trials. In this paper, we consider the problem of patient stratification in multi-arm trial settings and propose a two-stage procedure within the Bayesian nonparametric framework. Specifically, we first use Bayesian additive regression trees (BART) to predict potential outcomes (treatment responses) under different treatment options for each patient, and then we leverage Bayesian profile regression to cluster patients into subgroups according to their baseline characteristics and predicted potential outcomes. We further embed a variable selection procedure into our proposed framework to identify the patient characteristics that actively "drive" the clustering structure. We conduct simulation studies to examine the performance of our proposed method and demonstrate the method by applying it to a UK-based multi-arm blood donation trial, wherein our method uncovers five clinically meaningful donor subgroups.
Submitted 22 February, 2023;
originally announced February 2023.
-
Sequential Re-estimation Learning of Optimal Individualized Treatment Rules Among Ordinal Treatments with Application to Recommended Intervals Between Blood Donations
Authors:
Yuejia Xu,
Angela M. Wood,
David J. Roberts,
Brian D. M. Tom
Abstract:
Personalized medicine has gained much popularity recently as a way of providing better healthcare by tailoring treatments to suit individuals. Our research, motivated by the UK INTERVAL blood donation trial, focuses on estimating the optimal individualized treatment rule (ITR) in the ordinal treatment-arms setting. Restrictions on minimum lengths between whole blood donations exist to safeguard donor health and quality of blood received. However, the evidence-base for these limits is lacking. Moreover, in England, the blood service is interested in making blood donation both safe and sustainable by integrating multi-marker data from INTERVAL and developing personalized donation strategies. As the three inter-donation interval options in INTERVAL have clear orderings, we propose a sequential re-estimation learning method that effectively incorporates "treatment" orderings when identifying optimal ITRs. Furthermore, we incorporate variable selection into our method for both linear and nonlinear decision rules to handle situations with (noise) covariates irrelevant for decision-making. Simulations demonstrate its superior performance over existing methods that assume multiple nominal treatments by achieving smaller misclassification rates and larger value functions. Application to a much-in-demand donor subgroup shows that the estimated optimal ITR achieves both the highest utilities and largest proportions of donors assigned to the safest inter-donation interval option in INTERVAL.
Submitted 22 February, 2023;
originally announced February 2023.
-
A comparison of two frameworks for multi-state modelling, applied to outcomes after hospital admissions with COVID-19
Authors:
Christopher Jackson,
Brian Tom,
Peter Kirwan,
Sema Mandal,
Shaun Seaman,
Kevin Kunzmann,
Anne Presanis,
Daniela De Angelis
Abstract:
We compare two multi-state modelling frameworks that can be used to represent dates of events following hospital admission for people infected during an epidemic. The methods are applied to data from people admitted to hospital with COVID-19, to estimate the probability of admission to ICU, the probability of death in hospital for patients before and after ICU admission, the lengths of stay in hospital, and how all these vary with age and gender. One modelling framework is based on defining transition-specific hazard functions for competing risks. A less commonly used framework defines partially-latent subpopulations who will experience each subsequent event, and uses a mixture model to estimate the probability that an individual will experience each event, and the distribution of the time to the event given that it occurs. We compare the advantages and disadvantages of these two frameworks, in the context of the COVID-19 example. The issues include the interpretation of the model parameters, the computational efficiency of estimating the quantities of interest, implementation in software and assessing goodness of fit. In the example, we find that some groups appear to be at very low risk of some events, in particular ICU admission, and these are best represented by using "cure-rate" models to define transition-specific hazards. We provide general-purpose software to implement all the models we describe in the "flexsurv" R package, which allows arbitrarily-flexible distributions to be used to represent the cause-specific hazards or times to events.
Submitted 24 March, 2022;
originally announced March 2022.
-
Latent class mixed modelling for phenotypic stratification of primary biliary cholangitis patients on first line treatment
Authors:
Victoria Mulcahy,
Anais Rouanet,
Alessio Gerussi,
Adam Duckworth,
Steve Flack,
Marco Carbone,
Brian Tom,
George Mells
Abstract:
In patients with primary biliary cholangitis (PBC), the serum liver biochemistry measured during treatment with ursodeoxycholic acid (the UDCA response) accurately predicts long-term outcome. In this study we sought to use liver biochemistry, and in particular alkaline phosphatase (ALP), as a surrogate marker of disease activity, for phenotypic stratification in PBC using a computational modelling approach. Our aim here was to identify distinct disease subgroups of patients with distinct disease trajectories. Methods: We used longitudinal ALP results from 1,601 PBC patients on first line treatment with UDCA, and applied latent class mixed modelling (LCMM), to identify distinct phenotypic subgroups, each with distinct disease trajectories, and risks of end stage liver disease (ESLD). Results: We identified four well discriminated phenotypic subgroups within our PBC cohort, each with distinct disease trajectories.
Submitted 20 March, 2022;
originally announced March 2022.
-
Bayesian profile regression for clustering analysis involving a longitudinal response and explanatory variables
Authors:
Anaïs Rouanet,
Rob Johnson,
Magdalena E Strauss,
Sylvia Richardson,
Brian D Tom,
Simon R White,
Paul D W Kirk
Abstract:
The identification of sets of co-regulated genes that share a common function is a key question of modern genomics. Bayesian profile regression is a semi-supervised mixture modelling approach that makes use of a response to guide inference toward relevant clusterings. Previous applications of profile regression have considered univariate continuous, categorical, and count outcomes. In this work, we extend Bayesian profile regression to cases where the outcome is longitudinal (or multivariate continuous) and provide PReMiuMlongi, an updated version of PReMiuM, the R package for profile regression. We consider multivariate normal and Gaussian process regression response models and provide proof-of-principle applications to four simulation studies. The model is applied to budding yeast data to identify groups of genes co-regulated during the Saccharomyces cerevisiae cell cycle. We identify 4 distinct groups of genes associated with specific patterns of gene expression trajectories, along with the bound transcriptional factors, likely involved in their co-regulation process.
Submitted 8 November, 2021;
originally announced November 2021.
-
Trends in risks of severe events and lengths of stay for COVID-19 hospitalisations in England over the pre-vaccination era: results from the Public Health England SARI-Watch surveillance scheme
Authors:
Peter D. Kirwan,
Suzanne Elgohari,
Christopher H. Jackson,
Brian D. M. Tom,
Sema Mandal,
Daniela De Angelis,
Anne M. Presanis
Abstract:
Background: Trends in hospitalised case-fatality risk (HFR), risk of intensive care unit (ICU) admission and lengths of stay for patients hospitalised for COVID-19 in England over the pre-vaccination era are unknown.
Methods: Data on hospital and ICU admissions with COVID-19 at 31 NHS trusts in England were collected by Public Health England's Severe Acute Respiratory Infections surveillance system and linked to death information. We applied parametric multi-state mixture models, accounting for censored outcomes and regressing risks and times between events on month of admission, geography, and baseline characteristics.
Findings: 20,785 adults were admitted with COVID-19 in 2020. Between March and June/July/August estimated HFR reduced from 31.9% (95% confidence interval 30.3-33.5%) to 10.9% (9.4-12.7%), then rose steadily from 21.6% (18.4-25.5%) in September to 25.7% (23.0-29.2%) in December, with steeper increases among older patients, those with multi-morbidity and outside London/South of England. ICU admission risk reduced from 13.9% (12.8-15.2%) in March to 6.2% (5.3-7.1%) in May, rising to a high of 14.2% (11.1-17.2%) in September. Median length of stay in non-critical care increased during 2020, from 6.6 to 12.3 days for those dying, and from 6.1 to 9.3 days for those discharged.
Interpretation: Initial improvements in patient outcomes, corresponding to developments in clinical practice, were not sustained throughout 2020, with HFR in December approaching the levels seen at the start of the pandemic, whilst median hospital stays have lengthened. The role of increased transmission, new variants, case-mix and hospital pressures in increasing COVID-19 severity requires urgent further investigation.
Submitted 22 March, 2021; v1 submitted 8 March, 2021;
originally announced March 2021.
-
A Bayesian framework for case-cohort Cox regression: application to dietary epidemiology
Authors:
Andrew Yiu,
Robert J. B. Goudie,
Stephen J. Sharp,
Paul J. Newcombe,
Brian D. M. Tom
Abstract:
The case-cohort study design bypasses resource constraints by collecting certain expensive covariates for only a small subset of the full cohort. Weighted Cox regression is the most widely used approach for analysing case-cohort data within the Cox model, but is inefficient. Alternative approaches based on multiple imputation and nonparametric maximum likelihood suffer from incompatibility and computational issues respectively. We introduce a novel Bayesian framework for case-cohort Cox regression that avoids the aforementioned problems. Users can include auxiliary variables to help predict the unmeasured expensive covariates with a prediction model of their choice, while the models for the nuisance parameters are nonparametrically specified and integrated out. Posterior sampling can be carried out using procedures based on the pseudo-marginal MCMC algorithm. The method scales effectively to large, complex datasets, as demonstrated in our application: investigating the associations between saturated fatty acids and type 2 diabetes using the EPIC-Norfolk study. As part of our analysis, we also develop a new approach for handling compositional data in the Cox model, leading to more reliable and interpretable results compared to previous studies. The performance of our method is illustrated with extensive simulations. The code used to produce the results in this paper can be found at https://github.com/andrewyiu/bayes_cc .
Submitted 9 September, 2021; v1 submitted 25 July, 2020;
originally announced July 2020.
-
The Alzheimer's Disease Prediction Of Longitudinal Evolution (TADPOLE) Challenge: Results after 1 Year Follow-up
Authors:
Razvan V. Marinescu,
Neil P. Oxtoby,
Alexandra L. Young,
Esther E. Bron,
Arthur W. Toga,
Michael W. Weiner,
Frederik Barkhof,
Nick C. Fox,
Arman Eshaghi,
Tina Toni,
Marcin Salaterski,
Veronika Lunina,
Manon Ansart,
Stanley Durrleman,
Pascal Lu,
Samuel Iddi,
Dan Li,
Wesley K. Thompson,
Michael C. Donohue,
Aviv Nahon,
Yarden Levy,
Dan Halbersberg,
Mariya Cohen,
Huiling Liao,
Tengfei Li
, et al. (71 additional authors not shown)
Abstract:
We present the findings of "The Alzheimer's Disease Prediction Of Longitudinal Evolution" (TADPOLE) Challenge, which compared the performance of 92 algorithms from 33 international teams at predicting the future trajectory of 219 individuals at risk of Alzheimer's disease. Challenge participants were required to make a prediction, for each month of a 5-year future time period, of three key outcomes: clinical diagnosis, Alzheimer's Disease Assessment Scale Cognitive Subdomain (ADAS-Cog13), and total volume of the ventricles. The methods used by challenge participants included multivariate linear regression, machine learning methods such as support vector machines and deep neural networks, as well as disease progression models. No single submission was best at predicting all three outcomes. For clinical diagnosis and ventricle volume prediction, the best algorithms strongly outperform simple baselines in predictive ability. However, for ADAS-Cog13 no single submitted prediction method was significantly better than random guesswork. Two ensemble methods based on taking the mean and median over all predictions obtained top scores on almost all tasks. Better-than-average performance at diagnosis prediction was generally associated with the additional inclusion of features from cerebrospinal fluid (CSF) samples and diffusion tensor imaging (DTI). On the other hand, better performance at ventricle volume prediction was associated with inclusion of summary statistics, such as the slope or maxima/minima of biomarkers. TADPOLE's unique results suggest that current prediction algorithms provide sufficient accuracy to exploit biomarkers related to clinical diagnosis and ventricle volume, for cohort refinement in clinical trials for Alzheimer's disease. However, results call into question the usage of cognitive test scores for patient selection and as a primary endpoint in clinical trials.
Submitted 27 December, 2021; v1 submitted 9 February, 2020;
originally announced February 2020.
-
Clustered multi-state models with observation-level random effects, mover-stayer effects and dynamic covariates: Modelling transition intensities and sojourn times in a study of psoriatic arthritis
Authors:
Sean Yiu,
Vernon T. Farewell,
Brian D. M. Tom
Abstract:
In psoriatic arthritis, it is important to understand the joint activity (represented by swelling and pain) and damage processes because both are related to severe physical disability. This paper aims to provide a comprehensive investigation into both processes occurring over time, in particular their relationship, by specifying a joint multi-state model at the individual hand joint-level, which also accounts for many of their important features. As there are multiple hand joints, such an analysis will be based on the use of clustered multi-state models. Here we consider an observation-level random effects structure with dynamic covariates and allow for the possibility that a subpopulation of patients is at minimal risk of damage. Such an analysis is found to provide further understanding of the activity-damage relationship beyond that provided by previous analyses. Consideration is also given to the modelling of mean sojourn times and jump probabilities. In particular, a novel model parameterization which allows easily interpretable covariate effects to act on these quantities is proposed.
Submitted 3 April, 2017;
originally announced April 2017.
-
Two-part models with stochastic processes for modelling longitudinal semicontinuous data: computationally efficient inference and modelling the overall marginal mean
Authors:
Sean Yiu,
Brian Tom
Abstract:
Several researchers have described two-part models with patient-specific stochastic processes for analysing longitudinal semicontinuous data. In theory, such models can offer greater flexibility than the standard two-part model with patient-specific random effects. However, in practice the high dimensional integrations involved in the marginal likelihood (i.e. integrated over the stochastic processes) significantly complicate model fitting. Thus, only non-standard, computationally intensive procedures based on simulating the marginal likelihood have been proposed so far. In this paper, we describe an efficient method of implementation by demonstrating how the high dimensional integrations involved in the marginal likelihood can be computed efficiently. Specifically, by using a property of the multivariate normal distribution and the standard marginal cumulative distribution function identity, we transform the marginal likelihood so that the high dimensional integrations are contained in the cumulative distribution function of a multivariate normal distribution, which can then be efficiently evaluated. Hence maximum likelihood estimation can be used to obtain parameter estimates and asymptotic standard errors (from the observed information matrix) of model parameters. We describe our proposed efficient implementation procedure for the standard two-part model parameterisation and when it is of interest to directly model the overall marginal mean. The methodology is applied to a psoriatic arthritis data set concerning functional disability.
Submitted 27 March, 2017;
originally announced March 2017.
-
Efficient real-time monitoring of an emerging influenza epidemic: how feasible?
Authors:
Paul J Birrell,
Lorenz Wernisch,
Brian D M Tom,
Leonhard Held,
Gareth O Roberts,
Richard G Pebody,
Daniela De Angelis
Abstract:
A prompt public health response to a new epidemic relies on the ability to monitor and predict its evolution in real time as data accumulate. The 2009 A/H1N1 outbreak in the UK revealed pandemic data as noisy, contaminated, potentially biased, and originating from multiple sources. This seriously challenges the capacity for real-time monitoring. Here we assess the feasibility of real-time inference based on such data by constructing an analytic tool combining an age-stratified SEIR transmission model with various observation models describing the data generation mechanisms. As batches of data become available, a sequential Monte Carlo (SMC) algorithm is developed to synthesise multiple imperfect data streams, iterate epidemic inferences and assess model adequacy amidst a rapidly evolving epidemic environment, substantially reducing computation time in comparison to standard MCMC, to ensure timely delivery of real-time epidemic assessments. In application to simulated data designed to mimic the 2009 A/H1N1 epidemic, SMC is shown to have additional benefits in terms of assessing predictive performance and coping with parameter non-identifiability.
Submitted 3 May, 2019; v1 submitted 18 August, 2016;
originally announced August 2016.
-
Synthesising evidence to estimate pandemic (2009) A/H1N1 influenza severity in 2009-2011
Authors:
Anne M. Presanis,
Richard G. Pebody,
Paul J. Birrell,
Brian D. M. Tom,
Helen K. Green,
Hayley Durnall,
Douglas Fleming,
Daniela De Angelis
Abstract:
Knowledge of the severity of an influenza outbreak is crucial for informing and monitoring appropriate public health responses, both during and after an epidemic. However, case-fatality, case-intensive care admission and case-hospitalisation risks are difficult to measure directly. Bayesian evidence synthesis methods have previously been employed to combine fragmented, under-ascertained and biased surveillance data coherently and consistently, to estimate case-severity risks in the first two waves of the 2009 A/H1N1 influenza pandemic experienced in England. We present in detail the complex probabilistic model underlying this evidence synthesis, and extend the analysis to also estimate severity in the third wave of the pandemic strain during the 2010/2011 influenza season. We adapt the model to account for changes in the surveillance data available over the three waves. We consider two approaches: (a) a two-stage approach using posterior distributions from the model for the first two waves to inform priors for the third wave model; and (b) a one-stage approach modelling all three waves simultaneously. Both approaches result in the same key conclusions: (1) that the age-distribution of the case-severity risks is "u"-shaped, with children and older adults having the highest severity; (2) that the age-distribution of the infection attack rate changes over waves, school-age children being most affected in the first two waves and the attack rate in adults over 25 increasing from the second to third waves; and (3) that when averaged over all age groups, case-severity appears to increase over the three waves. The extent to which the final conclusion is driven by the change in age-distribution of those infected over time is subject to discussion.
Submitted 3 February, 2015; v1 submitted 29 August, 2014;
originally announced August 2014.
-
Maximum likelihood and pseudo score approaches for parametric time-to-event analysis with informative entry times
Authors:
Brian D. M. Tom,
Vernon T. Farewell,
Sheila M. Bird
Abstract:
We develop a maximum likelihood estimating approach for time-to-event Weibull regression models with outcome-dependent sampling, where sampling of subjects is dependent on the residual fraction of the time left to developing the event of interest. Additionally, we propose a two-stage approach which proceeds by iteratively estimating, through a pseudo score, the Weibull parameters of interest (i.e., the regression parameters) conditional on the inverse probability of sampling weights; and then re-estimating these weights (given the updated Weibull parameter estimates) through the profiled full likelihood. With these two new methods, both the estimated sampling mechanism parameters and the Weibull parameters are consistently estimated under correct specification of the conditional referral distribution. Standard errors for the regression parameters are obtained directly from inverting the observed information matrix in the full likelihood specification and by either calculating bootstrap or robust standard errors for the hybrid pseudo score/profiled likelihood approach. Loss of efficiency with the latter approach is considered. Robustness of the proposed methods to misspecification of the referral mechanism and the time-to-event distribution is also briefly examined. Further, we show how to extend our methods to the family of parametric time-to-event distributions characterized by the generalized gamma distribution. The motivation for these two approaches came from data on time to cirrhosis from hepatitis C viral infection in patients referred to the Edinburgh liver clinic. We analyze these data here.
Submitted 31 July, 2014;
originally announced July 2014.