Search | arXiv e-print repository

arXiv:2501.02128 [pdf, other]

Transfer Learning for Individualized Treatment Rules: Application to Sepsis Patients Data from eICU-CRD and MIMIC-III Databases

Authors: Andong Wang, Kelly Wentzlof, Johnny Rajala, Miontranese Green, Yunshu Zhang, Shu Yang

Abstract: Modern precision medicine aims to utilize real-world data to provide the best treatment for an individual patient. An individualized treatment rule (ITR) maps each patient's characteristics to a recommended treatment scheme that maximizes the expected outcome of the patient. A challenge precision medicine faces is population heterogeneity, as studies on treatment effects are often conducted on sou… ▽ More Modern precision medicine aims to utilize real-world data to provide the best treatment for an individual patient. An individualized treatment rule (ITR) maps each patient's characteristics to a recommended treatment scheme that maximizes the expected outcome of the patient. A challenge precision medicine faces is population heterogeneity, as studies on treatment effects are often conducted on source populations that differ from the populations of interest in terms of the distribution of patient characteristics. Our research goal is to explore a transfer learning algorithm that aims to address the population heterogeneity problem and obtain targeted, optimal, and interpretable ITRs. The algorithm incorporates a calibrated augmented inverse probability weighting (CAIPW) estimator for the average treatment effect (ATE) and employs value function maximization for the target population using Genetic Algorithm (GA) to produce our desired ITR. To demonstrate its practical utility, we apply this transfer learning algorithm to two large medical databases, Electronic Intensive Care Unit Collaborative Research Database (eICU-CRD) and Medical Information Mart for Intensive Care III (MIMIC-III). We first identify the important covariates, treatment options, and outcomes of interest based on the two databases, and then estimate the optimal linear ITRs for patients with sepsis. Our research introduces and applies new techniques for data fusion to obtain data-driven ITRs that cater to patients' individual medical needs in a population of interest. By emphasizing generalizability and personalized decision-making, this methodology extends its potential application beyond medicine to fields such as marketing, technology, social sciences, and education. △ Less

Submitted 3 January, 2025; originally announced January 2025.

Comments: 23 pages, 4 figures

arXiv:2205.12769 [pdf, other]

Small domain estimation of census coverage: A case study in Bayesian analysis of complex survey data

Authors: Joane S. Elleouet, Patrick Graham, Nikolai Kondratev, Abby K. Morgan, Rebecca M. Green

Abstract: Many countries conduct a full census survey to report official population statistics. As no census survey ever achieves 100 per cent response rate, a post-enumeration survey (PES) is usually conducted and analysed to assess census coverage and produce official population estimates by geographic area and demographic attributes. Considering the usually small size of PES, direct estimation at the des… ▽ More Many countries conduct a full census survey to report official population statistics. As no census survey ever achieves 100 per cent response rate, a post-enumeration survey (PES) is usually conducted and analysed to assess census coverage and produce official population estimates by geographic area and demographic attributes. Considering the usually small size of PES, direct estimation at the desired level of disaggregation is not feasible. Design-based estimation with sampling weight adjustment is a commonly used method but is difficult to implement when survey non-response patterns cannot be fully documented and population benchmarks are not available. We overcome these limitations with a fully model-based Bayesian approach applied to the New Zealand PES. Although theory for the Bayesian treatment of complex surveys has been described, published applications of individual level Bayesian models for complex survey data remain scarce. We provide such an application through a case study of the 2018 census and PES surveys. We implement a multilevel model that accounts for the complex design of PES. We then illustrate how mixed posterior predictive checking and cross-validation can assist with model building and model selection. Finally, we discuss potential methodological improvements to the model and potential solutions to mitigate dependence between the two surveys. △ Less

Submitted 20 July, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

Comments: 35 pages, 5 figures This is an author version of a paper accepted for publication in the Journal of Official Statistics. Once published by the Journal of Official Statistics use the Journal citation. This version includes supplementary material and corrected version of Figure 1

arXiv:2104.12516 [pdf]

Evaluating the performance of personal, social, health-related, biomarker and genetic data for predicting an individuals future health using machine learning: A longitudinal analysis

Authors: Mark Green

Abstract: As we gain access to a greater depth and range of health-related information about individuals, three questions arise: (1) Can we build better models to predict individual-level risk of ill health? (2) How much data do we need to effectively predict ill health? (3) Are new methods required to process the added complexity that new forms of data bring? The aim of the study is to apply a machine lear… ▽ More As we gain access to a greater depth and range of health-related information about individuals, three questions arise: (1) Can we build better models to predict individual-level risk of ill health? (2) How much data do we need to effectively predict ill health? (3) Are new methods required to process the added complexity that new forms of data bring? The aim of the study is to apply a machine learning approach to identify the relative contribution of personal, social, health-related, biomarker and genetic data as predictors of future health in individuals. Using longitudinal data from 6830 individuals in the UK from Understanding Society (2010-12 to 2015-17), the study compares the predictive performance of five types of measures: personal (e.g. age, sex), social (e.g. occupation, education), health-related (e.g. body weight, grip strength), biomarker (e.g. cholesterol, hormones) and genetic single nucleotide polymorphisms (SNPs). The predicted outcome variable was limiting long-term illness one and five years from baseline. Two machine learning approaches were used to build predictive models: deep learning via neural networks and XGBoost (gradient boosting decision trees). Model fit was compared to traditional logistic regression models. Results found that health-related measures had the strongest prediction of future health status, with genetic data performing poorly. Machine learning models only offered marginal improvements in model accuracy when compared to logistic regression models, but also performed well on other metrics e.g. neural networks were best on AUC and XGBoost on precision. The study suggests that increasing complexity of data and methods does not necessarily translate to improved understanding of the determinants of health or performance of predictive models of ill health. △ Less

Submitted 26 April, 2021; originally announced April 2021.

Comments: 19 pages

Showing 1–3 of 3 results for author: Green, M