Data collaboration for causal inference from limited medical testing and medication data
Authors:
Tomoru Nakayama,
Yuji Kawamata,
Akihiro Toyoda,
Akira Imakura,
Rina Kagawa,
Masaru Sanuki,
Ryoya Tsunoda,
Kunihiro Yamagata,
Tetsuya Sakurai,
Yukihiko Okada
Abstract:
Observational studies enable causal inferences when randomized controlled trials (RCTs) are not feasible. However, integrating sensitive medical data across multiple institutions introduces significant privacy challenges. The data collaboration quasi-experiment (DC-QE) framework addresses these concerns by sharing "intermediate representations" -- dimensionality-reduced data derived from raw data -- instead of the raw data. While the DC-QE can estimate treatment effects, its application to medical data remains unexplored. This study applied the DC-QE framework to medical data from a single institution to simulate distributed data environments under independent and identically distributed (IID) and non-IID conditions. We propose a novel method for generating intermediate representations within the DC-QE framework. Experimental results demonstrated that DC-QE consistently outperformed individual analyses across various accuracy metrics, closely approximating the performance of centralized analysis. The proposed method further improved performance, particularly under non-IID conditions. These outcomes highlight the potential of the DC-QE framework as a robust approach for privacy-preserving causal inferences in healthcare. Broader adoption of this framework and increased use of intermediate representations could grant researchers access to larger, more diverse datasets while safeguarding patient confidentiality. This approach may ultimately aid in identifying previously unrecognized causal relationships, support drug repurposing efforts, and enhance therapeutic interventions for rare diseases.
Submitted 20 March, 2025; v1 submitted 11 January, 2025;
originally announced January 2025.
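To make the idea of sharing "intermediate representations" concrete, the following is a minimal sketch, assuming a private random linear projection stands in for each institution's dimensionality reduction. It is not the method proposed in the paper, whose details the abstract does not give. The function names, the covariate values, and the seeds are all hypothetical. Only the projected rows would leave an institution; how DC-QE then combines them to estimate treatment effects is not described in the abstract and is not shown here.

```python
import random


def make_projection(n_features, n_components, seed):
    """Build a private random projection matrix (n_features x n_components).

    Assumption: a Gaussian random projection is used purely as a stand-in
    for whatever dimensionality reduction an institution would apply.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, 1.0) for _ in range(n_components)]
        for _ in range(n_features)
    ]


def intermediate_representation(rows, projection):
    """Project each raw record onto the lower-dimensional space."""
    n_components = len(projection[0])
    return [
        [
            sum(value * projection[j][c] for j, value in enumerate(row))
            for c in range(n_components)
        ]
        for row in rows
    ]


# Hypothetical raw covariates held by two institutions (4 features per patient).
raw_a = [[63.0, 1.0, 5.6, 120.0], [48.0, 0.0, 6.1, 135.0]]
raw_b = [[71.0, 1.0, 7.2, 142.0], [55.0, 0.0, 5.9, 118.0], [39.0, 1.0, 5.2, 110.0]]

# Each institution keeps its own projection private and shares only the result.
shared_a = intermediate_representation(raw_a, make_projection(4, 2, seed=1))
shared_b = intermediate_representation(raw_b, make_projection(4, 2, seed=2))
```

Each shared row has two values instead of the original four, so the raw covariates are never transmitted in this sketch.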
Effect of exclusion criteria on the distribution of blood test values
Authors:
Rina Kagawa,
Masanori Shiro
Abstract:
The increasing demand for personalized health care has led to the expectation that individualized quantitative evaluation of human disease states is possible. However, this has not yet been achieved at a sufficiently low cost. Our ultimate goal is to determine the most accurate distributions of blood tests commonly used in health checkups. In this study, we quantified differences between the distributions estimated from four datasets using the three-parameter lognormal distribution and analyzed the causes of the differences. We focused on two causes: the exclusion criteria and the distribution used for estimation. We compared the expected values across datasets for each laboratory test. We also quantitatively evaluated differences in the shape of the estimated distribution corresponding to the exclusion criteria. We found that exclusion criteria have an important influence on the shape of the distribution of blood test values.
Submitted 16 June, 2022;
originally announced June 2022.
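As an illustration of how an exclusion criterion can shift a fitted three-parameter lognormal distribution, here is a minimal sketch on synthetic data. It assumes the threshold (location) parameter is known and fixed, and estimates the other two parameters from the mean and standard deviation of log(x - threshold). The paper's actual estimation procedure, datasets, and exclusion criteria are not given in the abstract, so the sample, the `THRESHOLD` and `CUTOFF` values, and the function names are all hypothetical.

```python
import math
import random
import statistics


def fit_lognormal3(values, threshold):
    """Estimate mu and sigma of log(x - threshold) for a fixed, assumed threshold."""
    logs = [math.log(v - threshold) for v in values]
    return statistics.fmean(logs), statistics.pstdev(logs)


def lognormal3_mean(mu, sigma, threshold):
    """Expected value of a three-parameter lognormal distribution."""
    return threshold + math.exp(mu + sigma ** 2 / 2)


# Synthetic "test values": threshold plus a lognormal variate (not real data).
rng = random.Random(0)
THRESHOLD = 10.0
sample = [THRESHOLD + rng.lognormvariate(3.0, 0.4) for _ in range(5000)]

# Hypothetical exclusion criterion: drop records above an upper cutoff.
CUTOFF = 50.0
kept = [v for v in sample if v <= CUTOFF]

mean_all = lognormal3_mean(*fit_lognormal3(sample, THRESHOLD), THRESHOLD)
mean_kept = lognormal3_mean(*fit_lognormal3(kept, THRESHOLD), THRESHOLD)
```

Comparing `mean_all` with `mean_kept` shows the expected value estimated after exclusion falling below the one estimated from the full sample, because the cutoff removes the upper tail before fitting.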