
doi 10.1002/sim.10311

Efficient computation of high-dimensional penalized piecewise constant hazard random effects models

Authors: Hillary M. Heiling, Naim U. Rashid, Quefeng Li, Xianlu L. Peng, Jen Jen Yeh

Abstract: Identifying and characterizing relationships between treatments, exposures, or other covariates and time-to-event outcomes has great significance in a wide range of biomedical settings. In research areas such as multi-center clinical trials, recurrent events, and genetic studies, proportional hazard mixed effects models (PHMMs) are used to account for correlations observed in clusters within the data. In high dimensions, proper specification of the fixed and random effects within PHMMs is difficult and computationally complex. In this paper, we approximate the proportional hazards mixed effects model with a piecewise constant hazard mixed effects survival model. We estimate the model parameters using a modified Monte Carlo Expectation Conditional Minimization algorithm, allowing us to perform variable selection on both the fixed and random effects simultaneously. We also incorporate a factor model decomposition of the random effects in order to more easily scale the variable selection method to larger dimensions. We demonstrate the utility of our method using simulations, and we apply our method to a multi-study pancreatic ductal adenocarcinoma gene expression dataset to select features important for survival.

Submitted 1 April, 2025; originally announced April 2025.

Journal ref: Statistics in Medicine 2025

arXiv:2305.08201

Efficient Computation of High-Dimensional Penalized Generalized Linear Mixed Models by Latent Factor Modeling of the Random Effects

Authors: Hillary M. Heiling, Naim U. Rashid, Quefeng Li, Xianlu L. Peng, Jen Jen Yeh, Joseph G. Ibrahim

Abstract: Modern biomedical datasets are increasingly high dimensional and exhibit complex correlation structures. Generalized Linear Mixed Models (GLMMs) have long been employed to account for such dependencies. However, proper specification of the fixed and random effects in GLMMs is increasingly difficult in high dimensions, and computational complexity grows with increasing dimension of the random effects. We present a novel reformulation of the GLMM using a factor model decomposition of the random effects, enabling scalable computation of GLMMs in high dimensions by reducing the latent space from a large number of random effects to a smaller set of latent factors. We also extend our prior work to estimate model parameters using a modified Monte Carlo Expectation Conditional Minimization algorithm, allowing us to perform variable selection on both the fixed and random effects simultaneously. We show through simulation that through this factor model decomposition, our method can fit high dimensional penalized GLMMs faster than comparable methods and more easily scale to larger dimensions not previously seen in existing approaches.

Submitted 16 April, 2024; v1 submitted 14 May, 2023; originally announced May 2023.

arXiv:2101.03985

Gene targeting in disease networks

Authors: Deborah Weighill, Marouen Ben Guebila, Kimberly Glass, John Platig, Jen Jen Yeh, John Quackenbush

Abstract: Profiling of whole transcriptomes has become a cornerstone of molecular biology and an invaluable tool for the characterization of clinical phenotypes and the identification of disease subtypes. Analyses of these data are becoming ever more sophisticated as we move beyond simple comparisons to consider networks of higher-order interactions and associations. Gene regulatory networks model the regulatory relationships of transcription factors and genes and have allowed the identification of differentially regulated processes in disease systems. In this perspective we discuss gene targeting scores, which measure changes in inferred regulatory network interactions, and their use in identifying disease-relevant processes. In addition, we present an example analysis of pancreatic ductal adenocarcinoma demonstrating the power of gene targeting scores to identify differential processes between complex phenotypes; processes which would have been missed by only performing differential expression analysis. This example demonstrates that gene targeting scores are an invaluable addition to gene expression analysis in the characterization of diseases and other complex phenotypes.

Submitted 11 January, 2021; originally announced January 2021.

arXiv:1912.06667

High dimensional precision medicine from patient-derived xenografts

Authors: Naim U. Rashid, Daniel J. Luckett, Jingxiang Chen, Michael T. Lawson, Longshaokan Wang, Yunshu Zhang, Eric B. Laber, Yufeng Liu, Jen Jen Yeh, Donglin Zeng, Michael R. Kosorok

Abstract: The complexity of human cancer often results in significant heterogeneity in response to treatment. Precision medicine offers potential to improve patient outcomes by leveraging this heterogeneity. Individualized treatment rules (ITRs) formalize precision medicine as maps from the patient covariate space into the space of allowable treatments. The optimal ITR is that which maximizes the mean of a clinical outcome in a population of interest. Patient-derived xenograft (PDX) studies permit the evaluation of multiple treatments within a single tumor and thus are ideally suited for estimating optimal ITRs. PDX data are characterized by correlated outcomes, a high-dimensional feature space, and a large number of treatments. Existing methods for estimating optimal ITRs do not take advantage of the unique structure of PDX data or handle the associated challenges well. In this paper, we explore machine learning methods for estimating optimal ITRs from PDX data. We analyze data from a large PDX study to identify biomarkers that are informative for developing personalized treatment recommendations in multiple cancers. We estimate optimal ITRs using regression-based approaches such as Q-learning and direct search methods such as outcome weighted learning. Finally, we implement a superlearner approach to combine a set of estimated ITRs and show that the resulting ITR performs better than any of the input ITRs, mitigating uncertainty regarding user choice of any particular ITR estimation methodology. Our results indicate that PDX data are a valuable resource for developing individualized treatment strategies in oncology.

Submitted 13 December, 2019; originally announced December 2019.

arXiv:1708.05508

Modeling Between-Study Heterogeneity for Improved Reproducibility in Gene Signature Selection and Clinical Prediction

Authors: Naim U. Rashid, Quefeng Li, Jen Jen Yeh, Joseph G. Ibrahim

Abstract: In the genomic era, the identification of gene signatures associated with disease is of significant interest. Such signatures are often used to predict clinical outcomes in new patients and aid clinical decision-making. However, recent studies have shown that gene signatures are often not replicable. This occurrence has practical implications regarding the generalizability and clinical applicability of such signatures. To improve replicability, we introduce a novel approach to select gene signatures from multiple datasets whose effects are consistently non-zero and account for between-study heterogeneity. We build our model upon some rank-based quantities, facilitating integration over different genomic datasets. A high dimensional penalized Generalized Linear Mixed Model (pGLMM) is used to select gene signatures and address data heterogeneity. We compare our method to some commonly used strategies that select gene signatures ignoring between-study heterogeneity. We provide asymptotic results justifying the performance of our method and demonstrate its advantage in the presence of heterogeneity through thorough simulation studies. Lastly, we motivate our method through a case study subtyping pancreatic cancer patients from four gene expression studies.

Submitted 26 March, 2019; v1 submitted 18 August, 2017; originally announced August 2017.

arXiv:1605.09261

doi 10.3847/0004-637X/830/2/91

AMiBA: Cluster Sunyaev-Zel'dovich Effect Observations with the Expanded 13-Element Array

Authors: Kai-Yang Lin, Hiroaki Nishioka, Fu-Cheng Wang, Chih-Wei Locutus Huang, Yu-Wei Liao, Jiun-Huei Proty Wu, Patrick M. Koch, Keiichi Umetsu, Ming-Tang Chen, Shun-Hsiang Chan, Shu-Hao Chang, Wen-Hsuan Lucky Chang, Tai-An Cheng, Hoang Ngoc Duy, Szu-Yuan Fu, Chih-Chiang Han, Solomon Ho, Ming-Feng Ho, Paul T. P. Ho, Yau-De Huang, Homin Jiang, Derek Y. Kubo, Chao-Te Li, Yu-Chiung Lin, Guo-Chin Liu, et al. (13 additional authors not shown)

Abstract: The Yuan-Tseh Lee Array for Microwave Background Anisotropy (AMiBA) is a co-planar interferometer array operating at a wavelength of 3mm to measure the Sunyaev-Zel'dovich effect (SZE) of galaxy clusters. In the first phase of operation -- with a compact 7-element array with 0.6m antennas (AMiBA-7) -- we observed six clusters at angular scales from 5′ to 23′. Here, we describe the expansion of AMiBA to a 13-element array with 1.2m antennas (AMiBA-13), its subsequent commissioning, and our cluster SZE observing program. The most important changes compared to AMiBA-7 are (1) array re-configuration with baselines ranging from 1.4m to 4.8m covering angular scales from 2′ to 11.5′, (2) thirteen new lightweight carbon-fiber-reinforced plastic (CFRP) 1.2m reflectors, and (3) additional correlators and six new receivers. From the AMiBA-13 SZE observing program, we present here maps of a subset of twelve clusters. As a highlight, we combine AMiBA-7 and AMiBA-13 observations of Abell 1689 and perform a joint fitting assuming a generalized NFW pressure profile. Our cylindrically integrated Compton-y values for this cluster are consistent with the BIMA/OVRO, SZA, and Planck results. We report the first targeted SZE detection towards the optically selected galaxy cluster RCS J1447+0828, and we demonstrate the ability of AMiBA SZE data to serve as a proxy for the total cluster mass. Finally, we show that our AMiBA-SZE derived cluster masses are consistent with recent lensing mass measurements in the literature.

Submitted 29 July, 2016; v1 submitted 30 May, 2016; originally announced May 2016.

Comments: 21 pages, 19 figures; Accepted by ApJ; Comparison between AMiBA-SZE and lensing masses has been newly included in the accepted version

Showing 1–6 of 6 results for author: Yeh, J J