Search | arXiv e-print repository

arXiv:2306.14200

SumVg: Total heritability explained by all variants in genome-wide association studies based on summary statistics with standard error estimates

Authors: Hon-Cheong So, Xiao Xue, Pak-Chung Sham

Abstract: Genome-wide association studies (GWAS) are commonly employed to study the genetic basis of complex traits and diseases, and a key question is how much heritability could be explained by all variants in GWAS. One widely used approach that relies on summary statistics only is LD score regression (LDSC); however, the approach requires certain assumptions on the SNP effects (all SNPs contribute to heritability and each SNP contributes equal variance). More flexible modeling methods may be useful. We previously developed an approach recovering the true z-statistics from a set of observed z-statistics with an empirical Bayes approach, using only summary statistics. However, methods for standard error (SE) estimation are not available yet, limiting the interpretation of results and applicability of the approach. In this study we developed several resampling-based approaches to estimate the SE of SNP-based heritability, including two jackknife and three parametric bootstrap methods. Simulations showed that delete-d-jackknife and parametric bootstrap approaches provide good estimates of the SE. In particular, the parametric bootstrap approaches yield the lowest root-mean-squared-error (RMSE) of the true SE. In addition, we applied our method to estimate SNP-based heritability of 12 immune-related traits (levels of cytokines and growth factors) to shed light on their genetic architecture.
We also implemented the methods to compute the sum of heritability explained and the corresponding SE in an R package SumVg, available at https://github.com/lab-hcso/Estimating-SE-of-total-heritability/. In conclusion, SumVg may provide a useful alternative tool for SNP heritability and SE estimates, which does not rely on distributional assumptions of SNP effects.
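As a rough illustration of the delete-d jackknife idea named in the abstract, the following Python sketch estimates the SE of a generic estimator by leaving out disjoint blocks of d observations at a time. It is not the SumVg implementation (SumVg is an R package): the function name `delete_d_jackknife_se`, the `estimator` callback, and the use of disjoint random blocks are assumptions made here for illustration only.

```python
import numpy as np

def delete_d_jackknife_se(values, estimator, d, seed=0):
    """Grouped delete-d jackknife SE (illustrative sketch, not SumVg's code).

    values    : 1-D array of per-observation inputs (hypothetical stand-in
                for per-SNP quantities).
    estimator : callable mapping a 1-D array to a scalar statistic.
    d         : number of observations removed per replicate; n must be
                divisible by d so the data split into g = n / d blocks.
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    if d < 1 or n % d != 0:
        raise ValueError("n must be a positive multiple of d")
    g = n // d
    if g < 2:
        raise ValueError("need at least two blocks")

    # Randomly assign observations to g disjoint blocks of size d.
    rng = np.random.default_rng(seed)
    blocks = rng.permutation(n).reshape(g, d)

    keep = np.ones(n, dtype=bool)
    replicates = np.empty(g)
    for j, block in enumerate(blocks):
        keep[block] = False                 # leave this block out
        replicates[j] = estimator(values[keep])
        keep[block] = True                  # restore it for the next pass

    # Grouped jackknife variance: (g - 1) / g * sum of squared deviations.
    centred = replicates - replicates.mean()
    return float(np.sqrt((g - 1) / g * np.sum(centred ** 2)))
```

With d = 1 and the sample mean as the estimator, this reduces to the familiar s / sqrt(n), which is a convenient sanity check before plugging in a more complex statistic.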

Submitted 25 June, 2023; originally announced June 2023.

arXiv:2108.01848

Improved Non-parametric Penalized Maximum Likelihood Estimation for Arbitrarily Censored Survival Data

Authors: Justin D. Tubbs, Lane Guolan Chen, Thuan Quoc Thach, Pak C. Sham

Abstract: Non-parametric maximum likelihood estimation encompasses a group of classic methods to estimate distribution-associated functions from potentially censored and truncated data, with extensive applications in survival analysis. These methods, including the Kaplan-Meier estimator and Turnbull's method, often result in overfitting, especially when the sample size is small. We propose an improvement to these methods by applying kernel smoothing to their raw estimates, based on a BIC-type loss function that balances the trade-off between optimizing model fit and controlling model complexity. In the context of a longitudinal study with repeated observations, we detail our proposed smoothing procedure and optimization algorithm. With extensive simulation studies over multiple realistic scenarios, we demonstrate that our smoothing-based procedure provides better overall accuracy in both survival function estimation and individual-level time-to-event prediction by reducing overfitting. Our smoothing procedure decreases the discrepancy between the estimated and true simulated survival function using interval-censored data by up to 49% compared to the raw unsmoothed estimate, with similar improvements of up to 41% and 23% in within-sample and out-of-sample prediction, respectively. Finally, we apply our method to real data on censored breast cancer diagnosis, which similarly shows improvement when compared to empirical survival estimates from uncensored data. We provide an R package, SISE, for implementing our penalized likelihood method.
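To make the "raw estimate, then kernel smoothing" idea concrete, here is a minimal Python sketch restricted to right-censored data. It computes a Kaplan-Meier curve and then applies a Gaussian-kernel weighted average with a fixed, user-supplied bandwidth. It is not the SISE package: the function names are hypothetical, interval censoring (Turnbull's method) is not handled, and the BIC-type bandwidth selection described in the abstract is not reproduced.

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate for right-censored data.

    times  : observed times (event or censoring).
    events : 1/True if the event was observed, 0/False if censored.
    Returns the distinct event times and the survival estimate just
    after each of them.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv = np.empty(event_times.size)
    s = 1.0
    for k, t in enumerate(event_times):
        at_risk = np.sum(times >= t)             # still under observation at t
        deaths = np.sum((times == t) & events)   # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        surv[k] = s
    return event_times, surv

def kernel_smooth(grid, event_times, surv, bandwidth):
    """Gaussian-kernel weighted average of the raw KM values.

    The bandwidth is assumed to be given; choosing it by a penalized
    (BIC-type) criterion is outside the scope of this sketch.
    """
    grid = np.asarray(grid, dtype=float)
    z = (grid[:, None] - event_times[None, :]) / bandwidth
    weights = np.exp(-0.5 * z ** 2)
    return (weights @ surv) / weights.sum(axis=1)
```

Because each smoothed value is a weighted average of raw Kaplan-Meier values, it always lies between the smallest and largest raw estimates; a larger bandwidth gives a flatter curve.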

Submitted 4 August, 2021; originally announced August 2021.

arXiv:2003.08518

A framework to decipher the genetic architecture of combinations of complex diseases: applications in cardiovascular medicine

Authors: Liangying Yin, Carlos Kwan-long Chau, Yu-Ping Lin, Pak-Chung Sham, Hon-Cheong So

Abstract: Genome-wide association studies (GWAS) have proven to be highly useful in revealing the genetic basis of complex diseases. At present, most GWAS are studies of a particular single disease diagnosis against controls. However, in practice, an individual is often affected by more than one condition/disorder. For example, patients with coronary artery disease (CAD) often have comorbid diabetes mellitus (DM). Along a similar line, it is often clinically meaningful to study patients with one disease but without a comorbidity. For example, obese DM may have different pathophysiology from non-obese DM. Here we developed a statistical framework to uncover susceptibility variants for comorbid disorders (or a disorder without comorbidity), using GWAS summary statistics only. In essence, we mimicked a case-control GWAS in which the cases are affected with comorbidities or a disease without a relevant comorbid condition (in either case, we may consider the cases as those affected by a specific subtype of disease, as characterized by the presence or absence of comorbid conditions). We extended our methodology to deal with continuous traits with clinically meaningful categories (e.g. lipids). In addition, we illustrated how the analytic framework may be extended to more than two traits. We verified the feasibility and validity of our method by applying it to simulated scenarios and four cardiometabolic (CM) traits. We also analyzed the genes, pathways, and cell types/tissues involved in CM disease subtypes.
LD-score regression analysis revealed that some subtypes may indeed be biologically distinct, with low genetic correlations. Further Mendelian randomization analysis found differential causal effects of different subtypes on relevant complications. We believe the findings are of both scientific and clinical value, and the proposed method may open a new avenue to analyzing GWAS data.

Submitted 29 December, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

arXiv:1611.03191

Exploring shared genetic bases and causal relationships of schizophrenia and bipolar disorder with 28 cardiovascular and metabolic traits

Authors: Hon-Cheong So, Carlos Kwan-Long Chau, Fu-Kiu Ao, Cheuk-Hei Mo, Pak-Chung Sham

Abstract: Cardiovascular diseases (CVD) represent a major health issue in patients with schizophrenia (SCZ) and bipolar disorder (BD), but the exact nature of cardiometabolic (CM) abnormalities involved and the underlying mechanisms remain unclear. Using polygenic risk scores (PRS) and LD score regression, we investigated the shared genetic bases of SCZ and BD with a panel of 28 cardiometabolic traits. We performed Mendelian randomization (MR) to elucidate causal relationships between the two groups of disorders. The analysis was based on large-scale meta-analyses of genome-wide association studies (GWAS). We also identified the potential shared genetic variants by a statistical approach based on local true discovery rates, and inferred the pathways involved. We found polygenic associations of SCZ with glucose metabolism abnormalities, adverse adipokine profiles, increased waist-hip ratio and raised visceral adiposity. However, BMI showed inverse genetic correlation and polygenic link with SCZ. On the other hand, we observed polygenic associations with an overall favorable CM profile in BD. MR analysis showed that SCZ may be causally linked to raised triglyceride and that lower fasting glucose may be linked to BD; otherwise MR did not reveal other significant causal relationships in general. We also identified numerous SNPs and pathways shared between SCZ/BD and cardiometabolic traits, some of which are related to inflammation or the immune system.
In conclusion, SCZ patients may be genetically associated with several CM abnormalities independent of medication side-effects, and proper surveillance and management of CV risk factors may be required from the onset of the disease. On the other hand, CM abnormalities in BD are more likely to be secondary.
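The abstract does not say which Mendelian randomization estimator was used. Purely as background on how a summary-statistics MR estimate can be formed, the Python sketch below implements the standard fixed-effect inverse-variance-weighted (IVW) estimator; the function name and the toy numbers in the usage lines are hypothetical and are not taken from the paper.

```python
import numpy as np

def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted MR estimate (illustrative).

    beta_exposure : per-SNP effects on the exposure.
    beta_outcome  : per-SNP effects on the outcome.
    se_outcome    : standard errors of the outcome effects.
    Returns (causal effect estimate, its standard error), treating the
    exposure effects as known without error.
    """
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    se = np.asarray(se_outcome, dtype=float)
    weights = 1.0 / se ** 2
    denom = np.sum(weights * bx ** 2)
    estimate = np.sum(weights * bx * by) / denom
    return float(estimate), float(np.sqrt(1.0 / denom))

# Toy usage with made-up numbers: outcome effects are exactly half the
# exposure effects, so the estimate is 0.5.
effect, effect_se = ivw_mr([0.1, 0.2], [0.05, 0.1], [0.01, 0.01])
```

This is equivalent to a weighted regression of outcome effects on exposure effects through the origin; it assumes all instruments are valid, which more robust MR methods relax.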

Submitted 5 December, 2017; v1 submitted 10 November, 2016; originally announced November 2016.

Showing 1–4 of 4 results for author: Sham, P