-
Addressing Ancestry Disparities in Genomic Medicine: A Geographic-aware Algorithm
Authors:
Daniel Mas Montserrat,
Arvind Kumar,
Carlos Bustamante,
Alexander Ioannidis
Abstract:
With declining sequencing costs, a promising and affordable tool is emerging in cancer diagnostics: genomics. Association studies can identify genomic variants that predispose patients to specific cancers, while tumor genomics can characterize cancer types for targeted treatment. However, a severe disparity is rapidly emerging in this new area of precision cancer diagnosis and treatment planning, one which separates a few genetically well-characterized populations (predominantly European) from all other global populations. Here we discuss the problem of population-specific genetic associations, which is driving this disparity, and present a novel solution, coordinate-based local ancestry, to help address it. We demonstrate our boosting-based method on whole-genome data from divergent groups across Africa and in the process observe signals that may stem from the transcontinental Bantu expansion.
Submitted 25 April, 2020;
originally announced April 2020.
-
LAI-Net: Local-Ancestry Inference with Neural Networks
Authors:
Daniel Mas Montserrat,
Carlos Bustamante,
Alexander Ioannidis
Abstract:
Local-ancestry inference (LAI), also referred to as ancestry deconvolution, provides high-resolution ancestry estimation along the human genome. In both research and industry, LAI is emerging as a critical step in DNA sequence analysis, with applications extending from polygenic risk scores (used to predict traits in embryos and disease risk in adults) to genome-wide association studies, and from pharmacogenomics to inference of human population history. While many LAI methods have been developed, advances in computing hardware (GPUs) combined with machine learning techniques, such as neural networks, are enabling the development of new methods that are fast, robust, and easily shared and stored. In this paper we develop the first neural-network-based LAI method, named LAI-Net, which provides accuracy competitive with state-of-the-art methods and robustness to missing or noisy data while using a small number of layers.
Submitted 21 April, 2020;
originally announced April 2020.
-
Class-Conditional VAE-GAN for Local-Ancestry Simulation
Authors:
Daniel Mas Montserrat,
Carlos Bustamante,
Alexander Ioannidis
Abstract:
Local ancestry inference (LAI) allows identification of the ancestry of all chromosomal segments in admixed individuals, and it is a critical step in the analysis of human genomes, with applications from pharmacogenomics and precision medicine to genome-wide association studies. In recent years, many LAI techniques have been developed in both industry and academic research. However, these methods require large training data sets of human genomic sequences from the ancestries of interest. Such reference data sets are usually limited, proprietary, protected by privacy restrictions, or otherwise not accessible to the public. Techniques to generate training samples that resemble real haploid sequences from ancestries of interest can be useful tools in such scenarios, since a generalized model can often be shared, but the unique human sample sequences cannot. In this work we present a class-conditional VAE-GAN to generate new human genomic sequences that can be used to train LAI algorithms. We evaluate the quality of our generated data by comparing the performance of a state-of-the-art LAI method when trained with generated versus real data.
Submitted 27 November, 2019;
originally announced November 2019.