-
The Genomic Landscape of Oceania
Authors:
Consuelo D. Quinto-Cortés,
Carmina Barberena Jonas,
Sofía Vieyra-Sánchez,
Stephen Oppenheimer,
Ram González-Buenfil,
Kathryn Auckland,
Kathryn Robson,
Tom Parks,
J. Víctor Moreno-Mayar,
Javier Blanco-Portillo,
Julian R. Homburger,
Genevieve L. Wojcik,
Alissa L. Severson,
Jonathan S. Friedlaender,
Francoise Friedlaender,
Angela Allen,
Stephen Allen,
Mark Stoneking,
Adrian V. S. Hill,
George Aho,
George Koki,
William Pomat,
Carlos D. Bustamante,
Maude Phipps,
Alexander J. Mentzer, et al. (2 additional authors not shown)
Abstract:
Encompassing regions that were amongst the first inhabited by humans following the out-of-Africa expansion, hosting populations with the highest levels of archaic hominid introgression, and including Pacific islands that are the most isolated inhabited locations on the planet, Oceania has a rich, but understudied, human genomic landscape. Here we describe the first region-wide analysis of genome-wide data from population groups spanning Oceania and its surroundings, from island and peninsular southeast Asia to Papua New Guinea, east across the Pacific through Melanesia, Micronesia, and Polynesia, and west across the Indian Ocean to related island populations in the Andamans and Madagascar. In total we generate and analyze genome-wide data from 981 individuals from 92 different populations, 58 separate islands, and 30 countries, representing the most expansive study of Pacific genetics to date. In each sample we disentangle the Papuan and more recent Austronesian ancestries, which have admixed in various proportions across this region, using ancestry-specific analyses, and characterize the distinct patterns of settlement, migration, and archaic introgression separately in these two ancestries. We also focus on the patterns of clinically relevant genetic variation across Oceania--a landscape rippled with strong founder effects and island-specific genetic drift in allele frequencies--providing an atlas for the development of precision genetic health strategies in this understudied region of the world.
Submitted 15 May, 2024;
originally announced May 2024.
-
Ancestry-specific analyses of genome-wide data confirm the settlement sequence of Polynesia
Authors:
Alexander G. Ioannidis,
Javier Blanco-Portillo,
Erika Hagelberg,
Juan Esteban Rodríguez-Rodríguez,
Keolu Fox,
Adrian V. S. Hill,
Carlos D. Bustamante,
Marcus W. Feldman,
Alexander J. Mentzer,
Andrés Moreno-Estrada
Abstract:
By demonstrating the role that historical population replacements and waves of admixture have played around the world, the genetics work of Reich and colleagues has provided a paradigm for understanding human history [Reich et al. 2009; Reich et al. 2012; Patterson et al. 2012]. Although we show in Ioannidis et al. [2021] that the peopling of Polynesia was a range expansion, and not, as suggested by Huang et al. [2022], yet another example of waves of admixture and large-scale gene flow between populations, we believe that our result in this recently settled oceanic expanse is the exception that proves the rule.
Submitted 6 December, 2022;
originally announced December 2022.
-
Diversity in immunogenomics: the value and the challenge
Authors:
Kerui Peng,
Yana Safonova,
Mikhail Shugay,
Alice Popejoy,
Oscar Rodriguez,
Felix Breden,
Petter Brodin,
Amanda M. Burkhardt,
Carlos Bustamante,
Van-Mai Cao-Lormeau,
Martin M. Corcoran,
Darragh Duffy,
Macarena Fuentes Guajardo,
Ricardo Fujita,
Victor Greiff,
Vanessa D. Jonsson,
Xiao Liu,
Lluis Quintana-Murci,
Maura Rossetti,
Jianming Xie,
Gur Yaari,
Wei Zhang,
Malak S. Abedalthagafi,
Khalid O. Adekoya,
Rahaman A. Ahmed, et al. (10 additional authors not shown)
Abstract:
With the advent of high-throughput sequencing technologies, the fields of immunogenomics and adaptive immune receptor repertoire research are facing both opportunities and challenges. Adaptive immune receptor repertoire sequencing (AIRR-seq) has become an increasingly important tool to characterize T and B cell responses in settings of interest. However, the majority of AIRR-seq studies conducted so far were performed in individuals of European ancestry, restricting the ability to identify variation in human adaptive immune responses across populations and limiting their applications. As AIRR-seq studies depend on the ability to assign VDJ sequence reads to the correct germline gene segments, efforts to characterize the genomic loci that encode adaptive immune receptor genes in different populations are urgently needed. The availability of comprehensive germline gene databases and further applications of AIRR-seq studies to individuals of non-European ancestry will substantially enhance our understanding of human adaptive immune responses, promote the development of effective diagnostics and treatments, and eventually advance precision medicine.
Submitted 1 March, 2021; v1 submitted 20 October, 2020;
originally announced October 2020.
-
Addressing Ancestry Disparities in Genomic Medicine: A Geographic-aware Algorithm
Authors:
Daniel Mas Montserrat,
Arvind Kumar,
Carlos Bustamante,
Alexander Ioannidis
Abstract:
With declining sequencing costs, a promising and affordable tool is emerging in cancer diagnostics: genomics. By using association studies, genomic variants that predispose patients to specific cancers can be identified, while by using tumor genomics, cancer types can be characterized for targeted treatment. However, a severe disparity is rapidly emerging in this new area of precision cancer diagnosis and treatment planning, one which separates a few genetically well-characterized populations (predominantly European) from all other global populations. Here we discuss the problem of population-specific genetic associations, which is driving this disparity, and present a novel solution--coordinate-based local ancestry--for helping to address it. We demonstrate our boosting-based method on whole genome data from divergent groups across Africa and in the process observe signals that may stem from the transcontinental Bantu expansion.
Submitted 25 April, 2020;
originally announced April 2020.
-
LAI-Net: Local-Ancestry Inference with Neural Networks
Authors:
Daniel Mas Montserrat,
Carlos Bustamante,
Alexander Ioannidis
Abstract:
Local-ancestry inference (LAI), also referred to as ancestry deconvolution, provides high-resolution ancestry estimation along the human genome. In both research and industry, LAI is emerging as a critical step in DNA sequence analysis with applications extending from polygenic risk scores (used to predict traits in embryos and disease risk in adults) to genome-wide association studies, and from pharmacogenomics to inference of human population history. While many LAI methods have been developed, advances in computing hardware (GPUs) combined with machine learning techniques, such as neural networks, are enabling the development of new methods that are fast, robust and easily shared and stored. In this paper we develop the first neural network based LAI method, named LAI-Net, providing competitive accuracy with state-of-the-art methods and robustness to missing or noisy data, while having a small number of layers.
Submitted 21 April, 2020;
originally announced April 2020.
-
Class-Conditional VAE-GAN for Local-Ancestry Simulation
Authors:
Daniel Mas Montserrat,
Carlos Bustamante,
Alexander Ioannidis
Abstract:
Local ancestry inference (LAI) allows identification of the ancestry of all chromosomal segments in admixed individuals, and it is a critical step in the analysis of human genomes with applications from pharmacogenomics and precision medicine to genome-wide association studies. In recent years, many LAI techniques have been developed in both industry and academic research. However, these methods require large training data sets of human genomic sequences from the ancestries of interest. Such reference data sets are usually limited, proprietary, protected by privacy restrictions, or otherwise not accessible to the public. Techniques to generate training samples that resemble real haploid sequences from ancestries of interest can be useful tools in such scenarios, since a generalized model can often be shared, but the unique human sample sequences cannot. In this work we present a class-conditional VAE-GAN to generate new human genomic sequences that can be used to train local ancestry inference (LAI) algorithms. We evaluate the quality of our generated data by comparing the performance of a state-of-the-art LAI method when trained with generated versus real data.
Submitted 27 November, 2019;
originally announced November 2019.
-
Network Enhancement: a general method to denoise weighted biological networks
Authors:
Bo Wang,
Armin Pourshafeie,
Marinka Zitnik,
Junjie Zhu,
Carlos D. Bustamante,
Serafim Batzoglou,
Jure Leskovec
Abstract:
Networks are ubiquitous in biology, where they encode connectivity patterns at all scales of organization, from molecular to the biome. However, biological networks are noisy due to the limitations of measurement technology and inherent natural variation, which can hamper discovery of network patterns and dynamics. We propose Network Enhancement (NE), a method for improving the signal-to-noise ratio of undirected, weighted networks. NE uses a doubly stochastic matrix operator that induces sparsity and provides a closed-form solution that increases the spectral eigengap of the input network. As a result, NE removes weak edges, enhances real connections, and leads to better downstream performance. Experiments show that NE improves gene function prediction by denoising tissue-specific interaction networks, eases interpretation of noisy Hi-C contact maps from the human genome, and boosts fine-grained identification accuracy of species. Our results indicate that NE is widely applicable for denoising biological networks.
Submitted 1 June, 2018; v1 submitted 8 May, 2018;
originally announced May 2018.
-
Reconstructing Native American Migrations from Whole-genome and Whole-exome Data
Authors:
Simon Gravel,
Fouad Zakharia,
Andres Moreno-Estrada,
Jake K Byrnes,
Marina Muzzio,
Juan L. Rodriguez-Flores,
Eimear E. Kenny,
Christopher R. Gignoux,
Brian K. Maples,
Wilfried Guiblet,
Julie Dutil,
Marc Via,
Karla Sandoval,
Gabriel Bedoya,
Taras K Oleksyk,
Andres Ruiz-Linares,
Esteban G Burchard,
Juan Carlos Martinez-Cruzado,
Carlos D. Bustamante,
The 1000 Genomes Project
Abstract:
There is great scientific and popular interest in understanding the genetic history of populations in the Americas. We wish to understand when different regions of the continent were inhabited, where settlers came from, and how current inhabitants relate genetically to earlier populations. Recent studies unraveled parts of the genetic history of the continent using genotyping arrays and uniparental markers. The 1000 Genomes Project provides a unique opportunity for improving our understanding of population genetic history by providing over a hundred sequenced low-coverage genomes and exomes from Colombian (CLM), Mexican-American (MXL), and Puerto Rican (PUR) populations. Here, we explore the genomic contributions of African, European, and Native American ancestry to these populations. Estimated Native American ancestry is 48% in MXL, 25% in CLM, and 13% in PUR. Native American ancestry in PUR is most closely related to populations surrounding the Orinoco River basin, confirming the South American ancestry of the Taíno people of the Caribbean. We present new methods to estimate the allele frequencies in the Native American fraction of the populations, and model their distribution using a demographic model for three ancestral Native American populations. These ancestral populations likely split in close succession: the most likely scenario, based on a peopling of the Americas 16 thousand years ago (kya), has the MXL ancestors splitting 12.2 kya, with a subsequent split of the ancestors of CLM and PUR 11.7 kya. The model also features effective populations of 62,000 in Mexico, 8,700 in Colombia, and 1,900 in Puerto Rico. Modeling identity-by-descent and ancestry tract length, we show that post-contact populations differ markedly in their effective sizes and migration patterns, with Puerto Rico showing the smallest effective size and the earlier migration from Europe.
Submitted 15 November, 2013; v1 submitted 17 June, 2013;
originally announced June 2013.
-
Reconstructing the Population Genetic History of the Caribbean
Authors:
Andres Moreno-Estrada,
Simon Gravel,
Fouad Zakharia,
Jacob L. McCauley,
Jake K. Byrnes,
Christopher R. Gignoux,
Patricia A. Ortiz-Tello,
Ricardo J. Martinez,
Dale J. Hedges,
Richard W. Morris,
Celeste Eng,
Karla Sandoval,
Suehelay Acevedo-Acevedo,
Juan Carlos Martinez-Cruzado,
Paul J. Norman,
Zulay Layrisse,
Peter Parham,
Esteban Gonzalez Burchard,
Michael L. Cuccaro,
Eden R. Martin,
Carlos D. Bustamante
Abstract:
The Caribbean basin is home to some of the most complex interactions in recent history among previously diverged human populations. Here, by making use of genome-wide SNP array data, we characterize ancestral components of Caribbean populations on a sub-continental level and unveil fine-scale patterns of population structure distinguishing insular from mainland Caribbean populations as well as from other Hispanic/Latino groups. We provide genetic evidence for an inland South American origin of the Native American component in island populations and for extensive pre-Columbian gene flow across the Caribbean basin. The Caribbean-derived European component shows significant differentiation from parental Iberian populations, presumably as a result of founder effects during the colonization of the New World. Based on demographic models, we reconstruct the complex population history of the Caribbean since the onset of continental admixture. We find that insular populations are best modeled as mixtures absorbing two pulses of African migrants, coinciding with early and maximum activity stages of the transatlantic slave trade. These two pulses appear to have originated in different regions within West Africa, imprinting two distinguishable signatures in present day Afro-Caribbean genomes and shedding light on the genetic impact of the dynamics occurring during the slave trade in the Caribbean.
Submitted 3 June, 2013;
originally announced June 2013.
-
Genome Sequencing Highlights Genes Under Selection and the Dynamic Early History of Dogs
Authors:
Adam H. Freedman,
Rena M. Schweizer,
Ilan Gronau,
Eunjung Han,
Diego Ortega-Del Vecchyo,
Pedro M. Silva,
Marco Galaverni,
Zhenxin Fan,
Peter Marx,
Belen Lorente-Galdos,
Holly Beale,
Oscar Ramirez,
Farhad Hormozdiari,
Can Alkan,
Carles Vilà,
Kevin Squire,
Eli Geffen,
Josip Kusak,
Adam R. Boyko,
Heidi G. Parker,
Clarence Lee,
Vasisht Tadigotla,
Adam Siepel,
Carlos D. Bustamante,
Timothy T. Harkins, et al. (5 additional authors not shown)
Abstract:
To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we analyzed novel high-quality genome sequences of three gray wolves, one from each of three putative centers of dog domestication, two ancient dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. We find dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow, which confounds previous inferences of dog origins. In dogs, the domestication bottleneck was severe, involving a 17 to 49-fold reduction in population size, a much stronger bottleneck than estimated previously from less intensive sequencing efforts. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was far larger than represented by modern wolf populations. Conditional on mutation rate, we narrow the plausible range for the date of initial dog domestication to an interval from 11 to 16 thousand years ago. This period predates the rise of agriculture, implying that the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers are more closely related to dogs, and the sampled wolves instead form a sister monophyletic clade. This result, in combination with our finding of dog-wolf admixture during the process of domestication, suggests a re-evaluation of past hypotheses of dog origin is necessary. Finally, we also detect signatures of selection, including evidence for selection on genes implicated in morphology, metabolism, and neural development. Uniquely, we find support for selective sweeps at regulatory sites, suggesting gene regulatory changes played a critical role in dog domestication.
Submitted 4 June, 2013; v1 submitted 31 May, 2013;
originally announced May 2013.
-
Genetic recombination is targeted towards gene promoter regions in dogs
Authors:
Adam Auton,
Ying Rui Li,
Jeffrey Kidd,
Kyle Oliveira,
Julie Nadel,
J. Kim Holloway,
Jessica J. Hayward,
Paula E. Cohen,
John M. Greally,
Jun Wang,
Carlos D. Bustamante,
Adam R. Boyko
Abstract:
The identification of the H3K4 trimethylase, PRDM9, as the gene responsible for recombination hotspot localization has provided considerable insight into the mechanisms by which recombination is initiated in mammals. However, uniquely amongst mammals, canids appear to lack a functional version of PRDM9 and may therefore provide a model for understanding recombination that occurs in the absence of PRDM9, and thus how PRDM9 functions to shape the recombination landscape. We have constructed a fine-scale genetic map from patterns of linkage disequilibrium assessed using high-throughput sequence data from 51 free-ranging dogs, Canis lupus familiaris. While broad-scale properties of recombination appear similar to other mammalian species, our fine-scale estimates indicate that, in canines, highly elevated recombination rates are observed in the vicinity of CpG-rich regions, including gene promoter regions, but show little association with H3K4 trimethylation marks identified in spermatocytes. By comparison to genomic data from the Andean fox, Lycalopex culpaeus, we show that biased gene conversion is a plausible mechanism by which the high CpG content of the dog genome could have occurred.
Submitted 29 August, 2013; v1 submitted 28 May, 2013;
originally announced May 2013.
-
Improved haplotyping of rare variants using next-generation sequence data
Authors:
Fouad Zakharia,
Carlos Bustamante
Abstract:
Accurate identification of haplotypes in sequenced human genomes can provide invaluable information about population demography and fine-scale correlations along the genome, thus empowering both population genomic and medical association studies. Yet phasing unrelated individuals remains a challenging problem. Incorporating available data from high-throughput sequencing into traditional statistical phasing approaches is a promising avenue to alleviate this problem. We present a novel statistical method that expands on an existing graphical haplotype reconstruction method (shapeIT) to incorporate phasing information from paired-end read data. The algorithm harnesses the haplotype graph information estimated by shapeIT from genotypes across the population and refines haplotype likelihoods for a given individual to be compatible with the sequencing data. Applying the method to HapMap individuals genotyped on the Affymetrix Axiom chip at 7,745,081 SNPs and on a trio sequenced by Complete Genomics, we found that the inclusion of paired-end read data significantly improved phasing, with reductions in switch error on the order of 4-15% against shapeIT across all panels. As expected, the improvements were found to be most significant at sites harboring rare variants; furthermore, we found that longer read sizes and higher throughput translated to greater decreases in switch error, as did higher variance in the size of the insert separating the two reads--suggesting that multi-platform next-generation sequencing may be exploited to yield particularly accurate haplotypes. Overall, the phasing improvements afforded by this new method highlight the power of integrating sequencing read information and population genotype data for reconstructing haplotypes in unrelated individuals.
Submitted 9 November, 2012;
originally announced November 2012.
-
The genetic prehistory of southern Africa
Authors:
Joseph K. Pickrell,
Nick Patterson,
Chiara Barbieri,
Falko Berthold,
Linda Gerlach,
Tom Güldemann,
Blesswell Kure,
Sununguko Wata Mpoloka,
Hirosi Nakagawa,
Christfried Naumann,
Mark Lipson,
Po-Ru Loh,
Joseph Lachance,
Joanna Mountain,
Carlos Bustamante,
Bonnie Berger,
Sarah Tishkoff,
Brenna Henn,
Mark Stoneking,
David Reich,
Brigitte Pakendorf
Abstract:
Southern and eastern African populations that speak non-Bantu languages with click consonants are known to harbour some of the most ancient genetic lineages in humans, but their relationships are poorly understood. Here, we report data from 23 populations analyzed at over half a million single nucleotide polymorphisms, using a genome-wide array designed for studying human history. The southern African Khoisan fall into two genetic groups, loosely corresponding to the northwestern and southeastern Kalahari, which we show separated within the last 30,000 years. We find that all individuals derive at least a few percent of their genomes from admixture with non-Khoisan populations that began approximately 1,200 years ago. In addition, the east African Hadza and Sandawe derive a fraction of their ancestry from admixture with a population related to the Khoisan, supporting the hypothesis of an ancient link between southern and eastern Africa.
Submitted 17 September, 2012; v1 submitted 23 July, 2012;
originally announced July 2012.
-
A robust approach to estimating rates from time-correlation functions
Authors:
John D. Chodera,
Phillip J. Elms,
William C. Swope,
Jan-Hendrik Prinz,
Susan Marqusee,
Carlos Bustamante,
Frank Noé,
Vijay S. Pande
Abstract:
While seemingly straightforward in principle, the reliable estimation of rate constants is seldom easy in practice. Numerous issues, such as the complication of poor reaction coordinates, cause obvious approaches to yield unreliable estimates. When a reliable order parameter is available, the reactive flux theory of Chandler allows the rate constant to be extracted from the plateau region of an appropriate reactive flux function. However, when applied to real data from single-molecule experiments or molecular dynamics simulations, the rate can sometimes be difficult to extract due to the numerical differentiation of a noisy empirical correlation function or difficulty in locating the plateau region at low sampling frequencies. We present a modified version of this theory which does not require numerical derivatives, allowing rate constants to be robustly estimated from the time-correlation function directly. We compare these approaches using single-molecule force spectroscopy measurements of an RNA hairpin.
Submitted 10 August, 2011;
originally announced August 2011.
-
Bayesian hidden Markov model analysis of single-molecule force spectroscopy: Characterizing kinetics under measurement uncertainty
Authors:
John D. Chodera,
Phillip Elms,
Frank Noé,
Bettina Keller,
Christian M. Kaiser,
Aaron Ewall-Wice,
Susan Marqusee,
Carlos Bustamante,
Nina Singhal Hinrichs
Abstract:
Single-molecule force spectroscopy has proven to be a powerful tool for studying the kinetic behavior of biomolecules. Through application of an external force, conformational states with small or transient populations can be stabilized, allowing them to be characterized and the statistics of individual trajectories studied to provide insight into biomolecular folding and function. Because the observed quantity (force or extension) is not necessarily an ideal reaction coordinate, individual observations cannot be uniquely associated with kinetically distinct conformations. While maximum-likelihood schemes such as hidden Markov models have solved this problem for other classes of single-molecule experiments by using temporal information to aid in the inference of a sequence of distinct conformational states, these methods do not give a clear picture of how precisely the model parameters are determined by the data due to instrument noise and finite-sample statistics, both significant problems in force spectroscopy. We solve this problem through a Bayesian extension that allows the experimental uncertainties to be directly quantified, and build in detailed balance to further reduce uncertainty through physical constraints. We illustrate the utility of this approach in characterizing the three-state kinetic behavior of an RNA hairpin in a stationary optical trap.
Submitted 5 August, 2011;
originally announced August 2011.
-
Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data
Authors:
Ryan N. Gutenkunst,
Ryan D. Hernandez,
Scott H. Williamson,
Carlos D. Bustamante
Abstract:
Demographic models built from genetic data play important roles in illuminating prehistorical events and serving as null models in genome scans for selection. We introduce an inference method based on the joint frequency spectrum of genetic variants within and between populations. For candidate models we numerically compute the expected spectrum using a diffusion approximation to the one-locus two-allele Wright-Fisher process, involving up to three simultaneous populations. Our approach is a composite likelihood scheme, since linkage between neutral loci alters the variance but not the expectation of the frequency spectrum. We thus use bootstraps incorporating linkage to estimate uncertainties for parameters and significance values for hypothesis tests. Our method can also incorporate selection on single sites, predicting the joint distribution of selected alleles among populations experiencing a bevy of evolutionary forces, including expansions, contractions, migrations, and admixture. As applications, we model human expansion out of Africa and the settlement of the New World, using 5 Mb of noncoding DNA resequenced in 68 individuals from 4 populations (YRI, CHB, CEU, and MXL) by the Environmental Genome Project. We also combine our demographic model with a previously estimated distribution of selective effects among newly arising amino acid mutations to accurately predict the frequency spectrum of nonsynonymous variants across three continental populations (YRI, CHB, CEU).
Submitted 4 September, 2009;
originally announced September 2009.
-
Force unfolding kinetics of RNA using optical tweezers. II. Modeling experiments
Authors:
M. Manosas,
J. -D. Wen,
P. T. X. Li,
S. B. Smith,
C. Bustamante,
I. Tinoco, Jr.,
F. Ritort
Abstract:
By exerting mechanical force it is possible to unfold/refold RNA molecules one at a time. In a small range of forces, an RNA molecule can hop between the folded and the unfolded state with force-dependent kinetic rates. Here, we introduce a mesoscopic model to analyze the hopping kinetics of RNA hairpins in an optical tweezers setup. The model includes different elements of the experimental setup (beads, handles and RNA sequence) and limitations of the instrument (time lag of the force-feedback mechanism and finite bandwidth of data acquisition). We investigated the influence of the instrument on the measured hopping rates. Results from the model are in good agreement with the experiments reported in the companion article (1). The comparison between theory and experiments allowed us to infer the values of the intrinsic molecular rates of the RNA hairpin alone and to search for the optimal experimental conditions to do the measurements. We conclude that long handles and soft laser traps represent the best conditions to extract rate estimates that are closest to the intrinsic molecular rates. The methodology and rationale presented here can be applied to other experimental setups and other molecules.
Submitted 4 July, 2007;
originally announced July 2007.
-
Force unfolding kinetics of RNA using optical tweezers. I. Effects of experimental variables on measured results
Authors:
J. -D. Wen,
M. Manosas,
P. T. X. Li,
S. B. Smith,
C. Bustamante,
F. Ritort,
I. Tinoco Jr
Abstract:
Experimental variables of optical tweezers instrumentation that affect RNA folding/unfolding kinetics were investigated. A model RNA hairpin, P5ab, was attached to two micron-sized beads through hybrid RNA/DNA handles; one bead was trapped by dual-beam lasers and the other was held by a micropipette. Several experimental variables were changed while measuring the unfolding/refolding kinetics, including handle lengths, trap stiffness, and modes of force applied to the molecule. In constant-force mode, where the tension applied to the RNA was maintained through feedback control, the measured rate coefficients varied within 40% when the handle lengths were changed by 10-fold (1.1 to 10.2 kbp); they increased by two- to three-fold when the trap stiffness was lowered to one third (from 0.1 to 0.035 pN/nm). In the passive mode, without feedback control and where the force applied to the RNA varied in response to the end-to-end distance change of the tether, the RNA hopped between a high-force folded state and a low-force unfolded state. In this mode, the rates increased up to two-fold with longer handles or softer traps. Overall, the measured rates remained within the same order of magnitude over the wide range of conditions studied. In the companion paper (1), we analyze how the measured kinetic parameters differ from the intrinsic molecular rates of the RNA, and thus how to obtain the molecular rates.
Submitted 4 July, 2007;
originally announced July 2007.
-
Condensation transition in DNA-polyaminoamide dendrimer fibers studied using optical tweezers
Authors:
F. Ritort,
S. Mihardja,
S. B. Smith,
C. Bustamante
Abstract:
When mixed together, DNA and polyaminoamide (PAMAM) dendrimers form fibers that condense into a compact structure. We use optical tweezers to pull condensed fibers and investigate the decondensation transition by measuring force-extension curves (FECs). A characteristic plateau force (around 10 pN) and hysteresis between the pulling and relaxation cycles are observed for different dendrimer sizes, indicating the existence of a first-order transition between two phases (condensed and extended) of the fiber. The fact that we can reproduce the same FECs in the absence of additional dendrimers in the buffer medium indicates that dendrimers remain irreversibly bound to the DNA backbone. Upon salt variation, FECs change noticeably, confirming that electrostatic forces drive the condensation transition. Finally, we propose a simple model for the decondensing transition that qualitatively reproduces the FECs and is confirmed by AFM images.
Submitted 30 May, 2006;
originally announced May 2006.