-
Genomic data analysis in tree spaces
Authors:
Sakellarios Zairis,
Hossein Khiabanian,
Andrew J. Blumberg,
Raul Rabadan
Abstract:
Recently, an elegant approach in phylogenetics was introduced by Billera-Holmes-Vogtmann that allows a systematic comparison of different evolutionary histories using the metric geometry of tree spaces. In many problem settings one encounters heavily populated phylogenetic trees, where the large number of leaves encumbers visualization and analysis in the relevant evolutionary moduli spaces. To address this issue, we introduce tree dimensionality reduction, a structured approach to reducing large phylogenetic trees to a distribution of smaller trees. We prove a stability theorem ensuring that small perturbations of the large trees are taken to small perturbations of the resulting distributions.
We then present a series of four biologically motivated applications to the analysis of genomic data, spanning cancer and infectious disease. The first quantifies how chemotherapy can disrupt the evolution of common leukemias. The second examines a link between geometric information and histologic grade in relapsed gliomas, where longer relapse branches were specific to high-grade gliomas. The third concerns the genetic stability of xenograft models of cancer, where heterogeneity at the single-cell level increased with later mouse passages. The last studies genetic diversity in seasonal influenza A virus. We apply tree dimensionality reduction to 24 years of longitudinally collected H3N2 hemagglutinin sequences, generating distributions of smaller trees spanning between three and five seasons. A negative correlation is observed between the influenza vaccine effectiveness during a season and the variance of the distributions produced using preceding seasons' sequence data. We also show how tree distributions relate to antigenic clusters and choice of influenza vaccine. Our formalism exposes links between viral genomic data and clinical observables such as vaccine selection and efficacy.
Submitted 25 July, 2016;
originally announced July 2016.
-
Moduli Spaces of Phylogenetic Trees Describing Tumor Evolutionary Patterns
Authors:
Sakellarios Zairis,
Hossein Khiabanian,
Andrew J. Blumberg,
Raul Rabadan
Abstract:
Cancers follow a clonal Darwinian evolution, with fitter subclones replacing more quiescent cells, ultimately giving rise to macroscopic disease. High-throughput genomics provides the opportunity to investigate these processes and determine specific genetic alterations driving disease progression. Genomic sampling of a patient's cancer provides a molecular history, represented by a phylogenetic tree. Cohorts of patients represent a forest of related phylogenetic structures. To extract clinically relevant information, one must represent and statistically compare these collections of trees. We propose a framework based on an application of the work by Billera, Holmes and Vogtmann on phylogenetic tree spaces to the case of unrooted trees of intra-individual cancer tissue samples. We observe that these tree spaces are globally nonpositively curved, allowing for statistical inference on populations of patient histories. A projective tree space is introduced, permitting visualizations of aggregate evolutionary behavior. Published data from three types of human malignancies are explored within our framework.
Submitted 3 October, 2014;
originally announced October 2014.
-
Understanding the Origins of a Pandemic Virus
Authors:
Carlos Xavier Hernandez,
Joseph Chan,
Hossein Khiabanian,
Raul Rabadan
Abstract:
Understanding the origin of infectious diseases provides scientifically based rationales for implementing public health measures that may help to avoid or mitigate future epidemics. The recent ancestors of a pandemic virus provide invaluable information about the set of minimal genomic alterations that transformed a zoonotic agent into a full human pandemic. Since the first confirmed cases of the H1N1 pandemic virus in the spring of 2009, several hypotheses about the strain's origins have been proposed. However, how, where, and when it first infected humans is still far from clear. The only way to piece together this epidemiological puzzle relies on the collective effort of the international scientific community to increase genomic sequencing of influenza isolates, especially ones collected in the months prior to the origin of the pandemic.
Submitted 23 April, 2011;
originally announced April 2011.
-
Dark Matter Structures In The Deep Lens Survey
Authors:
Jeffrey M. Kubo,
Hossein Khiabanian,
Ian P. Dell'Antonio,
David Wittman,
J. Anthony Tyson
Abstract:
We present a regularized maximum likelihood weak lensing reconstruction of the Deep Lens Survey F2 field (4 deg^2). High signal-to-noise ratio peaks in our lensing significance map appear to be associated with possible projected filamentary structures. The largest apparent structure extends for over a degree in the field and has contributions from known optical clusters at three redshifts (z ~ 0.3, 0.43, 0.5). Noise in weak lensing reconstructions is known to potentially cause "false positives"; we use Monte Carlo techniques to estimate the contamination in our sample, and find that 10-25% of the peaks are expected to be false detections. For significant lensing peaks we estimate the total signal-to-noise ratio of detection using a method that accounts for pixel-to-pixel correlations in our reconstruction. We also report the detection of a candidate relative underdensity in the F2 field with a total signal-to-noise ratio of ~ 5.5.
Submitted 7 July, 2009; v1 submitted 29 September, 2008;
originally announced September 2008.
-
A Multi-Resolution Weak Lensing Mass Reconstruction Method
Authors:
H. Khiabanian,
I. P. Dell'Antonio
Abstract:
Motivated by the limitations of the commonly used direct reconstruction techniques for producing mass maps, we have developed a multi-resolution maximum-likelihood reconstruction method for producing two-dimensional mass maps using weak gravitational lensing data. To utilize all the shear information, we employ an iterative inverse method with a properly selected regularization coefficient which fits the deflection potential at the position of each galaxy. By producing mass maps with multiple resolutions in the different parts of the observed field, we can achieve a comparable level of signal-to-noise by increasing the resolution in regions of higher distortions or regions with an over-density of background galaxies. In addition, we are able to better study the sub-structure of the massive clusters at a resolution which is not attainable in the rest of the observed field. We apply our method to simulated data and to a four square degree field obtained by the Deep Lens Survey.
Submitted 10 April, 2008;
originally announced April 2008.
-
The Mass Of The Coma Cluster From Weak Lensing In The Sloan Digital Sky Survey
Authors:
Jeffrey M. Kubo,
Albert Stebbins,
James Annis,
Ian P. Dell'Antonio,
Huan Lin,
Hossein Khiabanian,
Joshua A. Frieman
Abstract:
We present a weak lensing analysis of the Coma Cluster using the Sloan Digital Sky Survey (SDSS) Data Release Five. Complete imaging of a ~ 200 square degree region is used to measure the tangential shear of this cluster. The shear is fit to an NFW model and we find a virial radius of $r_{200}=1.99_{-0.22}^{+0.21}\,h^{-1}\,\mathrm{Mpc}$ which corresponds to a virial mass of $M_{200}=1.88_{-0.56}^{+0.65}\times10^{15}\,h^{-1}M_{\odot}$. We additionally compare our weak lensing measurement to the virial mass derived using dynamical techniques, and find they are in agreement. This is the lowest redshift, largest angle weak lensing measurement of an individual cluster to date.
Submitted 4 September, 2007;
originally announced September 2007.