-
Joint Inference of Genome Structure and Content in Heterogeneous Tumour Samples
Authors:
Andrew McPherson,
Andrew Roth,
Gavin Ha,
Sohrab P. Shah,
Cedric Chauve,
S. Cenk Sahinalp
Abstract:
For a genomically unstable cancer, a single tumour biopsy will often contain a mixture of competing tumour clones. These tumour clones frequently differ with respect to their genomic content (copy number of each gene) and structure (order of genes on each chromosome). Modern bulk genome sequencing mixes the signals of tumour clones and contaminating normal cells, complicating inference of genomic content and structure. We propose a method to unmix tumour and contaminating normal signals and jointly predict the genomic structure and content of each tumour clone. We use genome graphs to represent tumour clones, and model the likelihood of the observed reads given the clones and their mixing proportions. Our use of haplotype blocks allows us to accurately measure allele-specific read counts and infer allele-specific copy number for each clone. The proposed method is a heuristic local search that applies incremental, locally optimal modifications to the genome graphs. Using simulated data, we show that our method predicts copy counts and gene adjacencies with reasonable accuracy.
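A minimal sketch of the "likelihood of observed reads given clones and mixing proportions" plus "locally optimal modifications" idea, under loudly simplified assumptions. It ignores genome graphs, adjacencies, and allele-specific counts; it assumes each segment's read count is Poisson with a mean proportional to the proportion-weighted copy number (normal cells diploid); and every name and parameter here (expected_reads, haploid_depth, max_cn, and so on) is hypothetical. It is not the authors' implementation, only an illustration of steepest-ascent local search over a mixture likelihood.

```python
import math

# Hypothetical, simplified model (an assumption, not the paper's genome-graph likelihood):
# each segment's read count is Poisson with mean proportional to the
# proportion-weighted copy number of diploid normal cells plus tumour clones.

def expected_reads(copy_numbers, proportions, normal_proportion, haploid_depth, length):
    """Expected read count for one segment given per-clone copy numbers."""
    ploidy = 2 * normal_proportion  # normal cells assumed diploid
    ploidy += sum(h * c for h, c in zip(proportions, copy_numbers))
    return haploid_depth * length * ploidy

def poisson_log_likelihood(observed, expected):
    """log P(X = observed) for X ~ Poisson(expected)."""
    if expected <= 0:
        return 0.0 if observed == 0 else -math.inf
    return observed * math.log(expected) - expected - math.lgamma(observed + 1)

def local_search(segments, proportions, normal_proportion, haploid_depth, max_cn=6):
    """Steepest-ascent search over copy numbers, one clone and one step (+/-1) at a time.

    segments: list of (observed_reads, length). Returns one copy-number list per segment.
    """
    n_clones = len(proportions)
    solution = []
    for observed, length in segments:
        def score(cns):
            mu = expected_reads(cns, proportions, normal_proportion, haploid_depth, length)
            return poisson_log_likelihood(observed, mu)

        current = [2] * n_clones  # start every clone at copy number 2
        current_score = score(current)
        while True:
            best, best_score = None, current_score
            for k in range(n_clones):
                for delta in (-1, 1):
                    cand = list(current)
                    cand[k] += delta
                    if 0 <= cand[k] <= max_cn:
                        s = score(cand)
                        if s > best_score:
                            best, best_score = cand, s
            if best is None:  # no single move improves the likelihood: local optimum
                break
            current, current_score = best, best_score
        solution.append(current)
    return solution
```

For example, with one clone at proportion 0.6, 40% normal cells, and a haploid depth of 0.01 reads per base, local_search([(32, 1000), (14, 1000)], [0.6], 0.4, 0.01) climbs to copy numbers [[4], [1]]. With several clones, different copy-number combinations can yield the same expected depth, which is one reason the paper's richer model (adjacencies and allele-specific counts) matters.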
Submitted 24 April, 2015; v1 submitted 13 April, 2015;
originally announced April 2015.
-
Mirroring co-evolving trees in the light of their topologies
Authors:
Iman Hajirasouliha,
Alexander Schönhuth,
David Juan,
Alfonso Valencia,
S. Cenk Sahinalp
Abstract:
Determining the interaction partners among protein/domain families poses hard computational problems, in particular in the presence of paralogous proteins. Available approaches aim to identify interaction partners among protein/domain families by maximizing the similarity between trimmed versions of their phylogenetic trees. Since maximization of any natural similarity score is computationally difficult, many approaches employ heuristics to maximize the similarity of the distance matrices corresponding to the tree topologies in question. In this paper we devise an efficient deterministic algorithm that directly maximizes the similarity between two leaf-labeled trees with edge lengths, obtaining a score-optimal alignment of the two trees in question.
Our algorithm is significantly faster than methods based on distance matrix comparison: 1 minute on a single processor vs. 730 hours on a supercomputer. Furthermore, it has advantages over the current state-of-the-art heuristic search approach in terms of precision, as well as on a recently suggested overall performance measure for mirrortree approaches, while incurring only acceptable losses in recall.
A C implementation of the method demonstrated in this paper is available at http://compbio.cs.sfu.ca/mirrort.htm
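For contrast, the distance-matrix comparison that the abstract names as the slower baseline can be sketched as follows. This is not the paper's tree-alignment algorithm. It assumes a fixed one-to-one leaf mapping between the two trees and uses Pearson correlation of leaf-to-leaf (patristic) distances, a common choice in mirrortree-style scoring; the function names and the edge-list tree encoding are hypothetical.

```python
import math
from collections import defaultdict

def patristic_distances(edges):
    """Leaf-to-leaf path lengths in an undirected tree given as (u, v, length) edges."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    # Leaves are nodes of degree 1 (assumes the root, if any, has degree >= 2).
    leaves = sorted(n for n, nbrs in adj.items() if len(nbrs) == 1)
    dist = {}
    for src in leaves:
        depth = {src: 0.0}
        stack = [src]
        while stack:  # iterative traversal accumulating path length from src
            node = stack.pop()
            for nxt, w in adj[node]:
                if nxt not in depth:
                    depth[nxt] = depth[node] + w
                    stack.append(nxt)
        for dst in leaves:
            if src < dst:
                dist[(src, dst)] = depth[dst]
    return dist

def mirrortree_correlation(dist_a, dist_b, mapping):
    """Pearson correlation of tree A's leaf distances with those of the mapped leaves in tree B."""
    xs, ys = [], []
    for (a1, a2), d in dist_a.items():
        b1, b2 = mapping[a1], mapping[a2]
        xs.append(d)
        ys.append(dist_b[(min(b1, b2), max(b1, b2))])
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

A score like this only evaluates one given leaf mapping; searching over mappings (and over which subtrees to trim) is where the cost grows, which is the problem the paper's direct tree alignment addresses.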
Submitted 26 October, 2011;
originally announced October 2011.
-
Pair HMM based gap statistics for re-evaluation of indels in alignments with affine gap penalties: Extended Version
Authors:
Alexander Schönhuth,
Raheleh Salari,
S. Cenk Sahinalp
Abstract:
Although computational sequence alignment is a crucial step in the vast majority of comparative genomics studies, our understanding of alignment biases still needs to be improved. To infer true structural or homologous regions, computational alignments need further evaluation. It has been shown that the accuracy of aligned positions can drop substantially, in particular around gaps. Here we focus on re-evaluation of score-based alignments with affine gap penalty costs. We exploit their relationship with pair hidden Markov models and develop efficient algorithms to identify gaps that are significant in terms of length and multiplicity. We evaluate our statistics against the well-established structural alignments from SABmark and find that indel reliability increases substantially with significance, in particular in worst-case twilight zone alignments. This indicates that our statistics can reliably complement other methods, which mostly focus on the reliability of match positions.
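The link between affine gap penalties and pair HMMs can be illustrated with gap lengths alone: in a pair HMM whose gap state self-loops with extension probability epsilon, gap lengths are geometric, so a gap of length l or longer has probability epsilon^(l-1). The sketch below turns that into a one-sided p-value per gap. It is a simplified illustration, not the paper's length-and-multiplicity statistics; epsilon, alpha, and the function names are hypothetical parameters chosen for the example.

```python
from itertools import groupby

# Assumed model: inside a pair HMM gap state, the chain stays with probability
# epsilon (gap extension), so gap lengths are geometric. This is a simplified
# illustration, not the paper's length-and-multiplicity statistics.

def gap_runs(aligned_row):
    """Lengths of maximal runs of '-' in one row of a pairwise alignment."""
    return [len(list(run)) for is_gap, run in groupby(aligned_row, key=lambda ch: ch == "-") if is_gap]

def gap_length_pmf(length, epsilon):
    """P(L = length) for a geometric gap length, length >= 1."""
    return (1.0 - epsilon) * epsilon ** (length - 1)

def gap_length_tail(length, epsilon):
    """P(L >= length) = epsilon ** (length - 1): a one-sided p-value for a gap this long."""
    return epsilon ** (length - 1)

def significant_gaps(aligned_row, epsilon, alpha=0.01):
    """Gap lengths whose tail probability falls below alpha."""
    return [g for g in gap_runs(aligned_row) if gap_length_tail(g, epsilon) < alpha]
```

For instance, with epsilon = 0.5 the row "A-C---G--------T" has gaps of length 1, 3, and 8 with tail probabilities 1, 0.25, and 0.0078125, so only the length-8 gap passes alpha = 0.01. A real analysis would also need to account for how many gaps are tested per alignment.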
Submitted 11 June, 2010;
originally announced June 2010.