-
A Drug Recommendation System (Dr.S) for cancer cell lines
Authors:
Marleen Balvert,
Georgios Patoulidis,
Andrew Patti,
Timo M. Deist,
Christine Eyler,
Bas E. Dutilh,
Alexander Schönhuth,
David Craft
Abstract:
Personalizing drug prescriptions in cancer care based on genomic information requires associating genomic markers with treatment effects. This is an unsolved challenge requiring genomic patient data in yet unavailable volumes as well as appropriate quantitative methods. We attempt to solve this challenge for an experimental proxy for which sufficient data is available: 42 drugs tested on 1018 cancer cell lines. Our goal is to develop a method that identifies the most promising drug based on a cell line's genomic information. To this end, we need to identify, for each drug, the machine learning method, hyperparameters, and genomic features that give optimal predictive performance. We extensively compare combinations of gene sets (both curated and random), genetic features, and machine learning algorithms for all 42 drugs. For each drug, the best-performing combination (considering only the curated gene sets) is selected. We use these top model parameters for each drug to build and demonstrate a Drug Recommendation System (Dr.S). Insights resulting from this analysis are formulated as best practices for developing drug recommendation systems. The complete software system, called the Cell Line Analyzer, is written in Python and available on GitHub.
Submitted 24 December, 2019;
originally announced December 2019.
-
An image representation based convolutional network for DNA classification
Authors:
Bojian Yin,
Marleen Balvert,
Davide Zambrano,
Alexander Schönhuth,
Sander Bohte
Abstract:
The folding structure of the DNA molecule combined with helper molecules, also referred to as the chromatin, is highly relevant for the functional properties of DNA. The chromatin structure is largely determined by the underlying primary DNA sequence, though the interaction is not yet fully understood. In this paper we develop a convolutional neural network that takes an image representation of the primary DNA sequence as its input and predicts key determinants of chromatin structure. The method is designed to detect interactions between distal elements in the DNA sequence, which are known to be highly relevant. Our experiments show that the method outperforms several existing methods in terms of both prediction accuracy and training time.
Submitted 13 June, 2018;
originally announced June 2018.
-
CLEVER: Clique-Enumerating Variant Finder
Authors:
Tobias Marschall,
Ivan Costa,
Stefan Canzar,
Markus Bauer,
Gunnar Klau,
Alexander Schliep,
Alexander Schönhuth
Abstract:
Next-generation sequencing techniques have facilitated large-scale analysis of human genetic variation. Despite the advances in sequencing speeds, the computational discovery of structural variants is not yet standard. It is likely that many variants have remained undiscovered in most sequenced individuals. Here we present a novel approach based on internal segment size, which organizes all reads, including concordant ones, into a read alignment graph where max-cliques represent maximal contradiction-free groups of alignments. A specifically engineered algorithm then enumerates all max-cliques and statistically evaluates them for their potential to reflect insertions or deletions (indels). For the first time in the literature, we compare a large range of state-of-the-art approaches using simulated Illumina reads from a fully annotated genome and present various relevant performance statistics. We achieve superior performance rates in particular on indels of sizes 20 to 100, which have been exposed as a current major challenge in the SV discovery literature and where prior insert-size-based approaches have limitations. In that size range, we outperform even split-read aligners. We also achieve good results on real data, where a substantial number of correct predictions are made by our tool alone; these complement the predictions of split-read aligners. CLEVER is open source (GPL) and available from http://clever-sv.googlecode.com.
Submitted 15 July, 2012; v1 submitted 5 March, 2012;
originally announced March 2012.
-
Generic identification of binary-valued hidden Markov processes
Authors:
Alexander Schönhuth
Abstract:
The generic identification problem is to decide whether a stochastic process $(X_t)$ is a hidden Markov process and, if so, to infer its parameters for all but a subset of parametrizations that form a lower-dimensional subvariety in parameter space. The partial answers available so far depend on extra assumptions on the processes, which are usually centered around stationarity. Here we present a general solution for binary-valued hidden Markov processes. Our approach is rooted in algebraic statistics and is hence geometric in nature. We find that the algebraic varieties associated with the probability distributions of binary-valued hidden Markov processes are zero sets of determinantal equations, which draws a connection to well-studied objects from algebra. As a consequence, our solution allows for algorithmic implementation based on elementary (linear) algebraic routines.
Submitted 22 October, 2013; v1 submitted 19 January, 2011;
originally announced January 2011.
-
Equations for hidden Markov models
Authors:
Alexander Schoenhuth
Abstract:
We will outline novel approaches to derive model invariants for hidden Markov and related models. These approaches are based on a theoretical framework that arises from viewing random processes as elements of the vector space of string functions. Theorems available from that framework then give rise to novel ideas to obtain model invariants for hidden Markov and related models.
Submitted 7 February, 2009; v1 submitted 23 January, 2009;
originally announced January 2009.