-
Large scale modeling of antimicrobial resistance with interpretable classifiers
Authors:
Alexandre Drouin,
Frédéric Raymond,
Gaël Letarte St-Pierre,
Mario Marchand,
Jacques Corbeil,
François Laviolette
Abstract:
Antimicrobial resistance is an important public health concern that has implications in the practice of medicine worldwide. Accurately predicting resistance phenotypes from genome sequences shows great promise in promoting better use of antimicrobial agents, by determining which antibiotics are likely to be effective in specific clinical cases. In healthcare, this would allow for the design of treatment plans tailored for specific individuals, likely resulting in better clinical outcomes for patients with bacterial infections. In this work, we present the recent work of Drouin et al. (2016) on using Set Covering Machines to learn highly interpretable models of antibiotic resistance and complement it by providing a large scale application of their method to the entire PATRIC database. We report prediction results for 36 new datasets and present the Kover AMR platform, a new web-based tool allowing the visualization and interpretation of the generated models.
Submitted 3 December, 2016;
originally announced December 2016.
-
Greedy Biomarker Discovery in the Genome with Applications to Antimicrobial Resistance
Authors:
Alexandre Drouin,
Sébastien Giguère,
Maxime Déraspe,
François Laviolette,
Mario Marchand,
Jacques Corbeil
Abstract:
The Set Covering Machine (SCM) is a greedy learning algorithm that produces sparse classifiers. We extend the SCM for datasets that contain a huge number of features. The whole genetic material of living organisms is an example of such a case, where the number of features exceeds 10^7. Three human pathogens were used to evaluate the performance of the SCM at predicting antimicrobial resistance. Our results show that the SCM compares favorably in terms of sparsity and accuracy against L1 and L2 regularized Support Vector Machines and CART decision trees. Moreover, the SCM was the only algorithm that could consider the full feature space. For all other algorithms, the feature space had to be filtered as a preprocessing step.
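As an illustration of the greedy, sparse learning described in this abstract, the following is a minimal sketch of a conjunction-style Set Covering Machine on binary features. It is not the authors' implementation: the function names, the presence/absence rule set, the trade-off parameter `p`, and the toy data are all assumptions made for this example, and nothing here addresses the scale (more than 10^7 features) that the paper targets.

```python
import numpy as np

def train_scm_conjunction(X, y, p=1.0, max_rules=3):
    """Greedily build a conjunction of presence/absence rules (illustrative sketch)."""
    X = np.asarray(X, dtype=bool)
    y = np.asarray(y, dtype=bool)
    n_features = X.shape[1]
    # Candidate rules: "feature j is present" followed by "feature j is absent".
    rule_outputs = np.hstack([X, ~X])
    neg = ~y          # negative examples not yet rejected by a selected rule
    pos = y.copy()    # positive examples not yet wrongly rejected
    model = []
    while neg.any() and len(model) < max_rules:
        rejected = ~rule_outputs
        covered_neg = (rejected & neg[:, None]).sum(axis=0)
        lost_pos = (rejected & pos[:, None]).sum(axis=0)
        # Utility: negatives correctly rejected minus p times positives lost.
        utility = covered_neg - p * lost_pos
        best = int(np.argmax(utility))
        if covered_neg[best] == 0:
            break  # no remaining rule rejects any remaining negative
        kind = "presence" if best < n_features else "absence"
        model.append((kind, best % n_features))
        # Keep only the examples that the chosen rule still accepts.
        neg &= rule_outputs[:, best]
        pos &= rule_outputs[:, best]
    return model

def predict(model, X):
    """Predict positive only when every rule of the conjunction holds."""
    X = np.asarray(X, dtype=bool)
    out = np.ones(X.shape[0], dtype=bool)
    for kind, j in model:
        out &= X[:, j] if kind == "presence" else ~X[:, j]
    return out

# Hypothetical toy data: rows are genomes, columns are binary features.
X_toy = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
y_toy = [1, 1, 0, 0]
model = train_scm_conjunction(X_toy, y_toy)
predictions = predict(model, X_toy).tolist()
```

On this toy data a single rule on the first feature separates the two classes, which is the kind of sparsity the abstract refers to.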
Submitted 22 May, 2015;
originally announced May 2015.
-
Learning interpretable models of phenotypes from whole genome sequences with the Set Covering Machine
Authors:
Alexandre Drouin,
Sébastien Giguère,
Vladana Sagatovich,
Maxime Déraspe,
François Laviolette,
Mario Marchand,
Jacques Corbeil
Abstract:
The increased affordability of whole genome sequencing has motivated its use for phenotypic studies. We address the problem of learning interpretable models for discrete phenotypes from whole genomes. We propose a general approach that relies on the Set Covering Machine and a k-mer representation of the genomes. We show results for the problem of predicting the resistance of Pseudomonas aeruginosa, an important human pathogen, against 4 antibiotics. Our results demonstrate that extremely sparse models which are biologically relevant can be learnt using this approach.
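To make the k-mer representation mentioned in this abstract concrete, here is a minimal sketch that turns sequences into a binary matrix. It assumes a presence/absence encoding, which the abstract does not spell out; the function names and the short toy sequences are hypothetical and chosen only for illustration.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def kmer_matrix(genomes, k):
    """Encode each genome as a 0/1 row over the sorted k-mer vocabulary."""
    kmer_sets = [kmers(g, k) for g in genomes]
    vocabulary = sorted(set().union(*kmer_sets))
    matrix = [[int(km in s) for km in vocabulary] for s in kmer_sets]
    return vocabulary, matrix

# Hypothetical toy "genomes"; real genomes are millions of bases long.
vocabulary, matrix = kmer_matrix(["ACGT", "CGTA"], k=3)
```

Each column then corresponds to one k-mer, so a rule such as "k-mer present" or "k-mer absent" can be read directly from a learned model.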
Submitted 2 December, 2014;
originally announced December 2014.
-
Improved design and screening of high bioactivity peptides for drug discovery
Authors:
Sébastien Giguère,
François Laviolette,
Mario Marchand,
Denise Tremblay,
Sylvain Moineau,
Éric Biron,
Jacques Corbeil
Abstract:
The discovery of peptides having high biological activity is very challenging mainly because there is an enormous diversity of compounds and only a minority have the desired properties. To lower cost and reduce the time to obtain promising compounds, machine learning approaches can greatly assist in the process and even replace expensive laboratory experiments by learning a predictor with existing data. Unfortunately, selecting ligands having the greatest predicted bioactivity requires a prohibitive amount of computational time. For this combinatorial problem, heuristics and stochastic optimization methods are not guaranteed to find adequate compounds.
We propose an efficient algorithm based on De Bruijn graphs, guaranteed to find the peptides of maximal predicted bioactivity. We demonstrate how this algorithm can be part of an iterative combinatorial chemistry procedure to speed up the discovery and the validation of peptide leads. Moreover, the proposed approach does not require the use of known ligands for the target protein since it can leverage recent multi-target machine learning predictors where ligands for similar targets can serve as initial training data. Finally, we validated the proposed approach in vitro with the discovery of new cationic anti-microbial peptides.
Source code is freely available at http://graal.ift.ulaval.ca/peptide-design/.
Submitted 10 April, 2014; v1 submitted 14 November, 2013;
originally announced November 2013.
-
Learning a peptide-protein binding affinity predictor with kernel ridge regression
Authors:
Sébastien Giguère,
Mario Marchand,
François Laviolette,
Alexandre Drouin,
Jacques Corbeil
Abstract:
We propose a specialized string kernel for small bio-molecules, peptides and pseudo-sequences of binding interfaces. The kernel incorporates physico-chemical properties of amino acids and elegantly generalizes eight kernels, such as the Oligo, the Weighted Degree, the Blended Spectrum, and the Radial Basis Function. We provide a low complexity dynamic programming algorithm for the exact computation of the kernel and a linear time algorithm for its approximation. Combined with kernel ridge regression and SupCK, a novel binding pocket kernel, the proposed kernel yields biologically relevant and good prediction accuracy on the PepX database. For the first time, a machine learning predictor is capable of accurately predicting the binding affinity of any peptide to any protein. The method was also applied to both single-target and pan-specific Major Histocompatibility Complex class II benchmark datasets and three Quantitative Structure Affinity Model benchmark datasets.
On all benchmarks, our method significantly (p-value < 0.057) outperforms the current state-of-the-art methods at predicting peptide-protein binding affinities. The proposed approach is flexible and can be applied to predict any quantitative biological activity. The method should be of value to a large segment of the research community with the potential to accelerate peptide-based drug and vaccine development.
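The combination of a string kernel with kernel ridge regression can be sketched as follows. This is only an illustration under stated assumptions: it swaps in a plain k-mer spectrum kernel rather than the specialized kernel proposed in the paper, and the function names, the regularization value `lam`, and the toy peptides and affinity values are hypothetical.

```python
from collections import Counter

import numpy as np

def spectrum_kernel(a, b, k=2):
    """Plain k-mer spectrum kernel: dot product of k-mer count vectors."""
    counts_a = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    counts_b = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    return float(sum(counts_a[s] * counts_b[s] for s in counts_a))

def gram(xs, zs, k=2):
    """Kernel matrix between two lists of sequences."""
    return np.array([[spectrum_kernel(x, z, k) for z in zs] for x in xs])

def fit_krr(train_seqs, y, lam=1e-3, k=2):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = gram(train_seqs, train_seqs, k)
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(K + lam * np.eye(len(train_seqs)), y)

def predict_krr(alpha, train_seqs, test_seqs, k=2):
    """Predict as a kernel-weighted sum over the training sequences."""
    return gram(test_seqs, train_seqs, k) @ alpha

# Hypothetical toy peptides and affinity values, for illustration only.
peptides = ["AAK", "KKA", "GGL"]
affinities = [1.0, 2.0, 3.0]
alpha = fit_krr(peptides, affinities)
fitted = predict_krr(alpha, peptides, peptides)
```

With a small `lam`, the fitted values on the training peptides stay close to the toy affinities; in practice `lam` and `k` would be chosen by cross-validation.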
Submitted 31 July, 2012;
originally announced July 2012.