
Large scale modeling of antimicrobial resistance with interpretable classifiers

Authors: Alexandre Drouin, Frédéric Raymond, Gaël Letarte St-Pierre, Mario Marchand, Jacques Corbeil, François Laviolette

Abstract: Antimicrobial resistance is an important public health concern that has implications in the practice of medicine worldwide. Accurately predicting resistance phenotypes from genome sequences shows great promise in promoting better use of antimicrobial agents, by determining which antibiotics are likely to be effective in specific clinical cases. In healthcare, this would allow for the design of treatment plans tailored for specific individuals, likely resulting in better clinical outcomes for patients with bacterial infections. In this work, we present the recent work of Drouin et al. (2016) on using Set Covering Machines to learn highly interpretable models of antibiotic resistance and complement it by providing a large scale application of their method to the entire PATRIC database. We report prediction results for 36 new datasets and present the Kover AMR platform, a new web-based tool allowing the visualization and interpretation of the generated models.

Submitted 3 December, 2016; originally announced December 2016.

Comments: Peer-reviewed and accepted for presentation at the Machine Learning for Health Workshop, NIPS 2016, Barcelona, Spain

arXiv:1505.06249 [pdf, other]

Greedy Biomarker Discovery in the Genome with Applications to Antimicrobial Resistance

Authors: Alexandre Drouin, Sébastien Giguère, Maxime Déraspe, François Laviolette, Mario Marchand, Jacques Corbeil

Abstract: The Set Covering Machine (SCM) is a greedy learning algorithm that produces sparse classifiers. We extend the SCM for datasets that contain a huge number of features. The whole genetic material of living organisms is an example of such a case, where the number of features exceeds 10^7. Three human pathogens were used to evaluate the performance of the SCM at predicting antimicrobial resistance. Our results show that the SCM compares favorably in terms of sparsity and accuracy against L1 and L2 regularized Support Vector Machines and CART decision trees. Moreover, the SCM was the only algorithm that could consider the full feature space. For all other algorithms, the latter had to be filtered as a preprocessing step.

Submitted 22 May, 2015; originally announced May 2015.

Comments: Peer-reviewed and accepted for an oral presentation in the Greed is Great workshop at the International Conference on Machine Learning, Lille, France, 2015
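
The greedy, sparse learning described in this abstract can be illustrated with a minimal sketch of a conjunction-style Set Covering Machine over presence/absence features. This is not the authors' implementation: the function names, the trade-off parameter `p`, the `max_rules` cap, and the toy data layout (each genome given as a set of k-mers) are assumptions made for illustration.

```python
def rule_holds(rule, kmers_present):
    """A rule is (kmer, must_be_present); it holds when presence matches."""
    kmer, must_be_present = rule
    return (kmer in kmers_present) == must_be_present


def train_scm_conjunction(genomes, labels, p=1.0, max_rules=3):
    """Greedily build a conjunction of presence/absence rules.

    genomes: list of sets of k-mers; labels: list of 0/1 phenotypes.
    p (assumed name) penalizes positive examples that a rule rejects.
    """
    # Candidate rules: presence and absence of every observed k-mer.
    vocabulary = sorted(set().union(*genomes))
    candidates = [(km, flag) for km in vocabulary for flag in (True, False)]
    negatives = {i for i, y in enumerate(labels) if y == 0}
    positives = {i for i, y in enumerate(labels) if y == 1}
    model = []
    while negatives and len(model) < max_rules:
        best = None
        for rule in candidates:
            # A rule "covers" a negative example when it rejects it.
            covered = {i for i in negatives if not rule_holds(rule, genomes[i])}
            # Rejecting a positive example is an error.
            errors = {i for i in positives if not rule_holds(rule, genomes[i])}
            utility = len(covered) - p * len(errors)
            if best is None or utility > best[0]:
                best = (utility, rule, covered, errors)
        if best is None or not best[2]:
            break  # no remaining rule covers any negative example
        _, rule, covered, errors = best
        model.append(rule)
        negatives -= covered
        positives -= errors
    return model


def predict(model, kmers_present):
    """A conjunction predicts 1 only if every selected rule holds."""
    return int(all(rule_holds(rule, kmers_present) for rule in model))
```

Because each iteration keeps only the single best rule, the resulting model is a short list of k-mer presence or absence tests, which is what makes this family of classifiers easy to read.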

arXiv:1412.1074 [pdf, other]

Learning interpretable models of phenotypes from whole genome sequences with the Set Covering Machine

Authors: Alexandre Drouin, Sébastien Giguère, Vladana Sagatovich, Maxime Déraspe, François Laviolette, Mario Marchand, Jacques Corbeil

Abstract: The increased affordability of whole genome sequencing has motivated its use for phenotypic studies. We address the problem of learning interpretable models for discrete phenotypes from whole genomes. We propose a general approach that relies on the Set Covering Machine and a k-mer representation of the genomes. We show results for the problem of predicting the resistance of Pseudomonas aeruginosa, an important human pathogen, against 4 antibiotics. Our results demonstrate that extremely sparse models which are biologically relevant can be learnt using this approach.

Submitted 2 December, 2014; originally announced December 2014.

Comments: Presented at Machine Learning in Computational Biology 2014, Montréal, Québec, Canada
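
As a rough illustration of the k-mer representation mentioned in this abstract, the sketch below turns DNA strings into a presence/absence matrix. The helper names `kmers` and `presence_matrix` are hypothetical, and the toy sequences stand in for whole genomes; the paper's actual pipeline is not described in this listing.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a DNA sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def presence_matrix(genomes, k):
    """Encode each genome as a 0/1 row over the sorted union of observed k-mers."""
    per_genome = [kmers(genome, k) for genome in genomes]
    vocabulary = sorted(set().union(*per_genome))
    rows = [[1 if km in present else 0 for km in vocabulary]
            for present in per_genome]
    return vocabulary, rows


# Toy usage: two short sequences, k = 3.
vocabulary, rows = presence_matrix(["ACGT", "CGTT"], 3)
```

Each column of the matrix answers one yes/no question ("does this genome contain this k-mer?"), which is the kind of binary feature a rule-based learner can test directly.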

arXiv:1311.3573 [pdf, ps, other]

Improved design and screening of high bioactivity peptides for drug discovery

Authors: Sébastien Giguère, François Laviolette, Mario Marchand, Denise Tremblay, Sylvain Moineau, Éric Biron, Jacques Corbeil

Abstract: The discovery of peptides having high biological activity is very challenging mainly because there is an enormous diversity of compounds and only a minority have the desired properties. To lower cost and reduce the time to obtain promising compounds, machine learning approaches can greatly assist in the process and even replace expensive laboratory experiments by learning a predictor with existing data. Unfortunately, selecting ligands having the greatest predicted bioactivity requires a prohibitive amount of computational time. For this combinatorial problem, heuristics and stochastic optimization methods are not guaranteed to find adequate compounds. We propose an efficient algorithm based on De Bruijn graphs, guaranteed to find the peptides of maximal predicted bioactivity. We demonstrate how this algorithm can be part of an iterative combinatorial chemistry procedure to speed up the discovery and the validation of peptide leads. Moreover, the proposed approach does not require the use of known ligands for the target protein since it can leverage recent multi-target machine learning predictors where ligands for similar targets can serve as initial training data. Finally, we validated the proposed approach in vitro with the discovery of new cationic anti-microbial peptides. Source code is freely available at http://graal.ift.ulaval.ca/peptide-design/.

Submitted 10 April, 2014; v1 submitted 14 November, 2013; originally announced November 2013.

MSC Class: 92B05 ACM Class: I.2.6; J.3; G.3; G.4; I.5.2

arXiv:1207.7253 [pdf, other]

doi 10.1186/1471-2105-14-82

Learning a peptide-protein binding affinity predictor with kernel ridge regression

Authors: Sébastien Giguère, Mario Marchand, François Laviolette, Alexandre Drouin, Jacques Corbeil

Abstract: We propose a specialized string kernel for small bio-molecules, peptides and pseudo-sequences of binding interfaces. The kernel incorporates physico-chemical properties of amino acids and elegantly generalizes eight kernels, such as the Oligo, the Weighted Degree, the Blended Spectrum, and the Radial Basis Function. We provide a low complexity dynamic programming algorithm for the exact computation of the kernel and a linear time algorithm for its approximation. Combined with kernel ridge regression and SupCK, a novel binding pocket kernel, the proposed kernel yields biologically relevant and good prediction accuracy on the PepX database. For the first time, a machine learning predictor is capable of accurately predicting the binding affinity of any peptide to any protein. The method was also applied to both single-target and pan-specific Major Histocompatibility Complex class II benchmark datasets and three Quantitative Structure Affinity Model benchmark datasets. On all benchmarks, our method significantly (p-value < 0.057) outperforms the current state-of-the-art methods at predicting peptide-protein binding affinities. The proposed approach is flexible and can be applied to predict any quantitative biological activity. The method should be of value to a large segment of the research community with the potential to accelerate peptide-based drug and vaccine development.

Submitted 31 July, 2012; originally announced July 2012.

Comments: 22 pages, 4 figures, 5 tables

MSC Class: 92B05 ACM Class: I.2.6; J.3; G.3; G.4; I.5.2

Journal ref: BMC Bioinformatics 2013, 14:82
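
To make the combination of a string kernel with kernel ridge regression concrete, here is a minimal pure-Python sketch. It substitutes a plain spectrum kernel (shared k-mer counts) for the specialized kernel proposed in the paper, which is not reproduced here; the function names, the regularization parameter `lam`, and the toy peptides and affinities are all assumptions for illustration.

```python
from collections import Counter


def spectrum_kernel(s, t, k=2):
    """Stand-in string kernel: inner product of k-mer count vectors."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return float(sum(cs[m] * ct[m] for m in cs))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_krr(peptides, affinities, lam=1.0, k=2):
    """Kernel ridge regression: solve (K + lam * I) alpha = y."""
    K = [[spectrum_kernel(a, b, k) + (lam if i == j else 0.0)
          for j, b in enumerate(peptides)]
         for i, a in enumerate(peptides)]
    return solve(K, affinities)


def predict_krr(alpha, peptides, query, k=2):
    """Prediction is a kernel-weighted sum over the training peptides."""
    return sum(a * spectrum_kernel(p, query, k) for a, p in zip(alpha, peptides))


# Toy usage with made-up affinities.
train = ["ACDE", "ACDF", "WYWY"]
alpha = fit_krr(train, [1.0, 0.8, 0.1])
score = predict_krr(alpha, train, "ACDG")
```

Swapping in a different kernel only requires replacing `spectrum_kernel`; the ridge solve and the prediction rule stay the same, which is the flexibility the abstract alludes to.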

Showing 1–5 of 5 results for author: Marchand, M