
doi 10.1186/s12859-015-0806-7

Cnidaria: fast, reference-free clustering of raw and assembled genome and transcriptome NGS data

Authors: Saulo Alves Aflitos, Edouard Severing, Gabino Sanchez-Perez, Sander Peters, Hans de Jong, Dick de Ridder

Abstract: Background: Identification of biological specimens is a major requirement for a range of applications. Reference-free methods analyse unprocessed sequencing data without relying on prior knowledge, but generally do not scale to arbitrarily large genomes and arbitrarily large phylogenetic distances. Results: We present Cnidaria, a practical tool for clustering genomic and transcriptomic data with no limitation on genome size or phylogenetic distances. We successfully simultaneously clustered 169 genomic and transcriptomic datasets from 4 kingdoms, achieving 100% identification accuracy at supra-species level and 78% accuracy for species level. Discussion: CNIDARIA allows for fast, resource-efficient comparison and identification of both raw and assembled genome and transcriptome data. This can help answer both fundamental (e.g. in phylogeny, ecological diversity analysis) and practical questions (e.g. sequencing quality control, primer design).

Submitted 17 November, 2015; originally announced November 2015.

Comments: 47 pages, 13 figures

MSC Class: 92D20; 92B10; 92-08

ACM Class: J.3

Journal ref: BMC Bioinformatics. 2015 Nov 2;16:352

arXiv:1504.05612 [pdf]

doi 10.1111/tpj.12800

Introgression Browser: High throughput whole-genome SNP visualization

Authors: Saulo Alves Aflitos, Gabino Sanchez-Perez, Dick de Ridder, Paul Fransz, Eric Schranz, Hans de Jong, Sander Peters

Abstract: Breeding by introgressive hybridization is a pivotal strategy to broaden the genetic basis of crops. Usually, the desired traits are monitored in consecutive crossing generations by marker-assisted selection, but their analyses fail in chromosome regions where crossover recombinants are rare or not viable. Here, we present the Introgression Browser (IBROWSER), a novel bioinformatics tool aimed at visualizing introgressions at nucleotide or SNP accuracy. The software selects homozygous SNPs from Variant Call Format (VCF) information and filters out heterozygous SNPs, Multi-Nucleotide Polymorphisms (MNPs) and insertion-deletions (InDels). For data analysis IBROWSER makes use of sliding windows, but if needed it can generate any desired fragmentation pattern through General Feature Format (GFF) information. In an example of tomato (Solanum lycopersicum) accessions we visualize SNP patterns and elucidate both position and boundaries of the introgressions. We also show that our tool is capable of identifying alien DNA in a panel of the closely related S. pimpinellifolium by examining phylogenetic relationships of the introgressed segments in tomato. In a third example, we demonstrate the power of the IBROWSER in a panel of 600 Arabidopsis accessions, detecting the boundaries of a SNP-free region around a polymorphic 1.17 Mbp inverted segment on the short arm of chromosome 4. The architecture and functionality of IBROWSER makes the software appropriate for a broad set of analyses including SNP mining, genome structure analysis, and pedigree analysis. Its functionality, together with the capability to process large data sets and efficient visualization of sequence variation, makes IBROWSER a valuable breeding tool.

Submitted 21 April, 2015; originally announced April 2015.

Comments: 33 pages, 4 figures, 4 Supplementary Figures. This is the pre-peer reviewed version of the following article: Plant J. 2015 Apr;82(1):174-82, which has been published in final form at http://doi.org/10.1111/tpj.12800

Journal ref: Plant J. 2015 Apr;82(1):174-82

arXiv:1504.05610 [pdf]

doi 10.1111/tpj.12616

Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing

Authors: Saulo A. Aflitos, Elio Schijlen, Richard Finkers, Sandra Smit, Jun Wang, Gengyun Zhang, Ning Li, Likai Mao, Hans de Jong, Freek Bakker, Barbara Gravendeel, Timo Breit, Rob Dirks, Henk Huits, Darush Struss, Ruth Wagner, Hans van Leeuwen, Roeland van Ham, Laia Fito, Laëtitia Guigner, Myrna Sevilla, Philippe Ellul, Eric W. Ganko, Arvind Kapur, Emmanuel Reclus, et al. (32 additional authors not shown)

Abstract: Genetic variation in the tomato clade was explored by sequencing a selection of 84 tomato accessions and related wild species representative for the Lycopersicon, Arcanum, Eriopersicon, and Neolycopersicon groups. We present a reconstruction of three new reference genomes in support of our comparative genome analyses. Sequence diversity in commercial breeding lines appears extremely low, indicating the dramatic genetic erosion of crop tomatoes. This is reflected by the SNP count in wild species which can exceed 10 million i.e. 20 fold higher than in crop accessions. Comparative sequence alignment reveals group, species, and accession specific polymorphisms, which explain characteristic fruit traits and growth habits in tomato accessions. Using gene models from the annotated Heinz reference genome, we observe a bias in dN/dS ratio in fruit and growth diversification genes compared to a random set of genes, which probably is the result of a positive selection. We detected highly divergent segments in wild S. lycopersicum species, and footprints of introgressions in crop accessions originating from a common donor accession. Phylogenetic relationships of fruit diversification and growth specific genes from crop accessions show incomplete resolution and are dependent on the introgression donor. In contrast, whole genome SNP information has sufficient power to resolve the phylogenetic placement of each accession in the four main groups in the Lycopersicon clade using Maximum Likelihood analyses. Phylogenetic relationships appear correlated with habitat and mating type and point to the occurrence of geographical races within these groups and thus are of practical importance for introgressive hybridization breeding. Our study illustrates the need for multiple reference genomes in support of tomato comparative genomics and Solanum genome evolution studies.

Submitted 21 April, 2015; originally announced April 2015.

Comments: 4 Figures, 10 Supplementary Figures, 2 Supplementary Figures. This is the pre-peer reviewed version of the following article: The Plant Journal 80.1 (2014): 136-148, which has been published in final form at http://doi.org/10.1111/tpj.12616

Journal ref: The Plant Journal 80.1 (2014): 136-148

arXiv:1311.3236 [pdf]

The genomic landscape of meiotic crossovers and gene conversions in Arabidopsis thaliana

Authors: Erik Wijnker, Geo Velikkakam James, Jia Ding, Frank Becker, Jonas R. Klasen, Vimal Rawat, Beth A. Rowan, Daniel F. de Jong, C. Bastiaan de Snoo, Luis Zapata, Bruno Huettel, Hans de Jong, Stephan Ossowski, Detlef Weigel, Maarten Koornneef, Joost J. B. Keurentjes, Korbinian Schneeberger

Abstract: Knowledge of the exact distribution of meiotic crossovers (COs) and gene conversions (GCs) is essential for understanding many aspects of population genetics and evolution, from haplotype structure and long-distance genetic linkage to the generation of new allelic variants of genes. To this end, we resequenced the four products of 13 meiotic tetrads along with 10 doubled haploids derived from Arabidopsis thaliana hybrids. GC detection through short reads has previously been confounded by genomic rearrangements. Rigid filtering for misaligned reads allowed GC identification at high accuracy and revealed an ~80-kb transposition, which undergoes copy-number changes mediated by meiotic recombination. Non-crossover associated GCs were extremely rare most likely due to their short average length of ~25-50 bp, which is significantly shorter than the length of CO associated GCs. Overall, recombination preferentially targeted non-methylated nucleosome-free regions at gene promoters, which showed significant enrichment of two sequence motifs.

Submitted 13 November, 2013; originally announced November 2013.

Comments: 44 pages, 5 figures with figure supplements

arXiv:1309.1910 [pdf]

SBML Qualitative Models: a model representation format and infrastructure to foster interactions between qualitative modelling formalisms and tools

Authors: Claudine Chaouiya, Duncan Berenguier, Sarah M Keating, Aurelien Naldi, Martijn P. van Iersel, Nicolas Rodriguez, Andreas Dräger, Finja Büchel, Thomas Cokelaer, Bryan Kowal, Benjamin Wicks, Emanuel Gonçalves, Julien Dorier, Michel Page, Pedro T. Monteiro, Axel von Kamp, Ioannis Xenarios, Hidde de Jong, Michael Hucka, Steffen Klamt, Denis Thieffry, Nicolas Le Novère, Julio Saez-Rodriguez, Tomáš Helikar

Abstract: Background: Qualitative frameworks, especially those based on the logical discrete formalism, are increasingly used to model regulatory and signalling networks. A major advantage of these frameworks is that they do not require precise quantitative data, and that they are well-suited for studies of large networks. While numerous groups have developed specific computational tools that provide original methods to analyse qualitative models, a standard format to exchange qualitative models has been missing. Results: We present the Systems Biology Markup Language (SBML) Qualitative Models Package ("qual"), an extension of the SBML Level 3 standard designed for computer representation of qualitative models of biological networks. We demonstrate the interoperability of models via SBML qual through the analysis of a specific signalling network by three independent software tools. Furthermore, the cooperative development of the SBML qual format paved the way for the development of LogicalModel, an open-source model library, which will facilitate the adoption of the format as well as the collaborative development of algorithms to analyze qualitative models. Conclusion: SBML qual allows the exchange of qualitative models among a number of complementary software tools. SBML qual has the potential to promote collaborative work on the development of novel computational approaches, as well as on the specification and the analysis of comprehensive qualitative models of regulatory and signalling networks.

Submitted 7 September, 2013; originally announced September 2013.

Comments: 29 pages, 7 figures

arXiv:1005.2107 [pdf, ps, other]

Efficient parameter search for qualitative models of regulatory networks using symbolic model checking

Authors: Grégory Batt, Michel Page, Irene Cantone, Gregor Goessler, Pedro T. Monteiro, Hidde De Jong

Abstract: Investigating the relation between the structure and behavior of complex biological networks often involves posing the following two questions: Is a hypothesized structure of a regulatory network consistent with the observed behavior? And can a proposed structure generate a desired behavior? Answering these questions presupposes that we are able to test the compatibility of network structure and behavior. We cast these questions into a parameter search problem for qualitative models of regulatory networks, in particular piecewise-affine differential equation models. We develop a method based on symbolic model checking that avoids enumerating all possible parametrizations, and show that this method performs well on real biological problems, using the IRMA synthetic network and benchmark experimental data sets. We test the consistency between the IRMA network structure and the time-series data, and search for parameter modifications that would improve the robustness of the external control of the system behavior.

Submitted 12 May, 2010; originally announced May 2010.

arXiv:0803.0802 [pdf, ps, other]

Temporal Logic Patterns for Querying Dynamic Models of Cellular Interaction Networks

Authors: Pedro T. Monteiro, Delphine Ropers, Radu Mateescu, Ana T. Freitas, Hidde De Jong

Abstract: Models of the dynamics of cellular interaction networks have become increasingly larger in recent years. Formal verification based on model checking provides a powerful technology to keep up with this increase in scale and complexity. The application of model-checking approaches is hampered, however, by the difficulty for non-expert users to formulate appropriate questions in temporal logic. In order to deal with this problem, we propose the use of patterns, that is, high-level query templates that capture recurring biological questions and that can be automatically translated into temporal logic. The applicability of the developed set of patterns has been investigated by the analysis of an extended model of the network of global regulators controlling the carbon starvation response in Escherichia coli.

Submitted 11 March, 2008; v1 submitted 6 March, 2008; originally announced March 2008.

arXiv:q-bio/0702058 [pdf, ps, other]

Symbolic Reachability Analysis of Genetic Regulatory Networks using Qualitative Abstractions

Authors: Grégory Batt, Delphine Ropers, Hidde De Jong, Michel Page, Johannes Geiselmann

Abstract: The switch-like character of gene regulation has motivated the use of hybrid, discrete-continuous models of genetic regulatory networks. While powerful techniques for the analysis, verification, and control of hybrid systems have been developed, the specificities of the biological application domain pose a number of challenges, notably the absence of quantitative information on parameter values and the size and complexity of networks of biological interest. We introduce a method for the analysis of reachability properties of genetic regulatory networks that is based on a class of discontinuous piecewise-affine (PA) differential equations well-adapted to the above constraints. More specifically, we introduce a hyperrectangular partition of the state space that forms the basis for a discrete abstraction preserving the sign of the derivatives of the state variables. The resulting discrete transition system provides a conservative approximation of the qualitative dynamics of the network and can be efficiently computed in a symbolic manner from inequality constraints on the parameters. The method has been implemented in the computer tool Genetic Network Analyzer (GNA), which has been applied to the analysis of a regulatory system whose functioning is not well-understood by biologists, the nutritional stress response in the bacterium Escherichia coli.

Submitted 28 February, 2007; originally announced February 2007.

Showing 1–8 of 8 results for author: de Jong, H