Search | arXiv e-print repository

Automated Inference of Graph Transformation Rules

Authors: Jakob L. Andersen, Akbar Davoodi, Rolf Fagerberg, Christoph Flamm, Walter Fontana, Juri Kolčák, Christophe V. F. P. Laurent, Daniel Merkle, Nikolai Nøjgaard

Abstract: The explosion of data available in life sciences is fueling an increasing demand for expressive models and computational methods. Graph transformation is a model for dynamic systems with a large variety of applications. We introduce a novel method of the graph transformation model construction, combining generative and dynamical viewpoints to give a fully automated data-driven model inference method. The method takes the input dynamical properties, given as a "snapshot" of the dynamics encoded by explicit transitions, and constructs a compatible model. The obtained model is guaranteed to be minimal, thus framing the approach as model compression (from a set of transitions into a set of rules). The compression is permissive to a lossy case, where the constructed model is allowed to exhibit behavior outside of the input transitions, thus suggesting a completion of the input dynamics. The task of graph transformation model inference is naturally highly challenging due to the combinatorics involved. We tackle the exponential explosion by proposing a heuristically minimal translation of the task into a well-established problem, set cover, for which highly optimized solutions exist. We further showcase how our results relate to Kolmogorov complexity expressed in terms of graph transformation.

Submitted 18 December, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: Preprint

arXiv:2201.04515

Representing catalytic mechanisms with rule composition

Authors: Jakob L. Andersen, Rolf Fagerberg, Christoph Flamm, Walter Fontana, Juri Kolčák, Christophe V. F. P. Laurent, Daniel Merkle, Nikolai Nøjgaard

Abstract: Reaction mechanisms are often presented as sequences of elementary steps, such as codified by arrow pushing. We propose an approach for representing such mechanisms using graph transformation. In this framework, each elementary step is a rule for modifying a molecular graph and a mechanism is a sequence of such rules. To generate a compact representation of a multi-step reaction, we compose the rules of individual steps into a composite rule, providing a rigorous and fully automated approach to coarse-graining. While the composite rule retains the graphical conditions necessary for the execution of a mechanism, it also records information about transient changes not visible by comparing educts and products. By projecting the rule onto a single "overlay graph", we generalize Fujita's idea of an Imaginary Transition Structure from elementary reactions to composite reactions. The utility of the overlay graph construct is exemplified in the context of enzyme-catalyzed reactions. In a first application, we exploit mechanistic information in the Mechanism and Catalytic Site Atlas to construct overlay graphs of hydrolase reactions listed in the database. These graphs point at a spectrum of catalytic entanglement of enzyme and substrate, de-emphasizing the notion of a singular catalyst in favor of a collection of catalytic sites that can be distributed across enzyme and substrate.
In a second application, we deploy composite rules to search the Rhea database for reactions of known or unknown mechanism that are, in principle, compatible with the mechanisms implied by the composite rules. We believe this work adds to the utility of graph-transformation formalisms in representing and reasoning about chemistry in an automated yet insightful fashion.

Submitted 25 August, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

Comments: Preprint

arXiv:2201.04360

Efficient Modular Graph Transformation Rule Application

Authors: Jakob L. Andersen, Rolf Fagerberg, Juri Kolčák, Christophe V. F. P. Laurent, Daniel Merkle, Nikolai Nøjgaard

Abstract: Graph transformation formalisms have proven to be suitable tools for the modelling of chemical reactions. They are well established in theoretical studies and increasingly also in practical applications in chemistry. The latter is made feasible via the development of programming frameworks which make the formalisms executable. The application of such frameworks to large networks of chemical reactions, however, poses unique computational challenges. One such challenge is the inherent combinatorial nature of the graphs involved. The graphs consist of many connected components, representing individual molecules. While the existing methods for implementing graph transformations can be applied to such graphs, the combinatorics of constructing graph matches quickly becomes a computational bottleneck as the size of the chemical reaction network grows. In this contribution, we develop a new method of enumerating graph matches during graph transformation rule application. The method is designed to improve performance in such scenarios and is based on constructing graph matches in an iterative, component-wise fashion which allows redundant applications to be detected early and pruned. We further extend the algorithm with an efficient heuristic based on local symmetries of the graphs, which allow us to detect and discard isomorphic applications early. Finally, we conduct chemical network generation experiments on real-life as well as synthetic data and compare against the state-of-the-art algorithm in the field.

Submitted 25 August, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

Comments: preprint

arXiv:2108.04077

doi 10.1089/CMB.2020.0548

Cayley Graphs of Semigroups Applied to Atom Tracking in Chemistry

Authors: Nikolai Nøjgaard, Walter Fontana, Marc Hellmuth, Daniel Merkle

Abstract: While atom tracking with isotope-labeled compounds is an essential and sophisticated wet-lab tool in order to, e.g., illuminate reaction mechanisms, there exists only a limited number of formal methods to approach the problem. Specifically when large (bio-)chemical networks are considered where reactions are stereo-specific, rigorous techniques are indispensable. We present an approach using the right Cayley graph of a monoid in order to track atoms concurrently through sequences of reactions and predict their potential location in product molecules. This can not only be used to systematically build hypotheses or reject reaction mechanisms (we will use the ANRORC mechanism "Addition of the Nucleophile, Ring Opening, and Ring Closure" as an example), but also to infer naturally occurring subsystems of (bio-)chemical systems. Our results include the analysis of the carbon traces within the TCA cycle and infer subsystems based on projections of the right Cayley graph onto a set of relevant atoms.

Submitted 9 August, 2021; originally announced August 2021.

arXiv:2107.01893

Combining Orthology and Xenology Data in a Common Phylogenetic Tree

Authors: Marc Hellmuth, Mira Michel, Nikolai N. Nøjgaard, David Schaller, Peter F. Stadler

Abstract: A rooted tree $T$ with vertex labels $t(v)$ and set-valued edge labels $\lambda(e)$ defines maps $\delta$ and $\varepsilon$ on the pairs of leaves of $T$ by setting $\delta(x,y)=q$ if the last common ancestor $\text{lca}(x,y)$ of $x$ and $y$ is labeled $q$, and $m\in \varepsilon(x,y)$ if $m\in \lambda(e)$ for at least one edge $e$ along the path from $\text{lca}(x,y)$ to $y$. We show that a pair of maps $(\delta,\varepsilon)$ derives from a tree $(T,t,\lambda)$ if and only if there exists a common refinement of the (unique) least-resolved vertex labeled tree $(T_{\delta},t_{\delta})$ that explains $\delta$ and the (unique) least resolved edge labeled tree $(T_{\varepsilon},\lambda_{\varepsilon})$ that explains $\varepsilon$ (provided both trees exist). This result remains true if certain combinations of labels at incident vertices and edges are forbidden.

Submitted 5 July, 2021; originally announced July 2021.

arXiv:1911.00407

A Graph-Based Tool to Embed the π-Calculus into a Computational DPO Framework

Authors: Jakob Lykke Andersen, Marc Hellmuth, Daniel Merkle, Nikolai Nøjgaard, Marco Peressotti

Abstract: Graph transformation approaches have been successfully used to analyse and design chemical and biological systems. Here we build on top of a DPO framework, in which molecules are modelled as typed attributed graphs and chemical reactions are modelled as graph transformations. Edges and vertexes can be labelled with first-order terms, which can be used to encode, e.g., steric information of molecules. While targeted to chemical settings, the computational framework is intended to be very generic and applicable to the exploration of arbitrary spaces derived via iterative application of rewrite rules, such as process calculi like Milner's π-calculus. To illustrate the generality of the framework, we introduce EpiM: a tool for computing execution spaces of π-calculus processes. EpiM encodes π-calculus processes as typed attributed graphs and then exploits the existing DPO framework to compute their dynamics in the form of graphs where nodes are π-calculus processes and edges are reduction steps. EpiM takes advantage of the graph-based representation and facilities offered by the framework, like efficient isomorphism checking to prune the space without resorting to explicit structural equivalences. EpiM is available as an online Python-based tool.

Submitted 29 October, 2019; originally announced November 2019.

arXiv:1711.00504

Partial Homology Relations - Satisfiability in terms of Di-Cographs

Authors: Nikolai Nøjgaard, Nadia El-Mabrouk, Daniel Merkle, Nikolas Wieseke, Marc Hellmuth

Abstract: Directed cographs (di-cographs) play a crucial role in the reconstruction of evolutionary histories of genes based on homology relations which are binary relations between genes. A variety of methods based on pairwise sequence comparisons can be used to infer such homology relations (e.g., orthology, paralogy, xenology). They are satisfiable if the relations can be explained by an event-labeled gene tree, i.e., they can simultaneously co-exist in an evolutionary history of the underlying genes. Every gene tree is equivalently interpreted as a so-called cotree that entirely encodes the structure of a di-cograph. Thus, satisfiable homology relations must necessarily form a di-cograph. The inferred homology relations might not cover each pair of genes and thus, provide only partial knowledge on the full set of homology relations. Moreover, for particular pairs of genes, it might be known with a high degree of certainty that they are not orthologs (resp. paralogs, xenologs) which yields forbidden pairs of genes. Motivated by this observation, we characterize (partial) satisfiable homology relations with or without forbidden gene pairs, provide a quadratic-time algorithm for their recognition and for the computation of a cotree that explains the given relations.

Submitted 3 May, 2018; v1 submitted 1 November, 2017; originally announced November 2017.

arXiv:1705.02179

Forbidden Time Travel: Characterization of Time-Consistent Tree Reconciliation Maps

Authors: Nikolai Nøjgaard, Manuela Geiß, Peter F. Stadler, Daniel Merkle, Nicolas Wieseke, Marc Hellmuth

Abstract: In the absence of horizontal gene transfer it is possible to reconstruct the history of gene families from empirically determined orthology relations, which are equivalent to event-labeled gene trees. Knowledge of the event labels considerably simplifies the problem of reconciling a gene tree T with a species tree S, relative to the reconciliation problem without prior knowledge of the event types. It is well-known that optimal reconciliations in the unlabeled case may violate time-consistency and thus are not biologically feasible. Here we investigate the mathematical structure of the event labeled reconciliation problem with horizontal transfer. We investigate the issue of time-consistency for the event-labeled version of the reconciliation problem, provide a convenient axiomatic framework, and derive a complete characterization of time-consistent reconciliations. This characterization depends on certain weak conditions on the event-labeled gene trees that reflect conditions under which evolutionary events are observable at least in principle. We give an O(|V(T)|log(|V(S)|))-time algorithm to decide whether a time-consistent reconciliation map exists. It does not require the construction of explicit timing maps, but relies entirely on the comparably easy task of checking whether a small auxiliary graph is acyclic.
The combinatorial characterization of time consistency and thus biologically feasible reconciliation is an important step towards the inference of gene family histories with horizontal transfer from orthology data, i.e., without presupposed gene and species trees. The fast algorithm to decide time consistency is useful in a broader context because it constitutes an attractive component for all tools that address tree reconciliation problems.

Submitted 5 May, 2017; originally announced May 2017.

ACM Class: G.2.2; G.2.3; F.2.2

Showing 1–8 of 8 results for author: Nøjgaard, N