-
Implementation of a practical Markov chain Monte Carlo sampling algorithm in PyBioNetFit
Authors:
Jacob Neumann,
Yen Ting Lin,
Abhishek Mallela,
Ely F. Miller,
Joshua Colvin,
Abell T. Duprat,
Ye Chen,
William S. Hlavacek,
Richard G. Posner
Abstract:
Bayesian inference in biological modeling commonly relies on Markov chain Monte Carlo (MCMC) sampling of a multidimensional and non-Gaussian posterior distribution that is not analytically tractable. Here, we present the implementation of a practical MCMC method in the open-source software package PyBioNetFit (PyBNF), which is designed to support parameterization of mathematical models for biological systems. The new MCMC method, am, incorporates an adaptive move proposal distribution. For warm starts, sampling can be initiated at a specified location in parameter space and with a multivariate Gaussian proposal distribution defined initially by a specified covariance matrix. Multiple chains can be generated in parallel using a computer cluster. We demonstrate that am can be used to successfully solve real-world Bayesian inference problems, including forecasting of new Coronavirus Disease 2019 case detection with Bayesian quantification of forecast uncertainty. PyBNF version 1.1.9, the first stable release with am, is available at PyPI and can be installed using the pip package-management system on platforms that have a working installation of Python 3. PyBNF relies on libRoadRunner and BioNetGen for simulations (e.g., numerical integration of ordinary differential equations defined in SBML or BNGL files) and Dask.Distributed for task scheduling on Linux computer clusters.
Submitted 29 September, 2021;
originally announced September 2021.
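The adaptive proposal idea described in the abstract above can be illustrated with a minimal sketch. This is not PyBNF's am implementation: it is a generic random-walk Metropolis sampler whose Gaussian proposal covariance is periodically re-estimated from the chain history, assuming a user-supplied log-posterior. The function name `adaptive_metropolis`, its parameters (`adapt_start`, `adapt_every`, `eps`), and the `2.38**2 / d` scaling heuristic are illustrative assumptions, not taken from the paper. The `x0` and `cov0` arguments play the role of the warm-start location and initial covariance matrix mentioned in the abstract.

```python
import numpy as np


def adaptive_metropolis(log_post, x0, cov0, n_steps,
                        adapt_start=500, adapt_every=50, eps=1e-8, seed=0):
    """Random-walk Metropolis with an adaptively re-estimated Gaussian proposal.

    Hypothetical sketch: names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    scale = 2.38 ** 2 / d                 # common scaling heuristic (assumption)
    cov = np.asarray(cov0, dtype=float)   # initial proposal covariance (warm start)
    chain = np.empty((n_steps, d))
    lp = log_post(x)
    n_accepted = 0
    for t in range(n_steps):
        # Propose a move from a multivariate Gaussian centered on the current state.
        proposal = rng.multivariate_normal(x, cov)
        lp_new = log_post(proposal)
        # Metropolis acceptance test (the proposal is symmetric).
        if np.log(rng.uniform()) < lp_new - lp:
            x, lp = proposal, lp_new
            n_accepted += 1
        chain[t] = x
        n = t + 1
        # Periodically replace the proposal covariance with the scaled
        # empirical covariance of all samples collected so far.
        if n >= adapt_start and n % adapt_every == 0:
            cov = scale * (np.cov(chain[:n], rowvar=False) + eps * np.eye(d))
    return chain, n_accepted / n_steps


if __name__ == "__main__":
    mean = np.array([1.0, -2.0])
    chain, rate = adaptive_metropolis(
        lambda x: -0.5 * np.sum((x - mean) ** 2),  # toy Gaussian log-posterior
        x0=[0.0, 0.0], cov0=0.1 * np.eye(2), n_steps=5000)
    print("acceptance rate:", rate)
    print("posterior mean estimate:", chain[1000:].mean(axis=0))
```

Independent chains could be produced by calling the function with different `seed` values, which is one simple way to mimic the parallel multi-chain runs the abstract mentions.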
-
Daily Forecasting of New Cases for Regional Epidemics of Coronavirus Disease 2019 with Bayesian Uncertainty Quantification
Authors:
Yen Ting Lin,
Jacob Neumann,
Ely Miller,
Richard G. Posner,
Abhishek Mallela,
Cosmin Safta,
Jaideep Ray,
Gautam Thakur,
Supriya Chinthavali,
William S. Hlavacek
Abstract:
To increase situational awareness and support evidence-based policy-making, we formulated two types of mathematical models for COVID-19 transmission within a regional population. One is a fitting function that can be calibrated to reproduce an epidemic curve with two timescales (e.g., fast growth and slow decay). The other is a compartmental model that accounts for quarantine, self-isolation, social distancing, a non-exponentially distributed incubation period, asymptomatic individuals, and mild and severe forms of symptomatic disease. Using Bayesian inference, we have been calibrating our models daily for consistency with new reports of confirmed cases from the 15 most populous metropolitan statistical areas in the United States and quantifying uncertainty in parameter estimates and predictions of future case reports. This online learning approach allows for early identification of new trends despite considerable variability in case reporting. We infer new significant upward trends for five of the metropolitan areas starting between 19-April-2020 and 12-June-2020.
Submitted 20 July, 2020;
originally announced July 2020.
-
Bayesian Uncertainty Quantification for Systems Biology Models Parameterized Using Qualitative Data
Authors:
Eshan D. Mitra,
William S. Hlavacek
Abstract:
Motivation: Recent work has demonstrated the feasibility of using non-numerical, qualitative data to parameterize mathematical models. However, uncertainty quantification (UQ) of such parameterized models has remained challenging because of a lack of a statistical interpretation of the objective functions used in optimization. Results: We formulated likelihood functions suitable for performing Bayesian UQ using qualitative data or a combination of qualitative and quantitative data. To demonstrate the resulting UQ capabilities, we analyzed a published model for IgE receptor signaling using synthetic qualitative and quantitative datasets. Remarkably, estimates of parameter values derived from the qualitative data were nearly as consistent with the assumed ground-truth parameter values as estimates derived from the lower throughput quantitative data. These results provide further motivation for leveraging qualitative data in biological modeling. Availability: The likelihood functions presented here are implemented in a new release of PyBioNetFit, an open-source application for analyzing SBML- and BNGL-formatted models, available online at www.github.com/lanl/PyBNF.
Submitted 30 August, 2019;
originally announced September 2019.
-
Parameter Estimation and Uncertainty Quantification for Systems Biology Models
Authors:
Eshan D. Mitra,
William S. Hlavacek
Abstract:
Mathematical models can provide quantitative insight into immunoreceptor signaling, but require parameterization and uncertainty quantification before making reliable predictions. We review currently available methods and software tools to address these problems. We consider gradient-based and gradient-free methods for point estimation of parameter values, and methods of profile likelihood, bootstrapping, and Bayesian inference for uncertainty quantification. We consider recent and potential future applications of these methods to systems-level modeling of immune-related phenomena.
Submitted 26 June, 2019;
originally announced June 2019.
-
Scaling methods for accelerating kinetic Monte Carlo simulations of chemical reaction networks
Authors:
Yen Ting Lin,
Song Feng,
William S. Hlavacek
Abstract:
Various kinetic Monte Carlo algorithms become inefficient when some of the population sizes in a system are large, which gives rise to a large number of reaction events per unit time. Here, we present a new acceleration algorithm based on adaptive and heterogeneous scaling of reaction rates and stoichiometric coefficients. The algorithm is conceptually related to the commonly used idea of accelerating a stochastic simulation by considering a sub-volume $λΩ$ ($0<λ<1$) within a system of interest, which reduces the number of reaction events per unit time occurring in a simulation by a factor $1/λ$ at the cost of greater error in unbiased estimates of first moments and biased overestimates of second moments. Our new approach offers two unique benefits. First, scaling is adaptive and heterogeneous, which eliminates the pitfall of overaggressive scaling. Second, there is no need for an \emph{a priori} classification of populations as discrete or continuous (as in a hybrid method), which is problematic when discreteness of a chemical species changes during a simulation. The method requires specification of only a single algorithmic parameter, $N_c$, a global critical population size above which populations are effectively scaled down to increase simulation efficiency. The method, which we term partial scaling, is implemented in the open-source BioNetGen software package. We demonstrate that partial scaling can significantly accelerate simulations without significant loss of accuracy for several published models of biological systems. These models characterize activation of the mitogen-activated protein kinase ERK, prion protein aggregation, and T-cell receptor signaling.
Submitted 10 May, 2019; v1 submitted 20 March, 2019;
originally announced March 2019.
-
PyBioNetFit and the Biological Property Specification Language
Authors:
Eshan D. Mitra,
Ryan Suderman,
Joshua Colvin,
Alexander Ionkov,
Andrew Hu,
Herbert M. Sauro,
Richard G. Posner,
William S. Hlavacek
Abstract:
In systems biology modeling, important steps include model parameterization, uncertainty quantification, and evaluation of agreement with experimental observations. To help modelers perform these steps, we developed the software PyBioNetFit. PyBioNetFit is designed for parameterization, and also supports uncertainty quantification, checking models against known system properties, and solving design problems. PyBioNetFit introduces the Biological Property Specification Language (BPSL) for the formal declaration of system properties. BPSL allows qualitative data to be used alone or in combination with quantitative data for parameterization, model checking, and design. PyBioNetFit performs parameterization with parallelized metaheuristic optimization algorithms (differential evolution, particle swarm optimization, scatter search) that work directly with existing model definition standards: BioNetGen Language (BNGL) and Systems Biology Markup Language (SBML). We demonstrate PyBioNetFit's capabilities by solving 31 example problems, including the challenging problem of parameterizing a model of cell cycle control in yeast. We benchmark PyBioNetFit's parallelization efficiency on computer clusters, using up to 288 cores. Finally, we demonstrate the model checking and design applications of PyBioNetFit and BPSL by analyzing a model of therapeutic interventions in autophagy signaling.
Submitted 18 March, 2019;
originally announced March 2019.
-
A Step-by-Step Guide to Using BioNetFit
Authors:
William S. Hlavacek,
Jennifer Longo,
Lewis R. Baker,
María del Carmen Ramos Álamo,
Alexander Ionkov,
Eshan D. Mitra,
Ryan Suderman,
Keesha E. Erickson,
Raquel Dias,
Joshua Colvin,
Brandon R. Thomas,
Richard G. Posner
Abstract:
BioNetFit is a software tool designed for solving parameter identification problems that arise in the development of rule-based models. It solves these problems through curve fitting (i.e., nonlinear regression). BioNetFit is compatible with deterministic and stochastic simulators that accept BioNetGen language (BNGL)-formatted files as inputs, such as those available within the BioNetGen framework. BioNetFit can be used on a laptop or standalone multicore workstation as well as on many Linux clusters, such as those that use the Slurm Workload Manager to schedule jobs. BioNetFit implements a metaheuristic population-based global optimization procedure, an evolutionary algorithm (EA), to minimize a user-defined objective function, such as a residual sum of squares (RSS) function. BioNetFit also implements a bootstrapping procedure for determining confidence intervals for parameter estimates. Here, we provide step-by-step instructions for using BioNetFit to estimate the values of parameters of a BNGL-encoded model and to define bootstrap confidence intervals. The process entails the use of several plain-text files, which are processed by BioNetFit and BioNetGen. In general, these files include 1) one or more EXP files, which each contains (experimental) data to be used in parameter identification/bootstrapping; 2) a BNGL file containing a model section, which defines a (rule-based) model, and an actions section, which defines simulation protocols that generate GDAT and/or SCAN files with model predictions corresponding to the data in the EXP file(s); and 3) a CONF file that configures the fitting/bootstrapping job and that defines algorithmic parameter settings.
Submitted 21 September, 2018;
originally announced September 2018.
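The RSS objective and bootstrap confidence intervals mentioned in the abstract above can be illustrated with a small, generic sketch. It does not use BioNetFit, BNGL, or any of its file formats. As a stand-in for the evolutionary algorithm, it fits a straight line by ordinary least squares with `numpy.polyfit`, and it resamples data points with replacement to form percentile intervals. All function names (`rss`, `fit_line`, `bootstrap_ci`) and the toy data are hypothetical.

```python
import numpy as np


def rss(predicted, observed):
    """Residual sum of squares between model predictions and data."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sum((predicted - observed) ** 2))


def fit_line(t, y):
    """Least-squares fit of y = slope * t + intercept (minimizes the RSS)."""
    slope, intercept = np.polyfit(t, y, 1)
    return slope, intercept


def bootstrap_ci(t, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap intervals for (slope, intercept).

    Each replicate resamples the data points with replacement and refits.
    """
    rng = np.random.default_rng(seed)
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    estimates = np.empty((n_boot, 2))
    for b in range(n_boot):
        idx = rng.integers(0, t.size, size=t.size)
        estimates[b] = fit_line(t[idx], y[idx])
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(estimates, [alpha, 1.0 - alpha], axis=0)
    return lower, upper


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.linspace(0.0, 5.0, 30)
    y = 2.0 * t + 1.0 + rng.normal(0.0, 0.1, size=t.size)  # synthetic data
    slope, intercept = fit_line(t, y)
    print("RSS at best fit:", rss(slope * t + intercept, y))
    print("95% bootstrap intervals:", bootstrap_ci(t, y))
```

For a nonlinear model, `fit_line` would be replaced by whatever optimizer is used to minimize `rss`; the resampling loop itself would stay the same.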
-
Using RuleBuilder to graphically define and visualize BioNetGen-language patterns and reaction rules
Authors:
Ryan Suderman,
G. Matthew Fricke,
William S. Hlavacek
Abstract:
RuleBuilder is a tool for drawing graphs that can be represented by the BioNetGen language (BNGL), which is used to formulate mathematical, rule-based models of biochemical systems. BNGL provides an intuitive plain-text, or string, representation of such systems, which is based on a graphical formalism. Reactions are defined in terms of graph-rewriting rules that specify the necessary intrinsic properties of the reactants, a transformation, and a rate law. Rules may also contain contextual constraints that restrict application of the rule. In some cases, the specification of contextual constraints can be verbose, making a rule difficult to read. RuleBuilder is designed to ease the task of reading and writing individual reaction rules, as well as individual BNGL patterns similar to those found in rules. The software assists in the reading of existing models by converting BNGL strings of interest into a graph-based representation composed of nodes and edges. RuleBuilder also enables the user to construct de novo a visual representation of BNGL strings using drawing tools available in its interface. As objects are added to the drawing canvas, the corresponding BNGL string is generated on the fly, and objects are similarly drawn on the fly as BNGL strings are entered into the application. RuleBuilder thus facilitates construction and interpretation of rule-based models.
Submitted 13 March, 2018;
originally announced March 2018.
-
Generalizing Gillespie's direct method to enable network-free simulations
Authors:
Ryan Suderman,
Eshan D. Mitra,
Yen Ting Lin,
Keesha E. Erickson,
Song Feng,
William S. Hlavacek
Abstract:
Gillespie's direct method for stochastic simulation of chemical kinetics is a staple of computational systems biology research. However, the algorithm requires explicit enumeration of all reactions and all chemical species that may arise in the system. In many cases, this is not feasible due to the combinatorial explosion of reactions and species in biological networks. Rule-based modeling frameworks provide a way to exactly represent networks containing such combinatorial complexity, and generalizations of Gillespie's direct method have been developed as simulation engines for rule-based modeling languages. Here, we provide both a high-level description of the algorithms underlying the simulation engines, termed network-free simulation algorithms, and how they have been applied in systems biology research. We also define a generic rule-based modeling framework and describe a number of technical details required for adapting Gillespie's direct method for network-free simulation. Finally, we briefly discuss potential avenues for advancing network-free simulation and the role they continue to play in modeling dynamical systems in biology.
Submitted 30 January, 2018;
originally announced January 2018.
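For readers unfamiliar with the baseline algorithm that the abstract above generalizes, here is a minimal sketch of Gillespie's direct method for a network whose reactions are fully enumerated. It is the conventional network-based form, not one of the network-free variants the paper describes. The function name `gillespie_direct`, its arguments, and the toy reversible isomerization A <-> B are illustrative assumptions.

```python
import numpy as np


def gillespie_direct(x0, stoich, propensities, t_end, seed=0):
    """Gillespie's direct method for an explicitly enumerated reaction network.

    x0           : initial copy numbers, one entry per species
    stoich       : array of shape (n_reactions, n_species) of state changes
    propensities : function mapping the state x to an array of reaction rates
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=int)
    stoich = np.asarray(stoich, dtype=int)
    t = 0.0
    times, states = [t], [x.copy()]
    while True:
        a = np.asarray(propensities(x), dtype=float)
        a_total = a.sum()
        if a_total <= 0.0:          # no reaction can fire
            break
        # Waiting time to the next event is exponential with rate a_total.
        t += rng.exponential(1.0 / a_total)
        if t > t_end:
            break
        # Choose which reaction fires, with probability a_j / a_total.
        j = rng.choice(a.size, p=a / a_total)
        x = x + stoich[j]
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)


if __name__ == "__main__":
    k_forward, k_reverse = 1.0, 0.5   # hypothetical rate constants
    times, states = gillespie_direct(
        x0=[100, 0],
        stoich=[[-1, 1], [1, -1]],    # A -> B and B -> A
        propensities=lambda x: np.array([k_forward * x[0], k_reverse * x[1]]),
        t_end=5.0)
    print("events simulated:", len(times) - 1)
    print("final state (A, B):", states[-1])
```

The need to supply `stoich` and `propensities` for every reaction up front is exactly the enumeration requirement that becomes infeasible under combinatorial complexity.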
-
Scaffold-mediated Nucleation of Protein Signaling Complexes: Elementary Principles
Authors:
Jin Yang,
William S. Hlavacek
Abstract:
Proteins with multiple binding sites play important roles in cell signaling systems by nucleating protein complexes in which, for example, enzymes and substrates are co-localized. Proteins that specialize in this function are called by a variety of names, including adapter, linker and scaffold. Scaffold-mediated nucleation of protein complexes can be either constitutive or induced. Induced nucleation is commonly mediated by a docking site on a scaffold that is activated by phosphorylation. Here, by considering minimalist mathematical models, which recapitulate scaffold effects seen in more mechanistically detailed models, we obtain analytical and numerical results that provide insights into scaffold function. These results elucidate how recruitment of a pair of ligands to a scaffold depends on the concentrations of the ligands, on the binding constants for ligand-scaffold interactions, on binding cooperativity, and on the milieu of the scaffold, as ligand recruitment is affected by competitive ligands and decoy receptors. For the case of a bivalent scaffold, we obtain an expression for the unique scaffold concentration that maximally recruits a pair of monovalent ligands. Through simulations, we demonstrate that a bivalent scaffold can nucleate distinct sets of ligands to equivalent extents when the scaffold is present at different concentrations. Thus, the function of a scaffold can potentially change qualitatively with a change in copy number. We also demonstrate how a scaffold can change the catalytic efficiency of an enzyme and the sensitivity of the rate of reaction to substrate concentration. The results presented here should be useful for understanding scaffold function and for engineering scaffolds to have desired properties.
Submitted 8 September, 2011; v1 submitted 24 May, 2011;
originally announced May 2011.
-
Rule-based Modeling and Simulation of Biochemical Systems with Molecular Finite Automata
Authors:
Jin Yang,
Xin Meng,
William S. Hlavacek
Abstract:
We propose a theoretical formalism, molecular finite automata (MFA), to describe individual proteins as rule-based computing machines. The MFA formalism provides a framework for modeling individual protein behaviors and systems-level dynamics via construction of programmable and executable machines. Models specified within this formalism explicitly represent the context-sensitive dynamics of individual proteins driven by external inputs and represent protein-protein interactions as synchronized machine reconfigurations. Both deterministic and stochastic simulations can be applied to quantitatively compute the dynamics of MFA models. We apply the MFA formalism to model and simulate a simple example of a signal transduction system that involves a MAP kinase cascade and a scaffold protein.
Submitted 15 September, 2010; v1 submitted 8 July, 2010;
originally announced July 2010.
-
Rejection-free kinetic Monte Carlo simulation of multivalent biomolecular interactions
Authors:
Jin Yang,
William S. Hlavacek
Abstract:
The system-level dynamics of multivalent biomolecular interactions can be simulated using a rule-based kinetic Monte Carlo method in which a rejection sampling strategy is used to generate reaction events. This method becomes inefficient when simulating aggregation processes with large biomolecular complexes. Here, we present a rejection-free method for determining the kinetics of multivalent biomolecular interactions, and we apply the method to simulate simple models for ligand-receptor interactions. Simulation results show that performance of the rejection-free method is equal to or better than that of the rejection method over wide parameter ranges, and the rejection-free method is more efficient for simulating systems in which aggregation is extensive. The rejection-free method reported here should be useful for simulating a variety of systems in which multisite molecular interactions yield large molecular aggregates.
Submitted 3 March, 2010; v1 submitted 25 December, 2008;
originally announced December 2008.
-
Determinants of bistability in induction of the Escherichia coli lac operon
Authors:
David W. Dreisigmeyer,
Jelena Stajic,
Ilya Nemenman,
William S. Hlavacek,
Michael E. Wall
Abstract:
We have developed a mathematical model of regulation of expression of the Escherichia coli lac operon, and have investigated bistability in its steady-state induction behavior in the absence of external glucose. Numerical analysis of equations describing regulation by artificial inducers revealed two natural bistability parameters that can be used to control the range of inducer concentrations over which the model exhibits bistability. By tuning these bistability parameters, we found a family of biophysically reasonable systems that are consistent with an experimentally determined bistable region for induction by thio-methylgalactoside (Ozbudak et al. Nature 427:737, 2004). The model predicts that bistability can be abolished when passive transport or permease export becomes sufficiently large; the former case is especially relevant to induction by isopropyl-beta-D-thiogalactopyranoside. To model regulation by lactose, we developed similar equations in which allolactose, a metabolic intermediate in lactose metabolism and a natural inducer of lac, is the inducer. For biophysically reasonable parameter values, these equations yield no bistability in response to induction by lactose; however, systems with an unphysically small permease-dependent export effect can exhibit small amounts of bistability for limited ranges of parameter values. These results cast doubt on the relevance of bistability in the lac operon within the natural context of E. coli, and help shed light on the controversy among existing theoretical studies that address this issue. The results also suggest an experimental approach to address the relevance of bistability in the lac operon within the natural context of E. coli.
Submitted 8 February, 2008;
originally announced February 2008.
-
Kinetic Monte Carlo Method for Rule-based Modeling of Biochemical Networks
Authors:
Jin Yang,
Michael I. Monine,
James R. Faeder,
William S. Hlavacek
Abstract:
We present a kinetic Monte Carlo method for simulating chemical transformations specified by reaction rules, which can be viewed as generators of chemical reactions, or equivalently, definitions of reaction classes. A rule identifies the molecular components involved in a transformation, how these components change, conditions that affect whether a transformation occurs, and a rate law. The computational cost of the method, unlike conventional simulation approaches, is independent of the number of possible reactions, which need not be specified in advance or explicitly generated in a simulation. To demonstrate the method, we apply it to study the kinetics of multivalent ligand-receptor interactions. We expect the method will be useful for studying cellular signaling systems and other physical systems involving aggregation phenomena.
Submitted 22 August, 2008; v1 submitted 21 December, 2007;
originally announced December 2007.
-
Reconstruction of metabolic networks from high-throughput metabolite profiling data: in silico analysis of red blood cell metabolism
Authors:
Ilya Nemenman,
G. Sean Escola,
William S. Hlavacek,
Pat J. Unkefer,
Clifford J. Unkefer,
Michael E. Wall
Abstract:
We investigate the ability of algorithms developed for reverse engineering of transcriptional regulatory networks to reconstruct metabolic networks from high-throughput metabolite profiling data. For this, we generate synthetic metabolic profiles for benchmarking purposes based on a well-established model for red blood cell metabolism. A variety of data sets is generated, accounting for different properties of real metabolic networks, such as experimental noise, metabolite correlations, and temporal dynamics. These data sets are made available online. We apply ARACNE, a mainstream transcriptional networks reverse engineering algorithm, to these data sets and observe performance comparable to that obtained in the transcriptional domain, for which the algorithm was originally designed.
Submitted 13 June, 2007;
originally announced June 2007.
-
Combinatorial complexity and dynamical restriction of network flows in signal transduction
Authors:
James R. Faeder,
Michael L. Blinov,
Byron Goldstein,
William S. Hlavacek
Abstract:
The activities and interactions of proteins that govern the cellular response to a signal generate a multitude of protein phosphorylation states and heterogeneous protein complexes. Here, using a computational model that accounts for 307 molecular species implied by specified interactions of four proteins involved in signalling by the immunoreceptor Fc$ε$RI, we determine the relative importance of molecular species that can be generated during signalling, chemical transitions among these species, and reaction paths that lead to activation of the protein tyrosine kinase (PTK) Syk. By all of these measures and over 2- and 10-fold ranges of model parameters--rate constants and initial concentrations--only a small portion of the biochemical network is active. The spectrum of active complexes, however, can be shifted dramatically, even by a change in the concentration of a single protein, which suggests that the network can produce qualitatively different responses under different cellular conditions and in response to different inputs. Reduced models that reproduce predictions of the full model for a particular set of parameters lose their predictive capacity when parameters are varied over 2-fold ranges.
Submitted 5 November, 2004;
originally announced November 2004.