-
Guidelines for releasing a variant effect predictor
Authors:
Benjamin J. Livesey,
Mihaly Badonyi,
Mafalda Dias,
Jonathan Frazer,
Sushant Kumar,
Kresten Lindorff-Larsen,
David M. McCandlish,
Rose Orenbuch,
Courtney A. Shearer,
Lara Muffley,
Julia Foreman,
Andrew M. Glazer,
Ben Lehner,
Debora S. Marks,
Frederick P. Roth,
Alan F. Rubin,
Lea M. Starita,
Joseph A. Marsh
Abstract:
Computational methods for assessing the likely impacts of mutations, known as variant effect predictors (VEPs), are widely used in the assessment and interpretation of human genetic variation, as well as in other applications like protein engineering. Many different VEPs have been released to date, and there is tremendous variability in their underlying algorithms and outputs, and in the ways in which the methodologies and predictions are shared. This leads to considerable challenges for end users in knowing which VEPs to use and how to use them. Here, to address these issues, we provide guidelines and recommendations for the release of novel VEPs. Emphasising open-source availability, transparent methodologies, clear variant effect score interpretations, standardised scales, accessible predictions, and rigorous training data disclosure, we aim to improve the usability and interpretability of VEPs, and promote their integration into analysis and evaluation pipelines. We also provide a large, categorised list of currently available VEPs, aiming to facilitate the discovery and encourage the usage of novel methods within the scientific community.
Submitted 16 April, 2024;
originally announced April 2024.
-
Buried and accessible surface area control intrinsic protein flexibility
Authors:
Joseph A. Marsh
Abstract:
Proteins experience a wide variety of conformational dynamics that can be crucial for facilitating their diverse functions. How is the intrinsic flexibility required for these motions encoded in their three-dimensional structures? Here, the overall flexibility of a protein is demonstrated to be tightly coupled to the total amount of surface area buried within its fold. A simple proxy for this, the relative solvent accessible surface area (Arel), therefore shows excellent agreement with independent measures of global protein flexibility derived from various experimental and computational methods. Application of Arel on a large scale demonstrates its utility by revealing unique sequence and structural properties associated with intrinsic flexibility. In particular, flexibility as measured by Arel shows little correspondence with intrinsic disorder, but instead tends to be associated with multiple domains and increased α-helical structure. Furthermore, the apparent flexibility of monomeric proteins is found to be useful for identifying quaternary structure errors in published crystal structures. There is also a strong tendency for the crystal structures of more flexible proteins to be solved to lower resolutions. Finally, local solvent accessibility is shown to be a primary determinant of local residue flexibility. Overall, this work provides both fundamental mechanistic insight into the origin of protein flexibility and a simple, practical method for predicting flexibility from protein structures.
Submitted 13 August, 2013; v1 submitted 12 June, 2013;
originally announced June 2013.
-
Optimal transport on wireless networks
Authors:
Yong Yu,
Bogdan Danila,
John A. Marsh,
Kevin E. Bassler
Abstract:
We present a study of the application of a variant of a recently introduced heuristic algorithm for the optimization of transport routes on complex networks to the problem of finding the optimal routes of communication between nodes on wireless networks. Our algorithm iteratively balances network traffic by minimizing the maximum node betweenness on the network. The variant we consider specifically accounts for the broadcast restrictions imposed by wireless communication by using a different betweenness measure. We compare the performance of our algorithm to two other known algorithms and find that our algorithm achieves the highest transport capacity both for minimum node degree geometric networks, which are directed geometric networks that model wireless communication networks, and for configuration model networks that are uncorrelated scale-free networks.
Submitted 28 March, 2007;
originally announced March 2007.
-
Transport optimization on complex networks
Authors:
Bogdan Danila,
Yong Yu,
John A. Marsh,
Kevin E. Bassler
Abstract:
We present a comparative study of the application of a recently introduced heuristic algorithm to the optimization of transport on three major types of complex networks. The algorithm balances network traffic iteratively by minimizing the maximum node betweenness with as little path lengthening as possible. We show that by using this optimal routing, a network can sustain significantly higher traffic without jamming than in the case of shortest path routing. A formula is proved that allows quick computation of the average number of hops along the path and of the average travel times once the betweennesses of the nodes are computed. Using this formula, we show that routing optimization preserves the small-world character exhibited by networks under shortest path routing, and that it significantly reduces the average travel time on congested networks with only a negligible increase in the average travel time at low loads. Finally, we study the correlation between the weights of the links in the case of optimal routing and the betweennesses of the nodes connected by them.
Submitted 9 January, 2007;
originally announced January 2007.
-
Optimal routing on complex networks
Authors:
Bogdan Danila,
Yong Yu,
John A. Marsh,
Kevin E. Bassler
Abstract:
We present a novel heuristic algorithm for routing optimization on complex networks. Previously proposed routing optimization algorithms aim at avoiding or reducing link overload. Our algorithm balances traffic on a network by minimizing the maximum node betweenness with as little path lengthening as possible, thus being useful in cases when networks are jamming due to queuing overload. By using the resulting routing table, a network can sustain significantly higher traffic without jamming than in the case of traditional shortest path routing.
Submitted 8 July, 2006; v1 submitted 1 July, 2006;
originally announced July 2006.
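The balancing idea in the abstract above (lower the maximum node betweenness while lengthening paths as little as possible) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `node_loads` and `balance_routes`, the unit weight increment, the fixed iteration count, and the use of one route per source-destination pair as a stand-in for betweenness are all assumptions made for illustration. The graph is assumed to be undirected and connected.

```python
import heapq


def node_loads(adj, weight):
    """Count, for each node, the routes that pass through it as an intermediate hop.

    One least-weight route per (source, destination) pair is used, which is a
    simplified proxy for node betweenness (an assumption of this sketch).
    """
    load = {v: 0 for v in adj}
    for src in adj:
        dist = {src: 0}
        parent = {src: None}
        heap = [(0, src)]
        while heap:  # Dijkstra from src
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v in adj[u]:
                nd = d + weight[frozenset((u, v))]
                if v not in dist or nd < dist[v]:
                    dist[v] = nd
                    parent[v] = u
                    heapq.heappush(heap, (nd, v))
        for dst in adj:
            if dst == src or dst not in parent:
                continue
            node = parent[dst]
            while node != src:  # walk back, skipping both endpoints
                load[node] += 1
                node = parent[node]
    return load


def balance_routes(adj, iterations=50):
    """Repeatedly penalise the links of the busiest node; keep the best weights seen."""
    weight = {frozenset((u, v)): 1 for u in adj for v in adj[u]}
    best_load, best_weight = None, None
    for _ in range(iterations):
        load = node_loads(adj, weight)
        hot = max(load, key=load.get)  # node with the largest load
        if best_load is None or load[hot] < best_load:
            best_load, best_weight = load[hot], dict(weight)
        for v in adj[hot]:  # make routes through the busiest node less attractive
            weight[frozenset((hot, v))] += 1
    return best_load, best_weight
```

Because the first iteration uses unit weights, the returned maximum load is never worse than that of plain shortest-path routing on the same graph.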
-
Generalized Box-Muller method for generating q-Gaussian random deviates
Authors:
William Thistleton,
Kenric Nelson,
John A. Marsh,
Constantino Tsallis
Abstract:
Addendum: The generalized Box-Müller algorithm provides a methodology for generating q-Gaussian random variates. The parameter $-\infty<q\leq3$ is related to the shape of the tail decay; $q<1$ for compact-support including parabola $(q=0)$; $1<q\leq3$ for heavy-tail including Cauchy $(q=2)$. This addendum clarifies that the transformation $q'=((3q-1)/(q+1))$ within the algorithm is due to a difference in the dimensions d of the generalized logarithm and the generalized distribution. The transformation is clarified by the decomposition of $q=1+2κ/(1+dκ)$, where the shape parameter $-1<κ\leq\infty$ quantifies the magnitude of the deformation from exponential. A simpler specification for the generalized Box-Müller algorithm is provided using the shape of the tail decay.
Original: The q-Gaussian distribution is known to be an attractor of certain correlated systems, and is the distribution which, under appropriate constraints, maximizes the entropy Sq, basis of nonextensive statistical mechanics. This theory is postulated as a natural extension of the standard (Boltzmann-Gibbs) statistical mechanics, and may explain the ubiquitous appearance of heavy-tailed distributions in both natural and man-made systems. The q-Gaussian distribution is also used as a numerical tool, for example as a visiting distribution in Generalized Simulated Annealing. We develop and present a simple, easy to implement numerical method for generating random deviates from a q-Gaussian distribution based upon a generalization of the well-known Box-Muller method. Our method is suitable for a larger range of q values, q<3, than has previously appeared in the literature, and can generate deviates from q-Gaussian distributions of arbitrary width and center. MATLAB code showing a straightforward implementation is also included.
Submitted 10 February, 2021; v1 submitted 23 May, 2006;
originally announced May 2006.
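A minimal Python sketch of a generalized Box-Muller draw, in the spirit of the abstract above, might look as follows. It is not the MATLAB code mentioned there. The names `q_log` and `q_gaussian` are hypothetical, and the mapping `q_prime = (1 + q) / (3 - q)` from the target distribution parameter to the logarithm parameter is this sketch's reading of the transformation quoted in the addendum (its algebraic inverse). Only the standard (unit-width, zero-centre) case is covered.

```python
import math
import random


def q_log(x, q):
    """Generalized (Tsallis) logarithm; reduces to the natural log at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def q_gaussian(q, rng=random):
    """Draw one standard q-Gaussian deviate for q < 3 (sketch, see assumptions above)."""
    if q >= 3:
        raise ValueError("q must be less than 3")
    q_prime = (1.0 + q) / (3.0 - q)   # assumed parameter of the generalized logarithm
    u1 = 1.0 - rng.random()           # uniform on (0, 1], avoids q_log(0)
    u2 = rng.random()                 # uniform on [0, 1)
    radius = math.sqrt(-2.0 * q_log(u1, q_prime))
    return radius * math.cos(2.0 * math.pi * u2)
```

With `q = 1` the expression reduces to the ordinary Box-Muller transform. Passing a seeded `random.Random` instance as `rng` makes the draws reproducible.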
-
Influence of global correlations on central limit theorems and entropic extensivity
Authors:
John A. Marsh,
Miguel A. Fuentes,
Luis G. Moyano,
Constantino Tsallis
Abstract:
We consider probabilistic models of N identical, distinguishable binary random variables. If these variables are strictly or asymptotically independent, then, for N>>1, (i) the attractor in distribution space is, according to the standard central limit theorem, a Gaussian, and (ii) the Boltzmann-Gibbs-Shannon entropy is extensive, meaning that S_BGS(N) ~ N. If these variables have any nonvanishing global (i.e., not asymptotically independent) correlations, then the attractor deviates from the Gaussian. The entropy appears to be more robust, in the sense that, in some cases, S_BGS remains extensive even in the presence of strong global correlations. In other cases, however, even weak global correlations make the entropy deviate from the normal behavior. More precisely, in such cases the entropic form Sq can become extensive for some value of q different from unity. This scenario is illustrated with several new as well as previously described models. The discussion illuminates recent progress on q-describable nonextensive probabilistic systems, and the conjectured q-Central Limit Theorem (q-CLT), which possesses a q-Gaussian attractor.
Submitted 31 March, 2006;
originally announced April 2006.
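For reference, the entropic form Sq named in the abstract above is conventionally written as below in nonextensive statistical mechanics. The symbols W (number of states), p_i (state probabilities) and k (a positive constant) are notation assumed here and do not appear in the abstract.

```latex
S_q = k\,\frac{1 - \sum_{i=1}^{W} p_i^{\,q}}{q - 1},
\qquad
\lim_{q \to 1} S_q = -k \sum_{i=1}^{W} p_i \ln p_i = S_{BGS}.
```

In this notation, extensivity for some q means that S_q(N) grows linearly with N for that value of q.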
-
Congestion-gradient driven transport on complex networks
Authors:
Bogdan Danila,
Yong Yu,
Samuel Earl,
John A. Marsh,
Zoltan Toroczkai,
Kevin E. Bassler
Abstract:
We present a study of transport on complex networks with routing based on local information. Particles hop from one node of the network to another according to a set of routing rules with different degrees of congestion awareness, ranging from random diffusion to rigid congestion-gradient driven flow. Each node can be either source or destination for particles and all nodes have the same routing capacity, which are features of ad-hoc wireless networks. It is shown that the transport capacity increases when a small amount of congestion awareness is present in the routing rules, and that it then decreases as the routing rules become too rigid when the flow becomes strictly congestion-gradient driven. Therefore, an optimum value of the congestion awareness exists in the routing rules. It is also shown that, in the limit of a large number of nodes, networks using routing based on local information jam at any nonzero load. Finally, we study the correlation between congestion at node level and a betweenness centrality measure.
Submitted 31 March, 2006;
originally announced March 2006.