-
Sub-Scalp EEG for Sensorimotor Brain-Computer Interface
Authors:
Timothy B Mahoney,
David B Grayden,
Sam E John
Abstract:
Objective: To establish sub-scalp electroencephalography (EEG) as a viable option for brain-computer interface (BCI) applications, particularly for chronic use, by demonstrating its effectiveness in recording and classifying sensorimotor neural activity. Approach: Two experiments were conducted in this study. The first aim was to demonstrate the high spatial resolution of sub-scalp EEG through analysis of somatosensory evoked potentials in sheep models. The second focused on the practical application of sub-scalp EEG, classifying motor execution using data collected during a sheep behavioural experiment. Main Results: We successfully demonstrated the recording of sensorimotor rhythms using sub-scalp EEG in sheep models. Important spatial, temporal, and spectral features of these signals were identified, and we were able to classify motor execution with above-chance performance. These results are comparable to previous work that investigated signal quality and motor execution classification using ECoG and endovascular arrays in sheep models. Significance: These results suggest that sub-scalp EEG may provide signal quality that approaches that of more invasive neural recording methods such as ECoG and endovascular arrays, and support the use of sub-scalp EEG for chronic BCI applications.
Submitted 3 June, 2025;
originally announced June 2025.
-
smFISH_batchRun: A smFISH image processing tool for single-molecule RNA Detection and 3D reconstruction
Authors:
Nimmy S. John,
ChangHwan Lee
Abstract:
Single-molecule RNA imaging has been made possible by recent advances in microscopy methods. However, systematic analysis of these images has been challenging due to highly variable background noise, even after applying sophisticated computational clearing methods. Here, we describe our custom MATLAB scripts that allow us to detect both nuclear nascent transcripts at the active transcription sites (ATS) and mature cytoplasmic mRNAs with single-molecule precision, and to reconstruct the tissue in 3D for further analysis. Our code was initially optimized for the C. elegans germline but was designed to be broadly applicable to other species and tissue types.
Submitted 13 April, 2025;
originally announced April 2025.
-
Stochastic Time to Extinction of an SIQS Epidemic Model with Quiescence
Authors:
Usman Sanusi,
Sona John,
Johannes Mueller,
Aurélien Tellier
Abstract:
Parasite quiescence is the ability of a pathogen to be inactive, with respect to metabolism and infectiousness, for some amount of time and then become active (infectious) again. The population is thus composed of an inactive proportion and an active part in which evolution and reproduction take place. In this paper, we investigate the effect of parasite quiescence on the time to extinction of infectious disease epidemics. We build a Susceptible-Infected-Quiescent-Susceptible (SIQS) epidemiological model. In this model, host individuals infected by a quiescent parasite strain cannot recover, but are not infectious. We particularly focus on stochastic effects. We show that the quiescent state does not affect the reproduction number, but for a wide range of parameters the model behaves as an SIS model at a slower time scale, given by the fraction of time infected individuals spend in the I state (and not in the Q state). This finding, proven using a time-scale argument and singular perturbation theory for Markov processes, is illustrated and validated by numerical experiments based on the quasi-steady-state distribution. We find that the result holds even without a distinct time-scale separation. Our results highlight the influence of quiescence as a bet-hedging strategy against stochastic disease extinction, and are relevant for predicting infectious disease dynamics in small populations.
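The SIQS dynamics can be explored with a Gillespie (stochastic simulation algorithm) run. This is a minimal sketch and not the authors' code. The rate names are our own hypothetical parameterisation: `beta` for transmission, `mu` for recovery from I, `alpha` for I to Q, and `gamma` for Q back to I. We also assume that hosts in Q neither transmit nor recover.

Under these assumptions, each visit to I lasts on average 1/(mu + alpha), and the expected number of visits is (mu + alpha)/mu. The expected total infectious time is therefore 1/mu whatever `alpha` is. This is consistent with the statement that quiescence leaves the reproduction number unchanged.

```python
import random


def siqs_extinction_time(n, beta, mu, alpha, gamma, i0, seed=None, t_max=1e6):
    """Simulate a hypothetical SIQS model until the parasite goes extinct.

    n: constant host population size; i0: number initially infected (in I).
    All rates are assumed strictly positive. Returns the time at which
    I + Q first reaches 0, or a time >= t_max if that has not happened yet.
    """
    rng = random.Random(seed)
    s, i, q = n - i0, i0, 0
    t = 0.0
    while i + q > 0 and t < t_max:
        rates = (
            beta * s * i / n,  # infection: S -> I
            mu * i,            # recovery: I -> S (hosts in Q cannot recover)
            alpha * i,         # entering quiescence: I -> Q
            gamma * q,         # reactivation: Q -> I
        )
        total = sum(rates)
        t += rng.expovariate(total)  # exponential waiting time to the next event
        r = rng.random() * total     # pick which event fires, proportional to its rate
        if r < rates[0]:
            s, i = s - 1, i + 1
        elif r < rates[0] + rates[1]:
            i, s = i - 1, s + 1
        elif r < rates[0] + rates[1] + rates[2]:
            i, q = i - 1, q + 1
        else:
            q, i = q - 1, i + 1
    return t
```

One informal way to see how quiescence stretches extinction times is to average `siqs_extinction_time` over many seeds, once with `alpha = 0` and once with `alpha > 0`.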
Submitted 8 March, 2025;
originally announced March 2025.
-
The GFB Tree and Tree Imbalance Indices
Authors:
Sean Cleary,
Mareike Fischer,
Katherine St. John
Abstract:
Tree balance plays an important role in various research areas in phylogenetics and computer science. Typically, it is measured with the help of a balance index or imbalance index. There are more than 25 such indices available, recently surveyed in a book by Fischer et al. They are used to rank rooted binary trees on a scale from the most balanced to the stringiest. We show that a wide range of subtree-size-based measures satisfying concavity and monotonicity conditions are minimized by the complete or greedy-from-the-bottom (GFB) tree and maximized by the caterpillar tree, yielding an infinitely large family of distinct new imbalance indices. Answering an open question from the literature, we show that one such established measure, the $\widehat{s}$-shape statistic, has the GFB tree as its unique minimizer. We also provide an alternative characterization of GFB trees, showing that they are equivalent to complete trees, which arise in different contexts. We give asymptotic bounds on the expected $\widehat{s}$-shape statistic under the uniform and Yule-Harding distributions of trees, and answer questions for the related $Q$-shape statistic as well.
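To make the extremal trees concrete, the sketch below builds three things:

- the GFB tree, by repeatedly joining the two subtrees with the fewest leaves;
- the caterpillar tree;
- the $\widehat{s}$-shape statistic of each.

We assume the usual definition $\widehat{s}(T)=\sum_{v}\ln(n_v-1)$, summed over internal nodes $v$, where $n_v$ is the number of leaves below $v$. The nested-tuple tree encoding and the function names are illustrative choices of ours.

```python
import heapq
import math
from itertools import count


def n_leaves(t):
    """Number of leaves of a tree encoded as nested 2-tuples (leaves are ints)."""
    return 1 if not isinstance(t, tuple) else n_leaves(t[0]) + n_leaves(t[1])


def s_hat(t):
    """Assumed s-hat-shape statistic: sum of ln(n_v - 1) over internal nodes v."""
    if not isinstance(t, tuple):
        return 0.0
    return math.log(n_leaves(t) - 1) + s_hat(t[0]) + s_hat(t[1])


def gfb_tree(n):
    """Greedy from the bottom: always merge the two subtrees with the fewest leaves."""
    tie = count()  # tie-breaker so the heap never compares tree objects
    heap = [(1, next(tie), leaf) for leaf in range(n)]
    heapq.heapify(heap)
    while len(heap) > 1:
        size_a, _, a = heapq.heappop(heap)
        size_b, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (size_a + size_b, next(tie), (a, b)))
    return heap[0][2]


def caterpillar(n):
    """Caterpillar tree: each new leaf is attached just above the current root."""
    t = 0
    for leaf in range(1, n):
        t = (t, leaf)
    return t
```

For $n=8$ the GFB tree is the fully balanced tree, which gives $\widehat{s}=2\ln 3+\ln 7=\ln 63$. The caterpillar gives $\ln 7! = \ln 5040$.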
Submitted 18 February, 2025;
originally announced February 2025.
-
Decoding Imagined Movement in People with Multiple Sclerosis for Brain-Computer Interface Translation
Authors:
John S. Russo,
Thomas A. Shiels,
Chin-Hsuan Sophie Lin,
Sam E. John,
David B. Grayden
Abstract:
Multiple Sclerosis (MS) is a heterogeneous autoimmune-mediated disorder affecting the central nervous system, commonly manifesting as fatigue and progressive limb impairment. This can significantly impact quality of life due to weakness or paralysis in the upper and lower limbs. A Brain-Computer Interface (BCI) aims to restore quality of life through control of an external device, such as a wheelchair. However, BCI research in people with MS remains limited. The current study aims to expand on the MS-BCI literature by demonstrating the feasibility of decoding imagined movement in people with MS. We collected electroencephalography (EEG) data from eight participants with various symptoms of MS and ten neurotypical control participants. Participants made imagined movements of the hands and feet as directed by a go/no-go protocol. Binary regularised linear discriminant analysis was used to classify imagined movement at individual time-frequency points. The frequency bands that provided the maximal accuracy, and the associated latencies, were compared. In all MS participants, the classification algorithm achieved above 70% accuracy in at least one imagined movement vs. rest classification and in most movement vs. movement classifications. There was no significant difference between the classification of limbs with weakness or paralysis and that of neurotypical controls. Both the MS and control groups possessed decodable information within the alpha (7-13 Hz) and beta (16-30 Hz) bands at similar latencies. This study is the first to demonstrate the feasibility of decoding imagined movements in people with MS. As an alternative to the P300 response, motor imagery-based control of a BCI may also be combined with existing motor imagery therapy to supplement MS rehabilitation. These promising results merit further long-term BCI studies to investigate the effect of MS progression on classification performance.
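Classification "at individual time-frequency points" means fitting and scoring one classifier per (frequency, time) cell. A binary shrinkage-regularised LDA evaluated that way might look like the sketch below. It is an illustration under stated assumptions, not the study's pipeline. The following are all hypothetical:

- the feature layout `(trials, channels, freqs, times)`, with channels as features at each cell;
- the fixed `shrinkage` value;
- the 5-fold cross-validation;
- all function names.

```python
import numpy as np


def fit_shrinkage_lda(x, y, shrinkage=0.1):
    """Binary LDA with covariance shrunk towards a scaled identity.

    x: (n_trials, n_features); y: labels in {0, 1}. Returns weights w and bias b.
    """
    x0, x1 = x[y == 0], x[y == 1]
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    centred = np.vstack([x0 - m0, x1 - m1])
    cov = centred.T @ centred / (len(x) - 2)  # pooled within-class covariance
    d = cov.shape[0]
    nu = np.trace(cov) / d                    # average variance, scale of the target
    cov_reg = (1 - shrinkage) * cov + shrinkage * nu * np.eye(d)
    w = np.linalg.solve(cov_reg, m1 - m0)
    b = -w @ (m0 + m1) / 2                    # threshold halfway between class means
    return w, b


def predict(w, b, x):
    return (x @ w + b > 0).astype(int)


def accuracy_map(data, y, n_folds=5, shrinkage=0.1, seed=0):
    """Cross-validated accuracy of a separate LDA at every time-frequency point.

    data: (trials, channels, freqs, times) features, e.g. band power (assumed).
    Returns an array of shape (freqs, times).
    """
    n_trials, _, n_freqs, n_times = data.shape
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_trials), n_folds)
    acc = np.zeros((n_freqs, n_times))
    for f in range(n_freqs):
        for t in range(n_times):
            x = data[:, :, f, t]  # channels are the features at this point
            correct = 0
            for test_idx in folds:
                train = np.ones(n_trials, dtype=bool)
                train[test_idx] = False
                w, b = fit_shrinkage_lda(x[train], y[train], shrinkage)
                correct += int(np.sum(predict(w, b, x[test_idx]) == y[test_idx]))
            acc[f, t] = correct / n_trials
    return acc
```

Shrinkage keeps the covariance estimate well-conditioned when there are few trials relative to channels, which is typical of EEG sessions.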
Submitted 28 November, 2024;
originally announced November 2024.
-
BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery
Authors:
Peter St. John,
Dejun Lin,
Polina Binder,
Malcolm Greaves,
Vega Shah,
John St. John,
Adrian Lange,
Patrick Hsu,
Rajesh Illango,
Arvind Ramanathan,
Anima Anandkumar,
David H Brookes,
Akosua Busia,
Abhishaike Mahajan,
Stephen Malina,
Neha Prasad,
Sam Sinai,
Lindsay Edwards,
Thomas Gaudelet,
Cristian Regep,
Martin Steinegger,
Burkhard Rost,
Alexander Brace,
Kyle Hippe,
Luca Naef
, et al. (63 additional authors not shown)
Abstract:
Artificial Intelligence models encoding biology and chemistry are opening new routes to high-throughput and high-quality in-silico drug development. However, their training increasingly relies on computational scale, with recent protein language models (pLM) training on hundreds of graphics processing units (GPUs). We introduce the BioNeMo Framework to facilitate the training of computational biology and chemistry AI models across hundreds of GPUs. Its modular design allows the integration of individual components, such as data loaders, into existing workflows and is open to community contributions. We detail technical features of the BioNeMo Framework through use cases such as pLM pre-training and fine-tuning. On 256 NVIDIA A100s, BioNeMo Framework trains a three billion parameter BERT-based pLM on over one trillion tokens in 4.2 days. The BioNeMo Framework is open-source and free for everyone to use.
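As a back-of-the-envelope check (our own arithmetic, not a figure reported in the paper), the quoted run implies the following average throughput. Because the token count is given as "over one trillion", these numbers are lower bounds.

```python
SECONDS_PER_DAY = 86_400


def throughput(tokens, days, gpus):
    """Average tokens per second, overall and per GPU, for a training run."""
    overall = tokens / (days * SECONDS_PER_DAY)
    return overall, overall / gpus


overall, per_gpu = throughput(tokens=1e12, days=4.2, gpus=256)
print(f"{overall:,.0f} tokens/s overall, {per_gpu:,.0f} tokens/s per GPU")
```

This works out to roughly 2.76 million tokens per second in aggregate, or about 10,800 tokens per second per A100.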
Submitted 9 June, 2025; v1 submitted 15 November, 2024;
originally announced November 2024.
-
The Canadian VirusSeq Data Portal & Duotang: open resources for SARS-CoV-2 viral sequences and genomic epidemiology
Authors:
Erin E. Gill,
Baofeng Jia,
Carmen Lia Murall,
Raphaël Poujol,
Muhammad Zohaib Anwar,
Nithu Sara John,
Justin Richardsson,
Ashley Hobb,
Abayomi S. Olabode,
Alexandru Lepsa,
Ana T. Duggan,
Andrea D. Tyler,
Arnaud N'Guessan,
Atul Kachru,
Brandon Chan,
Catherine Yoshida,
Christina K. Yung,
David Bujold,
Dusan Andric,
Edmund Su,
Emma J. Griffiths,
Gary Van Domselaar,
Gordon W. Jolly,
Heather K. E. Ward,
Henrich Feher
, et al. (45 additional authors not shown)
Abstract:
The COVID-19 pandemic led to a large global effort to sequence SARS-CoV-2 genomes from patient samples to track viral evolution and inform public health response. Millions of SARS-CoV-2 genome sequences have been deposited in global public repositories. The Canadian COVID-19 Genomics Network (CanCOGeN - VirusSeq), a consortium tasked with coordinating expanded sequencing of SARS-CoV-2 genomes across Canada early in the pandemic, created the Canadian VirusSeq Data Portal, with associated data pipelines and procedures, to support these efforts. The goal of VirusSeq was to allow open access to Canadian SARS-CoV-2 genomic sequences and enhanced, standardized contextual data that were unavailable in other repositories and that meet FAIR standards (Findable, Accessible, Interoperable and Reusable). The Portal data submission pipeline contains data quality checking procedures and appropriate acknowledgement of data generators that encourages collaboration. Here we also highlight Duotang, a web platform that presents genomic epidemiology and modeling analyses on circulating and emerging SARS-CoV-2 variants in Canada. Duotang presents dynamic changes in variant composition of SARS-CoV-2 in Canada and by province, estimates variant growth, and displays complementary interactive visualizations, with a text overview of the current situation. The VirusSeq Data Portal and Duotang resources, alongside additional analyses and resources computed from the Portal (COVID-MVP, CoVizu), are all open-source and freely available. Together, they provide an updated picture of SARS-CoV-2 evolution to spur scientific discussions, inform public discourse, and support communication with and within public health authorities. They also serve as a framework for other jurisdictions interested in open, collaborative sequence data sharing and analyses.
Submitted 7 May, 2024;
originally announced May 2024.
-
Towards Developing Brain-Computer Interfaces for People with Multiple Sclerosis
Authors:
John S. Russo,
Tim Mahoney,
Kirill Kokorin,
Ashley Reynolds,
Chin-Hsuan Sophie Lin,
Sam E. John,
David B. Grayden
Abstract:
Multiple Sclerosis (MS) is a severely disabling condition that leads to various neurological symptoms. A Brain-Computer Interface (BCI) may substitute some lost function; however, there is a lack of BCI research in people with MS. To progress this research area effectively and efficiently, we aimed to evaluate user needs and assess the feasibility and user-centric requirements of a BCI for people with MS. We conducted an online survey of 34 people with MS to qualitatively assess user preferences and establish the initial steps of user-centred design. The survey aimed to understand their interest and preferences in BCI and bionic applications. We demonstrated widespread interest in BCI applications across all stages of MS, with a preference for a non-invasive (n = 12) or minimally invasive (n = 15) BCI over carer assistance (n = 6). Qualitative assessment indicated that this preference was not influenced by level of independence. Additionally, strong interest was noted in bionic technology for sensory and autonomic functions. Considering the potential to enhance independence and quality of life for people living with MS, the results emphasise the importance of user-centred design for future advancement of BCIs that account for the unique pathological changes associated with MS.
Submitted 8 April, 2024; v1 submitted 7 April, 2024;
originally announced April 2024.
-
Plug & Play Directed Evolution of Proteins with Gradient-based Discrete MCMC
Authors:
Patrick Emami,
Aidan Perreault,
Jeffrey Law,
David Biagioni,
Peter C. St. John
Abstract:
A long-standing goal of machine-learning-based protein engineering is to accelerate the discovery of novel mutations that improve the function of a known protein. We introduce a sampling framework for evolving proteins in silico that supports mixing and matching a variety of unsupervised models, such as protein language models, and supervised models that predict protein function from sequence. By composing these models, we aim to improve our ability to evaluate unseen mutations and constrain search to regions of sequence space likely to contain functional proteins. Our framework achieves this without any model fine-tuning or re-training by constructing a product of experts distribution directly in discrete protein space. Instead of resorting to brute force search or random sampling, which is typical of classic directed evolution, we introduce a fast MCMC sampler that uses gradients to propose promising mutations. We conduct in silico directed evolution experiments on wide fitness landscapes and across a range of different pre-trained unsupervised models, including a 650M parameter protein language model. Our results demonstrate an ability to efficiently discover variants with high evolutionary likelihood as well as estimated activity multiple mutations away from a wild type protein, suggesting our sampler provides a practical and effective new paradigm for machine-learning-based protein engineering.
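Gradient-guided proposals over a product of experts can be illustrated with a toy sampler. This sketch follows the style of the Gibbs-with-Gradients sampler and may differ from the authors' method. It works in three steps:

1. A first-order Taylor expansion of the log-density scores every single-site mutation.
2. A mutation is drawn from the softmax of half those scores.
3. A Metropolis-Hastings test corrects for the approximation.

The two "experts" are hypothetical stand-ins for a protein language model and a supervised fitness predictor: site scores `w` and symmetric pairwise couplings `j`. Sequences are one-hot arrays of shape (sites, alphabet).

```python
import numpy as np


def log_prob(x, w, j):
    """Toy product-of-experts log-density (up to a constant) of a one-hot sequence x.

    Hypothetical expert 1: independent site scores w, shape (L, A).
    Hypothetical expert 2: symmetric pairwise couplings j, shape (L*A, L*A).
    """
    v = x.ravel()
    return float(np.sum(w * x) + 0.5 * v @ j @ v)


def grad_log_prob(x, w, j):
    # Gradient with respect to the one-hot entries, treated as continuous (j symmetric).
    return w + (j @ x.ravel()).reshape(x.shape)


def proposal_logits(x, w, j):
    """Half the first-order estimate of log p(x') - log p(x) for each single mutation."""
    g = grad_log_prob(x, w, j)
    current = np.sum(g * x, axis=1, keepdims=True)  # gradient at each site's current residue
    d = (g - current) / 2.0
    d[x.astype(bool)] = -np.inf  # "mutating" a site to its current residue is not a move
    return d.ravel()


def log_softmax(z):
    m = np.max(z)
    return z - m - np.log(np.sum(np.exp(z - m)))


def mutate(x, flat_index):
    site, aa = divmod(int(flat_index), x.shape[1])
    y = x.copy()
    y[site] = 0.0
    y[site, aa] = 1.0
    return y


def sample(x0, w, j, n_steps, rng):
    """Metropolis-Hastings with gradient-informed single-mutation proposals."""
    x = x0.copy()
    accepted = 0
    for _ in range(n_steps):
        fwd = log_softmax(proposal_logits(x, w, j))
        k = int(rng.choice(fwd.size, p=np.exp(fwd)))
        y = mutate(x, k)
        rev = log_softmax(proposal_logits(y, w, j))
        site = k // x.shape[1]
        k_rev = site * x.shape[1] + int(np.argmax(x[site]))  # the move that undoes k
        log_alpha = log_prob(y, w, j) - log_prob(x, w, j) + rev[k_rev] - fwd[k]
        if np.log(rng.random()) < log_alpha:
            x = y
            accepted += 1
    return x, accepted / n_steps
```

Because every expert enters only through `log_prob` and its gradient, neither expert is fine-tuned; composing them simply adds their log-densities.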
Submitted 6 April, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
Age structure, replicator equation, and the prisoner's dilemma
Authors:
Sona John,
Johannes Müller
Abstract:
We investigate the evolutionary dynamics of an age-structured population under weak frequency-dependent selection. It turns out that weak selection is affected in a non-trivial way by the life-history trait. We can disentangle the dynamics based on the appearance of different time scales. These time scales, which seem to form a universal structure in the interplay of weak selection and life-history traits, allow us to reduce the infinite-dimensional model to a one-dimensional modified replicator equation. The modified replicator equation is then used to investigate cooperation (the prisoner's dilemma) by means of adaptive dynamics. We identify conditions under which age structure is able to promote cooperation. Finally, we discuss the relevance of our findings.
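For reference, the standard (unmodified) replicator equation for the two-strategy prisoner's dilemma is $\dot{x} = x(1-x)(f_C - f_D)$. Here $x$ is the fraction of cooperators, $f_C = Rx + S(1-x)$ and $f_D = Tx + P(1-x)$.

The sketch below integrates this equation with forward Euler. The payoff values and step size are illustrative, and the paper's age-structured modified replicator equation is not reproduced. Because the ordering $T > R > P > S$ makes $f_C - f_D$ negative for every $x$, cooperation dies out in this baseline. That is the baseline against which conditions for age structure to promote cooperation can be read.

```python
def replicator_pd(x0, r=3.0, s=0.0, t=5.0, p=1.0, dt=0.01, steps=2000):
    """Forward-Euler integration of the standard prisoner's dilemma replicator equation.

    x0: initial fraction of cooperators. Payoffs (illustrative): reward r,
    sucker s, temptation t, punishment p, with t > r > p > s.
    """
    x = x0
    for _ in range(steps):
        f_c = r * x + s * (1 - x)  # expected payoff of a cooperator
        f_d = t * x + p * (1 - x)  # expected payoff of a defector
        x += dt * x * (1 - x) * (f_c - f_d)
    return x
```

With these payoffs $f_C - f_D = -(1 + x)$, so starting from 90% cooperators the fraction decays towards zero. The states $x = 0$ and $x = 1$ are fixed points.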
Submitted 19 December, 2022;
originally announced December 2022.
-
Quiescence generates moving average in a stochastic epidemiological model with one host and two parasites
Authors:
Usman Sanusi,
Sona John,
Johannes Mueller,
Aurélien Tellier
Abstract:
Mathematical modelling of epidemiological and coevolutionary dynamics is being widely used to improve management strategies for infectious diseases. Many diseases present some form of intra-host quiescent stage, also known as covert infection, while others exhibit dormant stages in the environment. As quiescent/dormant stages can be resistant to drug, antibiotic, and fungicide treatments, it is of practical relevance to study the influence of these two life-history traits on coevolutionary dynamics. We first develop a deterministic coevolutionary model with two parasite types infecting one host type and analytically study the stability of the dynamical system. We specifically derive a stability condition for a five-by-five system of equations with quiescence. Second, we develop a stochastic version of the model to study the influence of quiescence on the stochasticity of the system dynamics. We compute the steady-state distribution of the parasite types, which follows a multivariate normal distribution. Furthermore, we obtain numerical solutions for the covariance matrix of the system under symmetric and asymmetric quiescence rates between parasite types. When parasite strains are identical, quiescence increases the variance of the number of infected individuals at high transmission rates, and vice versa when the transmission rate is low. However, when there is competition between parasite strains with different quiescence rates, quiescence generates a moving-average behaviour that dampens stochasticity and decreases the variance of the number of infected hosts. The strain with the highest rate of entering quiescence determines the strength of the moving average and the magnitude of the reduction in stochasticity. Thus, it is worth investigating simple models of multi-strain parasites under quiescence/dormancy to improve disease management strategies.
Submitted 25 May, 2022;
originally announced May 2022.
-
The space of equidistant phylogenetic cactuses
Authors:
Katharina T. Huber,
Vincent Moulton,
Megan Owen,
Andreas Spillner,
Katherine St. John
Abstract:
We introduce and investigate the space of \emph{equidistant} $X$-\emph{cactuses}. These are rooted, arc weighted, phylogenetic networks with leaf set $X$, where $X$ is a finite set of species, and all leaves have the same distance from the root. The space contains as a subset the space of ultrametric trees on $X$ that was introduced by Gavryushkin and Drummond. We show that equidistant-cactus space is a CAT(0)-metric space which implies, for example, that there are unique geodesic paths between points. As a key step to proving this, we present a combinatorial result concerning \emph{ranked} rooted $X$-cactuses. In particular, we show that such networks can be encoded in terms of a pairwise compatibility condition arising from a poset of collections of pairs of subsets of $X$ that satisfy certain set-theoretic properties. As a corollary, we also obtain an encoding of ranked, rooted $X$-trees in terms of partitions of $X$, which provides an alternative proof that the space of ultrametric trees on $X$ is CAT(0). As with spaces of phylogenetic trees, we expect our results to provide a basis for, and new directions in, statistical analyses of collections of phylogenetic networks with arc lengths.
Submitted 11 November, 2021;
originally announced November 2021.
-
4Dia: A tool for automated 4D microscope image alignment
Authors:
Nimmy S. John,
Michelle A. Urman,
ChangHwan Lee
Abstract:
Recent advances in microscopy enable three-dimensional live imaging at high resolution. Long-term live imaging of a multicellular organism requires immobilization of the organism under stable physiological conditions. Despite proper immobilization, challenges remain in live imaging data analysis due to other intrinsic and extrinsic dynamics, which can result in misalignments of an image series over time. 4Dia, an ImageJ/Fiji macro script, aligns 3D timelapse images through Z-stacks as well as over time using any user-selected channel. 4Dia can be used for essentially any tissue sample, with no limit on the size of the Z-stack or the number of timepoints.
Submitted 6 November, 2021;
originally announced November 2021.
-
Maximum Covering Subtrees for Phylogenetic Networks
Authors:
Nathan Davidov,
Amanda Hernandez,
Justin Jian,
Patrick McKenna,
K. A. Medlin,
Roadra Mojumder,
Megan Owen,
Andrew Quijano,
Amanda Rodriguez,
Katherine St. John,
Katherine Thai,
Meliza Uraga
Abstract:
Tree-based phylogenetic networks, which may be roughly defined as leaf-labeled networks built by adding arcs only between the original tree edges, have elegant properties for modeling evolutionary histories. We answer an open question of Francis, Semple, and Steel about the complexity of determining how far a phylogenetic network is from being tree-based, including non-binary phylogenetic networks. We show that a phylogenetic tree covering the maximum number of nodes in a phylogenetic network can be found in polynomial time via an encoding into a minimum-cost maximum-flow problem.
Submitted 24 November, 2020; v1 submitted 25 September, 2020;
originally announced September 2020.
-
On the maximum agreement subtree conjecture for balanced trees
Authors:
Magnus Bordewich,
Simone Linz,
Megan Owen,
Katherine St. John,
Charles Semple,
Kristina Wicke
Abstract:
We give a counterexample to the conjecture of Martin and Thatte that two balanced rooted binary leaf-labelled trees on $n$ leaves have a maximum agreement subtree (MAST) of size at least $n^{\frac{1}{2}}$. In particular, we show that for any $c>0$, there exist two balanced rooted binary leaf-labelled trees on $n$ leaves such that any MAST for these two trees has size less than $c n^{\frac{1}{2}}$. We also improve the lower bound of the size of such a MAST to $n^{\frac{1}{6}}$.
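For experimenting with small cases of the bound, the MAST size of two rooted binary trees can be computed with the textbook dynamic program over pairs of subtrees. This is a generic sketch, not the construction used in the paper. Trees are encoded as nested 2-tuples whose leaves are distinct, hashable, non-tuple labels; that encoding is our own choice.

```python
from functools import lru_cache


def mast_size(t1, t2):
    """Size of a maximum agreement subtree of two rooted binary leaf-labelled trees."""

    def leaves(t):
        return frozenset([t]) if not isinstance(t, tuple) else leaves(t[0]) | leaves(t[1])

    @lru_cache(maxsize=None)
    def m(u, v):
        # A single leaf agrees with a subtree exactly when that subtree contains it.
        if not isinstance(u, tuple):
            return 1 if u in leaves(v) else 0
        if not isinstance(v, tuple):
            return 1 if v in leaves(u) else 0
        (u1, u2), (v1, v2) = u, v
        return max(
            m(u1, v1) + m(u2, v2),  # match children of u with children of v
            m(u1, v2) + m(u2, v1),
            m(u, v1), m(u, v2),     # or restrict to one side of v
            m(u1, v), m(u2, v),     # or restrict to one side of u
        )

    return m(t1, t2)
```

For example, `((a,b),(c,d))` and `((a,c),(b,d))` disagree on every three-leaf subset, so their MAST has only two leaves.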
Submitted 15 May, 2020;
originally announced May 2020.
-
Properties for the Frechet Mean in Billera-Holmes-Vogtmann Treespace
Authors:
Maria Anaya,
Olga Anipchenko-Ulaj,
Aisha Ashfaq,
Joyce Chiu,
Mahedi Kaiser,
Max Shoji Ohsawa,
Megan Owen,
Ella Pavlechko,
Katherine St. John,
Shivam Suleria,
Keith Thompson,
Corrine Yap
Abstract:
The Billera-Holmes-Vogtmann (BHV) space of weighted trees can be embedded in Euclidean space, but the extrinsic Euclidean mean often lies outside of treespace. Sturm showed that the intrinsic Frechet mean exists and is unique in treespace. This Frechet mean can be approximated with an iterative algorithm, but bounds on the convergence of the algorithm are not known, and there is no other known polynomial-time algorithm for computing the Frechet mean, nor even the edges present in the mean. We give the first necessary and sufficient conditions for an edge to be in the Frechet mean. The conditions are in the form of inequalities on the weights of the edges. These conditions provide a pre-processing step for finding the treespace orthant containing the Frechet mean. This work generalizes to orthant spaces.
Submitted 12 July, 2019;
originally announced July 2019.
-
Efficient estimation of the maximum metabolic productivity of batch systems
Authors:
Peter C. St. John,
Michael F. Crowley,
Yannick J. Bomble
Abstract:
Production of chemicals from engineered organisms in a batch culture involves an inherent trade-off between productivity, yield, and titer. Existing strategies for strain design typically focus on designing mutations that achieve the highest yield possible while maintaining growth viability. While these methods are computationally tractable, an optimum productivity could be achieved by a dynamic strategy in which the intracellular division of resources is permitted to change with time. New methods for the design and implementation of dynamic microbial processes, both computational and experimental, have therefore been explored to maximize productivity. However, solving for the optimal metabolic behavior under the assumption that all fluxes in the cell are free to vary is a challenging numerical task. This work presents an efficient method for the calculation of a maximum theoretical productivity of a batch culture system using a dynamic optimization framework. This metric is analogous to the maximum theoretical yield, a measure that is well established in the metabolic engineering literature and whose use helps guide strain and pathway selection. The proposed method follows traditional assumptions of dynamic flux balance analysis: (1) that internal metabolite fluxes are governed by a pseudo-steady state, and (2) that external metabolite fluxes are dynamically bounded. The optimization is achieved via collocation on finite elements, and accounts explicitly for an arbitrary number of flux changes. The method can be further extended to explicitly solve for the trade-off curve between maximum productivity and yield. We demonstrate the method on succinate production in two common microbial hosts, Escherichia coli and Actinobacillus succinogenes, revealing that nearly optimal yields and productivities can be achieved with only two discrete flux stages.
Submitted 4 October, 2016;
originally announced October 2016.
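As a toy illustration of the productivity/yield trade-off behind a two-stage strategy, the sketch below assumes a hypothetical batch. Biomass first grows exponentially on glucose. After a switch time `ts`, it stops growing and converts all remaining glucose into product at a fixed specific rate. This is not the collocation-based dynamic optimization described above. All parameter values and the helper name `two_stage` are illustrative assumptions, not taken from the paper.

```python
import math

# Hypothetical parameters for a toy batch (illustrative only, not from the paper).
S0 = 100.0   # initial glucose, g/L
X0 = 0.1     # initial biomass, g/L
MU = 0.5     # growth rate during stage 1, 1/h
Y_XS = 0.5   # biomass yield on glucose during stage 1, g/g
Y_PS = 0.8   # product yield on glucose during stage 2, g/g
Q_P = 0.5    # specific production rate during stage 2, g product / (g biomass * h)


def two_stage(ts):
    """Titer, yield and productivity when growth switches to production at ts (h).

    Returns None if the glucose is exhausted before the switch.
    """
    x1 = X0 * math.exp(MU * ts)        # biomass at the switch (growth stops afterwards)
    s1 = S0 - (x1 - X0) / Y_XS         # glucose left for the production stage
    if s1 <= 0:
        return None
    titer = Y_PS * s1                  # all remaining glucose is converted to product
    t2 = titer / (Q_P * x1)            # duration of the production stage
    return titer, titer / S0, titer / (ts + t2)


# Scan switch times from 0 to 19.9 h in 0.1 h steps, keeping the feasible ones.
feasible = [(i / 10, two_stage(i / 10)) for i in range(200)]
feasible = [(ts, r) for ts, r in feasible if r is not None]
best_ts, (titer, yld, prod) = max(feasible, key=lambda item: item[1][2])
print(f"yield-optimal switch: ts = 0 h, yield = {feasible[0][1][1]:.2f}, "
      f"productivity = {feasible[0][1][2]:.3f} g/L/h")
print(f"productivity-optimal switch: ts = {best_ts:.1f} h, yield = {yld:.2f}, "
      f"productivity = {prod:.2f} g/L/h")
```

Switching immediately gives the highest yield, but it leaves little biomass, so production is slow. Switching late builds up catalyst at the expense of substrate. In this toy model the productivity optimum therefore lies between the two extremes.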
-
Deterministic evolution of an asexual population under the action of beneficial and deleterious mutations on additive fitness landscapes
Authors:
Kavita Jain,
Sona John
Abstract:
We study a continuous time model for the frequency distribution of an infinitely large asexual population in which both beneficial and deleterious mutations occur and the fitness is additive. When beneficial mutations are ignored, the exact solution for the frequency distribution is known to be a Poisson distribution. Here we include beneficial mutations and obtain exact expressions for the frequency distribution at all times using an eigenfunction expansion method. We find that the stationary distribution is non-Poissonian and related to the Bessel function of the first kind. We also provide suitable approximations for the stationary distribution and the time to relax to the steady state. Our exact results, especially at mutation-selection equilibrium, can be useful in developing semi-deterministic approaches to understand stochastic evolution.
Submitted 31 August, 2016; v1 submitted 13 April, 2016;
originally announced April 2016.
-
On Determining if Tree-based Networks Contain Fixed Trees
Authors:
Maria Anaya,
Olga Anipchenko-Ulaj,
Aisha Ashfaq,
Joyce Chiu,
Mahedi Kaiser,
Max Shoji Ohsawa,
Megan Owen,
Ella Pavlechko,
Katherine St. John,
Shivam Suleria,
Keith Thompson,
Corrine Yap
Abstract:
We address an open question of Francis and Steel about phylogenetic networks and trees. They give a polynomial-time algorithm to decide whether a phylogenetic network, N, is tree-based and pose the problem: given a fixed tree T and network N, is N based on T? We show, by reduction from 3-Dimensional Matching (3DM), that this problem is NP-hard, and further that it is fixed-parameter tractable.
Submitted 8 February, 2016;
originally announced February 2016.
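To make the notion of a tree-based network concrete, the sketch below tests it by brute force. It relies on the characterization that a binary network is tree-based exactly when it has a rooted spanning tree whose leaves are precisely the leaves of the network. It keeps one incoming arc per reticulation and checks that no internal vertex loses all of its outgoing arcs. This exponential enumeration is only an illustration. It is neither Francis and Steel's polynomial-time algorithm nor an answer to the NP-hard question of whether N is based on a *given* T. The function name and the adjacency-dict input format are hypothetical choices.

```python
from itertools import product


def is_tree_based(children):
    """Brute-force test of whether a rooted binary network is tree-based.

    `children` maps each vertex to the list of its children (leaves may be
    omitted or map to []).
    """
    parents = {}
    for u, kids in children.items():
        parents.setdefault(u, [])
        for w in kids:
            parents.setdefault(w, []).append(u)
    kids = {v: list(children.get(v, [])) for v in parents}
    leaves = {v for v in parents if not kids[v]}
    retics = [v for v in parents if len(parents[v]) >= 2]
    # Each reticulation keeps exactly one incoming arc; all other arcs are kept,
    # so every non-root vertex has one parent and the kept arcs form a spanning tree.
    for choice in product(*(parents[r] for r in retics)):
        dropped = {(p, r) for r, keep in zip(retics, choice)
                   for p in parents[r] if p != keep}
        # The spanning tree must not create new leaves: every internal vertex
        # of the network needs at least one kept outgoing arc.
        if all(any((v, w) not in dropped for w in kids[v])
               for v in parents if v not in leaves):
            return True
    return False


# A reticulation h below two tree vertices: tree-based.
tree_based = {"r": ["a", "b"], "a": ["h", "x1"], "b": ["h", "x2"], "h": ["x3"]}
# Reticulations h1 and h3 both have the reticulation h2 as their only child;
# h2 can keep only one of them as parent, so not tree-based.
not_tree_based = {"r": ["a", "b"], "a": ["h1", "h3"], "b": ["h1", "h3"],
                  "h1": ["h2"], "h3": ["h2"], "h2": ["x"]}
print(is_tree_based(tree_based), is_tree_based(not_tree_based))
```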
-
Exploiting the adaptation dynamics to predict the distribution of beneficial fitness effects
Authors:
Sona John,
Sarada Seetharaman
Abstract:
Adaptation of asexual populations is driven by beneficial mutations, and therefore the dynamics of this process depend, among other factors, on the distribution of beneficial fitness effects. It is known that on uncorrelated fitness landscapes, this distribution can only be of three types: truncated, exponential and power law. We performed extensive stochastic simulations to study the adaptation dynamics on rugged fitness landscapes, and identified two quantities that can be used to distinguish the underlying distribution of beneficial fitness effects. The first quantity studied here is the fitness difference between successive mutations that spread in the population, which is found to decrease in the case of truncated distributions, remain nearly constant for exponentially decaying distributions and increase when the fitness distribution decays as a power law. The second quantity of interest, namely the rate of change of fitness with time, also shows quantitatively different behaviour for different beneficial fitness distributions. The patterns displayed by the two aforementioned quantities are found to hold for both low and high mutation rates. We discuss how these patterns can be exploited to determine the distribution of beneficial fitness effects in microbial experiments.
Submitted 25 December, 2015; v1 submitted 26 February, 2015;
originally announced March 2015.
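The trend in the first quantity can be reproduced in a much simpler setting than the population simulations described above. The sketch below assumes a random adaptive walk: at each step the fitness jumps to a new value drawn from the fitness distribution, conditioned to exceed the current fitness. It then averages the size of the k-th jump over many walks. The three concrete distributions (uniform, exponential with rate 1, Pareto with exponent 3) and the helper name `mean_jumps` are illustrative assumptions, one per class named in the abstract.

```python
import random


def mean_jumps(draw_initial, draw_above, steps=5, walks=4000, seed=1):
    """Mean size of the 1st, 2nd, ... fitness increment over many random adaptive walks.

    Simplifying assumption: at each step the walk moves to a new fitness drawn
    from the same distribution, conditioned to exceed the current fitness.
    """
    rng = random.Random(seed)
    totals = [0.0] * steps
    for _ in range(walks):
        f = draw_initial(rng)
        for k in range(steps):
            f_new = draw_above(f, rng)
            totals[k] += f_new - f
            f = f_new
    return [t / walks for t in totals]


# One hypothetical example for each class of fitness distribution.
DISTRIBUTIONS = {
    # Truncated: uniform on [0, 1]; conditioned on > f it is uniform on [f, 1].
    "truncated": (lambda r: r.random(),
                  lambda f, r: f + (1.0 - f) * r.random()),
    # Exponential with rate 1; the excess over f is again Exp(1) (memoryless).
    "exponential": (lambda r: r.expovariate(1.0),
                    lambda f, r: f + r.expovariate(1.0)),
    # Pareto with alpha = 3 on [1, inf); conditioned on > f it is f times a Pareto draw.
    "power law": (lambda r: r.paretovariate(3.0),
                  lambda f, r: f * r.paretovariate(3.0)),
}

results = {name: mean_jumps(init, above) for name, (init, above) in DISTRIBUTIONS.items()}
for name, jumps in results.items():
    print(f"{name:12s}", [round(j, 3) for j in jumps])
```

In this simplified walk, the mean jump shrinks for the truncated distribution and stays near 1 for the exponential one, because of memorylessness. For the power law it grows.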
-
Bounds on the Expected Size of the Maximum Agreement Subtree
Authors:
Daniel Irving Bernstein,
Lam Si Tung Ho,
Colby Long,
Mike Steel,
Katherine St. John,
Seth Sullivant
Abstract:
We prove polynomial upper and lower bounds on the expected size of the maximum agreement subtree of two random binary phylogenetic trees under both the uniform distribution and Yule-Harding distribution. This positively answers a question posed in earlier work. Determining tight upper and lower bounds remains an open problem.
Submitted 31 August, 2015; v1 submitted 26 November, 2014;
originally announced November 2014.
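The quantity being bounded can also be explored empirically. The sketch below assumes rooted binary trees written as nested 2-tuples of leaf labels. It computes MAST size with the classical dynamic-programming recursion for rooted binary trees. It samples trees under the Yule-Harding model, using the fact that the left subtree size is uniform on {1, ..., n-1} at every split. The helper names are hypothetical. A Monte Carlo average like this is only an empirical estimate and does not substitute for the proved bounds.

```python
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def leaf_set(t):
    """Leaf labels of a rooted binary tree written as nested 2-tuples of strings."""
    if isinstance(t, str):
        return frozenset([t])
    return leaf_set(t[0]) | leaf_set(t[1])


def mast(t1, t2):
    """Size of a maximum agreement subtree of two rooted binary trees on the same leaves."""
    @lru_cache(maxsize=None)
    def m(u, v):
        if isinstance(u, str):
            return 1 if u in leaf_set(v) else 0
        if isinstance(v, str):
            return 1 if v in leaf_set(u) else 0
        (u1, u2), (v1, v2) = u, v
        return max(m(u1, v1) + m(u2, v2),   # pair the children one way
                   m(u1, v2) + m(u2, v1),   # or the other way
                   m(u, v1), m(u, v2),      # or the subtree lies below one child of v
                   m(u1, v), m(u2, v))      # or below one child of u
    return m(t1, t2)


def yule_tree(labels, rng):
    """Random tree under the Yule-Harding model: the left subtree size is uniform
    on {1, ..., n-1} at every split, and labels are assigned uniformly at random."""
    shuffled = list(labels)
    rng.shuffle(shuffled)

    def build(ls):
        if len(ls) == 1:
            return ls[0]
        k = rng.randint(1, len(ls) - 1)
        return (build(ls[:k]), build(ls[k:]))

    return build(shuffled)


rng = random.Random(0)
labels = [f"x{i}" for i in range(20)]
sizes = [mast(yule_tree(labels, rng), yule_tree(labels, rng)) for _ in range(200)]
print("empirical mean MAST size for n = 20:", sum(sizes) / len(sizes))
```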
-
A Coupled Stochastic Model Explains Differences in Circadian Behavior of Cry1 and Cry2 Knockouts
Authors:
John H. Abel,
Lukas A. Widmer,
Peter C. St. John,
Jörg Stelling,
Francis J. Doyle III
Abstract:
In the mammalian suprachiasmatic nucleus (SCN), a population of noisy cell-autonomous oscillators synchronizes to generate robust circadian rhythms at the organism level. Within these cells, two isoforms of Cryptochrome, Cry1 and Cry2, participate in a negative feedback loop driving circadian rhythmicity. Previous work has shown that single, dissociated SCN neurons respond differently to Cry1 and Cry2 knockouts: Cry1 knockouts are arrhythmic while Cry2 knockouts display more regular rhythms. These differences have led to speculation that CRY1 and CRY2 may play different functional roles in the oscillator. To address this proposition, we have developed a new coupled, stochastic model focused on the Period (Per) and Cry feedback loop that incorporates intercellular coupling via vasoactive intestinal peptide (VIP). We demonstrate that, owing to the stochastic nature of molecular oscillations, single-cell Cry1 knockout oscillations display partially rhythmic behavior and cannot be classified as simply rhythmic or arrhythmic. Our model demonstrates that intrinsic molecular noise and differences in relative abundance, rather than differing functions, are sufficient to explain the range of rhythmicity encountered in Cry knockouts in the SCN. Our results further highlight the essential role of stochastic behavior in understanding and accurately modeling the circadian network and its response to perturbation.
Submitted 22 February, 2015; v1 submitted 17 November, 2014;
originally announced November 2014.
-
Effect of drift, selection and recombination on the equilibrium frequency of deleterious mutations
Authors:
Sona John,
Kavita Jain
Abstract:
We study the stationary state of a population evolving under the action of random genetic drift, selection and recombination in which both deleterious and reverse beneficial mutations can occur. We find that the equilibrium fraction of deleterious mutations decreases as the population size is increased. We calculate exactly the steady-state frequency in a nonrecombining population when the population size is infinite and for a neutral finite population, and obtain bounds on the fraction of deleterious mutations. We also find that for small and very large populations, the number of deleterious mutations depends weakly on recombination, but for moderately large populations, recombination alleviates the effect of deleterious mutations. An analytical argument shows that recombination appreciably reduces the number of disadvantageous mutations when beneficial mutations are rare, as is the case in adapting microbial populations, whereas it has a moderate effect on codon bias, where the mutation rates between the preferred and unpreferred codons are comparable.
Submitted 24 February, 2015; v1 submitted 5 August, 2013;
originally announced August 2013.
-
Walks on SPR Neighborhoods
Authors:
Alan Joseph J. Caceres,
Juan Castillo,
Jinnie Lee,
Katherine St. John
Abstract:
A nearest-neighbor-interchange (NNI) walk is a sequence of unrooted phylogenetic trees, T_0, T_1, T_2, ..., where each consecutive pair of trees differs by a single NNI move. We give tight bounds on the length of the shortest NNI walks that visit all trees in a subtree-prune-and-regraft (SPR) neighborhood of a given tree. For any unrooted, binary tree, T, on n leaves, the shortest walk takes Θ(n^2) more steps than the number of trees in the SPR neighborhood. This answers Bryant's Second Combinatorial Conjecture from the Phylogenetics Challenges List, the Isaac Newton Institute, 2011, and the Penny Ante Problem List, 2009.
Submitted 5 October, 2011;
originally announced October 2011.
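For scale, the size of the SPR neighborhood itself is known. The abstract does not state it, so the following assumes the standard count due to Allen and Steel (2001). An unrooted binary tree with n leaves has 2(n-3)(2n-7) SPR neighbors, regardless of its shape, and 2(n-3) NNI neighbors. Writing L_min(T) (our notation) for the length of the shortest NNI walk through the SPR neighborhood, the Θ(n^2) overhead combines with this count as follows:

```latex
|\mathrm{SPR}(T)| = 2(n-3)(2n-7) = \Theta(n^2), \qquad |\mathrm{NNI}(T)| = 2(n-3)

L_{\min}(T) = |\mathrm{SPR}(T)| + \Theta(n^2) = \Theta(n^2)

n = 10:\quad |\mathrm{SPR}(T)| = 2 \cdot 7 \cdot 13 = 182, \qquad |\mathrm{NNI}(T)| = 2 \cdot 7 = 14
```

In other words, the extra cost of the walk is of the same order as the size of the neighborhood being visited.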
-
The Complexity of Finding Multiple Solutions to Betweenness and Quartet Compatibility
Authors:
Maria Luisa Bonet,
Simone Linz,
Katherine St. John
Abstract:
We show that two important problems that have applications in computational biology are ASP-complete, which implies that, given a solution to a problem, it is NP-complete to decide if another solution exists. We show first that a variation of Betweenness, which is the underlying problem of questions related to radiation hybrid mapping, is ASP-complete. Subsequently, we use that result to show that Quartet Compatibility, a fundamental problem in phylogenetics that asks whether a set of quartets can be represented by a parent tree, is also ASP-complete. The latter result shows that Steel's Quartet Challenge, which asks whether a solution to Quartet Compatibility is unique, is coNP-complete.
Submitted 28 March, 2011; v1 submitted 11 January, 2011;
originally announced January 2011.
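To make the "another solution" question concrete, the sketch below enumerates all solutions of a small, plain Betweenness instance by brute force over all n! orders. A plain Betweenness instance is a set of triples (a, b, c), each requiring b to lie between a and c. The sketch only illustrates the problem statement. It is not the variation of Betweenness used in the paper, and the function names are hypothetical.

```python
from itertools import permutations


def betweenness_solutions(elements, triples):
    """All total orders of `elements` in which, for every triple (a, b, c),
    b lies strictly between a and c. Exhaustive search over n! orders."""
    solutions = []
    for order in permutations(elements):
        pos = {e: i for i, e in enumerate(order)}
        if all(min(pos[a], pos[c]) < pos[b] < max(pos[a], pos[c])
               for a, b, c in triples):
            solutions.append(order)
    return solutions


def another_solution(elements, triples, known):
    """Return a solution different from `known`, or None if there is none."""
    for sol in betweenness_solutions(elements, triples):
        if sol != tuple(known):
            return sol
    return None


triples = [("a", "b", "c"), ("b", "c", "d")]
sols = betweenness_solutions("abcd", triples)
print(sols)
print(another_solution("abcd", triples, sols[0]))
```

Reversing a valid order always yields another valid order. So in this plain form, every solvable instance with at least two elements has more than one solution.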