Search | arXiv e-print repository

Ordered Leaf Attachment (OLA) Vectors can Identify Reticulation Events even in Multifurcated Trees

Authors: Alexey Markin, Tavis K. Anderson

Abstract: Recently, a new vector encoding, Ordered Leaf Attachment (OLA), was introduced that represents $n$-leaf phylogenetic trees as $n-1$ length integer vectors by recording the placement location of each leaf. Both encoding and decoding of trees run in linear time and depend on a fixed ordering of the leaves. Here, we investigate the connection between OLA vectors and the maximum acyclic agreement forest (MAAF) problem. A MAAF represents an optimal breakdown of $k$ trees into reticulation-free subtrees, with the roots of these subtrees representing reticulation events. We introduce a corrected OLA distance index over OLA vectors of $k$ trees, which is easily computable in linear time. We prove that the corrected OLA distance corresponds to the size of a MAAF, given an optimal leaf ordering that minimizes that distance. Additionally, a MAAF can be easily reconstructed from optimal OLA vectors. We expand these results to multifurcated trees: we introduce an $O(kn \cdot m\log m)$ algorithm that optimally resolves a set of multifurcated trees given a leaf-ordering, where $m$ is the size of a largest multifurcation, and show that trees resolved via this algorithm also minimize the size of a MAAF. These results suggest a new approach to fast computation of phylogenetic networks and identification of reticulation events via random permutations of leaves. Additionally, in the case of microbial evolution, a natural ordering of leaves is often given by the sample collection date, which means that under mild assumptions, reticulation events can be identified in polynomial time on such datasets.

Submitted 19 September, 2025; originally announced September 2025.

Comments: 18 pages, 4 figures

MSC Class: 05C05; 68R10; 92B10 ACM Class: F.2.2; G.2.1; G.2.2

arXiv:2505.02750 [pdf, ps, other]

A CRISP approach to QSP: XAI enabling fit-for-purpose models

Authors: Noah DeTal, Christian N. K. Anderson, Mark K. Transtrum

Abstract: Quantitative Systems Pharmacology (QSP) promises to accelerate drug development, enable personalized medicine, and improve the predictability of clinical outcomes. Realizing this potential requires effectively managing the complexity of mathematical models representing biological systems. Here, we present and validate a novel QSP workflow--CRISP (Contextualized Reduction for Identifiability and Scientific Precision)--that addresses a central challenge in QSP: the problem of complexity and over-parameterization, in which models contain irrelevant parameters that obscure interpretation and hinder predictive reliability. The CRISP workflow begins with a literature-derived model, constructed to be comprehensive and unbiased by integrating prior mechanistic insights. At the core of the workflow is the Manifold Boundary Approximation Method (MBAM), a reduction technique that simplifies models while preserving mechanistic structure and predictive fidelity. By applying MBAM in a context-specific manner, CRISP links parsimonious models directly to predictions of interest, clarifying causal structure and enhancing interpretability. The resulting models are computationally efficient and well-suited to key QSP tasks, including virtual population generation, experimental design, toxicology, and target discovery. We demonstrate the utility of CRISP on case studies involving the coagulation cascade and SHIV infection, and identify promising directions for improving the efficacy of bNAb therapies for HIV. Together, these results establish CRISP as a general-purpose QSP workflow for turning complex mechanistic models into tools for precise scientific reasoning to guide pharmacological and regulatory decision-making.

Submitted 6 June, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

arXiv:2503.11900 [pdf, other]

Heterogeneous graph neural networks for species distribution modeling

Authors: Lauren Harrell, Christine Kaeser-Chen, Burcu Karagol Ayan, Keith Anderson, Michelangelo Conserva, Elise Kleeman, Maxim Neumann, Matt Overlan, Melissa Chapman, Drew Purves

Abstract: Species distribution models (SDMs) are necessary for measuring and predicting occurrences and habitat suitability of species and their relationship with environmental factors. We introduce a novel presence-only SDM with graph neural networks (GNN). In our model, species and locations are treated as two distinct node sets, and the learning task is predicting detection records as the edges that connect locations to species. Using GNN for SDM allows us to model fine-grained interactions between species and the environment. We evaluate the potential of this methodology on the six-region dataset compiled by National Center for Ecological Analysis and Synthesis (NCEAS) for benchmarking SDMs. For each of the regions, the heterogeneous GNN model is comparable to or outperforms previously-benchmarked single-species SDMs as well as a feed-forward neural network baseline model.

Submitted 14 May, 2025; v1 submitted 14 March, 2025; originally announced March 2025.

Comments: 13 pages, 3 figures

MSC Class: 92B20 (Primary) 68T07; 92D40 (Secondary) ACM Class: I.2.1; J.3

Journal ref: ICLR 2025 Workshop on Tackling Climate Change with Machine Learning

arXiv:2403.13098 [pdf, other]

Interspecific dispersal constraints suppress pattern formation in metacommunities

Authors: Patrick Lawton, Ashkaan K. Fahimipour, Kurt E. Anderson

Abstract: Decisions to disperse from a habitat stand out among organismal behaviors as pivotal drivers of ecosystem dynamics across scales. Encounters with other species are an important component of adaptive decision-making in dispersal, resulting in widespread behaviors like tracking resources or avoiding consumers in space. Despite this, metacommunity models often treat dispersal as a function of intraspecific density alone. We show, focusing initially on three-species network motifs, that interspecific dispersal rules generally drive a transition in metacommunities from homogeneous steady states to self-organized heterogeneous spatial patterns. However, when ecologically realistic constraints reflecting adaptive behaviors are imposed -- prey tracking and predator avoidance -- a pronounced homogenizing effect emerges where spatial pattern formation is suppressed. We demonstrate this effect for each motif by computing master stability functions that separate the contributions of local and spatial interactions to pattern formation. We extend this result to species rich food webs using a random matrix approach, where we find that eventually webs become large enough to override the homogenizing effect of adaptive dispersal behaviors, leading once again to predominately pattern forming dynamics. Our results emphasize the critical role of interspecific dispersal rules in shaping spatial patterns across landscapes, highlighting the need to incorporate adaptive behavioral constraints in efforts to link local species interactions and metacommunity structure.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:1911.06105 [pdf, other]

PharML.Bind: Pharmacologic Machine Learning for Protein-Ligand Interactions

Authors: Aaron D. Vose, Jacob Balma, Damon Farnsworth, Kaylie Anderson, Yuri K. Peterson

Abstract: Is it feasible to create an analysis paradigm that can analyze and then accurately and quickly predict known drugs from experimental data? PharML.Bind is a machine learning toolkit which is able to accomplish this feat. Utilizing deep neural networks and big data, PharML.Bind correlates experimentally-derived drug affinities and protein-ligand X-ray structures to create novel predictions. The utility of PharML.Bind is in its application as a rapid, accurate, and robust prediction platform for discovery and personalized medicine. This paper demonstrates that graph neural networks (GNNs) can be trained to screen hundreds of thousands of compounds against thousands of targets in minutes, a vastly shorter time than previous approaches. This manuscript presents results from training and testing using the entirety of BindingDB after cleaning; this includes a test set with 19,708 X-ray structures and 247,633 drugs, leading to 2,708,151 unique protein-ligand pairings. PharML.Bind achieves a prodigious 98.3% accuracy on this test set in under 25 minutes. PharML.Bind is premised on the following key principles: 1) speed and a high enrichment factor per unit compute time, provided by high-quality training data combined with a novel GNN architecture and use of high-performance computing resources, 2) the ability to generalize to proteins and drugs outside of the training set, including those with unknown active sites, through the use of an active-site-agnostic GNN mapping, and 3) the ability to be easily integrated as a component of increasingly-complex prediction and analysis pipelines. PharML.Bind represents a timely and practical approach to leverage the power of machine learning to efficiently analyze and predict drug action on any practical scale and will provide utility in a variety of discovery and medical applications.

Submitted 23 October, 2019; originally announced November 2019.

arXiv:1801.08116 [pdf, other]

Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents

Authors: Joel Z. Leibo, Cyprien de Masson d'Autume, Daniel Zoran, David Amos, Charles Beattie, Keith Anderson, Antonio García Castañeda, Manuel Sanchez, Simon Green, Audrunas Gruslys, Shane Legg, Demis Hassabis, Matthew M. Botvinick

Abstract: Psychlab is a simulated psychology laboratory inside the first-person 3D game world of DeepMind Lab (Beattie et al. 2016). Psychlab enables implementations of classical laboratory psychological experiments so that they work with both human and artificial agents. Psychlab has a simple and flexible API that enables users to easily create their own tasks. As examples, we are releasing Psychlab implementations of several classical experimental paradigms including visual search, change detection, random dot motion discrimination, and multiple object tracking. We also contribute a study of the visual psychophysics of a specific state-of-the-art deep reinforcement learning agent: UNREAL (Jaderberg et al. 2016). This study leads to the surprising conclusion that UNREAL learns more quickly about larger target stimuli than it does about smaller stimuli. In turn, this insight motivates a specific improvement in the form of a simple model of foveal vision that turns out to significantly boost UNREAL's performance, both on Psychlab tasks, and on standard DeepMind Lab tasks. By open-sourcing Psychlab we hope to facilitate a range of future such studies that simultaneously advance deep reinforcement learning and improve its links with cognitive science.

Submitted 4 February, 2018; v1 submitted 24 January, 2018; originally announced January 2018.

Comments: 28 pages, 11 figures

Showing 1–6 of 6 results for author: Anderson, K