-
BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery
Authors:
Peter St. John,
Dejun Lin,
Polina Binder,
Malcolm Greaves,
Vega Shah,
John St. John,
Adrian Lange,
Patrick Hsu,
Rajesh Illango,
Arvind Ramanathan,
Anima Anandkumar,
David H Brookes,
Akosua Busia,
Abhishaike Mahajan,
Stephen Malina,
Neha Prasad,
Sam Sinai,
Lindsay Edwards,
Thomas Gaudelet,
Cristian Regep,
Martin Steinegger,
Burkhard Rost,
Alexander Brace,
Kyle Hippe,
Luca Naef, et al. (63 additional authors not shown)
Abstract:
Artificial intelligence models encoding biology and chemistry are opening new routes to high-throughput and high-quality in-silico drug development. However, their training increasingly relies on computational scale, with recent protein language models (pLMs) training on hundreds of graphics processing units (GPUs). We introduce the BioNeMo Framework to facilitate the training of computational biology and chemistry AI models across hundreds of GPUs. Its modular design allows the integration of individual components, such as data loaders, into existing workflows and is open to community contributions. We detail technical features of the BioNeMo Framework through use cases such as pLM pre-training and fine-tuning. On 256 NVIDIA A100s, the BioNeMo Framework trains a three-billion-parameter BERT-based pLM on over one trillion tokens in 4.2 days. The BioNeMo Framework is open-source and free for everyone to use.
Submitted 15 November, 2024;
originally announced November 2024.
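The throughput implied by the figures in the BioNeMo abstract can be worked out directly. The sketch below uses only the three numbers quoted there (256 GPUs, one trillion tokens, 4.2 days); since the abstract says "over one trillion tokens", the results are lower bounds, and the helper name `throughput` is purely illustrative.

```python
# Figures quoted in the abstract; "over one trillion" is treated as a lower bound.
TOKENS = 1.0e12
DAYS = 4.2
GPUS = 256

SECONDS_PER_DAY = 24 * 60 * 60


def throughput(tokens, days, gpus):
    """Return (tokens/s across the whole cluster, tokens/s per GPU)."""
    total = tokens / (days * SECONDS_PER_DAY)
    return total, total / gpus


total_tps, per_gpu_tps = throughput(TOKENS, DAYS, GPUS)
gpu_days = DAYS * GPUS  # total compute budget in GPU-days

print(f"cluster throughput: {total_tps:,.0f} tokens/s")
print(f"per-GPU throughput: {per_gpu_tps:,.0f} tokens/s")
print(f"compute budget:     {gpu_days:,.1f} GPU-days")
```

Under these assumptions the run corresponds to roughly 2.76 million tokens per second across the cluster, about 10,800 tokens per second per A100, and about 1,075 GPU-days in total.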
-
A mathematical framework for predicting lifestyles of viral pathogens
Authors:
Alexander Lange
Abstract:
Despite being similar in structure, function, and size, viral pathogens enjoy very different, mostly well-defined ways of life. They occupy their hosts for a few days (influenza), for a few weeks (measles), or even lifelong (HCV), which manifests in acute or chronic infections. The various transmission routes (airborne, via direct contact, etc.), degrees of infectiousness (referring to the load required for transmission), antigenic variation/immune escape, and virulence define further pathogenic lifestyles. To survive, pathogens must infect new hosts; their success in doing so determines their fitness. Infection happens with a certain likelihood during contact between hosts, where contact can also be mediated by vectors. Besides structural aspects of the host-contact network, three parameters/concepts appear to be key: the contact rate and the infectiousness during contact, which together encode the mode of transmission, and third, the immunity of susceptible hosts. From here, what can be concluded about the evolutionary strategies of viral pathogens? This is the biological question addressed in this paper. The answer extends earlier results (Lange & Ferguson 2009, PLoS Comput Biol 5 (10): e1000536) and makes an explicit connection to another basic work on the evolution of pathogens (Grenfell et al. 2004, Science 303: 327-332). A mathematical framework is presented that models intra- and inter-host dynamics in a minimalistic but unified fashion, covering a broad spectrum of viral pathogens, including those that cause flu-like infections, childhood diseases, and sexually transmitted infections. These pathogens turn out to be local maxima of the fitness landscape. The models involve differential and integral equations, agent-based simulation, networks, and probability.
Submitted 15 October, 2017;
originally announced October 2017.
-
Quantitative Comparison of Abundance Structures of Generalized Communities: From B-Cell Receptor Repertoires to Microbiomes
Authors:
Mohammadkarim Saeedghalati,
Farnoush Farahpour,
Bettina Budeus,
Anja Lange,
Astrid M. Westendorf,
Marc Seifert,
Ralf Küppers,
Daniel Hoffmann
Abstract:
The community, the assemblage of organisms co-existing in a given space and time, has the potential to become one of the unifying concepts of biology, especially with the advent of high-throughput sequencing experiments that reveal genetic diversity exhaustively. In this spirit we show that a tool from community ecology, the Rank Abundance Distribution (RAD), can be turned by the new MaxRank normalization method into a generic, expressive descriptor for quantitative comparison of communities in many areas of biology. To illustrate the versatility of the method, we analyze RADs from various generalized communities, i.e., assemblages of genetically diverse cells or organisms, including human B cells, gut microbiomes under antibiotic treatment and of different ages and countries of origin, and other human and environmental microbial communities. We show that normalized RADs enable novel quantitative approaches that help to understand structures and dynamics of complex generalized communities.
Submitted 12 December, 2016;
originally announced December 2016.
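For readers unfamiliar with the Rank Abundance Distribution named in the abstract above, a minimal sketch of a plain RAD follows: abundances are sorted from most to least abundant and scaled to sum to one. This is only the standard textbook construction; it does not implement the paper's MaxRank normalization, and the toy community and the function name `rank_abundance` are hypothetical.

```python
from collections import Counter


def rank_abundance(counts):
    """Return relative abundances sorted from most to least abundant."""
    positive = [c for c in counts if c > 0]
    total = sum(positive)
    if total == 0:
        raise ValueError("community has no individuals")
    return [c / total for c in sorted(positive, reverse=True)]


# Hypothetical toy community: one label (clone or species) per observed individual.
observations = ["a", "a", "a", "a", "b", "b", "b", "c", "c", "d"]
rad = rank_abundance(Counter(observations).values())

for rank, abundance in enumerate(rad, start=1):
    print(f"rank {rank}: {abundance:.2f}")
```

Two communities with different numbers of ranks yield RADs of different lengths, which is the comparison problem a normalization step such as the one described in the abstract is meant to address.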
-
Reconstruction of disease transmission rates: applications to measles, dengue, and influenza
Authors:
Alexander Lange
Abstract:
Transmission rates are key in understanding the spread of infectious diseases. Using the framework of compartmental models, we introduce a simple method that enables us to reconstruct time series of transmission rates directly from incidence or disease-related mortality data. The reconstruction exploits differential equations, which model the time evolution of infective stages and strains. Being sensitive to initial values, the method produces asymptotically correct solutions. The computations are fast, with quadratic time complexity. We apply the reconstruction to data on measles (England and Wales, 1948-67), dengue (Thailand, 1982-99), and influenza (U.S., 1910-27). The measles example offers a comparison with earlier work. Here we re-investigate reporting corrections, both including and excluding demographic information. The dengue example deals with the failure of vector-control measures in reducing dengue hemorrhagic fever (DHF) in Thailand. Two competing mechanisms have been held responsible: strain interaction and demographic transitions. Our reconstruction reveals that both explanations are possible, showing that the increase in DHF cases is consistent with decreasing transmission rates resulting from reduced vector counts. The flu example focuses on the 1918/19 pandemic, examining the transmission rate evolution for an invading strain. Our analysis indicates that the pandemic strain could have circulated in the population for many months before the pandemic was initiated by an event of highly increased transmission.
Submitted 4 August, 2014;
originally announced August 2014.
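The general idea of recovering a transmission rate from incidence data in a compartmental model can be illustrated with a deliberately simple sketch. It is not the paper's method: it assumes a discrete-time SIR model without demography, a known recovery fraction `gamma`, known initial values `s0` and `i0`, and perfectly reported incidence. All names and parameter values are hypothetical. In this model the new cases per step are `beta * S * I / N`, so the rate follows by inverting that relation while tracking `S` and `I`.

```python
import math


def simulate_incidence(betas, s0, i0, gamma, n):
    """Forward discrete-time SIR: return new cases per step for given rates."""
    s, i = s0, i0
    cases = []
    for beta in betas:
        new = beta * s * i / n      # new infections in this step
        cases.append(new)
        s -= new                    # susceptibles are depleted
        i += new - gamma * i        # infectives gain cases, lose recoveries
    return cases


def reconstruct_beta(cases, s0, i0, gamma, n):
    """Invert the incidence relation step by step to recover the rates."""
    s, i = s0, i0
    betas = []
    for new in cases:
        betas.append(new * n / (s * i))
        s -= new
        i += new - gamma * i
    return betas


# Hypothetical seasonal transmission rate over two years of weekly steps.
N, S0, I0, GAMMA = 1_000_000.0, 999_000.0, 1_000.0, 0.3
true_betas = [0.5 + 0.2 * math.sin(2 * math.pi * t / 52) for t in range(104)]

cases = simulate_incidence(true_betas, S0, I0, GAMMA, N)
estimated = reconstruct_beta(cases, S0, I0, GAMMA, N)
max_error = max(abs(a - b) for a, b in zip(true_betas, estimated))
print(f"largest reconstruction error: {max_error:.2e}")
```

With exact initial values and noise-free data the round trip recovers the input rates up to floating-point error; in this sketch a wrong `s0` or `i0` would bias the estimate, so any practical use depends on how those initial values are chosen.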