-
Topological Sequence Analysis of Genomes: Delta Complex approaches
Authors:
Jian Liu,
Li Shen,
Dong Chen,
Guo-Wei Wei
Abstract:
Algebraic topology has been widely applied to point cloud data to capture geometric shapes and topological structures. However, its application to genome sequence analysis remains rare. In this work, we propose topological sequence analysis (TSA) techniques by constructing $Δ$-complexes and classifying spaces, leading to persistent homology and persistent path homology on genome sequences. We also develop $Δ$-complex-based persistent Laplacians to facilitate the topological spectral analysis of genome sequences. Finally, we demonstrate the utility of the proposed TSA approaches in phylogenetic analysis using Ebola virus sequences and whole bacterial genomes. The present TSA methods are more efficient than the earlier TSA model, k-mer topology, and thus have the potential to be applied to other time-consuming sequential data analyses, such as those in linguistics, literature, music, media, and social contexts.
Submitted 7 July, 2025;
originally announced July 2025.
-
Enhanced Sampling, Public Dataset and Generative Model for Drug-Protein Dissociation Dynamics
Authors:
Maodong Li,
Jiying Zhang,
Bin Feng,
Wenqi Zeng,
Dechin Chen,
Zhijun Pan,
Yu Li,
Zijing Liu,
Yi Isaac Yang
Abstract:
Drug-protein binding and dissociation dynamics are fundamental to understanding molecular interactions in biological systems. While many tools for drug-protein interaction studies have emerged, especially artificial intelligence (AI)-based generative models, predictive tools for binding/dissociation kinetics and dynamics are still limited. We propose a novel research paradigm that combines molecular dynamics (MD) simulations, enhanced sampling, and AI generative models to address this issue. We propose an enhanced sampling strategy to efficiently implement the drug-protein dissociation process in MD simulations and estimate the free energy surface (FES). We constructed a program pipeline of MD simulations based on this sampling strategy, thus generating a dataset including 26,612 drug-protein dissociation trajectories containing about 13 million frames. We named this dissociation dynamics dataset DD-13M and used it to train a deep equivariant generative model, UnbindingFlow, which can generate collision-free dissociation trajectories. The DD-13M database and UnbindingFlow model represent a significant advancement in computational structural biology, and we anticipate their broad applicability in machine learning studies of drug-protein interactions. Our ongoing efforts focus on expanding this methodology to encompass a broader spectrum of drug-protein complexes and exploring novel applications in pathway prediction.
Submitted 25 April, 2025;
originally announced April 2025.
-
ST-FlowNet: An Efficient Spiking Neural Network for Event-Based Optical Flow Estimation
Authors:
Hongze Sun,
Jun Wang,
Wuque Cai,
Duo Chen,
Qianqian Liao,
Jiayi He,
Yan Cui,
Dezhong Yao,
Daqing Guo
Abstract:
Spiking Neural Networks (SNNs) have emerged as a promising tool for event-based optical flow estimation tasks due to their ability to leverage spatio-temporal information and low-power capabilities. However, the performance of SNN models is often constrained, limiting their application in real-world scenarios. In this work, we address this gap by proposing a novel neural network architecture, ST-FlowNet, specifically tailored for optical flow estimation from event-based data. The ST-FlowNet architecture integrates ConvGRU modules to facilitate cross-modal feature augmentation and temporal alignment of the predicted optical flow, improving the network's ability to capture complex motion dynamics. Additionally, to overcome the challenges associated with training SNNs, we introduce a novel approach to derive SNN models from pre-trained artificial neural networks (ANNs) through ANN-to-SNN conversion or our proposed BISNN method. Notably, the BISNN method alleviates the complexities involved in biological parameter selection, further enhancing the robustness of SNNs in optical flow estimation tasks. Extensive evaluations on three benchmark event-based datasets demonstrate that the SNN-based ST-FlowNet model outperforms state-of-the-art methods, delivering superior performance in accurate optical flow estimation across a diverse range of dynamic visual scenes. Furthermore, the inherent energy efficiency of SNN models is highlighted, establishing a compelling advantage for their practical deployment. Overall, our work presents a novel framework for optical flow estimation using SNNs and event-based data, contributing to the advancement of neuromorphic vision applications.
Submitted 27 April, 2025; v1 submitted 13 March, 2025;
originally announced March 2025.
-
Artificial Intelligence Approaches for Anti-Addiction Drug Discovery
Authors:
Dong Chen,
Jian Jiang,
Zhe Su,
Guo-Wei Wei
Abstract:
Drug addiction is a complex and pervasive global challenge that continues to pose significant public health concerns. Traditional approaches to anti-addiction drug discovery have struggled to deliver effective therapeutics, facing high attrition rates, long development timelines, and inefficiencies in processing large-scale data. Artificial intelligence (AI) has emerged as a transformative solution to address these issues. Using advanced algorithms, AI is revolutionizing drug discovery by enhancing the speed and precision of key processes. This review explores the transformative role of AI in the pipeline for anti-addiction drug discovery, including data collection, target identification, and compound optimization. By highlighting the potential of AI to overcome traditional barriers, this review systematically examines how AI addresses critical gaps in anti-addiction research, emphasizing its potential to revolutionize drug discovery and development, overcome challenges, and advance more effective therapeutic strategies.
Submitted 10 February, 2025; v1 submitted 5 February, 2025;
originally announced February 2025.
-
Category-Specific Topological Learning of Metal-Organic Frameworks
Authors:
Dong Chen,
Chun-Long Chen,
Guo-Wei Wei
Abstract:
Metal-organic frameworks (MOFs) are porous, crystalline materials with high surface area, adjustable porosity, and structural tunability, making them ideal for diverse applications. However, traditional experimental and computational methods have limited scalability and interpretability, hindering effective exploration of MOF structure-property relationships. To address these challenges, we introduce, for the first time, a category-specific topological learning (CSTL) approach, which combines algebraic topology with chemical insights for robust property prediction. The model represents MOF structures as simplicial complexes and incorporates elemental categorizations to enable balanced, interpretable machine learning studies. By integrating category-specific persistent homology, CSTL captures both global and local structural characteristics, rendering multi-dimensional, category-specific descriptors that support a predictive model with high accuracy and robustness across eight MOF datasets, outperforming all previous results. This alignment of topological and chemical features enhances the predictive power and interpretability of CSTL, advancing understanding of structure-property relationships of MOFs and promoting efficient material discovery.
Submitted 15 December, 2024;
originally announced December 2024.
-
Celcomen: spatial causal disentanglement for single-cell and tissue perturbation modeling
Authors:
Stathis Megas,
Daniel G. Chen,
Krzysztof Polanski,
Moshe Eliasof,
Carola-Bibiane Schonlieb,
Sarah A. Teichmann
Abstract:
Celcomen leverages a mathematical causality framework to disentangle intra- and inter-cellular gene regulation programs in spatial transcriptomics and single-cell data through a generative graph neural network. It can learn gene-gene interactions, as well as generate post-perturbation counterfactual spatial transcriptomics, thereby offering access to experimentally inaccessible samples. We validated its disentanglement, identifiability, and counterfactual prediction capabilities through simulations and in clinically relevant human glioblastoma, human fetal spleen, and mouse lung cancer samples. Celcomen provides the means to model disease- and therapy-induced changes, allowing for new insights into single-cell, spatially resolved tissue responses relevant to human health.
Submitted 9 September, 2024;
originally announced September 2024.
-
Beyond Efficiency: Molecular Data Pruning for Enhanced Generalization
Authors:
Dingshuo Chen,
Zhixun Li,
Yuyan Ni,
Guibin Zhang,
Ding Wang,
Qiang Liu,
Shu Wu,
Jeffrey Xu Yu,
Liang Wang
Abstract:
With the emergence of various molecular tasks and massive datasets, how to perform efficient training has become an urgent yet under-explored issue in the area. Data pruning (DP), as an oft-stated approach to saving training burdens, filters out less influential samples to form a coreset for training. However, the increasing reliance on pretrained models for molecular tasks renders traditional in-domain DP methods incompatible. Therefore, we propose a Molecular data Pruning framework for enhanced Generalization (MolPeg), which focuses on the source-free data pruning scenario, where data pruning is applied with pretrained models. By maintaining two models with different updating paces during training, we introduce a novel scoring function to measure the informativeness of samples based on the loss discrepancy. As a plug-and-play framework, MolPeg realizes the perception of both source and target domains and consistently outperforms existing DP methods across four downstream tasks. Remarkably, it can surpass the performance obtained from full-dataset training, even when pruning up to 60-70% of the data on the HIV and PCBA datasets. Our work suggests that the discovery of effective data-pruning metrics could provide a viable path to both enhanced efficiency and superior generalization in transfer learning.
Submitted 2 September, 2024;
originally announced September 2024.
-
Path-GPTOmic: A Balanced Multi-modal Learning Framework for Survival Outcome Prediction
Authors:
Hongxiao Wang,
Yang Yang,
Zhuo Zhao,
Pengfei Gu,
Nishchal Sapkota,
Danny Z. Chen
Abstract:
For predicting cancer survival outcomes, standard approaches in clinical research are often based on two main modalities: pathology images for observing cell morphology features, and genomic data (e.g., bulk RNA-seq) for quantifying gene expression. However, existing pathology-genomic multi-modal algorithms face significant challenges: (1) valuable biological insights regarding genes and gene-gene interactions are frequently overlooked; (2) one modality often dominates the optimization process, causing inadequate training for the other modality. In this paper, we introduce a new multi-modal "Path-GPTOmic" framework for cancer survival outcome prediction. First, to extract valuable biological insights, we regulate the embedding space of a foundation model, scGPT, initially trained on single-cell RNA-seq data, making it adaptable for bulk RNA-seq data. Second, to address the imbalance-between-modalities problem, we propose a gradient modulation mechanism tailored to the Cox partial likelihood loss for survival prediction. The contributions of the modalities are dynamically monitored and adjusted during the training process, encouraging sufficient training of both modalities. Evaluated on two TCGA (The Cancer Genome Atlas) datasets, our model achieves substantially improved survival prediction accuracy.
Submitted 17 March, 2024;
originally announced March 2024.
-
Drug Resistance Predictions Based on a Directed Flag Transformer
Authors:
Dong Chen,
Gengzhuo Liu,
Hongyan Du,
Benjamin Jones,
Junjie Wee,
Rui Wang,
Jiahui Chen,
Jana Shen,
Guo-Wei Wei
Abstract:
The continuous evolution of the SARS-CoV-2 virus poses a significant challenge to global public health. Of particular concern is the potential resistance to the widely prescribed drug PAXLOVID, of which the main ingredient nirmatrelvir inhibits the viral main protease (Mpro). Here, we developed CAPTURE (direCted flAg laPlacian Transformer for drUg Resistance prEdictions) to analyze the effects of Mpro mutations on nirmatrelvir-Mpro binding affinities and identify potential drug-resistant mutations. CAPTURE combines a comprehensive mutation analysis with a resistance prediction module based on DFFormer-seq, which is a novel ensemble model that leverages a new Directed Flag Transformer and sequence embeddings from protein and small-molecule large language models. Our analysis of the evolution of Mpro mutations revealed a progressive increase in mutation frequencies for residues near the binding site between May and December 2022, suggesting that the widespread use of PAXLOVID created a selective pressure that accelerated the evolution of drug-resistant variants. Applied to mutations at the nirmatrelvir-Mpro binding site, CAPTURE identified several potential resistance mutations, including H172Y and F140L, which have been experimentally confirmed, as well as five other mutations that await experimental verification. CAPTURE evaluation on a limited experimental data set of Mpro mutants gives a recall of 57% and a precision of 71% for predicting potential drug-resistant mutations. Our work establishes a powerful new framework for predicting drug-resistant mutations and real-time viral surveillance. The insights also guide the rational design of more resilient next-generation therapeutics.
Submitted 16 January, 2025; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Endowing Protein Language Models with Structural Knowledge
Authors:
Dexiong Chen,
Philip Hartout,
Paolo Pellizzoni,
Carlos Oliver,
Karsten Borgwardt
Abstract:
Understanding the relationships between protein sequence, structure and function is a long-standing biological challenge with manifold implications from drug design to our understanding of evolution. Recently, protein language models have emerged as the preferred method for this challenge, thanks to their ability to harness large sequence databases. Yet, their reliance on expansive sequence data and parameter sets limits their flexibility and practicality in real-world scenarios. Concurrently, the recent surge in computationally predicted protein structures unlocks new opportunities in protein representation learning. While promising, the computational burden carried by such complex data still hinders widely adopted practical applications. To address these limitations, we introduce a novel framework that enhances protein language models by integrating protein structural data. Drawing from recent advances in graph transformers, our approach refines the self-attention mechanisms of pretrained language transformers by integrating structural information with structure extractor modules. This refined model, termed Protein Structure Transformer (PST), is further pretrained on a small protein structure database, using the same masked language modeling objective as traditional protein language models. Empirical evaluations of PST demonstrate its superior parameter efficiency relative to protein language models, despite being pretrained on a dataset comprising only 542K structures. Notably, PST consistently outperforms the state-of-the-art foundation model for protein sequences, ESM-2, setting a new benchmark in protein function prediction. Our findings underscore the potential of integrating structural information into protein language models, paving the way for more effective and efficient protein modeling. Code and pretrained models are available at https://github.com/BorgwardtLab/PST.
Submitted 26 January, 2024;
originally announced January 2024.
-
MorphGrower: A Synchronized Layer-by-layer Growing Approach for Plausible Neuronal Morphology Generation
Authors:
Nianzu Yang,
Kaipeng Zeng,
Haotian Lu,
Yexin Wu,
Zexin Yuan,
Danni Chen,
Shengdian Jiang,
Jiaxiang Wu,
Yimin Wang,
Junchi Yan
Abstract:
Neuronal morphology is essential for studying brain functioning and understanding neurodegenerative disorders. As acquiring real-world morphology data is expensive, computational approaches for morphology generation have been studied. Traditional methods heavily rely on expert-set rules and parameter tuning, making it difficult to generalize across different types of morphologies. Recently, MorphVAE was introduced as the sole learning-based method, but its generated morphologies lack plausibility, i.e., they do not appear realistic enough and most of the generated samples are topologically invalid. To fill this gap, this paper proposes MorphGrower, which mimics the natural growth mechanism of neurons for generation. Specifically, MorphGrower generates morphologies layer by layer, with each subsequent layer conditioned on the previously generated structure. During each layer generation, MorphGrower utilizes a pair of sibling branches as the basic generation block and generates branch pairs synchronously. This approach ensures topological validity and allows for fine-grained generation, thereby enhancing the realism of the final generated morphologies. Results on four real-world datasets demonstrate that MorphGrower outperforms MorphVAE by a notable margin. Importantly, the electrophysiological response simulation demonstrates the plausibility of our generated samples from a neuroscience perspective. Our code is available at https://github.com/Thinklab-SJTU/MorphGrower.
Submitted 27 May, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
-
Current and future directions in network biology
Authors:
Marinka Zitnik,
Michelle M. Li,
Aydin Wells,
Kimberly Glass,
Deisy Morselli Gysi,
Arjun Krishnan,
T. M. Murali,
Predrag Radivojac,
Sushmita Roy,
Anaïs Baudot,
Serdar Bozdag,
Danny Z. Chen,
Lenore Cowen,
Kapil Devkota,
Anthony Gitter,
Sara Gosline,
Pengfei Gu,
Pietro H. Guzzi,
Heng Huang,
Meng Jiang,
Ziynet Nesibe Kesimoglu,
Mehmet Koyuturk,
Jian Ma,
Alexander R. Pico,
Nataša Pržulj
, et al. (12 additional authors not shown)
Abstract:
Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These challenges stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology and highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on the future directions of network biology. Additionally, we offer insights into scientific communities, educational initiatives, and the importance of fostering diversity within the field. This paper establishes a roadmap for an immediate and long-term vision for network biology.
Submitted 11 June, 2024; v1 submitted 15 September, 2023;
originally announced September 2023.
-
Evidence for Reduced Sensory Precision and Increased Reliance on Priors in Hallucination-Prone Individuals in a General Population Sample
Authors:
David Benrimoh,
Victoria L. Fisher,
Rashina Seabury,
Ely Sibarium,
Catalina Mourgues,
Doris Chen,
Albert Powers
Abstract:
There is increasing evidence that people with hallucinations overweight perceptual beliefs relative to incoming sensory evidence. Much past work demonstrating prior overweighting has used simple, non-linguistic stimuli. However, auditory hallucinations in psychosis are often complex and linguistic. There may be an interaction between the type of auditory information being processed and its perceived quality in engendering hallucinations. We administered a linguistic version of the Conditioned Hallucinations (CH) task to an online sample of 88 general population participants. Metrics related to hallucination-proneness, recent auditory hallucinations, stimulus thresholds, and stimulus detection were collected; data were used to fit parameters of a Hierarchical Gaussian Filter model of perceptual inference to determine how latent perceptual states influenced task behavior. Replicating past results, higher CH rates were associated with measures of higher hallucination-proneness and recent hallucinatory experiences; CH rates were positively correlated with increased prior weighting; and increased prior weighting was related to recent hallucinatory experiences. Unlike past results, participants with recent hallucinatory experiences as well as those with higher hallucination-proneness had higher stimulus thresholds, lower sensitivity to stimuli presented at the highest threshold, and tended to have lower response confidence, consistent with lower precision of sensory evidence. We show that hallucination-prone individuals in the general population have increased conditioned hallucination rates using a linguistic version of the CH task, and replicate the finding that increased CH rates and recent hallucinations correlate with increased prior weighting. Results support a role for reduced sensory precision in the interplay between prior weighting and hallucination-proneness.
Submitted 23 June, 2023;
originally announced June 2023.
-
Explicitly Solvable Continuous-time Inference for Partially Observed Markov Processes
Authors:
Daniel Chen,
Alexander G. Strang,
Andrew W. Eckford,
Peter J. Thomas
Abstract:
Many natural and engineered systems can be modeled as discrete state Markov processes. Often, only a subset of states are directly observable. Inferring the conditional probability that a system occupies a particular hidden state, given the partial observation, is a problem with broad application. In this paper, we introduce a continuous-time formulation of the sum-product algorithm, which is a well-known discrete-time method for finding the hidden states' conditional probabilities, given a set of finite, discrete-time observations. From our new formulation, we can explicitly solve for the conditional probability of occupying any state, given the transition rates and observations within a finite time window. We apply our algorithm to a realistic model of the cystic fibrosis transmembrane conductance regulator (CFTR) protein for exact inference of the conditional occupancy probability, given a finite time series of partial observations.
Submitted 2 January, 2023;
originally announced January 2023.
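Illustrative note: the abstract above refers to the classical discrete-time sum-product (forward) recursion for partially observed Markov processes. The following is a minimal Python sketch of that well-known discrete-time recursion only, not the continuous-time formulation the paper introduces. The function name `forward_filter`, the three-state toy chain, and all numerical values are hypothetical choices made for illustration; they are not taken from the paper or from any CFTR model.

```python
def forward_filter(P, E, pi0, observations):
    """Discrete-time forward recursion for a partially observed Markov chain.

    P[i][j] : one-step transition probability from state i to state j
    E[i][k] : probability of seeing observation symbol k while in state i
    pi0[i]  : initial state distribution
    Returns a list whose entry t is P(state at step t | observations[0..t]).
    """
    n = len(pi0)
    belief = list(pi0)
    filtered = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict: push the current belief through one transition step.
            belief = [sum(belief[i] * P[i][j] for i in range(n)) for j in range(n)]
        # Update: weight each state by the likelihood of the observation.
        belief = [belief[i] * E[i][obs] for i in range(n)]
        total = sum(belief)
        if total == 0.0:
            raise ValueError("observation sequence has zero probability under the model")
        belief = [b / total for b in belief]
        filtered.append(belief)
    return filtered


# Hypothetical toy model: states 0 and 1 are hidden "closed" states that look
# identical (symbol 0); state 2 is an "open" state that is observed directly.
P = [[0.9, 0.1, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]
E = [[1.0, 0.0],
     [1.0, 0.0],
     [0.0, 1.0]]
pi0 = [1 / 3, 1 / 3, 1 / 3]

posteriors = forward_filter(P, E, pi0, [0, 0, 1])
for step, row in enumerate(posteriors):
    print(step, [round(p, 4) for p in row])
```

In this toy chain the two closed states cannot be told apart from a single observation, so the recursion has to combine the transition structure with the observation history to apportion probability between them.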
-
Improving Molecular Pretraining with Complementary Featurizations
Authors:
Yanqiao Zhu,
Dingshuo Chen,
Yuanqi Du,
Yingze Wang,
Qiang Liu,
Shu Wu
Abstract:
Molecular pretraining, which learns molecular representations over massive unlabeled data, has become a prominent paradigm to solve a variety of tasks in computational chemistry and drug discovery. Recently, substantial progress has been made in molecular pretraining with different molecular featurizations, including 1D SMILES strings, 2D graphs, and 3D geometries. However, the role of molecular featurizations with their corresponding neural architectures in molecular pretraining remains largely unexamined. In this paper, through two case studies -- chirality classification and aromatic ring counting -- we first demonstrate that different featurization techniques convey chemical information differently. In light of this observation, we propose a simple and effective MOlecular pretraining framework with COmplementary featurizations (MOCO). MOCO comprehensively leverages multiple featurizations that complement each other and outperforms existing state-of-the-art models that rely solely on one or two featurizations on a wide range of molecular property prediction tasks.
Submitted 29 September, 2022;
originally announced September 2022.
-
GeoTyper: Automated Pipeline from Raw scRNA-Seq Data to Cell Type Identification
Authors:
Cecily Wolfe,
Yayi Feng,
David Chen,
Edwin Purcell,
Anne Talkington,
Sepideh Dolatshahi,
Heman Shakeri
Abstract:
The cellular composition of the tumor microenvironment can directly impact cancer progression and the efficacy of therapeutics. Understanding immune cell activity, the body's natural defense mechanism, in the vicinity of cancerous cells is essential for developing beneficial treatments. Single cell RNA sequencing (scRNA-seq) enables the examination of gene expression on an individual cell basis, providing crucial information regarding both the disturbances in cell functioning caused by cancer and cell-cell communication in the tumor microenvironment. This novel technique generates large amounts of data, which require proper processing. Various tools exist to facilitate this processing but need to be organized to standardize the workflow from data wrangling to visualization, cell type identification, and analysis of changes in cellular activity, both from the standpoint of malignant cells and immune stromal cells that eliminate them. We aimed to develop a standardized pipeline (GeoTyper, https://github.com/celineyayifeng/GeoTyper) that integrates multiple scRNA-seq tools for processing raw sequence data extracted from NCBI GEO, visualization of results, statistical analysis, and cell type identification. This pipeline leverages existing tools, such as Cellranger from 10X Genomics, Alevin, and Seurat, to cluster cells and identify cell types based on gene expression profiles. We successfully tested and validated the pipeline on several publicly available scRNA-seq datasets, resulting in clusters corresponding to distinct cell types. By determining the cell types and their respective frequencies in the tumor microenvironment across multiple cancers, this workflow will help quantify changes in gene expression related to cell-cell communication and identify possible therapeutic targets.
Submitted 2 May, 2022;
originally announced May 2022.
-
Machine learning analysis of cocaine addiction informed by DAT, SERT, and NET-based interactome networks
Authors:
Hongsong Feng,
Kaifu Gao,
Dong Chen,
Alfred J Robison,
Edmund Ellsworth,
Guo-Wei Wei
Abstract:
Cocaine addiction is a psychosocial disorder induced by the chronic use of cocaine and causes a large number of deaths around the world. Despite many decades of effort, no drugs have been approved by the Food and Drug Administration (FDA) for the treatment of cocaine dependence. Cocaine dependence is neurological and involves many interacting proteins in the interactome. Among them, dopamine transporter (DAT), serotonin transporter (SERT), and norepinephrine transporter (NET) are three major targets. Each of these targets has a large protein-protein interaction (PPI) network which must be considered in anti-cocaine addiction drug discovery. This work presents DAT, SERT, and NET interactome network-informed machine learning/deep learning (ML/DL) studies of cocaine addiction. We collect and analyze 61 protein targets out of 460 proteins in the DAT, SERT, and NET PPI networks that have sufficient existing inhibitor datasets. Utilizing autoencoder and other ML algorithms, we build ML/DL models for these targets with 115,407 inhibitors to predict drug repurposing potentials and possible side effects. We further screen their absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties to search for nearly optimal leads for anti-cocaine addiction. Our approach sets up a systematic protocol for artificial intelligence (AI)-based anti-cocaine addiction lead discovery.
Submitted 31 December, 2021;
originally announced January 2022.
-
AGMI: Attention-Guided Multi-omics Integration for Drug Response Prediction with Graph Neural Networks
Authors:
Ruiwei Feng,
Yufeng Xie,
Minshan Lai,
Danny Z. Chen,
Ji Cao,
Jian Wu
Abstract:
Accurate drug response prediction (DRP) is a crucial yet challenging task in precision medicine. This paper presents a novel Attention-Guided Multi-omics Integration (AGMI) approach for DRP, which first constructs a Multi-edge Graph (MeG) for each cell line, and then aggregates multi-omics features to predict drug response using a novel structure, called Graph edge-aware Network (GeNet). For the first time, our AGMI approach explores gene constraint based multi-omics integration for DRP with the whole-genome using GNNs. Empirical experiments on the CCLE and GDSC datasets show that our AGMI largely outperforms state-of-the-art DRP methods by 8.3%--34.2% on four metrics. Our data and code are available at https://github.com/yivan-WYYGDSG/AGMI.
Submitted 9 January, 2022; v1 submitted 15 December, 2021;
originally announced December 2021.
-
Proteome-informed machine learning studies of cocaine addiction
Authors:
Kaifu Gao,
Dong Chen,
Alfred J Robison,
Guo-Wei Wei
Abstract:
Cocaine addiction accounts for a large portion of substance use disorders and threatens millions of lives worldwide. There is an urgent need to come up with efficient anti-cocaine addiction drugs. Unfortunately, no medications have been approved by the Food and Drug Administration (FDA), despite extensive effort in the past few decades. The main challenge is the intricate molecular mechanisms of cocaine addiction, involving synergistic interactions among proteins upstream and downstream of dopamine transporter (DAT) functions impacted by cocaine. However, traditional in vivo or in vitro experiments cannot address the roles of so many proteins, highlighting the need for innovative strategies in the field. We propose a proteome-informed machine learning/deep learning (ML/DL) platform to discover nearly optimal anti-cocaine addiction lead compounds. We construct and analyze proteomic protein-protein interaction (PPI) networks for cocaine dependence to identify 141 involved drug targets and represent over 60,000 associated drug candidates or experimental drugs in the latent space using an autoencoder (AE) model trained from over 104 million molecules. We build 32 ML models for cross-target analysis of these drug candidates for side effects and repurposing potential. We further screen the absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of these candidates. Our platform reveals that essentially all of the existing drug candidates, including dozens of experimental drugs, fail to pass our cross-target and ADMET screenings. Nonetheless, we have identified two nearly optimal leads for further optimization.
Submitted 17 September, 2021;
originally announced September 2021.
-
Comprehensive assessment of error correction methods for high-throughput sequencing data
Authors:
Yun Heo,
Gowthami Manikandan,
Anand Ramachandran,
Deming Chen
Abstract:
The advent of DNA and RNA sequencing has revolutionized the study of genomics and molecular biology. Next generation sequencing (NGS) technologies like Illumina, Ion Torrent, SOLiD sequencing etc. have brought about a quick and cheap way to sequence genomes. Recently, third generation sequencing (TGS) technologies like PacBio and Oxford Nanopore Technology (ONT) have also been developed. Different technologies use different underlying methods for sequencing and are prone to different error rates. Though many tools exist for error correction of sequencing data from NGS and TGS methods, no standard method is available yet to evaluate the accuracy and effectiveness of these error-correction tools. In this study, we present a Software Package for Error Correction Tool Assessment on nuCLEic acid sequences (SPECTACLE) providing comprehensive algorithms to evaluate error-correction methods for DNA and RNA sequencing, for NGS and TGS platforms. We also present a compilation of sequencing datasets for Illumina, PacBio and ONT platforms that present challenging scenarios for error-correction tools. Using these datasets and SPECTACLE, we evaluate the performance of 23 different error-correction tools and present unique and helpful insights into their strengths and weaknesses. We hope that our methodology will standardize the evaluation of DNA and RNA error-correction tools in the future.
Submitted 25 March, 2021; v1 submitted 9 July, 2020;
originally announced July 2020.
-
Modeling Pharmacological Effects with Multi-Relation Unsupervised Graph Embedding
Authors:
Dehua Chen,
Amir Jalilifard,
Adriano Veloso,
Nivio Ziviani
Abstract:
A pharmacological effect of a drug on cells, organs and systems refers to the specific biochemical interaction produced by a drug substance, which is called its mechanism of action. Drug repositioning (or drug repurposing) is a fundamental problem for the identification of new opportunities for the use of already approved or failed drugs. In this paper, we present a method based on a multi-relation unsupervised graph embedding model that learns latent representations for drugs and diseases so that the distance between these representations reveals repositioning opportunities. Once representations for drugs and diseases are obtained, we learn the likelihood of new links (that is, new indications) between drugs and diseases. Known drug indications are used for learning a model that predicts potential indications. Compared with existing unsupervised graph embedding methods, our method shows superior prediction performance in terms of area under the ROC curve, and we present examples of repositioning opportunities found in recent biomedical literature that were also predicted by our method.
Submitted 15 May, 2020; v1 submitted 30 April, 2020;
originally announced April 2020.
-
Build-at-home UV-C disinfection system for healthcare settings
Authors:
Rosemary C. She,
Dongyu Chen,
Pil Pak,
Deniz K. Armani,
Andreas Schubert,
Andrea M. Armani
Abstract:
Significant research has shown that UV-C exposure is an effective disinfectant for a range of bacteria and viruses, including coronaviruses. As such, a UV-C treatment in combination with a chemical wipe, such as EPA hydrogen peroxide, is a common cleaning protocol in a medical setting, and such disinfection protocols have gained in importance during the current COVID-19 pandemic due to the need to reuse PPE. However, given the substantial increase in patient volume, the quantity of materials requiring disinfection exceeds the UV-C equipment throughput capabilities at many medical facilities. Therefore, there is a need for a UV-C disinfection system that can be rapidly deployed. In response to this demand, we designed, constructed, and validated a UV-C disinfection system from readily accessible components; specifically, a plastic bin, a UV-C light bulb, and a conventional light housing. To further improve the performance, the interior of the tub was spray-painted with chrome paint, forming a low quality-factor (Q) Fabry-Perot optical cavity. As part of this work, a set of modular design criteria that allows for flexibility in component selection without degradation of UV-C dose performance is established. This flexibility is critical given the current fluctuating availability of source materials. The disinfection capabilities of the system are validated using Bacillus cereus, a gram-positive endospore-forming bacterium.
Submitted 1 April, 2020; v1 submitted 28 March, 2020;
originally announced March 2020.
-
Control Efficacy on COVID-19
Authors:
Duanbing Chen,
Tao Zhou
Abstract:
We proposed a Monte-Carlo method to estimate the temporal reproduction number without complete information about symptom onsets of all cases. Province-level analysis demonstrated the huge success of Chinese control measures on COVID-19; that is, provinces' reproduction numbers quickly decreased to <1 within just one week after taking action.
Submitted 29 February, 2020;
originally announced March 2020.
-
Improving the soil water module of the Decision Support System for Agrotechnology Transfer cropping system model for subsurface irrigation
Authors:
Dan Chen,
Yusuke Kikuchi,
Kenichiro Fujiyama,
Shunsuke Akimoto,
Shinji Oominato,
Toshihiro Hasegawa
Abstract:
Ensuring that crops use water and nutrients efficiently is an important strategy for increasing the profitability of farming and reducing the environmental load from agriculture. Subsurface irrigation can be an alternative to surface irrigation as a means of losing less irrigation water, but the application timing and amount are often difficult to determine. Well-defined soil and crop models are useful for assisting decision support, but most of the models developed to date have been for surface irrigation. The present study examines whether the Decision Support System for Agrotechnology Transfer (DSSAT, version 4.5) cropping system model is applicable for the production of processing tomatoes with subsurface irrigation, and it revises the soil module to simulate irrigation schemes with subsurface irrigation. Five farmed fields in California, USA, are used to test the performance of the model. The original DSSAT model fails to produce fruit yield by overestimating the water deficiency. The soil water module is then revised by introducing the movement of soil moisture due to a vertical soil moisture gradient. Moreover, an external parameter optimization system is constructed to minimize the error between the simulation and observations. The revised module reduces the errors in the soil moisture profile at each field compared with those of the original DSSAT model. The average soil moisture error decreases from 0.065 m^3/m^3 to 0.029 m^3/m^3. The yields estimated by the modified model are in a reasonable range from 80 to 150 ton/ha, which is commonly observed under well-managed conditions. The present results show that although further testing is required for yield prediction, the present modification to the original DSSAT model improves the precision of the soil moisture profile under subsurface irrigation and can be used for decision support for efficient production of processing tomatoes.
Submitted 27 August, 2019;
originally announced August 2019.
-
A Review of Mathematical Modeling, Simulation and Analysis of Membrane Channel Charge Transport
Authors:
Duan Chen,
Guowei Wei
Abstract:
The molecular mechanism of ion channel gating and substrate modulation is elusive for many voltage gated ion channels, such as eukaryotic sodium ones. The understanding of channel functions is a pressing issue in molecular biophysics and biology. Mathematical modeling, computation and analysis of membrane channel charge transport have become an emergent field and give rise to significant contributions to our understanding of ion channel gating and function. This review summarizes recent progress and outlines remaining challenges in mathematical modeling, simulation and analysis of ion channel charge transport. One of our focuses is the Poisson-Nernst-Planck (PNP) model and its generalizations. Specifically, we cover the basic framework of the PNP system and some of its extensions, including size effects, ion-water interactions, coupling with density functional theory and relation to fluid flow models. A reduced theory, the Poisson-Boltzmann-Nernst-Planck (PBNP) model, and a differential geometry based ion transport model are also discussed. For proton channels, a multiscale and multiphysics Poisson-Boltzmann-Kohn-Sham (PBKS) model is presented. We show that all of these ion channel models can be cast into a unified variational multiscale framework with a macroscopic continuum domain of the solvent and a microscopic discrete domain of the solute. The main strategy is to construct a total energy functional of a charge transport system to encompass the polar and nonpolar free energies of solvation and chemical potential related energies. Current computational algorithms and tools for numerical simulations and results from mathematical analysis of ion channel systems are also surveyed.
Submitted 29 September, 2016;
originally announced November 2016.
-
Deep Multi-Species Embedding
Authors:
Di Chen,
Yexiang Xue,
Shuo Chen,
Daniel Fink,
Carla Gomes
Abstract:
Understanding how species are distributed across landscapes over time is a fundamental question in biodiversity research. Unfortunately, most species distribution models only target a single species at a time, despite strong ecological evidence that species are not independently distributed. We propose Deep Multi-Species Embedding (DMSE), which jointly embeds vectors corresponding to multiple species as well as vectors representing environmental covariates into a common high-dimensional feature space via a deep neural network. Applied to bird observational data from the citizen science project eBird, we demonstrate how the DMSE model discovers inter-species relationships to outperform single-species distribution models (random forests and SVMs) as well as competing multi-label models. Additionally, we demonstrate the benefit of using a deep neural network to extract features within the embedding and show how they improve the predictive performance of species distribution modelling. An important domain contribution of the DMSE model is the ability to discover and describe species interactions while simultaneously learning the shared habitat preferences among species. As an additional contribution, we provide a graphical embedding of hundreds of bird species in the Northeast US.
Submitted 21 February, 2017; v1 submitted 27 September, 2016;
originally announced September 2016.
-
ATP consumption of eukaryotic flagella measured at a single-cell level
Authors:
Daniel T. N. Chen,
Michael Heymann,
Seth Fraden,
Daniela Nicastro,
Zvonimir Dogic
Abstract:
The motility of cilia and flagella is driven by thousands of dynein motors that hydrolyze adenosine triphosphate (ATP). Despite decades of genetic, biochemical, structural and biophysical studies, some aspects of ciliary motility remain elusive, such as the regulation of beating patterns and the energetic efficiency of these nanomachines. Here, we introduce an experimental method to measure ATP consumption of actively beating axonemes on a single-cell level. We encapsulated individual sea urchin sperm with demembranated flagellum inside water-in-oil emulsion droplets and measured the axonemes' ATP consumption by monitoring fluorescence intensity of a fluorophore-coupled reporter system for ATP turnover in the droplet. Concomitant phase contrast imaging allowed us to extract a linear dependence between the ATP consumption rate and the flagellar beating frequency, with ~2.3e5 ATP molecules consumed per beat of a demembranated flagellum. Increasing the viscosity of the aqueous medium led to modified beating waveforms of the axonemes and to higher energy consumption per beat cycle. Our single-cell experimental platform provides both new insights into the beating mechanism of flagella and a powerful tool for future studies.
Submitted 21 December, 2015; v1 submitted 5 November, 2015;
originally announced November 2015.
-
Investigating the Selectivity of KcsA Channel by an Image Charge Solvation Method (ICSM) in Molecular Dynamics Simulations
Authors:
Katherine Baker,
Duan Chen,
Wei Cai
Abstract:
In this paper, we study the selectivity of the potassium channel KcsA by a recently developed image-charge solvation method (ICSM) combined with molecular dynamics simulations. The hybrid solvation model in the ICSM is able to demonstrate atomistically the function of the selectivity filter of the KcsA channel when potassium and sodium ions are considered and their distributions inside the filter are simulated. Our study also shows that the reaction field effect, explicitly accounted for through image charge approximation in the ICSM model, is necessary for reproducing the correct selectivity property of the potassium channels.
Submitted 15 October, 2015;
originally announced October 2015.
-
A Cellular Automaton Model for Tumor Dormancy: Emergence of a Proliferative Switch
Authors:
Duyu Chen,
Yang Jiao,
Salvatore Torquato
Abstract:
Malignant cancers that lead to fatal outcomes for patients may remain dormant for very long periods of time. Although individual mechanisms such as cellular dormancy, angiogenic dormancy and immunosurveillance have been proposed, a comprehensive understanding of cancer dormancy and the "switch" from a dormant to a proliferative state still needs to be strengthened from both a basic and clinical point of view. Computational modeling enables one to explore a variety of scenarios for possible but realistic microscopic dormancy mechanisms and their predicted outcomes. The aim of this paper is to devise such a predictive computational model of dormancy with an emergent "switch" behavior. Specifically, we generalize a previous cellular automaton (CA) model for proliferative growth of solid tumors so that it now incorporates a variety of cell-level tumor-host interactions and different mechanisms for tumor dormancy, for example the effects of the immune system. Our new CA rules induce a natural "competition" between the tumor and tumor suppression factors in the microenvironment. This competition either results in a "stalemate" for a period of time, after which the tumor either eventually wins (spontaneously emerges) or is eradicated; or it leads to a situation in which the tumor is eradicated before such a "stalemate" could ever develop. We also predict that if the number of actively dividing cells within the proliferative rim of the tumor reaches a critical, yet low level, the dormant tumor has a high probability of resuming rapid growth. Our findings may shed light on the fundamental understanding of cancer dormancy.
Submitted 17 October, 2014;
originally announced October 2014.
-
Spontaneous motion in hierarchically assembled active matter
Authors:
Tim Sanchez,
Daniel T. N. Chen,
Stephen J. DeCamp,
Michael Heymann,
Zvonimir Dogic
Abstract:
With exquisite precision and reproducibility, cells orchestrate the cooperative action of thousands of nanometer-sized molecular motors to carry out mechanical tasks at much larger length scales, such as cell motility, division and replication. Besides their biological importance, such inherently non-equilibrium processes are an inspiration for developing biomimetic active materials from microscopic components that consume energy to generate continuous motion. Being actively driven, these materials are not constrained by the laws of equilibrium statistical mechanics and can thus exhibit highly sought-after properties such as autonomous motility, internally generated flows and self-organized beating. Starting from extensile microtubule bundles, we hierarchically assemble active analogs of conventional polymer gels, liquid crystals and emulsions. At high enough concentration, microtubules form a percolating active network characterized by internally driven chaotic flows, hydrodynamic instabilities, enhanced transport and fluid mixing. When confined to emulsion droplets, 3D networks spontaneously adsorb onto the droplet surfaces to produce highly active 2D nematic liquid crystals whose streaming flows are controlled by internally generated fractures and self-healing, as well as unbinding and annihilation of oppositely charged disclination defects. The resulting active emulsions exhibit unexpected properties, such as autonomous motility, which are not observed in their passive analogues. Taken together, these observations exemplify how assemblages of animate microscopic objects exhibit collective biomimetic properties that are starkly different from those found in materials assembled from inanimate building blocks, challenging us to develop a theoretical framework that would allow for a systematic engineering of their far-from-equilibrium material properties.
Submitted 7 January, 2013;
originally announced January 2013.
-
Evolution favors protein mutational robustness in sufficiently large populations
Authors:
Jesse D. Bloom,
Zhongyi Lu,
David Chen,
Alpan Raval,
Ophelia S. Venturelli,
Frances H. Arnold
Abstract:
BACKGROUND: An important question is whether evolution favors properties such as mutational robustness or evolvability that do not directly benefit any individual, but can influence the course of future evolution. Functionally similar proteins can differ substantially in their robustness to mutations and capacity to evolve new functions, but it has remained unclear whether any of these differences might be due to evolutionary selection for these properties.
RESULTS: Here we use laboratory experiments to demonstrate that evolution favors protein mutational robustness if the evolving population is sufficiently large. We neutrally evolve cytochrome P450 proteins under identical selection pressures and mutation rates in populations of different sizes, and show that proteins from the larger and thus more polymorphic population tend towards higher mutational robustness. Proteins from the larger population also evolve greater stability, a biophysical property that is known to enhance both mutational robustness and evolvability. The excess mutational robustness and stability is well described by existing mathematical theories, and can be quantitatively related to the way that the proteins occupy their neutral network.
CONCLUSIONS: Our work is the first experimental demonstration of the general tendency of evolution to favor mutational robustness and protein stability in highly polymorphic populations. We suggest that this phenomenon may contribute to the mutational robustness and evolvability of viruses and bacteria that exist in large populations.
Submitted 14 April, 2007;
originally announced April 2007.
-
The Expresso Microarray Experiment Management System: The Functional Genomics of Stress Responses in Loblolly Pine
Authors:
Lenwood S. Heath,
Naren Ramakrishnan,
Ronald R. Sederoff,
Ross W. Whetten,
Boris I. Chevone,
Craig A. Struble,
Vincent Y. Jouenne,
Dawei Chen,
Leonel van Zyl,
Ruth G. Alscher
Abstract:
Conception, design, and implementation of cDNA microarray experiments present a variety of bioinformatics challenges for biologists and computational scientists. The multiple stages of data acquisition and analysis have motivated the design of Expresso, a system for microarray experiment management. Salient aspects of Expresso include support for clone replication and randomized placement; automatic gridding, extraction of expression data from each spot, and quality monitoring; flexible methods of combining data from individual spots into information about clones and functional categories; and the use of inductive logic programming for higher-level data analysis and mining. The development of Expresso is occurring in parallel with several generations of microarray experiments aimed at elucidating genomic responses to drought stress in loblolly pine seedlings. The current experimental design incorporates 384 pine cDNAs replicated and randomly placed in two specific microarray layouts. We describe the design of Expresso as well as results of analysis with Expresso that suggest the importance of molecular chaperones and membrane transport proteins in mechanisms conferring successful adaptation to long-term drought stress.
Submitted 23 October, 2001;
originally announced October 2001.