-
Targeting Melanoma-Specific Tyrosinase: Cyclic Peptide Disrupts Actin Dynamics for Precision Apoptosis Induction
Authors:
Ruoyang Zhao,
Xiaowei Wang,
Jiajia Hu,
Qingqing Sun,
Xinmin Zhao,
Jun Guo,
Feng Zhang,
Min Wu
Abstract:
Melanoma is an aggressive and highly metastatic cancer that exhibits stubborn resistance to conventional therapies, highlighting the need for novel treatments. Existing therapeutic strategies often suffer from systemic toxicity, poor efficacy and rapidly acquired drug resistance. In this study, we designed a cyclic peptide system (c-RGDKYQ) that takes advantage of the overexpression of tyrosinase in melanoma cells to trigger enzyme-mediated oxidation and self-assembly. The assembled peptide nanostructures can selectively disrupt the actin cytoskeleton, impairing cancer cell functions such as motility, adhesion, and proliferation, and ultimately leading to apoptosis. This approach does not rely on external drug payloads or complex delivery mechanisms. c-RGDKYQ exhibits high selectivity for melanoma cells, strongly suppressing tumor growth in a murine model with minimal systemic toxicity. Our findings suggest that, by targeting tyrosinase, c-RGDKYQ may be an enzyme-responsive alternative to conventional treatments for melanoma.
Submitted 9 July, 2025;
originally announced July 2025.
-
BMFM-DNA: A SNP-aware DNA foundation model to capture variant effects
Authors:
Hongyang Li,
Sanjoy Dey,
Bum Chul Kwon,
Michael Danziger,
Michal Rosen-Tzvi,
Jianying Hu,
James Kozloski,
Ching-Huei Tsou,
Bharath Dandala,
Pablo Meyer
Abstract:
Large language models (LLMs) trained on text have demonstrated remarkable results on natural language processing (NLP) tasks. These models have been adapted to decipher the language of DNA, where sequences of nucleotides act as "words" that encode genomic functions. However, the genome differs fundamentally from natural language, as it lacks clearly defined words or a consistent grammar. Although DNA language models (DNALMs) such as DNABERT and GENA-LM have achieved a high level of performance on genome-related biological tasks, these models do not encode biological functions in the presence of sequence variations. To address this problem, we pre-train foundation models that effectively integrate sequence variations, in particular Single Nucleotide Polymorphisms (SNPs), as they underlie important biological functions. Specifically, we use ModernBERT to pre-train two different Biomedical Foundation Models (BMFM): BMFM-DNA-REF, in which the model is trained with sequences of varying lengths along with their reverse complements derived from the reference genome, and BMFM-DNA-SNP, in which the model is trained with sequences created using a novel representation scheme that encodes sequence variations. Our findings indicate that integrating sequence variations into DNALMs helps capture the biological functions, as seen in improvements on all fine-tuning tasks. To explore the model's practical utility, we experimented with various strategies for SNP imputation on the promoter detection task introduced in DNABERT-2. However, we acknowledge that the current benchmarks are limited in their ability to fully evaluate these models. To enable more comprehensive assessment in the future and encourage community contributions, we release our models through HuggingFace and the code to reproduce the results at https://github.com/BiomedSciAI/biomed-multi-omic
Submitted 26 June, 2025;
originally announced July 2025.
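The BMFM-DNA abstract above mentions training on sequences "along with their reverse complements". As a minimal sketch of what that augmentation means (the helper names are hypothetical and not taken from the biomed-multi-omic code base; it assumes plain A/C/G/T/N strings):

```python
# Hypothetical helpers illustrating reverse-complement augmentation;
# not the BMFM-DNA implementation.

# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def augment_with_reverse_complement(seqs):
    """Yield each sequence followed by its reverse complement."""
    for s in seqs:
        yield s
        yield reverse_complement(s)
```

Because both strands of DNA carry the same information, presenting a sequence and its reverse complement is a common way to make a model less sensitive to strand orientation.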
-
HelixDesign-Antibody: A Scalable Production-Grade Platform for Antibody Design Built on HelixFold3
Authors:
Jie Gao,
Jing Hu,
Shanzhuo Zhang,
Kunrui Zhu,
Sheng Qian,
Yueyang Huang,
Xiaonan Zhang,
Xiaomin Fang
Abstract:
Antibody engineering is essential for developing therapeutics and advancing biomedical research. Traditional discovery methods often rely on time-consuming and resource-intensive experimental screening. To enhance and streamline this process, we introduce HelixDesign-Antibody, a production-grade, high-throughput platform built on the high-accuracy structure prediction model HelixFold3. The platform facilitates the large-scale generation of antibody candidate sequences and evaluates their interaction with antigens. Integrated high-performance computing (HPC) support enables high-throughput screening, addressing challenges such as fragmented toolchains and high computational demands. Validation on multiple antigens showcases the platform's ability to generate diverse and high-quality antibodies, confirming a scaling law where exploring larger sequence spaces increases the likelihood of identifying optimal binders. This platform provides a seamless, accessible solution for large-scale antibody design and is available via the antibody design page of the PaddleHelix platform.
Submitted 3 July, 2025;
originally announced July 2025.
-
Uncovering smooth structures in single-cell data with PCS-guided neighbor embeddings
Authors:
Rong Ma,
Xi Li,
Jingyuan Hu,
Bin Yu
Abstract:
Single-cell sequencing is revolutionizing biology by enabling detailed investigations of cell-state transitions. Many biological processes unfold along continuous trajectories, yet it remains challenging to extract smooth, low-dimensional representations from inherently noisy, high-dimensional single-cell data. Neighbor embedding (NE) algorithms, such as t-SNE and UMAP, are widely used to embed high-dimensional single-cell data into low dimensions. But they often introduce undesirable distortions, resulting in misleading interpretations. Existing evaluation methods for NE algorithms primarily focus on separating discrete cell types rather than capturing continuous cell-state transitions, while dynamic modeling approaches rely on strong assumptions about cellular processes and specialized data. To address these challenges, we build on the Predictability-Computability-Stability (PCS) framework for reliable and reproducible data-driven discoveries. First, we systematically evaluate popular NE algorithms through empirical analysis, simulation, and theory, and reveal their key shortcomings, such as artifacts and instability. We then introduce NESS, a principled and interpretable machine learning approach to improve NE representations by leveraging algorithmic stability and to enable robust inference of smooth biological structures. NESS offers useful concepts, quantitative stability metrics, and efficient computational workflows to uncover developmental trajectories and cell-state transitions in single-cell data. Finally, we apply NESS to six single-cell datasets, spanning pluripotent stem cell differentiation, organoid development, and multiple tissue-specific lineage trajectories. Across these diverse contexts, NESS consistently yields useful biological insights, such as identification of transitional and stable cell states and quantification of transcriptional dynamics during development.
Submitted 27 June, 2025;
originally announced June 2025.
-
BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models
Authors:
Bharath Dandala,
Michael M. Danziger,
Ella Barkan,
Tanwi Biswas,
Viatcheslav Gurev,
Jianying Hu,
Matthew Madgwick,
Akira Koseki,
Tal Kozlovski,
Michal Rosen-Zvi,
Yishai Shimoni,
Ching-Huei Tsou
Abstract:
Transcriptomic foundation models (TFMs) have recently emerged as powerful tools for analyzing gene expression in cells and tissues, supporting key tasks such as cell-type annotation, batch correction, and perturbation prediction. However, the diversity of model implementations and training strategies across recent TFMs, though promising, makes it challenging to isolate the contribution of individual design choices or evaluate their potential synergies. This hinders the field's ability to converge on best practices and limits the reproducibility of insights across studies. We present BMFM-RNA, an open-source, modular software package that unifies diverse TFM pretraining and fine-tuning objectives within a single framework. Leveraging this capability, we introduce a novel training objective, whole cell expression decoder (WCED), which captures global expression patterns using an autoencoder-like CLS bottleneck representation. In this paper, we describe the framework, supported input representations, and training objectives. We evaluated four model checkpoints pretrained on CELLxGENE using combinations of masked language modeling (MLM), WCED and multitask learning. Using the benchmarking capabilities of BMFM-RNA, we show that WCED-based models achieve performance that matches or exceeds state-of-the-art approaches like scGPT across more than a dozen datasets in both zero-shot and fine-tuning tasks. BMFM-RNA, available as part of the biomed-multi-omics project ( https://github.com/BiomedSciAI/biomed-multi-omic ), offers a reproducible foundation for systematic benchmarking and community-driven exploration of optimal TFM training strategies, enabling the development of more effective tools to leverage the latest advances in AI for understanding cell biology.
Submitted 17 June, 2025;
originally announced June 2025.
-
Protein Inverse Folding From Structure Feedback
Authors:
Junde Xu,
Zijun Gao,
Xinyi Zhou,
Jie Hu,
Xingyi Cheng,
Le Song,
Guangyong Chen,
Pheng-Ann Heng,
Jiezhong Qiu
Abstract:
The inverse folding problem, aiming to design amino acid sequences that fold into desired three-dimensional structures, is pivotal for various biotechnological applications. Here, we introduce a novel approach leveraging Direct Preference Optimization (DPO) to fine-tune an inverse folding model using feedback from a protein folding model. Given a target protein structure, we begin by sampling candidate sequences from the inverse-folding model, then predict the three-dimensional structure of each sequence with the folding model to generate pairwise structural-preference labels. These labels are used to fine-tune the inverse-folding model under the DPO objective. Our results on the CATH 4.2 test set demonstrate that DPO fine-tuning not only improves sequence recovery of baseline models but also leads to a significant improvement in average TM-Score from 0.77 to 0.81, indicating enhanced structure similarity. Furthermore, iterative application of our DPO-based method on challenging protein structures yields substantial gains, with an average TM-Score increase of 79.5% with regard to the baseline model. This work establishes a promising direction for enhancing protein sequence design ability from structure feedback by effectively utilizing preference optimization.
Submitted 3 June, 2025;
originally announced June 2025.
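The inverse-folding abstract above describes turning folding-model feedback into pairwise preference labels and fine-tuning under the DPO objective. The sketch below shows the standard per-pair DPO loss together with one plausible way to form pairs from TM-scores. The function names, the `margin` rule, and the default `beta` are illustrative assumptions and are not taken from the paper.

```python
import math


def preference_pairs(tm_scores, margin=0.0):
    """Build (winner, loser) index pairs from per-sequence TM-scores.

    Assumption: sequence i is preferred over j when its predicted structure
    scores higher by more than `margin`. The paper's labelling rule may differ.
    """
    pairs = []
    for i, si in enumerate(tm_scores):
        for j, sj in enumerate(tm_scores):
            if si - sj > margin:
                pairs.append((i, j))
    return pairs


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (winner, loser) pair.

    logp_*     : sequence log-likelihoods under the model being fine-tuned
    ref_logp_* : the same log-likelihoods under the frozen reference model
    """
    z = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(z)) = log(1 + exp(-z)), written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

The loss equals log 2 when the fine-tuned model and the reference agree, and falls below that as the fine-tuned model shifts probability toward the preferred sequence relative to the reference.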
-
HelixDesign-Binder: A Scalable Production-Grade Platform for Binder Design Built on HelixFold3
Authors:
Jie Gao,
Jun Li,
Jing Hu,
Shanzhuo Zhang,
Kunrui Zhu,
Yueyang Huang,
Xiaonan Zhang,
Xiaomin Fang
Abstract:
Protein binder design is central to therapeutics, diagnostics, and synthetic biology, yet practical deployment remains challenging due to fragmented workflows, high computational costs, and complex tool integration. We present HelixDesign-Binder, a production-grade, high-throughput platform built on HelixFold3 that automates the full binder design pipeline, from backbone generation and sequence design to structural evaluation and multi-dimensional scoring. By unifying these stages into a scalable and user-friendly system, HelixDesign-Binder enables efficient exploration of binder candidates with favorable structural, energetic, and physicochemical properties. The platform leverages Baidu Cloud's high-performance infrastructure to support large-scale design and incorporates advanced scoring metrics, including ipTM, predicted binding free energy, and interface hydrophobicity. Benchmarking across six protein targets demonstrates that HelixDesign-Binder reliably produces diverse and high-quality binders, some of which match or exceed validated designs in predicted binding affinity. HelixDesign-Binder is accessible via an interactive web interface on the PaddleHelix platform, supporting both academic research and industrial applications in antibody and protein binder development.
Submitted 27 May, 2025;
originally announced May 2025.
-
An Inclusive Foundation Model for Generalizable Cytogenetics in Precision Oncology
Authors:
Changchun Yang,
Weiqian Dai,
Yilan Zhang,
Siyuan Chen,
Jingdong Hu,
Junkai Su,
Yuxuan Chen,
Ao Xu,
Na Li,
Xin Gao,
Yongguo Yu
Abstract:
Chromosome analysis is vital for diagnosing genetic disorders and guiding cancer therapy decisions through the identification of somatic clonal aberrations. However, the development of AI models is hindered by the overwhelming complexity and diversity of chromosomal abnormalities, which require extensive annotation efforts, while automated methods remain task-specific and lack generalizability due to the scarcity of comprehensive datasets spanning diverse resource conditions. Here, we introduce CHROMA, a foundation model for cytogenomics, designed to overcome these challenges by learning generalizable representations of chromosomal abnormalities. Pre-trained on over 84,000 specimens (~4 million chromosomal images) via self-supervised learning, CHROMA outperforms other methods across all types of abnormalities, even when trained on fewer labelled data and more imbalanced datasets. By facilitating comprehensive mapping of instability and clonal lesions across various aberration types, CHROMA offers a scalable and generalizable solution for reliable and automated clinical analysis, reducing the annotation workload for experts and advancing precision oncology through the early detection of rare genomic abnormalities, enabling broad clinical AI applications and making advanced genomic analysis more accessible.
Submitted 21 May, 2025;
originally announced May 2025.
-
TransST: Transfer Learning Embedded Spatial Factor Modeling of Spatial Transcriptomics Data
Authors:
Shuo Shuo Liu,
Shikun Wang,
Yuxuan Chen,
Anil K. Rustgi,
Ming Yuan,
Jianhua Hu
Abstract:
Background: Spatial transcriptomics has emerged as a powerful tool in biomedical research because of its ability to capture both the spatial contexts and abundance of the complete RNA transcript profile in organs of interest. However, limitations of the technology such as the relatively low resolution and comparatively insufficient sequencing depth make it difficult to reliably extract real biological signals from these data. To alleviate this challenge, we propose a novel transfer learning framework, referred to as TransST, to adaptively leverage the cell-labeled information from external sources in inferring cell-level heterogeneity of a target spatial transcriptomics dataset.
Results: Applications in several real studies as well as a number of simulation settings show that our approach significantly improves on existing techniques. For example, in the breast cancer study, TransST successfully identifies five biologically meaningful cell clusters, including the two subgroups of cancer in situ and invasive cancer; in addition, only TransST is able to separate the adipose tissues from the connective tissues among all the studied methods.
Conclusions: In summary, the proposed method TransST is both effective and robust in identifying cell subclusters and detecting corresponding driving biomarkers in spatial transcriptomics data.
Submitted 15 April, 2025;
originally announced April 2025.
-
SparseFocus: Learning-based One-shot Autofocus for Microscopy with Sparse Content
Authors:
Yongping Zhai,
Xiaoxi Fu,
Qiang Su,
Jia Hu,
Yake Zhang,
Yunfeng Zhou,
Chaofan Zhang,
Xiao Li,
Wenxin Wang,
Dongdong Wu,
Shen Yan
Abstract:
Autofocus is necessary for high-throughput and real-time scanning in microscopic imaging. Traditional methods rely on complex hardware or iterative hill-climbing algorithms. Recent learning-based approaches have demonstrated remarkable efficacy in a one-shot setting, avoiding hardware modifications or iterative mechanical lens adjustments. However, in this paper, we highlight a significant challenge: the richness of image content can strongly affect autofocus performance. When the image content is sparse, previous autofocus methods, whether traditional hill-climbing or learning-based, tend to fail. To tackle this, we propose a content-importance-based solution, named SparseFocus, featuring a novel two-stage pipeline. The first stage measures the importance of regions within the image, while the second stage calculates the defocus distance from selected important regions. To validate our approach and benefit the research community, we collect a large-scale dataset comprising millions of labelled defocused images, encompassing dense, sparse, and extremely sparse scenarios. Experimental results show that SparseFocus surpasses existing methods, effectively handling all levels of content sparsity. Moreover, we integrate SparseFocus into our Whole Slide Imaging (WSI) system that performs well in real-world applications. The code and dataset will be made available upon the publication of this paper.
Submitted 10 February, 2025;
originally announced February 2025.
-
MOL-Mamba: Enhancing Molecular Representation with Structural & Electronic Insights
Authors:
Jingjing Hu,
Dan Guo,
Zhan Si,
Deguang Liu,
Yunfeng Diao,
Jing Zhang,
Jinxing Zhou,
Meng Wang
Abstract:
Molecular representation learning plays a crucial role in various downstream tasks, such as molecular property prediction and drug design. To accurately represent molecules, Graph Neural Networks (GNNs) and Graph Transformers (GTs) have shown potential in the realm of self-supervised pretraining. However, existing approaches often overlook the relationship between molecular structure and electronic information, as well as the internal semantic reasoning within molecules. This omission of fundamental chemical knowledge in graph semantics leads to incomplete molecular representations, missing the integration of structural and electronic data. To address these issues, we introduce MOL-Mamba, a framework that enhances molecular representation by combining structural and electronic insights. MOL-Mamba consists of an Atom & Fragment Mamba-Graph (MG) for hierarchical structural reasoning and a Mamba-Transformer (MT) fuser for integrating molecular structure and electronic correlation learning. Additionally, we propose a Structural Distribution Collaborative Training and E-semantic Fusion Training framework to further enhance molecular representation learning. Extensive experiments demonstrate that MOL-Mamba outperforms state-of-the-art baselines across eleven chemical-biological molecular datasets.
Submitted 5 February, 2025; v1 submitted 20 December, 2024;
originally announced December 2024.
-
Precise Antigen-Antibody Structure Predictions Enhance Antibody Development with HelixFold-Multimer
Authors:
Jie Gao,
Jing Hu,
Lihang Liu,
Yang Xue,
Kunrui Zhu,
Xiaonan Zhang,
Xiaomin Fang
Abstract:
The accurate prediction of antigen-antibody structures is essential for advancing immunology and therapeutic development, as it helps elucidate molecular interactions that underlie immune responses. Despite recent progress with deep learning models like AlphaFold and RoseTTAFold, accurately modeling antigen-antibody complexes remains a challenge due to their unique evolutionary characteristics. HelixFold-Multimer, a specialized model developed for this purpose, builds on the framework of AlphaFold-Multimer and demonstrates improved precision for antigen-antibody structures. HelixFold-Multimer not only surpasses other models in accuracy but also provides essential insights into antibody development, enabling more precise identification of binding sites, improved interaction prediction, and enhanced design of therapeutic antibodies. These advances underscore HelixFold-Multimer's potential in supporting antibody research and therapeutic innovation.
Submitted 12 December, 2024;
originally announced December 2024.
-
A Heterogeneous Network-based Contrastive Learning Approach for Predicting Drug-Target Interaction
Authors:
Junwei Hu,
Michael Bewong,
Selasi Kwashie,
Wen Zhang,
Vincent M. Nofong,
Guangsheng Wu,
Zaiwen Feng
Abstract:
Drug-target interaction (DTI) prediction is crucial for drug development and repositioning. Methods using heterogeneous graph neural networks (HGNNs) for DTI prediction have become a promising approach, with attention-based models often achieving excellent performance. However, these methods typically overlook edge features when dealing with heterogeneous biomedical networks. We propose a heterogeneous network-based contrastive learning method called HNCL-DTI, which designs a heterogeneous graph attention network to predict potential/novel DTIs. Specifically, our HNCL-DTI utilizes contrastive learning to collaboratively learn node representations from the perspective of both node-based and edge-based attention within the heterogeneous structure of biomedical networks. Experimental results show that HNCL-DTI outperforms existing advanced baseline methods on benchmark datasets, demonstrating strong predictive ability and practical effectiveness. The data and source code are available at https://github.com/Zaiwen/HNCL-DTI.
Submitted 20 October, 2024;
originally announced November 2024.
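The HNCL-DTI abstract above describes contrastive learning across node-based and edge-based attention views. The abstract does not state the exact loss, so the sketch below uses InfoNCE, a common contrastive objective, purely as an assumed stand-in. It treats row i of the two views as embeddings of the same node (a positive pair) and every other row as a negative. The function names and the temperature value are hypothetical.

```python
import math


def _cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss; view_a[i] and view_b[i] embed the same node."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [_cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / n
```

The loss is small when each node's two views agree with each other more than with the views of other nodes, which is the sense in which the two attention perspectives are learned "collaboratively".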
-
Multi-view biomedical foundation models for molecule-target and property prediction
Authors:
Parthasarathy Suryanarayanan,
Yunguang Qiu,
Shreyans Sethi,
Diwakar Mahajan,
Hongyang Li,
Yuxin Yang,
Elif Eyigoz,
Aldo Guzman Saenz,
Daniel E. Platt,
Timothy H. Rumbell,
Kenney Ng,
Sanjoy Dey,
Myson Burch,
Bum Chul Kwon,
Pablo Meyer,
Feixiong Cheng,
Jianying Hu,
Joseph A. Morrone
Abstract:
Foundation models applied to bio-molecular space hold promise to accelerate drug discovery. Molecular representation is key to building such models. Previous works have typically focused on a single representation or view of the molecules. Here, we develop a multi-view foundation model approach that integrates molecular views of graph, image and text. Single-view foundation models are each pre-trained on a dataset of up to 200M molecules and then aggregated into combined representations. Our multi-view model is validated on a diverse set of 18 tasks, encompassing ligand-protein binding, molecular solubility, metabolism and toxicity. We show that the multi-view models perform robustly and are able to balance the strengths and weaknesses of specific views. We then apply this model to screen compounds against a large (>100 targets) set of G protein-coupled receptors (GPCRs). From this library of targets, we identify 33 that are related to Alzheimer's disease. On this subset, we employ our model to identify strong binders, which are validated through structure-based modeling and identification of key binding motifs.
Submitted 31 January, 2025; v1 submitted 25 October, 2024;
originally announced October 2024.
-
Determining cell population size from cell fraction in cell plasticity models
Authors:
Yuman Wang,
Shuli Chen,
Jie Hu,
Da Zhou
Abstract:
Quantifying the size of cell populations is crucial for understanding biological processes such as growth, injury repair, and disease progression. Often, experimental data offer information in the form of relative frequencies of distinct cell types, rather than absolute cell counts. This emphasizes the need to devise effective strategies for estimating absolute cell quantities from fraction data. In response to this challenge, we present two computational approaches grounded in stochastic cell population models: the first-order moment method (FOM) and the second-order moment method (SOM). These methods explicitly establish mathematical mappings from cell fraction to cell population size using moment equations of the stochastic models. Notably, our investigation demonstrates that the SOM method obviates the requirement for a priori knowledge of the initial population size, highlighting the utility of incorporating variance details from cell proportions. The robustness of both the FOM and SOM methods was analyzed from different perspectives. Additionally, we extended the application of the FOM and SOM methods to various biological mechanisms within the context of cell plasticity models. Our methodologies not only assist in mitigating the inherent limitations of experimental techniques when only fraction data is available for detecting cell population size, but they also offer new insights into utilizing the stochastic characteristics of cell population dynamics to quantify interactions between different biomasses within the system.
Submitted 7 May, 2024;
originally announced May 2024.
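The cell-plasticity abstract above maps cell fractions to population size via moment equations. As a rough illustration only, and not the authors' FOM, the sketch below uses an assumed two-type model with linear mean dynamics: growth rates r1 and r2, and switching rates a (type 1 to 2) and b (type 2 to 1). In such a model the switching terms cancel in the total, so the total N grows at rate r1*f1 + r2*(1 - f1). The fraction time series plus a known initial size therefore determines N. All names and parameter values are hypothetical.

```python
def simulate_means(x1, x2, r1, r2, a, b, dt, steps):
    """Euler-integrate the mean counts of the assumed two-type model.

    Returns the type-1 fraction and the total size at each recorded step.
    """
    fractions, totals = [], []
    for _ in range(steps):
        n = x1 + x2
        fractions.append(x1 / n)
        totals.append(n)
        dx1 = (r1 - a) * x1 + b * x2
        dx2 = a * x1 + (r2 - b) * x2
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
    return fractions, totals


def size_from_fractions(fractions, n0, r1, r2, dt):
    """Recover the total size from type-1 fractions and the initial size n0.

    Uses the discrete analogue of dN/dt = N * (r1*f1 + r2*(1 - f1)).
    """
    sizes = [n0]
    for f1 in fractions[:-1]:
        growth = r1 * f1 + r2 * (1.0 - f1)
        sizes.append(sizes[-1] * (1.0 + dt * growth))
    return sizes
```

For example, `simulate_means(90.0, 10.0, 0.5, 0.1, 0.2, 0.05, 0.01, 500)` yields fractions from which `size_from_fractions(fractions, 100.0, 0.5, 0.1, 0.01)` reproduces the simulated totals. The dependence on `n0` in this sketch is consistent with the abstract's remark that it is the second-order method that removes the need for the initial population size.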
-
HelixFold-Multimer: Elevating Protein Complex Structure Prediction to New Heights
Authors:
Xiaomin Fang,
Jie Gao,
Jing Hu,
Lihang Liu,
Yang Xue,
Xiaonan Zhang,
Kunrui Zhu
Abstract:
While monomer protein structure prediction tools boast impressive accuracy, the prediction of protein complex structures remains a daunting challenge in the field. This challenge is particularly pronounced in scenarios involving complexes with protein chains from different species, such as antigen-antibody interactions, where accuracy often falls short. Limited by the accuracy of complex prediction, tasks based on precise protein-protein interaction analysis also face obstacles. In this report, we highlight the ongoing advancements of our protein complex structure prediction model, HelixFold-Multimer, underscoring its enhanced performance. HelixFold-Multimer provides precise predictions for diverse protein complex structures, especially in therapeutic protein interactions. Notably, HelixFold-Multimer achieves remarkable success in antigen-antibody and peptide-protein structure prediction, greatly surpassing AlphaFold 3. HelixFold-Multimer is now available for public use on the PaddleHelix platform, offering both a general version and an antigen-antibody version. Researchers can conveniently access and utilize this service for their development needs.
Submitted 17 May, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Reconstructing Visual Stimulus Images from EEG Signals Based on Deep Visual Representation Model
Authors:
Hongguang Pan,
Zhuoyi Li,
Yunpeng Fu,
Xuebin Qin,
Jianchen Hu
Abstract:
Reconstructing visual stimulus images is a significant task in neural decoding, and to date most studies have used functional magnetic resonance imaging (fMRI) as the signal source. However, fMRI-based image reconstruction methods are difficult to apply widely because of the complexity and high cost of the acquisition equipment. Considering the low cost and easy portability of electroencephalogram (EEG) acquisition equipment, we propose a novel image reconstruction method based on EEG signals in this paper. Firstly, to satisfy the high recognizability of visual stimulus images in a fast-switching manner, we build a visual stimulus image dataset and obtain the EEG dataset through a corresponding EEG signal collection experiment. Secondly, the deep visual representation model (DVRM), consisting of a primary encoder and a subordinate decoder, is proposed to reconstruct visual stimuli. The encoder is designed based on residual-in-residual dense blocks to learn the distribution characteristics between EEG signals and visual stimulus images, while the decoder is designed based on a deep neural network to reconstruct the visual stimulus image from the learned deep visual representation. The DVRM can fit the deep and multiview visual features of the human natural state and make the reconstructed images more precise. Finally, we evaluate the DVRM on the quality of the generated images on our EEG dataset. The results show that the DVRM performs well in learning deep visual representations from EEG signals and generating reconstructed images that are realistic and highly resemble the original images.
Submitted 11 March, 2024;
originally announced March 2024.
-
Morphological entropy encodes cellular migration strategies on multiple length scales
Authors:
Yanping Liu,
Yang Jiao,
Qihui Fan,
Xinwei Li,
Zhichao Liu,
Jun Hu,
Jianwei Shuai,
Liyu Liu,
Zhangyong Li
Abstract:
Cell migration is crucial to many physiological and pathological processes. During migration, a cell adapts its morphology, including the overall morphology and nucleus morphology, in response to various cues in complex microenvironments, e.g. topotaxis and chemotaxis. Thus, cellular morphology dynamics can encode migration strategies, from which various migration mechanisms can be inferred. However, how to decipher the cell migration mechanisms encoded in the morphology dynamics remains a challenging problem. Here we introduce a novel universal metric, namely cell morphological entropy (CME), by combining parametric morphological analysis with Shannon entropy. The utility of CME, which accurately quantifies the complex cellular morphology on multiple length scales through the deviation from the perfect circular shape, is demonstrated using a variety of normal and tumorous cell lines in distinct in vitro microenvironments. Our results reveal 1) the effects of geometric constraints on the cell nucleus, 2) the emerging interplays of MCF-10A cells migrating on collagen gel, and 3) the critical transition of tumor spheroids from proliferation to invasion. The analysis indicates that the CME offers a physically interpretable and efficient tool to quantify morphology on multiple length scales in real time, which provides more insights into cell migration and further contributes to the understanding of the diverse behavioral modes as well as collective cell motility in more complex microenvironments.
Submitted 25 August, 2023;
originally announced August 2023.
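The entry above combines a morphological deviation from a perfect circle with Shannon entropy. As a rough illustration only, the sketch below computes the Shannon entropy of a histogram of relative radial deviations for a cell outline. The function names, the choice of radial deviation, and the binning are all assumptions made for this example and are not the CME definition used by the authors.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits; empty bins contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def radial_deviation_entropy(radii, n_bins=8):
    """Entropy of the histogram of relative deviations from the mean radius.

    `radii` are centroid-to-boundary distances sampled around the outline.
    Hypothetical illustration, not the paper's CME formula.
    """
    mean_r = sum(radii) / len(radii)
    deviations = [abs(r - mean_r) / mean_r for r in radii]
    max_dev = max(deviations)
    if max_dev == 0:
        return 0.0  # perfect circle: every sample falls in a single bin
    counts = [0] * n_bins
    for d in deviations:
        # Scale to [0, n_bins) and clamp the maximum into the last bin.
        idx = min(int(d / max_dev * n_bins), n_bins - 1)
        counts[idx] += 1
    total = len(deviations)
    return shannon_entropy(c / total for c in counts)
```

In this sketch a circular outline yields zero entropy, while an irregular outline spreads its deviations across several bins and yields a larger value.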
-
Bayesian Inference of Phenotypic Plasticity of Cancer Cells Based on Dynamic Model for Temporal Cell Proportion Data
Authors:
Shuli Chen,
Yuman Wang,
Da Zhou,
Jie Hu
Abstract:
Mounting evidence underscores the prevalent hierarchical organization of cancer tissues. At the foundation of this hierarchy reside cancer stem cells, a subset of cells endowed with the pivotal role of engendering the entire cancer tissue through cell differentiation. In recent times, substantial attention has been directed towards the phenomenon of cancer cell plasticity, where the dynamic interconversion between cancer stem cells and non-stem cancer cells has garnered significant interest. Since the task of detecting cancer cell plasticity from empirical data remains a formidable challenge, we propose a Bayesian statistical framework designed to infer phenotypic plasticity within cancer cells, utilizing temporal data on cancer stem cell proportions. Our approach is grounded in a stochastic model, adept at capturing the dynamic behaviors of cells. Leveraging Bayesian analysis, we explore the moment equation governing cancer stem cell proportions, derived from the Kolmogorov forward equation of our stochastic model. Using the improved Euler method for ordinary differential equations, we develop a new statistical method for parameter estimation in nonlinear ordinary differential equation models, which also provides novel ideas for the study of compositional data. Extensive simulations robustly validate the efficacy of our proposed method. To further corroborate our findings, we apply our approach to analyze published data from SW620 colon cancer cell lines. Our results harmonize with in situ experiments, thereby reinforcing the utility of our method in discerning and quantifying phenotypic plasticity within cancer cells.
Submitted 14 August, 2023;
originally announced August 2023.
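The entry above relies on the improved Euler method (also known as Heun's method) for ordinary differential equations. A minimal sketch of that integrator follows. The function name and the simple proportion model used in the usage note are assumptions for illustration and are not taken from the paper.

```python
def improved_euler(f, t0, y0, h, n_steps):
    """Integrate dy/dt = f(t, y) with the improved Euler (Heun) method.

    Returns the list of states y_0, y_1, ..., y_n at step size h.
    """
    t, y = t0, y0
    ys = [y0]
    for _ in range(n_steps):
        k1 = f(t, y)               # slope at the start of the step
        k2 = f(t + h, y + h * k1)  # slope at the Euler-predicted end point
        y = y + 0.5 * h * (k1 + k2)  # average the two slopes
        t = t + h
        ys.append(y)
    return ys
```

As a usage example, a hypothetical two-state proportion model dp/dt = a(1 - p) - bp can be integrated with `improved_euler(lambda t, p: a * (1 - p) - b * p, 0.0, p0, h, n)`. Because the method is second-order accurate, halving the step size reduces the global error by roughly a factor of four.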
-
Evaluation of network-guided random forest for disease gene discovery
Authors:
Jianchang Hu,
Silke Szymczak
Abstract:
Gene network information is believed to be beneficial for disease module and pathway identification, but has not been explicitly utilized in the standard random forest (RF) algorithm for gene expression data analysis. We investigate the performance of a network-guided RF where the network information is summarized into a sampling probability of predictor variables which is further used in the construction of the RF. Our results suggest that network-guided RF does not provide better disease prediction than the standard RF. In terms of disease gene discovery, if disease genes form module(s), network-guided RF identifies them more accurately. In addition, when disease status is independent from genes in the given network, spurious gene selection results can occur when using network information, especially on hub genes. Our empirical analysis on two balanced microarray and RNA-Seq breast cancer datasets from The Cancer Genome Atlas (TCGA) for classification of progesterone receptor (PR) status also demonstrates that network-guided RF can identify genes from PGR-related pathways, which leads to a better connected module of identified genes.
Submitted 2 August, 2023;
originally announced August 2023.
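The entry above summarizes network information into a sampling probability over predictor variables, which then guides random forest construction. The sketch below illustrates one way such a scheme could look: probabilities proportional to smoothed node degree, followed by weighted sampling without replacement of the candidate features for a split. The degree-based summary, the smoothing constant, and both function names are assumptions for this example. The paper may derive its probabilities differently.

```python
import random


def degree_sampling_probabilities(edges, genes):
    """Map each gene to a probability proportional to (degree + 1).

    The +1 smoothing keeps isolated genes selectable (an assumption).
    """
    degree = {g: 1 for g in genes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    total = sum(degree.values())
    return {g: degree[g] / total for g in genes}


def sample_candidate_features(probabilities, mtry, rng):
    """Draw `mtry` distinct genes, weighted by their sampling probability."""
    remaining = dict(probabilities)
    chosen = []
    for _ in range(mtry):
        genes = list(remaining)
        weights = [remaining[g] for g in genes]
        pick = rng.choices(genes, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # without replacement
    return chosen
```

In this sketch hub genes receive higher probability. This is consistent with the abstract's caution that hub genes can be selected spuriously when disease status is unrelated to the network.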
-
BrainNPT: Pre-training of Transformer networks for brain network classification
Authors:
Jinlong Hu,
Yangmin Huang,
Nan Wang,
Shoubin Dong
Abstract:
Deep learning methods have advanced quickly in brain imaging analysis over the past few years, but they are usually restricted by limited labeled data. Pre-training models on unlabeled data has shown promising improvements in feature learning in many domains, including natural language processing and computer vision. However, this technique is under-explored in brain network analysis. In this paper, we focus on pre-training methods with Transformer networks to leverage existing unlabeled data for brain functional network classification. First, we propose a Transformer-based neural network, named BrainNPT, for brain functional network classification. The proposed method leverages the <cls> token as a classification embedding vector for the Transformer model to effectively capture the representation of the brain network. Second, we propose a pre-training framework for the BrainNPT model to leverage unlabeled brain network data to learn the structure information of brain networks. The results of classification experiments demonstrated the BrainNPT model without pre-training achieved the best performance with the state-of-the-art models, and the BrainNPT model with pre-training strongly outperformed the state-of-the-art models. The pre-trained BrainNPT model improved accuracy by 8.75% compared with the model without pre-training. We further compared the pre-training strategies, analyzed the influence of the parameters of the model, and interpreted the trained model.
Submitted 2 August, 2023; v1 submitted 2 May, 2023;
originally announced May 2023.
-
Designing novel protein structures using sequence generator and AlphaFold2
Authors:
Xeerak Agha,
Nihang Fu,
Jianjun Hu
Abstract:
Protein structures and functions are determined by a contiguous arrangement of amino acid sequences. Designing novel protein sequences and structures with desired geometry and functions is a complex task with large state spaces. Here we develop a novel protein design pipeline consisting of two deep learning algorithms, ProteinSolver and AlphaFold2. ProteinSolver is a deep graph neural network that generates amino acid sequences such that the forces between interacting amino acids are favorable and compatible with the fold while AlphaFold2 is a deep learning algorithm that predicts the protein structures from protein sequences. We present forty de novo designed binding sites of the PTP1B and P53 proteins with high precision, out of which thirty proteins are novel. Using ProteinSolver and AlphaFold2 in conjunction, we can trim the exploration of the large protein conformation space, thus expanding the ability to find novel and diverse de novo protein designs.
Submitted 30 August, 2022;
originally announced August 2022.
-
TCR: A Transformer Based Deep Network for Predicting Cancer Drugs Response
Authors:
Jie Gao,
Jing Hu,
Wanqing Sun,
Yili Shen,
Xiaonan Zhang,
Xiaomin Fang,
Fan Wang,
Guodong Zhao
Abstract:
Predicting clinical outcomes of anti-cancer drugs on a personalized basis is challenging in cancer treatment due to the heterogeneity of tumors. Traditional computational efforts have been made to model drug response on individual samples described by their molecular profiles, yet overfitting occurs because of the high dimensionality of omics data, hindering clinical application of the models. Recent research shows that deep learning is a promising approach to building drug response models by learning alignment patterns between drugs and samples. However, existing studies employed a simple feature fusion strategy and only considered the drug features as a whole representation while ignoring the substructure information that may play a vital role when aligning drugs and genes. In this paper, we propose TCR (Transformer based network for Cancer drug Response) to predict anti-cancer drug response. By utilizing an attention mechanism, TCR is able to efficiently learn the interactions between drug atoms/substructures and molecular signatures. Furthermore, a dual loss function and a cross sampling strategy were designed to improve the prediction power of TCR. We show that TCR outperformed all other methods under various data splitting strategies on all evaluation metrics (some with significant improvement). Extensive experiments demonstrate that TCR shows significantly improved generalization ability on independent in-vitro experiments and in-vivo real patient data. Our study highlights the prediction power of TCR and its potential value for cancer drug repurposing and precision oncology treatment.
Submitted 10 July, 2022;
originally announced July 2022.
-
An Onsager-Machlup approach to the most probable transition pathway for a genetic regulatory network
Authors:
Jianyu Hu,
Xiaoli Chen,
Jinqiao Duan
Abstract:
We investigate a quantitative network of gene expression dynamics describing competence development in Bacillus subtilis. First, we introduce an Onsager-Machlup approach to quantify the most probable transition pathway for both excitable and bistable dynamics. Then, we apply a machine learning method to calculate the most probable transition pathway via the Euler-Lagrange equation. Finally, we analyze how the noise intensity affects the transition phenomena.
Submitted 1 March, 2022;
originally announced March 2022.
-
RZiMM-scRNA: A regularized zero-inflated mixture model framework for single-cell RNA-seq data
Authors:
Xinlei Mi,
William Bekerman,
Peter A. Sims,
Peter D. Canoll,
Jianhua Hu
Abstract:
Applications of single-cell RNA sequencing in various biomedical research areas have been blooming. This new technology provides unprecedented opportunities to study disease heterogeneity at the cellular level. However, unique characteristics of scRNA-seq data, including large dimensionality, high dropout rates, and possibly batch effects, bring great difficulty into the analysis of such data. Not appropriately addressing these issues obstructs true scientific discovery. Herein, we propose a unified Regularized Zero-inflated Mixture Model framework designed for scRNA-seq data (RZiMM-scRNA) to simultaneously detect cell subgroups and identify gene differential expression based on a developed importance score, accounting for both dropouts and batch effects. We conduct extensive simulation studies in which we evaluate the performance of RZiMM-scRNA and compare it with several popular methods, including Seurat, SC3, K-Means, and Hierarchical Clustering. Simulation results show that RZiMM-scRNA demonstrates superior clustering performance and enhanced biomarker detection accuracy compared to alternative methods, especially when cell subgroups are less distinct, verifying the robustness of our method. Our empirical investigations focus on two brain tumor studies dealing with astrocytoma of various grades, including the most malignant of all brain tumors, glioblastoma multiforme (GBM). Our goal is to delineate cell heterogeneity and identify driving biomarkers associated with these tumors. Notably, RZiMM-scRNA successfully identifies a small group of oligodendrocyte cells which has drawn much attention in biomedical literature on brain cancers.
Submitted 25 October, 2021;
originally announced October 2021.
-
Rapid genetic screening with high quality factor metasurfaces
Authors:
Jack Hu,
Fareeha Safir,
Kai Chang,
Sahil Dagli,
Halleh B. Balch,
John M. Abendroth,
Jefferson Dixon,
Parivash Moradifar,
Varun Dolia,
Malaya K. Sahoo,
Benjamin A. Pinsky,
Stefanie S. Jeffrey,
Mark Lawrence,
Jennifer A. Dionne
Abstract:
Genetic analysis methods are foundational to advancing personalized and preventative medicine, accelerating disease diagnostics, and monitoring the health of organisms and ecosystems. Current nucleic acid technologies such as polymerase chain reaction (PCR), next-generation sequencing (NGS), and DNA microarrays rely on fluorescence and absorbance, necessitating sample amplification or replication and leading to increased processing time and cost. Here, we introduce a label-free genetic screening platform based on high quality (high-Q) factor silicon nanoantennas functionalized with monolayers of nucleic acid fragments. Each nanoantenna exhibits substantial electromagnetic field enhancements with sufficiently localized fields to ensure isolation from neighboring resonators, enabling dense biosensor integration. We quantitatively detect complementary target sequences using DNA hybridization simultaneously for arrays of sensing elements patterned at densities of 160,000 pixels per cm$^2$. In physiological buffer, our nanoantennas exhibit average resonant quality factors of 2,200, allowing detection of two gene fragments, SARS-CoV-2 envelope (E) and open reading frame 1b (ORF1b), down to femtomolar concentrations. We also demonstrate high specificity sensing in clinical nasopharyngeal eluates within 5 minutes of sample introduction. Combined with advances in biomarker isolation from complex samples (e.g., mucus, blood, wastewater), our work provides a foundation for rapid, compact, amplification-free and high throughput multiplexed genetic screening assays spanning medical diagnostics to environmental monitoring.
Submitted 31 July, 2022; v1 submitted 15 October, 2021;
originally announced October 2021.
-
DGL-LifeSci: An Open-Source Toolkit for Deep Learning on Graphs in Life Science
Authors:
Mufei Li,
Jinjing Zhou,
Jiajing Hu,
Wenxuan Fan,
Yangkang Zhang,
Yaxin Gu,
George Karypis
Abstract:
Graph neural networks (GNNs) constitute a class of deep learning methods for graph data. They have wide applications in chemistry and biology, such as molecular property prediction, reaction prediction and drug-target interaction prediction. Despite the interest, GNN-based modeling is challenging as it requires graph data pre-processing and modeling in addition to programming and deep learning. Here we present DGL-LifeSci, an open-source package for deep learning on graphs in life science. DGL-LifeSci is a Python toolkit based on RDKit, PyTorch and Deep Graph Library (DGL). DGL-LifeSci allows GNN-based modeling on custom datasets for molecular property prediction, reaction prediction and molecule generation. With its command-line interfaces, users can perform modeling without any background in programming and deep learning. We test the command-line interfaces using standard benchmarks MoleculeNet, USPTO, and ZINC. Compared with previous implementations, DGL-LifeSci achieves a speedup of up to 6x. For modeling flexibility, DGL-LifeSci provides well-optimized modules for various stages of the modeling pipeline. In addition, DGL-LifeSci provides pre-trained models for reproducing the test experiment results and applying models without training. The code is distributed under an Apache-2.0 License and is freely accessible at https://github.com/awslabs/dgl-lifesci.
Submitted 27 June, 2021;
originally announced June 2021.
-
Algebraic Model Selection and Experimental Design in Biological Data Science
Authors:
Anyu Zhang,
Jingzhen Hu,
Qingzhong Liang,
Elena S. Dimitrova,
Brandilyn Stigler
Abstract:
Design of experiments and model selection, though essential steps in data science, are usually viewed as unrelated processes in the study and analysis of biological networks. Not accounting for their inter-relatedness has the potential to introduce bias and increase the risk of missing salient features in the modeling process. We propose a data-driven computational framework to unify experimental design and model selection for discrete data sets and minimal polynomial models. We use a special affine transformation, called a linear shift, to provide both the data sets and the polynomial terms that form a basis for a model. This framework enables us to address two important questions that arise in biological data science research: finding the data which identify a set of known interactions and finding identifiable interactions given a set of data. We present the theoretical foundation for a web-accessible database. As an example, we apply this methodology to a previously constructed pharmacodynamic model of epidermal derived growth factor receptor (EGFR) signaling.
Submitted 22 January, 2021;
originally announced January 2021.
-
CovidNet: To Bring Data Transparency in the Era of COVID-19
Authors:
Tong Yang,
Kai Shen,
Sixuan He,
Enyu Li,
Peter Sun,
Pingying Chen,
Lin Zuo,
Jiayue Hu,
Yiwen Mo,
Weiwei Zhang,
Haonan Zhang,
Jingxue Chen,
Yu Guo
Abstract:
Timely, credible, and fine-grained case information is vital for local communities and individual citizens to make rational and data-driven responses to the COVID-19 pandemic. This paper presents CovidNet, a COVID-19 tracking project associated with a large-scale epidemic dataset, which was initiated by 1Point3Acres. To the best of our knowledge, the project is the only platform providing real-time global case information for more than 4,124 sub-divisions from over 27 countries worldwide with multi-language support. The platform also offers interactive visualization tools to analyze the full historical case curves in each region. Initially launched in January 2020 as a voluntary project to bridge the data transparency gap in North America, this project has since become one of the major independent sources worldwide and has been consumed by many other tracking platforms. The accuracy and freshness of the dataset are a result of the painstaking efforts of our volunteer team, crowd-sourcing channels, and automated data pipelines. As of May 18, 2020, the project website has been visited more than 200 million times and the CovidNet dataset has empowered over 522 institutions and organizations worldwide in policy-making and academic research. All datasets are openly accessible for non-commercial purposes at https://coronavirus.1point3acres.com via a formal request through our APIs.
Submitted 20 July, 2020; v1 submitted 21 May, 2020;
originally announced May 2020.
-
COVID-19 Docking Server: A meta server for docking small molecules, peptides and antibodies against potential targets of COVID-19
Authors:
Ren Kong,
Guangbo Yang,
Rui Xue,
Ming Liu,
Feng Wang,
Jianping Hu,
Xiaoqiang Guo,
Shan Chang
Abstract:
Motivation: The coronavirus disease 2019 (COVID-19), caused by a new type of coronavirus, emerged from China and has led to thousands of deaths globally since December 2019. Although many groups have engaged in studying the newly emerged virus and searching for treatments for COVID-19, understanding COVID-19 target-ligand interactions remains a key challenge. Herein, we introduce COVID-19 Docking Server, a web server that predicts the binding modes between COVID-19 targets and ligands, including small molecules, peptides and antibodies. Results: Structures of proteins involved in the virus life cycle were collected or constructed based on the homologs of coronavirus, and prepared for docking. The meta platform provides a free and interactive tool for the prediction of COVID-19 target-ligand interactions and subsequent drug discovery for COVID-19.
Submitted 7 August, 2020; v1 submitted 28 February, 2020;
originally announced March 2020.
-
TF3P: Three-dimensional Force Fields Fingerprint Learned by Deep Capsular Network
Authors:
Yanxing Wang,
Jianxing Hu,
Junyong Lai,
Yibo Li,
Hongwei Jin,
Lihe Zhang,
Liangren Zhang,
Zhenming Liu
Abstract:
Molecular fingerprints are the workhorse in ligand-based drug discovery. In recent years, an increasing number of research papers have reported fascinating results on using deep neural networks to learn 2D molecular representations as fingerprints. It is anticipated that the integration of deep learning would also contribute to the prosperity of 3D fingerprints. Here, we introduce deep learning into 3D small-molecule fingerprints for the first time, presenting a new fingerprint that we term the three-dimensional force fields fingerprint (TF3P). TF3P is learned by a deep capsular network whose training does not require labeled datasets for specific predictive tasks. TF3P can encode the 3D force field information of molecules and demonstrates a stronger ability to capture 3D structural changes, to recognize molecules that are alike in 3D but not in 2D, and to identify similar targets inaccessible to other 2D or 3D fingerprints based only on ligand similarity. Furthermore, TF3P is compatible with both statistical models (e.g. similarity ensemble approach) and machine learning models. Altogether, we report TF3P as a new 3D small-molecule fingerprint with a promising future in ligand-based drug discovery. All code is written in Python and available at https://github.com/canisw/tf3p.
Submitted 16 May, 2020; v1 submitted 24 December, 2019;
originally announced December 2019.
-
Binding and segregation of proteins in membrane adhesion: Theory, modelling, and simulations
Authors:
Thomas R. Weikl,
Jinglei Hu,
Batuhan Kav,
Bartosz Rozycki
Abstract:
The adhesion of biomembranes is mediated by the binding of membrane-anchored receptor and ligand proteins. The proteins can only bind if the separation between apposing membranes is sufficiently close to the length of the protein complexes, which leads to an interplay between protein binding and membrane shape. In this article, we review current models of biomembrane adhesion and novel insights obtained from the models. Theory and simulations with elastic-membrane and coarse-grained molecular models of biomembrane adhesion indicate that the binding of proteins in membrane adhesion strongly depends on nanoscale shape fluctuations of the apposing membranes, which results in binding cooperativity. A length mismatch between protein complexes leads to repulsive interactions that are caused by membrane bending and act as a driving force for the length-based segregation of proteins during membrane adhesion.
Submitted 21 November, 2019;
originally announced November 2019.
-
In Silico Prediction of Cell Traction Forces
Authors:
Nicolas Pielawski,
Jianjiang Hu,
Staffan Strömblad,
Carolina Wählby
Abstract:
Traction Force Microscopy (TFM) is a technique used to determine the tensions that a biological cell conveys to the underlying surface. Typically, TFM requires culturing cells on gels with fluorescent beads, followed by bead displacement calculations. We present a new method that predicts those forces from a regular fluorescent image of the cell. Using Deep Learning, we trained a Bayesian Neural Network adapted for pixel regression of the forces and show that it generalises on different cells of the same strain. The predicted forces are computed along with an approximated uncertainty, which shows whether the prediction is trustworthy or not. Using the proposed method could help estimate forces when calculating non-trivial bead displacements and can also free one of the fluorescent channels of the microscope. Code is available at https://github.com/wahlby-lab/InSilicoTFM.
Submitted 16 October, 2019;
originally announced October 2019.
-
DeepScaffold: a comprehensive tool for scaffold-based de novo drug discovery using deep learning
Authors:
Yibo Li,
Jianxing Hu,
Yanxing Wang,
Jielong Zhou,
Liangren Zhang,
Zhenming Liu
Abstract:
The ultimate goal of drug design is to find novel compounds with desirable pharmacological properties. Designing molecules retaining particular scaffolds as the core structures of the molecules is one of the efficient ways to obtain potential drug candidates with desirable properties. We proposed a scaffold-based molecular generative model for scaffold-based drug discovery, which performs molecule generation based on a wide spectrum of scaffold definitions, including BM-scaffolds, cyclic skeletons, as well as scaffolds with specifications on side-chain properties. The model can generalize the learned chemical rules of adding atoms and bonds to a given scaffold. Furthermore, the generated compounds were evaluated by molecular docking in DRD2 targets and the results demonstrated that this approach can be effectively applied to solve several drug design problems, including the generation of compounds containing a given scaffold and de novo drug design of potential drug candidates with specific docking scores. Finally, a command line interface is created.
Submitted 4 September, 2019; v1 submitted 20 August, 2019;
originally announced August 2019.
-
Bayesian Detection of Abnormal ADS in Mutant Caenorhabditis elegans Embryos
Authors:
Wei Liang,
Yuxiao Yang,
Yusi Fang,
Zhongying Zhao,
Jie Hu
Abstract:
Cell division timing is critical for cell fate specification and morphogenesis during embryogenesis. How division timings are regulated among cells during development is poorly understood. Here we focus on the comparison of asynchrony of division between sister cells (ADS) between wild-type and mutant individuals of Caenorhabditis elegans. Since the replicate number of mutant individuals of each mutated gene, usually one, is far smaller than that of wild-type, direct comparison of the two distributions of ADS between wild-type and mutant type, such as the Kolmogorov-Smirnov test, is not feasible. On the other hand, we find that sometimes ADS is correlated with the life span of the corresponding mother cell in wild-type. Hence, we apply a semiparametric Bayesian quantile regression method to estimate the 95% confidence interval curve of ADS with respect to the life span of the mother cell of wild-type individuals. Then, mutant-type ADSs outside the corresponding confidence interval are selected as abnormal with a significance level of 0.05. A simulation study demonstrates the accuracy of our method and Gene Enrichment Analysis validates the results of real data sets.
Submitted 13 March, 2018;
originally announced March 2018.
-
A Bayesian statistical analysis of stochastic phenotypic plasticity model of cancer cells
Authors:
Da Zhou,
Shanjun Mao,
Kaiyi Chen,
Xiaofang Cao,
Jie Hu
Abstract:
The phenotypic plasticity of cancer cells has received special attention in recent years. Even though related models have been widely studied in terms of mathematical properties, a thorough statistical analysis of parameter estimation and model selection is still lacking. In this study, we present a Bayesian approach on the relative frequencies of cancer stem cells (CSCs). Both Gibbs sampling and the Metropolis-Hastings (MH) algorithm are used to perform point and interval estimations of cell-state transition rates between CSCs and non-CSCs. Extensive simulations demonstrate the validity of our model and algorithm. By applying this method to published data on the SW620 colon cancer cell line, the model selection favors the phenotypic plasticity model, relative to the conventional hierarchical model of cancer cells. Moreover, it is found that the initial state of CSCs after cell sorting significantly influences the occurrence of phenotypic plasticity.
Submitted 4 December, 2017;
originally announced December 2017.
-
Predicting disease-related genes by path-based similarity and community structure in protein-protein interaction network
Authors:
Ke Hu,
Jing-Bo Hu,
Ju Xiang,
Hui-Jia Li,
Yan Zhang,
Shi Chen,
Chen-He Yi
Abstract:
Network-based computational approaches to predict unknown genes associated with certain diseases are of considerable significance for uncovering the molecular basis of human diseases. In this paper, we proposed a new class of disease-gene prediction methods by combining the path-based similarity with the community structure in the human protein-protein interaction network. Firstly, we introduced a set of path-based similarity indices, a novel community-based similarity index, and a new similarity combining the path-based similarity index. Then we assessed the statistical significance of the measures in distinguishing the disease genes from non-disease genes, to confirm their availability in predicting disease genes. Finally, we applied these measures to the disease-gene prediction of single disease-gene family, and analyzed in detail the performance of these measures in disease-gene prediction, especially the effect of the community structure on the prediction performance. The results indicated that genes associated with the same or similar diseases commonly reside in the same community of the protein-protein interaction network, and that the community structure is greatly helpful for disease-gene prediction.
Submitted 21 July, 2017;
originally announced July 2017.
-
White matter deficits underlie the loss of consciousness level and predict recovery outcome in disorders of consciousness
Authors:
Xuehai Wu,
Jiaying Zhang,
Zaixu Cui,
Weijun Tang,
Chunhong Shao,
Jin Hu,
Jianhong Zhu,
Liangfu Zhou,
Yao Zhao,
Lu Lu,
Gang Chen,
Georg Northoff,
Gaolang Gong,
Ying Mao,
Yong He
Abstract:
This study aimed to identify white matter (WM) deficits underlying the loss of consciousness in disorder of consciousness (DOC) patients using Diffusion Tensor Imaging (DTI) and to demonstrate the potential value of DTI parameters in predicting recovery outcomes of DOC patients. With 30 DOC patients (8 comatose, 8 unresponsive wakefulness syndrome/vegetative state, and 14 minimally conscious state) and 25 patient controls, we performed group comparison of DTI parameters across 48 core WM regions of interest (ROIs) using Analysis of Covariance. Compared with controls, DOC patients had decreased fractional anisotropy (FA) and increased diffusivities in widespread WM areas. The corresponding DTI parameters of those WM deficits in DOC patients significantly correlated with the consciousness level evaluated by Coma Recovery Scale Revised (CRS-R) and Glasgow Coma Scale (GCS). As for predicting the recovery outcomes (i.e., regaining consciousness or not, grouped by their Glasgow Outcome Scale more than 2 or not) at 3 months post scan, radial diffusivity of left superior cerebellar peduncle and FA of right sagittal stratum reached an accuracy of 87.5% and 75% respectively. Our findings showed multiple WM deficits underlying the loss of consciousness level, and demonstrated the potential value of these WM areas in predicting the recovery outcomes of DOC patients who have lost awareness of the environment and themselves.
Submitted 24 November, 2016;
originally announced November 2016.
-
Binding equilibrium and kinetics of membrane-anchored receptors and ligands in cell adhesion: insights from computational model systems and theory
Authors:
Thomas R. Weikl,
Jinglei Hu,
Guang-Kui Xu,
Reinhard Lipowsky
Abstract:
The adhesion of cell membranes is mediated by the binding of membrane-anchored receptor and ligand proteins. In this article, we review recent results from simulations and theory that lead to novel insights on how the binding equilibrium and kinetics of these proteins is affected by the membranes and by the membrane anchoring and molecular properties of the proteins. Simulations and theory both indicate that the binding equilibrium constant K2D and the on- and off-rate constants of anchored receptors and ligands in their 'two-dimensional' (2D) membrane environment strongly depend on the membrane roughness from thermally excited shape fluctuations on nanoscales. Recent theory corroborated by simulations provides a general relation between K2D and the binding constant K3D of soluble variants of the receptors and ligands that lack the membrane anchors and are free to diffuse in three dimensions (3D).
Submitted 14 June, 2016;
originally announced June 2016.
-
Primer on the Gene Ontology
Authors:
Pascale Gaudet,
Nives Škunca,
James C. Hu,
Christophe Dessimoz
Abstract:
The Gene Ontology (GO) project is the largest resource for cataloguing gene function. The combination of solid conceptual underpinnings and a practical set of features has made the GO a widely adopted resource in the research community and an essential resource for data analysis. In this chapter, we provide a concise primer for all users of the GO. We briefly introduce the structure of the ontology and explain how to interpret annotations associated with the GO.
Submitted 4 February, 2016;
originally announced February 2016.
-
A pathway-based network analysis of hypertension-related genes
Authors:
Huan Wang,
Jing-Bo Hu,
Chuan-Yun Xu,
De-Hai Zhang,
Qian Yan,
Ming Xu,
Ke-Fei Cao,
Xu-Sheng Zhang
Abstract:
The complex network approach has become an effective way to describe interrelationships among large amounts of biological data, which is especially useful in finding core functions and global behavior of biological systems. Hypertension is a complex disease with many causes, including genetic, physiological, psychological and even social factors. In this paper, based on the information of biological pathways, we construct a network model of hypertension-related genes of the salt-sensitive rat to explore the interrelationship between genes. Statistical and topological characteristics show that the network has the small-world but not the scale-free property, and exhibits a modular structure, revealing compact and complex connections among these genes. With a threshold of integrated centrality larger than 0.71, seven key hub genes are found: Jun, Rps6kb1, Cycs, Creb3l2, Cdk4, Actg1 and RT1-Da. These genes should play an important role in hypertension, suggesting that the treatment of hypertension should focus on the combination of drugs on multiple genes.
Submitted 27 January, 2016;
originally announced January 2016.
-
A complex network analysis of hypertension-related genes
Authors:
Huan Wang,
Chuan-Yun Xu,
Jing-Bo Hu,
Ke-Fei Cao
Abstract:
In this paper, a network of hypertension-related genes is constructed by analyzing the correlations of gene expression data among the Dahl salt-sensitive rat and two consomic rat strains. The numerical calculations show that this sparse and assortative network has small-world and scale-free properties. Further, 16 key hub genes (Col4a1, Lcn2, Cdk4, etc.) are determined by introducing an integrated centrality and have been confirmed by biological/medical research to play important roles in hypertension.
Submitted 26 January, 2016;
originally announced January 2016.
-
Binding kinetics of membrane-anchored receptors and ligands: molecular dynamics simulations and theory
Authors:
Jinglei Hu,
Guang-Kui Xu,
Reinhard Lipowsky,
Thomas R. Weikl
Abstract:
The adhesion of biological membranes is mediated by the binding of membrane-anchored receptor and ligand proteins. Central questions are how the binding kinetics of these proteins is affected by the membranes and by the membrane anchoring of the proteins. In this article, we (i) present detailed data for the binding of membrane-anchored proteins from coarse-grained molecular dynamics simulations, and (ii) provide a theory that describes how the binding kinetics depends on the average separation and thermal roughness of the adhering membranes, and on the anchoring, lengths, and length variations of the proteins. An important element of our theory is the tilt of bound receptor-ligand complexes and transition-state complexes relative to the membrane normals. This tilt results from an interplay of the anchoring energy and rotational entropy of the complexes and facilitates the formation of receptor-ligand bonds at membrane separations smaller than the preferred separation for binding. In our simulations, we have considered both lipid-anchored and transmembrane receptor and ligand proteins. We find that the binding equilibrium constant and binding on-rate constant of lipid-anchored proteins are considerably smaller than the binding constant and on-rate constant of rigid transmembrane proteins with identical binding domains.
Submitted 24 November, 2015;
originally announced November 2015.
-
Binding constants of membrane-anchored receptors and ligands: a general theory corroborated by Monte Carlo simulations
Authors:
Guang-Kui Xu,
Jinglei Hu,
Reinhard Lipowsky,
Thomas R. Weikl
Abstract:
Adhesion processes of biological membranes that enclose cells and cellular organelles are essential for immune responses, tissue formation, and signaling. These processes depend sensitively on the binding constant K2D of the membrane-anchored receptor and ligand proteins that mediate adhesion, which is difficult to measure in the 'two-dimensional' (2D) membrane environment of the proteins. An important problem therefore is to relate K2D to the binding constant K3D of soluble variants of the receptors and ligands that lack the membrane anchors and are free to diffuse in three dimensions (3D). In this article, we present a general theory for the binding constants K2D and K3D of rather stiff proteins whose main degrees of freedom are translation and rotation, along membranes and around anchor points 'in 2D', or unconstrained 'in 3D'. The theory generalizes previous results by describing how K2D depends both on the average separation and thermal nanoscale roughness of the apposing membranes, and on the length and anchoring flexibility of the receptors and ligands. Our theoretical results for the ratio K2D/K3D of the binding constants agree with detailed results from Monte Carlo simulations without any data fitting, which indicates that the theory captures the essential features of the 'dimensionality reduction' due to membrane anchoring. In our Monte Carlo simulations, we consider a novel coarse-grained model of biomembrane adhesion in which the membranes are represented as discretized elastic surfaces, and the receptors and ligands as anchored molecules that diffuse continuously along the membranes and rotate at their anchor points.
Submitted 24 November, 2015;
originally announced November 2015.
-
Spatially Adaptive Stochastic Methods for Fluid-Structure Interactions Subject to Thermal Fluctuations in Domains with Complex Geometries
Authors:
Pat Plunkett,
Jon Hu,
Chris Siefert,
Paul J. Atzberger
Abstract:
We develop stochastic mixed finite element methods for spatially adaptive simulations of fluid-structure interactions when subject to thermal fluctuations. To account for thermal fluctuations, we introduce a discrete fluctuation-dissipation balance condition to develop compatible stochastic driving fields for our discretization. We perform analysis that shows our condition is sufficient to ensure results consistent with statistical mechanics. We show the Gibbs-Boltzmann distribution is invariant under the stochastic dynamics of the semi-discretization. To generate efficiently the required stochastic driving fields, we develop a Gibbs sampler based on iterative methods and multigrid to generate fields with $O(N)$ computational complexity. Our stochastic methods provide an alternative to uniform discretizations on periodic domains that rely on Fast Fourier Transforms. To demonstrate in practice our stochastic computational methods, we investigate within channel geometries having internal obstacles and no-slip walls how the mobility/diffusivity of particles depends on location. Our methods extend the applicability of fluctuating hydrodynamic approaches by allowing for spatially adaptive resolution of the mechanics and for domains that have complex geometries relevant in many applications.
Submitted 22 November, 2013;
originally announced November 2013.
-
Geometric friction directs cell migration
Authors:
M. Le Berre,
Yan-Jun Liu,
J. Hu,
P. Maiuri,
O. Bénichou,
R. Voituriez,
Y. Chen,
M. Piel
Abstract:
In the absence of environmental cues, a migrating cell performs an isotropic random motion. Recently, the breaking of this isotropy has been observed when cells move in the presence of asymmetric adhesive patterns. However, up to now the mechanisms at work to direct cell migration in such environments remain unknown. Here, we show that a non-adhesive surface with asymmetric micro-geometry consisting of dense arrays of tilted micro-pillars can direct cell motion. Our analysis reveals that most features of cell trajectories, including the bias, can be reproduced by a simple model of active Brownian particle in a ratchet potential, which we suggest originates from a generic elastic interaction of the cell body with the environment. The observed guiding effect, independent of adhesion, is therefore robust and could be used to direct cell migration both in vitro and in vivo.
Submitted 15 October, 2013;
originally announced October 2013.