-
Integrating Deep Learning and Synthetic Biology: A Co-Design Approach for Enhancing Gene Expression via N-terminal Coding Sequences
Authors:
Zhanglu Yan,
Weiran Chu,
Yuhua Sheng,
Kaiwen Tang,
Shida Wang,
Yanfeng Liu,
Weng-Fai Wong
Abstract:
The N-terminal coding sequence (NCS) influences gene expression by affecting the translation initiation rate. The NCS optimization problem is to find an NCS that maximizes gene expression, and it is important in genetic engineering. However, current methods for NCS optimization, such as rational design and statistics-guided approaches, are labor-intensive and yield only relatively small improvements. This paper introduces a deep learning/synthetic biology co-designed few-shot training workflow for NCS optimization. Our method uses k-nearest encoding followed by word2vec to encode the NCS, performs feature extraction using attention mechanisms, and then constructs a time-series network to predict gene expression intensity; finally, a direct search algorithm identifies the optimal NCS with limited training data. We took green fluorescent protein (GFP) expressed by Bacillus subtilis as the reporter protein for NCSs, and employed the fluorescence enhancement factor as the metric of NCS optimization. Within just six iterative experiments, our model generated an NCS (MLD62) that increased average GFP expression by 5.41-fold, outperforming state-of-the-art NCS designs. Extending our findings beyond GFP, we showed that our engineered NCS (MLD62) can effectively boost the production of N-acetylneuraminic acid by enhancing the expression of the crucial rate-limiting GNA1 gene, demonstrating its practical utility. We have open-sourced our NCS expression database and experimental procedures for public use.
Submitted 20 February, 2024;
originally announced February 2024.
-
A Radiomics-Incorporated Deep Ensemble Learning Model for Multi-Parametric MRI-based Glioma Segmentation
Authors:
Yang Chen,
Zhenyu Yang,
Jingtong Zhao,
Justus Adamson,
Yang Sheng,
Fang-Fang Yin,
Chunhao Wang
Abstract:
We developed a deep ensemble learning model with a radiomics spatial encoding execution for improved glioma segmentation accuracy using multi-parametric MRI (mp-MRI). This model was developed using 369 glioma patients with a 4-modality mp-MRI protocol: T1, contrast-enhanced T1 (T1-Ce), T2, and FLAIR. In each modality volume, a 3D sliding kernel was implemented across the brain to capture image heterogeneity: fifty-six radiomic features were extracted within the kernel, resulting in a 4th-order tensor. Each radiomic feature can then be encoded as a 3D image volume, namely a radiomic feature map (RFM). PCA was employed for data dimension reduction and the first 4 principal components (PCs) were selected. Four deep neural networks following the U-Net architecture were trained as sub-models to segment a region-of-interest (ROI): each sub-model uses the mp-MRI and 1 of the 4 PCs as a 5-channel input for a 2D execution. The 4 softmax probability results given by the U-Net ensemble were superimposed and binarized by Otsu's method to give the segmentation result. Three ensemble models were trained to segment enhancing tumor (ET), tumor core (TC), and whole tumor (WT). The adopted radiomics spatial encoding execution enriches the image heterogeneity information, which leads to the successful demonstration of the proposed deep ensemble model and offers a new tool for mp-MRI-based medical image segmentation.
Submitted 18 March, 2023;
originally announced March 2023.
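The final step of the abstract above (superimposing the four softmax outputs and binarizing with Otsu's method) can be illustrated with a minimal sketch. It assumes that "superimposed" means a per-pixel sum and that each probability map has been flattened to a list; the function names `otsu_threshold` and `ensemble_segment` are hypothetical and are not taken from the paper.

```python
def otsu_threshold(values, bins=256):
    """Return the threshold that maximizes between-class variance (Otsu's method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # constant input: no meaningful split
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        hist[idx] += 1
    total = len(values)
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    sum_all = sum(c * h for c, h in zip(centers, hist))
    best_var, best_t = -1.0, lo
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]                 # pixel count in the lower class
        sum0 += centers[i] * hist[i]  # intensity mass in the lower class
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var = var_between
            best_t = lo + (i + 1) * width  # upper edge of bin i
    return best_t


def ensemble_segment(prob_maps):
    """Sum per-pixel probabilities across sub-models (assumed), then binarize."""
    summed = [sum(p) for p in zip(*prob_maps)]
    t = otsu_threshold(summed)
    return [1 if s > t else 0 for s in summed]


# Four hypothetical sub-model outputs over four pixels.
maps = [
    [0.90, 0.80, 0.10, 0.20],
    [0.95, 0.70, 0.05, 0.10],
    [0.85, 0.90, 0.20, 0.15],
    [0.90, 0.75, 0.10, 0.05],
]
mask = ensemble_segment(maps)
```

Summing before thresholding lets a pixel be labeled foreground when the sub-models agree on average, without fixing a cutoff such as 0.5 in advance; whether the authors normalize the sum first is not stated in the abstract.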
-
Subtype-Former: a deep learning approach for cancer subtype discovery with multi-omics data
Authors:
Hai Yang,
Yuhang Sheng,
Yi Jiang,
Xiaoyang Fang,
Dongdong Li,
Jing Zhang,
Zhe Wang
Abstract:
Motivation: Cancer is heterogeneous, which complicates precise, personalized treatment. Accurate subtyping can lead to better survival rates for cancer patients. High-throughput technologies provide multiple omics data for cancer subtyping. However, precise cancer subtyping remains challenging due to the large amount and high dimensionality of omics data. Results: This study proposes Subtype-Former, a deep learning method based on MLP and Transformer blocks, to extract a low-dimensional representation of the multi-omics data. K-means and consensus clustering are then used to achieve accurate subtyping results. We compared Subtype-Former with other state-of-the-art subtyping methods across 10 TCGA cancer types. We found that, based on survival analysis, Subtype-Former performs better on the benchmark datasets of more than 5000 tumors. In addition, Subtype-Former achieved outstanding results in pan-cancer subtyping, which can help analyze the commonalities and differences across various cancer types at the molecular level. Finally, we applied Subtype-Former to the 10 TCGA cancer types and identified 50 essential biomarkers, which can be used to study targeted cancer drugs and promote the development of cancer treatments in the era of precision medicine.
Submitted 28 July, 2022;
originally announced July 2022.
-
Protein corona critically affects the bio-behaviors of SARS-CoV-2
Authors:
Yue-wen Yin,
Yan-jing Sheng,
Min Wang,
Song-di Ni,
Hong-ming Ding,
Yu-qiang Ma
Abstract:
The outbreak of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), has become a worldwide public health crisis. When SARS-CoV-2 enters the biological fluids of the human body, different types of biomolecules (in particular proteins) may adsorb on its surface and alter its infection ability. Although great efforts have recently been devoted to the interaction of specific antibodies with SARS-CoV-2, it remains largely unknown how other serum proteins affect SARS-CoV-2 infection. In this work, we systematically investigate the interaction of serum proteins with the SARS-CoV-2 RBD using molecular docking and all-atom molecular dynamics simulations. We find that non-specific immunoglobulin (Ig) cannot effectively bind to the SARS-CoV-2 RBD, while human serum albumin (HSA) may have some potential to block its infection (via ACE2). More importantly, we find that the RBD can cause a significant structural change in apolipoprotein E (ApoE), by which SARS-CoV-2 may hijack the metabolic pathway of ApoE to facilitate its cell entry. The present study enhances the understanding of the role of the protein corona in the bio-behaviors of SARS-CoV-2, which may aid more precise and personalized treatment of COVID-19 infection in the clinic.
Submitted 10 February, 2021;
originally announced February 2021.
-
Accurate Evaluation on the Interactions of SARS-CoV-2 with Its Receptor ACE2 and Antibodies CR3022/CB6
Authors:
Hong-ming Ding,
Yue-wen Yin,
Song-di Ni,
Yan-jing Sheng,
Yu-qiang Ma
Abstract:
The spread of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), has become a global health crisis. The binding affinity of SARS-CoV-2 (in particular the receptor binding domain, RBD) to its receptor angiotensin converting enzyme 2 (ACE2) and to antibodies is of great importance in understanding the infectivity of COVID-19 and evaluating candidate therapeutics for COVID-19. In this work, we propose a new method based on molecular mechanics/Poisson-Boltzmann surface area (MM/PBSA) to accurately calculate the free energy of SARS-CoV-2 RBD binding to ACE2 and antibodies. The calculated binding free energy of SARS-CoV-2 RBD to ACE2 is -13.3 kcal/mol, and that of SARS-CoV RBD to ACE2 is -11.4 kcal/mol, which agree well with the experimental results (-11.3 kcal/mol and -10.1 kcal/mol, respectively). Moreover, we take two recently reported antibodies as examples and calculate the free energy of the antibodies binding to SARS-CoV-2 RBD, which is also consistent with the experimental findings. Further, within the framework of the modified MM/PBSA, we determine the key residues and the main driving forces for the SARS-CoV-2 RBD/CB6 interaction using the computational alanine scanning method. The present study offers a computationally efficient and numerically reliable method to evaluate the free energy of SARS-CoV-2 binding to other proteins, which may stimulate the development of therapeutics against COVID-19 in real applications.
Submitted 17 January, 2021;
originally announced February 2021.
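To put the binding free energies quoted in the abstract above in context, the following minimal sketch shows the standard end-state bookkeeping used by MM/PBSA-style methods and the textbook relation between a binding free energy and a dissociation constant. It is not the authors' modified MM/PBSA; the function names and the temperature of 298.15 K are assumptions made for illustration.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 0.0019872041


def binding_free_energy(g_complex, g_receptor, g_ligand):
    """End-state estimate: dG_bind = G_complex - G_receptor - G_ligand (kcal/mol)."""
    return g_complex - g_receptor - g_ligand


def dissociation_constant(delta_g, temperature=298.15):
    """Convert a binding free energy (kcal/mol) to Kd in mol/L via dG = RT ln(Kd).

    The default temperature is an assumed value, not one given in the abstract.
    """
    return math.exp(delta_g / (R_KCAL * temperature))


# Values quoted in the abstract (calculated RBD/ACE2 binding free energies).
kd_sars_cov_2 = dissociation_constant(-13.3)
kd_sars_cov = dissociation_constant(-11.4)
affinity_ratio = kd_sars_cov / kd_sars_cov_2  # >1 means SARS-CoV-2 binds more tightly
```

Because Kd depends exponentially on the free energy, the roughly 1.9 kcal/mol gap between the two calculated values corresponds to more than an order of magnitude difference in Kd under these assumptions.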