-
Adapting Differential Molecular Representation with Hierarchical Prompts for Multi-label Property Prediction
Authors:
Linjia Kang,
Songhua Zhou,
Shuyan Fang,
Shichao Liu
Abstract:
Accurate prediction of molecular properties is crucial in drug discovery. Traditional methods often overlook that real-world molecules typically exhibit multiple property labels with complex correlations. To this end, we propose a novel framework, HiPM, a hierarchical prompted molecular representation learning framework. HiPM leverages task-aware prompts to enhance the differential expression of tasks in molecular representations and mitigate negative transfer caused by conflicts in individual task information. Our framework comprises two core components: the Molecular Representation Encoder (MRE) and the Task-Aware Prompter (TAP). MRE employs a hierarchical message-passing network architecture to capture molecular features at both the atom and motif levels. Meanwhile, TAP utilizes an agglomerative hierarchical clustering algorithm to construct a prompt tree that reflects task affinity and distinctiveness, enabling the model to consider multi-granular correlation information among tasks, thereby effectively handling the complexity of multi-label property prediction. Extensive experiments demonstrate that HiPM achieves state-of-the-art performance across various multi-label datasets, offering a novel perspective on multi-label molecular representation learning.
Submitted 11 August, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Pro-PRIME: A general Temperature-Guided Language model to engineer enhanced Stability and Activity in Proteins
Authors:
Fan Jiang,
Mingchen Li,
Jiajun Dong,
Yuanxi Yu,
Xinyu Sun,
Banghao Wu,
Jin Huang,
Liqi Kang,
Yufeng Pei,
Liang Zhang,
Shaojie Wang,
Wenxue Xu,
Jingyao Xin,
Wanli Ouyang,
Guisheng Fan,
Lirong Zheng,
Yang Tan,
Zhiqiang Hu,
Yi Xiong,
Yan Feng,
Guangyu Yang,
Qian Liu,
Jie Song,
Jia Liu,
Liang Hong
, et al. (1 additional author not shown)
Abstract:
Designing protein mutants of both high stability and activity is a critical yet challenging task in protein engineering. Here, we introduce PRIME, a deep learning model, which can suggest protein mutants of improved stability and activity without any prior experimental mutagenesis data of the specified protein. Leveraging temperature-aware language modeling, PRIME demonstrated superior predictive power compared to current state-of-the-art models on the public mutagenesis dataset over 283 protein assays. Furthermore, we validated PRIME's predictions on five proteins, examining the top 30-45 single-site mutations' impact on various protein properties, including thermal stability, antigen-antibody binding affinity, and the ability to polymerize non-natural nucleic acid or resilience to extreme alkaline conditions. Remarkably, over 30% of the AI-recommended mutants exhibited superior performance compared to their pre-mutation counterparts across all proteins and desired properties. Moreover, we have developed an efficient and successful method based on PRIME to rapidly obtain multi-site mutants with enhanced activity and stability. Hence, PRIME demonstrates general applicability in protein engineering.
Submitted 27 October, 2024; v1 submitted 24 July, 2023;
originally announced July 2023.
-
A Hopfield-like model with complementary encodings of memories
Authors:
Louis Kang,
Taro Toyoizumi
Abstract:
We present a Hopfield-like autoassociative network for memories representing examples of concepts. Each memory is encoded by two activity patterns with complementary properties. The first is dense and correlated across examples within concepts, and the second is sparse and exhibits no correlation among examples. The network stores each memory as a linear combination of its encodings. During retrieval, the network recovers sparse or dense patterns with a high or low activity threshold, respectively. As more memories are stored, the dense representation at low threshold shifts from examples to concepts, which are learned from accumulating common example features. Meanwhile, the sparse representation at high threshold maintains distinctions between examples due to the high capacity of sparse, decorrelated patterns. Thus, a single network can retrieve memories at both example and concept scales and perform heteroassociation between them. We obtain our results by deriving macroscopic mean-field equations that yield capacity formulas for sparse examples, dense examples, and dense concepts. We also perform network simulations that verify our theoretical results and explicitly demonstrate the capabilities of the network.
Submitted 25 August, 2023; v1 submitted 9 February, 2023;
originally announced February 2023.
-
SESNet: sequence-structure feature-integrated deep learning method for data-efficient protein engineering
Authors:
Mingchen Li,
Liqi Kang,
Yi Xiong,
Yu Guang Wang,
Guisheng Fan,
Pan Tan,
Liang Hong
Abstract:
Deep learning has been widely used for protein engineering. However, it is limited by the lack of sufficient experimental data to train an accurate model for predicting the functional fitness of high-order mutants. Here, we develop SESNet, a supervised deep-learning model to predict the fitness of protein mutants by leveraging both sequence and structure information and exploiting an attention mechanism. Our model integrates the local evolutionary context from homologous sequences, the global evolutionary context encoding rich semantics from the universal protein sequence space, and the structure information accounting for the microenvironment around each residue in a protein. We show that SESNet outperforms state-of-the-art models for predicting the sequence-function relationship on 26 deep mutational scanning datasets. More importantly, we propose a data augmentation strategy that leverages data from unsupervised models to pre-train our model. After that, our model can achieve strikingly high accuracy in predicting the fitness of protein mutants, especially for the higher-order variants (>4 mutation sites), when fine-tuned using only a small number of experimental mutation data points (<50). The proposed strategy is of great practical value, as the required experimental effort, i.e., producing a few tens of experimental mutation data points on a given protein, is generally affordable for an ordinary biochemical group and can be applied to almost any protein.
Submitted 28 December, 2022;
originally announced January 2023.
-
Machine Learning Approaches to Automated Flow Cytometry Diagnosis of Chronic Lymphocytic Leukemia
Authors:
Akum S. Kang,
Loveleen C. Kang,
Stephen M. Mastorides,
Philip R. Foulis,
Lauren A. DeLand,
Robert P. Seifert,
Andrew A. Borkowski
Abstract:
Flow cytometry is a technique that measures multiple fluorescence and light scatter-associated parameters from individual cells as they flow in single file through an excitation light source. These cells are labeled with antibodies to detect various antigens, and the fluorescence signals reflect antigen expression. Interpretation of the multiparameter flow cytometry data is laborious, time-consuming, and expensive. It involves manual interpretation of cell distribution and pattern recognition on two-dimensional plots by highly trained medical technologists and pathologists. Using various machine learning algorithms, we attempted to develop an automated analysis for clinical flow cytometry cases that would automatically classify normal and chronic lymphocytic leukemia cases. We achieved the best success with gradient boosting. The XGBoost classifier achieved a specificity of 1.00 and a sensitivity of 0.67, a negative predictive value of 0.75, a positive predictive value of 1.00, and an overall accuracy of 0.83 in prospectively classifying cases with malignancies.
Submitted 22 July, 2021; v1 submitted 20 July, 2021;
originally announced July 2021.
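As a reading aid for the metrics quoted in the abstract above, the following minimal sketch shows how specificity, sensitivity, predictive values, and accuracy follow from a confusion matrix. The counts used here (2 true positives, 1 false negative, 3 true negatives, 0 false positives) are an assumption: they are simply the smallest counts that reproduce the reported rounded values, and the abstract does not state the actual number of cases.

```python
# Hypothetical confusion-matrix counts (NOT taken from the paper); they are
# chosen only because they reproduce the rounded metrics in the abstract.
tp, fn, tn, fp = 2, 1, 3, 0

sensitivity = tp / (tp + fn)                 # true positive rate
specificity = tn / (tn + fp)                 # true negative rate
ppv = tp / (tp + fp)                         # positive predictive value
npv = tn / (tn + fn)                         # negative predictive value
accuracy = (tp + tn) / (tp + fn + tn + fp)   # fraction of correct calls

metrics = {
    "sensitivity": sensitivity,
    "specificity": specificity,
    "ppv": ppv,
    "npv": npv,
    "accuracy": accuracy,
}

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Any multiple of these counts gives the same ratios, so the sketch illustrates the arithmetic only and says nothing about the size of the study.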
-
Relationship between blood pressure and flow rate in arteries using a modified Windkessel model
Authors:
Nam Lyong Kang
Abstract:
This study examined the flow rate in arteries using the modified Windkessel model, considering various models for blood pressure. An exact solution was derived using a Laplace transform method and the effects of blood pressure on the flow rate in an artery were examined. The effects of the flow resistance, arterial compliance, and inertia of blood on the flow rate were also investigated. The flow rate decreased with increasing inertia of the blood and flow resistance and decreasing arterial compliance. The height and position of the secondary peak were determined by a combination of the flow resistance, arterial compliance, and blood inertance. The results suggest that the risk of hypertension may increase with age because decreases in flow rate due to an increase in flow resistance and a decrease in arterial compliance were more substantial than the increase in flow rate caused by a decrease in blood inertance. The proposed method can provide information to examine how factors, such as aging, disease, and exercise, affect the flow rate in blood vessels.
Submitted 3 June, 2020;
originally announced June 2020.
-
New method for evaluating fitness using the waist-to-height ratio among Korean adults
Authors:
Nam Lyong Kang
Abstract:
Objectives: This paper introduces a new method for evaluating fitness and determining effective exercises for reducing abdominal obesity in Korean adults using a new kind of waist-to-height ratio (WHT2R). Materials and Methods: The body mass index (BMI), body shape index (ABSI), and two other waist-to-height ratios (WHT.5R, WHTR) were considered as possible contenders for the WHT2R. The correlation coefficients were calculated by correlation analyses between the indices and four fitness tests for comparison. The LMV (lump mean value) and FSPW (fitness sensitivity percentage to WHT2R) were introduced to find the association between fitness and abdominal obesity using a linear regression method and to serve as indicators for the effective control of abdominal obesity. Results: The WHT2R is more suitable for assessing fitness than the other indices and can be controlled effectively by decreasing the 10-m shuttle run score for both males and females. Conclusions: The WHT2R can be used as a possible contender for evaluating fitness and is an effective indicator for the reduction of abdominal obesity. The LMV and FSPW can be used to establish personal exercise aims.
Submitted 10 February, 2020;
originally announced February 2020.
-
Age dependence of fitness and body mass index in Korean adults
Authors:
Nam Lyong Kang,
Su Chak Ryu
Abstract:
The aim of this study was to investigate the age dependence of fitness and body mass index (BMI) in Korean adults and to find an effective exercise to restore the degradation of fitness due to aging. The age dependence of the fitness and BMI was calculated using their lump mean values (LMVs) and a linear regression method. The fitness sensitivity percentage to age (FSPA) and fitness sensitivity percentage to BMI (FSPB) were introduced as indicators for the effective improvement of fitness. The results showed that the degradation of fitness due to aging, especially the degradation of cardiorespiratory endurance and muscular endurance, could be improved effectively by controlling the 20-m multi-stage shuttle run and sit-up scores for both males and females. The results also showed that the BMIs could be effectively controlled by enhancing the 10-m shuttle run and standing long jump scores for both males and females. It is expected that the LMV, FSPA, and FSPB could be used to improve fitness effectively and to establish personal exercise aims.
Submitted 10 February, 2020;
originally announced February 2020.
-
A geometric attractor mechanism for self-organization of entorhinal grid modules
Authors:
Louis Kang,
Vijay Balasubramanian
Abstract:
Grid cells in the medial entorhinal cortex (MEC) respond when an animal occupies a periodic lattice of "grid fields" in the environment. The grids are organized in modules with spatial periods, or scales, clustered around discrete values separated by ratios in the range 1.2–2.0. We propose a mechanism that produces this modular structure through dynamical self-organization in the MEC. In attractor network models of grid formation, the grid scale of a single module is set by the distance of recurrent inhibition between neurons. We show that the MEC forms a hierarchy of discrete modules if a smooth increase in inhibition distance along its dorso-ventral axis is accompanied by excitatory interactions along this axis. Moreover, constant scale ratios between successive modules arise through geometric relationships between triangular grids and have values that fall within the observed range. We discuss how interactions required by our model might be tested experimentally.
Submitted 11 March, 2019; v1 submitted 4 June, 2018;
originally announced June 2018.
-
The dichotomy structure of Y chromosome Haplogroup N
Authors:
Kang Hu,
Shi Yan,
Kai Liu,
Chao Ning,
Lan-Hai Wei,
Shi-Lin Li,
Bing Song,
Ge Yu,
Feng Chen,
Li-Jun Liu,
Zhi-Peng Zhao,
Chuan-Chao Wang,
Ya-Jun Yang,
Zhen-Dong Qin,
Jing-Ze Tan,
Fu-Zhong Xue,
Hui Li,
Long-Li Kang,
Li Jin
Abstract:
Haplogroup N-M231 of the human Y chromosome is a common clade from Eastern Asia to Northern Europe, being one of the most frequent haplogroups in Altaic and Uralic-speaking populations. Using newly discovered bi-allelic markers from high-throughput DNA sequencing, we largely improved the phylogeny of Haplogroup N, in which 16 subclades could be identified by 33 SNPs. More than 400 males belonging to Haplogroup N in 34 populations in China were successfully genotyped, and populations in Northern Asia and Eastern Europe were also compared. We found that all the N samples were typed as inside either clade N1-F1206 (including former N1a-M128, N1b-P43 and N1c-M46 clades), most of which were found in Altaic, Uralic, Russian and Chinese-speaking populations, or N2-F2930, common in Tibeto-Burman and Chinese-speaking populations. Our detailed results suggest that Haplogroup N developed in the region of China since the final stage of the late Paleolithic Era.
Submitted 24 April, 2015;
originally announced April 2015.
-
The syncytial Drosophila embryo as a mechanically excitable medium
Authors:
Timon Idema,
Julien O. Dubuis,
Louis Kang,
M. Lisa Manning,
Philip C. Nelson,
Tom C. Lubensky,
Andrea J. Liu
Abstract:
Mitosis in the early syncytial Drosophila embryo is highly correlated in space and time, as manifested in mitotic wavefronts that propagate across the embryo. In this paper we investigate the idea that the embryo can be considered a mechanically-excitable medium, and that mitotic wavefronts can be understood as nonlinear wavefronts that propagate through this medium. We study the wavefronts via both image analysis of confocal microscopy videos and theoretical models. We find that the mitotic waves travel across the embryo at a well-defined speed that decreases with replication cycle. We find two markers of the wavefront in each cycle, corresponding to the onsets of metaphase and anaphase. Each of these onsets is followed by displacements of the nuclei that obey the same wavefront pattern. To understand the mitotic wavefronts theoretically we analyze wavefront propagation in excitable media. We study two classes of models, one with biochemical signaling and one with mechanical signaling. We find that the dependence of wavefront speed on cycle number is most naturally explained by mechanical signaling, and that the entire process suggests a scenario in which biochemical and mechanical signaling are coupled.
Submitted 27 August, 2013; v1 submitted 15 April, 2013;
originally announced April 2013.
-
Adaptation through stochastic switching into transient mutators in finite asexual populations
Authors:
Muyoung Heo,
Louis Kang,
Eugene Shakhnovich
Abstract:
The importance of mutator clones in the adaptive evolution of asexual populations is not fully understood. Here we address this problem by using an ab initio microscopic model of living cells, whose fitness is derived directly from their genomes using a biophysically realistic model of protein folding and interactions in the cytoplasm. The model organisms contain replication-controlling genes (DCGs) and genes modeling the mismatch repair (MMR) complexes. We find that adaptation occurs through the transient fixation of a mutator phenotype, regardless of particular perturbations in the fitness landscape. The microscopic pathway of adaptation follows a well-defined set of events: stochastic switching to the mutator phenotype first, then mutation in the MMR complex that hitchhikes with a beneficial mutation in the DCGs, and finally a compensating mutation in the MMR complex returning the population to a non-mutator phenotype. The similarity of these results to reported adaptation events points to robust universal physical principles of evolutionary adaptation.
Submitted 13 February, 2009;
originally announced February 2009.
-
Emergence of species in evolutionary simulated annealing
Authors:
Muyoung Heo,
Louis Kang,
Eugene I. Shakhnovich
Abstract:
Which factors govern the evolution of mutation rates and emergence of species? Here, we address this question using a first principles model of life where population dynamics of asexual organisms is coupled to molecular properties and interactions of proteins encoded in their genomes. Simulating evolution of populations, we found that fitness increases in punctuated steps via epistatic events, leading to formation of stable and functionally interacting proteins. At low mutation rates, species - populations of organisms with identical genotypes - form, while at higher mutation rates, species are lost through delocalization in sequence space without an apparent loss of fitness. However, when mutation rate was a selectable trait, the population initially maintained a high mutation rate until a high fitness level was reached, after which organisms with low mutation rates were gradually selected, with the population eventually reaching mutation rates comparable to those of modern DNA-based organisms. These results provide microscopic insights into the dynamic fitness landscape of asexual populations of unicellular organisms.
Submitted 9 October, 2008;
originally announced October 2008.