-
Guide your favorite protein sequence generative model
Authors:
Junhao Xiong,
Hunter Nisonoff,
Maria Lukarska,
Ishan Gaur,
Luke M. Oltrogge,
David F. Savage,
Jennifer Listgarten
Abstract:
Generative machine learning models on sequences are transforming protein engineering. However, no principled framework exists for conditioning these models on auxiliary information, such as experimental data, in a plug-and-play manner. Herein, we present ProteinGuide -- a principled and general method for conditioning -- by unifying a broad class of protein generative models under a single framework. We demonstrate the applicability of ProteinGuide by guiding two protein generative models, ProteinMPNN and ESM3, to generate amino acid and structure token sequences, conditioned on several user-specified properties such as enhanced stability, enzyme classes, and CATH-labeled folds. We also used ProteinGuide with inverse folding models and our own experimental assay to design adenine base editor sequences for high activity.
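Plug-and-play conditioning of this kind is usually grounded in Bayes' rule. The identity below sketches that general principle and is not necessarily ProteinGuide's exact formulation. Here x is a sequence, y the desired property, p(x) the pretrained generative model, and p(y | x) a separately trained property predictor. For a model that generates step by step, with x_t the state after step t and y depending only on the final sequence, each transition can be reweighted the same way, because in a Markov chain y is conditionally independent of x_t given x_{t+1}:

```latex
p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)} \propto p(y \mid x)\, p(x),
\qquad
p(x_{t+1} \mid x_t, y) \propto p(y \mid x_{t+1})\, p(x_{t+1} \mid x_t)
```

In practice p(y | x_{t+1}) must be evaluated on intermediate (for example, partially generated or masked) states; how a given method obtains that predictor is not described in this abstract.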
Submitted 27 May, 2025; v1 submitted 7 May, 2025;
originally announced May 2025.
-
Cross-Attention Graph Neural Networks for Inferring Gene Regulatory Networks with Skewed Degree Distribution
Authors:
Jiaqi Xiong,
Nan Yin,
Shiyang Liang,
Haoyang Li,
Yingxu Wang,
Duo Ai,
Fang Pan,
Jingjie Wang
Abstract:
Inferring Gene Regulatory Networks (GRNs) from gene expression data is a pivotal challenge in systems biology, and several innovative computational methods have been introduced. However, most of these studies have not considered the skewed degree distribution of genes. Specifically, some genes may regulate multiple target genes, while others may be regulated by multiple regulator genes. Such a skewed degree distribution significantly complicates the application of directed graph embedding methods. To tackle this issue, we propose the Cross-Attention Complex Dual Graph Embedding Model (XATGRN). XATGRN employs a cross-attention mechanism to capture intricate gene interactions from gene expression profiles. Additionally, it uses a Dual Complex Graph Embedding approach to handle the skewed degree distribution, enabling precise prediction of regulatory relationships and their directionality. Our model consistently outperforms existing state-of-the-art methods across various datasets, underscoring its efficacy in elucidating complex gene regulatory mechanisms. The code used in this paper is publicly available at: https://github.com/kikixiong/XATGRN.
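As a rough illustration of what a cross-attention step computes, here is a minimal pure-Python sketch of standard scaled dot-product cross-attention, in which queries come from one gene's features and keys/values from another's. Learned projection matrices, multiple heads, and the way XATGRN actually feeds expression profiles into attention are all omitted; this is a generic sketch, not the code in the repository above, and the variable names are illustrative.

```python
import math
from typing import List

Vector = List[float]

def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product cross-attention: queries attend over another source's keys/values.

    queries: n_q vectors of dimension d (e.g. features of a regulator gene)
    keys:    n_k vectors of dimension d (e.g. features of a candidate target gene)
    values:  n_k vectors of dimension d_v, paired one-to-one with keys
    Returns n_q vectors, each a softmax-weighted average of the value vectors.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    dim_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output coordinate at a time.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)])
    return outputs

if __name__ == "__main__":
    regulator = [[1.0, 0.0]]                    # hypothetical query features
    target = [[1.0, 0.0], [0.0, 1.0]]           # hypothetical key features
    target_values = [[10.0, 0.0], [0.0, 10.0]]  # hypothetical value features
    print(cross_attention(regulator, target, target_values))
```

Because the attention weights sum to one, each output is a convex combination of the other source's value vectors, tilted toward the keys most similar to the query.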
Submitted 9 January, 2025; v1 submitted 18 December, 2024;
originally announced December 2024.
-
Synthetic frequency-controlled gene circuits unlock expanded cellular states
Authors:
Rongrong Zhang,
Shengjie Wan,
Jiarui Xiong,
Lei Ni,
Ye Li,
Yajia Huang,
Bing Li,
Mei Li,
Shuai Yang,
Fan Jin
Abstract:
Natural biological systems process environmental information through both amplitude- and frequency-modulated signals, yet engineered biological circuits have largely relied on amplitude-based regulation alone. Despite the prevalence of frequency-encoded signals in natural systems, fundamental challenges in designing and implementing frequency-responsive gene circuits have limited their development in synthetic biology. Here we present a Time-Resolved Gene Circuit (TRGC) architecture that enables frequency-to-amplitude signal conversion in engineered biological systems. Through systematic analysis, we establish a theoretical framework that guides the design of synthetic circuits capable of distinct frequency-dependent responses, implementing both high-pass and low-pass filtering behaviors. To enable rigorous characterization of these dynamic circuits, we developed a high-throughput automated platform that ensures stable and reproducible measurements of frequency-dependent responses across diverse conditions. Using this platform, we demonstrate that these frequency-modulated circuits can access cellular states unreachable through conventional amplitude modulation, significantly expanding the controllable gene expression space in multi-gene systems. Our results show that frequency modulation expands the range of achievable expression patterns when controlling multiple genes through a single input, demonstrating a new paradigm for engineering cellular behaviors. This work establishes frequency modulation as a powerful strategy for expanding the capabilities of engineered biological systems and enhancing cellular response to dynamic signals.
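High-pass and low-pass behaviour can be pictured with the textbook first-order linear filter. The sketch below is that generic model, not the TRGC circuit itself; the response time `tau` is a hypothetical parameter standing in for whatever sets a circuit's timescale. Low-pass gain falls and high-pass gain rises with input frequency f, and the two cross at the corner frequency 1/(2πτ).

```python
import math

def low_pass_gain(f: float, tau: float) -> float:
    """Amplitude gain of a first-order low-pass filter at input frequency f."""
    w = 2 * math.pi * f * tau  # dimensionless angular frequency
    return 1.0 / math.sqrt(1.0 + w * w)

def high_pass_gain(f: float, tau: float) -> float:
    """Amplitude gain of a first-order high-pass filter at input frequency f."""
    w = 2 * math.pi * f * tau
    return w / math.sqrt(1.0 + w * w)

if __name__ == "__main__":
    tau = 1.0  # hypothetical response time (e.g. hours)
    corner = 1 / (2 * math.pi * tau)
    for f in (0.01, 0.1, corner, 1.0, 10.0):
        print(f"f={f:.3f}  low-pass={low_pass_gain(f, tau):.3f}  high-pass={high_pass_gain(f, tau):.3f}")
```

Evaluating both gains at one frequency shows how a single periodic input can drive two such filters to different output levels.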
Submitted 26 November, 2024;
originally announced November 2024.
-
Automatic Screening for Children with Speech Disorder using Automatic Speech Recognition: Opportunities and Challenges
Authors:
Dancheng Liu,
Jason Yang,
Ishan Albrecht-Buehler,
Helen Qin,
Sophie Li,
Yuting Hu,
Amir Nassereldine,
Jinjun Xiong
Abstract:
Speech is a fundamental aspect of human life, crucial not only for communication but also for cognitive, social, and academic development. Children with speech disorders (SD) face significant challenges that, if unaddressed, can result in lasting negative impacts. Traditionally, speech and language assessments (SLA) have been conducted by skilled speech-language pathologists (SLPs), but there is a growing need for efficient and scalable SLA methods powered by artificial intelligence. This position paper presents a survey of existing techniques suitable for automating SLA pipelines, with an emphasis on adapting automatic speech recognition (ASR) models for children's speech, an overview of current SLAs and their automated counterparts to demonstrate the feasibility of AI-enhanced SLA pipelines, and a discussion of practical considerations, including accessibility and privacy concerns, associated with the deployment of AI-powered SLAs.
Submitted 7 October, 2024;
originally announced October 2024.
-
Optimal Frequency in Second Messenger Signaling: Quantifying cAMP Information Transmission in Bacteria
Authors:
Jiarui Xiong,
Liang Wang,
Jialun Lin,
Lei Ni,
Rongrong Zhang,
Shuai Yang,
Yajia Huang,
Jun Chu,
Fan Jin
Abstract:
Bacterial second messengers are crucial for transmitting environmental information to cellular responses. However, quantifying their information transmission capacity remains challenging. Here, we engineer an isolated cAMP signaling channel in Pseudomonas aeruginosa using targeted gene knockouts, optogenetics, and a fluorescent cAMP probe. This design allows precise optical control and real-time monitoring of cAMP dynamics. By integrating experimental data with information theory, we reveal an optimal frequency for light-mediated cAMP signaling that maximizes information transmission, reaching about 40 bits/h. This rate correlates strongly with cAMP degradation kinetics, and transmission relies on a two-state encoding scheme. Our findings suggest a mechanism for fine-tuned regulation of multiple genes through temporal encoding of second messenger signals, providing new insights into bacterial adaptation strategies. This approach offers a framework for quantifying information processing in cellular signaling systems.
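Why an optimal signaling frequency can exist is easy to see with a toy two-state channel. In the sketch below each light pulse carries one binary symbol, and the probability that the readout confuses the two states grows when pulses arrive faster than a hypothetical degradation time `tau` lets the signal reset. The error model `flip_probability` and every number here are invented for illustration and are not taken from the paper. The information rate is the binary symmetric channel capacity per pulse, 1 − H₂(p), multiplied by the pulse frequency, so slow pulsing wastes time while fast pulsing corrupts symbols.

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy (bits) of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p: float) -> float:
    """Capacity (bits per symbol) of a binary symmetric channel with flip probability p."""
    return 1.0 - binary_entropy(p)

def flip_probability(freq_per_h: float, tau_h: float) -> float:
    """Hypothetical error model: symbols blur together when pulses outpace decay time tau."""
    return 0.5 * math.exp(-1.0 / (freq_per_h * tau_h))

def info_rate(freq_per_h: float, tau_h: float) -> float:
    """Bits per hour = pulses per hour x bits per pulse."""
    return freq_per_h * bsc_capacity(flip_probability(freq_per_h, tau_h))

if __name__ == "__main__":
    tau = 1.0  # hypothetical degradation time in hours
    freqs = [0.01 * 1.1 ** k for k in range(100)]  # log-spaced grid of pulse frequencies
    best = max(freqs, key=lambda f: info_rate(f, tau))
    print(f"optimal frequency ~ {best:.2f} pulses/h, rate ~ {info_rate(best, tau):.3f} bits/h")
```

In this toy model the optimum sits at an intermediate frequency set by `tau`, and the absolute rate scales as 1/`tau`.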
Submitted 9 August, 2024;
originally announced August 2024.
-
Implementation of an AI-based MRD evaluation and prediction model for multiple myeloma
Authors:
Jianfeng Chen,
Jize Xiong,
Yixu Wang,
Qi Xin,
Hong Zhou
Abstract:
With the application of hematopoietic stem cell transplantation and new drugs, the progression-free survival and overall survival rates of multiple myeloma (MM) have greatly improved, but the disease is still considered incurable. Many patients relapse after complete remission, which is rooted in the presence of minimal residual disease (MRD). Studies have shown that MRD positivity is an independent adverse prognostic factor affecting survival, so MRD detection is an important indicator for judging patient prognosis and guiding clinical treatment. At present, several techniques, such as multiparameter flow cytometry (MFC), polymerase chain reaction (PCR), and positron emission tomography/computed tomography (PET/CT), have been used for MRD detection in multiple myeloma. However, there is still no cure for the disease. Four clinical studies, including IFM2013-04, confirmed for the first time the synergy and importance of combining proteasome inhibitors (PIs) with immunomodulatory drugs (IMiDs) in the treatment of MM. The large Phase 3 clinical study SWOG S0777 compared bortezomib plus lenalidomide and dexamethasone (VRD) with lenalidomide and dexamethasone (RD), and the efficacy of VRD established its status as first-line treatment of MM. Owing to the good efficacy of CD38 monoclonal antibodies in large clinical studies, their combination with VRD has been recommended as first-line treatment of MM. This work explores the clinical value and problems of applying Morphogo, an artificial intelligence bone marrow cell recognition system, to the detection of MRD in multiple myeloma.
Submitted 29 February, 2024;
originally announced March 2024.
-
"Virus hunting" using radial distance weighted discrimination
Authors:
Jie Xiong,
D. P. Dittmer,
J. S. Marron
Abstract:
Motivated by the challenge of using DNA-seq data to identify viruses in human blood samples, we propose a novel classification algorithm called "Radial Distance Weighted Discrimination" (or Radial DWD). This classifier is designed for binary classification, assuming one class is surrounded by the other class in very diverse radial directions, which appears to be typical of our virus detection data. This separation of the two classes in multiple radial directions naturally motivates the development of Radial DWD. While classical machine learning methods such as the Support Vector Machine and linear Distance Weighted Discrimination can sometimes give reasonable answers for a given data set, their generalizability is severely compromised by the linear separating boundary. Radial DWD addresses this challenge by using a spherical separating boundary, which is more appropriate in this particular case. Simulations show that, in appropriate radial contexts, this gives much better generalizability than linear methods, and also much better than conventional kernel-based (nonlinear) Support Vector Machines, because the latter methods use much of the information in the data to determine the shape of the separating boundary. The effectiveness of Radial DWD is demonstrated on real virus detection data.
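To see why a spherical boundary suits data in which one class surrounds the other, here is a deliberately simple radial rule. It is not Radial DWD, whose boundary comes from a distance-weighted optimization; instead it centres a sphere at the inner-class centroid and places the radius midway between the farthest inner point and the nearest outer point, assuming the two classes are radially separable. The function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]

def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def fit_radial_rule(inner: List[Point], outer: List[Point]) -> Tuple[List[float], float]:
    """Return (center, radius) of a sphere separating inner from outer points."""
    center = centroid(inner)
    farthest_inner = max(math.dist(p, center) for p in inner)
    nearest_outer = min(math.dist(p, center) for p in outer)
    if nearest_outer <= farthest_inner:
        raise ValueError("classes are not radially separable around the inner centroid")
    return center, (farthest_inner + nearest_outer) / 2.0

def is_inner(center: List[float], radius: float, x: Point) -> bool:
    """Classify x as inner-class if it falls strictly inside the sphere."""
    return math.dist(x, center) < radius

if __name__ == "__main__":
    # Toy data: a tight inner cluster surrounded by outer points in four directions,
    # a configuration no single hyperplane can separate.
    inner = [[0.1, 0.0], [-0.1, 0.1], [0.0, -0.1]]
    outer = [[5.0, 0.0], [-5.0, 0.0], [0.0, 5.0], [0.0, -5.0]]
    center, radius = fit_radial_rule(inner, outer)
    print(center, round(radius, 3))
    print(is_inner(center, radius, [0.5, 0.5]), is_inner(center, radius, [0.0, 4.0]))
```

Only a centre and a single radius are fitted here, which is the intuition behind the claim that a spherical boundary spends less of the data on boundary shape than a flexible kernel does.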
Submitted 9 February, 2016;
originally announced February 2016.
-
Warburg Effect due to Exposure to Different Types of Radiation
Authors:
Zhitong Bing,
Bin Ao,
Yanan Zhang,
Fengling Wang,
Caiyong Ye,
Jinpeng He,
Jintu Sun,
Jie Xiong,
Nan Ding,
Xiao-fei Gao,
Ji Qi,
Sheng Zhang,
Guangming Zhou,
Lei Yang
Abstract:
Cancer cells maintain a high level of aerobic glycolysis (the Warburg effect), which is associated with their rapid proliferation. Many studies have reported that suppressing glycolysis and activating oxidative phosphorylation can repress the growth of cancer cells by modulating key regulators. Can the Warburg effect in cancer cells be switched by other environmental stimuli? Herein, we report an interesting phenomenon in which cells alternated between glycolysis and mitochondrial respiration depending on the type of radiation they were exposed to. We observed enhanced glycolysis and enhanced mitochondrial respiration in HeLa cells exposed to 2-Gy X-ray and 2-Gy carbon ion radiation, respectively. This discovery may provide novel insights for tumor therapy.
Submitted 10 March, 2013;
originally announced March 2013.