Search | arXiv e-print repository

MODA: A Unified 3D Diffusion Framework for Multi-Task Target-Aware Molecular Generation

Authors: Dong Xu, Zhangfan Yang, Sisi Yuan, Jenna Xinyi Yao, Jiangqiang Li, Junkai Ji

Abstract: Three-dimensional molecular generators based on diffusion models can now reach near-crystallographic accuracy, yet they remain fragmented across tasks. SMILES-only inputs, two-stage pretrain-finetune pipelines, and one-task-one-model practices hinder stereochemical fidelity, task alignment, and zero-shot transfer. We introduce MODA, a diffusion framework that unifies fragment growing, linker desig… ▽ More Three-dimensional molecular generators based on diffusion models can now reach near-crystallographic accuracy, yet they remain fragmented across tasks. SMILES-only inputs, two-stage pretrain-finetune pipelines, and one-task-one-model practices hinder stereochemical fidelity, task alignment, and zero-shot transfer. We introduce MODA, a diffusion framework that unifies fragment growing, linker design, scaffold hopping, and side-chain decoration with a Bayesian mask scheduler. During training, a contiguous spatial fragment is masked and then denoised in one pass, enabling the model to learn shared geometric and chemical priors across tasks. Multi-task training yields a universal backbone that surpasses six diffusion baselines and three training paradigms on substructure, chemical property, interaction, and geometry. Model-C reduces ligand-protein clashes and substructure divergences while maintaining Lipinski compliance, whereas Model-B preserves similarity but trails in novelty and binding affinity. Zero-shot de novo design and lead-optimisation tests confirm stable negative Vina scores and high improvement rates without force-field refinement. These results demonstrate that a single-stage multi-task diffusion routine can replace two-stage workflows for structure-based molecular design. △ Less

Submitted 9 July, 2025; originally announced July 2025.

arXiv:2506.14488 [pdf, ps, other]

Reimagining Target-Aware Molecular Generation through Retrieval-Enhanced Aligned Diffusion

Authors: Dong Xu, Zhangfan Yang, Ka-chun Wong, Zexuan Zhu, Jiangqiang Li, Junkai Ji

Abstract: Breakthroughs in high-accuracy protein structure prediction, such as AlphaFold, have established receptor-based molecule design as a critical driver for rapid early-phase drug discovery. However, most approaches still struggle to balance pocket-specific geometric fit with strict valence and synthetic constraints. To resolve this trade-off, a Retrieval-Enhanced Aligned Diffusion termed READ is intr… ▽ More Breakthroughs in high-accuracy protein structure prediction, such as AlphaFold, have established receptor-based molecule design as a critical driver for rapid early-phase drug discovery. However, most approaches still struggle to balance pocket-specific geometric fit with strict valence and synthetic constraints. To resolve this trade-off, a Retrieval-Enhanced Aligned Diffusion termed READ is introduced, which is the first to merge molecular Retrieval-Augmented Generation with an SE(3)-equivariant diffusion model. Specifically, a contrastively pre-trained encoder aligns atom-level representations during training, then retrieves graph embeddings of pocket-matched scaffolds to guide each reverse-diffusion step at inference. This single mechanism can inject real-world chemical priors exactly where needed, producing valid, diverse, and shape-complementary ligands. Experimental results demonstrate that READ can achieve very competitive performance in CBGBench, surpassing state-of-the-art generative models and even native ligands. That suggests retrieval and diffusion can be co-optimized for faster, more reliable structure-based drug design. △ Less

Submitted 17 June, 2025; originally announced June 2025.

Comments: 13 pages, 5 figures

arXiv:2503.10663 [pdf, other]

Optimal Transport for Brain-Image Alignment: Unveiling Redundancy and Synergy in Neural Information Processing

Authors: Yang Xiao, Wang Lu, Jie Ji, Ruimeng Ye, Gen Li, Xiaolong Ma, Bo Hui

Abstract: The design of artificial neural networks (ANNs) is inspired by the structure of the human brain, and in turn, ANNs offer a potential means to interpret and understand brain signals. Existing methods primarily align brain signals with real-world signals using Mean Squared Error (MSE), which solely focuses on local point-wise alignment, and ignores global matching, leading to coarse interpretations… ▽ More The design of artificial neural networks (ANNs) is inspired by the structure of the human brain, and in turn, ANNs offer a potential means to interpret and understand brain signals. Existing methods primarily align brain signals with real-world signals using Mean Squared Error (MSE), which solely focuses on local point-wise alignment, and ignores global matching, leading to coarse interpretations and inaccuracies in brain signal decoding. In this paper, we address these issues through optimal transport (OT) and theoretically demonstrate why OT provides a more effective alignment strategy than MSE. Specifically, we construct a transport plan between brain voxel embeddings and image embeddings, enabling more precise matching. By controlling the amount of transport, we mitigate the influence of redundant information. We apply our alignment model directly to the Brain Captioning task by feeding brain siginals into a large language model (LLM) instead of images. Our approach achieves state-of-the-art performance across ten evaluation metrics, surpassing the previous best method by an average of 6.11\% in single-subject training and 3.81\% in cross-subject training. Additionally, we have uncovered several insightful conclusions that align with existing brain research. We unveil the redundancy and synergy of brain information processing through region masking and data dimensionality reduction visualization experiments. We believe our approach paves the way for a more precise understanding of brain signals in the future. The code is available soon. △ Less

Submitted 9 March, 2025; originally announced March 2025.

Comments: 14pages

arXiv:2410.21283

pLDDT-Predictor: High-speed Protein Screening Using Transformer and ESM2

Authors: Joongwon Chae, Zhenyu Wang, Ijaz Gul, Jiansong Ji, Zhenglin Chen, Peiwu Qin

Abstract: Recent advancements in protein structure prediction, particularly AlphaFold2, have revolutionized structural biology by achieving near-experimental accuracy ($\text{average RMSD} < 1.5\textÅ$). However, the computational demands of these models (approximately 30 minutes per protein on an RTX 4090) significantly limit their application in high-throughput protein screening. While large language mode… ▽ More Recent advancements in protein structure prediction, particularly AlphaFold2, have revolutionized structural biology by achieving near-experimental accuracy ($\text{average RMSD} < 1.5\textÅ$). However, the computational demands of these models (approximately 30 minutes per protein on an RTX 4090) significantly limit their application in high-throughput protein screening. While large language models like ESM (Evolutionary Scale Modeling) have shown promise in extracting structural information directly from protein sequences, rapid assessment of protein structure quality for large-scale analyses remains a major challenge. We introduce pLDDT-Predictor, a high-speed protein screening tool that achieves a $250,000\times$ speedup compared to AlphaFold2 by leveraging pre-trained ESM2 protein embeddings and a Transformer architecture. Our model predicts AlphaFold2's pLDDT (predicted Local Distance Difference Test) scores with a Pearson correlation of 0.7891 and processes proteins in just 0.007 seconds on average. Using a comprehensive dataset of 1.5 million diverse protein sequences (ranging from 50 to 2048 amino acids), we demonstrate that pLDDT-Predictor accurately classifies high-confidence structures (pLDDT $>$ 70) with 91.2\% accuracy and achieves an MSE of 84.8142 compared to AlphaFold2's predictions. The source code and pre-trained models are freely available at https://github.com/jw-chae/pLDDT_Predictor, enabling the research community to perform rapid, large-scale protein structure quality assessments. △ Less

Submitted 6 June, 2025; v1 submitted 10 October, 2024; originally announced October 2024.

Comments: Further experiments confirmed overfitting, and we are retracting the paper

arXiv:2406.01636 [pdf]

COVID-19: post infection implications in different age groups, mechanism, diagnosis, effective prevention, treatment, and recommendations

Authors: Muhammad Akmal Raheem, Muhammad Ajwad Rahim, Ijaz Gul, Md. Reyad-ul-Ferdous, Liyan Le, Junguo Hui, Shuiwei Xia, Minjiang Chen, Dongmei Yu, Vijay Pandey, Peiwu Qin, Jiansong Ji

Abstract: SARS-CoV-2, the highly contagious pathogen responsible for the COVID-19 pandemic, has persistent effects that begin four weeks after initial infection and last for an undetermined duration. These chronic effects are more harmful than acute ones. This review explores the long-term impact of the virus on various human organs, including the pulmonary, cardiovascular, neurological, reproductive, gastr… ▽ More SARS-CoV-2, the highly contagious pathogen responsible for the COVID-19 pandemic, has persistent effects that begin four weeks after initial infection and last for an undetermined duration. These chronic effects are more harmful than acute ones. This review explores the long-term impact of the virus on various human organs, including the pulmonary, cardiovascular, neurological, reproductive, gastrointestinal, musculoskeletal, endocrine, and lymphoid systems, particularly in older adults. Regarding diagnosis, RT-PCR is the gold standard for detecting COVID-19, though it requires specialized equipment, skilled personnel, and considerable time to produce results. To address these limitations, artificial intelligence in imaging and microfluidics technologies offers promising alternatives for diagnosing COVID-19 efficiently. Pharmacological and non-pharmacological strategies are effective in mitigating the persistent impacts of COVID-19. These strategies enhance immunity in post-COVID-19 patients by reducing cytokine release syndrome, improving T cell response, and increasing the circulation of activated natural killer and CD8 T cells in blood and tissues. This, in turn, alleviates symptoms such as fever, nausea, fatigue, muscle weakness, and pain. Vaccines, including inactivated viral, live attenuated viral, protein subunit, viral vectored, mRNA, DNA, and nanoparticle vaccines, significantly reduce the adverse long-term effects of the virus. However, no vaccine has been reported to provide lifetime protection against COVID-19. Consequently, protective measures such as physical distancing, mask usage, and hand hygiene remain essential strategies. This review offers a comprehensive understanding of the persistent effects of COVID-19 on individuals of varying ages, along with insights into diagnosis, treatment, vaccination, and future preventative measures against the spread of SARS-CoV-2. △ Less

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2306.07505 [pdf]

Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease

Authors: Lan Wang, Ruiling He, Lili Zhao, Jia Wang, Zhengzi Geng, Tao Ren, Guo Zhang, Peng Zhang, Kaiqiang Tang, Chaofei Gao, Fei Chen, Liting Zhang, Yonghe Zhou, Xin Li, Fanbin He, Hui Huan, Wenjuan Wang, Yunxiao Liang, Juan Tang, Fang Ai, Tingyu Wang, Liyun Zheng, Zhongwei Zhao, Jiansong Ji, Wei Liu , et al. (22 additional authors not shown)

Abstract: Objective: Bleeding from gastroesophageal varices (GEV) is a medical emergency associated with high mortality. We aim to construct an artificial intelligence-based model of two-dimensional shear wave elastography (2D-SWE) of the liver and spleen to precisely assess the risk of GEV and high-risk gastroesophageal varices (HRV). Design: A prospective multicenter study was conducted in patients with… ▽ More Objective: Bleeding from gastroesophageal varices (GEV) is a medical emergency associated with high mortality. We aim to construct an artificial intelligence-based model of two-dimensional shear wave elastography (2D-SWE) of the liver and spleen to precisely assess the risk of GEV and high-risk gastroesophageal varices (HRV). Design: A prospective multicenter study was conducted in patients with compensated advanced chronic liver disease. 305 patients were enrolled from 12 hospitals, and finally 265 patients were included, with 1136 liver stiffness measurement (LSM) images and 1042 spleen stiffness measurement (SSM) images generated by 2D-SWE. We leveraged deep learning methods to uncover associations between image features and patient risk, and thus conducted models to predict GEV and HRV. Results: A multi-modality Deep Learning Risk Prediction model (DLRP) was constructed to assess GEV and HRV, based on LSM and SSM images, and clinical information. Validation analysis revealed that the AUCs of DLRP were 0.91 for GEV (95% CI 0.90 to 0.93, p < 0.05) and 0.88 for HRV (95% CI 0.86 to 0.89, p < 0.01), which were significantly and robustly better than canonical risk indicators, including the value of LSM and SSM. Moreover, DLPR was better than the model using individual parameters, including LSM and SSM images. In HRV prediction, the 2D-SWE images of SSM outperform LSM (p < 0.01). Conclusion: DLRP shows excellent performance in predicting GEV and HRV over canonical risk indicators LSM and SSM. Additionally, the 2D-SWE images of SSM provided more information for better accuracy in predicting HRV than the LSM. △ Less

Submitted 12 June, 2023; originally announced June 2023.

arXiv:2010.13478 [pdf, other]

Pairwise heuristic sequence alignment algorithm based on deep reinforcement learning

Authors: Yong Joon Song, Dong Jin Ji, Hye In Seo, Gyu Bum Han, Dong Ho Cho

Abstract: Various methods have been developed to analyze the association between organisms and their genomic sequences. Among them, sequence alignment is the most frequently used for comparative analysis of biological genomes. However, the traditional sequence alignment method is considerably complicated in proportion to the sequences' length, and it is significantly challenging to align long sequences such… ▽ More Various methods have been developed to analyze the association between organisms and their genomic sequences. Among them, sequence alignment is the most frequently used for comparative analysis of biological genomes. However, the traditional sequence alignment method is considerably complicated in proportion to the sequences' length, and it is significantly challenging to align long sequences such as a human genome. Currently, several multiple sequence alignment algorithms are available that can reduce the complexity and improve the alignment performance of various genomes. However, there have been relatively fewer attempts to improve the alignment performance of the pairwise alignment algorithm. After grasping these problems, we intend to propose a new sequence alignment method using deep reinforcement learning. This research shows the application method of the deep reinforcement learning to the sequence alignment system and the way how the deep reinforcement learning can improve the conventional sequence alignment method. △ Less

Submitted 26 October, 2020; originally announced October 2020.

Comments: 20pages, 9figures

Showing 1–7 of 7 results for author: Ji, J