
Showing 1–50 of 61 results for author: Wu, L

Searching in archive q-bio.
  1. arXiv:2506.21085  [pdf, ps, other]

    q-bio.BM cs.AI cs.LG

    CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions

    Authors: Yangzhe Peng, Kaiyuan Gao, Liang He, Yuheng Cong, Haiguang Liu, Kun He, Lijun Wu

    Abstract: Molecular docking plays a crucial role in predicting the binding mode of ligands to target proteins, and covalent interactions, which involve the formation of a covalent bond between the ligand and the target, are particularly valuable due to their strong, enduring binding nature. However, most existing docking methods and deep learning approaches hardly account for the formation of covalent bonds…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted to KDD 2025 Research Track

  2. arXiv:2506.07553  [pdf, ps, other]

    cs.AI q-bio.QM

    GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition

    Authors: Jingchao Wang, Haote Yang, Jiang Wu, Yifan He, Xingjian Wei, Yinfan Wang, Chengjin Liu, Lingli Ge, Lijun Wu, Bin Wang, Dahua Lin, Conghui He

    Abstract: Optical Chemical Structure Recognition (OCSR) is crucial for digitizing chemical knowledge by converting molecular images into machine-readable formats. While recent vision-language models (VLMs) have shown potential in this task, their image-captioning approach often struggles with complex molecular structures and inconsistent annotations. To overcome these challenges, we introduce GTR-Mol-VLM, a…

    Submitted 9 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

  3. arXiv:2505.19014  [pdf, ps, other]

    cs.LG physics.chem-ph q-bio.QM

    Tokenizing Electron Cloud in Protein-Ligand Interaction Learning

    Authors: Haitao Lin, Odin Zhang, Jia Xu, Yunfan Liu, Zheng Cheng, Lirong Wu, Yufei Huang, Zhifeng Gao, Stan Z. Li

    Abstract: The affinity and specificity of protein-molecule binding directly impact functional outcomes, uncovering the mechanisms underlying biological regulation and signal transduction. Most deep-learning-based prediction approaches focus on structures of atoms or fragments. However, quantum chemical properties, such as electronic structures, are the key to unveiling interaction patterns but remain largel…

    Submitted 31 May, 2025; v1 submitted 25 May, 2025; originally announced May 2025.

    Comments: conference paper

  4. arXiv:2504.12527  [pdf]

    q-bio.OT eess.IV

    Analysis of the MICCAI Brain Tumor Segmentation -- Metastases (BraTS-METS) 2025 Lighthouse Challenge: Brain Metastasis Segmentation on Pre- and Post-treatment MRI

    Authors: Nazanin Maleki, Raisa Amiruddin, Ahmed W. Moawad, Nikolay Yordanov, Athanasios Gkampenis, Pascal Fehringer, Fabian Umeh, Crystal Chukwurah, Fatima Memon, Bojan Petrovic, Justin Cramer, Mark Krycia, Elizabeth B. Shrickel, Ichiro Ikuta, Gerard Thompson, Lorenna Vidal, Vilma Kosovic, Adam E. Goldman-Yassen, Virginia Hill, Tiffany So, Sedra Mhana, Albara Alotaibi, Nathan Page, Prisha Bhatia, Melisa S. Guelen, et al. (219 additional authors not shown)

    Abstract: Despite continuous advancements in cancer treatment, brain metastatic disease remains a significant complication of primary cancer and is associated with an unfavorable prognosis. One approach for improving diagnosis, management, and outcomes is to implement algorithms based on artificial intelligence for the automated segmentation of both pre- and post-treatment MRI brain images. Such algorithms…

    Submitted 10 July, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

    Comments: 28 pages, 4 figures, 2 tables

  5. arXiv:2503.01910  [pdf, other]

    q-bio.QM cs.AI

    dyAb: Flow Matching for Flexible Antibody Design with AlphaFold-driven Pre-binding Antigen

    Authors: Cheng Tan, Yijie Zhang, Zhangyang Gao, Yufei Huang, Haitao Lin, Lirong Wu, Fandi Wu, Mathieu Blanchette, Stan. Z. Li

    Abstract: The development of therapeutic antibodies heavily relies on accurate predictions of how antigens will interact with antibodies. Existing computational methods in antibody design often overlook crucial conformational changes that antigens undergo during the binding process, significantly impacting the reliability of the resulting antibodies. To bridge this gap, we introduce dyAb, a flexible framewo…

    Submitted 28 February, 2025; originally announced March 2025.

    Comments: AAAI 2025 Oral

  6. arXiv:2502.14934  [pdf, other]

    q-bio.QM cs.AI cs.LG

    Fast and Accurate Blind Flexible Docking

    Authors: Zizhuo Zhang, Lijun Wu, Kaiyuan Gao, Jiangchao Yao, Tao Qin, Bo Han

    Abstract: Molecular docking, which predicts the bound structures of small molecules (ligands) to their protein targets, plays a vital role in drug discovery. However, existing docking methods often face limitations: they either overlook crucial structural changes by assuming protein rigidity or suffer from low computational efficiency due to their reliance on generative models for structure sampling. To addre…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: 25 pages, Accepted by ICLR 2025

  7. arXiv:2502.06913  [pdf, other]

    q-bio.QM cs.AI cs.LG

    A Simple yet Effective DDG Predictor is An Unsupervised Antibody Optimizer and Explainer

    Authors: Lirong Wu, Yunfan Liu, Haitao Lin, Yufei Huang, Guojiang Zhao, Zhifeng Gao, Stan Z. Li

    Abstract: The proteins that exist today have been optimized over billions of years of natural evolution, during which nature creates random mutations and selects them. The discovery of functionally promising mutations is challenged by the limited evolutionarily accessible regions, i.e., only a small region on the fitness landscape is beneficial. There have been numerous priors used to constrain protein evolut…

    Submitted 13 February, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

  8. arXiv:2501.00013  [pdf, other]

    q-bio.QM cs.AI cs.LG

    Relation-Aware Equivariant Graph Networks for Epitope-Unknown Antibody Design and Specificity Optimization

    Authors: Lirong Wu, Haitao Lin, Yufei Huang, Zhangyang Gao, Cheng Tan, Yunfan Liu, Tailin Wu, Stan Z. Li

    Abstract: Antibodies are Y-shaped proteins that protect the host by binding to specific antigens, and their binding is mainly determined by the Complementary Determining Regions (CDRs) in the antibody. Despite the great progress made in CDR design, existing computational methods still encounter several challenges: 1) poor capability of modeling complex CDRs with long sequences due to insufficient contextual…

    Submitted 13 December, 2024; originally announced January 2025.

  9. arXiv:2412.01564  [pdf, other]

    cs.LG q-bio.BM

    Tokenizing 3D Molecule Structure with Quantized Spherical Coordinates

    Authors: Kaiyuan Gao, Yusong Wang, Haoxiang Guan, Zun Wang, Qizhi Pei, John E. Hopcroft, Kun He, Lijun Wu

    Abstract: The application of language models (LMs) to molecular structure generation using line notations such as SMILES and SELFIES has been well-established in the field of cheminformatics. However, extending these models to generate 3D molecular structures presents significant challenges. Two primary obstacles emerge: (1) the difficulty in designing a 3D line notation that ensures SE(3)-invariant atomic…

    Submitted 2 December, 2024; originally announced December 2024.

    Comments: 17 pages, 6 figures, preprint

  10. arXiv:2411.01856  [pdf, other]

    cs.LG q-bio.BM

    MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction

    Authors: Cheng Tan, Zhenxiao Cao, Zhangyang Gao, Lirong Wu, Siyuan Li, Yufei Huang, Jun Xia, Bozhen Hu, Stan Z. Li

    Abstract: Post-translational modifications (PTMs) profoundly expand the complexity and functionality of the proteome, regulating protein attributes and interactions that are crucial for biological processes. Accurately predicting PTM sites and their specific types is therefore essential for elucidating protein function and understanding disease mechanisms. Existing computational approaches predominantly foc…

    Submitted 4 November, 2024; originally announced November 2024.

    Comments: 26 pages, 20 figures, 10 tables

  11. arXiv:2410.24022  [pdf, other]

    q-bio.QM cs.AI cs.CL cs.LG

    SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation

    Authors: Liang He, Peiran Jin, Yaosen Min, Shufang Xie, Lijun Wu, Tao Qin, Xiaozhuan Liang, Kaiyuan Gao, Yuliang Jiang, Tie-Yan Liu

    Abstract: Proteins, essential to biological systems, perform functions intricately linked to their three-dimensional structures. Understanding the relationship between protein structures and their amino acid sequences remains a core challenge in protein modeling. While traditional protein foundation models benefit from pre-training on vast unlabeled datasets, they often struggle to capture critical co-evolu…

    Submitted 31 October, 2024; originally announced October 2024.

  12. arXiv:2407.15202  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Exploiting Pre-trained Models for Drug Target Affinity Prediction with Nearest Neighbors

    Authors: Qizhi Pei, Lijun Wu, Zhenyu He, Jinhua Zhu, Yingce Xia, Shufang Xie, Rui Yan

    Abstract: Drug-Target binding Affinity (DTA) prediction is essential for drug discovery. Despite the application of deep learning methods to DTA prediction, the achieved accuracy remains suboptimal. In this work, inspired by the recent success of retrieval methods, we propose kNN-DTA, a non-parametric embedding-based retrieval method adopted on a pre-trained DTA prediction model, which can extend the power…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Accepted by the 33rd ACM International Conference on Information and Knowledge Management (CIKM 2024)

  13. arXiv:2407.11758  [pdf, other]

    physics.bio-ph cond-mat.soft cond-mat.stat-mech nlin.AO q-bio.SC

    Diffusion-driven self-assembly of emerin nanodomains at the nuclear envelope

    Authors: Carlos D. Alas, Liying Wu, Fabien Pinaud, Christoph A. Haselwandter

    Abstract: Emerin, a nuclear membrane protein with important biological roles in mechanotransduction and nuclear shape adaptation, self-assembles into nanometer-size domains at the inner nuclear membrane. The size and emerin occupancy of these nanodomains change with applied mechanical stress as well as under emerin mutations associated with Emery-Dreifuss muscular dystrophy (EDMD). Through a combination of…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 17 pages, 5 figures total. 6 main pages and 2 main figures, 11 supplemental material pages and 3 supplemental material figures

  14. arXiv:2406.10840  [pdf, other]

    cs.LG cs.AI q-bio.BM

    CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph

    Authors: Haitao Lin, Guojiang Zhao, Odin Zhang, Yufei Huang, Lirong Wu, Zicheng Liu, Siyuan Li, Cheng Tan, Zhifeng Gao, Stan Z. Li

    Abstract: Structure-based drug design (SBDD) aims to generate potential drugs that can bind to a target protein and is greatly expedited by the aid of AI techniques in generative models. However, a lack of systematic understanding persists due to the diverse settings, complex implementation, difficult reproducibility, and task singularity. Firstly, the absence of standardization can lead to unfair compariso…

    Submitted 10 October, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: 9 pages main context

  15. arXiv:2406.05797  [pdf, other]

    q-bio.BM cs.AI cs.CE cs.CL cs.LG

    3D-MolT5: Leveraging Discrete Structural Information for Molecule-Text Modeling

    Authors: Qizhi Pei, Rui Yan, Kaiyuan Gao, Jinhua Zhu, Lijun Wu

    Abstract: The integration of molecular and natural language representations has emerged as a focal point in molecular science, with recent advancements in Language Models (LMs) demonstrating significant potential for comprehensive modeling of both domains. However, existing approaches face notable limitations, particularly in their neglect of three-dimensional (3D) information, which is crucial for understa…

    Submitted 18 March, 2025; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by ICLR 2025

  16. arXiv:2405.18968  [pdf, other]

    cs.AI cs.LG q-bio.QM

    UniIF: Unified Molecule Inverse Folding

    Authors: Zhangyang Gao, Jue Wang, Cheng Tan, Lirong Wu, Yufei Huang, Siyuan Li, Zhirui Ye, Stan Z. Li

    Abstract: Molecule inverse folding has been a long-standing challenge in chemistry and biology, with the potential to revolutionize drug discovery and material science. Although specialized models have been proposed for different small molecules and macromolecules, few have attempted to unify the learning process, resulting in redundant efforts. Complementary to recent advancements in molecular structure prediction, su…

    Submitted 29 May, 2024; originally announced May 2024.

  17. arXiv:2405.10348  [pdf, other]

    q-bio.QM cs.AI cs.LG

    Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning

    Authors: Lirong Wu, Yijun Tian, Haitao Lin, Yufei Huang, Siyuan Li, Nitesh V Chawla, Stan Z. Li

    Abstract: Protein-protein bindings play a key role in a variety of fundamental biological processes, and thus predicting the effects of amino acid mutations on protein-protein binding is crucial. To tackle the scarcity of annotated mutation data, pre-training with massive unlabeled data has emerged as a promising solution. However, this process faces a series of challenges: (1) complex higher-order dependen…

    Submitted 15 May, 2024; originally announced May 2024.

  18. arXiv:2405.06642  [pdf, other]

    q-bio.BM cs.AI cs.LG

    PPFlow: Target-aware Peptide Design with Torsional Flow Matching

    Authors: Haitao Lin, Odin Zhang, Huifeng Zhao, Dejun Jiang, Lirong Wu, Zicheng Liu, Yufei Huang, Stan Z. Li

    Abstract: Therapeutic peptides have proven to have great pharmaceutical value and potential in recent decades. However, methods of AI-assisted peptide drug discovery are not fully explored. To fill the gap, we propose a target-aware peptide design method called PPFlow, based on conditional flow matching on torus manifolds, to model the internal geometries of torsion angles for the peptide structure…

    Submitted 9 December, 2024; v1 submitted 5 March, 2024; originally announced May 2024.

    Comments: 18 pages

  19. arXiv:2403.20261  [pdf, other]

    q-bio.BM cs.AI cs.LG

    FABind+: Enhancing Molecular Docking through Improved Pocket Prediction and Pose Generation

    Authors: Kaiyuan Gao, Qizhi Pei, Gongbo Zhang, Jinhua Zhu, Kun He, Lijun Wu

    Abstract: Molecular docking is a pivotal process in drug discovery. While traditional techniques rely on extensive sampling and simulation governed by physical principles, these methods are often slow and costly. The advent of deep learning-based approaches has shown significant promise, offering increases in both accuracy and efficiency. Building upon the foundational work of FABind, a model designed with…

    Submitted 24 February, 2025; v1 submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted for presentation at KDD 2025

  20. arXiv:2403.09673  [pdf, other]

    q-bio.BM cs.AI cs.LG

    FoldToken: Learning Protein Language via Vector Quantization and Beyond

    Authors: Zhangyang Gao, Cheng Tan, Jue Wang, Yufei Huang, Lirong Wu, Stan Z. Li

    Abstract: Is there a foreign language describing protein sequences and structures simultaneously? Protein structures, represented by continuous 3D points, have long posed a challenge due to the contrasting modeling paradigms of discrete sequences. We introduce FoldTokenizer to represent protein sequence-structure as discrete symbols. This innovative approach involves projecting residue types and st…

    Submitted 19 March, 2024; v1 submitted 4 February, 2024; originally announced March 2024.

  21. arXiv:2403.05314  [pdf, other]

    q-bio.BM

    Advances of Deep Learning in Protein Science: A Comprehensive Survey

    Authors: Bozhen Hu, Cheng Tan, Lirong Wu, Jiangbin Zheng, Jun Xia, Zhangyang Gao, Zicheng Liu, Fandi Wu, Guijun Zhang, Stan Z. Li

    Abstract: Protein representation learning plays a crucial role in understanding the structure and function of proteins, which are essential biomolecules involved in various biological processes. In recent years, deep learning has emerged as a powerful tool for protein modeling due to its ability to learn complex patterns and representations from large-scale protein data. This comprehensive survey aims to pr…

    Submitted 8 March, 2024; originally announced March 2024.

  22. arXiv:2403.02361  [pdf]

    q-bio.QM

    Renal function changes in chronic hepatitis B patients

    Authors: Jinhua Zhao, Lili Wu, Xiaoan Yang, Zhilaing Gao, Hong Deng

    Abstract: The best way to treat chronic hepatitis B is with pegylated interferon alone or with oral antiviral drugs. There is limited research comparing the renal safety of entecavir and tenofovir when used with pegylated interferon. This study will compare changes in renal function in chronic hepatitis B patients treated with pegylated interferon and either entecavir or tenofovir. The study included a coho…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Over the 48-week duration of combined treatment in patients with chronic hepatitis B (CHB), neither Tenofovir Disoproxil Fumarate (TDF) nor Entecavir (ETV) was found to increase renal injury

    ACM Class: G.1

  23. arXiv:2403.01528  [pdf, other]

    cs.CL cs.AI q-bio.BM

    Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey

    Authors: Qizhi Pei, Lijun Wu, Kaiyuan Gao, Jinhua Zhu, Yue Wang, Zun Wang, Tao Qin, Rui Yan

    Abstract: The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology. This approach leverages the rich, multifaceted descriptions of biomolecules contained within textual data sources to enhance our fundamental understanding and enable downstream computational tasks such as biomol…

    Submitted 5 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: Survey Paper. 25 pages, 9 figures, and 3 tables

  24. arXiv:2403.00875  [pdf, other]

    q-bio.QM cs.AI cs.LG q-bio.BM

    Enhancing Protein Predictive Models via Proteins Data Augmentation: A Benchmark and New Directions

    Authors: Rui Sun, Lirong Wu, Haitao Lin, Yufei Huang, Stan Z. Li

    Abstract: Augmentation is an effective way to make use of the small amount of labeled protein data. However, most of the existing work focuses on designing new architectures or pre-training tasks, and relatively little work has studied data augmentation for proteins. This paper extends data augmentation techniques previously used for images and texts to proteins and then benchmarks these techniques on…

    Submitted 1 March, 2024; originally announced March 2024.

  25. arXiv:2402.17810  [pdf, other]

    q-bio.QM cs.AI cs.CE cs.LG q-bio.BM

    BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning

    Authors: Qizhi Pei, Lijun Wu, Kaiyuan Gao, Xiaozhuan Liang, Yin Fang, Jinhua Zhu, Shufang Xie, Tao Qin, Rui Yan

    Abstract: Recent research trends in computational biology have increasingly focused on integrating text and bio-entity modeling, especially in the context of molecules and proteins. However, previous efforts like BioT5 faced challenges in generalizing across diverse tasks and lacked a nuanced understanding of molecular structures, particularly in their textual representations (e.g., IUPAC). This paper intro…

    Submitted 31 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024 (Findings)

  26. arXiv:2402.14391  [pdf, other]

    cs.LG q-bio.BM

    MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction Prediction via Microenvironment-Aware Protein Embedding

    Authors: Lirong Wu, Yijun Tian, Yufei Huang, Siyuan Li, Haitao Lin, Nitesh V Chawla, Stan Z. Li

    Abstract: Protein-Protein Interactions (PPIs) are fundamental in various biological processes and play a key role in life activities. The growing demand and cost of experimental PPI assays require computational methods for efficient PPI prediction. While existing methods rely heavily on protein sequence for PPI prediction, it is the protein structure that is the key to determining the interactions. To take bo…

    Submitted 22 February, 2024; originally announced February 2024.

  27. arXiv:2402.11459  [pdf, other]

    q-bio.BM cs.AI cs.LG physics.chem-ph

    Re-Dock: Towards Flexible and Realistic Molecular Docking with Diffusion Bridge

    Authors: Yufei Huang, Odin Zhang, Lirong Wu, Cheng Tan, Haitao Lin, Zhangyang Gao, Siyuan Li, Stan. Z. Li

    Abstract: Accurate prediction of protein-ligand binding structures, a task known as molecular docking, is crucial for drug design but remains challenging. While deep learning has shown promise, existing methods often depend on holo-protein structures (docked, and not accessible in realistic tasks) or neglect pocket sidechain conformations, leading to limited practical utility and unrealistic conformation pre…

    Submitted 21 February, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  28. arXiv:2402.08198  [pdf, other]

    q-bio.BM cs.AI cs.LG

    PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for Efficient and Generalizable Compound-Protein Interaction Prediction

    Authors: Lirong Wu, Yufei Huang, Cheng Tan, Zhangyang Gao, Bozhen Hu, Haitao Lin, Zicheng Liu, Stan Z. Li

    Abstract: Compound-Protein Interaction (CPI) prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery. Existing deep learning-based methods utilize only the single modality of protein sequences or structures and lack the co-modeling of the joint distribution of the two modalities, which may lead to significant performance drops in complex real-world sc…

    Submitted 12 February, 2024; originally announced February 2024.

  29. arXiv:2310.11802  [pdf, other]

    cs.CE cs.LG q-bio.BM

    De novo protein design using geometric vector field networks

    Authors: Weian Mao, Muzhi Zhu, Zheng Sun, Shuaike Shen, Lin Yuanbo Wu, Hao Chen, Chunhua Shen

    Abstract: Innovations like protein diffusion have enabled significant progress in de novo protein design, which is a vital topic in life science. These methods typically depend on protein structure encoders to model residue backbone frames, where atoms do not exist. Most prior encoders rely on atom-wise features, such as angles and distances between atoms, which are not available in this context. Thus far,…

    Submitted 18 October, 2023; originally announced October 2023.

  30. arXiv:2310.11466  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Protein 3D Graph Structure Learning for Robust Structure-based Protein Property Prediction

    Authors: Yufei Huang, Siyuan Li, Jin Su, Lirong Wu, Odin Zhang, Haitao Lin, Jingqi Qi, Zihan Liu, Zhangyang Gao, Yuyang Liu, Jiangbin Zheng, Stan. ZQ. Li

    Abstract: Protein structure-based property prediction has emerged as a promising approach for various biological tasks, such as protein function prediction and sub-cellular location estimation. The existing methods rely heavily on experimental protein structure data and fail in scenarios where these data are unavailable. Predicted protein structures from AI tools (e.g., AlphaFold2) were utilized as alternati…

    Submitted 19 October, 2023; v1 submitted 14 October, 2023; originally announced October 2023.

  31. arXiv:2310.07276  [pdf, other]

    cs.CL cs.AI cs.LG q-bio.BM

    BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations

    Authors: Qizhi Pei, Wei Zhang, Jinhua Zhu, Kehan Wu, Kaiyuan Gao, Lijun Wu, Yingce Xia, Rui Yan

    Abstract: Recent advancements in biological research leverage the integration of molecules, proteins, and natural language to enhance drug discovery. However, current models exhibit several limitations, such as the generation of invalid molecular SMILES, underutilization of contextual information, and equal treatment of structured and unstructured knowledge. To address these issues, we propose…

    Submitted 28 January, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: Accepted by Empirical Methods in Natural Language Processing 2023 (EMNLP 2023)

  32. arXiv:2310.06763  [pdf, other]

    cs.LG cs.AI q-bio.BM

    FABind: Fast and Accurate Protein-Ligand Binding

    Authors: Qizhi Pei, Kaiyuan Gao, Lijun Wu, Jinhua Zhu, Yingce Xia, Shufang Xie, Tao Qin, Kun He, Tie-Yan Liu, Rui Yan

    Abstract: Modeling the interaction between proteins and ligands and accurately predicting their binding structures is a critical yet challenging task in drug discovery. Recent advancements in deep learning have shown promise in addressing this challenge, with sampling-based and regression-based methods emerging as two prominent approaches. However, these methods have notable limitations. Sampling-based meth…

    Submitted 8 January, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Accepted by Neural Information Processing Systems 2023 (NeurIPS 2023)

  33. arXiv:2306.17775  [pdf, other]

    stat.ML cs.LG q-bio.BM

    Practical and Asymptotically Exact Conditional Sampling in Diffusion Models

    Authors: Luhuan Wu, Brian L. Trippe, Christian A. Naesseth, David M. Blei, John P. Cunningham

    Abstract: Diffusion models have been successful on a range of conditional generation tasks including molecular design and text-to-image generation. However, these achievements have primarily depended on task-specific conditional training or error-prone heuristic approximations. Ideally, a conditional generation method should provide exact samples for a broad range of conditional distributions without requir…

    Submitted 22 November, 2024; v1 submitted 30 June, 2023; originally announced June 2023.

    Comments: Code: https://github.com/blt2114/twisted_diffusion_sampler

    Journal ref: NeurIPS 2023

  34. arXiv:2306.13769  [pdf, other]

    q-bio.BM cs.LG

    Functional-Group-Based Diffusion for Pocket-Specific Molecule Generation and Elaboration

    Authors: Haitao Lin, Yufei Huang, Odin Zhang, Lirong Wu, Siyuan Li, Zhiyuan Chen, Stan Z. Li

    Abstract: In recent years, AI-assisted drug design methods have been proposed to generate molecules given the pockets' structures of target proteins. Most of them are atom-level-based methods, which consider atoms as basic components and generate atom positions and types. In this way, however, it is hard to generate realistic fragments with complicated structures. To solve this, we propose D3FG, a functiona…

    Submitted 18 March, 2024; v1 submitted 30 May, 2023; originally announced June 2023.

    Comments: 9 pages

  35. arXiv:2306.00838  [pdf, other]

    q-bio.OT eess.IV

    The Brain Tumor Segmentation (BraTS-METS) Challenge 2023: Brain Metastasis Segmentation on Pre-treatment MRI

    Authors: Ahmed W. Moawad, Anastasia Janas, Ujjwal Baid, Divya Ramakrishnan, Rachit Saluja, Nader Ashraf, Nazanin Maleki, Leon Jekel, Nikolay Yordanov, Pascal Fehringer, Athanasios Gkampenis, Raisa Amiruddin, Amirreza Manteghinejad, Maruf Adewole, Jake Albrecht, Udunna Anazodo, Sanjay Aneja, Syed Muhammad Anwar, Timothy Bergquist, Veronica Chiang, Verena Chung, Gian Marco Conte, Farouk Dako, James Eddy, Ivan Ezhov, et al. (207 additional authors not shown)

    Abstract: The translation of AI-generated brain metastases (BM) segmentation into clinical practice relies heavily on diverse, high-quality annotated medical imaging datasets. The BraTS-METS 2023 challenge has gained momentum for testing and benchmarking algorithms using rigorously annotated internationally compiled real-world datasets. This study presents the results of the segmentation challenge and chara…

    Submitted 8 December, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

  36. arXiv:2305.09480  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Cross-Gate MLP with Protein Complex Invariant Embedding is A One-Shot Antibody Designer

    Authors: Cheng Tan, Zhangyang Gao, Lirong Wu, Jun Xia, Jiangbin Zheng, Xihong Yang, Yue Liu, Bozhen Hu, Stan Z. Li

    Abstract: Antibodies are crucial proteins produced by the immune system in response to foreign substances or antigens. The specificity of an antibody is determined by its complementarity-determining regions (CDRs), which are located in the variable domains of the antibody chains and form the antigen-binding site. Previous studies have utilized complex techniques to generate CDRs, but they suffer from inadeq…

    Submitted 10 January, 2024; v1 submitted 21 April, 2023; originally announced May 2023.

    Comments: Accepted by AAAI 2024

  37. arXiv:2302.12563  [pdf, other]

    q-bio.BM cs.LG

    Retrieved Sequence Augmentation for Protein Representation Learning

    Authors: Chang Ma, Haiteng Zhao, Lin Zheng, Jiayi Xin, Qintong Li, Lijun Wu, Zhihong Deng, Yang Lu, Qi Liu, Lingpeng Kong

    Abstract: Protein language models have excelled in a variety of tasks, ranging from structure prediction to protein engineering. However, proteins are highly diverse in functions and structures, and current state-of-the-art models including the latest version of AlphaFold rely on Multiple Sequence Alignments (MSA) to feed in the evolutionary knowledge. Despite their success, heavy computational overheads, a…

    Submitted 24 February, 2023; originally announced February 2023.

  38. arXiv:2302.10888  [pdf, other]

    cs.LG cs.AI q-bio.BM

    Data-Efficient Protein 3D Geometric Pretraining via Refinement of Diffused Protein Structure Decoy

    Authors: Yufei Huang, Lirong Wu, Haitao Lin, Jiangbin Zheng, Ge Wang, Stan Z. Li

    Abstract: Learning meaningful protein representation is important for a variety of biological downstream tasks such as structure-based drug design. Having witnessed the success of protein sequence pretraining, pretraining for structural data which is more informative has become a promising research topic. However, there are three major challenges facing protein structure pretraining: insufficient sample div…

    Submitted 5 February, 2023; originally announced February 2023.

  39. arXiv:2212.14041  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Deciphering RNA Secondary Structure Prediction: A Probabilistic K-Rook Matching Perspective

    Authors: Cheng Tan, Zhangyang Gao, Hanqun Cao, Xingran Chen, Ge Wang, Lirong Wu, Jun Xia, Jiangbin Zheng, Stan Z. Li

    Abstract: The secondary structure of ribonucleic acid (RNA) is more stable and accessible in the cell than its tertiary structure, making it essential for functional prediction. Although deep learning has shown promising results in this field, current methods suffer from poor generalization and high complexity. In this work, we reformulate the RNA secondary structure prediction as a K-Rook problem, thereby…

    Submitted 19 June, 2024; v1 submitted 2 December, 2022; originally announced December 2022.

    Comments: Accepted by ICML 2024

  40. arXiv:2212.03447  [pdf]

    cs.LG cs.CE q-bio.QM

    Integration of Pre-trained Protein Language Models into Geometric Deep Learning Networks

    Authors: Fang Wu, Lirong Wu, Dragomir Radev, Jinbo Xu, Stan Z. Li

    Abstract: Geometric deep learning has recently achieved great success in non-Euclidean domains, and learning on 3D structures of large biomolecules is emerging as a distinct research area. However, its efficacy is largely constrained due to the limited quantity of structural data. Meanwhile, protein language models trained on substantial 1D sequences have shown burgeoning capabilities with scale in a broad…

    Submitted 29 October, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

  41. arXiv:2211.11214  [pdf, other]

    q-bio.BM cs.LG

    DiffBP: Generative Diffusion of 3D Molecules for Target Protein Binding

    Authors: Haitao Lin, Yufei Huang, Odin Zhang, Siqi Ma, Meng Liu, Xuanjing Li, Lirong Wu, Jishui Wang, Tingjun Hou, Stan Z. Li

    Abstract: Generating molecules that bind to specific proteins is an important but challenging task in drug discovery. Previous works usually generate atoms in an auto-regressive way, where element types and 3D coordinates of atoms are generated one by one. However, in real-world molecular systems, the interactions among atoms in an entire molecule are global, leading to the energy function pair-coupled amon…

    Submitted 14 July, 2024; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: 13 pages

  42. arXiv:2211.08406  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Incorporating Pre-training Paradigm for Antibody Sequence-Structure Co-design

    Authors: Kaiyuan Gao, Lijun Wu, Jinhua Zhu, Tianbo Peng, Yingce Xia, Liang He, Shufang Xie, Tao Qin, Haiguang Liu, Kun He, Tie-Yan Liu

    Abstract: Antibodies are versatile proteins that can bind to pathogens and provide effective protection for the human body. Recently, deep learning-based computational antibody design has attracted popular attention since it automatically mines the antibody patterns from data that could be complementary to human experiences. However, the computational methods heavily rely on high-quality antibody structure data…

    Submitted 17 November, 2022; v1 submitted 26 October, 2022; originally announced November 2022.

  43. arXiv:2210.00674  [pdf]

    cs.LG q-bio.GN q-bio.QM

    Multi-view information fusion using multi-view variational autoencoders to predict proximal femoral strength

    Authors: Chen Zhao, Joyce H Keyak, Xuewei Cao, Qiuying Sha, Li Wu, Zhe Luo, Lanjuan Zhao, Qing Tian, Chuan Qiu, Ray Su, Hui Shen, Hong-Wen Deng, Weihua Zhou

    Abstract: The aim of this paper is to design a deep learning-based model to predict proximal femoral strength using multi-view information fusion. Method: We developed new models using multi-view variational autoencoder (MVAE) for feature representation learning and a product of expert (PoE) model for multi-view information fusion. We applied the proposed models to an in-house Louisiana Osteoporosis Study (…

    Submitted 27 March, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

    Comments: 16 pages, 3 figures

  44. arXiv:2209.06158  [pdf, other]

    q-bio.BM cs.LG

    Tailoring Molecules for Protein Pockets: a Transformer-based Generative Solution for Structured-based Drug Design

    Authors: Kehan Wu, Yingce Xia, Yang Fan, Pan Deng, Haiguang Liu, Lijun Wu, Shufang Xie, Tong Wang, Tao Qin, Tie-Yan Liu

    Abstract: Structure-based drug design is drawing growing attention in computer-aided drug discovery. Compared with the virtual screening approach where a pre-defined library of compounds is computationally screened, de novo drug design based on the structure of a target protein can provide novel drug candidates. In this paper, we present a generative solution named TamGent (Target-aware molecule generator…

    Submitted 30 August, 2022; originally announced September 2022.

  45. arXiv:2209.02876  [pdf, other]

    cs.LG eess.IV q-bio.NC

    Self-supervised multimodal neuroimaging yields predictive representations for a spectrum of Alzheimer's phenotypes

    Authors: Alex Fedorov, Eloy Geenjaar, Lei Wu, Tristan Sylvain, Thomas P. DeRamus, Margaux Luck, Maria Misiura, R Devon Hjelm, Sergey M. Plis, Vince D. Calhoun

    Abstract: Recent neuroimaging studies that focus on predicting brain disorders via modern machine learning approaches commonly include a single modality and rely on supervised over-parameterized models. However, a single modality provides only a limited view of the highly complex brain. Critically, supervised models in clinical settings lack accurate diagnostic labels for training. Coarse labels do not captu…

    Submitted 6 September, 2022; originally announced September 2022.

  46. arXiv:2206.09818  [pdf, other]

    q-bio.BM cs.AI cs.LG

    SSM-DTA: Breaking the Barriers of Data Scarcity in Drug-Target Affinity Prediction

    Authors: Qizhi Pei, Lijun Wu, Jinhua Zhu, Yingce Xia, Shufang Xie, Tao Qin, Haiguang Liu, Tie-Yan Liu, Rui Yan

    Abstract: Accurate prediction of Drug-Target Affinity (DTA) is of vital importance in early-stage drug discovery, facilitating the identification of drugs that can effectively interact with specific targets and regulate their activities. While wet experiments remain the most reliable method, they are time-consuming and resource-intensive, resulting in limited data availability that poses challenges for deep…

    Submitted 17 October, 2023; v1 submitted 20 June, 2022; originally announced June 2022.

    Comments: Accepted by Briefings in Bioinformatics 2023

  47. arXiv:2110.08850  [pdf]

    physics.soc-ph cs.LG cs.SI q-bio.MN stat.ML

    Understanding the network formation pattern for better link prediction

    Authors: Jiating Yu, Ling-Yun Wu

    Abstract: As a classical problem in the field of complex networks, link prediction has attracted much attention from researchers, which is of great significance to help us understand the evolution and dynamic development mechanisms of networks. Although various network type-specific algorithms have been proposed to tackle the link prediction problem, most of them suppose that the network structure is domina…

    Submitted 17 October, 2021; originally announced October 2021.

    Comments: 21 pages, 3 figures, 18 tables, and 29 references

    Journal ref: Physica A: Statistical Mechanics and its Applications, 600 (2022) 127522

  48. arXiv:2109.09119  [pdf]

    q-bio.MN cs.SI physics.bio-ph physics.soc-ph q-bio.QM

    Network Refinement: A unified framework for enhancing signal or removing noise of networks

    Authors: Jiating Yu, Jiacheng Leng, Ling-Yun Wu

    Abstract: Networks are widely used in many fields for their powerful ability to provide vivid representations of relationships between variables. However, many of them may be corrupted by experimental noise or inappropriate network inference methods that inherently hamper the efficacy of network-based downstream analysis. Consequently, it's necessary to develop systematic methods for denoising networks, nam…

    Submitted 19 September, 2021; originally announced September 2021.

    Comments: 20 pages, 7 figures, 1 table, 44 references, and 2 appendices. Submitted to IEEE Transactions on Network Science and Engineering

  49. arXiv:2108.04682  [pdf, other]

    physics.chem-ph cs.LG q-bio.QM

    ChemiRise: a data-driven retrosynthesis engine

    Authors: Xiangyan Sun, Ke Liu, Yuquan Lin, Lingjie Wu, Haoming Xing, Minghong Gao, Ji Liu, Suocheng Tan, Zekun Ni, Qi Han, Junqiu Wu, Jie Fan

    Abstract: We have developed an end-to-end, retrosynthesis system, named ChemiRise, that can propose complete retrosynthesis routes for organic compounds rapidly and reliably. The system was trained on a processed patent database of over 3 million organic reactions. Experimental reactions were atom-mapped, clustered, and extracted into reaction templates. We then trained a graph convolutional neural network-…

    Submitted 9 August, 2021; originally announced August 2021.

  50. arXiv:1907.02058  [pdf, other]

    q-bio.PE physics.soc-ph

    Evolution of cooperation driven by active information spreading

    Authors: Bin Wu, Hye Jin Park, Lingshan Wu, Da Zhou

    Abstract: Cooperators forgo their interest to benefit others. Thus cooperation should not be favored by natural selection. It challenges the evolutionists, since cooperation is widespread. As one of the resolutions, information spreading has been revealed to play a key role in the emergence of cooperation. Individuals, however, are typically assumed to be passive in the information spreading. Here we assume…

    Submitted 3 July, 2019; originally announced July 2019.

    Comments: 14 pages, 4 figures

    Journal ref: Phys. Rev. E 100, 042303 (2019)