Showing 1–30 of 30 results for author: Hsieh, C

Searching in archive q-bio.
  1. arXiv:2506.06915  [pdf]

    q-bio.BM cs.LG

    Graph Neural Networks in Modern AI-aided Drug Discovery

    Authors: Odin Zhang, Haitao Lin, Xujun Zhang, Xiaorui Wang, Zhenxing Wu, Qing Ye, Weibo Zhao, Jike Wang, Kejun Ying, Yu Kang, Chang-yu Hsieh, Tingjun Hou

    Abstract: Graph neural networks (GNNs), as topology/structure-aware models within deep learning, have emerged as powerful tools for AI-aided drug discovery (AIDD). By directly operating on molecular graphs, GNNs offer an intuitive and expressive framework for learning the complex topological and geometric features of drug-like molecules, cementing their role in modern molecular modeling. This review provide…

    Submitted 7 June, 2025; originally announced June 2025.

  2. arXiv:2505.03121  [pdf]

    q-bio.BM

    AutoLoop: a novel autoregressive deep learning method for protein loop prediction with high accuracy

    Authors: Tianyue Wang, Xujun Zhang, Langcheng Wang, Odin Zhang, Jike Wang, Ercheng Wang, Jialu Wu, Renling Hu, Jingxuan Ge, Shimeng Li, Qun Su, Jiajun Yu, Chang-Yu Hsieh, Tingjun Hou, Yu Kang

    Abstract: Protein structure prediction is a critical and longstanding challenge in biology, garnering widespread interest due to its significance in understanding biological processes. A particular area of focus is the prediction of missing loops in proteins, which are vital in determining protein function and activity. To address this challenge, we propose AutoLoop, a novel computational model designed to…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 34 pages, 7 figures

  3. arXiv:2504.11454  [pdf, ps, other]

    cs.LG cs.AI q-bio.QM

    Elucidating the Design Space of Multimodal Protein Language Models

    Authors: Cheng-Yen Hsieh, Xinyou Wang, Daiheng Zhang, Dongyu Xue, Fei Ye, Shujian Huang, Zaixiang Zheng, Quanquan Gu

    Abstract: Multimodal protein language models (PLMs) integrate sequence and token-based structural information, serving as a powerful foundation for protein modeling, generation, and design. However, the reliance on tokenizing 3D structures into discrete tokens causes substantial loss of fidelity about fine-grained structural details and correlations. In this paper, we systematically elucidate the design spa…

    Submitted 11 June, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

    Comments: ICML 2025 Spotlight; Project Page: https://bytedance.github.io/dplm/dplm-2.1/

  4. arXiv:2504.10983  [pdf, other]

    cs.LG cs.AI q-bio.BM

    ProtFlow: Fast Protein Sequence Design via Flow Matching on Compressed Protein Language Model Embeddings

    Authors: Zitai Kong, Yiheng Zhu, Yinlong Xu, Hanjing Zhou, Mingzhe Yin, Jialu Wu, Hongxia Xu, Chang-Yu Hsieh, Tingjun Hou, Jian Wu

    Abstract: The design of protein sequences with desired functionalities is a fundamental task in protein engineering. Deep generative methods, such as autoregressive models and diffusion models, have greatly accelerated the discovery of novel protein sequences. However, these methods mainly focus on local or shallow residual semantics and suffer from low inference efficiency, large modeling space and high tr…

    Submitted 15 April, 2025; originally announced April 2025.

  5. arXiv:2411.15215  [pdf, other]

    cs.LG cs.AI q-bio.BM

    S$^2$ALM: Sequence-Structure Pre-trained Large Language Model for Comprehensive Antibody Representation Learning

    Authors: Mingze Yin, Hanjing Zhou, Jialu Wu, Yiheng Zhu, Yuxuan Zhan, Zitai Kong, Hongxia Xu, Chang-Yu Hsieh, Jintai Chen, Tingjun Hou, Jian Wu

    Abstract: Antibodies safeguard our health through their precise and potent binding to specific antigens, demonstrating promising therapeutic efficacy in the treatment of numerous diseases, including COVID-19. Recent advancements in biomedical language models have shown the great potential to interpret complex biological structures and functions. However, existing antibody specific models have a notable limi…

    Submitted 20 November, 2024; originally announced November 2024.

  6. arXiv:2411.09030  [pdf]

    q-bio.PE math.DS

    How to quantify interaction strengths? A critical rethinking of the interaction Jacobian and evaluation methods for non-parametric inference in time series analysis

    Authors: Takeshi Miki, Chun-Wei Chang, Po-Ju Ke, Arndt Telschow, Cheng-Han Tsai, Masayuki Ushio, Chih-hao Hsieh

    Abstract: Quantifying interaction strengths between state variables in dynamical systems is essential for understanding ecological networks. Within the empirical dynamic modeling approach, multivariate S-map infers the interaction Jacobian from time series data without assuming specific dynamical models. This approach enables the non-parametric statistical inference of interspecific interactions through sta…

    Submitted 13 November, 2024; originally announced November 2024.

    Comments: 39 pages, 5 figures and 2 tables

  7. arXiv:2410.14946  [pdf, other]

    cs.LG cs.AI q-bio.BM

    DEL-Ranking: Ranking-Correction Denoising Framework for Elucidating Molecular Affinities in DNA-Encoded Libraries

    Authors: Hanqun Cao, Mutian He, Ning Ma, Chang-yu Hsieh, Chunbin Gu, Pheng-Ann Heng

    Abstract: DNA-encoded library (DEL) screening has revolutionized the detection of protein-ligand interactions through read counts, enabling rapid exploration of vast chemical spaces. However, noise in read counts, stemming from nonspecific interactions, can mislead this exploration process. We present DEL-Ranking, a novel distribution-correction denoising framework that addresses these challenges. Our appro…

    Submitted 4 December, 2024; v1 submitted 18 October, 2024; originally announced October 2024.

  8. arXiv:2409.05916  [pdf, other]

    q-bio.QM cs.AI cs.LG q-bio.BM

    Unlocking Potential Binders: Multimodal Pretraining DEL-Fusion for Denoising DNA-Encoded Libraries

    Authors: Chunbin Gu, Mutian He, Hanqun Cao, Guangyong Chen, Chang-yu Hsieh, Pheng Ann Heng

    Abstract: In the realm of drug discovery, DNA-encoded library (DEL) screening technology has emerged as an efficient method for identifying high-affinity compounds. However, DEL screening faces a significant challenge: noise arising from nonspecific interactions within complex biological systems. Neural networks trained on DEL libraries have been employed to extract compound features, aiming to denoise the…

    Submitted 7 September, 2024; originally announced September 2024.

  9. arXiv:2407.12296  [pdf]

    q-bio.BM

    Discovery of novel antimicrobial peptides with notable antibacterial potency by a LLM-based foundation model

    Authors: Jike Wang, Jianwen Feng, Yu Kang, Peichen Pan, Jingxuan Ge, Yan Wang, Mingyang Wang, Zhenxing Wu, Xingcai Zhang, Jiameng Yu, Xujun Zhang, Tianyue Wang, Lirong Wen, Guangning Yan, Yafeng Deng, Hui Shi, Chang-Yu Hsieh, Zhihui Jiang, Tingjun Hou

    Abstract: Large language models (LLMs) have shown remarkable advancements in chemistry and biomedical research, acting as versatile foundation models for various tasks. We introduce AMP-Designer, an LLM-based approach for swiftly designing novel antimicrobial peptides (AMPs) with desired properties. Within 11 days, AMP-Designer achieved the de novo design of 18 AMPs with broad-spectrum activity against Gram…

    Submitted 2 March, 2025; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 43 pages, 6 figures, 5 tables. Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract appearing here is slightly shorter than that in the PDF file

  10. arXiv:2407.07930  [pdf]

    q-bio.BM cs.LG

    Token-Mol 1.0: Tokenized drug design with large language model

    Authors: Jike Wang, Rui Qin, Mingyang Wang, Meijing Fang, Yangyang Zhang, Yuchen Zhu, Qun Su, Qiaolin Gou, Chao Shen, Odin Zhang, Zhenxing Wu, Dejun Jiang, Xujun Zhang, Huifeng Zhao, Xiaozhe Wan, Zhourui Wu, Liwei Liu, Yu Kang, Chang-Yu Hsieh, Tingjun Hou

    Abstract: Significant interests have recently risen in leveraging sequence-based large language models (LLMs) for drug design. However, most current applications of LLMs in drug discovery lack the ability to comprehend three-dimensional (3D) structures, thereby limiting their effectiveness in tasks that explicitly involve molecular conformations. In this study, we introduced Token-Mol, a token-only 3D drug…

    Submitted 19 August, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  11. arXiv:2406.12910  [pdf]

    cs.LG cs.AI cs.NE physics.chem-ph q-bio.BM

    Human-level molecular optimization driven by mol-gene evolution

    Authors: Jiebin Fang, Churu Mao, Yuchen Zhu, Xiaoming Chen, Chang-Yu Hsieh, Zhongjun Ma

    Abstract: De novo molecule generation allows the search for more drug-like hits across a vast chemical space. However, lead optimization is still required, and the process of optimizing molecular structures faces the challenge of balancing structural novelty with pharmacological properties. This study introduces the Deep Genetic Molecular Modification Algorithm (DGMM), which brings structure modification to…

    Submitted 12 June, 2024; originally announced June 2024.

  12. arXiv:2404.19230  [pdf]

    q-bio.BM cs.AI

    Deep Lead Optimization: Leveraging Generative AI for Structural Modification

    Authors: Odin Zhang, Haitao Lin, Hui Zhang, Huifeng Zhao, Yufei Huang, Yuansheng Huang, Dejun Jiang, Chang-yu Hsieh, Peichen Pan, Tingjun Hou

    Abstract: The idea of using deep-learning-based molecular generation to accelerate discovery of drug candidates has attracted extraordinary attention, and many deep generative models have been developed for automated drug design, termed molecular generation. In general, molecular generation encompasses two main strategies: de novo design, which generates novel molecular structures from scratch, and lead opt…

    Submitted 29 April, 2024; originally announced April 2024.

  13. arXiv:2404.10573  [pdf, other]

    cs.AI cs.CE q-bio.BM

    AAVDiff: Experimental Validation of Enhanced Viability and Diversity in Recombinant Adeno-Associated Virus (AAV) Capsids through Diffusion Generation

    Authors: Lijun Liu, Jiali Yang, Jianfei Song, Xinglin Yang, Lele Niu, Zeqi Cai, Hui Shi, Tingjun Hou, Chang-yu Hsieh, Weiran Shen, Yafeng Deng

    Abstract: Recombinant adeno-associated virus (rAAV) vectors have revolutionized gene therapy, but their broad tropism and suboptimal transduction efficiency limit their clinical applications. To overcome these limitations, researchers have focused on designing and screening capsid libraries to identify improved vectors. However, the large sequence space and limited resources present challenges in identifyin…

    Submitted 17 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  14. arXiv:2404.00014  [pdf]

    physics.chem-ph cs.AI q-bio.BM

    Deep Geometry Handling and Fragment-wise Molecular 3D Graph Generation

    Authors: Odin Zhang, Yufei Huang, Shichen Cheng, Mengyao Yu, Xujun Zhang, Haitao Lin, Yundian Zeng, Mingyang Wang, Zhenxing Wu, Huifeng Zhao, Zaixi Zhang, Chenqing Hua, Yu Kang, Sunliang Cui, Peichen Pan, Chang-Yu Hsieh, Tingjun Hou

    Abstract: Most earlier 3D structure-based molecular generation approaches follow an atom-wise paradigm, incrementally adding atoms to a partially built molecular fragment within protein pockets. These methods, while effective in designing tightly bound ligands, often overlook other essential properties such as synthesizability. The fragment-wise generation paradigm offers a promising solution. However, a co…

    Submitted 15 March, 2024; originally announced April 2024.

  15. arXiv:2402.10516  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Generative AI for Controllable Protein Sequence Design: A Survey

    Authors: Yiheng Zhu, Zitai Kong, Jialu Wu, Weize Liu, Yuqiang Han, Mingze Yin, Hongxia Xu, Chang-Yu Hsieh, Tingjun Hou

    Abstract: The design of novel protein sequences with targeted functionalities underpins a central theme in protein engineering, impacting diverse fields such as drug discovery and enzymatic engineering. However, navigating this vast combinatorial search space remains a severe challenge due to time and financial constraints. This scenario is rapidly evolving as the transformative advancements in AI, particul…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: 9 pages

  16. arXiv:2311.02798  [pdf, other]

    cs.LG physics.chem-ph q-bio.QM

    Multi-channel learning for integrating structural hierarchies into context-dependent molecular representation

    Authors: Yue Wan, Jialu Wu, Tingjun Hou, Chang-Yu Hsieh, Xiaowei Jia

    Abstract: Reliable molecular property prediction is essential for various scientific endeavors and industrial applications, such as drug discovery. However, the data scarcity, combined with the highly non-linear causal relationships between physicochemical and biological properties and conventional molecular featurization schemes, complicates the development of robust molecular machine learning models. Self…

    Submitted 12 January, 2025; v1 submitted 5 November, 2023; originally announced November 2023.

    Journal ref: Nat Commun 16, 413 (2025)

  17. arXiv:2308.02172  [pdf]

    q-bio.BM

    Delete: Deep Lead Optimization Enveloped in Protein Pocket through Unified Deleting Strategies and a Structure-aware Network

    Authors: Haotian Zhang, Huifeng Zhao, Xujun Zhang, Qun Su, Hongyan Du, Chao Shen, Zhe Wang, Dan Li, Peichen Pan, Guangyong Chen, Yu Kang, Chang-yu Hsieh, Tingjun Hou

    Abstract: Drug discovery is a highly complicated process, and it is unfeasible to fully commit it to the recently developed molecular generation methods. Deep learning-based lead optimization takes expert knowledge as a starting point, learning from numerous historical cases about how to modify the structure for better drug-forming properties. However, compared with the more established de novo generation s…

    Submitted 4 August, 2023; originally announced August 2023.

  18. arXiv:2306.12754  [pdf, other]

    q-bio.BM

    Highly accurate and efficient deep learning paradigm for full-atom protein loop modeling with KarmaLoop

    Authors: Tianyue Wang, Xujun Zhang, Odin Zhang, Peichen Pan, Guangyong Chen, Yu Kang, Chang-Yu Hsieh, Tingjun Hou

    Abstract: Protein loop modeling is the most challenging yet highly non-trivial task in protein structure prediction. Despite recent progress, existing methods including knowledge-based, ab initio, hybrid and deep learning (DL) methods fall significantly short of either atomic accuracy or computational efficiency. Moreover, an overarching focus on backbone atoms has resulted in a dearth of attention given to…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: 20 pages, 6 figures, journal articles and keywords:Protein loop modeling, Loop prediction, Antibody H3 loop, Deep Learning

  19. arXiv:2304.12436  [pdf, other]

    q-bio.BM cs.LG

    An Equivariant Generative Framework for Molecular Graph-Structure Co-Design

    Authors: Zaixi Zhang, Qi Liu, Chee-Kong Lee, Chang-Yu Hsieh, Enhong Chen

    Abstract: Designing molecules with desirable physiochemical properties and functionalities is a long-standing challenge in chemistry, material science, and drug discovery. Recently, machine learning-based generative models have emerged as promising approaches for de novo molecule design. However, further refinement of methodology is highly desired as most existing methods lack unified modeling of 2D…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: Under review

  20. arXiv:2205.09548  [pdf, other]

    q-bio.BM cs.LG q-bio.QM

    ODBO: Bayesian Optimization with Search Space Prescreening for Directed Protein Evolution

    Authors: Lixue Cheng, Ziyi Yang, Changyu Hsieh, Benben Liao, Shengyu Zhang

    Abstract: Directed evolution is a versatile technique in protein engineering that mimics the process of natural selection by iteratively alternating between mutagenesis and screening in order to search for sequences that optimize a given property of interest, such as catalytic activity and binding affinity to a specified target. However, the space of possible proteins is too large to search exhaustively in…

    Submitted 1 May, 2024; v1 submitted 19 May, 2022; originally announced May 2022.

    Comments: 27 pages, 13 figures

  21. arXiv:2201.04065  [pdf, other]

    eess.SP cs.LG q-bio.NC

    ExBrainable: An Open-Source GUI for CNN-based EEG Decoding and Model Interpretation

    Authors: Ya-Lin Huang, Chia-Ying Hsieh, Jian-Xue Huang, Chun-Shu Wei

    Abstract: We have developed a graphic user interface (GUI), ExBrainable, dedicated to convolutional neural networks (CNN) model training and visualization in electroencephalography (EEG) decoding. Available functions include model training, evaluation, and parameter visualization in terms of temporal and spatial representations. We demonstrate these functions using a well-studied public dataset of motor-ima…

    Submitted 10 January, 2022; originally announced January 2022.

  22. arXiv:2111.08008  [pdf, other]

    q-bio.QM cs.LG

    SPLDExtraTrees: Robust machine learning approach for predicting kinase inhibitor resistance

    Authors: Ziyi Yang, Zhaofeng Ye, Yijia Xiao, Changyu Hsieh, Shengyu Zhang

    Abstract: Drug resistance is a major threat to the global health and a significant concern throughout the clinical treatment of diseases and drug development. The mutation in proteins that is related to drug binding is a common cause for adaptive drug resistance. Therefore, quantitative estimations of how mutations would affect the interaction between a drug and the target protein would be of vital signific…

    Submitted 14 January, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: 14 pages, 5 figures

    MSC Class: machine learning

  23. arXiv:2108.07435  [pdf, other]

    cs.LG cs.CL q-bio.BM

    Modeling Protein Using Large-scale Pretrain Language Model

    Authors: Yijia Xiao, Jiezhong Qiu, Ziang Li, Chang-Yu Hsieh, Jie Tang

    Abstract: Protein is linked to almost every life process. Therefore, analyzing the biological structure and property of protein sequences is critical to the exploration of life, as well as disease detection and drug discovery. Traditional protein analysis methods tend to be labor-intensive and time-consuming. The emergence of deep learning models makes modeling data patterns in large quantities of data poss…

    Submitted 7 December, 2021; v1 submitted 17 August, 2021; originally announced August 2021.

    Comments: Accepted paper in Pretrain@KDD 2021 (The International Workshop on Pretraining: Algorithms, Architectures, and Applications)

  24. arXiv:2102.04069  [pdf]

    q-bio.PE q-bio.QM

    Reconstructing large networks with time-varying interactions

    Authors: Chun-Wei Chang, Takeshi Miki, Masayuki Ushio, Hsiao-Pei Lu, Fuh-Kwo Shiah, Chih-hao Hsieh

    Abstract: Reconstructing interactions from observational data is a critical need for investigating natural biological networks, wherein network dimensionality (i.e. number of interacting components) is usually high and interactions are time-varying. These pose a challenge to existing methods that can quantify only small interaction networks or assume static interactions under steady state. Here, we proposed…

    Submitted 8 February, 2021; originally announced February 2021.

    Comments: The first 28 pages are the main text (including four figures and one table) and followed by SI Texts, including Fig. S1-S12 and Table S1-S2

  25. arXiv:2012.14333  [pdf, other]

    cond-mat.soft q-bio.QM

    Radius evolution for bubbles with elastic shells

    Authors: S. C. Mancas, H. C. Rosu, C.-C. Hsieh

    Abstract: We present an analysis of an extended Rayleigh-Plesset (RP) equation for a three dimensional cell of microorganisms such as bacteria or viruses in some liquid, where the cell membrane in bacteria or the envelope (capsid) in viruses possess elastic properties. To account for rapid changes in the shape configuration of such microorganisms, the bubble membrane/envelope must be rigid to resist large p…

    Submitted 27 August, 2021; v1 submitted 24 December, 2020; originally announced December 2020.

    Comments: 11 pages, 6 figures with subfigures, published version

    Journal ref: Commun Nonlinear Sci Numer Simulat 103 (2021) 106003

  26. arXiv:2011.00790  [pdf, ps, other]

    math.OC eess.SY q-bio.PE

    On Control of Epidemics with Application to COVID-19

    Authors: Chung-Han Hsieh

    Abstract: At the time of writing, the ongoing COVID-19 pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), had already resulted in more than thirty-two million cases infected and more than one million deaths worldwide. Given the fact that the pandemic is still threatening health and safety, it is in the urgency to understand the COVID-19 contagion process and know how it migh…

    Submitted 2 November, 2020; originally announced November 2020.

    Comments: Submitted to the SIAM Journal on Control and Optimization

    MSC Class: 93E03; 93D15; 92B05; 92D30

    Journal ref: IEEE Access, vol. 9, pp. 167948-167958, 2021

  27. arXiv:1909.05577  [pdf]

    q-bio.PE q-bio.QM

    Ecosystem-level stabilizing effects of biodiversity via nutrient-diversity feedbacks in multitrophic systems

    Authors: Chun-Wei Chang, Chih-hao Hsieh, Takeshi Miki

    Abstract: Statistical averaging and asynchronous population dynamics as portfolio mechanisms are considered as the most important processes with which biodiversity contributes to ecosystem stability. However, portfolio theories usually regard biodiversity as a fixed property, but overlook the dynamics of biodiversity altered by other ecosystem components. Here, we proposed a new mechanistic food chain model…

    Submitted 12 September, 2019; originally announced September 2019.

    Comments: Main text: 30 pages including 4 figures and 2 tables; Supplementary Information: 10 pages

  28. arXiv:physics/0206024  [pdf, ps, other]

    physics.bio-ph q-bio.GN

    Minimal model for genome evolution and growth

    Authors: L. C. Hsieh, L. F. Luo, F. M. Ji, H. C. Lee

    Abstract: Textual analysis of typical microbial genomes reveals that they have the statistical characteristics of a DNA sequence of a much shorter length. This peculiar property supports an evolutionary model in which a genome evolves by random mutation but primarily grows by random segmental self-copying. That genomes grew mostly by self-copying is consistent with the observation that repeat sequences in…

    Submitted 11 June, 2002; originally announced June 2002.

    Comments: gzip file (LaTeX source and 5 .ps file for figures), 4 pages, 5 figures, 1 table

  29. arXiv:physics/0104015  [pdf, ps, other]

    physics.bio-ph physics.comp-ph q-bio

    Geometric and Statistical Properties of the Mean-Field HP Model, the LS Model and Real Protein Sequences

    Authors: C. T. Shih, Z. Y. Su, J. F. Gwan, B. L. Hao, C. H. Hsieh, J. L. Lo, H. C. Lee

    Abstract: Lattice models, for their coarse-grained nature, are best suited for the study of the "designability problem", the phenomenon in which most of the about 16,000 proteins of known structure have their native conformations concentrated in a relatively small number of about 500 topological classes of conformations. Here it is shown that on a lattice the most highly designable simulated protein str…

    Submitted 27 December, 2001; v1 submitted 3 April, 2001; originally announced April 2001.

    Comments: 12 pages, 10 figures

  30. Mean-Field HP Model, Designability and Alpha-Helices in Protein Structures

    Authors: C. T. Shih, Z. Y. Su, J. F. Gwan, H. C. Lee, B. L. Hao, C. H. Hsieh

    Abstract: Analysis of the geometric properties of a mean-field HP model on a square lattice for protein structure shows that structures with large number of switch backs between surface and core sites are chosen favorably by peptides as unique ground states. Global comparison of model (binary) peptide sequences with concatenated (binary) protein sequences listed in the Protein Data Bank and the Dali Domai…

    Submitted 16 November, 1999; v1 submitted 14 December, 1998; originally announced December 1998.

    Comments: 4 pages, 2 figures

    Report number: NCHC-phys-1998-1024

    Journal ref: Phys. Rev. Lett. 84, p.386, 2000