Showing 1–50 of 90 results for author: Chen, H

Searching in archive q-bio.
  1. arXiv:2509.14788  [pdf, ps, other]

    cs.LG cs.AI q-bio.BM

    Structure-Aware Contrastive Learning with Fine-Grained Binding Representations for Drug Discovery

    Authors: Jing Lan, Hexiao Ding, Hongzhao Chen, Yufeng Jiang, Nga-Chun Ng, Gwing Kei Yip, Gerald W. Y. Cheng, Yunlin Mao, Jing Cai, Liang-ting Lin, Jung Sun Yoo

    Abstract: Accurate identification of drug-target interactions (DTI) remains a central challenge in computational pharmacology, where sequence-based methods offer scalability. This work introduces a sequence-based drug-target interaction framework that integrates structural priors into protein representations while maintaining high-throughput screening capability. Evaluated across multiple benchmarks, the mo…

    Submitted 18 September, 2025; originally announced September 2025.

  2. arXiv:2509.12600  [pdf, ps, other]

    cs.LG cs.AI q-bio.QM

    A Multimodal Foundation Model to Enhance Generalizability and Data Efficiency for Pan-cancer Prognosis Prediction

    Authors: Huajun Zhou, Fengtao Zhou, Jiabo Ma, Yingxue Xu, Xi Wang, Xiuming Zhang, Li Liang, Zhenhui Li, Hao Chen

    Abstract: Multimodal data provides heterogeneous information for a holistic understanding of the tumor microenvironment. However, existing AI models often struggle to harness the rich information within multimodal data and extract poorly generalizable representations. Here we present MICE (Multimodal data Integration via Collaborative Experts), a multimodal foundation model that effectively integrates patho…

    Submitted 15 September, 2025; originally announced September 2025.

    Comments: 27 pages, 7 figures

  3. arXiv:2508.01799  [pdf, ps, other]

    q-bio.BM cs.AI cs.LG

    Contrastive Multi-Task Learning with Solvent-Aware Augmentation for Drug Discovery

    Authors: Jing Lan, Hexiao Ding, Hongzhao Chen, Yufeng Jiang, Nga-Chun Ng, Gerald W. Y. Cheng, Zongxi Li, Jing Cai, Liang-ting Lin, Jung Sun Yoo

    Abstract: Accurate prediction of protein-ligand interactions is essential for computer-aided drug discovery. However, existing methods often fail to capture solvent-dependent conformational changes and lack the ability to jointly learn multiple related tasks. To address these limitations, we introduce a pre-training method that incorporates ligand conformational ensembles generated under diverse solvent con…

    Submitted 27 August, 2025; v1 submitted 3 August, 2025; originally announced August 2025.

    Comments: 10 pages, 4 figures

  4. arXiv:2507.02379  [pdf]

    cs.AI q-bio.BM

    An AI-native experimental laboratory for autonomous biomolecular engineering

    Authors: Mingyu Wu, Zhaoguo Wang, Jiabin Wang, Zhiyuan Dong, Jingkai Yang, Qingting Li, Tianyu Huang, Lei Zhao, Mingqiang Li, Fei Wang, Chunhai Fan, Haibo Chen

    Abstract: Autonomous scientific research, capable of independently conducting complex experiments and serving non-specialists, represents a long-held aspiration. Achieving it requires a fundamental paradigm shift driven by artificial intelligence (AI). While autonomous experimental systems are emerging, they remain confined to areas featuring singular objectives and well-defined, simple experimental workflo…

    Submitted 3 July, 2025; originally announced July 2025.

  5. arXiv:2507.00953  [pdf, ps, other]

    q-bio.BM cs.AI

    From Sentences to Sequences: Rethinking Languages in Biological System

    Authors: Ke Liu, Shuaike Shen, Hao Chen

    Abstract: The paradigm of large language models in natural language processing (NLP) has also shown promise in modeling biological languages, including proteins, RNA, and DNA. Both the auto-regressive generation paradigm and evaluation metrics have been transferred from NLP to biological sequence modeling. However, the intrinsic structural correlations in natural and biological languages differ fundamentall…

    Submitted 3 July, 2025; v1 submitted 1 July, 2025; originally announced July 2025.

  6. arXiv:2506.01302  [pdf, ps, other]

    cs.LG q-bio.QM

    Recent Developments in GNNs for Drug Discovery

    Authors: Zhengyu Fang, Xiaoge Zhang, Anyin Zhao, Xiao Li, Huiyuan Chen, Jing Li

    Abstract: In this paper, we review recent developments and the role of Graph Neural Networks (GNNs) in computational drug discovery, including molecule generation, molecular property prediction, and drug-drug interaction prediction. By summarizing the most recent developments in this area, we underscore the capabilities of GNNs to comprehend intricate molecular patterns, while exploring both their current a…

    Submitted 2 June, 2025; originally announced June 2025.

  7. arXiv:2503.07664  [pdf]

    q-bio.QM cs.IR cs.LG stat.AP

    Antibiotic Resistance Microbiology Dataset (ARMD): A Resource for Antimicrobial Resistance from EHRs

    Authors: Fateme Nateghi Haredasht, Fatemeh Amrollahi, Manoj Maddali, Nicholas Marshall, Stephen P. Ma, Lauren N. Cooper, Andrew O. Johnson, Ziming Wei, Richard J. Medford, Sanjat Kanjilal, Niaz Banaei, Stanley Deresinski, Mary K. Goldstein, Steven M. Asch, Amy Chang, Jonathan H. Chen

    Abstract: The Antibiotic Resistance Microbiology Dataset (ARMD) is a de-identified resource derived from electronic health records (EHR) that facilitates research in antimicrobial resistance (AMR). ARMD encompasses big data from adult patients collected from over 15 years at two academic-affiliated hospitals, focusing on microbiological cultures, antibiotic susceptibilities, and associated clinical and demo…

    Submitted 21 July, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

  8. arXiv:2502.12479  [pdf, other]

    cs.LG q-bio.BM

    MotifBench: A standardized protein design benchmark for motif-scaffolding problems

    Authors: Zhuoqi Zheng, Bo Zhang, Kieran Didi, Kevin K. Yang, Jason Yim, Joseph L. Watson, Hai-Feng Chen, Brian L. Trippe

    Abstract: The motif-scaffolding problem is a central task in computational protein design: Given the coordinates of atoms in a geometry chosen to confer a desired biochemical function (a motif), the task is to identify diverse protein structures (scaffolds) that include the motif and maintain its geometry. Significant recent progress on motif-scaffolding has been made due to computational evaluation with re…

    Submitted 19 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: Associated content available at github.com/blt2114/MotifBench

  9. arXiv:2502.12453  [pdf, other]

    cs.LG cs.AI q-bio.BM

    UniMatch: Universal Matching from Atom to Task for Few-Shot Drug Discovery

    Authors: Ruifeng Li, Mingqian Li, Wei Liu, Yuhua Zhou, Xiangxin Zhou, Yuan Yao, Qiang Zhang, Hongyang Chen

    Abstract: Drug discovery is crucial for identifying candidate drugs for various diseases. However, its low success rate often results in a scarcity of annotations, posing a few-shot learning problem. Existing methods primarily focus on single-scale features, overlooking the hierarchical molecular structures that determine different molecular properties. To address these issues, we introduce Universal Matchin…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: accepted as ICLR 2025 Spotlight

    MSC Class: 68U07

  10. arXiv:2502.09662  [pdf, other]

    q-bio.QM cs.CV eess.IV

    Generalizable Cervical Cancer Screening via Large-scale Pretraining and Test-Time Adaptation

    Authors: Hao Jiang, Cheng Jin, Huangjing Lin, Yanning Zhou, Xi Wang, Jiabo Ma, Li Ding, Jun Hou, Runsheng Liu, Zhizhong Chai, Luyang Luo, Huijuan Shi, Yinling Qian, Qiong Wang, Changzhong Li, Anjia Han, Ronald Cheong Kin Chan, Hao Chen

    Abstract: Cervical cancer is a leading malignancy in female reproductive system. While AI-assisted cytology offers a cost-effective and non-invasive screening solution, current systems struggle with generalizability in complex clinical scenarios. To address this issue, we introduced Smart-CCS, a generalizable Cervical Cancer Screening paradigm based on pretraining and adaptation to create robust and general…

    Submitted 12 February, 2025; originally announced February 2025.

  11. arXiv:2501.13628  [pdf]

    q-bio.NC

    Language modulates vision: Evidence from neural networks and human brain-lesion models

    Authors: Haoyang Chen, Bo Liu, Shuyue Wang, Xiaosha Wang, Wenjuan Han, Yixin Zhu, Xiaochun Wang, Yanchao Bi

    Abstract: Comparing information structures in between deep neural networks (DNNs) and the human brain has become a key method for exploring their similarities and differences. Recent research has shown better alignment of vision-language DNN models, such as CLIP, with the activity of the human ventral occipitotemporal cortex (VOTC) than earlier vision models, supporting the idea that language modulates huma…

    Submitted 17 September, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

  12. arXiv:2501.08187  [pdf, other]

    cs.CL cs.AI cs.CE cs.HC cs.LG q-bio.CB

    A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following

    Authors: Yin Fang, Xinle Deng, Kangwei Liu, Ningyu Zhang, Jingyang Qian, Penghui Yang, Xiaohui Fan, Huajun Chen

    Abstract: Large language models excel at interpreting complex natural language instructions, enabling them to perform a wide range of tasks. In the life sciences, single-cell RNA sequencing (scRNA-seq) data serves as the "language of cellular biology", capturing intricate gene expression patterns at the single-cell level. However, interacting with this "language" through conventional tools is often ineffici…

    Submitted 14 January, 2025; v1 submitted 14 January, 2025; originally announced January 2025.

    Comments: 37 pages; 13 figures; Code: https://github.com/zjunlp/Instructcell, Models: https://huggingface.co/zjunlp/Instructcell-chat, https://huggingface.co/zjunlp/InstructCell-instruct

  13. arXiv:2411.14743  [pdf, other]

    cs.CV cs.AI q-bio.QM

    FOCUS: Knowledge-enhanced Adaptive Visual Compression for Few-shot Whole Slide Image Classification

    Authors: Zhengrui Guo, Conghao Xiong, Jiabo Ma, Qichen Sun, Lishuang Feng, Jinzhuo Wang, Hao Chen

    Abstract: Few-shot learning presents a critical solution for cancer diagnosis in computational pathology (CPath), addressing fundamental limitations in data availability, particularly the scarcity of expert annotations and patient privacy constraints. A key challenge in this paradigm stems from the inherent disparity between the limited training set of whole slide images (WSIs) and the enormous number of co…

    Submitted 20 March, 2025; v1 submitted 22 November, 2024; originally announced November 2024.

    Comments: Accepted by CVPR'2025

  14. arXiv:2410.20711  [pdf, other]

    cs.LG cs.AI q-bio.BM

    Contextual Representation Anchor Network to Alleviate Selection Bias in Few-Shot Drug Discovery

    Authors: Ruifeng Li, Wei Liu, Xiangxin Zhou, Mingqian Li, Qiang Zhang, Hongyang Chen, Xuemin Lin

    Abstract: In the drug discovery process, the low success rate of drug candidate screening often leads to insufficient labeled data, causing the few-shot learning problem in molecular property prediction. Existing methods for few-shot molecular property prediction overlook the sample selection bias, which arises from non-random sample selection in chemical experiments. This bias in data representativeness le…

    Submitted 29 October, 2024; v1 submitted 27 October, 2024; originally announced October 2024.

    Comments: 13 pages, 7 figures

    MSC Class: 68U07 ACM Class: I.2.1

  15. arXiv:2410.13872  [pdf, other]

    cs.NE cs.LG q-bio.NC

    BLEND: Behavior-guided Neural Population Dynamics Modeling via Privileged Knowledge Distillation

    Authors: Zhengrui Guo, Fangxu Zhou, Wei Wu, Qichen Sun, Lishuang Feng, Jinzhuo Wang, Hao Chen

    Abstract: Modeling the nonlinear dynamics of neuronal populations represents a key pursuit in computational neuroscience. Recent research has increasingly focused on jointly modeling neural activity and behavior to unravel their interconnections. Despite significant efforts, these approaches often necessitate either intricate model designs or oversimplified assumptions. Given the frequent absence of perfect…

    Submitted 6 February, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: Accepted by ICLR'2025

  16. arXiv:2410.09543  [pdf, other]

    cs.CE cs.AI q-bio.BM

    Boltzmann-Aligned Inverse Folding Model as a Predictor of Mutational Effects on Protein-Protein Interactions

    Authors: Xiaoran Jiao, Weian Mao, Wengong Jin, Peiyuan Yang, Hao Chen, Chunhua Shen

    Abstract: Predicting the change in binding free energy ($ΔΔG$) is crucial for understanding and modulating protein-protein interactions, which are critical in drug design. Due to the scarcity of experimental $ΔΔG$ data, existing methods focus on pre-training, while neglecting the importance of alignment. In this work, we propose the Boltzmann Alignment technique to transfer knowledge from pre-trained invers…

    Submitted 12 October, 2024; originally announced October 2024.

  17. Advancing biomolecular understanding and design following human instructions

    Authors: Xiang Zhuang, Keyan Ding, Tianwen Lyu, Yinuo Jiang, Xiaotong Li, Zhuoyi Xiang, Zeyuan Wang, Ming Qin, Kehua Feng, Jike Wang, Qiang Zhang, Huajun Chen

    Abstract: Understanding and designing biomolecules, such as proteins and small molecules, is central to advancing drug discovery, synthetic biology and enzyme engineering. Recent breakthroughs in artificial intelligence have revolutionized biomolecular research, achieving remarkable accuracy in biomolecular prediction and design. However, a critical gap remains between artificial intelligence's computationa…

    Submitted 25 July, 2025; v1 submitted 10 October, 2024; originally announced October 2024.

    Journal ref: Nature Machine Intelligence volume 7, pages 1154-1167 (2025)

  18. arXiv:2409.19407  [pdf, other]

    q-bio.NC cs.AI cs.CV

    Brain-JEPA: Brain Dynamics Foundation Model with Gradient Positioning and Spatiotemporal Masking

    Authors: Zijian Dong, Ruilin Li, Yilei Wu, Thuan Tinh Nguyen, Joanna Su Xian Chong, Fang Ji, Nathanael Ren Jie Tong, Christopher Li Hsian Chen, Juan Helen Zhou

    Abstract: We introduce Brain-JEPA, a brain dynamics foundation model with the Joint-Embedding Predictive Architecture (JEPA). This pioneering model achieves state-of-the-art performance in demographic prediction, disease diagnosis/prognosis, and trait prediction through fine-tuning. Furthermore, it excels in off-the-shelf evaluations (e.g., linear probing) and demonstrates superior generalizability across d…

    Submitted 28 September, 2024; originally announced September 2024.

    Comments: The first two authors contributed equally. NeurIPS 2024 Spotlight

  19. arXiv:2408.07636  [pdf, ps, other]

    q-bio.QM cs.AI cs.LG

    Drug Discovery SMILES-to-Pharmacokinetics Diffusion Models with Deep Molecular Understanding

    Authors: Bing Hu, Anita Layton, Helen Chen

    Abstract: Artificial intelligence (AI) is increasingly used in every stage of drug development. One challenge facing drug discovery AI is that drug pharmacokinetic (PK) datasets are often collected independently from each other, often with limited overlap, creating data overlap sparsity. Data sparsity makes data curation difficult for researchers looking to answer research questions in poly-pharmacy, drug c…

    Submitted 1 July, 2025; v1 submitted 14 August, 2024; originally announced August 2024.

    Comments: 13 pages, 5 figures, 4 tables

  20. arXiv:2406.19611  [pdf, other]

    q-bio.QM cs.AI

    Multimodal Data Integration for Precision Oncology: Challenges and Future Directions

    Authors: Huajun Zhou, Fengtao Zhou, Chenyu Zhao, Yingxue Xu, Luyang Luo, Hao Chen

    Abstract: The essence of precision oncology lies in its commitment to tailor targeted treatments and care measures to each patient based on the individual characteristics of the tumor. The inherent heterogeneity of tumors necessitates gathering information from diverse data sources to provide valuable insights from various perspectives, fostering a holistic comprehension of the tumor. Over the past decade,…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 15 pages, 4 figures

  21. arXiv:2406.03141  [pdf, other]

    q-bio.BM cs.LG

    Floating Anchor Diffusion Model for Multi-motif Scaffolding

    Authors: Ke Liu, Weian Mao, Shuaike Shen, Xiaoran Jiao, Zheng Sun, Hao Chen, Chunhua Shen

    Abstract: Motif scaffolding seeks to design scaffold structures for constructing proteins with functions derived from the desired motif, which is crucial for the design of vaccines and enzymes. Previous works approach the problem by inpainting or conditional generation. Both of them can only scaffold motifs with fixed positions, and the conditional generation cannot guarantee the presence of motifs. However…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  22. arXiv:2405.03799  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Synthetic Data from Diffusion Models Improve Drug Discovery Prediction

    Authors: Bing Hu, Ashish Saragadam, Anita Layton, Helen Chen

    Abstract: Artificial intelligence (AI) is increasingly used in every stage of drug development. Continuing breakthroughs in AI-based methods for drug discovery require the creation, improvement, and refinement of drug discovery data. We posit a new data challenge that slows the advancement of drug discovery AI: datasets are often collected independently from each other, often with little overlap, creating d…

    Submitted 6 May, 2024; originally announced May 2024.

  23. arXiv:2404.10354  [pdf]

    q-bio.QM cs.CE cs.LG

    Physical formula enhanced multi-task learning for pharmacokinetics prediction

    Authors: Ruifeng Li, Dongzhan Zhou, Ancheng Shen, Ao Zhang, Mao Su, Mingqian Li, Hongyang Chen, Gang Chen, Yin Zhang, Shufei Zhang, Yuqiang Li, Wanli Ouyang

    Abstract: Artificial intelligence (AI) technology has demonstrated remarkable potential in drug discovery, where pharmacokinetics plays a crucial role in determining the dosage, safety, and efficacy of new drugs. A major challenge for AI-driven drug discovery (AIDD) is the scarcity of high-quality data, which often requires extensive wet-lab work. A typical example of this is pharmacokinetic experiments. I…

    Submitted 16 April, 2024; originally announced April 2024.

  24. arXiv:2403.01433  [pdf, other]

    cs.CE q-bio.NC

    BrainMass: Advancing Brain Network Analysis for Diagnosis with Large-scale Self-Supervised Learning

    Authors: Yanwu Yang, Chenfei Ye, Guinan Su, Ziyao Zhang, Zhikai Chang, Hairui Chen, Piu Chan, Yue Yu, Ting Ma

    Abstract: Foundation models pretrained on large-scale datasets via self-supervised learning demonstrate exceptional versatility across various tasks. Due to the heterogeneity and hard-to-collect medical data, this approach is especially beneficial for medical image analysis and neuroscience research, as it streamlines broad downstream tasks without the need for numerous costly annotations. However, there ha…

    Submitted 3 March, 2024; originally announced March 2024.

  25. arXiv:2402.01467  [pdf, other]

    eess.SY cs.AI cs.CE cs.NE q-bio.NC

    Brain-Like Replay Naturally Emerges in Reinforcement Learning Agents

    Authors: Jiyi Wang, Likai Tang, Huimiao Chen, Marcelo G Mattar, Sen Song

    Abstract: Replay is a powerful strategy to promote learning in artificial intelligence and the brain. However, the conditions to generate it and its functional advantages have not been fully recognized. In this study, we develop a modular reinforcement learning model that could generate replay. We prove that replay generated in this way helps complete the task. We also analyze the information contained in t…

    Submitted 6 October, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  26. arXiv:2401.02683  [pdf, other]

    cs.LG cs.AI q-bio.BM

    Geometric-Facilitated Denoising Diffusion Model for 3D Molecule Generation

    Authors: Can Xu, Haosen Wang, Weigang Wang, Pengfei Zheng, Hongyang Chen

    Abstract: Denoising diffusion models have shown great potential in multiple research areas. Existing diffusion-based generative methods on de novo 3D molecule generation face two major challenges. Since majority heavy atoms in molecules allow connections to multiple atoms through single bonds, solely using pair-wise distance to model molecule geometries is insufficient. Therefore, the first one involves pro…

    Submitted 22 April, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: 9 pages, 6 figures, AAAI-24 Main Track

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 1 (March 25, 2024): 338-346

  27. arXiv:2310.11802  [pdf, other]

    cs.CE cs.LG q-bio.BM

    De novo protein design using geometric vector field networks

    Authors: Weian Mao, Muzhi Zhu, Zheng Sun, Shuaike Shen, Lin Yuanbo Wu, Hao Chen, Chunhua Shen

    Abstract: Innovations like protein diffusion have enabled significant progress in de novo protein design, which is a vital topic in life science. These methods typically depend on protein structure encoders to model residue backbone frames, where atoms do not exist. Most prior encoders rely on atom-wise features, such as angles and distances between atoms, which are not available in this context. Thus far,…

    Submitted 18 October, 2023; originally announced October 2023.

  28. arXiv:2310.10138  [pdf, other]

    cs.DB cs.CL q-bio.QM

    Node-based Knowledge Graph Contrastive Learning for Medical Relationship Prediction

    Authors: Zhiguang Fan, Yuedong Yang, Mingyuan Xu, Hongming Chen

    Abstract: The embedding of Biomedical Knowledge Graphs (BKGs) generates robust representations, valuable for a variety of artificial intelligence applications, including predicting drug combinations and reasoning disease-drug relationships. Meanwhile, contrastive learning (CL) is widely employed to enhance the distinctiveness of these representations. However, constructing suitable contrastive pairs for CL,…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: 10 pages, 5 figures, conference

  29. arXiv:2310.03269  [pdf, other]

    q-bio.BM cs.CL

    InstructProtein: Aligning Human and Protein Language via Knowledge Instruction

    Authors: Zeyuan Wang, Qiang Zhang, Keyan Ding, Ming Qin, Xiang Zhuang, Xiaotong Li, Huajun Chen

    Abstract: Large Language Models (LLMs) have revolutionized the field of natural language processing, but they fall short in comprehending biological sequences such as proteins. To address this challenge, we propose InstructProtein, an innovative LLM that possesses bidirectional generation capabilities in both human and protein languages: (i) taking a protein sequence as input to predict its textual function…

    Submitted 4 October, 2023; originally announced October 2023.

  30. arXiv:2308.00237  [pdf, other]

    q-bio.BM physics.chem-ph

    EC-Conf: An Ultra-fast Diffusion Model for Molecular Conformation Generation with Equivariant Consistency

    Authors: Zhiguang Fan, Yuedong Yang, Mingyuan Xu, Hongming Chen

    Abstract: Despite recent advancement in 3D molecule conformation generation driven by diffusion models, its high computational cost in iterative diffusion/denoising process limits its application. In this paper, an equivariant consistency model (EC-Conf) was proposed as a fast diffusion method for low-energy conformation generation. In EC-Conf, a modified SE(3)-equivariant transformer model was directly us…

    Submitted 23 November, 2023; v1 submitted 31 July, 2023; originally announced August 2023.

    Comments: 10 pages, 3 figures

  31. arXiv:2307.15170  [pdf]

    q-bio.NC

    Quantifying interictal intracranial EEG to predict focal epilepsy

    Authors: Ryan S Gallagher, Nishant Sinha, Akash R Pattnaik, William K. S. Ojemann, Alfredo Lucas, Joshua J. LaRocque, John M Bernabei, Adam S Greenblatt, Elizabeth M Sweeney, H Isaac Chen, Kathryn A Davis, Erin C Conrad, Brian Litt

    Abstract: Intracranial EEG (IEEG) is used for 2 main purposes, to determine: (1) if epileptic networks are amenable to focal treatment and (2) where to intervene. Currently these questions are answered qualitatively and sometimes differently across centers. There is a need for objective, standardized methods to guide surgical decision making and to enable large scale data analysis across centers and prospec…

    Submitted 27 July, 2023; originally announced July 2023.

    Comments: 25 pages, 4 Figures, 1 table

  32. arXiv:2306.16780  [pdf, other]

    cs.LG q-bio.BM

    Graph Sampling-based Meta-Learning for Molecular Property Prediction

    Authors: Xiang Zhuang, Qiang Zhang, Bin Wu, Keyan Ding, Yin Fang, Huajun Chen

    Abstract: Molecular property is usually observed with a limited number of samples, and researchers have considered property prediction as a few-shot problem. One important fact that has been ignored by prior works is that each molecule can be recorded with several different properties simultaneously. To effectively utilize many-to-many correlations of molecules and properties, we propose a Graph Sampling-ba…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: Accepted by IJCAI 2023

  33. arXiv:2306.08018  [pdf, other]

    q-bio.QM cs.AI cs.CE cs.CL cs.IR cs.LG

    Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models

    Authors: Yin Fang, Xiaozhuan Liang, Ningyu Zhang, Kangwei Liu, Rui Huang, Zhuo Chen, Xiaohui Fan, Huajun Chen

    Abstract: Large Language Models (LLMs), with their remarkable task-handling capabilities and innovative outputs, have catalyzed significant advancements across a spectrum of fields. However, their proficiency within specialized domains such as biomolecular studies remains limited. To address this challenge, we introduce Mol-Instructions, a comprehensive instruction dataset designed for the biomolecular doma…

    Submitted 4 March, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: ICLR 2024. Project homepage: https://github.com/zjunlp/Mol-Instructions

  34. arXiv:2306.05257  [pdf, other]

    cs.LG q-bio.QM

    Comprehensive evaluation of deep and graph learning on drug-drug interactions prediction

    Authors: Xuan Lin, Lichang Dai, Yafang Zhou, Zu-Guo Yu, Wen Zhang, Jian-Yu Shi, Dong-Sheng Cao, Li Zeng, Haowen Chen, Bosheng Song, Philip S. Yu, Xiangxiang Zeng

    Abstract: Recent advances and achievements of artificial intelligence (AI) as well as deep and graph learning models have established their usefulness in biomedical applications, especially in drug-drug interactions (DDIs). DDIs refer to a change in the effect of one drug to the presence of another drug in the human body, which plays an essential role in drug discovery and clinical research. DDIs prediction…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: Accepted by Briefings in Bioinformatics

  35. arXiv:2305.17183  [pdf]

    q-bio.QM cs.AI eess.IV

    ProGroTrack: Deep Learning-Assisted Tracking of Intracellular Protein Growth Dynamics

    Authors: Kai San Chan, Huimiao Chen, Chenyu Jin, Yuxuan Tian, Dingchang Lin

    Abstract: Accurate tracking of cellular and subcellular structures, along with their dynamics, plays a pivotal role in understanding the underlying mechanisms of biological systems. This paper presents a novel approach, ProGroTrack, that combines the You Only Look Once (YOLO) and ByteTrack algorithms within the detection-based tracking (DBT) framework to track intracellular protein nanostructures. Focusing…

    Submitted 26 May, 2023; originally announced May 2023.

  36. arXiv:2303.00313  [pdf, other]

    cs.LG q-bio.BM

    Deep Learning Methods for Small Molecule Drug Discovery: A Survey

    Authors: Wenhao Hu, Yingying Liu, Xuanyu Chen, Wenhao Chai, Hangyue Chen, Hongwei Wang, Gaoang Wang

    Abstract: With the development of computer-assisted techniques, research communities including biochemistry and deep learning have been devoted into the drug discovery field for over a decade. Various applications of deep learning have drawn great attention in drug discovery, such as molecule generation, molecular property prediction, retrosynthesis prediction, and reaction prediction. While most existing s…

    Submitted 5 March, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  37. arXiv:2210.00395  [pdf, other]

    stat.ME q-bio.GN

    Federated Generalized Linear Mixed Models for Collaborative Genome-wide Association Studies

    Authors: Wentao Li, Han Chen, Xiaoqian Jiang, Arif Harmanci

    Abstract: As the sequencing costs are decreasing, there is great incentive to perform large scale association studies to increase power of detecting new variants. Federated association testing among different institutions is a viable solution for increasing sample sizes by sharing the intermediate testing statistics that are aggregated by a central server. There are, however, standing challenges to performi…

    Submitted 1 October, 2022; originally announced October 2022.

  38. arXiv:2209.13022  [pdf]

    q-bio.QM eess.IV q-bio.BM q-bio.GN

    Artificial Intelligence Models for Cell Type and Subtype Identification Based on Single-Cell RNA Sequencing Data in Vision Science

    Authors: Yeganeh Madadi, Aboozar Monavarfeshani, Hao Chen, W. Daniel Stamer, Robert W. Williams, Siamak Yousefi

    Abstract: Single-cell RNA sequencing (scRNA-seq) provides a high throughput, quantitative and unbiased framework for scientists in many research fields to identify and characterize cell types within heterogeneous cell populations from various tissues. However, scRNA-seq based identification of discrete cell-types is still labor intensive and depends on prior molecular knowledge. Artificial intelligence has…

    Submitted 26 September, 2022; originally announced September 2022.

  39. arXiv:2207.12369  [pdf, other]

    q-bio.NC cs.HC eess.SP

    Toward reliable signals decoding for electroencephalogram: A benchmark study to EEGNeX

    Authors: Xia Chen, Xiangbin Teng, Han Chen, Yafeng Pan, Philipp Geyer

    Abstract: This study examines the efficacy of various neural network (NN) models in interpreting mental constructs via electroencephalogram (EEG) signals. Through the assessment of 16 prevalent NN models and their variants across four brain-computer interface (BCI) paradigms, we gauged their information representation capability. Rooted in comprehensive literature review findings, we proposed EEGNeX, a nove…

    Submitted 24 September, 2023; v1 submitted 15 July, 2022; originally announced July 2022.

    Comments: 19 pages, 6 figures

    Journal ref: Biomedical Signal Processing and Control, 2023

  40. arXiv:2207.10080  [pdf, other]

    q-bio.QM cs.AI cs.CL cs.IR cs.LG

    Multi-modal Protein Knowledge Graph Construction and Applications

    Authors: Siyuan Cheng, Xiaozhuan Liang, Zhen Bi, Huajun Chen, Ningyu Zhang

    Abstract: Existing data-centric methods for protein science generally cannot sufficiently capture and leverage biology knowledge, which may be crucial for many protein tasks. To facilitate research in this field, we create ProteinKG65, a knowledge graph for protein science. Using gene ontology and Uniprot knowledge base as a basis, we transform and integrate various kinds of knowledge with aligned descripti…

    Submitted 14 November, 2022; v1 submitted 27 May, 2022; originally announced July 2022.

    Comments: Accepted by AAAI 2023 (Student Abstract). Dataset available in https://zjunlp.github.io/project/ProteinKG65/

  41. arXiv:2204.10476  [pdf]

    q-bio.MN cs.LG cs.SI stat.AP

    Global Mapping of Gene/Protein Interactions in PubMed Abstracts: A Framework and an Experiment with P53 Interactions

    Authors: Xin Li, Hsinchun Chen, Zan Huang, Hua Su, Jesse D. Martinez

    Abstract: Gene/protein interactions provide critical information for a thorough understanding of cellular processes. Recently, considerable interest and effort has been focused on the construction and analysis of genome-wide gene networks. The large body of biomedical literature is an important source of gene/protein interaction information. Recent advances in text mining tools have made it possible to auto…

    Submitted 21 April, 2022; originally announced April 2022.

    Journal ref: Journal of biomedical informatics, 2007

  42. arXiv:2204.10473  [pdf]

    q-bio.MN cs.LG q-bio.QM stat.ML

    Gene Function Prediction with Gene Interaction Networks: A Context Graph Kernel Approach

    Authors: Xin Li, Hsinchun Chen, Jiexun Li, Zhu Zhang

    Abstract: Predicting gene functions is a challenge for biologists in the post genomic era. Interactions among genes and their products compose networks that can be used to infer gene functions. Most previous studies adopt a linkage assumption, i.e., they assume that gene interactions indicate functional similarities between connected genes. In this study, we propose to use a gene's context graph, i.e., the…

    Submitted 21 April, 2022; originally announced April 2022.

    Journal ref: IEEE Transactions on Information Technology in Biomedicine, 2010

  43. arXiv:2203.05786  [pdf]

    cond-mat.soft physics.bio-ph q-bio.BM

    Free energy landscape of two-state protein Acylphosphatase with large contact order revealed by force-dependent folding and unfolding dynamics

    Authors: Xuening Ma, Hao Sun, Haiyan Hong, Zilong Guo, Huanhuan Su, Hu Chen

    Abstract: Acylphosphatase (AcP) is a small protein with 98 amino acid residues that catalyzes the hydrolysis of carboxyl-phosphate bonds. AcP is a typical two-state protein with slow folding rate due to its relatively large contact order in the native structure. The mechanical properties and unfolding behavior of AcP has been studied by atomic force microscope. But the folding and unfolding dynamics at low…

    Submitted 11 March, 2022; originally announced March 2022.

    Comments: 21 pages, 9 figures

  44. arXiv:2202.08195  [pdf, other]

    eess.IV cs.CV q-bio.QM

    Nuclei Segmentation with Point Annotations from Pathology Images via Self-Supervised Learning and Co-Training

    Authors: Yi Lin, Zhiyong Qu, Hao Chen, Zhongke Gao, Yuexiang Li, Lili Xia, Kai Ma, Yefeng Zheng, Kwang-Ting Cheng

    Abstract: Nuclei segmentation is a crucial task for whole slide image analysis in digital pathology. Generally, the segmentation performance of fully-supervised learning heavily depends on the amount and quality of the annotated data. However, it is time-consuming and expensive for professional pathologists to provide accurate pixel-level ground truth, while it is much easier to get coarse labels such as po…

    Submitted 17 August, 2023; v1 submitted 16 February, 2022; originally announced February 2022.

    Comments: Accepted by MedIA

  45. arXiv:2202.02944  [pdf, other]

    cs.AI q-bio.BM

    Prompt-Guided Injection of Conformation to Pre-trained Protein Model

    Authors: Qiang Zhang, Zeyuan Wang, Yuqiang Han, Haoran Yu, Xurui Jin, Huajun Chen

    Abstract: Pre-trained protein models (PTPMs) represent a protein with one fixed embedding and thus are not capable for diverse tasks. For example, protein structures can shift, namely protein folding, between several conformations in various biological processes. To enable PTPMs to produce task-aware representations, we propose to learn interpretable, pluggable and extensible protein prompts as a way of inj…

    Submitted 7 February, 2022; originally announced February 2022.

    Comments: Work in progress

  46. arXiv:2201.11147  [pdf, other]

    q-bio.BM cs.AI cs.CL cs.IR cs.LG

    OntoProtein: Protein Pretraining With Gene Ontology Embedding

    Authors: Ningyu Zhang, Zhen Bi, Xiaozhuan Liang, Siyuan Cheng, Haosen Hong, Shumin Deng, Jiazhang Lian, Qiang Zhang, Huajun Chen

    Abstract: Self-supervised protein language models have proved their effectiveness in learning the proteins representations. With the increasing computational power, current protein language models pre-trained with millions of diverse sequences can advance the parameter scale from million-level to billion-level and achieve remarkable improvement. However, those prevailing approaches rarely consider incorpora…

    Submitted 3 June, 2022; v1 submitted 23 January, 2022; originally announced January 2022.

    Comments: Accepted by ICLR 2022

  47. arXiv:2112.00544  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Molecular Contrastive Learning with Chemical Element Knowledge Graph

    Authors: Yin Fang, Qiang Zhang, Haihong Yang, Xiang Zhuang, Shumin Deng, Wen Zhang, Ming Qin, Zhuo Chen, Xiaohui Fan, Huajun Chen

    Abstract: Molecular representation learning contributes to multiple downstream tasks such as molecular property prediction and drug design. To properly represent molecules, graph contrastive learning is a promising paradigm as it utilizes self-supervision signals and has no requirements for human annotations. However, prior works fail to incorporate fundamental domain knowledge into graph semantics and thus…

    Submitted 10 March, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: Accepted in AAAI 2022 Main track

  48. arXiv:2111.03063  [pdf, other]

    eess.IV cs.CV q-bio.QM

    PDBL: Improving Histopathological Tissue Classification with Plug-and-Play Pyramidal Deep-Broad Learning

    Authors: Jiatai Lin, Guoqiang Han, Xipeng Pan, Hao Chen, Danyi Li, Xiping Jia, Zhenwei Shi, Zhizhen Wang, Yanfen Cui, Haiming Li, Changhong Liang, Li Liang, Zaiyi Liu, Chu Han

    Abstract: Histopathological tissue classification is a fundamental task in pathomics cancer research. Precisely differentiating different tissue types is a benefit for the downstream researches, like cancer diagnosis, prognosis and etc. Existing works mostly leverage the popular classification backbones in computer vision to achieve histopathological tissue classification. In this paper, we proposed a super…

    Submitted 4 November, 2021; originally announced November 2021.

    Comments: 10 pages, 5 figures

  49. arXiv:2108.05417  [pdf]

    astro-ph.EP astro-ph.IM q-bio.QM

    Habitability Models for Astrobiology

    Authors: Abel Méndez, Edgard E. Rivera-Valentín, Dirk Schulze-Makuch, Justin Filiberto, Ramses M. Ramírez, Tana Wood, Alfonso Dávila, Chris McKay, Kevin N. Ortiz Ceballos, Marcos Jusino-Maldonado, Nicole J. Torres-Santiago, Guillermo Nery, René Heller, Paul K. Byrne, Michael J. Malaska, Erica Nathan, Marta F. Simões, André Antunes, Jesús Martínez-Frías, Ludmila Carone, Noam R. Izenberg, Dimitra Atri, Humberto I. Carvajal Chitty, Priscilla Nowajewski-Barra, Frances Rivera-Hernández, et al. (9 additional authors not shown)

    Abstract: Habitability has been generally defined as the capability of an environment to support life. Ecologists have been using Habitat Suitability Models (HSMs) for more than four decades to study the habitability of Earth from local to global scales. Astrobiologists have been proposing different habitability models for some time, with little integration and consistency among them, being different in fun…

    Submitted 11 August, 2021; originally announced August 2021.

    Comments: Published in Astrobiology, 21(8). arXiv admin note: substantial text overlap with arXiv:2007.05491

  50. arXiv:2107.13681  [pdf, other]

    cs.ET q-bio.MN

    Rate-Independent Computation in Continuous Chemical Reaction Networks

    Authors: Ho-Lin Chen, David Doty, Wyatt Reeves, David Soloveichik

    Abstract: Coupled chemical interactions in a well-mixed solution are commonly formalized as chemical reaction networks (CRNs). However, despite the widespread use of CRNs in the natural sciences, the range of computational behaviors exhibited by CRNs is not well understood. Here we study the following problem: what functions $f:\mathbb{R}^k \to \mathbb{R}$ can be computed by a CRN, in which the CRN eventual…

    Submitted 7 April, 2023; v1 submitted 28 July, 2021; originally announced July 2021.

    Comments: accepted to JACM (https://doi.org/10.1145/3590776); preliminary version appeared in ITCS 2014: http://doi.org/10.1145/2554797.2554827