Showing 1–25 of 25 results for author: Feng, S

  1. arXiv:2503.08179

    q-bio.BM cs.AI

    ProtTeX: Structure-In-Context Reasoning and Editing of Proteins with Large Language Models

    Authors: Zicheng Ma, Chuanliu Fan, Zhicong Wang, Zhenyu Chen, Xiaohan Lin, Yanheng Li, Shihao Feng, Jun Zhang, Ziqiang Cao, Yi Qin Gao

    Abstract: Large language models have made remarkable progress in the field of molecular science, particularly in understanding and generating functional small molecules. This success is largely attributed to the effectiveness of molecular tokenization strategies. In protein science, the amino acid sequence serves as the sole tokenizer for LLMs. However, many fundamental challenges in protein science are inh…

    Submitted 13 March, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

    Comments: 26 pages, 9 figures

  2. arXiv:2502.15597

    q-bio.OT

    From FAIR to CURE: Guidelines for Computational Models of Biological Systems

    Authors: Herbert M. Sauro, Eran Agmon, Michael L. Blinov, John H. Gennari, Joe Hellerstein, Adel Heydarabadipour, Peter Hunter, Bartholomew E. Jardine, Elebeoba May, David P. Nickerson, Lucian P. Smith, Gary D Bader, Frank Bergmann, Patrick M. Boyle, Andreas Drager, James R. Faeder, Song Feng, Juliana Freire, Fabian Frohlich, James A. Glazier, Thomas E. Gorochowski, Tomas Helikar, Stefan Hoops, Princess Imoukhuede, Sarah M. Keating, et al. (26 additional authors not shown)

    Abstract: Guidelines for managing scientific data have been established under the FAIR principles requiring that data be Findable, Accessible, Interoperable, and Reusable. In many scientific disciplines, especially computational biology, both data and models are key to progress. For this reason, and recognizing that such models are a very special type of 'data', we argue that computational models, especiall…

    Submitted 21 February, 2025; originally announced February 2025.

  3. arXiv:2501.16386

    q-bio.QM cs.LG

    ILETIA: An AI-enhanced method for individualized trigger-oocyte pickup interval estimation of progestin-primed ovarian stimulation protocol

    Authors: Binjian Wu, Qian Li, Zhe Kuang, Hongyuan Gao, Xinyi Liu, Haiyan Guo, Qiuju Chen, Xinyi Liu, Yangruizhe Jiang, Yuqi Zhang, Jinyin Zha, Mingyu Li, Qiuhan Ren, Sishuo Feng, Haicang Zhang, Xuefeng Lu, Jian Zhang

    Abstract: In vitro fertilization-embryo transfer (IVF-ET) stands as one of the most prevalent treatments for infertility. During an IVF-ET cycle, the time interval between trigger shot and oocyte pickup (OPU) is a pivotal period for follicular maturation, which determines mature oocytes yields and impacts the success of subsequent procedures. However, accurately predicting this interval is severely hindered…

    Submitted 25 January, 2025; originally announced January 2025.

  4. arXiv:2410.10516

    cs.LG cs.AI q-bio.BM

    UniGEM: A Unified Approach to Generation and Property Prediction for Molecules

    Authors: Shikun Feng, Yuyan Ni, Yan Lu, Zhi-Ming Ma, Wei-Ying Ma, Yanyan Lan

    Abstract: Molecular generation and molecular property prediction are both crucial for drug discovery, but they are often developed independently. Inspired by recent studies, which demonstrate that diffusion model, a prominent generative approach, can learn meaningful data representations that enhance predictive tasks, we explore the potential for developing a unified generative model in the molecular domain…

    Submitted 4 April, 2025; v1 submitted 14 October, 2024; originally announced October 2024.

    Comments: 11 pages, 5 figures

  5. arXiv:2410.09346

    q-bio.MN q-bio.QM

    Transcriptome and Redox Proteome Reveal Temporal Scales of Carbon Metabolism Regulation in Model Cyanobacteria Under Light Disturbance

    Authors: Connah G. M. Johnson, Zachary Johnson, Liam S. Mackey, Xiaolu Li, Natalie C. Sadler, Tong Zhang, Wei-Jun Qian, Pavlo Bohutskyi, Song Feng, Margaret S. Cheung

    Abstract: We develop a systems approach based on an energy-landscape concept to differentiate interactions involving redox activities and conformational changes of proteins and nucleic acids interactions in multi-layered protein-DNA regulatory networks under light disturbance. Our approach is a data-driven modeling workflow using a physics-informed machine learning algorithm to train a non-linear mathematic…

    Submitted 2 June, 2025; v1 submitted 11 October, 2024; originally announced October 2024.

  6. arXiv:2406.11568

    cs.CL cs.SD eess.AS q-bio.NC

    Towards an End-to-End Framework for Invasive Brain Signal Decoding with Large Language Models

    Authors: Sheng Feng, Heyang Liu, Yu Wang, Yanfeng Wang

    Abstract: In this paper, we introduce a groundbreaking end-to-end (E2E) framework for decoding invasive brain signals, marking a significant advancement in the field of speech neuroprosthesis. Our methodology leverages the comprehensive reasoning abilities of large language models (LLMs) to facilitate direct decoding. By fully integrating LLMs, we achieve results comparable to the state-of-the-art cascade m…

    Submitted 17 June, 2024; originally announced June 2024.

    Journal ref: Proceedings of Interspeech 2024

  7. arXiv:2405.10343

    q-bio.BM cs.AI cs.LG

    UniCorn: A Unified Contrastive Learning Approach for Multi-view Molecular Representation Learning

    Authors: Shikun Feng, Yuyan Ni, Minghao Li, Yanwen Huang, Zhi-Ming Ma, Wei-Ying Ma, Yanyan Lan

    Abstract: Recently, a noticeable trend has emerged in developing pre-trained foundation models in the domains of CV and NLP. However, for molecular pre-training, there lacks a universal model capable of effectively applying to various categories of molecular tasks, since existing prevalent pre-training methods exhibit effectiveness for specific types of downstream tasks. Furthermore, the lack of profound un…

    Submitted 15 May, 2024; originally announced May 2024.

  8. arXiv:2402.13779

    cs.LG cs.AI q-bio.BM

    Contextual Molecule Representation Learning from Chemical Reaction Knowledge

    Authors: Han Tang, Shikun Feng, Bicheng Lin, Yuyan Ni, Jingjing Liu, Wei-Ying Ma, Yanyan Lan

    Abstract: In recent years, self-supervised learning has emerged as a powerful tool to harness abundant unlabelled data for representation learning and has been broadly adopted in diverse areas. However, when applied to molecular representation learning (MRL), prevailing techniques such as masked sub-unit reconstruction often fall short, due to the high degree of freedom in the possible combinations of atoms…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Preprint. Under Review

  9. arXiv:2311.16160

    q-bio.BM cs.LG

    Protein-ligand binding representation learning from fine-grained interactions

    Authors: Shikun Feng, Minghao Li, Yinjun Jia, Weiying Ma, Yanyan Lan

    Abstract: The binding between proteins and ligands plays a crucial role in the realm of drug discovery. Previous deep learning approaches have shown promising results over traditional computationally intensive methods, but resulting in poor generalization due to limited supervised data. In this paper, we propose to learn protein-ligand binding representation in a self-supervised learning manner. Different f…

    Submitted 8 November, 2023; originally announced November 2023.

  10. arXiv:2311.02124

    q-bio.BM cs.AI cs.LG

    Sliced Denoising: A Physics-Informed Molecular Pre-Training Method

    Authors: Yuyan Ni, Shikun Feng, Wei-Ying Ma, Zhi-Ming Ma, Yanyan Lan

    Abstract: While molecular pre-training has shown great potential in enhancing drug discovery, the lack of a solid physical interpretation in current methods raises concerns about whether the learned representation truly captures the underlying explanatory factors in observed data, ultimately resulting in limited generalization and robustness. Although denoising methods offer a physical interpretation, their…

    Submitted 3 November, 2023; originally announced November 2023.

  11. arXiv:2310.14216

    cs.LG cs.AI q-bio.BM

    UniMAP: Universal SMILES-Graph Representation Learning

    Authors: Shikun Feng, Lixin Yang, Yanwen Huang, Yuyan Ni, Weiying Ma, Yanyan Lan

    Abstract: Molecular representation learning is fundamental for many drug related applications. Most existing molecular pre-training models are limited in using single molecular modality, either SMILES or graph representation. To effectively leverage both modalities, we argue that it is critical to capture the fine-grained 'semantics' between SMILES and graph, because subtle sequence/graph differences may le…

    Submitted 4 November, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

  12. arXiv:2307.10683

    q-bio.QM cs.LG physics.chem-ph

    Fractional Denoising for 3D Molecular Pre-training

    Authors: Shikun Feng, Yuyan Ni, Yanyan Lan, Zhi-Ming Ma, Wei-Ying Ma

    Abstract: Coordinate denoising is a promising 3D molecular pre-training method, which has achieved remarkable performance in various downstream drug discovery tasks. Theoretically, the objective is equivalent to learning the force field, which is revealed helpful for downstream tasks. Nevertheless, there are two challenges for coordinate denoising to learn an effective force field, i.e. low coverage samples…

    Submitted 26 February, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

  13. arXiv:2307.06235

    cs.LG q-bio.BM

    Multimodal Molecular Pretraining via Modality Blending

    Authors: Qiying Yu, Yudi Zhang, Yuyan Ni, Shikun Feng, Yanyan Lan, Hao Zhou, Jingjing Liu

    Abstract: Self-supervised learning has recently gained growing interest in molecular modeling for scientific tasks such as AI-assisted drug discovery. Current studies consider leveraging both 2D and 3D molecular structures for representation learning. However, relying on straightforward alignment strategies that treat each modality separately, these methods fail to exploit the intrinsic correlation between…

    Submitted 8 October, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

  14. arXiv:2305.06488

    q-bio.QM

    A Platform for the Biomedical Application of Large Language Models

    Authors: Sebastian Lobentanzer, Shaohong Feng, The BioChatter Consortium, Andreas Maier, Cankun Wang, Jan Baumbach, Nils Krehl, Qin Ma, Julio Saez-Rodriguez

    Abstract: Current-generation Large Language Models (LLMs) have stirred enormous interest in recent months, yielding great potential for accessibility and automation, while simultaneously posing significant challenges and risk of misuse. To facilitate interfacing with LLMs in the biomedical space, while at the same time safeguarding their functionalities through sensible constraints, we propose a dedicated,…

    Submitted 17 February, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: 31 pages, 3 figures

  15. arXiv:2202.13004

    q-bio.MN

    SBbadger: Biochemical Reaction Networks with Definable Degree Distributions

    Authors: Michael A. Kochen, H. Steven Wiley, Song Feng, Herbert M. Sauro

    Abstract: Motivation: An essential step in developing computational tools for the inference, optimization, and simulation of biochemical reaction networks is gauging tool performance against earlier efforts using an appropriate set of benchmarks. General strategies for the assembly of benchmark models include collection from the literature, creation via subnetwork extraction and de novo generation. However,…

    Submitted 12 September, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

  16. arXiv:2106.06929

    q-bio.MN q-bio.SC

    Dynamics and Sensitivity of Signaling Pathways

    Authors: Michael A. Kochen, Steven S. Andrews, H. Steven Wiley, Song Feng, Herbert M. Sauro

    Abstract: Signaling pathways serve to communicate information about extracellular conditions into the cell, to both the nucleus and cytoplasmic processes to control cell responses. Genetic mutations in signaling network components are frequently associated with cancer and can result in cells acquiring an ability to divide and grow uncontrollably. Because signaling pathways play such a significant role in ca…

    Submitted 13 June, 2021; originally announced June 2021.

  17. arXiv:2010.03068

    q-bio.QM math.CO

    Hypergraph Models of Biological Networks to Identify Genes Critical to Pathogenic Viral Response

    Authors: Song Feng, Emily Heath, Brett Jefferson, Cliff Joslyn, Henry Kvinge, Hugh D. Mitchell, Brenda Praggastis, Amie J. Eisfeld, Amy C. Sims, Larissa B. Thackray, Shufang Fan, Kevin B. Walters, Peter J. Halfmann, Danielle Westhoff-Smith, Qing Tan, Vineet D. Menachery, Timothy P. Sheahan, Adam S. Cockrell, Jacob F. Kocher, Kelly G. Stratton, Natalie C. Heller, Lisa M. Bramer, Michael S. Diamond, Ralph S. Baric, Katrina M. Waters, et al. (3 additional authors not shown)

    Abstract: Background: Representing biological networks as graphs is a powerful approach to reveal underlying patterns, signatures, and critical components from high-throughput biomolecular data. However, graphs do not natively capture the multi-way relationships present among genes and proteins in biological systems. Hypergraphs are generalizations of graphs that naturally model multi-way relationships and…

    Submitted 6 October, 2020; originally announced October 2020.

    MSC Class: 92C42; 92-08; 05C65

  18. arXiv:2009.11241

    q-bio.QM

    Deep learning for peptide identification from metaproteomics datasets

    Authors: Xuan Guo, Shichao Feng

    Abstract: Metaproteomics are becoming widely used in microbiome research for gaining insights into the functional state of the microbial community. Current metaproteomics studies are generally based on high-throughput tandem mass spectrometry (MS/MS) coupled with liquid chromatography. The identification of peptides and proteins from MS data involves the computational procedure of searching MS/MS spectra ag…

    Submitted 23 September, 2020; originally announced September 2020.

  19. arXiv:1903.08615

    q-bio.QM physics.chem-ph q-bio.MN

    Scaling methods for accelerating kinetic Monte Carlo simulations of chemical reaction networks

    Authors: Yen Ting Lin, Song Feng, William S. Hlavacek

    Abstract: Various kinetic Monte Carlo algorithms become inefficient when some of the population sizes in a system are large, which gives rise to a large number of reaction events per unit time. Here, we present a new acceleration algorithm based on adaptive and heterogeneous scaling of reaction rates and stoichiometric coefficients. The algorithm is conceptually related to the commonly used idea of accelera…

    Submitted 10 May, 2019; v1 submitted 20 March, 2019; originally announced March 2019.

    Comments: 18 pages, 7 figures, 1 table

  20. arXiv:1811.09366

    q-bio.BM

    Prediction of Cytochrome P450-Mediated Metabolism Using a Combination of QSAR Derived Reactivity and Induced Fit Docking

    Authors: Shulu Feng, Richard A. Friesner

    Abstract: Prediction of metabolism in cytochrome P450s remains to be a crucial yet challenging topic in discovering and designing drugs, agrochemicals and nutritional supplements. The problem is challenging because the rate of P450 metabolism depends upon both the intrinsic chemical reactivity of the site and the protein-ligand geometry that is energetically accessible in the active site of a given P450 iso…

    Submitted 23 November, 2018; originally announced November 2018.

  21. arXiv:1802.00462

    q-bio.MN q-bio.QM q-bio.SC

    In silico evolution of signaling networks using rule-based models: bistable response dynamics

    Authors: Song Feng, Orkun S. Soyer

    Abstract: One of the ultimate goals in biology is to understand the design principles of biological systems. Such principles, if they exist, can help us better understand complex, natural biological systems and guide the engineering of de novo ones. Towards deciphering design principles, in silico evolution of biological systems with proper abstraction is a promising approach. Here, we demonstrate the appli…

    Submitted 6 February, 2018; v1 submitted 1 February, 2018; originally announced February 2018.

    Comments: 24 pages, 7 figures

  22. arXiv:1801.10227

    q-bio.QM q-bio.MN q-bio.SC

    Generalizing Gillespie's direct method to enable network-free simulations

    Authors: Ryan Suderman, Eshan D. Mitra, Yen Ting Lin, Keesha E. Erickson, Song Feng, William S. Hlavacek

    Abstract: Gillespie's direct method for stochastic simulation of chemical kinetics is a staple of computational systems biology research. However, the algorithm requires explicit enumeration of all reactions and all chemical species that may arise in the system. In many cases, this is not feasible due to the combinatorial explosion of reactions and species in biological networks. Rule-based modeling framewo…

    Submitted 30 January, 2018; originally announced January 2018.

    Comments: 27 pages, 6 figures

  23. An expanded evaluation of protein function prediction methods shows an improvement in accuracy

    Authors: Yuxiang Jiang, Tal Ronnen Oron, Wyatt T Clark, Asma R Bankapur, Daniel D'Andrea, Rosalba Lepore, Christopher S Funk, Indika Kahanda, Karin M Verspoor, Asa Ben-Hur, Emily Koo, Duncan Penfold-Brown, Dennis Shasha, Noah Youngs, Richard Bonneau, Alexandra Lin, Sayed ME Sahraeian, Pier Luigi Martelli, Giuseppe Profiti, Rita Casadio, Renzhi Cao, Zhaolong Zhong, Jianlin Cheng, Adrian Altenhoff, Nives Skunca, et al. (122 additional authors not shown)

    Abstract: Background: The increasing volume and variety of genotypic and phenotypic data is a major defining characteristic of modern biomedical sciences. At the same time, the limitations in technology for generating data and the inherently stochastic nature of biomolecular events have led to the discrepancy between the volume of data and the amount of knowledge gleaned from it. A major bottleneck in our a…

    Submitted 2 January, 2016; originally announced January 2016.

    Comments: Submitted to Genome Biology

  24. arXiv:1508.03373

    math.PR math.OC q-bio.NC q-fin.MF

    A martingale analysis of first passage times of time-dependent Wiener diffusion models

    Authors: Vaibhav Srivastava, Samuel F. Feng, Jonathan D. Cohen, Naomi Ehrich Leonard, Amitai Shenhav

    Abstract: Research in psychology and neuroscience has successfully modeled decision making as a process of noisy evidence accumulation to a decision bound. While there are several variants and implementations of this idea, the majority of these models make use of a noisy accumulation between two absorbing boundaries. A common assumption of these models is that decision parameters, e.g., the rate of accumula…

    Submitted 30 September, 2016; v1 submitted 13 August, 2015; originally announced August 2015.

  25. arXiv:math/9809203

    math.PR q-bio

    Large deviations for the Fleming-Viot process with neutral mutation and selection

    Authors: Donald Dawson, Shui Feng

    Abstract: Large deviation principles are established for the Fleming-Viot processes with neutral mutation and selection, and the corresponding equilibrium measures as the sampling rate goes to 0. All results are first proved for the finite allele model, and then generalized, through the projective limit technique, to the infinite allele model. Explicit expressions are obtained for the rate functions.

    Submitted 16 September, 1998; originally announced September 1998.

    Report number: FI-NP1998-005