Showing 1–22 of 22 results for author: Xie, Y

Searching in archive q-bio.
  1. arXiv:2506.07619  [pdf, ps, other]

    cs.LG q-bio.QM

    The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning

    Authors: Toby Boyne, Juan S. Campos, Becky D. Langdon, Jixiang Qing, Yilin Xie, Shiqiang Zhang, Calvin Tsay, Ruth Misener, Daniel W. Davies, Kim E. Jelfs, Sarah Boyall, Thomas M. Dixon, Linden Schrecker, Jose Pablo Folch

    Abstract: Machine learning has promised to change the landscape of laboratory chemistry, with impressive results in molecular property prediction and reaction retro-synthesis. However, chemical datasets are often inaccessible to the machine learning community as they tend to require cleaning, thorough understanding of the chemistry, or are simply not available. In this paper, we introduce a novel dataset fo…

    Submitted 9 June, 2025; originally announced June 2025.

  2. arXiv:2505.09816  [pdf, ps, other]

    q-bio.NC cs.NE physics.bio-ph

    Slow Transition to Low-Dimensional Chaos in Heavy-Tailed Recurrent Neural Networks

    Authors: Yi Xie, Stefan Mihalas, Łukasz Kuśmierz

    Abstract: Growing evidence suggests that synaptic weights in the brain follow heavy-tailed distributions, yet most theoretical analyses of recurrent neural networks (RNNs) assume Gaussian connectivity. We systematically study the activity of RNNs with random weights drawn from biologically plausible Lévy alpha-stable distributions. While mean-field theory for the infinite system predicts that the quiescent…

    Submitted 14 May, 2025; originally announced May 2025.

  3. arXiv:2504.04280  [pdf, other]

    cs.LG q-bio.QM

    Foundation Models for Environmental Science: A Survey of Emerging Frontiers

    Authors: Runlong Yu, Shengyu Chen, Yiqun Xie, Huaxiu Yao, Jared Willard, Xiaowei Jia

    Abstract: Modeling environmental ecosystems is essential for effective resource management, sustainable development, and understanding complex ecological processes. However, traditional data-driven methods face challenges in capturing inherently complex and interconnected processes and are further constrained by limited observational data in many environmental applications. Foundation models, which leverage…

    Submitted 5 April, 2025; originally announced April 2025.

  4. arXiv:2412.09115  [pdf, other]

    q-bio.NC cs.CV cs.LG cs.NE

    Vision CNNs trained to estimate spatial latents learned similar ventral-stream-aligned representations

    Authors: Yudi Xie, Weichen Huang, Esther Alter, Jeremy Schwartz, Joshua B. Tenenbaum, James J. DiCarlo

    Abstract: Studies of the functional role of the primate ventral visual stream have traditionally focused on object categorization, often ignoring -- despite much prior evidence -- its role in estimating "spatial" latents such as object position and pose. Most leading ventral stream models are derived by optimizing networks for object categorization, which seems to imply that the ventral stream is also deriv…

    Submitted 17 February, 2025; v1 submitted 12 December, 2024; originally announced December 2024.

    Comments: 30 pages, 21 figures, ICLR 2025

  5. arXiv:2410.11046  [pdf]

    cs.IR cs.LG q-bio.QM

    SGUQ: Staged Graph Convolution Neural Network for Alzheimer's Disease Diagnosis using Multi-Omics Data

    Authors: Liang Tao, Yixin Xie, Jeffrey D Deng, Hui Shen, Hong-Wen Deng, Weihua Zhou, Chen Zhao

    Abstract: Alzheimer's disease (AD) is a chronic neurodegenerative disorder and the leading cause of dementia, significantly impacting cost, mortality, and burden worldwide. The advent of high-throughput omics technologies, such as genomics, transcriptomics, proteomics, and epigenomics, has revolutionized the molecular understanding of AD. Conventional AI approaches typically require the completion of all om…

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: 20 pages, 2 figures

  6. arXiv:2406.15488  [pdf, other]

    q-bio.NC cs.AI

    Orangutan: A Multiscale Brain Emulation-Based Artificial Intelligence Framework for Dynamic Environments

    Authors: Yong Xie

    Abstract: Achieving General Artificial Intelligence (AGI) has long been a grand challenge in the field of AI, and brain-inspired computing is widely acknowledged as one of the most promising approaches to realize this goal. This paper introduces a novel brain-inspired AI framework, Orangutan. It simulates the structure and computational mechanisms of biological brains on multiple scales, encompassing multi-…

    Submitted 17 June, 2024; originally announced June 2024.

  7. arXiv:2403.01702  [pdf]

    q-bio.MN

    Hill Function-based Model of Transcriptional Response: Impact of Nonspecific Binding and RNAP Interactions

    Authors: Wenjia Shi, Yao Ma, Peilin Hu, Mi Pang, Xiaona Huang, Yiting Dang, Yuxin Xie, Danni Wu

    Abstract: Hill function is one of the widely used gene transcription regulation models. Its attribute of fitting may result in a lack of an underlying physical picture, yet the fitting parameters can provide information about biochemical reactions, such as the number of transcription factors (TFs) and the binding energy between regulatory elements. However, it remains unclear when and how much biochemical i…

    Submitted 3 March, 2024; originally announced March 2024.

  8. arXiv:2311.10255  [pdf, other]

    cs.LG q-bio.PE

    FREE: The Foundational Semantic Recognition for Modeling Environmental Ecosystems

    Authors: Shiyuan Luo, Juntong Ni, Shengyu Chen, Runlong Yu, Yiqun Xie, Licheng Liu, Zhenong Jin, Huaxiu Yao, Xiaowei Jia

    Abstract: Modeling environmental ecosystems is critical for the sustainability of our planet, but is extremely challenging due to the complex underlying processes driven by interactions amongst a large number of physical variables. As many variables are difficult to measure at large scales, existing works often utilize a combination of observable features and locally available measurements or modeled values…

    Submitted 22 February, 2025; v1 submitted 16 November, 2023; originally announced November 2023.

  9. arXiv:2309.15132  [pdf, other]

    q-bio.QM cs.LG

    Genetic InfoMax: Exploring Mutual Information Maximization in High-Dimensional Imaging Genetics Studies

    Authors: Yaochen Xie, Ziqian Xie, Sheikh Muhammad Saiful Islam, Degui Zhi, Shuiwang Ji

    Abstract: Genome-wide association studies (GWAS) are used to identify relationships between genetic variations and specific traits. When applied to high-dimensional medical imaging data, a key step is to extract lower-dimensional, yet informative representations of the data as traits. Representation learning for imaging genetics is largely under-explored due to the unique challenges posed by GWAS in compari…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: 17 pages, 7 figures

  10. arXiv:2308.03198  [pdf, other]

    q-bio.OT

    Re-imagining the Future of Forest Management -- An Age-Dependent Approach towards Harvesting

    Authors: Shuyang Bian, Yuanyuan Xie, Flora Zhang

    Abstract: Facing the drastic climate changes, current strategies for enhancing carbon dioxide stocks need to be thoroughly honed. To address the problem, we first built a carbon sequestration growth model driven by growth rate dependency (GRDM). We abstracted the carbon cycling system into the process of photosynthesis, the humidity fluctuation, and the original storage of carbon in the trees. In the photos…

    Submitted 6 August, 2023; originally announced August 2023.

    MSC Class: 92-10

  11. arXiv:2306.00041  [pdf, other]

    q-bio.QM cs.LG

    Causal Intervention for Measuring Confidence in Drug-Target Interaction Prediction

    Authors: Wenting Ye, Chen Li, Yang Xie, Wen Zhang, Hong-Yu Zhang, Bowen Wang, Debo Cheng, Zaiwen Feng

    Abstract: Identifying and discovering drug-target interactions(DTIs) are vital steps in drug discovery and development. They play a crucial role in assisting scientists in finding new drugs and accelerating the drug development process. Recently, knowledge graph and knowledge graph embedding (KGE) models have made rapid advancements and demonstrated impressive performance in drug discovery. However, such mo…

    Submitted 14 November, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

  12. Single-Cell Multimodal Prediction via Transformers

    Authors: Wenzhuo Tang, Hongzhi Wen, Renming Liu, Jiayuan Ding, Wei Jin, Yuying Xie, Hui Liu, Jiliang Tang

    Abstract: The recent development of multimodal single-cell technology has made the possibility of acquiring multiple omics data from individual cells, thereby enabling a deeper understanding of cellular states and dynamics. Nevertheless, the proliferation of multimodal single-cell data also introduces tremendous challenges in modeling the complex interactions among different modalities. The recently advance…

    Submitted 13 October, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: CIKM 2023

    Journal ref: In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM 23), 2023, Birmingham, United Kingdom

  13. arXiv:2302.03038  [pdf, other]

    q-bio.GN cs.AI cs.LG

    Single Cells Are Spatial Tokens: Transformers for Spatial Transcriptomic Data Imputation

    Authors: Hongzhi Wen, Wenzhuo Tang, Wei Jin, Jiayuan Ding, Renming Liu, Xinnan Dai, Feng Shi, Lulu Shang, Hui Liu, Yuying Xie

    Abstract: Spatially resolved transcriptomics brings exciting breakthroughs to single-cell analysis by providing physical locations along with gene expression. However, as a cost of the extremely high spatial resolution, the cellular level spatial transcriptomic data suffer significantly from missing values. While a standard solution is to perform imputation on the missing values, most existing methods eithe…

    Submitted 16 February, 2024; v1 submitted 5 February, 2023; originally announced February 2023.

  14. arXiv:2210.12385  [pdf, other]

    q-bio.QM cs.AI

    Deep Learning in Single-Cell Analysis

    Authors: Dylan Molho, Jiayuan Ding, Zhaoheng Li, Hongzhi Wen, Wenzhuo Tang, Yixin Wang, Julian Venegas, Wei Jin, Renming Liu, Runze Su, Patrick Danaher, Robert Yang, Yu Leo Lei, Yuying Xie, Jiliang Tang

    Abstract: Single-cell technologies are revolutionizing the entire field of biology. The large volumes of data generated by single-cell technologies are high-dimensional, sparse, heterogeneous, and have complicated dependency structures, making analyses using conventional machine learning approaches challenging and impractical. In tackling these challenges, deep learning often demonstrates superior performan…

    Submitted 5 November, 2022; v1 submitted 22 October, 2022; originally announced October 2022.

    Comments: 77 pages, 11 figures, 15 tables, deep learning, single-cell analysis

  15. arXiv:2206.12240  [pdf, other]

    q-bio.BM cs.LG

    PSP: Million-level Protein Sequence Dataset for Protein Structure Prediction

    Authors: Sirui Liu, Jun Zhang, Haotian Chu, Min Wang, Boxin Xue, Ningxi Ni, Jialiang Yu, Yuhao Xie, Zhenyu Chen, Mengyun Chen, Yuan Liu, Piya Patra, Fan Xu, Jie Chen, Zidong Wang, Lijiang Yang, Fan Yu, Lei Chen, Yi Qin Gao

    Abstract: Proteins are essential component of human life and their structures are important for function and mechanism analysis. Recent work has shown the potential of AI-driven methods for protein structure prediction. However, the development of new models is restricted by the lack of dataset and benchmark training procedure. To the best of our knowledge, the existing open source datasets are far less to…

    Submitted 24 June, 2022; originally announced June 2022.

  16. AGMI: Attention-Guided Multi-omics Integration for Drug Response Prediction with Graph Neural Networks

    Authors: Ruiwei Feng, Yufeng Xie, Minshan Lai, Danny Z. Chen, Ji Cao, Jian Wu

    Abstract: Accurate drug response prediction (DRP) is a crucial yet challenging task in precision medicine. This paper presents a novel Attention-Guided Multi-omics Integration (AGMI) approach for DRP, which first constructs a Multi-edge Graph (MeG) for each cell line, and then aggregates multi-omics features to predict drug response using a novel structure, called Graph edge-aware Network (GeNet). For the f…

    Submitted 9 January, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

  17. arXiv:2108.09499  [pdf]

    q-bio.OT eess.IV

    MITI Minimum Information guidelines for highly multiplexed tissue images

    Authors: Denis Schapiro, Clarence Yapp, Artem Sokolov, Sheila M. Reynolds, Yu-An Chen, Damir Sudar, Yubin Xie, Jeremy L. Muhlich, Raquel Arias-Camison, Sarah Arena, Adam J. Taylor, Milen Nikolov, Madison Tyler, Jia-Ren Lin, Erik A. Burlingame, Human Tumor Atlas Network, Young H. Chang, Samouil L Farhi, Vésteinn Thorsson, Nithya Venkatamohan, Julia L. Drewes, Dana Pe'er, David A. Gutman, Markus D. Herrmann, Nils Gehlenborg, et al. (14 additional authors not shown)

    Abstract: The imminent release of tissue atlases combining multi-channel microscopy with single cell sequencing and other omics data from normal and diseased specimens creates an urgent need for data and metadata standards that guide data deposition, curation and release. We describe a Minimum Information about highly multiplexed Tissue Imaging (MITI) standard that applies best practices developed for genom…

    Submitted 23 February, 2022; v1 submitted 21 August, 2021; originally announced August 2021.

  18. arXiv:2103.10432  [pdf, other]

    q-bio.BM cs.CE cs.LG

    MARS: Markov Molecular Sampling for Multi-objective Drug Discovery

    Authors: Yutong Xie, Chence Shi, Hao Zhou, Yuwei Yang, Weinan Zhang, Yong Yu, Lei Li

    Abstract: Searching for novel molecules with desired chemical properties is crucial in drug discovery. Existing work focuses on developing neural models to generate either molecular sequences or chemical graphs. However, it remains a big challenge to find novel and diverse compounds satisfying several properties. In this paper, we propose MARS, a method for multi-objective drug molecule discovery. MARS is b…

    Submitted 18 March, 2021; originally announced March 2021.

    Comments: ICLR 2021

  19. arXiv:2012.01981  [pdf, other]

    q-bio.QM cs.LG

    Advanced Graph and Sequence Neural Networks for Molecular Property Prediction and Drug Discovery

    Authors: Zhengyang Wang, Meng Liu, Youzhi Luo, Zhao Xu, Yaochen Xie, Limei Wang, Lei Cai, Qi Qi, Zhuoning Yuan, Tianbao Yang, Shuiwang Ji

    Abstract: Properties of molecules are indicative of their functions and thus are useful in many applications. With the advances of deep learning methods, computational approaches for predicting molecular properties are gaining increasing momentum. However, there lacks customized and advanced methods and comprehensive tools for this task currently. Here we develop a suite of comprehensive machine learning me…

    Submitted 6 July, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

    Comments: Supplementary Material: https://github.com/divelab/MoleculeX/blob/master/AdvProp/AdvProp_supp.pdf

  20. arXiv:1910.13632  [pdf]

    stat.ME q-bio.QM stat.AP

    RCRnorm: An integrated system of random-coefficient hierarchical regression models for normalizing NanoString nCounter data

    Authors: Gaoxiang Jia, Xinlei Wang, Qiwei Li, Wei Lu, Ximing Tang, Ignacio Wistuba, Yang Xie

    Abstract: Formalin-fixed paraffin-embedded (FFPE) samples have great potential for biomarker discovery, retrospective studies and diagnosis or prognosis of diseases. Their application, however, is hindered by the unsatisfactory performance of traditional gene expression profiling techniques on damaged RNAs. NanoString nCounter platform is well suited for profiling of FFPE samples and measures gene expressio…

    Submitted 28 October, 2019; originally announced October 2019.

    MSC Class: 97K80

    Journal ref: Ann. Appl. Stat. 13 (2019), no. 3, 1617--1647. https://projecteuclid.org/euclid.aoas/1571277766

  21. arXiv:1901.08114  [pdf]

    q-bio.NC cs.AI

    Convolution Forgetting Curve Model for Repeated Learning

    Authors: Yanlu Xie, Yue Chen, Man Li

    Abstract: Most of mathematic forgetting curve models fit well with the forgetting data under the learning condition of one time rather than repeated. In the paper, a convolution model of forgetting curve is proposed to simulate the memory process during learning. In this model, the memory ability (i.e. the central procedure in the working memory model) and learning material (i.e. the input in the working me…

    Submitted 19 January, 2019; originally announced January 2019.

    Comments: 12 pages, 9 figures

  22. arXiv:1305.6760  [pdf]

    q-bio.GN

    SOAPdenovo-Trans: De novo transcriptome assembly with short RNA-Seq reads

    Authors: Yinlong Xie, Gengxiong Wu, Jingbo Tang, Ruibang Luo, Jordan Patterson, Shanlin Liu, Weihua Huang, Guangzhu He, Shengchang Gu, Shengkang Li, Xin Zhou, Tak-Wah Lam, Yingrui Li, Xun Xu, Gane Ka-Shu Wong, Jun Wang

    Abstract: Motivation: Transcriptome sequencing has long been the favored method for quickly and inexpensively obtaining the sequences for a large number of genes from an organism with no reference genome. With the rapidly increasing throughputs and decreasing costs of next generation sequencing, RNA-Seq has gained in popularity; but given the typically short reads (e.g. 2 x 90 bp paired ends) of this techno…

    Submitted 9 August, 2013; v1 submitted 29 May, 2013; originally announced May 2013.

    Comments: 7 pages, 4 figures, 3 tables