-
AdaMR: Adaptable Molecular Representation for Unified Pre-training Strategy
Authors:
Yan Ding,
Hao Cheng,
Ziliang Ye,
Ruyi Feng,
Wei Tian,
Peng Xie,
Juan Zhang,
Zhongze Gu
Abstract:
We propose Adjustable Molecular Representation (AdaMR), a new large-scale unified pre-training strategy for small-molecule drugs. AdaMR uses a granularity-adjustable molecular encoding strategy, learned through a pre-training task termed molecular canonicalization, which sets it apart from recent large-scale molecular models. This adaptability in granularity enriches the model's learning capability at multiple levels and improves its performance in multi-task scenarios. Specifically, the substructure-level molecular representation preserves information about specific atom groups or arrangements that influence chemical properties and functionalities, which is advantageous for tasks such as property prediction. The atomic-level representation, combined with the generative molecular canonicalization pre-training task, enhances validity, novelty, and uniqueness in generative tasks. Together, these features give AdaMR strong performance on a range of downstream tasks. We fine-tuned our pre-trained model on six molecular property prediction tasks (MoleculeNet datasets) and two generative tasks (ZINC250K dataset), achieving state-of-the-art (SOTA) results on five out of eight tasks.
Submitted 27 April, 2024; v1 submitted 28 December, 2023;
originally announced January 2024.
-
May the Force be with You: Unified Force-Centric Pre-Training for 3D Molecular Conformations
Authors:
Rui Feng,
Qi Zhu,
Huan Tran,
Binghong Chen,
Aubrey Toland,
Rampi Ramprasad,
Chao Zhang
Abstract:
Recent works have shown the promise of learning pre-trained models for 3D molecular representation. However, existing pre-training models focus predominantly on equilibrium data and largely overlook off-equilibrium conformations. It is challenging to extend these methods to off-equilibrium data because their training objectives assume that conformations are local energy minima. We address this gap by proposing a force-centric pre-training model for 3D molecular conformations covering both equilibrium and off-equilibrium data. For off-equilibrium data, our model learns directly from their atomic forces. For equilibrium data, we introduce zero-force regularization and force-based denoising techniques to approximate near-equilibrium forces. We obtain a unified pre-trained model for 3D molecular representation with over 15 million diverse conformations. Experiments show that, with our pre-training objective, we increase force accuracy by around 3 times compared to the un-pre-trained Equivariant Transformer model. By incorporating regularizations on equilibrium data, we solved the problem of unstable MD simulations in vanilla Equivariant Transformers, achieving state-of-the-art simulation performance with 2.45 times faster inference time than NequIP. As a powerful molecular encoder, our pre-trained model achieves performance on par with the state of the art on property prediction tasks.
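The two training signals named in this abstract (learning from atomic forces off equilibrium, and zero-force regularization at equilibrium) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss, so the mean-squared-error form and the names `mean_squared_force_error` and `force_centric_loss` are assumptions made for illustration, and the force-based denoising term is left out.

```python
from typing import List, Optional, Sequence

Forces = Sequence[Sequence[float]]  # one [fx, fy, fz] vector per atom


def mean_squared_force_error(predicted: Forces, target: Forces) -> float:
    """Average squared difference over every force component."""
    total = 0.0
    count = 0
    for p_atom, t_atom in zip(predicted, target):
        for p, t in zip(p_atom, t_atom):
            total += (p - t) ** 2
            count += 1
    return total / count


def force_centric_loss(
    predicted: Forces,
    reference: Optional[Forces] = None,
    at_equilibrium: bool = False,
) -> float:
    """Hypothetical loss: force matching off equilibrium, zero-force target at equilibrium."""
    if at_equilibrium:
        # Assumed form of zero-force regularization: an equilibrium
        # conformation is a local energy minimum, so the target force is zero.
        zeros: List[List[float]] = [[0.0] * len(atom) for atom in predicted]
        return mean_squared_force_error(predicted, zeros)
    if reference is None:
        raise ValueError("off-equilibrium samples need reference forces")
    return mean_squared_force_error(predicted, reference)
```

Under this reading, both kinds of data are expressed as a target on predicted forces, which is one way a single model could be trained on the two together.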
Submitted 23 August, 2023;
originally announced August 2023.
-
AGMI: Attention-Guided Multi-omics Integration for Drug Response Prediction with Graph Neural Networks
Authors:
Ruiwei Feng,
Yufeng Xie,
Minshan Lai,
Danny Z. Chen,
Ji Cao,
Jian Wu
Abstract:
Accurate drug response prediction (DRP) is a crucial yet challenging task in precision medicine. This paper presents a novel Attention-Guided Multi-omics Integration (AGMI) approach for DRP, which first constructs a Multi-edge Graph (MeG) for each cell line, and then aggregates multi-omics features to predict drug response using a novel structure called Graph edge-aware Network (GeNet). For the first time, our AGMI approach explores gene-constraint-based multi-omics integration for DRP on the whole genome using GNNs. Empirical experiments on the CCLE and GDSC datasets show that AGMI outperforms state-of-the-art DRP methods by 8.3%–34.2% on four metrics. Our data and code are available at https://github.com/yivan-WYYGDSG/AGMI.
Submitted 9 January, 2022; v1 submitted 15 December, 2021;
originally announced December 2021.
-
Agent-Based Campus Novel Coronavirus Infection and Control Simulation
Authors:
Pei Lv,
Quan Zhang,
Boya Xu,
Ran Feng,
Chaochao Li,
Junxiao Xue,
Bing Zhou,
Mingliang Xu
Abstract:
Coronavirus Disease 2019 (COVID-19), due to its extremely high infectivity, has been spreading rapidly around the world and has greatly affected socioeconomic development as well as people's daily lives. Taking as an example the virus transmission that may occur after college students return to school, we analyze the quantitative influence of key factors on the virus spread, including crowd density and self-protection. We propose a Campus Virus Infection and Control Simulation (CVICS) model of the novel coronavirus, which fully considers the repeated contact and strong mobility of crowds in a closed environment. Specifically, we build an agent-based infection model, introduce mean field theory to calculate the probability of virus transmission, and micro-simulate the daily prevalence of infection among individuals. The experimental results show that the proposed model efficiently simulates how the virus spreads in a dense crowd in frequent contact in a closed environment. Furthermore, preventive and control measures such as self-protection, crowd decentralization and isolation during the epidemic can effectively delay the arrival of the infection peak and reduce the prevalence, and ultimately lower the risk of COVID-19 transmission after students return to school.
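A mean-field transmission probability inside an agent-based daily update can be sketched as follows. This is an illustrative assumption rather than the CVICS model itself: the abstract does not state its formula, so the expression 1 - (1 - risk x infected fraction)^contacts, the `self_protection` scaling, and all function and parameter names here are hypothetical.

```python
import random
from typing import List


def infection_probability(
    infected_fraction: float,
    contacts: int,
    per_contact_risk: float,
    self_protection: float = 0.0,
) -> float:
    """Mean-field chance that a susceptible agent is infected in one day.

    Each contact is treated as a meeting with an 'average' agent, who is
    infected with probability `infected_fraction` (the mean-field step).
    `self_protection` in [0, 1] scales down the per-contact risk.
    """
    risk = per_contact_risk * (1.0 - self_protection)
    return 1.0 - (1.0 - risk * infected_fraction) ** contacts


def simulate_day(
    infected: List[bool],
    contacts: int,
    per_contact_risk: float,
    rng: random.Random,
    self_protection: float = 0.0,
) -> List[bool]:
    """Advance every agent by one day; infected agents stay infected."""
    fraction = sum(infected) / len(infected)
    p = infection_probability(fraction, contacts, per_contact_risk, self_protection)
    return [state or rng.random() < p for state in infected]


if __name__ == "__main__":
    rng = random.Random(0)
    agents = [True] * 5 + [False] * 95  # 5% initially infected
    for day in range(1, 11):
        agents = simulate_day(agents, contacts=10, per_contact_risk=0.05, rng=rng)
        print(day, sum(agents))
```

In this sketch, lowering `contacts` (crowd decentralization) or raising `self_protection` reduces the daily infection probability, which is the kind of control measure the abstract says delays the infection peak.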
Submitted 1 September, 2021; v1 submitted 22 February, 2021;
originally announced February 2021.
-
Effects of free-ranging livestock on sympatric herbivores at fine spatiotemporal scales
Authors:
Rongna Feng,
Xinyue Lu,
Tianming Wang,
Jiawei Feng,
Yifei Sun,
Wenhong Xiao,
Yu Guan,
Limin Feng,
James L. D. Smith,
Jianping Ge
Abstract:
Understanding wildlife-livestock interactions is crucial for the design and management of protected areas that aim to conserve large mammal communities undergoing conflicts with humans worldwide. An example of the need to quantify the strength and direction of species interactions is the conservation of big cats in newly established protected areas in China. Currently, free-ranging livestock degrade the food and habitat of the endangered Amur tiger and Amur leopard in the forest landscapes of Northeast China, but quantitative assessments of how livestock affect the use of habitat by the major ungulate prey of these predators are very limited. Here, we examined livestock-ungulate interactions using large-scale camera-trap data in the newly established Tiger and Leopard National Park in Northeast China, which borders Russia. We used N-mixture models, two-species occupancy models and activity pattern overlap to understand the effects of cattle grazing on three ungulate species (wild boar, roe deer and sika deer) at a fine spatiotemporal scale. Our results showed that incorporating the biotic interactions with cattle had significant negative effects on encounters with three ungulates; sika deer were particularly displaced as more cattle encroached on forest habitat, as they exhibited low levels of co-occurrence with cattle in terms of habitat use. These results, combined with spatiotemporal overlap, suggested fine-scale avoidance behaviours, and they can help to refine strategies for the conservation of tigers, leopards and their prey in human-dominated transboundary landscapes. Progressively controlling cattle and the impact of cattle on biodiversity while simultaneously addressing the economic needs of local communities should be key priority actions for the Chinese government.
Submitted 23 January, 2020; v1 submitted 26 October, 2018;
originally announced October 2018.