-
DEL-Ranking: Ranking-Correction Denoising Framework for Elucidating Molecular Affinities in DNA-Encoded Libraries
Authors:
Hanqun Cao,
Mutian He,
Ning Ma,
Chang-yu Hsieh,
Chunbin Gu,
Pheng-Ann Heng
Abstract:
DNA-encoded library (DEL) screening has revolutionized the detection of protein-ligand interactions through read counts, enabling rapid exploration of vast chemical spaces. However, noise in read counts, stemming from nonspecific interactions, can mislead this exploration process. We present DEL-Ranking, a novel distribution-correction denoising framework that addresses these challenges. Our approach introduces two key innovations: (1) a novel ranking loss that rectifies relative magnitude relationships between read counts, enabling the learning of causal features determining activity levels, and (2) an iterative algorithm employing self-training and consistency loss to establish model coherence between activity label and read count predictions. Furthermore, we contribute three new DEL screening datasets, the first to comprehensively include multi-dimensional molecular representations, protein-ligand enrichment values, and their activity labels. These datasets mitigate data scarcity issues in AI-driven DEL screening research. Rigorous evaluation on diverse DEL datasets demonstrates DEL-Ranking's superior performance across multiple correlation metrics, with significant improvements in binding affinity prediction accuracy. Our model exhibits zero-shot generalization ability across different protein targets and successfully identifies potential motifs determining compound binding affinity. This work advances DEL screening analysis and provides valuable resources for future research in this area.
Submitted 4 December, 2024; v1 submitted 18 October, 2024;
originally announced October 2024.
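The DEL-Ranking abstract above mentions a ranking loss that corrects the relative ordering of read counts but does not give its form. As a rough illustration of the general idea only, the sketch below implements a generic pairwise hinge ranking loss in plain Python. The function name `pairwise_ranking_loss`, the `margin` parameter, and the toy values are assumptions made for this example; this is not the loss defined in the paper.

```python
from itertools import combinations


def pairwise_ranking_loss(pred, target, margin=1.0):
    """Mean hinge loss over all pairs whose target values differ.

    Generic illustration (not the DEL-Ranking loss): a pair is penalised
    when the item with the larger target does not receive a prediction
    at least `margin` higher than the other item.
    """
    losses = []
    for i, j in combinations(range(len(pred)), 2):
        if target[i] == target[j]:
            continue  # ties carry no ordering information
        # Orient the pair so that `hi` has the larger target value.
        hi, lo = (i, j) if target[i] > target[j] else (j, i)
        losses.append(max(0.0, margin - (pred[hi] - pred[lo])))
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical toy read counts and two sets of predicted scores.
read_counts = [30, 20, 10]
well_ordered = pairwise_ranking_loss([3.0, 2.0, 1.0], read_counts)
mis_ordered = pairwise_ranking_loss([1.0, 2.0, 3.0], read_counts)
```

Predictions that preserve the ordering of the read counts with enough separation incur no loss, while reversed predictions are penalised on every pair.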
-
Unlocking Potential Binders: Multimodal Pretraining DEL-Fusion for Denoising DNA-Encoded Libraries
Authors:
Chunbin Gu,
Mutian He,
Hanqun Cao,
Guangyong Chen,
Chang-yu Hsieh,
Pheng Ann Heng
Abstract:
In the realm of drug discovery, DNA-encoded library (DEL) screening technology has emerged as an efficient method for identifying high-affinity compounds. However, DEL screening faces a significant challenge: noise arising from nonspecific interactions within complex biological systems. Neural networks trained on DEL libraries have been employed to extract compound features, aiming to denoise the data and uncover potential binders to the desired therapeutic target. Nevertheless, the inherent structure of DEL, constrained by the limited diversity of building blocks, impacts the performance of compound encoders. Moreover, existing methods only capture compound features at a single level, further limiting the effectiveness of the denoising strategy. To mitigate these issues, we propose a Multimodal Pretraining DEL-Fusion model (MPDF) that enhances encoder capabilities through pretraining and integrates compound features across various scales. We develop pretraining tasks applying contrastive objectives between different compound representations and their text descriptions, enhancing the compound encoders' ability to acquire generic features. Furthermore, we propose a novel DEL-fusion framework that amalgamates compound information at the atomic, submolecular, and molecular levels, as captured by various compound encoders. The synergy of these innovations equips MPDF with enriched, multi-scale features, enabling comprehensive downstream denoising. Evaluated on three DEL datasets, MPDF demonstrates superior performance in data processing and analysis for validation tasks. Notably, MPDF offers novel insights into identifying high-affinity molecules, paving the way for improved DEL utility in drug discovery.
Submitted 7 September, 2024;
originally announced September 2024.
-
Propensity-score matching analysis in COVID-19-related studies: a method and quality systematic review
Authors:
Chunhui Gu,
Ruosha Li,
Guoqiang Zhang
Abstract:
Objectives: To provide an overall quality assessment of the methods used for COVID-19-related studies using propensity score matching (PSM).
Study Design and Setting: A systematic search was conducted in June 2021 on PubMed to identify COVID-19-related studies that used PSM analysis between 2020 and 2021. Key information about study design and PSM analysis was extracted, such as covariates, matching algorithm, and reporting of estimated treatment effect type.
Results: One hundred and fifty (87.72%) cohort studies and thirteen (7.60%) case-control studies were found among 171 identified articles. Forty-five studies (26.32%) provided a reasonable justification for covariate selection. One hundred and three (60.23%) and sixty-nine (40.35%) studies did not provide the model used for calculating the propensity score or did not report the matching algorithm, respectively. Seventy-three (42.69%) studies reported the method(s) for checking covariate balance. Forty studies (23.39%) had a statistician co-author. None of the case-control studies (n=13) had a statistician co-author (p=0.006), and all studies that clarified the treatment effect estimation (n=6) had a statistician co-author (p<0.001).
Conclusions: The reporting quality of the PSM analysis is suboptimal in some COVID-19 epidemiological studies. Some pitfalls may undermine study findings that involve PSM analysis, such as a mismatch between PSM analysis and study design.
Submitted 10 March, 2024;
originally announced March 2024.
-
Enhancing CT Image synthesis from multi-modal MRI data based on a multi-task neural network framework
Authors:
Zhuoyao Xin,
Christopher Wu,
Dong Liu,
Chunming Gu,
Jia Guo,
Jun Hua
Abstract:
Image segmentation, real-value prediction, and cross-modal translation are critical challenges in medical imaging. In this study, we propose a versatile multi-task neural network framework, based on an enhanced Transformer U-Net architecture, capable of simultaneously, selectively, and adaptively addressing these medical image tasks. Validation is performed on a public repository of human brain MR and CT images. We decompose the traditional problem of synthesizing CT images into distinct subtasks, which include skull segmentation, Hounsfield unit (HU) value prediction, and image sequential reconstruction. To enhance the framework's versatility in handling multi-modal data, we expand the model with multiple image channels. Comparisons between synthesized CT images derived from T1-weighted and T2-Flair images were conducted, evaluating the model's capability to integrate multi-modal information from both morphological and pixel value perspectives.
Submitted 17 December, 2023; v1 submitted 13 December, 2023;
originally announced December 2023.
-
Dynamic Brain Transformer with Multi-level Attention for Functional Brain Network Analysis
Authors:
Xuan Kan,
Antonio Aodong Chen Gu,
Hejie Cui,
Ying Guo,
Carl Yang
Abstract:
Recent neuroimaging studies have highlighted the importance of network-centric brain analysis, particularly with functional magnetic resonance imaging. The emergence of Deep Neural Networks has fostered a substantial interest in predicting clinical outcomes and categorizing individuals based on brain networks. However, the conventional approach involving static brain network analysis offers limited potential in capturing the dynamism of brain function. Although recent studies have attempted to harness dynamic brain networks, their high dimensionality and complexity present substantial challenges. This paper proposes a novel methodology, Dynamic bRAin Transformer (DART), which combines static and dynamic brain networks for more effective and nuanced brain function analysis. Our model uses the static brain network as a baseline, integrating dynamic brain networks to enhance performance against traditional methods. We innovatively employ attention mechanisms, enhancing model explainability and exploiting the dynamic brain network's temporal variations. The proposed approach offers a robust solution to the low signal-to-noise ratio of blood-oxygen-level-dependent signals, a recurring issue in direct DNN modeling. It also provides valuable insights into which brain circuits or dynamic networks contribute more to final predictions. As such, DART shows a promising direction in neuroimaging studies, contributing to the comprehensive understanding of brain organization and the role of neural circuits.
Submitted 5 September, 2023;
originally announced September 2023.
-
BrainGB: A Benchmark for Brain Network Analysis with Graph Neural Networks
Authors:
Hejie Cui,
Wei Dai,
Yanqiao Zhu,
Xuan Kan,
Antonio Aodong Chen Gu,
Joshua Lukemire,
Liang Zhan,
Lifang He,
Ying Guo,
Carl Yang
Abstract:
Mapping the connectome of the human brain using structural or functional connectivity has become one of the most pervasive paradigms for neuroimaging analysis. Recently, Graph Neural Networks (GNNs) motivated from geometric deep learning have attracted broad interest due to their established power for modeling complex networked data. Despite their superior performance in many fields, there has not yet been a systematic study of how to design effective GNNs for brain network analysis. To bridge this gap, we present BrainGB, a benchmark for brain network analysis with GNNs. BrainGB standardizes the process by (1) summarizing brain network construction pipelines for both functional and structural neuroimaging modalities and (2) modularizing the implementation of GNN designs. We conduct extensive experiments on datasets across cohorts and modalities and recommend a set of general recipes for effective GNN designs on brain networks. To support open and reproducible research on GNN-based brain network analysis, we host the BrainGB website at https://braingb.us with models, tutorials, examples, as well as an out-of-box Python package. We hope that this work will provide useful empirical evidence and offer insights for future research in this novel and promising direction.
Submitted 28 November, 2022; v1 submitted 17 March, 2022;
originally announced April 2022.
-
Deep Learning for Genomics: A Concise Overview
Authors:
Tianwei Yue,
Yuanxin Wang,
Longxiang Zhang,
Chunming Gu,
Haoru Xue,
Wenping Wang,
Qi Lyu,
Yujie Dun
Abstract:
Advancements in genomic research such as high-throughput sequencing techniques have driven modern genomic studies into "big data" disciplines. This data explosion is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in a variety of fields such as vision, speech, and text processing. Yet genomics entails unique challenges to deep learning since we are expecting from deep learning a superhuman intelligence that explores beyond our knowledge to interpret the genome. A powerful deep learning model should rely on insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with a proper deep architecture, and remark on practical considerations of developing modern deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research, as well as pointing out potential opportunities and obstacles for future genomics applications.
Submitted 4 October, 2023; v1 submitted 2 February, 2018;
originally announced February 2018.
-
A Chessboard Model of Human Brain and One Application on Memory Capacity
Authors:
Chenxia Gu,
Shaotong Wang,
Hao Yu
Abstract:
The famous claim that we use only about 10% of the brain's capacity has recently been challenged. Researchers argue that we are likely to use the whole brain, against the 10% claim. Evidence and results from relevant studies and experiments on memory in neuroscience lead to the conclusion that if the remaining 90% of the brain were not used, many neural pathways would degenerate. What is memory? How does the brain function? What would be the limit of memory capacity? This article provides a model built upon the physiological and neurological characteristics of the human brain, which could give theoretical support and a scientific explanation for some phenomena. It may not only have theoretical significance in neuroscience, but could also be practically useful in bridging the gap between natural and machine intelligence.
Submitted 29 January, 2016; v1 submitted 3 December, 2015;
originally announced December 2015.
-
A Collaboration Network Model Of Cytokine-Protein Network
Authors:
Sheng-Rong Zou,
Ta Zhou,
Yu-Jing Peng,
Zhong-Wei Guo,
Chang-gui Gu,
Da-Ren He
Abstract:
Complex networks provide a new view for investigating immune systems. In this paper we collect data from the STRING database and present a model based on cooperation network theory. The cytokine-protein network model we consider consists of two kinds of nodes: immune cytokine types, which act as acts, and protein types, which act as actors. From the act degree distribution, which is well described by typical shifted power law (SPL) functions, we find that HRAS, TNFRSF13C, S100A8, S100A1, MAPK8, S100A7, LIF, CCL4, and CXCL13 collaborate heavily with other proteins. This reveals that these mediators are important in the cytokine-protein network for regulating immune activity. The dyad act degree distribution is another important property of a generalized collaboration network. A dyad is a pair of proteins that appear together in one cytokine collaboration relationship. The dyad act degree distribution is also well described by typical SPL functions. The average shortest path length is 1.29. These results show that this model describes cytokine-protein collaboration well.
Submitted 5 December, 2007;
originally announced December 2007.
-
An Empirical Study of Immune System Based On Bipartite Network
Authors:
Sheng-Rong Zou,
Yu-Jing Peng,
Zhong-Wei Guo,
Ta Zhou,
Chang-gui Gu,
Da-Ren He
Abstract:
The immune system is the most important defense system for resisting human pathogens. In this paper we present an immune model based on bipartite graph theory. We collect data from the COPE database and construct an immune cell-mediator network. The act degree distribution of this network is shown to follow a power law with an index of 1.8. From our analysis, we found that some mediators with high degree are very important in regulating immune activity, such as TNF-alpha, IL-8, TNF-alpha receptors, CCL5, IL-6, IL-2 receptors, TNF-beta receptors, TNF-beta, IL-4 receptors, IL-1 beta, CD54, and others. We also found that the assortativity of the immune system is -0.27, which reveals that the immune system is a non-social network. Finally, we found that the similarity of the network is 0.13: any two cells are similar only to a small extent, which reveals that many cells have unique features. The results show that this model can describe the immune system comprehensively.
Submitted 5 December, 2007;
originally announced December 2007.
-
A Brand-new Research Method of Neuroendocrine System
Authors:
Sheng-Rong Zou,
Zhong-Wei Guo,
Yu-Jing Peng,
Ta Zhou,
Chang-Gui Gu,
Da-Ren He
Abstract:
In this paper, we present the results of an empirical investigation of the neuroendocrine system using bipartite graphs. This neuroendocrine network model can describe the structural characteristics of the neuroendocrine system. The act degree distribution and cumulative act degree distribution follow so-called shifted power law (SPL) function forms. In the neuroendocrine network, the act degree stands for the number of cells that secrete a single mediator; bFGF (basic fibroblast growth factor), an important mitogenic cytokine, has the largest act degree, followed by TGF-beta, IL-6, IL1-beta, VEGF, IGF-1, and so on. They are critical in the neuroendocrine system for maintaining bodily health, emotional stability, and endocrine harmony. The average act degree of the neuroendocrine network is h = 3.01, which means each mediator is secreted by three cells on average. The similarity, which stands for the average probability that neuroendocrine cells secrete the same mediators, is s = 0.14. Our results may be used in research on the medical treatment of neuroendocrine diseases.
Submitted 2 December, 2007;
originally announced December 2007.
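The neuroendocrine abstract above defines the act degree of a mediator as the number of cells that secrete it, and reports an average act degree h. A minimal sketch of that computation on a bipartite cell-mediator mapping follows. The toy data, the variable `secretes`, and the function names are hypothetical and chosen only for illustration; they do not reproduce the paper's dataset or its value h = 3.01.

```python
from collections import Counter

# Hypothetical toy bipartite network: each cell maps to the set of
# mediators it secretes (not the data used in the paper).
secretes = {
    "cell_A": {"bFGF", "IL-6"},
    "cell_B": {"bFGF", "TGF-beta"},
    "cell_C": {"bFGF", "IL-6", "TGF-beta"},
}


def act_degrees(secretes):
    """Count, for each mediator, how many cells secrete it."""
    counts = Counter()
    for mediators in secretes.values():
        counts.update(mediators)  # each mediator counted once per cell
    return counts


def average_act_degree(secretes):
    """Mean act degree over all mediators in the network."""
    degrees = act_degrees(secretes)
    return sum(degrees.values()) / len(degrees) if degrees else 0.0


degrees = act_degrees(secretes)
h = average_act_degree(secretes)
```

In this toy network bFGF is secreted by all three cells and so has the largest act degree, mirroring the role the abstract attributes to it.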