Search | arXiv e-print repository

A cell-level model to predict the spatiotemporal dynamics of neurodegenerative disease

Authors: Shih-Huan Huang, Matthew W. Cotton, Tuomas P. J. Knowles, David Klenerman, Georg Meisl

Abstract: A central challenge in modeling neurodegenerative diseases is connecting cellular-level mechanisms to tissue-level pathology, in particular to determine whether pathology is driven primarily by cell-autonomous triggers or by propagation from cells that are already in a pathological, runaway aggregation state. To bridge this gap, we here develop a bottom-up physical model that explicitly incorporates these two fundamental cell-level drivers of protein aggregation dynamics. We show that our model naturally explains the characteristic long, slow development of pathology followed by a rapid acceleration, a hallmark of many neurodegenerative diseases. Furthermore, the model reveals the existence of a critical switch point at which the system's dynamics transition from being dominated by slow, spontaneous formation of diseased cells to being driven by fast propagation. This framework provides a robust physical foundation for interpreting pathological data and offers a method to predict which class of therapeutic strategies is best matched to the underlying drivers of a specific disease.

Submitted 20 August, 2025; originally announced August 2025.

arXiv:2505.01264 [pdf, ps, other]

Cardiovascular function changes following lung resection: a computational model to compare afterload increase and contractility loss mechanisms

Authors: Shiting Huang, Sanjay Pant, Sean McGinty, Richard Good, Ben Shelley, Ankush Aggarwal

Abstract: Functional limitation after lung resection surgery has been consistently documented in clinical studies, and right ventricle (RV) dysfunction has been hypothesized as a contributing reason. However, the mechanisms of RV dysfunction after lung resection remain unclear, particularly whether change in afterload or contractility is the main cause. This study is the first to employ a lumped parameter model to simulate the effects of lung resection. The implementation of a computational model allowed us to isolate certain mechanisms, which is difficult to do clinically. Specifically, two mechanisms were compared: afterload increase and RV contractility loss. Furthermore, our rigorous approach included local and global sensitivity analyses to evaluate the effect of parameters on our results, both individually and collectively. Our results demonstrate that contractility and afterload exhibited consistent trends across various pressure and volume conditions, whereas pulmonary artery systolic pressure, pulmonary artery diastolic pressure, and right ventricular systolic pressure showed opposite variations. The results show that post-operative RV dysfunction may result from a combination of RV contractility loss and afterload increase. Further exploration and refinement of this first computational model presented herein will help us predict RV dysfunction after lung resection and pave the way towards improving outcomes for lung cancer patients.

Submitted 2 May, 2025; originally announced May 2025.

arXiv:2504.11454 [pdf, ps, other]

Elucidating the Design Space of Multimodal Protein Language Models

Authors: Cheng-Yen Hsieh, Xinyou Wang, Daiheng Zhang, Dongyu Xue, Fei Ye, Shujian Huang, Zaixiang Zheng, Quanquan Gu

Abstract: Multimodal protein language models (PLMs) integrate sequence and token-based structural information, serving as a powerful foundation for protein modeling, generation, and design. However, the reliance on tokenizing 3D structures into discrete tokens causes substantial loss of fidelity about fine-grained structural details and correlations. In this paper, we systematically elucidate the design space of multimodal PLMs to overcome their limitations. We identify tokenization loss and inaccurate structure token predictions by the PLMs as major bottlenecks. To address these, our proposed design space covers improved generative modeling, structure-aware architectures and representation learning, and data exploration. Our advancements approach finer-grained supervision, demonstrating that token-based multimodal PLMs can achieve robust structural modeling. The effective design methods dramatically improve the structure generation diversity, and notably, folding abilities of our 650M model by reducing the RMSD from 5.52 to 2.36 on PDB testset, even outperforming 3B baselines and on par with the specialized folding models. Project page and code: https://bytedance.github.io/dplm/dplm-2.1/.

Submitted 11 June, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

Comments: ICML 2025 Spotlight; Project Page: https://bytedance.github.io/dplm/dplm-2.1/

arXiv:2412.12668 [pdf, other]

Artificial Intelligence for Central Dogma-Centric Multi-Omics: Challenges and Breakthroughs

Authors: Lei Xin, Caiyun Huang, Hao Li, Shihong Huang, Yuling Feng, Zhenglun Kong, Zicheng Liu, Siyuan Li, Chang Yu, Fei Shen, Hao Tang

Abstract: With the rapid development of high-throughput sequencing platforms, an increasing number of omics technologies, such as genomics, metabolomics, and transcriptomics, are being applied to disease genetics research. However, biological data often exhibit high dimensionality and significant noise, making it challenging to effectively distinguish disease subtypes using a single-omics approach. To address these challenges and better capture the interactions among DNA, RNA, and proteins described by the central dogma, numerous studies have leveraged artificial intelligence to develop multi-omics models for disease research. These AI-driven models have improved the accuracy of disease prediction and facilitated the identification of genetic loci associated with diseases, thus advancing precision medicine. This paper reviews the mathematical definitions of multi-omics, strategies for integrating multi-omics data, applications of artificial intelligence and deep learning in multi-omics, the establishment of foundational models, and breakthroughs in multi-omics technologies, drawing insights from over 130 related articles. It aims to provide practical guidance for computational biologists to better understand and effectively utilize AI-based multi-omics machine learning algorithms in the context of the central dogma.

Submitted 17 December, 2024; originally announced December 2024.

arXiv:2410.13782 [pdf, other]

DPLM-2: A Multimodal Diffusion Protein Language Model

Authors: Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu

Abstract: Proteins are essential macromolecules defined by their amino acid sequences, which determine their three-dimensional structures and, consequently, their functions in all living organisms. Therefore, generative protein modeling necessitates a multimodal approach to simultaneously model, understand, and generate both sequences and structures. However, existing methods typically use separate models for each modality, limiting their ability to capture the intricate relationships between sequence and structure. This results in suboptimal performance in tasks that require joint understanding and generation of both modalities. In this paper, we introduce DPLM-2, a multimodal protein foundation model that extends the discrete diffusion protein language model (DPLM) to accommodate both sequences and structures. To enable structural learning with the language model, 3D coordinates are converted to discrete tokens using a lookup-free quantization-based tokenizer. By training on both experimental and high-quality synthetic structures, DPLM-2 learns the joint distribution of sequence and structure, as well as their marginals and conditionals. We also implement an efficient warm-up strategy to exploit the connection between large-scale evolutionary data and structural inductive biases from pre-trained sequence-based protein language models. Empirical evaluation shows that DPLM-2 can simultaneously generate highly compatible amino acid sequences and their corresponding 3D structures, eliminating the need for a two-stage generation approach. Moreover, DPLM-2 demonstrates competitive performance in various conditional generation tasks, including folding, inverse folding, and scaffolding with multimodal motif inputs, as well as providing structure-aware representations for predictive tasks.

Submitted 17 October, 2024; originally announced October 2024.

arXiv:2408.13919 [pdf, other]

Quantum Multimodal Contrastive Learning Framework

Authors: Chi-Sheng Chen, Aidan Hung-Wen Tsai, Sheng-Chieh Huang

Abstract: In this paper, we propose a novel framework for multimodal contrastive learning utilizing a quantum encoder to integrate EEG (electroencephalogram) and image data. This groundbreaking attempt explores the integration of quantum encoders within the traditional multimodal learning framework. By leveraging the unique properties of quantum computing, our method enhances the representation learning capabilities, providing a robust framework for analyzing time series and visual information concurrently. We demonstrate that the quantum encoder effectively captures intricate patterns within EEG signals and image features, facilitating improved contrastive learning across modalities. This work opens new avenues for integrating quantum computing with multimodal data analysis, particularly in applications requiring simultaneous interpretation of temporal and visual data.

Submitted 4 March, 2025; v1 submitted 25 August, 2024; originally announced August 2024.

Comments: 15 pages

arXiv:2405.06690 [pdf, other]

DrugLLM: Open Large Language Model for Few-shot Molecule Generation

Authors: Xianggen Liu, Yan Guo, Haoran Li, Jin Liu, Shudong Huang, Bowen Ke, Jiancheng Lv

Abstract: Large Language Models (LLMs) have made great strides in areas such as language processing and computer vision. Despite the emergence of diverse techniques to improve few-shot learning capacity, current LLMs fall short in handling the languages in biology and chemistry. For example, they struggle to capture the relationship between molecule structure and pharmacochemical properties. Consequently, the few-shot learning capacity of small-molecule drug modification remains impeded. In this work, we introduced DrugLLM, an LLM tailored for drug design. During the training process, we employed Group-based Molecular Representation (GMR) to represent molecules, arranging them in sequences that reflect modifications aimed at enhancing specific molecular properties. DrugLLM learns how to modify molecules in drug discovery by predicting the next molecule based on past modifications. Extensive computational experiments demonstrate that DrugLLM can generate new molecules with expected properties based on limited examples, presenting a powerful few-shot molecule generation capacity.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 17 pages, 3 figures

arXiv:2402.18567 [pdf, other]

Diffusion Language Models Are Versatile Protein Learners

Authors: Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu

Abstract: This paper introduces diffusion protein language model (DPLM), a versatile protein language model that demonstrates strong generative and predictive capabilities for protein sequences. We first pre-train scalable DPLMs from evolutionary-scale protein sequences within a generative self-supervised discrete diffusion probabilistic framework, which generalizes language modeling for proteins in a principled way. After pre-training, DPLM exhibits the ability to generate structurally plausible, novel, and diverse protein sequences for unconditional generation. We further demonstrate the proposed diffusion generative pre-training makes DPLM possess a better understanding of proteins, making it a superior representation learner, which can be fine-tuned for various predictive tasks, comparing favorably to ESM2 (Lin et al., 2022). Moreover, DPLM can be tailored for various needs, which showcases its prowess of conditional generation in several ways: (1) conditioning on partial peptide sequences, e.g., generating scaffolds for functional motifs with high success rate; (2) incorporating other modalities as conditioner, e.g., structure-conditioned generation for inverse folding; and (3) steering sequence generation towards desired properties, e.g., satisfying specified secondary structures, through a plug-and-play classifier guidance. Code is released at https://github.com/bytedance/dplm.

Submitted 16 October, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: ICML 2024 camera-ready version

arXiv:2401.03095 [pdf, other]

Dimensional reduction of gradient-like stochastic systems with multiplicative noise via Fokker-Planck diffusion maps

Authors: Andrew Baumgartner, Sui Huang, Jennifer Hadlock, Cory Funk

Abstract: Dimensional reduction techniques have long been used to visualize the structure and geometry of high dimensional data. However, most widely used techniques are difficult to interpret due to nonlinearities and opaque optimization processes. Here we present a specific graph based construction for dimensionally reducing continuous stochastic systems with multiplicative noise moving under the influence of a potential. To achieve this, we present a specific graph construction which generates the Fokker-Planck equation of the stochastic system in the continuum limit. The eigenvectors and eigenvalues of the normalized graph Laplacian are used as a basis for the dimensional reduction and yield a low dimensional representation of the dynamics which can be used for downstream analysis such as spectral clustering. We focus on the use case of single cell RNA sequencing data and show how current diffusion map implementations popular in the single cell literature fit into this framework.

Submitted 5 January, 2024; originally announced January 2024.

arXiv:2306.13957 [pdf, other]

DiffDTM: A conditional structure-free framework for bioactive molecules generation targeted for dual proteins

Authors: Lei Huang, Zheng Yuan, Huihui Yan, Rong Sheng, Linjing Liu, Fuzhou Wang, Weidun Xie, Nanjun Chen, Fei Huang, Songfang Huang, Ka-Chun Wong, Yaoyun Zhang

Abstract: Advances in deep generative models shed light on de novo molecule generation with desired properties. However, molecule generation targeted for dual protein targets still faces formidable challenges including protein 3D structure data requisition for model training, auto-regressive sampling, and model generalization for unseen targets. Here, we proposed DiffDTM, a novel conditional structure-free deep generative model based on a diffusion model for dual targets based molecule generation to address the above issues. Specifically, DiffDTM receives protein sequences and molecular graphs as inputs instead of protein and molecular conformations and incorporates an information fusion module to achieve conditional generation in a one-shot manner. We have conducted comprehensive multi-view experiments to demonstrate that DiffDTM can generate drug-like, synthesis-accessible, novel, and high-binding affinity molecules targeting specific dual proteins, outperforming the state-of-the-art (SOTA) models in terms of multiple evaluation metrics. Furthermore, we utilized DiffDTM to generate molecules towards dopamine receptor D2 and 5-hydroxytryptamine receptor 1A as new antipsychotics. The experimental results indicate that DiffDTM can be easily plugged into unseen dual targets to generate bioactive molecules, addressing the issues of requiring insufficient active molecule data for training as well as the need to retrain when encountering new targets.

Submitted 24 June, 2023; originally announced June 2023.

arXiv:2302.00855 [pdf, other]

Molecular Geometry-aware Transformer for accurate 3D Atomic System modeling

Authors: Zheng Yuan, Yaoyun Zhang, Chuanqi Tan, Wei Wang, Fei Huang, Songfang Huang

Abstract: Molecular dynamic simulations are important in computational physics, chemistry, material, and biology. Machine learning-based methods have shown strong abilities in predicting molecular energy and properties and are much faster than DFT calculations. Molecular energy is at least related to atoms, bonds, bond angles, torsion angles, and nonbonding atom pairs. Previous Transformer models only use atoms as inputs which lack explicit modeling of the aforementioned factors. To alleviate this limitation, we propose Moleformer, a novel Transformer architecture that takes nodes (atoms) and edges (bonds and nonbonding atom pairs) as inputs and models the interactions among them using rotational and translational invariant geometry-aware spatial encoding. Proposed spatial encoding calculates relative position information including distances and angles among nodes and edges. We benchmark Moleformer on OC20 and QM9 datasets, and our model achieves state-of-the-art on the initial state to relaxed energy prediction of OC20 and is very competitive in QM9 on predicting quantum chemical properties compared to other Transformer and Graph Neural Network (GNN) methods which proves the effectiveness of the proposed geometry-aware spatial encoding in Moleformer.

Submitted 1 February, 2023; originally announced February 2023.

arXiv:2301.03782 [pdf, other]

doi 10.1016/j.jtbi.2023.111645

Cell Population Growth Kinetics in the Presence of Stochastic Heterogeneity of Cell Phenotype

Authors: Yue Wang, Joseph X. Zhou, Edoardo Pedrini, Irit Rubin, May Khalil, Roberto Taramelli, Hong Qian, Sui Huang

Abstract: Recent studies at individual cell resolution have revealed phenotypic heterogeneity in nominally clonal tumor cell populations. The heterogeneity affects cell growth behaviors, which can result in departure from the idealized uniform exponential growth of the cell population. Here we measured the stochastic time courses of growth of an ensemble of populations of HL60 leukemia cells in cultures, starting with distinct initial cell numbers to capture a departure from the uniform exponential growth model for the initial growth ("take-off"). Despite being derived from the same cell clone, we observed significant variations in the early growth patterns of individual cultures with statistically significant differences in growth dynamics, which could be explained by the presence of inter-converting subpopulations with different growth rates, and which could last for many generations. Based on the hypothesis of existence of multiple subpopulations, we developed a branching process model that was consistent with the experimental observations.

Submitted 18 October, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

arXiv:2210.09574 [pdf]

Integrative Pan-Cancer Analysis of RNMT: a Potential Prognostic and Immunological Biomarker

Authors: Shuqiang Huang, Cuiyu Tan, Jinzhen Zheng, Zhugu Huang, Zhihong Li, Ziyin Lv, Wanru Chen

Abstract: Background: RNA guanine-7 methyltransferase (RNMT) is one of the main regulators of N7-methylguanosine, and the deregulation of RNMT correlated with tumor development and immune metabolism. However, the specific function of RNMT in pan-cancer remains unclear. Methods: RNMT expression in different cancers was analyzed using multiple databases, including Cancer Cell Line Encyclopedia (CCLE), Genotype-Tissue Expression Project (GTEx), and The Cancer Genome Atlas (TCGA). Cox regression analysis and Kaplan-Meier analysis were used to estimate the correlation of RNMT expression to prognosis. The data was also used to research the relationship between RNMT expression and common immunoregulators, tumor mutation burden (TMB), microsatellite instability (MSI), mismatch repair (MMR), and DNA methyltransferase (DNMT). Additionally, the cBioPortal website was used to evaluate the characteristics of RNMT alteration. The TISDB database was used to obtain the expression of different subtypes. The Tumor Immune Estimation Resource (TIMER) database was used to analyze the association between RNMT and tumor immune infiltration. Gene set enrichment analysis (GSEA) was used to identify the relevant pathways. Results: RNMT was ubiquitously highly expressed across cancers and survival analysis revealed that its expression was highly associated with the clinical prognosis of various cancer types. Remarkably, RNMT participates in immune regulation and plays a crucial part in the tumor microenvironment. A positive association was found between RNMT expression and six immune cell types expression in colon adenocarcinoma, kidney renal clear cell carcinoma, and liver hepatocellular carcinoma. Moreover, RNMT expression was highly associated with immunoregulators in most cancer types, and correlated to TMB, MSI, MMR, and DNMT. Finally, GSEA indicated that RNMT may correlate with tumor immunity.

Submitted 21 March, 2024; v1 submitted 18 October, 2022; originally announced October 2022.

arXiv:2207.02985 [pdf, other]

Orthogonal Matrix Retrieval with Spatial Consensus for 3D Unknown-View Tomography

Authors: Shuai Huang, Mona Zehni, Ivan Dokmanić, Zhizhen Zhao

Abstract: Unknown-view tomography (UVT) reconstructs a 3D density map from its 2D projections at unknown, random orientations. A line of work starting with Kam (1980) employs the method of moments (MoM) with rotation-invariant Fourier features to solve UVT in the frequency domain, assuming that the orientations are uniformly distributed. This line of work includes the recent orthogonal matrix retrieval (OMR) approaches based on matrix factorization, which, while elegant, either require side information about the density that is not available, or fail to be sufficiently robust. For OMR to break free from those restrictions, we propose to jointly recover the density map and the orthogonal matrices by requiring that they be mutually consistent. We regularize the resulting non-convex optimization problem by a denoised reference projection and a nonnegativity constraint. This is enabled by the new closed-form expressions for spatial autocorrelation features. Further, we design an easy-to-compute initial density map which effectively mitigates the non-convexity of the reconstruction problem. Experimental results show that the proposed OMR with spatial consensus is more robust and performs significantly better than the previous state-of-the-art OMR approach in the typical low-SNR scenario of 3D UVT.

Submitted 10 June, 2023; v1 submitted 6 July, 2022; originally announced July 2022.

Comments: Keywords: unknown view tomography, single-particle cryo-electron microscopy, method of moments, autocorrelation, spherical harmonics

MSC Class: 92C55; 68U10; 33C55; 78M05

arXiv:2206.02788 [pdf]

doi 10.1073/pnas.2118836119

Accurate Virus Identification with Interpretable Raman Signatures by Machine Learning

Authors: Jiarong Ye, Yin-Ting Yeh, Yuan Xue, Ziyang Wang, Na Zhang, He Liu, Kunyan Zhang, RyeAnne Ricker, Zhuohang Yu, Allison Roder, Nestor Perea Lopez, Lindsey Organtini, Wallace Greene, Susan Hafenstein, Huaguang Lu, Elodie Ghedin, Mauricio Terrones, Shengxi Huang, Sharon Xiaolei Huang

Abstract: Rapid identification of newly emerging or circulating viruses is an important first step toward managing the public health response to potential outbreaks. A portable virus capture device coupled with label-free Raman Spectroscopy holds the promise of fast detection by rapidly obtaining the Raman signature of a virus followed by a machine learning approach applied to recognize the virus based on its Raman spectrum, which is used as a fingerprint. We present such a machine learning approach for analyzing Raman spectra of human and avian viruses. A Convolutional Neural Network (CNN) classifier specifically designed for spectral data achieves very high accuracy for a variety of virus type or subtype identification tasks. In particular, it achieves 99% accuracy for classifying influenza virus type A vs. type B, 96% accuracy for classifying four subtypes of influenza A, 95% accuracy for differentiating enveloped and non-enveloped viruses, and 99% accuracy for differentiating avian coronavirus (infectious bronchitis virus, IBV) from other avian viruses. Furthermore, interpretation of neural net responses in the trained CNN model using a full-gradient algorithm highlights Raman spectral ranges that are most important to virus identification. By correlating ML-selected salient Raman ranges with the signature ranges of known biomolecules and chemical functional groups (for example, amide, amino acid, carboxylic acid), we verify that our ML model effectively recognizes the Raman signatures of proteins, lipids and other vital functional groups present in different viruses and uses a weighted combination of these signatures to identify viruses.

Submitted 5 June, 2022; originally announced June 2022.

Comments: 23 pages, 8 figures

Journal ref: Proceedings of the National Academy of Sciences of the United States of America (2022)

arXiv:2203.05716 [pdf, other]

Evaluating U-net Brain Extraction for Multi-site and Longitudinal Preclinical Stroke Imaging

Authors: Erendiz Tarakci, Joseph Mandeville, Fahmeed Hyder, Basavaraju G. Sanganahalli, Daniel R. Thedens, Ali Arbab, Shuning Huang, Adnan Bibic, Jelena Mihailovic, Andreia Morais, Jessica Lamb, Karisma Nagarkatti, Marcio A. Dinitz, Andre Rogatko, Arthur W. Toga, Patrick Lyden, Cenk Ayata, Ryan P. Cabeen

Abstract: Rodent stroke models are important for evaluating treatments and understanding the pathophysiology and behavioral changes of brain ischemia, and magnetic resonance imaging (MRI) is a valuable tool for measuring outcome in preclinical studies. Brain extraction is an essential first step in most neuroimaging pipelines; however, it can be challenging in the presence of severe pathology and when dataset quality is highly variable. Convolutional neural networks (CNNs) can improve accuracy and reduce operator time, facilitating high throughput preclinical studies. As part of an ongoing preclinical stroke imaging study, we developed a deep-learning mouse brain extraction tool by using a U-net CNN. While previous studies have evaluated U-net architectures, we sought to evaluate their practical performance across data types. We ask how performance is affected with data across: six imaging centers, two time points after experimental stroke, and across four MRI contrasts. We trained, validated, and tested a typical U-net model on 240 multimodal MRI datasets including quantitative multi-echo T2 and apparent diffusivity coefficient (ADC) maps, and performed qualitative evaluation with a large preclinical stroke database (N=1,368). We describe the design and development of this system, and report our findings linking data characteristics to segmentation performance. We consistently found high accuracy and ability of the U-net architecture to generalize performance in a range of 95-97% accuracy, with only modest reductions in performance based on lower fidelity imaging hardware and brain pathology. This work can help inform the design of future preclinical rodent imaging studies and improve their scalability and reliability.

Submitted 10 March, 2022; originally announced March 2022.

arXiv:2203.05714 [pdf, other]

Computational Image-based Stroke Assessment for Evaluation of Cerebroprotectants with Longitudinal and Multi-site Preclinical MRI

Authors: Ryan P. Cabeen, Joseph Mandeville, Fahmeed Hyder, Basavaraju G. Sanganahalli, Daniel R. Thedens, Ali Arbab, Shuning Huang, Adnan Bibic, Erendiz Tarakci, Jelena Mihailovic, Andreia Morais, Jessica Lamb, Karisma Nagarkatti, Arthur W. Toga, Patrick Lyden, Cenk Ayata

Abstract: While ischemic stroke is a leading cause of death worldwide, there has been little success translating putative cerebroprotectants from rodent preclinical trials to human patients. We investigated computational image-based assessment tools for practical improvement of the quality, scalability, and outlook for large scale preclinical screening for potential therapeutic interventions in rodent models. We developed, evaluated, and deployed a pipeline for image-based stroke outcome quantification for the Stroke Preclinical Assessment Network (SPAN), a multi-site, multi-arm, multi-stage study evaluating a suite of cerebroprotectant interventions. Our fully automated pipeline combines state-of-the-art algorithmic and data analytic approaches to assess stroke outcomes from multi-parameter MRI data collected longitudinally from a rodent model of middle cerebral artery occlusion (MCAO), including measures of infarct volume, brain atrophy, midline shift, and data quality. We applied our approach to 1,368 scans and report population level results of lesion extent and longitudinal changes from injury. We validated our system by comparison with both manual annotations of coronal MRI slices and tissue sections from the same brain, using crowdsourcing from blinded stroke experts from the network. Our results demonstrate the efficacy and robustness of our image-based stroke assessments. The pipeline may provide a promising resource for ongoing rodent preclinical studies conducted by SPAN and other networks in the future.

Submitted 29 March, 2023; v1 submitted 10 March, 2022; originally announced March 2022.

arXiv:2201.00087 [pdf, other]

Persistent Homological State-Space Estimation of Functional Human Brain Networks at Rest

Authors: Moo K. Chung, Shih-Gu Huang, Ian C. Carroll, Vince D. Calhoun, H. Hill Goldsmith

Abstract: We introduce an innovative, data-driven topological data analysis (TDA) technique for estimating the state spaces of dynamically changing functional human brain networks at rest. Our method utilizes the Wasserstein distance to measure topological differences, enabling the clustering of brain networks into distinct topological states. This technique outperforms the commonly used k-means clustering in identifying brain network state spaces by effectively incorporating the temporal dynamics of the data without the need for explicit model specification. We further investigate the genetic underpinnings of these topological features using a twin study design, examining the heritability of such state changes. Our findings suggest that the topology of brain networks, particularly in their dynamic state changes, may hold significant hidden genetic information. MATLAB code for the method is available at https://github.com/laplcebeltrami/PH-STAT.

Submitted 16 April, 2024; v1 submitted 31 December, 2021; originally announced January 2022.

Comments: To be published in PLOS Computational Biology

arXiv:2110.01503 [pdf]

Emotionally-Informed Decisions: Bringing Gut's Feelings into Self-adaptive and Co-adaptive Software Systems

Authors: Emmanuelle Tognoli, Shihong Huang

Abstract: Software systems now complement an incredibly vast number of human activities, and much effort has been deployed to make them quasi-autonomous with the build-up of increasingly performant self-adaptive capabilities, so that failures, interruptions and functional losses requiring expert intervention are few and far between. Even as software systems are rapidly gaining skills that beat humans', humans retain greatly superior adaptability, especially in the context of emotionally-informed decisions and decisions under uncertainty; that is to say, self-adaptive and co-adaptive software systems have yet to acquire a "gut's feeling". This provides the double opportunity to conceptualize human-inspired processes of decision-making under uncertainty in the self-adaptive part of the software, as well as to draw on uniquely human emotional competences in co-adaptive architectures. In this paper, some algorithms are discussed that can provide software systems with realistic decision-making, and some architectures are conceptualized that resort to human emotions to quantify uncertainty and to contribute to the software's adaptation process.

Submitted 4 October, 2021; originally announced October 2021.

arXiv:2107.02962 [pdf, other]

Transmission Dynamics of COVID-19 Pandemic Non-pharmaceutical Interventions and Vaccination

Authors: Bin-Guo Wang, Shunxiang Huang, Yongping Xiong, Ming-Zhen Xin, Jing LI, Jiangqian Zhang, Zhihui Ma

Abstract: Non-pharmaceutical interventions (NPIs) play an important role in the early-stage control of the COVID-19 pandemic. Vaccination is considered to be the inevitable course to stop the spread of SARS-CoV-2. Based on the transmission mechanism, a SVEIR COVID-19 model with vaccination and NPIs is proposed. By means of the basic reproduction number $\mathscr{R}_{0}$, it is shown that the disease-free equilibrium is globally attractive if $\mathscr{R}_{0}<1$, and COVID-19 is uniformly persistent if $\mathscr{R}_{0}>1$. Taking Indian data as an example in the numerical simulation, we find that our dynamical results fit the statistical data well. Consequently, we forecast the spreading trend of the COVID-19 pandemic in India. Furthermore, our results imply that improving the intensity of NPIs will greatly reduce the number of confirmed cases. In particular, NPIs are indispensable even if all people were vaccinated when the efficiency of the vaccine is relatively low. By simulating the relationships among the basic reproduction number $\mathscr{R}_{0}$, the vaccination rate and the efficiency of the vaccine, we find that it is impossible to achieve herd immunity without NPIs when the efficiency of the vaccine is lower than $76.9\%$. Therefore, the herd immunity area is defined by the evolution of relationships between the vaccination rate and the efficiency of the vaccine. In the two-patch study, we give the conditions for India and China to be open to navigation. Furthermore, an appropriate dispersal of population between India and China is obtained. A discussion completes the paper.

Submitted 6 July, 2021; originally announced July 2021.

arXiv:2103.09390 [pdf, other]

Identify Hidden Spreaders of Pandemic over Contact Tracing Networks

Authors: Shuhong Huang, Jiachen Sun, Ling Feng, Jiarong Xie, Dashun Wang, Yanqing Hu

Abstract: COVID-19 infection cases have surged globally, causing devastation to both society and the economy. A key factor contributing to the sustained spreading is the presence of a large number of asymptomatic or hidden spreaders, who mix among the susceptible population without being detected or quarantined. Here we propose an effective non-pharmacological intervention method of detecting the asymptomatic spreaders in contact-tracing networks, and validate it on the empirical COVID-19 spreading network in Singapore. We find that using pure physical spreading equations, the hidden spreaders of COVID-19 can be identified with remarkable accuracy. Specifically, based on the unique characteristics of COVID-19 spreading dynamics, we propose a computational framework capturing the transition probabilities among different infectious states in a network, and extend it to an efficient algorithm to identify asymptomatic individuals. Our simulation results indicate that a screening method using our prediction outperforms machine learning algorithms, e.g. graph neural networks, that are designed as baselines in this work, as well as random screening of an infected person's closest contacts, widely used by China in its early outbreak. Furthermore, our method provides high precision even with incomplete information of the contact-tracing networks. Our work can be of critical importance to the non-pharmacological interventions of COVID-19, especially with increasing adoption of contact tracing measures using various new technologies. Beyond COVID-19, our framework can be useful for other epidemic diseases that also feature asymptomatic spreading.

Submitted 16 March, 2021; originally announced March 2021.

Comments: 14 pages, 4 figures

arXiv:2012.00001 [pdf, other]

Utilizing stability criteria in choosing feature selection methods yields reproducible results in microbiome data

Authors: Lingjing Jiang, Niina Haiminen, Anna-Paola Carrieri, Shi Huang, Yoshiki Vazquez-Baeza, Laxmi Parida, Ho-Cheol Kim, Austin D. Swafford, Rob Knight, Loki Natarajan

Abstract: Feature selection is indispensable in microbiome data analysis, but it can be particularly challenging as microbiome data sets are high-dimensional, underdetermined, sparse and compositional. Great efforts have recently been made on developing new methods for feature selection that handle the above data characteristics, but almost all methods were evaluated based on performance of model predictions. However, little attention has been paid to address a fundamental question: how appropriate are those evaluation criteria? Most feature selection methods often control the model fit, but the ability to identify meaningful subsets of features cannot be evaluated simply based on the prediction accuracy. If tiny changes to the training data would lead to large changes in the chosen feature subset, then many of the biological features that an algorithm has found are likely to be a data artifact rather than real biological signal. This crucial need of identifying relevant and reproducible features motivated the reproducibility evaluation criterion such as Stability, which quantifies how robust a method is to perturbations in the data. In our paper, we compare the performance of popular model prediction metric MSE and proposed reproducibility criterion Stability in evaluating four widely used feature selection methods in both simulations and experimental microbiome applications. We conclude that Stability is a preferred feature selection criterion over MSE because it better quantifies the reproducibility of the feature selection method.

Submitted 30 November, 2020; originally announced December 2020.

Report number: https://doi.org/10.1111/biom.13481
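
The idea in the abstract above, that a feature selection method is reproducible when small perturbations of the training data leave the selected subset largely unchanged, can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the Stability estimator used in the paper: it swaps in the mean pairwise Jaccard similarity between feature subsets selected on resampled data as a simple proxy. The function names and the toy feature names (`f1`, `f2`, ...) are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature subsets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(selections):
    """Average Jaccard similarity over all pairs of selected subsets.

    `selections` holds one feature subset per resampled training set.
    Values near 1 mean the selections barely change under perturbation;
    values near 0 mean they change almost completely.
    """
    pairs = list(combinations(selections, 2))
    if not pairs:
        raise ValueError("need at least two selections to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical selections from three resampled data sets.
stable = [{"f1", "f2", "f3"}, {"f1", "f2", "f3"}, {"f1", "f2", "f4"}]
unstable = [{"f1", "f2"}, {"f3", "f4"}, {"f5", "f6"}]

stable_score = mean_pairwise_jaccard(stable)
unstable_score = mean_pairwise_jaccard(unstable)
```

In this toy case the first method scores higher than the second, even though both could in principle reach a similar prediction error, which is the distinction the abstract draws between a reproducibility criterion and MSE.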

arXiv:2006.10376 [pdf, other]

The Weather Impacts the Outbreak of COVID-19 in Mainland China

Authors: Siyu Huang, Ji Liu, Haoyi Xiong, Jizhou Huang, Haozhe An, Dejing Dou

Abstract: Recent literature has suggested that climate conditions have considerably significant influences on the transmission of coronavirus COVID-19. However, there is a lack of comprehensive study that investigates the relationships between multiple weather factors and the development of COVID-19 pandemic while excluding the impact of social factors. In this paper, we study the relationships between six main weather factors and the infection statistics of COVID-19 on 250 cities in Mainland China. Our correlation analysis using weather and infection statistics indicates that all the studied weather factors are correlated with the spread of COVID-19, where precipitation shows the strongest correlation. We also build a weather-aware predictive model that forecasts the number of infected cases should there be a second wave of the outbreak in Mainland China. Our predicted results show that cities located in different geographical areas are likely to be challenged with the second wave of COVID-19 at very different time periods and the severity of the outbreak varies to a large degree, in correspondence with the varying weather conditions.

Submitted 18 June, 2020; originally announced June 2020.

Comments: 18 pages, 12 figures, 1 table

arXiv:2005.08312 [pdf, other]

Extinction and quasi-stationarity for discrete-time, endemic SIS and SIR models

Authors: Sebastian J. Schreiber, Shuo Huang, Jifa Jiang, Hao Wang

Abstract: Stochastic discrete-time SIS and SIR models of endemic diseases are introduced and analyzed. For the deterministic, mean-field model, the basic reproductive number $R_0$ determines their global dynamics. If $R_0\le 1$, then the frequency of infected individuals asymptotically converges to zero. If $R_0>1$, then the infectious class uniformly persists for all time; conditions for a globally stable, endemic equilibrium are given. In contrast, the infection goes extinct in finite time with probability one in the stochastic models for all $R_0$ values. To understand the length of the transient prior to extinction as well as the behavior of the transients, the quasi-stationary distributions and the associated mean time to extinction are analyzed using large deviation methods. When $R_0>1$, these mean times to extinction are shown to increase exponentially with the population size $N$. Moreover, as $N$ approaches $\infty$, the quasi-stationary distributions are supported by a compact set bounded away from extinction; sufficient conditions for convergence to a Dirac measure at the endemic equilibrium of the deterministic model are also given. In contrast, when $R_0<1$, the mean times to extinction are bounded above by $1/(1-α)$ where $α<1$ is the geometric rate of decrease of the infection when rare; as $N$ approaches $\infty$, the quasi-stationary distributions converge to a Dirac measure at the disease-free equilibrium for the deterministic model. For several special cases, explicit formulas for approximating the quasi-stationary distribution and the associated mean extinction are given. These formulas illustrate how for arbitrarily small $R_0$ values, the mean time to extinction can be arbitrarily large, and how for arbitrarily large $R_0$ values, the mean time to extinction can be arbitrarily small.

Submitted 17 May, 2020; originally announced May 2020.

arXiv:2003.13473 [pdf]

Modern alleles in archaic human Y chromosomes support origin of modern human paternal lineages in Asia rather than Africa

Authors: Hongyao Chen, Shi Huang

Abstract: Recent studies have shown that hybridization between modern and archaic humans was commonplace in the history of our species. After admixture, some individuals with admixed autosomes carried the modern Homo Sapiens uniparental DNAs, while the rest carried the archaic versions. Coevolution of admixed autosomes and uniparental DNAs is expected to cause some of the sites in modern uniparental DNAs to revert back to archaic alleles, while the opposite process would occur (from archaic to modern) in some of the sites in archaic uniparental DNAs. This type of coevolution is one of the elements that differentiate the two different models of the Y phylogenetic tree of modern humans, rooting it either in Africa or East Asia. The expected reversion to archaic alleles is assumed to occur and is easily traceable in the Asia model, but is absent in the Africa model due to its infinite site assumption, which also precludes the independent or convergent mutation to modern alleles in archaic uniparental DNAs since mutations are assumed to occur randomly across a neutral genome, and convergent evolution is assumed not to occur. Here, we examined newly published high coverage Y chromosome sequencing data of two Denisovan and two Neanderthal samples to determine whether they carry modern-Homo Sapiens alleles in sites where they are not supposed to according to the Africa model. The results showed that a significant fraction of the sites that, according to the Asia model, should differentiate the original modern Y from the original archaic Y carried modern alleles in the archaic Y samples here. Some of these modern alleles were shared among all archaic humans while others could differentiate Denisovans from Neanderthals. The observation is best accounted for by coevolution of archaic Y and admixed modern autosomes, and hence supports the Asia model, since it takes such coevolution into account.

Submitted 26 March, 2020; originally announced March 2020.

Comments: 14 pages, 2 figures, 2 tables

arXiv:1912.11423 [pdf, other]

Towards Multicellular Biological Deep Neural Nets Based on Transcriptional Regulation

Authors: Sihao Huang

Abstract: Artificial neurons built on synthetic gene networks have potential applications ranging from complex cellular decision-making to bioreactor regulation. Furthermore, due to the high information throughput of natural systems, it provides an interesting candidate for biologically-based supercomputing and analog simulations of traditionally intractable problems. In this paper, we propose an architecture for constructing multicellular neural networks and programmable nonlinear systems. We design an artificial neuron based on gene regulatory networks and optimize its dynamics for modularity. Using gene expression models, we simulate its ability to perform arbitrary linear classifications from multiple inputs. Finally, we construct a two-layer neural network to demonstrate scalability and nonlinear decision boundaries and discuss future directions for utilizing uncontrolled neurons in computational tasks.

Submitted 30 January, 2020; v1 submitted 24 December, 2019; originally announced December 2019.

arXiv:1911.10316 [pdf]

Deep graph embedding for prioritizing synergistic anticancer drug combinations

Authors: Peiran Jiang, Shujun Huang, Zhenyuan Fu, Zexuan Sun, Ted M. Lakowski, Pingzhao Hu

Abstract: Drug combinations are frequently used for the treatment of cancer patients in order to increase efficacy, decrease adverse side effects, or overcome drug resistance. Given the enormous number of drug combinations, it is cost- and time-consuming to screen all possible drug pairs experimentally. Currently, the integration of multiple networks to predict synergistic drug combinations using recently developed deep learning technologies has not been fully explored. In this study, we proposed a Graph Convolutional Network (GCN) model to predict synergistic drug combinations in particular cancer cell lines. Specifically, the GCN method used a convolutional neural network model to do heterogeneous graph embedding, and thus solved a link prediction task. The graph in this study was a multimodal graph, which was constructed by integrating the drug-drug combination, drug-protein interaction, and protein-protein interaction networks. We found that the GCN model was able to correctly predict cell line-specific synergistic drug combinations from a large heterogeneous network. The majority (30) of the 39 cell line-specific models show an area under the receiver operating characteristic curve (AUC) larger than 0.80, resulting in a mean AUC of 0.84. Moreover, we conducted an in-depth literature survey to investigate the top predicted drug combinations in specific cancer cell lines and found that many of them have been found to show synergistic antitumor activity against the same or other cancers in vitro or in vivo. Taken together, the results indicate that our study provides a promising way to better predict and optimize synergistic drug pairs in silico.

Submitted 23 November, 2019; originally announced November 2019.

arXiv:1911.10313 [pdf, other]

DTF: Deep Tensor Factorization for Predicting Anticancer Drug Synergy

Authors: Zexuan Sun, Shujun Huang, Peiran Jiang, Pingzhao Hu

Abstract: Motivation: Combination therapies have been widely used to treat cancers. However, it is cost- and time-consuming to experimentally screen synergistic drug pairs due to the enormous number of possible drug combinations. Thus, computational methods have become an important way to predict and prioritize synergistic drug pairs. Results: We proposed a Deep Tensor Factorization (DTF) model, which integrated a tensor factorization method and a deep neural network (DNN), to predict drug synergy. The former extracts latent features from drug synergy information while the latter constructs a binary classifier to predict the drug synergy status. Compared to the tensor-based method, the DTF model performed better in predicting drug synergy. The area under the precision-recall curve (PR AUC) was 0.57 for DTF and 0.24 for the tensor method. We also compared the DTF model with DeepSynergy and logistic regression models and found that the DTF outperformed the logistic regression model and achieved almost the same performance as DeepSynergy using several typical metrics for the classification task. Applying the DTF model to predict missing entries in our drug-cell line tensor, we identified novel synergistic drug combinations for 10 cell lines from the 5 cancer types. A literature survey showed that some of these predicted drug synergies have been identified in vivo or in vitro. Thus, the DTF model could be a valuable in silico tool for prioritizing novel synergistic drug combinations.

Submitted 16 September, 2020; v1 submitted 22 November, 2019; originally announced November 2019.

Comments: Final draft in Bioinformatics, btaa287, https://doi.org/10.1093/bioinformatics/btaa287

arXiv:1911.02731 [pdf, other]

Statistical Analysis of Dynamic Functional Brain Networks in Twins

Authors: Moo K. Chung, Shih-Gu Huang, Tananun Songdechakraiwut, Ian C. Carroll, H. Hill Goldsmith

Abstract: Recent studies have shown that the functional brain network is dynamic even during rest. A common approach to modeling the brain network in whole brain resting-state fMRI is to compute the correlation between anatomical regions via sliding windows. However, the direct use of the sample correlation matrices is not reliable due to the image acquisition, processing noises and the use of discrete windows that often introduce spurious high-frequency fluctuations and the zig-zag pattern in the estimated time-varying correlation measures. To address the problem and obtain more robust correlation estimates, we propose the heat kernel based dynamic correlations. We demonstrate that the proposed heat kernel method can smooth out the unwanted high-frequency fluctuations in correlation estimations and achieve higher accuracy in identifying dynamically changing distinct states. The method is further used in determining if such dynamic state change is genetically heritable using a large-scale twin study. Various methodological challenges for analyzing paired twin dynamic networks are addressed.

Submitted 11 October, 2020; v1 submitted 6 November, 2019; originally announced November 2019.
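
The sliding-window baseline that the abstract above describes as a common approach (and as noisy) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the heat kernel method the paper proposes: two equal-length regional time series, a fixed window length, a step of one sample, and plain Pearson correlation per window. The function names and the toy signals are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant window
    return cov / math.sqrt(var_x * var_y)


def sliding_window_correlation(x, y, window):
    """Correlation of x and y inside each length-`window` segment, step 1."""
    if len(x) != len(y):
        raise ValueError("time series must have the same length")
    if window < 2 or window > len(x):
        raise ValueError("window must be between 2 and the series length")
    return [
        pearson(x[i:i + window], y[i:i + window])
        for i in range(len(x) - window + 1)
    ]


# Toy signals: y tracks x at first, then moves in the opposite direction.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 1.0, 2.0, 2.0, 1.0, 0.0]
dynamic_corr = sliding_window_correlation(x, y, window=3)
```

With short windows the estimate swings from strongly positive to strongly negative within a few samples, which illustrates why windowed estimates of this kind can fluctuate sharply and why the abstract argues for smoothing.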

arXiv:1812.10050 [pdf, other]

Statistical Model for Dynamically-Changing Correlation Matrices with Application to Brain Connectivity

Authors: Shih-Gu Huang, S. Balqis Samdin, Chee-Ming Ting, Hernando Ombao, Moo K. Chung

Abstract: Background: Recent studies have indicated that functional connectivity is dynamic even during rest. A common approach to modeling the dynamic functional connectivity in whole-brain resting-state fMRI is to compute the correlation between anatomical regions via sliding time windows. However, the direct use of the sample correlation matrices is not reliable due to the image acquisition and processing noises in resting-state fMRI. New method: To overcome these limitations, we propose a new statistical model that smooths out the noise by exploiting the geometric structure of correlation matrices. The dynamic correlation matrix is modeled as a linear combination of symmetric positive-definite matrices combined with cosine series representation. The resulting smoothed dynamic correlation matrices are clustered into disjoint brain connectivity states using the k-means clustering algorithm. Results: The proposed model preserves the geometric structure of underlying physiological dynamic correlation, eliminates unwanted noise in connectivity and obtains more accurate state spaces. The difference in the estimated dynamic connectivity states between males and females is identified. Comparison with existing methods: We demonstrate that the proposed statistical model has less rapid state changes caused by noise and improves the accuracy in identifying and discriminating different states. Conclusions: We propose a new regression model on dynamically changing correlation matrices that provides better performance over existing windowed correlation and is more reliable for the modeling of dynamic connectivity.

Submitted 3 November, 2019; v1 submitted 25 December, 2018; originally announced December 2018.

Comments: Accepted for publication in Journal of Neuroscience Methods

arXiv:1811.10958 [pdf, other]

A Bayesian model of acquisition and clearance of bacterial colonization

Authors: Marko Järvenpää, Mohamad R. Abdul Sater, Georgia K. Lagoudas, Paul C. Blainey, Loren G. Miller, James A. McKinnell, Susan S. Huang, Yonatan H. Grad, Pekka Marttinen

Abstract: Bacterial populations that colonize a host play important roles in host health, including serving as a reservoir that transmits to other hosts and from which invasive strains emerge, thus emphasizing the importance of understanding rates of acquisition and clearance of colonizing populations. Studies of colonization dynamics have been based on assessment of whether serial samples represent a single population or distinct colonization events. A common solution to estimate acquisition and clearance rates is to use a fixed genetic distance threshold. However, this approach is often inadequate to account for the diversity of the underlying within-host evolving population, the time intervals between consecutive measurements, and the uncertainty in the estimated acquisition and clearance rates. Here, we summarize recently submitted work \cite{jarvenpaa2018named} and present a Bayesian model that provides probabilities of whether two strains should be considered the same, making it possible to determine bacterial clearance and acquisition from genomes sampled over time. We explicitly model the within-host variation using population genetic simulation, and the inference is done by combining information from multiple data sources by using a combination of Approximate Bayesian Computation (ABC) and Markov Chain Monte Carlo (MCMC). We use the method to analyse a collection of methicillin resistant Staphylococcus aureus (MRSA) isolates.

Submitted 27 November, 2018; originally announced November 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Report number: ML4H/2018/87

arXiv:1810.03687 [pdf]

Sequential Wnt Agonist then Antagonist Treatment Accelerates Tissue Repair and Minimizes Fibrosis

Authors: Xiao-Jun Tian, Dong Zhou, Haiyan Fu, Rong Zhang, Xiaojie Wang, Sui Huang, Youhua Liu, Jianhua Xing

Abstract: Tissue fibrosis compromises organ function and occurs as a potential long-term outcome in response to acute tissue injuries. Currently, lack of mechanistic understanding prevents effective prevention and treatment of the progression from acute injury to fibrosis. Here, we combined quantitative experimental studies with a mouse kidney injury model and a computational approach to determine how the physiological consequences are determined by the severity of ischemia injury, and to identify how to manipulate Wnt signaling to accelerate repair of ischemic tissue damage while minimizing fibrosis. The study reveals that Wnt-mediated memory of prior injury contributes to fibrosis progression, and ischemic preconditioning reduces the risk of death but increases the risk of fibrosis. Furthermore, we validated the prediction that sequential combination therapy of initial treatment with a Wnt agonist followed by treatment with a Wnt antagonist can reduce both the risk of death and fibrosis in response to acute injuries.

Submitted 4 July, 2019; v1 submitted 8 October, 2018; originally announced October 2018.

arXiv:1611.09542 [pdf, other]

doi 10.1016/j.physd.2019.02.005

Time Dependent Saddle Node Bifurcation: Breaking Time and the Point of No Return in a Non-Autonomous Model of Critical Transitions

Authors: Jeremiah Li, Felix X. -F. Ye, Hong Qian, Sui Huang

Abstract: There is a growing awareness that catastrophic phenomena in biology and medicine can be mathematically represented in terms of saddle-node bifurcations. In particular, the term 'tipping', or critical transition, has in recent years entered the discourse of the general public in relation to ecology, medicine, and public health. The saddle-node bifurcation and its associated theory of catastrophe as put forth by Thom and Zeeman has seen applications in a wide range of fields including molecular biophysics, mesoscopic physics, and climate science. In this paper, we investigate a simple model of a non-autonomous system with a time-dependent parameter $p(τ)$ and its corresponding 'dynamic' (time-dependent) saddle-node bifurcation by the modern theory of non-autonomous dynamical systems. We show that the actual point of no return for a system undergoing tipping can be significantly delayed in comparison to the breaking time $\hat{τ}$ at which the corresponding autonomous system with a time-independent parameter $p_{a} = p(\hat{τ})$ undergoes a bifurcation. A dimensionless parameter $α = λ p_0^3 V^{-2}$ is introduced, in which $λ$ is the curvature of the autonomous saddle-node bifurcation according to parameter $p(τ)$, which has an initial value of $p_{0}$ and a constant rate of change $V$. We find that the breaking time $\hat{τ}$ is always less than the actual point of no return $τ^*$ after which the critical transition is irreversible; specifically, the relation $τ^* - \hat{τ} \simeq 2.338 (λV)^{-\frac{1}{3}}$ is analytically obtained. For a system with a small $λV$, there exists a significant window of opportunity $(\hat{τ}, τ^*)$ during which rapid reversal of the environment can save the system from catastrophe.

Submitted 3 January, 2019; v1 submitted 29 November, 2016; originally announced November 2016.

arXiv:1602.04721 [pdf, ps, other]

Evaluating hospital infection control measures for antimicrobial-resistant pathogens using stochastic transmission models: application to Vancomycin-Resistant Enterococci in intensive care units

Authors: Yinghui Wei, Theodore Kypraios, Philip D. O'Neill, Susan S. Huang, Sheryl L. Rifas-Shiman, Ben S. Cooper

Abstract: Nosocomial pathogens such as Methicillin-Resistant Staphylococcus aureus (MRSA) and Vancomycin-resistant Enterococci (VRE) are the cause of significant morbidity and mortality among hospital patients. It is important to be able to assess the efficacy of control measures using data on patient outcomes. In this paper we describe methods for analysing such data using patient-level stochastic models which seek to describe the underlying unobserved process of transmission. The methods are applied to detailed longitudinal patient-level data on VRE from a study in a US hospital with eight intensive care units (ICUs). The data comprise admission and discharge dates, dates and results of screening tests, and dates during which precautionary measures were in place for each patient during the study period. Results include estimates of the efficacy of the control measures, the proportion of unobserved patients colonized with VRE and the proportion of patients colonized on admission.

Submitted 15 February, 2016; originally announced February 2016.

arXiv:1510.05918 [pdf]

doi 10.1016/j.ygeno.2016.01.008

New thoughts on an old riddle: what determines genetic diversity within and between species?

Authors: Shi Huang

Abstract: The question of what determines genetic diversity both between and within species has long remained unsolved by the modern evolutionary theory (MET). However, it has not deterred researchers from producing interpretations of genetic diversity by using MET. We here examine the two key experimental observations of genetic diversity made in the 1960s, one between species and the other within a population of a species, that directly contributed to the development of MET. The interpretations of these observations as well as the assumptions by MET are widely known to be inadequate. We review the recent progress of an alternative framework, the maximum genetic diversity (MGD) hypothesis, that uses axioms and natural selection to explain the vast majority of genetic diversity as being at optimum equilibrium that is largely determined by organismal complexity. The MGD hypothesis fully absorbs the proven virtues of MET and considers its assumptions relevant only to a much more limited scope. This new synthesis has accounted for the much overlooked phenomenon of progression towards higher complexity, and more importantly, been instrumental in directing productive research into both evolutionary and biomedical problems.

Submitted 21 October, 2015; v1 submitted 20 October, 2015; originally announced October 2015.

Comments: 23 pages, 1 figure

Journal ref: Genomics, 108: 3-10 (2016)

arXiv:1501.04709 [pdf]

doi 10.1038/srep16361

Identifying robust communities and multi-community nodes by combining top-down and bottom-up approaches to clustering

Authors: Chris Gaiteri, Mingming Chen, Boleslaw Szymanski, Konstantin Kuzmin, Jierui Xie, Changkyu Lee, Timothy Blanche, Elias Chaibub Neto, Su-Chun Huang, Thomas Grabowski, Tara Madhyastha, Vitalina Komashko

Abstract: Biological functions are carried out by groups of interacting molecules, cells or tissues, known as communities. Membership in these communities may overlap when biological components are involved in multiple functions. However, traditional clustering methods detect non-overlapping communities. These detected communities may also be unstable and difficult to replicate, because traditional methods are sensitive to noise and parameter settings. These aspects of traditional clustering methods limit our ability to detect biological communities, and therefore our ability to understand biological functions. To address these limitations and detect robust overlapping biological communities, we propose an unorthodox clustering method called SpeakEasy which identifies communities using top-down and bottom-up approaches simultaneously. Specifically, nodes join communities based on their local connections, as well as global information about the network structure. This method can quantify the stability of each community, automatically identify the number of communities, and quickly cluster networks with hundreds of thousands of nodes. SpeakEasy shows top performance on synthetic clustering benchmarks and accurately identifies meaningful biological communities in a range of datasets, including: gene microarrays, protein interactions, sorted cell populations, electrophysiology and fMRI brain imaging.

Submitted 25 February, 2015; v1 submitted 19 January, 2015; originally announced January 2015.

Journal ref: Scientific Reports 5, Article number: 16361 (2015)

arXiv:1407.6117 [pdf]

doi 10.1016/j.biosystems.2016.03.002

Relative Stability of Network States in Boolean Network Models of Gene Regulation in Development

Authors: Joseph Xu Zhou, Areejit Samal, Aymeric Fouquier d'Hèrouël, Nathan D. Price, Sui Huang

Abstract: Progress in cell type reprogramming has revived the interest in Waddington's concept of the epigenetic landscape. Recently researchers developed the quasi-potential theory to represent Waddington's landscape. The quasi-potential U(x), derived from interactions in the gene regulatory network (GRN) of a cell, quantifies the relative stability of network states, which determine the effort required for state transitions in a multi-stable dynamical system. However, quasi-potential landscapes, originally developed for continuous systems, are not suitable for discrete-valued networks, which are important tools to study complex systems. In this paper, we provide a framework to quantify the landscape for discrete Boolean networks (BNs). We apply our framework to study pancreas cell differentiation where an ensemble of BN models is considered based on the structure of a minimal GRN for pancreas development. We impose biologically motivated structural constraints (corresponding to specific types of Boolean functions) and dynamical constraints (corresponding to stable attractor states) to limit the space of BN models for pancreas development. In addition, we enforce a novel functional constraint corresponding to the relative ordering of attractor states in BN models to restrict the space of BN models to the biologically relevant class. We find that BNs with canalyzing/sign-compatible Boolean functions best capture the dynamics of pancreas cell differentiation. This framework can also determine the genes' influence on cell state transitions, and thus can facilitate the rational design of cell reprogramming protocols.

Submitted 12 October, 2015; v1 submitted 23 July, 2014; originally announced July 2014.

Comments: 24 pages, 6 figures, 1 table

Journal ref: Biosystems 142-143:15-24 (2016)

arXiv:1403.7924 [pdf]

doi 10.1016/j.desal.2016.08.011

Modified Kedem-Katchalsky equations for osmosis through nano-pore

Authors: Liangsuo Shu, Xiaokang Liu, Yingjie Li, Baoxue Yang, Suyi Huang, Yixin Lin, Shiping Jin

Abstract: This work presents modified Kedem-Katchalsky equations for osmosis through a nano-pore. The osmotic reflection coefficient of a solute was found to be chiefly affected by the entrance of the pore, while the filtration reflection coefficient can be affected by both the entrance and the internal structure of the pore. Using an analytical method, we get the quantitative relationship between the osmotic reflection coefficient and the molecule size. The model is verified by comparing the theoretical results with the reported experimental data of aquaporin osmosis. Our work is expected to pave the way for a better understanding of osmosis in bio-systems and to give us new ideas in designing new membranes with better performance.

Submitted 30 April, 2016; v1 submitted 31 March, 2014; originally announced March 2014.

Comments: 19 pages, 4 figures

MSC Class: 92C05

Journal ref: Desalination 399 (2016) 47-52

arXiv:1402.0136 [pdf, other]

IsoDOT Detects Differential RNA-isoform Expression/Usage with respect to a Categorical or Continuous Covariate with High Sensitivity and Specificity

Authors: Wei Sun, Yufeng Liu, James J. Crowley, Ting-Huei Chen, Hua Zhou, Haitao Chu, Shunping Huang, Pei-Fen Kuan, Yuan Li, Darla Miller, Ginger Shaw, Yichao Wu, Vasyl Zhabotynsky, Leonard McMillan, Fei Zou, Patrick F. Sullivan, Fernando Pardo-Manuel de Villena

Abstract: We have developed a statistical method named IsoDOT to assess differential isoform expression (DIE) and differential isoform usage (DIU) using RNA-seq data. Here isoform usage refers to relative isoform expression given the total expression of the corresponding gene. IsoDOT performs two tasks that cannot be accomplished by existing methods: to test DIE/DIU with respect to a continuous covariate, and to test DIE/DIU for one case versus one control. The latter task is not an uncommon situation in practice, e.g., comparing paternal and maternal allele of one individual or comparing tumor and normal sample of one cancer patient. Simulation studies demonstrate the high sensitivity and specificity of IsoDOT. We apply IsoDOT to study the effects of haloperidol treatment on mouse transcriptome and identify a group of genes whose isoform usages respond to haloperidol treatment.

Submitted 29 October, 2014; v1 submitted 1 February, 2014; originally announced February 2014.

arXiv:1303.1788 [pdf, other]

doi 10.1002/gepi.21808

Poly-Omic Prediction of Complex Traits: OmicKriging

Authors: Heather E. Wheeler, Keston Aquino-Michaels, Eric R. Gamazon, Vassily V. Trubetskoy, M. Eileen Dolan, R. Stephanie Huang, Nancy J. Cox, Hae Kyung Im

Abstract: High-confidence prediction of complex traits such as disease risk or drug response is an ultimate goal of personalized medicine. Although genome-wide association studies have discovered thousands of well-replicated polymorphisms associated with a broad spectrum of complex traits, the combined predictive power of these associations for any given trait is generally too low to be of clinical relevance. We propose a novel systems approach to complex trait prediction, which leverages and integrates similarity in genetic, transcriptomic or other omics-level data. We translate the omic similarity into phenotypic similarity using a method called Kriging, commonly used in geostatistics and machine learning. Our method called OmicKriging emphasizes the use of a wide variety of systems-level data, such as those increasingly made available by comprehensive surveys of the genome, transcriptome and epigenome, for complex trait prediction. Furthermore, our OmicKriging framework allows easy integration of prior information on the function of subsets of omics-level data from heterogeneous sources without the sometimes heavy computational burden of Bayesian approaches. Using seven disease datasets from the Wellcome Trust Case Control Consortium (WTCCC), we show that OmicKriging allows simple integration of sparse and highly polygenic components yielding comparable performance at a fraction of the computing time of a recently published Bayesian sparse linear mixed model method. Using a cellular growth phenotype, we show that integrating mRNA and microRNA expression data substantially increases performance over either dataset alone. We also integrate genotype and expression data to predict change in LDL cholesterol levels after statin treatment and show that OmicKriging performs better than the polygenic score method. We provide an R package to implement OmicKriging.

Submitted 12 September, 2013; v1 submitted 7 March, 2013; originally announced March 2013.

arXiv:1302.7276 [pdf]

doi 10.1016/j.ygeno.2015.04.002

Role of genetic polymorphisms in transgenerational inheritance in budding yeast

Authors: Zuobin Zhu, Qing Lu, Dejian Yuan, Yanke Li, Xian Man, Yueran Zhu, Shi Huang

Abstract: Transgenerational inheritance of a trait is presumably affected by both genetic and environmental factors but remains poorly understood. We studied the effect of genetic polymorphisms on transgenerational inheritance of yeast segregants that were derived from a cross between a laboratory strain and a wild strain of Saccharomyces cerevisiae. For each SNP analyzed, the parental allele present in less than half of the segregants panel was called the minor allele (MA). We found a nonrandom distribution of MAs in the segregants, indicating natural selection. We compared segregants with high MA content (MAC) relative to those with less and found a more dramatic shortening of the lag phase length for the high MAC group in response to 14 days of ethanol training. Also, the short lag phase as acquired and epigenetically memorized by ethanol training was more dramatically lost after 7 days of recovery in ethanol free medium for the high MAC group. Sodium chloride treatment produced similar observations. Using public datasets, we found MAC linkage to mRNA expression of hundreds of genes. Finally, we found preferential effect of MAC on traits with high number of known additive quantitative trait loci (QTLs). These results provide evidence for the slightly deleterious nature of most MAs and a lower capacity to maintain inheritance of traits in individuals or cells with greater MAC, which have implications for disease prevention and treatment and the "missing heritability" problem in complex traits and diseases.

Submitted 12 July, 2013; v1 submitted 27 February, 2013; originally announced February 2013.

Comments: 22 pages, 3 figures, 1 table, 7 supplementary tables

Journal ref: Genomics, 106: 23-29 (2015)

arXiv:1212.0661 [pdf]

GWAPP: A Web Application for Genome-wide Association Mapping in A. thaliana

Authors: Ümit Seren, Bjarni J. Vilhjálmsson, Matthew W. Horton, Dazhe Meng, Petar Forai, Yu S. Huang, Quan Long, Vincent Segura, Magnus Nordborg

Abstract: Arabidopsis thaliana is an important model organism for understanding the genetics and molecular biology of plants. Its highly selfing nature, together with other important features, such as small size, short generation time, small genome size, and wide geographic distribution, make it an ideal model organism for understanding natural variation. Genome-wide association studies (GWAS) have proven a useful technique for identifying genetic loci responsible for natural variation in A. thaliana. Previously genotyped accessions (natural inbred lines) can be grown in replicate under different conditions, and phenotyped for different traits. These important features greatly simplify association mapping of traits and allow for systematic dissection of the genetics of natural variation by the entire Arabidopsis community. To facilitate this, we present GWAPP, an interactive web-based application for conducting GWAS in A. thaliana. Using an efficient Python implementation of a linear mixed model, traits measured for a subset of 1386 publicly available ecotypes can be uploaded and mapped with an efficient mixed model and other methods in just a couple of minutes. GWAPP features an extensive, interactive, and user-friendly interface that includes interactive Manhattan plots and interactive local and genome-wide LD plots. It facilitates exploratory data analysis by implementing features such as the inclusion of candidate SNPs in the model as cofactors.

Submitted 10 December, 2012; v1 submitted 4 December, 2012; originally announced December 2012.

Comments: Submitted to The Plant Cell (http://www.plantcell.org/) 42 pages with 15 figures

arXiv:1209.2911 [pdf]

doi 10.1007/s11427-014-4704-4

Methods for scoring the collective effect of SNPs: Minor alleles of common SNPs quantitatively affect traits/diseases and are under both positive and negative selection

Authors: Dejian Yuan, Zuobin Zhu, Xiaohua Tan, Jie Liang, Ceng Zeng, Jiegen Zhang, Jun Chen, Long Ma, Ayca Dogan, Gudrun Brockmann, Oliver Goldmann, Eva Medina, Amanda D. Rice, Richard W. Moyer, Xian Man, Ke Yi, Yanke Li, Qing Lu, Yimin Huang, Dapeng Wang, Jun Yu, Hui Guo, Kun Xia, Shi Huang

Abstract: Most common SNPs are popularly assumed to be neutral. We here developed novel methods to examine in animal models and humans whether extreme amount of minor alleles (MAs) carried by an individual may represent extreme trait values and common diseases. We analyzed panels of genetic reference populations and identified the MAs in each panel and the MA content (MAC) that each strain carried. We also analyzed 21 published GWAS datasets of human diseases and identified the MAC of each case or control. MAC was nearly linearly linked to quantitative variations in numerous traits in model organisms, including life span, tumor susceptibility, learning and memory, sensitivity to alcohol and anti-psychotic drugs, and two correlated traits poor reproductive fitness and strong immunity. Similarly, in Europeans or European Americans, enrichment of MAs of fast but not slow evolutionary rate was linked to autoimmune and numerous other diseases, including type 2 diabetes, Parkinson's disease, psychiatric disorders, alcohol and cocaine addictions, cancer, and less life span. Therefore, both high and low MAC correlated with extreme values in many traits, indicating stabilizing selection on most MAs. The methods here are broadly applicable and may help solve the missing heritability problem in complex traits and diseases.

Submitted 15 July, 2013; v1 submitted 12 September, 2012; originally announced September 2012.

Journal ref: Sci China Life Sci. 57:876-888. (2014)

arXiv:1206.2311 [pdf]

Quasi-potential landscape in complex multi-stable systems

Authors: Joseph Xu Zhou, M. D. S. Aliyu, Erik Aurell, Sui Huang

Abstract: Developmental dynamics of a multicellular organism is a process that takes place in a multi-stable system in which each attractor state represents a cell type and attractor transitions correspond to cell differentiation paths. This new understanding has revived the idea of a quasi-potential landscape, first proposed by Waddington as a metaphor. To describe development one is interested in the "relative stabilities" of N attractors (N>2). Existing theories of state transition between local minima on some potential landscape deal with the exit in the transition between a pair of attractors but do not offer the notion of a global potential function that relates more than two attractors to each other. Several ad hoc methods have been used in systems biology to compute a landscape in non-gradient systems, such as gene regulatory networks. Here we present an overview of the currently available methods, discuss their limitations and propose a new decomposition of vector fields that permits the computation of a quasi-potential function that is equivalent to the Freidlin-Wentzell potential but is not limited to two attractors. Several examples of decomposition are given and the significance of such a quasi-potential function is discussed.

Submitted 11 June, 2012; originally announced June 2012.

Comments: 30 pages, 6 figures

arXiv:0903.4215 [pdf, other]

doi 10.1016/j.jtbi.2009.07.005

A Model of Sequential Branching in Hierarchical Cell Fate Determination

Authors: David V. Foster, Jacob G. Foster, Sui Huang, Stuart A. Kauffman

Abstract: Multipotent stem or progenitor cells undergo a sequential series of binary fate decisions, which ultimately generate the diversity of differentiated cells. Efforts to understand cell fate control have focused on simple gene regulatory circuits that predict the presence of multiple stable states, bifurcations and switch-like transitions. However, existing gene network models do not explain more complex properties of cell fate dynamics such as the hierarchical branching of developmental paths. Here, we construct a generic minimal model of the genetic regulatory network controlling cell fate determination, which exhibits five elementary characteristics of cell differentiation: stability, directionality, branching, exclusivity, and promiscuous expression. We argue that a modular architecture comprising repeated network elements reproduces these features of differentiation by sequentially repressing selected modules and hence restricting the dynamics to lower dimensional subspaces of the high-dimensional state space. We implement our model both with ordinary differential equations (ODEs), to explore the role of bifurcations in producing the one-way character of differentiation, and with stochastic differential equations (SDEs), to demonstrate the effect of noise on the system. We further argue that binary cell fate decisions are prevalent in cell differentiation due to general features of the underlying dynamical system. This minimal model makes testable predictions about the structural basis for directional, discrete and diversifying cell phenotype development and thus can guide the evaluation of real gene regulatory networks that govern differentiation.

Submitted 14 September, 2009; v1 submitted 24 March, 2009; originally announced March 2009.

Comments: 10 pages, 7 figures

Journal ref: Journal of Theoretical Biology 260 (2009) pp. 589-597

arXiv:0809.1716 [pdf]

doi 10.1038/emboj.2008.64

Functioning of the dimeric GABA(B) receptor extracellular domain revealed by glycan wedge scanning

Authors: Philippe Rondard, Siluo Huang, Carine Monnier, Haijun Tu, Bertrand Blanchard, Nadia Oueslati, Fanny Malhaire, Ying Li, Eric Trinquet, Gilles Labesse, Jean-Philippe Pin, Jianfeng Liu

Abstract: The G-protein-coupled receptor (GPCR) activated by the neurotransmitter GABA is made up of two subunits, GABA(B1) and GABA(B2). GABA(B1) binds agonists, whereas GABA(B2) is required for trafficking GABA(B1) to the cell surface, increasing agonist affinity to GABA(B1), and activating associated G proteins. These subunits each comprise two domains, a Venus flytrap domain (VFT) and a heptahelical transmembrane domain (7TM). How agonist binding to the GABA(B1) VFT leads to GABA(B2) 7TM activation remains unknown. Here, we used a glycan wedge scanning approach to investigate how the GABA(B) VFT dimer controls receptor activity. We first identified the dimerization interface using a bioinformatics approach and then showed that introducing an N-glycan at this interface prevents the association of the two subunits and abolishes all activities of GABA(B2), including agonist activation of the G protein. We also identified a second region in the VFT where insertion of an N-glycan does not prevent dimerization, but blocks agonist activation of the receptor. These data provide new insight into the function of this prototypical GPCR and demonstrate that a change in the dimerization interface is required for receptor activation.

Submitted 10 September, 2008; originally announced September 2008.

Journal ref: The EMBO Journal 27, 9 (2008) 1321-1332

Showing 1–46 of 46 results for author: Huang, S