-
Hypergraph: A Unified and Uniform Definition with Application to Chemical Hypergraph and More
Authors:
Daniel T. Chang
Abstract:
The conventional definition of hypergraph has two major issues: (1) there is not a standard definition of directed hypergraph and (2) there is not a formal definition of nested hypergraph. To resolve these issues, we propose a new definition of hypergraph that unifies the concepts of undirected, directed and nested hypergraphs, and that is uniform in using hyperedge as a single construct for representing high-order correlations among things, i.e., nodes and hyperedges. Specifically, we define a hyperedge to be a simple hyperedge, a nesting hyperedge, or a directed hyperedge. With this new definition, a hypergraph is nested if it has nesting hyperedge(s), and is directed if it has directed hyperedge(s). Otherwise, a hypergraph is a simple hypergraph. The uniformity and power of this new definition, with visualization, should facilitate the use of hypergraph for representing (hierarchical) high-order correlations in general and chemical systems in particular. Graph has been widely used as a mathematical structure for machine learning on molecular structures and 3D molecular geometries. However, graph has a major limitation: it can represent only pairwise correlations between nodes. Hypergraph extends graph with high-order correlations among nodes. This extension is significant or essential for machine learning on chemical systems. For molecules, this is significant as it allows the direct, explicit representation of multicenter bonds and molecular substructures. For chemical reactions, this is essential since most chemical reactions involve multiple participants. We propose the use of chemical hypergraph, a multilevel hypergraph with simple, nesting and directed hyperedges, as a single mathematical structure for representing chemical systems. We apply the new definition of hypergraph to chemical hypergraph and, as simplified versions, molecular hypergraph and chemical reaction hypergraph.
Submitted 21 October, 2024; v1 submitted 14 May, 2024;
originally announced May 2024.
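The three hyperedge kinds named in the abstract above can be sketched as a small data structure. This is a minimal illustration, not the paper's formal definition: the class names, the `classify` helper, and the assumptions that a simple hyperedge holds only nodes, a nesting hyperedge may hold other hyperedges, and a directed hyperedge is a (tail, head) pair are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleHyperedge:
    members: frozenset  # assumed: nodes only


@dataclass(frozen=True)
class NestingHyperedge:
    members: frozenset  # assumed: nodes and/or other hyperedges


@dataclass(frozen=True)
class DirectedHyperedge:
    tail: frozenset  # assumed: source side (e.g. reactants)
    head: frozenset  # assumed: target side (e.g. products)


def classify(hyperedges):
    """Label a hypergraph by the kinds of hyperedges it contains."""
    labels = set()
    if any(isinstance(e, NestingHyperedge) for e in hyperedges):
        labels.add("nested")
    if any(isinstance(e, DirectedHyperedge) for e in hyperedges):
        labels.add("directed")
    # With neither nesting nor directed hyperedges, the hypergraph is simple.
    return labels or {"simple"}


# Hypothetical toy data: a group of atoms, a molecule containing that group,
# and a reaction from the molecule to a product.
group = SimpleHyperedge(frozenset({"O", "H"}))
molecule = NestingHyperedge(frozenset({group, "C1", "C2"}))
product = SimpleHyperedge(frozenset({"X", "Y"}))
reaction = DirectedHyperedge(tail=frozenset({molecule}), head=frozenset({product}))
```

Frozen dataclasses are hashable, which is what lets one hyperedge appear as a member of another.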
-
COVID-19 Regional Waves and Spread Risk Assessment through the Analysis of the Initial Outbreak in Guatemala
Authors:
Juan Adolfo Ponciano,
Juan Diego Chang,
Mariela Abdalah,
Kevin Facey,
José Miguel Ponciano
Abstract:
The initial surge of the COVID-19 pandemic hit Guatemala in March 2020. On a country scale, the epidemic has undergone a fairly well-known and distinguishable initial phase, reaching its peak in mid-July 2020. However, the detailed picture is more involved and reflects inter-regional variations in the epidemic dynamics, presumably grounded in socio-demographic, connectivity, and human mobility factors. Classifying the regional epidemic curves and identifying the major hubs of regional COVID-19 spread can contribute towards defining an evidence-based risk map for future outbreaks of infectious diseases with similar transmissibility properties. In this work, we make a regional wave decomposition of the initial epidemic phase registered in Guatemala, and we use the Richards phenomenological model alongside multivariate ordination techniques of its estimated model parameters to draw a countrywide picture of the first epidemiological wave. By exploring similarities in the model space parameters, we traced routes for the disease spread across the country. We evaluated how well the proposed classification can help to define a regional risk hierarchy comprising early stage focal points, major hubs, and secondary regions of epidemic progression.
Submitted 13 September, 2022;
originally announced September 2022.
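For readers unfamiliar with the Richards model mentioned in the abstract above, the sketch below evaluates one common closed-form parameterization of the cumulative-case curve, the solution of dC/dt = r C [1 - (C/K)^a]. The abstract does not state which parameterization the authors use, so the function name, the parameter names (`K`, `r`, `a`, `tau`), and the numeric values are assumptions for illustration only.

```python
import math


def richards(t, K, r, a, tau):
    """Cumulative cases C(t) under the Richards growth model.

    K   : final epidemic size (upper asymptote)
    r   : per-capita growth rate
    a   : shape parameter (a = 1 recovers the logistic curve)
    tau : time of the inflection point, i.e. the peak of new cases
    """
    return K / (1.0 + a * math.exp(-a * r * (t - tau))) ** (1.0 / a)


# Hypothetical parameter values, not estimates from the paper.
K, r, a, tau = 1000.0, 0.2, 1.0, 30.0
at_inflection = richards(tau, K, r, a, tau)   # equals K / (1 + a) ** (1 / a)
late = richards(200.0, K, r, a, tau)          # approaches K
```

In a regional analysis of this kind, each region's fitted (K, r, a, tau) would form one point in parameter space, which is the input that ordination techniques then compare.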
-
Distance-Geometric Graph Attention Network (DG-GAT) for 3D Molecular Geometry
Authors:
Daniel T. Chang
Abstract:
Deep learning for molecular science has so far mainly focused on 2D molecular graphs. Recently, however, there has been work to extend it to 3D molecular geometry, due to its scientific significance and critical importance in real-world applications. The 3D distance-geometric graph representation (DG-GR) adopts a unified scheme (distance) for representing the geometry of 3D graphs. It is invariant to rotation and translation of the graph, and it reflects pair-wise node interactions and their generally local nature, particularly relevant for 3D molecular geometry. To facilitate the incorporation of 3D molecular geometry in deep learning for molecular science, we adopt the new graph attention network with dynamic attention (GATv2) for use with DG-GR and propose the 3D distance-geometric graph attention network (DG-GAT). GATv2 is a great fit for DG-GR since the attention can vary by node and by distance between nodes. Experimental results of DG-GAT for the ESOL and FreeSolv datasets show major improvement (31% and 38%, respectively) over those of the standard graph convolution network based on 2D molecular graphs. The same is true for the QM9 dataset. Our work demonstrates the utility and value of DG-GAT for deep learning based on 3D molecular geometry.
Submitted 16 July, 2022;
originally announced July 2022.
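The invariance property claimed for a distance-based representation in the abstract above can be checked numerically. The sketch below is not the DG-GR or DG-GAT implementation; it only shows that pairwise distances are unchanged when a set of 3D points is rotated and translated. The helper names and the coordinates are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Euclidean distance for every unordered pair of points."""
    return [math.dist(p, q) for p, q in combinations(coords, 2)]


def rotate_z_then_translate(coords, angle, shift):
    """Rotate points about the z axis by `angle` radians, then add `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]


# Hypothetical three-atom geometry (arbitrary units).
atoms = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
moved = rotate_z_then_translate(atoms, angle=1.1, shift=(3.0, -2.0, 5.0))

before = pairwise_distances(atoms)
after = pairwise_distances(moved)
unchanged = all(math.isclose(d0, d1, abs_tol=1e-9) for d0, d1 in zip(before, after))
```

Because the distances carry no absolute position or orientation, a model that consumes only them needs no data augmentation over rotations or translations.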
-
Deep Learning for Molecular Graphs with Tiered Graph Autoencoders and Graph Prediction
Authors:
Daniel T. Chang
Abstract:
Tiered graph autoencoders provide the architecture and mechanisms for learning tiered latent representations and latent spaces for molecular graphs that explicitly represent and utilize groups (e.g., functional groups). This enables the utilization and exploration of tiered molecular latent spaces, either individually - the node (atom) tier, the group tier, or the graph (molecule) tier - or jointly, as well as navigation across the tiers. In this paper, we discuss the use of tiered graph autoencoders together with graph prediction for molecular graphs. We show features of molecular graphs used, and groups in molecular graphs identified for some sample molecules. We briefly review graph prediction and the QM9 dataset for background information, and discuss the use of tiered graph embeddings for graph prediction, particularly weighted group pooling. We find that functional groups and ring groups effectively capture and represent the chemical essence of molecular graphs (structures). Further, tiered graph autoencoders and graph prediction together provide effective, efficient and interpretable deep learning for molecular graphs, with the former providing unsupervised, transferable learning and the latter providing supervised, task-optimized learning.
Submitted 1 July, 2021; v1 submitted 24 October, 2019;
originally announced October 2019.
-
Using a hydrogen-bond index to predict the gene-silencing efficiency of siRNA based on the local structure of mRNA
Authors:
Kathy Q. Luo,
Donald C. Chang
Abstract:
The gene silencing effect of short interfering RNA (siRNA) is known to vary strongly with the targeted position of the mRNA. A number of hypotheses have been suggested to explain this phenomenon. We would like to test if this positional effect is mainly due to the secondary structure of the mRNA at the target site. We proposed that this structural factor can be characterized by a single parameter called "the hydrogen bond (H-b) index", which represents the average number of hydrogen bonds formed between nucleotides in the target region and the rest of the mRNA. This index can be determined using a computational approach. We tested the correlation between the H-b index and the gene-silencing effects on three genes (Bcl-2, hTF and cyclin B1) using a variety of siRNAs. We found that the gene-silencing effect is inversely dependent on the H-b index, indicating that the local mRNA structure at the targeted site is the main cause of the positional effect. Based on this finding, we suggest that the H-b index can be a useful guideline for future siRNA design.
Submitted 20 October, 2017;
originally announced October 2017.
-
Distinction between the Preneoplastic and Neoplastic State of Murine Mammary Glands by spin-echo NMR
Authors:
C. F. Hazlewood,
D. C. Chang,
D. Medina,
G. Cleveland,
B. L. Nichols
Abstract:
We have, using spin-echo nuclear magnetic resonance spectroscopy, measured the relaxation times and diffusion coefficient of water protons in primary mammary adenocarcinomas of mice. In our biological model, three morphological stages were defined: (a) mammary gland tissue from pregnant mice, (b) preneoplastic nodules, and (c) neoplastic tissue. It was found that neoplastic tissues could be distinguished from normal and preneoplastic tissue. Spin-spin and spin-lattice relaxation times and the diffusion coefficient of water protons are increased in the neoplastic tissue relative to mammary gland tissue from pregnant mice and preneoplastic nodule tissue. These results suggested that one can use a pulsed NMR method to detect and even predict breast cancer.
Submitted 1 March, 2014;
originally announced March 2014.
-
Using biophotonics to study signaling mechanisms in a single living cell
Authors:
Donald C. Chang
Abstract:
To illustrate the power of the biophysical approach in solving important problems in life science, I present here one of our current research projects as an example. We have developed special biophotonic techniques to study the dynamic properties of signaling proteins in a single living cell. Such a study allowed us to gain new insight into the signaling mechanism that regulates programmed cell death.
Submitted 8 January, 2014;
originally announced January 2014.
-
Neutral genomic regions refine models of recent rapid human population growth
Authors:
Elodie Gazave,
Li Ma,
Diana Chang,
Alex Coventry,
Feng Gao,
Donna Muzny,
Eric Boerwinkle,
Richard Gibbs,
Charles F. Sing,
Andrew G. Clark,
Alon Keinan
Abstract:
Human populations have experienced dramatic growth since the Neolithic revolution. Recent studies that sequenced a very large number of individuals observed an extreme excess of rare variants, and provided clear evidence of recent rapid growth in effective population size, though estimates have varied greatly among studies. All these studies were based on protein-coding genes, in which variants are also impacted by natural selection. In this study, we introduce targeted sequencing data for studying recent human history with minimal confounding by natural selection. We sequenced loci very far from genes that meet a wide array of additional criteria such that mutations in these loci are putatively neutral. As population structure also skews allele frequencies, we sequenced a sample of relatively homogeneous ancestry by first analyzing the population structure of 9,716 European Americans. We employed very high coverage sequencing to reliably call rare variants, and fit an extensive array of models of recent European demographic history to the site frequency spectrum. The best-fit model estimates ~3.4% growth per generation during the last ~140 generations, resulting in a population size increase of two orders of magnitude. This model fits the data very well, largely due to our observation that assumptions of more ancient demography can impact estimates of recent growth. This observation and results also shed light on the discrepancy in demographic estimates among recent studies.
Submitted 15 November, 2013; v1 submitted 25 September, 2013;
originally announced September 2013.
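The link between the two figures quoted in the abstract above (about 3.4% growth per generation over about 140 generations, and a two-orders-of-magnitude size increase) is plain compounding. The short check below uses the rounded values from the abstract and assumes constant exponential growth; the variable names are illustrative.

```python
import math

growth_per_generation = 0.034  # ~3.4% per generation, as quoted in the abstract
generations = 140              # ~140 generations, as quoted in the abstract

# Constant per-generation growth compounds multiplicatively.
fold_increase = (1.0 + growth_per_generation) ** generations
orders_of_magnitude = math.log10(fold_increase)
```

With these rounded inputs the fold increase comes out a little above 100, consistent with the "two orders of magnitude" stated in the abstract.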