-
Integrating spatially-resolved transcriptomics data across tissues and individuals: challenges and opportunities
Authors:
Boyi Guo,
Wodan Ling,
Sang Ho Kwon,
Pratibha Panwar,
Shila Ghazanfar,
Keri Martinowich,
Stephanie C. Hicks
Abstract:
Advances in spatially-resolved transcriptomics (SRT) technologies have propelled the development of new computational analysis methods to unlock biological insights. As the cost of generating these data decreases, these technologies provide an exciting opportunity to create large-scale atlases that integrate SRT data across multiple tissues, individuals, species, or phenotypes to perform population-level analyses. Here, we describe unique challenges of varying spatial resolutions in SRT data, as well as highlight the opportunities for standardized preprocessing methods along with computational algorithms amenable to atlas-scale datasets leading to improved sensitivity and reproducibility in the future.
Submitted 1 August, 2024;
originally announced August 2024.
-
Challenges and opportunities to computationally deconvolve heterogeneous tissue with varying cell sizes using single cell RNA-sequencing datasets
Authors:
Sean K. Maden,
Sang Ho Kwon,
Louise A. Huuki-Myers,
Leonardo Collado-Torres,
Stephanie C. Hicks,
Kristen R. Maynard
Abstract:
Deconvolution of cell mixtures in "bulk" transcriptomic samples from homogenate human tissue is important for understanding the pathologies of diseases. However, several experimental and computational challenges remain in developing and implementing transcriptomics-based deconvolution approaches, especially those using a single cell/nuclei RNA-seq reference atlas, which are becoming rapidly available across many tissues. Notably, deconvolution algorithms are frequently developed using samples from tissues with similar cell sizes. However, brain tissue or immune cell populations have cell types with substantially different cell sizes, total mRNA expression, and transcriptional activity. When existing deconvolution approaches are applied to these tissues, these systematic differences in cell sizes and transcriptomic activity confound accurate cell proportion estimates and instead may quantify total mRNA content. Furthermore, there is a lack of standard reference atlases and computational approaches to facilitate integrative analyses, including not only bulk and single cell/nuclei RNA-seq data, but also new data modalities from spatial -omic or imaging approaches. New multi-assay datasets need to be collected with orthogonal data types generated from the same tissue block and the same individual, to serve as a "gold standard" for evaluating new and existing deconvolution methods. Below, we discuss these key challenges and how they can be addressed with the acquisition of new datasets and approaches to analysis.
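The cell-size confound described above can be illustrated with a small synthetic example. This is a minimal sketch, not the authors' method: the signature matrix, the true proportions, and the relative cell sizes (`signatures`, `true_props`, `cell_sizes`) are all made-up values, the bulk sample is noiseless, and ordinary least squares stands in for a real deconvolution algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reference: per-cell-type expression profiles (genes x cell types).
n_genes, n_types = 50, 3
signatures = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_types))

# Hypothetical ground truth: cell proportions and relative mRNA content per cell.
true_props = np.array([0.5, 0.3, 0.2])
cell_sizes = np.array([1.0, 4.0, 10.0])

# Each cell type contributes in proportion to (cell fraction x mRNA per cell).
bulk = signatures @ (true_props * cell_sizes)

# Least squares recovers the combined weight, not the cell fraction itself.
coef, *_ = np.linalg.lstsq(signatures, bulk, rcond=None)

# Naive normalization yields mRNA fractions, which are biased toward large cells.
naive = coef / coef.sum()

# Dividing by the (assumed known) cell sizes before normalizing recovers proportions.
rescaled = coef / cell_sizes
corrected = rescaled / rescaled.sum()

print("true     :", true_props)
print("naive    :", np.round(naive, 3))
print("corrected:", np.round(corrected, 3))
```

In this toy setting the naive estimate overstates the largest cell type because it measures share of total mRNA. In practice the size factors are themselves unknown and must come from orthogonal measurements, which is the gap the abstract points to.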
Submitted 10 May, 2023;
originally announced May 2023.
-
3D Graph Contrastive Learning for Molecular Property Prediction
Authors:
Kisung Moon,
Sunyoung Kwon
Abstract:
Self-supervised learning (SSL) is a method that learns data representations by exploiting supervision inherent in the data. This learning method is in the spotlight in the drug field, which lacks annotated data because experiments are time-consuming and expensive. SSL using enormous amounts of unlabeled data has shown excellent performance for molecular property prediction, but a few issues exist. (1) Existing SSL models are large-scale, which limits the use of SSL where computing resources are insufficient. (2) In most cases, they do not utilize 3D structural information for molecular representation learning. The activity of a drug is closely related to the structure of the drug molecule. Nevertheless, most current models do not use 3D information or use it only partially. (3) Previous models that apply contrastive learning to molecules use augmentations that permute atoms and bonds. Therefore, molecules with different characteristics can be treated as the same positive samples. We propose a novel contrastive learning framework, small-scale 3D Graph Contrastive Learning (3DGCL) for molecular property prediction, to solve the above problems. 3DGCL learns the molecular representation by reflecting the molecule's structure through a pre-training process that does not change the semantics of the drug. Using only 1,128 samples for pre-training data and 1 million model parameters, we achieved state-of-the-art or comparable performance on four regression benchmark datasets. Extensive experiments demonstrate that 3D structural information based on chemical knowledge is essential to molecular representation learning for property prediction.
Submitted 18 August, 2022; v1 submitted 31 May, 2022;
originally announced August 2022.
-
Spatially distributed computation in cortical circuits
Authors:
Sergei Gepshtein,
Ambarish Pawar,
Sunwoo Kwon,
Sergey Savel'ev,
Thomas D. Albright
Abstract:
The traditional view of neural computation in the cerebral cortex holds that sensory neurons are specialized, i.e., selective for certain dimensions of sensory stimuli. This view was challenged by evidence of contextual interactions between stimulus dimensions in which a neuron's response to one dimension strongly depends on other dimensions. Here we use methods of mathematical modeling, psychophysics, and electrophysiology to address shortcomings of the traditional view. Using a model of a generic cortical circuit, we begin with the simple demonstration that cortical responses are always distributed among neurons, forming characteristic waveforms, which we call neural waves. When stimulated by patterned stimuli, circuit responses arise by interference of neural waves. Resulting patterns of interference depend on interaction between stimulus dimensions. Comparison of these modeled responses with responses of biological vision makes it clear that the framework of neural wave interference provides a useful alternative to the standard concept of neural computation.
Submitted 25 February, 2022;
originally announced February 2022.
-
Seeded Ising Model and Statistical Natures of Human Iris Templates
Authors:
Song-Hwa Kwon,
Hyeong In Choi,
Sung Jin Lee,
Nam-Sook Wee
Abstract:
We propose a variant of the Ising model, called the Seeded Ising Model, to model the probabilistic nature of human iris templates. This model is an Ising model in which the values at certain lattice points are held fixed throughout the Ising model evolution. Using this model, we show how to reconstruct the full iris template from partial information, and we show that about 1/6 of the given template is needed to recover almost all of the information content of the original, in the sense that the resulting Hamming distance is well within the range needed to correctly assert the identity of the subject. This leads us to propose the concept of the effective statistical degrees of freedom of iris templates and to show that it is about 1/6 of the total number of bits. In particular, for a template of $2048$ bits, the effective statistical degrees of freedom are about $342$ bits, which agrees very well with the degrees of freedom computed by the completely different method proposed by Daugman.
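The idea of an Ising model with clamped sites can be sketched as follows. This is an illustrative toy under stated assumptions, not the paper's implementation: the 32 x 64 lattice shape, the block-patterned `template`, the periodic boundaries, the coupling strength, the inverse temperature `beta`, and the Metropolis update rule are all choices made here for demonstration. Only the seed count (about 1/6 of 2048 bits) comes from the abstract.

```python
import math
import numpy as np

def seeded_ising_reconstruct(template, seed_mask, beta=1.0, sweeps=20, rng=None):
    """Metropolis sweeps on a 2D lattice; sites where seed_mask is True never change."""
    rng = np.random.default_rng(rng)
    spins = rng.choice(np.array([-1, 1]), size=template.shape)
    spins[seed_mask] = template[seed_mask]  # clamp the seeded sites
    rows, cols = spins.shape
    for _ in range(sweeps):
        for i in range(rows):
            for j in range(cols):
                if seed_mask[i, j]:
                    continue  # seeded values are held fixed throughout
                # Sum of the four nearest neighbours (periodic boundaries assumed).
                nb = (spins[(i - 1) % rows, j] + spins[(i + 1) % rows, j]
                      + spins[i, (j - 1) % cols] + spins[i, (j + 1) % cols])
                delta_e = 2.0 * spins[i, j] * nb  # energy change if this spin flips
                if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
                    spins[i, j] *= -1
    return spins

rng = np.random.default_rng(0)

# Hypothetical 2048-bit template with spatially correlated blocks of +1 / -1.
r = np.arange(32)[:, None] // 8
c = np.arange(64)[None, :] // 8
template = np.where((r + c) % 2 == 0, 1, -1)

# Seed about 1/6 of the bits, matching the fraction quoted in the abstract.
n_seed = math.ceil(template.size / 6)
flat_mask = np.zeros(template.size, dtype=bool)
flat_mask[rng.choice(template.size, size=n_seed, replace=False)] = True
seed_mask = flat_mask.reshape(template.shape)

recon = seeded_ising_reconstruct(template, seed_mask, rng=1)
hamming = float(np.mean(recon != template))  # fraction of mismatched bits
print(f"seeded bits: {n_seed}, normalized Hamming distance: {hamming:.3f}")
```

The normalized Hamming distance printed at the end depends on the toy template and parameters chosen here, so it should not be compared with the results reported in the paper.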
Submitted 2 January, 2018;
originally announced February 2018.
-
NASCUP: Nucleic Acid Sequence Classification by Universal Probability
Authors:
Sunyoung Kwon,
Gyuwan Kim,
Byunghan Lee,
Jongsik Chun,
Sungroh Yoon,
Young-Han Kim
Abstract:
Motivated by the need for fast and accurate classification of unlabeled nucleotide sequences on a large scale, we developed NASCUP, a new classification method that captures statistical structures of nucleotide sequences by compact context-tree models and universal probability from information theory. NASCUP achieved BLAST-like classification accuracy consistently for several large-scale databases in orders-of-magnitude reduced runtime, and was applied to other bioinformatics tasks such as outlier detection and synthetic sequence generation.
Submitted 29 November, 2018; v1 submitted 16 November, 2015;
originally announced November 2015.
-
Synaptotagmin 7 Functions as a Ca2+-sensor for Synaptic Vesicle Replenishment
Authors:
Huisheng Liu,
Hua Bai,
Enfu Hui,
Lu Yang,
Chantell Evans,
Zhao Wang,
Sung Kwon,
Edwin Chapman
Abstract:
Synaptotagmin (syt) 7 is one of three syt isoforms found in all metazoans; it is ubiquitously expressed, yet its function in neurons remains obscure. Here, we resolved Ca2+-dependent and Ca2+-independent synaptic vesicle (SV) replenishment pathways, and found that syt 7 plays a selective and critical role in the Ca2+-dependent pathway. Mutations that disrupt Ca2+-binding to syt 7 abolish this function, suggesting that syt 7 functions as a Ca2+-sensor for replenishment. The Ca2+-binding protein calmodulin (CaM) has also been implicated in SV replenishment, and we found that loss of syt 7 was phenocopied by a CaM antagonist. Moreover, we discovered that syt 7 binds to CaM in a highly specific and Ca2+-dependent manner; this interaction requires intact Ca2+-binding sites within syt 7. Together, these data indicate that a complex of two conserved Ca2+-binding proteins, syt 7 and CaM, serves as a key regulator of SV replenishment in presynaptic nerve terminals.
Submitted 23 January, 2014;
originally announced January 2014.