-
DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations
Authors:
Dogyun Park,
Sihyeon Kim,
Sojin Lee,
Hyunwoo J. Kim
Abstract:
Recent studies have introduced a new class of generative models for synthesizing implicit neural representations (INRs) that capture arbitrary continuous signals in various domains. These models opened the door for domain-agnostic generative models, but they often fail to achieve high-quality generation. We observed that the existing methods generate the weights of neural networks to parameterize INRs and evaluate the network with fixed positional embeddings (PEs). Arguably, this architecture limits the expressive power of generative models and results in low-quality INR generation. To address this limitation, we propose Domain-agnostic Latent Diffusion Model for INRs (DDMI) that generates adaptive positional embeddings instead of neural networks' weights. Specifically, we develop a Discrete-to-continuous space Variational AutoEncoder (D2C-VAE), which seamlessly connects discrete data and the continuous signal functions in the shared latent space. Additionally, we introduce a novel conditioning mechanism for evaluating INRs with the hierarchically decomposed PEs to further enhance expressive power. Extensive experiments across four modalities, e.g., 2D images, 3D shapes, Neural Radiance Fields, and videos, with seven benchmark datasets, demonstrate the versatility of DDMI and its superior performance compared to the existing INR generative models.
Submitted 20 March, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
Bayesian Model Calibration and Sensitivity Analysis for Oscillating Biological Experiments
Authors:
Youngdeok Hwang,
Hang J. Kim,
Won Chang,
Christian Hong,
Steven N. MacEachern
Abstract:
Understanding the oscillating behaviors that govern organisms' internal biological processes requires interdisciplinary efforts combining both biological and computer experiments, as the latter can complement the former by simulating perturbed conditions with higher resolution. Harmonizing the two types of experiment, however, poses significant statistical challenges due to identifiability issues, numerical instability, and ill behavior in high dimension. This article devises a new Bayesian calibration framework for oscillating biochemical models. The proposed Bayesian model is estimated relying on an advanced Markov chain Monte Carlo (MCMC) technique which can efficiently infer the parameter values that match the simulated and observed oscillatory processes. Also proposed is an approach to sensitivity analysis based on the intervention posterior. This approach measures the influence of individual parameters on the target process by using the obtained MCMC samples as a computational tool. The proposed framework is illustrated with circadian oscillations observed in a filamentous fungus, Neurospora crassa.
Submitted 13 December, 2024; v1 submitted 20 October, 2021;
originally announced October 2021.
-
Survey data integration for regression analysis using model calibration
Authors:
Zhonglei Wang,
Hang J. Kim,
Jae Kwang Kim
Abstract:
We consider regression analysis in the context of data integration. To combine partial information from external sources, we employ the idea of model calibration which introduces a "working" reduced model based on the observed covariates. The working reduced model is not necessarily correctly specified but can be a useful device to incorporate the partial information from the external data. The actual implementation is based on a novel application of the information projection and model calibration weighting. The proposed method is particularly attractive for combining information from several sources with different missing patterns. The proposed method is applied to a real data example combining survey data from Korean National Health and Nutrition Examination Survey and big data from National Health Insurance Sharing Service in Korea.
Submitted 11 October, 2022; v1 submitted 13 July, 2021;
originally announced July 2021.
-
GPA-Tree: Statistical Approach for Functional-Annotation-Tree-Guided Prioritization of GWAS Results
Authors:
Aastha Khatiwada,
Bethany J. Wolf,
Ayse Selen Yilmaz,
Paula S. Ramos,
Maciej Pietrzak,
Andrew Lawson,
Kelly J. Hunt,
Hang J. Kim,
Dongjun Chung
Abstract:
Motivation: Despite the great success of genome-wide association studies (GWAS), multiple challenges remain. First, complex traits are often associated with many single nucleotide polymorphisms (SNPs), each with small or moderate effect sizes. Second, our understanding of the functional mechanisms through which genetic variants are associated with complex traits is still limited. To address these challenges, we propose GPA-Tree, which simultaneously performs association mapping and identifies key combinations of functional annotations related to risk-associated SNPs by combining a decision tree algorithm with a hierarchical modeling framework. Results: First, we implemented simulation studies to evaluate the proposed GPA-Tree method and compared its performance with existing statistical approaches. The results indicate that GPA-Tree outperforms existing statistical approaches in detecting risk-associated SNPs and identifying the true combinations of functional annotations with high accuracy. Second, we applied GPA-Tree to a systemic lupus erythematosus (SLE) GWAS and functional annotation data including GenoSkyline and GenoSkylinePlus. The results from GPA-Tree highlight the dysregulation of blood immune cells, including but not limited to primary B, memory helper T, regulatory T, neutrophils and CD8+ memory T cells in SLE. These results demonstrate that GPA-Tree can be a powerful tool that improves association mapping while facilitating understanding of the underlying genetic architecture of complex traits and potential mechanisms linking risk-associated SNPs with complex traits.
Submitted 12 June, 2021;
originally announced June 2021.
-
Accuracy Gains from Privacy Amplification Through Sampling for Differential Privacy
Authors:
Jingchen Hu,
Joerg Drechsler,
Hang J. Kim
Abstract:
Recent research in differential privacy demonstrated that (sub)sampling can amplify the level of protection. For example, for $ε$-differential privacy and simple random sampling with sampling rate $r$, the actual privacy guarantee is approximately $rε$, if a value of $ε$ is used to protect the output from the sample. In this paper, we study whether this amplification effect can be exploited systematically to improve the accuracy of the privatized estimate. Specifically, assuming the agency has information for the full population, we ask under which circumstances accuracy gains could be expected, if the privatized estimate would be computed on a random sample instead of the full population. We find that accuracy gains can be achieved for certain regimes. However, gains can typically only be expected, if the sensitivity of the output with respect to small changes in the database does not depend too strongly on the size of the database. We only focus on algorithms that achieve differential privacy by adding noise to the final output and illustrate the accuracy implications for two commonly used statistics: the mean and the median. We see our research as a first step towards understanding the conditions required for accuracy gains in practice and we hope that these findings will stimulate further research broadening the scope of differential privacy algorithms and outputs considered.
Submitted 21 February, 2022; v1 submitted 17 March, 2021;
originally announced March 2021.
-
Self-supervised Auxiliary Learning with Meta-paths for Heterogeneous Graphs
Authors:
Dasol Hwang,
Jinyoung Park,
Sunyoung Kwon,
Kyung-Min Kim,
Jung-Woo Ha,
Hyunwoo J. Kim
Abstract:
Graph neural networks have shown superior performance in a wide range of applications, providing a powerful representation of graph-structured data. Recent works show that the representation can be further improved by auxiliary tasks. However, auxiliary tasks for heterogeneous graphs, which contain rich semantic information with various types of nodes and edges, have been less explored in the literature. In this paper, to learn graph neural networks on heterogeneous graphs, we propose a novel self-supervised auxiliary learning method using meta-paths, which are composite relations of multiple edge types. Our proposed method learns to learn a primary task by predicting meta-paths as auxiliary tasks. This can be viewed as a type of meta-learning. The proposed method can identify an effective combination of auxiliary tasks and automatically balance them to improve the primary task. Our method can be applied to any graph neural network in a plug-in manner without manual labeling or additional data. The experiments demonstrate that the proposed method consistently improves the performance of link prediction and node classification on heterogeneous graphs.
Submitted 7 February, 2021; v1 submitted 16 July, 2020;
originally announced July 2020.
-
Graph Transformer Networks
Authors:
Seongjun Yun,
Minbyul Jeong,
Raehyun Kim,
Jaewoo Kang,
Hyunwoo J. Kim
Abstract:
Graph neural networks (GNNs) have been widely used in representation learning on graphs and have achieved state-of-the-art performance in tasks such as node classification and link prediction. However, most existing GNNs are designed to learn node representations on fixed and homogeneous graphs. These limitations become especially problematic when learning representations on a misspecified graph or a heterogeneous graph that consists of various types of nodes and edges. In this paper, we propose Graph Transformer Networks (GTNs) that are capable of generating new graph structures, which involves identifying useful connections between unconnected nodes on the original graph, while learning effective node representations on the new graphs in an end-to-end fashion. The Graph Transformer layer, a core layer of GTNs, learns a soft selection of edge types and composite relations for generating useful multi-hop connections, so-called meta-paths. Our experiments show that GTNs learn new graph structures, based on data and tasks without domain knowledge, and yield powerful node representations via convolution on the new graphs. Without domain-specific graph preprocessing, GTNs achieved the best performance in all three benchmark node classification tasks against the state-of-the-art methods that require pre-defined meta-paths from domain knowledge.
Submitted 4 February, 2020; v1 submitted 6 November, 2019;
originally announced November 2019.
-
Sampling-free Uncertainty Estimation in Gated Recurrent Units with Exponential Families
Authors:
Seong Jae Hwang,
Ronak Mehta,
Hyunwoo J. Kim,
Vikas Singh
Abstract:
There has recently been a concerted effort to derive mechanisms in vision and machine learning systems to offer uncertainty estimates of the predictions they make. Clearly, there are enormous benefits to a system that is not only accurate but also has a sense for when it is not sure. Existing proposals center around Bayesian interpretations of modern deep architectures -- these are effective but can often be computationally demanding. We show how classical ideas in the literature on exponential families on probabilistic networks provide an excellent starting point to derive uncertainty estimates in Gated Recurrent Units (GRU). Our proposal directly quantifies uncertainty deterministically, without the need for costly sampling-based estimation. We demonstrate how our model can be used to quantitatively and qualitatively measure uncertainty in unsupervised image sequence prediction. To our knowledge, this is the first result describing sampling-free uncertainty estimation for powerful sequential models such as GRUs.
Submitted 2 September, 2018; v1 submitted 19 April, 2018;
originally announced April 2018.
-
Finding Differentially Covarying Needles in a Temporally Evolving Haystack: A Scan Statistics Perspective
Authors:
Ronak Mehta,
Hyunwoo J. Kim,
Shulei Wang,
Sterling C. Johnson,
Ming Yuan,
Vikas Singh
Abstract:
Recent results in coupled or temporal graphical models offer schemes for estimating the relationship structure between features when the data come from related (but distinct) longitudinal sources. A novel application of these ideas is for analyzing group-level differences, i.e., in identifying if trends of estimated objects (e.g., covariance or precision matrices) are different across disparate conditions (e.g., gender or disease). Often, poor effect sizes make detecting the differential signal over the full set of features difficult: for example, dependencies between only a subset of features may manifest differently across groups. In this work, we first give a parametric model for estimating trends in the space of SPD matrices as a function of one or more covariates. We then generalize scan statistics to graph structures, to search over distinct subsets of features (graph partitions) whose temporal dependency structure may show statistically significant group-wise differences. We theoretically analyze the Family Wise Error Rate (FWER) and bounds on Type 1 and Type 2 error. On a cohort of individuals with risk factors for Alzheimer's disease (but otherwise cognitively healthy), we find scientifically interesting group differences where the default analysis, i.e., models estimated on the full graph, do not survive reasonable significance thresholds.
Submitted 20 November, 2017;
originally announced November 2017.
-
Bandwidth Selection for Kernel Density Estimation with a Markov Chain Monte Carlo Sample
Authors:
Hang J. Kim,
Steven N. MacEachern,
Yoonsuh Jung
Abstract:
Markov chain Monte Carlo samplers produce dependent streams of variates drawn from the limiting distribution of the Markov chain. With this as motivation, we introduce novel univariate kernel density estimators which are appropriate for the stationary sequences of dependent variates. We modify the asymptotic mean integrated squared error criterion to account for dependence and find that the modified criterion suggests data-driven adjustments to standard bandwidth selection methods. Simulation studies show that our proposed methods find bandwidths close to the optimal value while standard methods lead to smaller bandwidths and hence to undersmoothed density estimates. Empirically, the proposed methods have considerably smaller integrated mean squared error than do standard methods.
Submitted 27 July, 2016;
originally announced July 2016.