-
Discriminative Estimation of Total Variation Distance: A Fidelity Auditor for Generative Data
Authors:
Lan Tao,
Shirong Xu,
Chi-Hua Wang,
Namjoon Suh,
Guang Cheng
Abstract:
With the proliferation of generative AI and the increasing volume of generative data (also called synthetic data), assessing the fidelity of generative data has become a critical concern. In this paper, we propose a discriminative approach to estimate the total variation (TV) distance between two distributions as an effective measure of generative data fidelity. Our method quantitatively characterizes the relation between the Bayes risk in classifying two distributions and their TV distance. Therefore, the estimation of total variation distance reduces to that of the Bayes risk. In particular, this paper establishes theoretical results regarding the convergence rate of the estimation error of TV distance between two Gaussian distributions. We demonstrate that, with a specific choice of hypothesis class in classification, a fast convergence rate in estimating the TV distance can be achieved. Specifically, the estimation accuracy of the TV distance is proven to inherently depend on the separation of two Gaussian distributions: smaller estimation errors are achieved when the two Gaussian distributions are farther apart. This phenomenon is also validated empirically through extensive simulations. Finally, we apply this discriminative estimation method to rank the fidelity of synthetic image data using the MNIST dataset.
Submitted 24 May, 2024;
originally announced May 2024.
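The relation between Bayes risk and TV distance described in the abstract above can be illustrated with a small simulation. The sketch below is not the authors' estimator. It assumes equal class priors, identity covariance, and a simple plug-in linear classifier, and it relies on the standard identity TV = 1 - 2R* for the balanced Bayes risk R*. The function names, means, and sample sizes are illustrative choices.

```python
import math
import numpy as np


def true_tv_gaussians(mu0, mu1):
    """Closed-form TV distance between N(mu0, I) and N(mu1, I)."""
    diff = np.asarray(mu1, dtype=float) - np.asarray(mu0, dtype=float)
    half_dist = float(np.linalg.norm(diff)) / 2.0
    # Standard normal CDF evaluated at half the distance between the means.
    std_normal_cdf = 0.5 * (1.0 + math.erf(half_dist / math.sqrt(2.0)))
    return 2.0 * std_normal_cdf - 1.0


def discriminative_tv_estimate(x0_train, x1_train, x0_test, x1_test):
    """Estimate TV as 1 - 2 * (held-out balanced error of a linear classifier)."""
    m0, m1 = x0_train.mean(axis=0), x1_train.mean(axis=0)
    w = m1 - m0                    # direction separating the two sample means
    b = -0.5 * (m0 + m1) @ w       # decision boundary passes through the midpoint
    err0 = np.mean(x0_test @ w + b > 0)    # class-0 points labelled as class 1
    err1 = np.mean(x1_test @ w + b <= 0)   # class-1 points labelled as class 0
    balanced_risk = 0.5 * (err0 + err1)
    return 1.0 - 2.0 * balanced_risk


# Illustrative setup (assumption): two 2-D Gaussians whose means are 2 apart.
rng = np.random.default_rng(0)
mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 0.0])
n = 20000
x0_train = rng.normal(size=(n, 2)) + mu0
x1_train = rng.normal(size=(n, 2)) + mu1
x0_test = rng.normal(size=(n, 2)) + mu0
x1_test = rng.normal(size=(n, 2)) + mu1

tv_true = true_tv_gaussians(mu0, mu1)
tv_hat = discriminative_tv_estimate(x0_train, x1_train, x0_test, x1_test)
print(f"true TV = {tv_true:.4f}, estimated TV = {tv_hat:.4f}")
```

Because the risk of any classifier is at least the Bayes risk, a held-out estimate of this kind tends to understate the TV distance when the classifier is far from optimal.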
-
Approximation of RKHS Functionals by Neural Networks
Authors:
Tian-Yi Zhou,
Namjoon Suh,
Guang Cheng,
Xiaoming Huo
Abstract:
Motivated by the abundance of functional data such as time series and images, there has been a growing interest in integrating such data into neural networks and learning maps from function spaces to R (i.e., functionals). In this paper, we study the approximation of functionals on reproducing kernel Hilbert spaces (RKHS's) using neural networks. We establish the universality of the approximation of functionals on the RKHS's. Specifically, we derive explicit error bounds for those induced by inverse multiquadric, Gaussian, and Sobolev kernels. Moreover, we apply our findings to functional regression, proving that neural networks can accurately approximate the regression maps in generalized functional linear models. Existing works on functional learning require integration-type basis function expansions with a set of pre-specified basis functions. By leveraging the interpolating orthogonal projections in RKHS's, our proposed network is much simpler in that we use point evaluations to replace basis function expansions.
Submitted 18 March, 2024;
originally announced March 2024.
-
Theoretical Understanding of In-Context Learning in Shallow Transformers with Unstructured Data
Authors:
Yue Xing,
Xiaofeng Lin,
Chenheng Xu,
Namjoon Suh,
Qifan Song,
Guang Cheng
Abstract:
Large language models (LLMs) are powerful models that can learn concepts at the inference stage via in-context learning (ICL). While theoretical studies, e.g., \cite{zhang2023trained}, attempt to explain the mechanism of ICL, they assume the input $x_i$ and the output $y_i$ of each demonstration example are in the same token (i.e., structured data). However, in practice, the examples are usually text input, and all words, regardless of their logical relationship, are stored in different tokens (i.e., unstructured data \cite{wibisono2023role}). To understand how LLMs learn from the unstructured data in ICL, this paper studies the role of each component in the transformer architecture and provides a theoretical understanding to explain the success of the architecture. In particular, we consider a simple transformer with one/two attention layers and linear regression tasks for the ICL prediction. We observe that (1) a transformer with two layers of (self-)attention with a look-ahead attention mask can learn from the prompt in the unstructured data, and (2) positional encoding can match the $x_i$ and $y_i$ tokens to achieve better ICL performance.
Submitted 18 June, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
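The distinction between structured and unstructured prompts in the abstract above can be made concrete with token matrices for a linear regression task. The sketch below is only an illustration: the zero padding, the column ordering, and the function names are assumptions and are not taken from the paper.

```python
import numpy as np


def structured_prompt(xs, ys, x_query):
    """Structured layout: each demonstration column stacks x_i on top of y_i."""
    demo_cols = np.vstack([xs.T, ys[None, :]])              # shape (d + 1, n)
    query_col = np.concatenate([x_query, [0.0]])[:, None]   # unknown y set to 0
    return np.hstack([demo_cols, query_col])                # shape (d + 1, n + 1)


def unstructured_prompt(xs, ys, x_query):
    """Unstructured layout: x_i and y_i occupy separate, alternating columns."""
    n, d = xs.shape
    prompt = np.zeros((d + 1, 2 * n + 1))
    prompt[:d, 0:2 * n:2] = xs.T    # x_i tokens at even positions, y slot is 0
    prompt[d, 1:2 * n:2] = ys       # y_i tokens at odd positions, x slots are 0
    prompt[:d, -1] = x_query        # query token comes last
    return prompt


# Illustrative linear regression task (assumption): y_i = <w, x_i>.
rng = np.random.default_rng(0)
n_demos, dim = 4, 3
w = rng.normal(size=dim)
xs = rng.normal(size=(n_demos, dim))
ys = xs @ w
x_query = rng.normal(size=dim)

structured = structured_prompt(xs, ys, x_query)
unstructured = unstructured_prompt(xs, ys, x_query)
print(structured.shape, unstructured.shape)
```

In the unstructured layout a model has to work out which $y_i$ column belongs to which $x_i$ column, which is the matching role the abstract attributes to positional encoding.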
-
A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models
Authors:
Namjoon Suh,
Guang Cheng
Abstract:
In this article, we review the literature on statistical theories of neural networks from three perspectives: approximation, training dynamics and generative models. In the first part, results on excess risks for neural networks are reviewed in the nonparametric framework of regression (and classification in Appendix B). These results rely on explicit constructions of neural networks, leading to fast convergence rates of excess risks. Nonetheless, their underlying analysis only applies to the global minimizer in the highly non-convex landscape of deep neural networks. This motivates us to review the training dynamics of neural networks in the second part. Specifically, we review papers that attempt to answer "how the neural network trained via gradient-based methods finds the solution that can generalize well on unseen data." In particular, two well-known paradigms are reviewed: the Neural Tangent Kernel (NTK) paradigm and the Mean-Field (MF) paradigm. Last but not least, we review the most recent theoretical advancements in generative models, including Generative Adversarial Networks (GANs), diffusion models, and in-context learning (ICL) in Large Language Models (LLMs), from the two perspectives reviewed previously, i.e., approximation and training dynamics.
Submitted 16 September, 2024; v1 submitted 13 January, 2024;
originally announced January 2024.
-
AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing
Authors:
Namjoon Suh,
Xiaofeng Lin,
Din-Yin Hsieh,
Merhdad Honarkhah,
Guang Cheng
Abstract:
Diffusion models have become a main paradigm for synthetic data generation in many subfields of modern machine learning, including computer vision, language modeling, and speech synthesis. In this paper, we leverage the power of diffusion models for generating synthetic tabular data. The heterogeneous features in tabular data have been a main obstacle in tabular data synthesis, and we tackle this problem by employing the auto-encoder architecture. When compared with the state-of-the-art tabular synthesizers, the synthetic tables from our model show good statistical fidelity to the real data and perform well in downstream tasks for machine learning utility. We conducted experiments on 15 publicly available datasets. Notably, our model adeptly captures the correlations among features, which has been a long-standing challenge in tabular data synthesis. Our code is available at https://github.com/UCLA-Trustworthy-AI-Lab/AutoDiffusion.
Submitted 16 November, 2023; v1 submitted 23 October, 2023;
originally announced October 2023.
-
On Excess Risk Convergence Rates of Neural Network Classifiers
Authors:
Hyunouk Ko,
Namjoon Suh,
Xiaoming Huo
Abstract:
The recent success of neural networks in pattern recognition and classification problems suggests that neural networks possess qualities distinct from other more classical classifiers such as SVMs or boosting classifiers. This paper studies the performance of plug-in classifiers based on neural networks in a binary classification setting as measured by their excess risks. Compared to the typical settings imposed in the literature, we consider a more general scenario that resembles actual practice in two respects: first, the function class to be approximated includes the Barron functions as a proper subset, and second, the neural network classifier constructed is the minimizer of a surrogate loss instead of the $0$-$1$ loss so that gradient descent-based numerical optimizations can be easily applied. While the class of functions we consider is so large that optimal rates cannot be faster than $n^{-\frac{1}{3}}$, it is a regime in which dimension-free rates are possible and the approximation power of neural networks can be taken advantage of. In particular, we analyze the estimation and approximation properties of neural networks to obtain a dimension-free, uniform rate of convergence for the excess risk. Finally, we show that the rate obtained is in fact minimax optimal up to a logarithmic factor, and the minimax lower bound shows the effect of the margin assumption in this regime.
Submitted 26 September, 2023;
originally announced September 2023.
-
Asymptotic Theory of $\ell_1$-Regularized PDE Identification from a Single Noisy Trajectory
Authors:
Yuchen He,
Namjoon Suh,
Xiaoming Huo,
Sungha Kang,
Yajun Mei
Abstract:
We prove the support recovery for a general class of linear and nonlinear evolutionary partial differential equation (PDE) identification from a single noisy trajectory using the $\ell_1$-regularized Pseudo-Least Squares model ($\ell_1$-PsLS). In any associative $\mathbb{R}$-algebra generated by finitely many differentiation operators that contain the unknown PDE operator, applying $\ell_1$-PsLS to a given data set yields a family of candidate models with coefficients $\mathbf{c}(\lambda)$ parameterized by the regularization weight $\lambda \geq 0$. The trace of $\{\mathbf{c}(\lambda)\}_{\lambda \geq 0}$ suffers from high variance due to data noise and finite difference approximation errors. We provide a set of sufficient conditions which guarantee that, from single-trajectory data denoised by a Local-Polynomial filter, the support of $\mathbf{c}(\lambda)$ asymptotically converges to the true signed-support associated with the underlying PDE, given sufficiently many data and a certain range of $\lambda$. We also show various numerical experiments to validate our theory.
Submitted 11 March, 2021;
originally announced March 2021.
-
Factor Analysis on Citation, Using a Combined Latent and Logistic Regression Model
Authors:
Namjoon Suh,
Xiaoming Huo,
Eric Heim,
Lee Seversky
Abstract:
We propose a combined model, which integrates the latent factor model and the logistic regression model, for the citation network. It is noticed that neither a latent factor model nor a logistic regression model alone is sufficient to capture the structure of the data. The proposed model has a latent (i.e., factor analysis) model to represent the main technological trends (a.k.a. factors), and adds a sparse component that captures the remaining ad-hoc dependence. Parameter estimation is carried out through the construction of a joint-likelihood function of edges and properly chosen penalty terms. The convexity of the objective function allows us to develop an efficient algorithm, while the penalty terms push towards a low-dimensional latent component and a sparse graphical structure. Simulation results show that the proposed method works well in practical situations. The proposed method has been applied to a real application, which contains a citation network of statisticians (Ji and Jin, 2016). Some interesting findings are reported.
Submitted 1 December, 2019;
originally announced December 2019.
-
Review on Parameter Estimation in HMRF
Authors:
Namjoon Suh
Abstract:
This is a technical report which explores methodologies for estimating hyper-parameters in Markov Random Fields and Gaussian Hidden Markov Random Fields. In the first section, we briefly investigate the theoretical framework of the Metropolis-Hastings (MH) algorithm. Next, using the MH algorithm, we simulate data from the Ising model and study how hyper-parameter estimation in the Ising model is enabled through an MCMC algorithm using the pseudo-likelihood approximation. The following section deals with the parameter estimation process of the Gaussian Hidden Markov Random Field using MAP estimation and the EM algorithm, and also discusses problems found through several experiments. In the final section, we extend this idea to estimating parameters in the Gaussian Hidden Markov Spatial-Temporal Random Field and display results from two experiments.
Submitted 20 November, 2017;
originally announced November 2017.
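The first two steps of the report above, simulating an Ising model with Metropolis-Hastings and then estimating its hyper-parameter through a pseudo-likelihood, can be sketched as follows. This is a minimal illustration and not the report's code. It assumes a square lattice with periodic boundaries, no external field, and a single coupling parameter `beta`, and it replaces the report's MCMC-based estimation with a plain grid search over the pseudo-likelihood. All function names are hypothetical.

```python
import numpy as np


def metropolis_ising(size=32, beta=0.3, sweeps=60, seed=0):
    """Single-site Metropolis sampler for p(s) proportional to exp(beta * sum_<ij> s_i s_j)."""
    rng = np.random.default_rng(seed)
    spins = rng.choice(np.array([-1, 1]), size=(size, size))
    for _ in range(sweeps * size * size):
        i, j = rng.integers(0, size, size=2)
        neighbor_sum = (spins[(i + 1) % size, j] + spins[(i - 1) % size, j]
                        + spins[i, (j + 1) % size] + spins[i, (j - 1) % size])
        # With H = -sum_<ij> s_i s_j, flipping spin (i, j) changes H by this amount.
        delta_energy = 2 * spins[i, j] * neighbor_sum
        # Accept the flip with probability min(1, exp(-beta * delta_energy)).
        if delta_energy <= 0 or rng.random() < np.exp(-beta * delta_energy):
            spins[i, j] *= -1
    return spins


def max_pseudo_likelihood_beta(spins, beta_grid):
    """Grid-search maximizer of the Ising log pseudo-likelihood."""
    neighbor_sum = (np.roll(spins, 1, axis=0) + np.roll(spins, -1, axis=0)
                    + np.roll(spins, 1, axis=1) + np.roll(spins, -1, axis=1))
    scores = []
    for b in beta_grid:
        # log p(s_ij | neighbors) = b * s_ij * n_ij - log(2 * cosh(b * n_ij))
        log_norm = np.logaddexp(b * neighbor_sum, -b * neighbor_sum)
        scores.append(np.sum(b * spins * neighbor_sum - log_norm))
    return float(beta_grid[int(np.argmax(scores))])


spins = metropolis_ising(size=32, beta=0.3, sweeps=60, seed=0)
beta_hat = max_pseudo_likelihood_beta(spins, np.linspace(0.0, 1.0, 101))
print(f"pseudo-likelihood estimate of beta: {beta_hat:.2f}")
```

The pseudo-likelihood replaces the intractable joint likelihood with a product of single-site conditionals, each of which depends only on the four neighboring spins, so no partition function has to be computed.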