-
Tight Generalization Error Bounds for Stochastic Gradient Descent in Non-convex Learning
Authors:
Wenjun Xiong,
Juan Ding,
Xinlei Zuo,
Qizhai Li
Abstract:
Stochastic Gradient Descent (SGD) is fundamental for training deep neural networks, especially in non-convex settings. Understanding SGD's generalization properties is crucial for ensuring robust model performance on unseen data. In this paper, we analyze the generalization error bounds of SGD for non-convex learning by introducing the Type II perturbed SGD (T2pm-SGD), which accommodates both sub-Gaussian and bounded loss functions. The generalization error bound is decomposed into two components: the trajectory term and the flatness term. Our analysis improves the trajectory term to $O(n^{-1})$, a significant improvement over the previous $O((nb)^{-1/2})$ bound for bounded losses, where $n$ is the number of training samples and $b$ is the batch size. By selecting an optimal variance for the perturbation noise, the overall bound is further refined to $O(n^{-2/3})$. For sub-Gaussian loss functions, a tighter trajectory term is also achieved. In both cases, the flatness term remains stable across iterations and is smaller than the flatness terms reported in previous work, which grow with the number of iterations. This stability, ensured by T2pm-SGD, leads to tighter generalization error bounds for both types of loss function. Our theoretical results are validated through extensive experiments on benchmark datasets, including MNIST and CIFAR-10, demonstrating the effectiveness of T2pm-SGD in establishing tighter generalization bounds.
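To make the idea of "perturbed SGD" concrete, the sketch below runs mini-batch SGD in which the gradient is evaluated at a Gaussian-perturbed copy of the weight, with `sigma` playing the role of the perturbation noise scale. This is an assumption for illustration only: the abstract does not define the Type II perturbation, so the noise scheme here is a generic weight-perturbation variant, not necessarily T2pm-SGD. The toy non-convex model `tanh(w * x)`, the data, and all function names and defaults are hypothetical.

```python
import math
import random


def perturbed_sgd(data, w0, lr=0.1, batch_size=4, sigma=0.01, steps=1000, seed=0):
    """Mini-batch SGD where each gradient is taken at a Gaussian-perturbed weight.

    Hypothetical sketch: the loss is (tanh(w * x) - y)^2, which is non-convex in w.
    sigma is the standard deviation of the weight perturbation; the paper's exact
    Type II perturbation scheme may differ.
    """
    rng = random.Random(seed)
    w = w0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)        # draw a mini-batch without replacement
        w_tilde = w + rng.gauss(0.0, sigma)         # perturb the weight before differentiating
        grad = 0.0
        for x, y in batch:
            t = math.tanh(w_tilde * x)
            grad += 2.0 * (t - y) * (1.0 - t * t) * x  # d/dw of (tanh(w x) - y)^2
        w -= lr * grad / batch_size                 # update the unperturbed weight
    return w


def empirical_loss(data, w):
    """Average squared loss of the toy model on the training data."""
    return sum((math.tanh(w * x) - y) ** 2 for x, y in data) / len(data)


# Toy data generated from a "true" weight of 1.5 on a grid of inputs in [-1, 1].
data = [(i / 10, math.tanh(1.5 * i / 10)) for i in range(-10, 11)]
w_hat = perturbed_sgd(data, w0=0.1)
```

Choosing a larger `sigma` smooths the effective loss more strongly; the abstract's point is that this variance can be tuned to balance the trajectory and flatness terms of the bound.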
Submitted 23 June, 2025;
originally announced June 2025.
-
Accelerated Markov Chain Monte Carlo Algorithms on Discrete States
Authors:
Bohan Zhou,
Shu Liu,
Xinzhe Zuo,
Wuchen Li
Abstract:
We propose a class of discrete-state sampling algorithms based on Nesterov's accelerated gradient method, which extends the classical Metropolis-Hastings (MH) algorithm. The evolution of the discrete-state probability distribution governed by MH can be interpreted as a gradient descent direction of the Kullback--Leibler (KL) divergence, via a mobility function and a score function. Specifically, this gradient is defined on a probability simplex equipped with a discrete Wasserstein-2 metric with a mobility function. This motivates us to study a momentum-based acceleration framework using damped Hamiltonian flows on the simplex, whose stationary distribution matches the discrete target distribution. Furthermore, we design an interacting particle system to approximate the proposed accelerated sampling dynamics. The extension of the algorithm to general choices of potentials and mobilities is also discussed. In particular, we choose the accelerated gradient flow of the relative Fisher information, which allows discrete score functions to be estimated without the normalizing constant while keeping probabilities positive. Numerical examples, including sampling a Gaussian mixture supported on a lattice and a distribution on a hypercube, demonstrate the effectiveness of the proposed discrete-state sampling algorithm.
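For reference, the classical MH baseline that the abstract extends can be sketched on a small discrete state space. The version below is the plain (non-accelerated) algorithm with a symmetric nearest-neighbor proposal on a ring of states; it is not the accelerated Hamiltonian-flow method proposed in the paper, and the function name, ring proposal, and example weights are illustrative assumptions. Because only ratios of target weights enter the acceptance step, the normalizing constant is never needed.

```python
import random


def metropolis_hastings_discrete(weights, n_steps, x0=0, seed=0):
    """Classical MH on states 0..K-1 with a symmetric +/-1 proposal on a ring.

    weights are positive, unnormalized target probabilities; only the ratio
    weights[y] / weights[x] is used, so no normalizing constant is required.
    Returns the empirical visit frequency of each state.
    """
    rng = random.Random(seed)
    k = len(weights)
    x = x0
    counts = [0] * k
    for _ in range(n_steps):
        y = (x + rng.choice((-1, 1))) % k  # symmetric proposal: step left or right
        if rng.random() < min(1.0, weights[y] / weights[x]):
            x = y  # accept the move; otherwise stay at x
        counts[x] += 1
    return [c / n_steps for c in counts]


# Unnormalized target 1:2:3:4, i.e. probabilities 0.1, 0.2, 0.3, 0.4.
freqs = metropolis_hastings_discrete([1.0, 2.0, 3.0, 4.0], n_steps=200_000)
```

The accelerated variants in the abstract aim to reach such a target faster by adding momentum to the evolution of the distribution on the simplex.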
Submitted 18 May, 2025;
originally announced May 2025.
-
Dark Brain Energy: Toward an Integrative Model of Spontaneous Slow Oscillations
Authors:
ZhuQing Gong,
XiNian Zuo
Abstract:
Neural oscillations support the functioning of the human brain across spatial and temporal dimensions at various frequencies. These oscillations share a universal frequency architecture that is governed by brain anatomy, so that frequency specificity remains invariant across measurement techniques. Early magnetic resonance imaging (MRI) methodology restricted functional MRI (fMRI) investigations to a single frequency range, neglecting the frequency characteristics inherent in blood oxygen level-dependent (BOLD) oscillations. With advances in MRI technology, it has become feasible to decode intricate brain activity via multi-band frequency analysis (MBFA). Over the past decade, the use of MBFA in fMRI studies has surged, revealing frequency-dependent characteristics of spontaneous slow oscillations (SSOs), which are believed to underlie the brain's dark energy. Conclusive insights and hypotheses about the properties and functions of SSOs in distinct bands nevertheless remain scarce. We surveyed SSO MBFA studies from the past 15 years to delineate the attributes of SSOs and shed light on their associated functions. We further propose a model that explains the hierarchical organization of multi-band SSOs by integrating their functions, aiming to bridge theoretical gaps and guide future MBFA research.
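As a rough illustration of what multi-band frequency analysis computes, the sketch below measures the power of a sampled time series inside several frequency bands using a direct discrete Fourier transform. The band edges (slow-5: 0.01 to 0.027 Hz, slow-4: 0.027 to 0.073 Hz) follow a convention common in the fMRI literature but are assumptions here, as are the repetition time of 2 s, the synthetic signal, and the function name `band_power`; real MBFA pipelines also involve preprocessing not shown.

```python
import math


def band_power(signal, sample_rate, f_low, f_high):
    """Sum of one-sided DFT power over frequencies in [f_low, f_high).

    A direct O(n * bins) DFT keeps the sketch dependency-free; real pipelines
    would use an FFT. The mean is removed first so the DC term does not leak in.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]
    total = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate / n
        if f_low <= freq < f_high:
            re_part = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
            im_part = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
            total += (re_part ** 2 + im_part ** 2) / n
    return total


TR = 2.0                # assumed repetition time in seconds
fs = 1.0 / TR           # sampling rate in Hz (0.5 Hz)
bands = {"slow-5": (0.01, 0.027), "slow-4": (0.027, 0.073)}

# Synthetic BOLD-like series: a strong 0.02 Hz component plus a weaker 0.05 Hz one.
signal = [math.sin(2 * math.pi * 0.02 * i * TR) + 0.5 * math.sin(2 * math.pi * 0.05 * i * TR)
          for i in range(300)]
powers = {name: band_power(signal, fs, lo, hi) for name, (lo, hi) in bands.items()}
```

With 300 samples both test frequencies fall exactly on DFT bins, so the slow-5 power is four times the slow-4 power, matching the squared amplitude ratio of the two components.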
Submitted 6 February, 2025;
originally announced February 2025.
-
Fisher information dissipation for time inhomogeneous stochastic differential equations
Authors:
Qi Feng,
Xinzhe Zuo,
Wuchen Li
Abstract:
We provide a Lyapunov convergence analysis for time-inhomogeneous, variable-coefficient stochastic differential equations (SDEs). Three typical examples are overdamped, irreversible drift, and underdamped Langevin dynamics. We first formulate the probability transition equation of Langevin dynamics as a modified gradient flow of the Kullback-Leibler divergence in the probability space with respect to time-dependent optimal transport metrics. This formulation contains both gradient and non-gradient directions, depending on a class of time-dependent target distributions. We then select a time-dependent relative Fisher information functional as the Lyapunov functional. We develop a time-dependent Hessian matrix condition that guarantees the convergence of the probability density function of the SDE. We verify the proposed conditions for several time-inhomogeneous Langevin dynamics. For the overdamped Langevin dynamics, we prove $O(t^{-1/2})$ convergence in $L^1$ distance for the simulated annealing dynamics with a strongly convex potential function. For the irreversible drift Langevin dynamics, we prove improved convergence towards the target distribution in an asymptotic regime. We also verify the convergence condition for the underdamped Langevin dynamics. Numerical examples demonstrate the convergence results for the time-dependent Langevin dynamics.
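A simple way to see a time-inhomogeneous overdamped Langevin dynamics in action is to simulate simulated-annealing dynamics $dX_t = -\nabla V(X_t)\,dt + \sqrt{2T(t)}\,dW_t$ with the Euler-Maruyama scheme. The sketch below assumes a one-dimensional strongly convex potential $V(x) = (x-2)^2/2$ and a logarithmic cooling schedule $T(t) = T_0/\log(t+e)$; both choices, the step size, and the function name are illustrative assumptions rather than the paper's numerical setup.

```python
import math
import random


def annealed_langevin(grad_v, n_particles=500, t_end=20.0, dt=0.02, temp0=1.0, seed=0):
    """Euler-Maruyama for dX = -grad V(X) dt + sqrt(2 T(t)) dW.

    Assumed cooling schedule: T(t) = temp0 / log(t + e), which starts at temp0
    and decreases slowly. All particles start at x = 0 and evolve independently.
    """
    rng = random.Random(seed)
    xs = [0.0] * n_particles
    n_steps = int(round(t_end / dt))
    for step in range(n_steps):
        t = step * dt
        temp = temp0 / math.log(t + math.e)           # time-dependent temperature
        noise_scale = math.sqrt(2.0 * temp * dt)      # Brownian increment scale
        xs = [x - grad_v(x) * dt + noise_scale * rng.gauss(0.0, 1.0) for x in xs]
    return xs


# Strongly convex potential V(x) = (x - 2)^2 / 2, so grad V(x) = x - 2.
xs = annealed_langevin(lambda x: x - 2.0)
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
```

The particle cloud centers on the minimizer $x = 2$ while its spread shrinks roughly in step with $T(t)$, which is the concentration behavior that the convergence analysis quantifies.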
Submitted 1 February, 2024;
originally announced February 2024.
-
Comparing with Python: Text Analysis in Stata
Authors:
Xiangtai Zuo
Abstract:
Text analysis is the process of constructing structured data from unstructured textual content and is usually implemented in Python. In principle, a program that can read a file and match its contents against a regular expression is all that basic text analysis requires. However, few researchers have used Stata as their main text analysis tool. In this paper, I take a step-by-step, practical approach, giving examples of how text analysis can be performed in Stata and comparing the code and running time with Python.
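The "read text, then match regular expressions" workflow described above looks roughly like this on the Python side. This is a minimal sketch, assuming an in-memory string in place of a file so the example is self-contained; the sample sentence, patterns, and function names are hypothetical and do not come from the paper's Stata or Python code.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Tokenize with a regular expression and count lowercase alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def extract_years(text):
    """Extract four-digit years between 1900 and 2099 as integers."""
    return [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", text)]


# In practice the text would come from open(path, encoding="utf-8").read().
sample = "Firms listed in 2010 and 2020 reported innovation; innovation rose after 2015."
freqs = word_frequencies(sample)
years = extract_years(sample)
```

Here `freqs["innovation"]` is 2 and `years` is `[2010, 2020, 2015]`; the same structured output (word counts, extracted fields) is what the paper reproduces with Stata's own file-reading and regular-expression facilities.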
Submitted 19 July, 2023;
originally announced July 2023.
-
A Paradigm Shift in Neuroscience Driven by Big Data: State of art, Challenges, and Proof of Concept
Authors:
Zi-Xuan Zhou,
Xi-Nian Zuo
Abstract:
A recent editorial in Nature noted that cognitive neuroscience is at a crossroads, where reliably revealing brain-behavior associations has become a thorny issue. This commentary sketches a big-data-science way forward for cognitive neuroscience, namely population neuroscience. In terms of design, analysis, and interpretation, population neuroscience research takes design control to an unprecedented level, greatly expands the dimensions of the data analysis space, and paves the way for a paradigm shift in exploring the mechanisms underlying brain-behavior associations.
Submitted 3 March, 2023; v1 submitted 8 December, 2022;
originally announced December 2022.
-
State capital involvement, managerial sentiment and firm innovation performance Evidence from China
Authors:
Xiangtai Zuo
Abstract:
In recent years, a growing number of state-owned enterprises (SOEs) have become embedded in the restructuring and governance of private enterprises through equity participation, giving private enterprises a more favorable environment for financing and innovation. However, little is known about the mechanisms through which SOE intervention affects corporate innovation performance. Hence, in this study, we investigated the association between state capital intervention and innovation performance, and further examined the potential mediating role of managerial sentiment and the moderating role of financing constraints, using all listed non-ST firms from 2010 to 2020 as the sample. The results revealed two main findings: 1) state capital intervention increases innovation performance through managerial sentiment; 2) financing constraints moderate the effect of state capital intervention on firms' innovation performance.
Submitted 11 April, 2022;
originally announced April 2022.