-
Local Dispersive and Strichartz estimates for the Schrödinger equation associated to the Ornstein-Uhlenbeck operator
Authors:
Aparajita Dasgupta,
Uttam Kumar Dolai,
Cheng Luo,
Manli Song
Abstract:
In this paper we study the linear and nonlinear Schrödinger equations associated with the Ornstein-Uhlenbeck (OU) operator endowed with the Gaussian measure. While classical Strichartz estimates are well-developed for the free Schrödinger operator on Euclidean spaces, extending them to non-translation-invariant operators like the OU operator presents significant challenges due to the lack of global dispersive decay. In this work, we overcome these difficulties by deriving localized $L^1 \to L^\infty$ dispersive estimates for the OU Schrödinger propagator using Mehler kernel techniques. We then establish a family of weighted Strichartz estimates in Gaussian $L^p$ spaces via interpolation and the abstract $TT^*$-method. As an application, we prove local well-posedness results for the nonlinear Schrödinger equation with power-type nonlinearity in both subcritical and critical regimes. Our framework reveals new dispersive phenomena in the context of the OU semigroup and provides the first comprehensive Strichartz theory in this setting.
Submitted 4 July, 2025;
originally announced July 2025.
-
A Holomorphic Splitting Theorem
Authors:
Miao Song
Abstract:
A long-term project is to construct a complete Calabi-Yau metric on the complement of the anticanonical divisor in a compact Kähler manifold $\oM$. We focus on the case where this smooth divisor has multiplicity 2 and is itself a compact Calabi-Yau manifold. First, we solve the Monge-Ampère equation on generalized $ALG$ manifolds when the Ricci potential has $O(r^{-1})$ decay. We then use the resulting Ricci-flat Kähler metric to prove a holomorphic splitting theorem: if $K_{\oM}=\calo(-2D)$, where $D$ can be realized as a smooth Calabi-Yau manifold, and if $\calo_{3D}(D)$ is trivial, then $\oM$ is biholomorphic to $\bbp^1\times D$.
Submitted 16 June, 2025;
originally announced June 2025.
-
Orthonormal Strichartz estimates for Dunkl-Schrödinger equation of initial data with Sobolev regularity
Authors:
Guoxia Feng,
Shyam Swarup Mondal,
Manli Song,
Huoxiong Wu
Abstract:
Let $Δ_κ$ be the Dunkl-Laplacian on $\mathbb{R}^n$. The main aim of this paper is to investigate the orthonormal Strichartz estimates for the Schrödinger equation with initial data from the homogeneous Dunkl-Sobolev space $\dot{H}_κ^s (\mathbb{R}^n)$. Our approach is based on restricted weak-type orthonormal estimates, frequency-localized estimates for the Dunkl-Schrödinger propagator $e^{itΔ_κ}$, and a series of successive real and complex interpolation techniques.
Submitted 10 June, 2025; v1 submitted 5 June, 2025;
originally announced June 2025.
-
Acceleration via Perturbations on Low-resolution Ordinary Differential Equations
Authors:
Xudong Li,
Lei Shi,
Mingqi Song
Abstract:
Recently, the high-resolution ordinary differential equation (ODE) framework, which retains higher-order terms, has been proposed to analyze gradient-based optimization algorithms. Through this framework, the term $\nabla^2 f(X_t)\dot{X_t}$, known as the gradient-correction term, was found to be essential for reducing oscillations and accelerating the convergence rate of function values. Despite the importance of this term, simply adding it to the low-resolution ODE may sometimes lead to a slower convergence rate. To fully understand this phenomenon, we propose a generalized perturbed ODE and analyze the role of the gradient and gradient-correction perturbation terms under both continuous-time and discrete-time settings. We demonstrate that while the gradient-correction perturbation is essential for obtaining accelerations, it can hinder the convergence rate of function values in certain cases. However, this adverse effect can be mitigated by involving an additional gradient perturbation term. Moreover, by conducting a comprehensive analysis, we derive proper choices of perturbation parameters. Numerical experiments are also provided to validate our theoretical findings.
Submitted 2 April, 2025;
originally announced April 2025.
-
Validity of the total quasi-steady-state approximation in stochastic biochemical reaction networks
Authors:
Yun Min Song,
Kangmin Lee,
Jae Kyoung Kim
Abstract:
Stochastic models for biochemical reaction networks are widely used to explore their complex dynamics but face significant challenges, including difficulties in determining rate constants and high computational costs. To address these issues, model reduction approaches based on deterministic quasi-steady-state approximations (QSSA) have been employed, resulting in propensity functions in the form of deterministic non-elementary reaction functions, such as the Michaelis-Menten equation. In particular, the total QSSA (tQSSA), known for its accuracy in deterministic frameworks, has been perceived as universally valid for stochastic model reduction. However, recent studies have challenged this perception. In this review, we demonstrate that applying tQSSA in stochastic model reduction can distort dynamics, even in cases where the deterministic tQSSA is rigorously valid. This highlights the need for caution when using deterministic QSSA in stochastic model reduction to avoid erroneous conclusions from model simulations.
Submitted 2 March, 2025;
originally announced March 2025.
-
Dimension-independent convergence rate of propagation of chaos and numerical analysis for McKean-Vlasov stochastic differential equations with coefficients nonlinearly dependent on measure
Authors:
Yuhang Zhang,
Minghui Song
Abstract:
In contrast to ordinary stochastic differential equations (SDEs), the numerical simulation of McKean-Vlasov stochastic differential equations (MV-SDEs) requires approximating the distribution law first. Based on the theory of propagation of chaos (PoC), the particle approximation method is widely used. A natural question is then the convergence rate of this method (also referred to as the convergence rate of PoC). The PoC convergence rate is well understood for MV-SDEs with coefficients linearly dependent on the measure, but the rate deteriorates with dimension $d$ under the $L^p$-Wasserstein metric for nonlinear measure-dependent coefficients, even when Lipschitz continuity with respect to the measure is assumed. The main objective of this paper is to establish a dimension-independent convergence result of PoC for MV-SDEs whose coefficients are nonlinear with respect to the measure component but Lipschitz continuous. As a complement, we further give a time discretization of the equations and use numerical experiments to verify the convergence rate of PoC.
Submitted 11 June, 2025; v1 submitted 28 February, 2025;
originally announced February 2025.
-
Strong convergence of the adaptive Milstein method for nonlinear stochastic differential equations with piecewise continuous arguments
Authors:
Yuhang Zhang,
Minghui Song,
Jiaqi Zhu
Abstract:
In this work, an adaptive time-stepping Milstein method is constructed for stochastic differential equations with piecewise continuous arguments (SDEPCAs), where the drift is one-sided Lipschitz continuous and the diffusion is not required to satisfy the commutativity condition. It is widely recognized that explicit Euler or Milstein methods may blow up when the system exhibits superlinear growth, so modifications are needed. Hence we propose an adaptive variant to deal with a superlinearly growing drift coefficient. To the best of our knowledge, this is the first work to develop a numerical method with variable step sizes for nonlinear SDEPCAs. It is proven that the adaptive Milstein method is strongly convergent in the sense of $L_p, p\ge 2$, and that the convergence rate is optimal, consistent with the order of the explicit Milstein scheme with globally Lipschitz coefficients. Finally, several numerical experiments are presented to support the theoretical analysis.
Submitted 24 February, 2025;
originally announced February 2025.
-
Combinatorics of generalized orthogonal polynomials of type $R_{II}$
Authors:
Jang Soo Kim,
Minho Song
Abstract:
In 1995, Ismail and Masson introduced orthogonal polynomials of types \( R_I \) and \( R_{II} \), which are defined by specific three-term recurrence relations with additional conditions. Recently, Kim and Stanton found a combinatorial interpretation for the moments of orthogonal polynomials of type \( R_I \) in the spirit of the combinatorial theory of orthogonal polynomials due to Flajolet and Viennot. In this paper, we push this combinatorial model further to orthogonal polynomials of type \( R_{II} \). Moreover, we generalize orthogonal polynomials of type \( R_{II} \) by relaxing some of their conditions. We then prove a master theorem, which generalizes combinatorial models for moments of various types of orthogonal polynomials: classical orthogonal polynomials, Laurent biorthogonal polynomials, and orthogonal polynomials of types \( R_I \) and \( R_{II} \).
Submitted 19 November, 2024;
originally announced November 2024.
-
Learning-Augmented Algorithms for Online Concave Packing and Convex Covering Problems
Authors:
Elena Grigorescu,
Young-San Lin,
Maoyuan Song
Abstract:
Learning-augmented algorithms have been extensively studied across the computer science community in recent years, driven by advances in machine learning predictors, which can provide additional information to augment classical algorithms. Such predictions are especially powerful in the context of online problems, where decisions have to be made without knowledge of the future, and which traditionally exhibit impossibility results bounding the performance of any online algorithm. The study of learning-augmented algorithms thus aims to use external advice prudently, to overcome classical impossibility results when the advice is accurate, and still perform comparably to the state-of-the-art online algorithms even when the advice is inaccurate.
In this paper, we present learning-augmented algorithmic frameworks for two fundamental optimization settings, extending and generalizing prior works. For online packing with concave objectives, we present a simple but overarching strategy that switches between the advice and the state-of-the-art online algorithm. For online covering with convex objectives, we greatly extend primal-dual methods for online convex covering programs by Azar et al. (FOCS 2016) and the previous learning-augmented framework for online covering linear programs from the literature, to many new applications. We show that our algorithms break impossibility results when the advice is accurate, while maintaining comparable performance with state-of-the-art classical online algorithms even when the advice is erroneous.
Submitted 12 November, 2024;
originally announced November 2024.
-
Asymptotic Normality of the Largest Eigenvalue for Noncentral Sample Covariance Matrices
Authors:
Huihui Cheng,
Minjie Song
Abstract:
Let $X$ be a $p\times n$ real Gaussian matrix whose entries are independent and identically distributed with positive mean $μ$ and variance $σ^2$. The goal of this paper is to investigate the largest eigenvalue of the noncentral sample covariance matrix $W=XX^{T}/n$, when the dimension $p$ and the sample size $n$ both grow to infinity with the limit $p/n=c\,(0<c<\infty)$. Utilizing the von Mises iteration method, we derive an approximation of the largest eigenvalue $λ_{1}(W)$ and show that $λ_{1}(W)$ asymptotically has a normal distribution with expectation $pμ^2+(1+c)σ^2$ and variance $4cμ^2σ^2$.
Submitted 22 October, 2024;
originally announced October 2024.
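The asymptotic mean stated in the abstract above can be sanity-checked numerically. The following is a minimal sketch, not the authors' derivation: it uses plain power iteration (the textbook form of von Mises iteration) to estimate $λ_1(W)$ and compares the Monte Carlo average with $pμ^2+(1+c)σ^2$. The function names, the sizes $p=40$, $n=80$, the trial count, and the seed are all illustrative assumptions.

```python
import math
import random


def largest_eigenvalue(X, n_iter=15):
    """Estimate the top eigenvalue of W = X X^T / n by power iteration.

    W is never formed explicitly: each step applies X^T and then X.
    """
    p, n = len(X), len(X[0])
    v = [1.0 / math.sqrt(p)] * p  # unit start vector along (1, ..., 1)
    lam = 0.0
    for _ in range(n_iter):
        # y = X^T v, a vector of length n
        y = [sum(X[i][j] * v[i] for i in range(p)) for j in range(n)]
        # w = X y / n = W v, a vector of length p
        w = [sum(X[i][j] * y[j] for j in range(n)) / n for i in range(p)]
        # For unit v, ||W v|| tends to lambda_1 as v converges
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]
    return lam


def simulate(p=40, n=80, mu=1.0, sigma=1.0, trials=30, seed=0):
    """Return (empirical mean of lambda_1, predicted mean, predicted variance)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        # Entries are i.i.d. Gaussian with mean mu and standard deviation sigma
        X = [[rng.gauss(mu, sigma) for _ in range(n)] for _ in range(p)]
        samples.append(largest_eigenvalue(X))
    c = p / n
    mean = sum(samples) / trials
    predicted_mean = p * mu**2 + (1 + c) * sigma**2
    predicted_var = 4 * c * mu**2 * sigma**2
    return mean, predicted_mean, predicted_var


if __name__ == "__main__":
    mean, predicted_mean, predicted_var = simulate()
    print(f"empirical mean {mean:.3f}, predicted mean {predicted_mean:.3f}, "
          f"predicted variance {predicted_var:.3f}")
```

Because the stated result is asymptotic, a small gap between the empirical and predicted means at these sizes is to be expected; larger $p$ and $n$ should tighten it at the cost of run time.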
-
Time-domain direct sampling method for inverse electromagnetic scattering with a single incident source
Authors:
Chen Geng,
Minghui Song,
Xianchao Wang,
Yuliang Wang
Abstract:
In this paper, we consider an inverse electromagnetic medium scattering problem of reconstructing unknown objects from time-dependent boundary measurements. A novel time-domain direct sampling method is developed for determining the locations of unknown scatterers by using only a single incident source. Notably, our method imposes no restrictions on the waveform of the incident wave. Based on the Fourier-Laplace transform, we first establish the connection between the frequency-domain and the time-domain direct sampling method. Furthermore, we elucidate the mathematical mechanism of the imaging functional through the properties of modified Bessel functions. Theoretical justifications and stability analyses are provided to demonstrate the effectiveness of the proposed method. Finally, several numerical experiments are presented to illustrate the feasibility of our approach.
Submitted 10 October, 2024;
originally announced October 2024.
-
Orthonormal Strichartz inequalities and their applications on abstract measure spaces
Authors:
Guoxia Feng,
Shyam Swarup Mondal,
Manli Song,
Huoxiong Wu
Abstract:
The main objective of this paper is to extend certain fundamental inequalities from a single function to a family of orthonormal systems. In the first part of the paper, we consider a non-negative, self-adjoint operator $L$ on $L^2(X,μ)$, where $(X,μ)$ is a measure space. Under the assumption that the kernel $K_{it}(x,y)$ of the Schrödinger propagator $e^{itL}$ satisfies a uniform $L^\infty$-decay estimate of the form
\begin{equation*}
\sup_{x,y\in X}|K_{it}(x,y)|\lesssim |t|^{-\frac{n}{2}},\,|t|<T_0, \text{ for some }n\geq1,
\end{equation*} where $T_0\in(0,+\infty]$, we establish Strichartz estimates for the Schrödinger propagator $e^{itL}$ and, using a duality principle argument of Frank-Sabin \cite{FS}, we extend them to a system of infinitely many fermions on $L^2(X)$. We also obtain orthonormal Strichartz estimates for a class of dispersive semigroups $U(t)=e^{itφ(L)}ψ(\sqrt{L})$, where $φ: \mathbb{R}^+\rightarrow \mathbb{R}$ is a smooth function and $ψ\in C_c^\infty([\frac{1}{2},2])$. As an application of these orthonormal versions of Strichartz estimates, we prove the well-posedness of the Hartree equation in the Schatten spaces.
In the next part of the paper, we obtain some new orthonormal Strichartz estimates, which extend prior work of Kenig-Ponce-Vega \cite{Kenig-Ponce-Vega} for single functions. Using these orthonormal versions of the Kenig-Ponce-Vega result, we prove the orthonormal restriction theorem for the Fourier transform on some particular noncompact hypersurfaces of the form $S=\{(ξ, φ(ξ)): ξ\in \mathbb{R}\}$, where $φ$ satisfies a certain growth condition.
Submitted 21 September, 2024;
originally announced September 2024.
-
On the local and global minimizers of the smooth stress function in Euclidean Distance Matrix problems
Authors:
Mengmeng Song,
Douglas Goncalves,
Woosuk L. Jung,
Carlile Lavor,
Antonio Mucherino,
Henry Wolkowicz
Abstract:
We consider the nonconvex minimization problem, with quartic objective function, that arises in the exact recovery of a configuration matrix $P\in \mathbb{R}^{n\times d}$ of $n$ points when a Euclidean distance matrix (EDM) is given with embedding dimension $d$. It is an open question in the literature under which conditions such a minimization problem admits a local nonglobal minimizer (lngm). We prove that all second order stationary points are global minimizers whenever $n \leq d + 1$. For $n > d+1$, we numerically find a local nonglobal minimum and show analytically that there indeed exists a nearby lngm for the underlying quartic minimization problem. Thus, we answer in the affirmative the previously open question about their existence. Our approach to finding the lngm is novel in that we first exploit the translation and rotation invariance to reduce the size of the problem from $nd$ variables in $P$ to $(n-1)d - d(d-1)/2 = d(2n-d-1)/2$ variables. This allows for finding examples that satisfy the strict second order sufficient optimality conditions.
Submitted 13 August, 2024;
originally announced August 2024.
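The variable count quoted in the abstract above can be checked by direct algebra. The reading of the two subtracted terms ($d$ degrees of freedom for translations, $d(d-1)/2$ for rotations) and the sample values $n=5$, $d=3$ are illustrative assumptions, not taken from the paper.

```latex
\begin{align*}
(n-1)d - \frac{d(d-1)}{2}
  &= \frac{2(n-1)d - d(d-1)}{2}
   = \frac{d\,(2n - 2 - d + 1)}{2}
   = \frac{d\,(2n-d-1)}{2}, \\
\text{e.g. } n=5,\ d=3:\quad
  nd &= 15, \qquad (n-1)d - \frac{d(d-1)}{2} = 12 - 3 = 9 = \frac{3\,(10-3-1)}{2}.
\end{align*}
```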
-
Combinatorics of orthogonal polynomials on the unit circle
Authors:
Jihyeug Jang,
Minho Song
Abstract:
Orthogonal polynomials on the unit circle (OPUC for short) are a family of polynomials whose orthogonality is given by integration over the unit circle in the complex plane. There are combinatorial studies on the moments of various types of orthogonal polynomials, including classical orthogonal polynomials, Laurent biorthogonal polynomials, and orthogonal polynomials of type \( R_I \). In this paper, we study the moments of OPUC from a combinatorial perspective. We provide three path interpretations for them: Łukasiewicz paths, gentle Motzkin paths, and Schröder paths. Additionally, using these combinatorial interpretations, we derive explicit formulas for the generalized moments of some examples of OPUC, including the circular Jacobi polynomials and the Rogers--Szegő polynomials. Furthermore, we introduce several kinds of generalized linearization coefficients and give combinatorial interpretations for them.
Submitted 10 July, 2024;
originally announced July 2024.
-
Decay estimates for a class of Dunkl wave equations
Authors:
Cheng Luo,
Shyam Swarup Mondal,
Manli Song
Abstract:
Let $Δ_κ$ be the Dunkl Laplacian on $\mathbb{R}^n$ and let $φ: \mathbb{R}^+ \to \mathbb{R}$ be a smooth function. The aim of this manuscript is twofold. First, we study the decay estimate for a class of dispersive semigroups of the form $e^{itφ(\sqrt{-Δ_κ})}$. We overcome the difficulty arising from the non-homogeneity of $φ$ by frequency localization. As applications, in the next part of the paper, we establish Strichartz estimates for some concrete wave equations associated with the Dunkl Laplacian $Δ_κ$, corresponding to $φ(r)=r, r^2, r^2+r^4, \sqrt{1+r^2}, \sqrt{1+r^4}$, and $r^μ,0<μ\leq 2, μ\neq 1$. More precisely, we unify and simplify all the known dispersive estimates and extend them to more general cases. Finally, using the decay estimates, we prove the global-in-time existence of small data Sobolev solutions for the nonlinear Klein-Gordon equation and beam equation with power-type nonlinearities.
Submitted 9 July, 2024;
originally announced July 2024.
-
Decay estimates and Strichartz inequalities for a class of dispersive equations on H-type groups
Authors:
Manli Song,
Jinggang Tan
Abstract:
Let $\mathcal{L}$ be the sub-Laplacian on H-type groups and $φ: \mathbb{R}^+ \to \mathbb{R}$ be a smooth function. The primary objective of the paper is to study the decay estimate for a class of dispersive semigroups given by $e^{itφ(\mathcal{L})}$. Inspired by earlier work of Guo-Peng-Wang \cite{GPW2008} in the Euclidean space and Song-Yang \cite{SY2023} on the Heisenberg group, we overcome the difficulty arising from the non-homogeneity of $φ$ by frequency localization, which is based on the non-commutative Fourier transform on H-type groups, the properties of the Laguerre functions and Bessel functions, and the stationary phase theorem. Finally, as applications, we derive new Strichartz inequalities for the solutions of some specific equations, such as the fractional Schrödinger equation, the fourth-order Schrödinger equation, the beam equation and the Klein-Gordon equation, which correspond to $φ(r)=r^α$, $r^2+r,\sqrt{1+r^2},\sqrt{1+r}$, respectively. Moreover, we also prove that the time decay is sharp in these cases.
Submitted 9 July, 2024;
originally announced July 2024.
-
A Simple Learning-Augmented Algorithm for Online Packing with Concave Objectives
Authors:
Elena Grigorescu,
Young-San Lin,
Maoyuan Song
Abstract:
Learning-augmented algorithms have been extensively studied recently in the computer-science community, due to the potential of using machine learning predictions to improve the performance of algorithms. Predictions are especially useful for online algorithms making irrevocable decisions without knowledge of the future. Such learning-augmented algorithms aim to overcome the limitations of classical online algorithms when the predictions are accurate, and still perform comparably when the predictions are inaccurate.
A common approach is to adapt existing online algorithms to the particular advice notion employed, which often involves understanding previous sophisticated algorithms and their analyses. However, ideally, one would simply use previous online solutions in a black-box fashion, without much loss in the approximation guarantees. Such clean solutions that avoid opening up black-boxes are often rare, and may even be missed the first time around. For example, Grigorescu et al. (NeurIPS 22) proposed a learning-augmented algorithm for online covering linear programs, but it later turned out that their results can be subsumed by a natural approach that switches between the advice and an online algorithm given as a black-box, as noted in their paper.
In this work, we introduce and analyze a simple learning-augmented algorithm for online packing problems with linear constraints and concave objectives. We exhibit several direct applications of our framework including online packing linear programming, knapsack, resource management benefit, throughput maximization, and network utility maximization. We further raise the problem of understanding necessary and sufficient conditions for when such simple black-box solutions may be optimal. We believe this is an important direction of research that would unify many ad-hoc approaches from the literature.
Submitted 5 June, 2024;
originally announced June 2024.
-
Does SGD really happen in tiny subspaces?
Authors:
Minhak Song,
Kwangjun Ahn,
Chulhee Yun
Abstract:
Understanding the training dynamics of deep neural networks is challenging due to their high-dimensional nature and intricate loss landscapes. Recent studies have revealed that, along the training trajectory, the gradient approximately aligns with a low-rank top eigenspace of the training loss Hessian, referred to as the dominant subspace. Given this alignment, this paper explores whether neural networks can be trained within the dominant subspace, which, if feasible, could lead to more efficient training methods. Our primary observation is that when the SGD update is projected onto the dominant subspace, the training loss does not decrease further. This suggests that the observed alignment between the gradient and the dominant subspace is spurious. Surprisingly, projecting out the dominant subspace proves to be just as effective as the original update, despite removing the majority of the original update component. We observe similar behavior across practical setups, including the large learning rate regime (also known as Edge of Stability), Sharpness-Aware Minimization, momentum, and adaptive optimizers. We discuss the main causes and implications of this spurious alignment, shedding light on the dynamics of neural network training.
Submitted 10 March, 2025; v1 submitted 24 May, 2024;
originally announced May 2024.
-
An adaptive Euler-Maruyama scheme for SDDEs
Authors:
Dongyang Liu,
Minghui Song,
Yuhang Zhang
Abstract:
This paper proposes an adaptive numerical method for stochastic delay differential equations (SDDEs) with a non-globally Lipschitz drift term and a non-constant delay, building upon the work of Wei Fang and others. The method adapts the step size based on the growth of the drift term. Differing slightly from the conventional Euler-Maruyama scheme, this paper handles the delay term by substituting it with the numerical solution at the closest node to the left. This approach overcomes the difficulty that a time point obtained by subtracting the delay need not coincide with a grid node. The paper proves the convergence of the numerical method for a class of non-globally Lipschitz continuous SDDEs under the assumption that the step size function satisfies certain conditions.
Submitted 29 June, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Cryptographic Hardness of Score Estimation
Authors:
Min Jae Song
Abstract:
We show that $L^2$-accurate score estimation, in the absence of strong assumptions on the data distribution, is computationally hard even when sample complexity is polynomial in the relevant problem parameters. Our reduction builds on the result of Chen et al. (ICLR 2023), who showed that the problem of generating samples from an unknown data distribution reduces to $L^2$-accurate score estimation. Our hard-to-estimate distributions are the "Gaussian pancakes" distributions, originally due to Diakonikolas et al. (FOCS 2017), which have been shown to be computationally indistinguishable from the standard Gaussian under widely believed hardness assumptions from lattice-based cryptography (Bruna et al., STOC 2021; Gupte et al., FOCS 2022).
Submitted 4 April, 2024;
originally announced April 2024.
-
Refined canonical stable Grothendieck polynomials and their duals, Part 2
Authors:
Byung-Hak Hwang,
Jihyeug Jang,
Jang Soo Kim,
Minho Song,
U-Keun Song
Abstract:
This paper is the sequel to Part 1 of the paper with the same title, where we introduced refined canonical stable Grothendieck polynomials and their duals with two families of infinite parameters. In this paper we give combinatorial interpretations for these polynomials using generalizations of set-valued tableaux and reverse plane partitions, respectively. Our results extend to their flagged and skew versions.
Submitted 22 April, 2025; v1 submitted 3 April, 2024;
originally announced April 2024.
-
Optimality in Mean Estimation: Beyond Worst-Case, Beyond Sub-Gaussian, and Beyond $1+α$ Moments
Authors:
Trung Dang,
Jasper C. H. Lee,
Maoyuan Song,
Paul Valiant
Abstract:
There is growing interest in improving our algorithmic understanding of fundamental statistical problems such as mean estimation, driven by the goal of understanding the limits of what we can extract from valuable data. The state of the art results for mean estimation in $\mathbb{R}$ are 1) the optimal sub-Gaussian mean estimator by [LV22], with the tight sub-Gaussian constant for all distributions with finite but unknown variance, and 2) the analysis of the median-of-means algorithm by [BCL13] and a lower bound by [DLLO16], characterizing the big-O optimal errors for distributions for which only a $1+α$ moment exists for $α\in (0,1)$. Both results, however, are optimal only in the worst case. We initiate the fine-grained study of the mean estimation problem: Can algorithms leverage useful features of the input distribution to beat the sub-Gaussian rate, without explicit knowledge of such features?
We resolve this question with an unexpectedly nuanced answer: "Yes in limited regimes, but in general no". For any distribution $p$ with a finite mean, we construct a distribution $q$ whose mean is well-separated from $p$'s, yet $p$ and $q$ are not distinguishable with high probability, and $q$ further preserves $p$'s moments up to constants. The main consequence is that no reasonable estimator can asymptotically achieve better than the sub-Gaussian error rate for any distribution, matching the worst-case result of [LV22]. More generally, we introduce a new definitional framework to analyze the fine-grained optimality of algorithms, which we call "neighborhood optimality", interpolating between the unattainably strong "instance optimality" and the trivially weak "admissibility" definitions. Applying the new framework, we show that median-of-means is neighborhood optimal, up to constant factors. It is open to find a neighborhood-optimal estimator without constant factor slackness.
Submitted 21 November, 2023;
originally announced November 2023.
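For readers unfamiliar with the median-of-means estimator mentioned in this abstract, the following is a minimal sketch of the textbook version. The function name, the equal-size block split, and the toy data are assumptions for illustration; the sketch says nothing about the neighborhood-optimality analysis itself.

```python
import statistics


def median_of_means(samples, num_blocks):
    """Average each of num_blocks equal-size blocks, then take the median of those averages."""
    if num_blocks < 1 or num_blocks > len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = len(samples) // num_blocks  # leftover samples are dropped
    block_means = []
    for b in range(num_blocks):
        block = samples[b * block_size:(b + 1) * block_size]
        block_means.append(sum(block) / block_size)
    return statistics.median(block_means)


# Toy data with one extreme value (assumed for illustration).
data = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]
estimate = median_of_means(data, 3)   # block means are 1.5, 3.5, 502.5
plain_mean = sum(data) / len(data)    # pulled far upward by the outlier
```

The single large value corrupts only one block mean, so the median of the block means stays near the bulk of the data while the plain average does not.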
-
Linear attention is (maybe) all you need (to understand transformer optimization)
Authors:
Kwangjun Ahn,
Xiang Cheng,
Minhak Song,
Chulhee Yun,
Ali Jadbabaie,
Suvrit Sra
Abstract:
Transformer training is notoriously difficult, requiring a careful design of optimizers and use of various heuristics. We make progress towards understanding the subtleties of training Transformers by carefully studying a simple yet canonical linearized shallow Transformer model. Specifically, we train linear Transformers to solve regression tasks, inspired by J. von Oswald et al. (ICML 2023), and K. Ahn et al. (NeurIPS 2023). Most importantly, we observe that our proposed linearized models can reproduce several prominent aspects of Transformer training dynamics. Consequently, the results obtained in this paper suggest that a simple linearized Transformer model could actually be a valuable, realistic abstraction for understanding Transformer optimization.
Submitted 13 March, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
-
Kemeny's constant and enumerating Braess edges in trees
Authors:
Jihyeug Jang,
Mark Kempton,
Sooyeong Kim,
Adam Knudson,
Neal Madras,
Minho Song
Abstract:
We study the problem of enumerating Braess edges for Kemeny's constant in trees. We obtain bounds and asymptotic results for the number of Braess edges in some families of trees.
Submitted 6 September, 2023;
originally announced September 2023.
-
SGMM: Stochastic Approximation to Generalized Method of Moments
Authors:
Xiaohong Chen,
Sokbae Lee,
Yuan Liao,
Myung Hwan Seo,
Youngki Shin,
Myunghyun Song
Abstract:
We introduce a new class of algorithms, Stochastic Generalized Method of Moments (SGMM), for estimation and inference on (overidentified) moment restriction models. Our SGMM is a novel stochastic approximation alternative to the popular Hansen (1982) (offline) GMM, and offers fast and scalable implementation with the ability to handle streaming datasets in real time. We establish the almost sure convergence, and the (functional) central limit theorem for the inefficient online 2SLS and the efficient SGMM. Moreover, we propose online versions of the Durbin-Wu-Hausman and Sargan-Hansen tests that can be seamlessly integrated within the SGMM framework. Extensive Monte Carlo simulations show that as the sample size increases, the SGMM matches the standard (offline) GMM in terms of estimation accuracy and gains in computational efficiency, indicating its practical value for both large-scale and online datasets. We demonstrate the efficacy of our approach by a proof of concept using two well-known empirical examples with large sample sizes.
Submitted 30 October, 2023; v1 submitted 24 August, 2023;
originally announced August 2023.
-
Quantile Optimization via Multiple Timescale Local Search for Black-box Functions
Authors:
Jiaqiao Hu,
Meichen Song,
Michael C. Fu
Abstract:
We consider quantile optimization of black-box functions that are estimated with noise. We propose two new iterative three-timescale local search algorithms. The first algorithm uses an appropriately modified finite-difference-based gradient estimator that requires $2d$ + 1 samples of the black-box function per iteration of the algorithm, where $d$ is the number of decision variables (dimension of the input vector). For higher-dimensional problems, this algorithm may not be practical if the black-box function estimates are expensive. The second algorithm employs a simultaneous-perturbation-based gradient estimator that uses only three samples for each iteration regardless of problem dimension. Under appropriate conditions, we show the almost sure convergence of both algorithms. In addition, for the class of strongly convex functions, we further establish their (finite-time) convergence rate through a novel fixed-point argument. Simulation experiments indicate that the algorithms work well on a variety of test problems and compare well with recently proposed alternative methods.
Submitted 15 August, 2023;
originally announced August 2023.
-
Decay estimates for a class of semigroups related to self-adjoint operators on metric measure spaces
Authors:
Guoxia Feng,
Manli Song,
Huoxiong Wu
Abstract:
Assume that $(X,d,μ)$ is a metric space endowed with a non-negative Borel measure $μ$ satisfying the doubling condition and the additional condition that $μ(B(x,r))\gtrsim r^n$ for any $x\in X, \,r>0$ and some $n\geq1$. Let $L$ be a non-negative self-adjoint operator on $L^2(X,μ)$. We assume that $e^{-tL}$ satisfies a Gaussian upper bound and the Schrödinger operator $e^{itL}$ satisfies an $L^1\to L^\infty$ decay estimate of the form \begin{equation*} \|e^{itL}\|_{L^1\to L^\infty} \lesssim |t|^{-\frac{n}{2}}. \end{equation*}
Then for a general class of dispersive semigroup $e^{itφ(L)}$, where $φ: \mathbb{R}^+ \to \mathbb{R}$ is smooth, we establish a similar $L^1\to L^\infty$ decay estimate by a suitable subordination formula connecting it with the Schrödinger operator $e^{itL}$. As applications, we derive new Strichartz estimates for several dispersive equations related to Hermite operators, twisted Laplacians and Laguerre operators.
Submitted 1 August, 2023;
originally announced August 2023.
-
Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory
Authors:
Minhak Song,
Chulhee Yun
Abstract:
Cohen et al. (2021) empirically study the evolution of the largest eigenvalue of the loss Hessian, also known as sharpness, along the gradient descent (GD) trajectory and observe the Edge of Stability (EoS) phenomenon. The sharpness increases at the early phase of training (referred to as progressive sharpening), and eventually saturates close to the threshold of $2 / \text{(step size)}$. In this paper, we start by demonstrating through empirical studies that when the EoS phenomenon occurs, different GD trajectories (after a proper reparameterization) align on a specific bifurcation diagram independent of initialization. We then rigorously prove this trajectory alignment phenomenon for a two-layer fully-connected linear network and a single-neuron nonlinear network trained with a single data point. Our trajectory alignment analysis establishes both progressive sharpening and EoS phenomena, encompassing and extending recent findings in the literature.
Submitted 26 October, 2023; v1 submitted 9 July, 2023;
originally announced July 2023.
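The threshold $2 / \text{(step size)}$ quoted in this abstract is the classical stability limit of gradient descent on a quadratic. The sketch below illustrates only that elementary fact on a one-dimensional quadratic; the function name and the numbers are assumptions, and it does not reproduce the bifurcation analysis of the paper.

```python
def gd_on_quadratic(sharpness, step_size, x0, iterations):
    """Run gradient descent on f(x) = 0.5 * sharpness * x**2 and return the final iterate."""
    x = x0
    for _ in range(iterations):
        grad = sharpness * x          # f'(x); the second derivative equals sharpness
        x = x - step_size * grad      # x is multiplied by (1 - step_size * sharpness)
    return x


step_size = 0.1
threshold = 2.0 / step_size  # 20.0: iterates contract only if sharpness is below this

x_stable = gd_on_quadratic(sharpness=19.0, step_size=step_size, x0=1.0, iterations=200)
x_unstable = gd_on_quadratic(sharpness=21.0, step_size=step_size, x0=1.0, iterations=200)
```

With sharpness 19 each step multiplies the iterate by -0.9, so it oscillates and shrinks; with sharpness 21 the factor is -1.1 and the iterate oscillates and grows.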
-
Improved Projection-free Online Continuous Submodular Maximization
Authors:
Yucheng Liao,
Yuanyu Wan,
Chang Yao,
Mingli Song
Abstract:
We investigate the problem of online learning with monotone and continuous DR-submodular reward functions, which has received great attention recently. To efficiently handle this problem, especially in the case with complicated decision sets, previous studies have proposed an efficient projection-free algorithm called Mono-Frank-Wolfe (Mono-FW) using $O(T)$ gradient evaluations and linear optimization steps in total. However, it only attains a $(1-1/e)$-regret bound of $O(T^{4/5})$. In this paper, we propose an improved projection-free algorithm, namely POBGA, which reduces the regret bound to $O(T^{3/4})$ while keeping the same computational complexity as Mono-FW. Instead of modifying Mono-FW, our key idea is to make a novel combination of a projection-based algorithm called online boosting gradient ascent, an infeasible projection technique, and a blocking technique. Furthermore, we consider the decentralized setting and develop a variant of POBGA, which not only reduces the current best regret bound of efficient projection-free algorithms for this setting from $O(T^{4/5})$ to $O(T^{3/4})$, but also reduces the total communication complexity from $O(T)$ to $O(\sqrt{T})$.
Submitted 28 May, 2023;
originally announced May 2023.
-
Conditional mean embeddings and optimal feature selection via positive definite kernels
Authors:
Palle E. T. Jorgensen,
Myung-Sin Song,
James Tian
Abstract:
Motivated by applications, we consider here new operator theoretic approaches to Conditional mean embeddings (CME). Our present results combine a spectral analysis-based optimization scheme with the use of kernels, stochastic processes, and constructive learning algorithms. For initially given non-linear data, we consider optimization-based feature selections. This entails the use of convex sets of positive definite (p.d.) kernels in a construction of optimal feature selection via regression algorithms from learning models. Thus, with initial inputs of training data (for a suitable learning algorithm), each choice of p.d. kernel $K$ in turn yields a variety of Hilbert spaces and realizations of features. A novel idea here is that we shall allow an optimization over selected sets of kernels $K$ from a convex set $C$ of positive definite kernels $K$. Hence our "optimal" choices of feature representations will depend on a secondary optimization over p.d. kernels $K$ within a specified convex set $C$.
Submitted 14 May, 2023;
originally announced May 2023.
-
Linear programming on the Stiefel manifold
Authors:
Mengmeng Song,
Yong Xia
Abstract:
Linear programming on the Stiefel manifold (LPS) is studied for the first time. It aims at minimizing a linear objective function over the set of all $p$-tuples of orthonormal vectors in ${\mathbb R}^n$ satisfying $k$ additional linear constraints. Despite the classical polynomial-time solvable case $k=0$, general (LPS) is NP-hard. According to the Shapiro-Barvinok-Pataki theorem, (LPS) admits an exact semidefinite programming (SDP) relaxation when $p(p+1)/2\le n-k$, which is tight when $p=1$. Surprisingly, we can greatly strengthen this sufficient exactness condition to $p\le n-k$, which covers the classical case $p\le n$ and $k=0$. Regarding (LPS) as a smooth nonlinear programming problem, we reveal a nice property that under the linear independence constraint qualification, the standard first- and second-order {\it local} necessary optimality conditions are sufficient for {\it global} optimality when $p+1\le n-k$.
Submitted 31 October, 2023; v1 submitted 17 January, 2023;
originally announced January 2023.
-
Operator theory, kernels, and Feedforward Neural Networks
Authors:
Palle E. T. Jorgensen,
Myung-Sin Song,
James Tian
Abstract:
In this paper we show how specific families of positive definite kernels serve as powerful tools in analyses of iteration algorithms for multiple layer feedforward Neural Network models. Our focus is on particular kernels that adapt well to learning algorithms for data-sets/features which display intrinsic self-similarities at feedforward iterations of scaling.
Submitted 5 January, 2023; v1 submitted 3 January, 2023;
originally announced January 2023.
-
Learning Single-Index Models with Shallow Neural Networks
Authors:
Alberto Bietti,
Joan Bruna,
Clayton Sanford,
Min Jae Song
Abstract:
Single-index models are a class of functions given by an unknown univariate ``link'' function applied to an unknown one-dimensional projection of the input. These models are particularly relevant in high dimension, when the data might present low-dimensional structure that learning algorithms should adapt to. While several statistical aspects of this model, such as the sample complexity of recovering the relevant (one-dimensional) subspace, are well-understood, they rely on tailored algorithms that exploit the specific structure of the target function. In this work, we introduce a natural class of shallow neural networks and study its ability to learn single-index models via gradient flow. More precisely, we consider shallow networks in which biases of the neurons are frozen at random initialization. We show that the corresponding optimization landscape is benign, which in turn leads to generalization guarantees that match the near-optimal sample complexity of dedicated semi-parametric methods.
Submitted 27 October, 2022;
originally announced October 2022.
-
Kemeny's constant and Wiener index on trees
Authors:
Jihyeug Jang,
Sooyeong Kim,
Minho Song
Abstract:
On trees of fixed order, we show a direct relation between Kemeny's constant and Wiener index, and provide a new formula of Kemeny's constant from the relation with a combinatorial interpretation. Moreover, the relation simplifies proofs of several known results for extremal trees in terms of Kemeny's constant for random walks on trees. Finally, we provide various families of co-Kemeny's mates, which are two non-isomorphic connected graphs with the same Kemeny's constant, and we also give a necessary condition for a tree to attain maximum Kemeny's constant for trees with fixed diameter.
Submitted 22 September, 2022;
originally announced September 2022.
-
Learning-Augmented Algorithms for Online Linear and Semidefinite Programming
Authors:
Elena Grigorescu,
Young-San Lin,
Sandeep Silwal,
Maoyuan Song,
Samson Zhou
Abstract:
Semidefinite programming (SDP) is a unifying framework that generalizes both linear programming and quadratically-constrained quadratic programming, while also yielding efficient solvers, both in theory and in practice. However, there exist known impossibility results for approximating the optimal solution when constraints for covering SDPs arrive in an online fashion. In this paper, we study online covering linear and semidefinite programs in which the algorithm is augmented with advice from a possibly erroneous predictor. We show that if the predictor is accurate, we can efficiently bypass these impossibility results and achieve a constant-factor approximation to the optimal solution, i.e., consistency. On the other hand, if the predictor is inaccurate, under some technical conditions, we achieve results that match both the classical optimal upper bounds and the tight lower bounds up to constant factors, i.e., robustness.
More broadly, we introduce a framework that extends both (1) the online set cover problem augmented with machine-learning predictors, studied by Bamas, Maggiori, and Svensson (NeurIPS 2020), and (2) the online covering SDP problem, initiated by Elad, Kale, and Naor (ICALP 2016). Specifically, we obtain general online learning-augmented algorithms for covering linear programs with fractional advice and constraints, and initiate the study of learning-augmented algorithms for covering SDP problems.
Our techniques are based on the primal-dual framework of Buchbinder and Naor (Mathematics of Operations Research, 34, 2009) and can be further adjusted to handle constraints where the variables lie in a bounded region, i.e., box constraints.
Submitted 21 October, 2022; v1 submitted 21 September, 2022;
originally announced September 2022.
-
Infinite-Dimensional Stochastic Transforms and Reproducing Kernel Hilbert space
Authors:
Palle E. T. Jorgensen,
Myung-Sin Song,
James Feng Tian
Abstract:
By way of concrete presentations, we construct two infinite-dimensional transforms at the crossroads of Gaussian fields and reproducing kernel Hilbert spaces (RKHS), thus leading to a new infinite-dimensional Fourier transform in a general setting of Gaussian processes. Our results serve to unify existing tools from infinite-dimensional analysis.
Submitted 29 March, 2023; v1 submitted 8 September, 2022;
originally announced September 2022.
-
The rate of Lp-convergence for the Euler-Maruyama method of the stochastic differential equations with Markovian switching
Authors:
Minghui Song,
Yuhang Zhang,
Mingzhu Liu
Abstract:
This work deals with the Euler-Maruyama (EM) scheme for stochastic differential equations with Markovian switching (SDEwMSs). We focus on the Lp-convergence rate (p is greater than or equal to 2) of the EM method given in this paper. As far as we know, the skeleton process of the Markov chain is used in the continuous numerical methods in most papers. By contrast, the continuous EM method in this paper is to use the Markov chain directly. To the best of our knowledge, there are only two papers that consider the rate of Lp-convergence, which is no more than 1/p (p is greater than or equal to 2) in these papers. The contribution of this paper is that the rate of Lp-convergence of the EM method can reach 1/2. We believe that the technique used in this paper to construct the EM method can also be used to construct other methods for SDEwMSs.
Submitted 28 August, 2022;
originally announced August 2022.
-
Orthonormal Strichartz inequalities for the $(k, a)$-generalized Laguerre operator and Dunkl operator
Authors:
Shyam Swarup Mondal,
Manli Song
Abstract:
Let $Δ_{k,a}$ and $Δ_k $ be the $(k,a)$-generalized Laguerre operator and the Dunkl Laplacian operator on $\mathbb{R}^n$, respectively. The aim of this article is twofold. First, we prove a restriction theorem for the Fourier-$Δ_{k,a}$ transform. Next, as an application of the restriction problem, we establish Strichartz estimates for orthonormal families of initial data for the Schrödinger propagator $e^{-i t Δ_{k, a}} $ associated with the operator $ Δ_{k, a}$. Further, using the classical Strichartz estimates for the free Schrödinger propagator $e^{-i t Δ_{k, a}} $ for orthonormal systems of initial data and the kernel relation between the semigroups $e^{-i t Δ_{k, a}}$ and $e^{i \frac{t}{a}\|x\|^{2-a} Δ_{k}},$ we prove Strichartz estimates for orthonormal systems of initial data associated with the Dunkl operator $ Δ_k $ on $\mathbb{R}^n$. Finally, we present some applications to our aforementioned results.
Submitted 25 August, 2022;
originally announced August 2022.
-
Strictly monotone sequences of lower and upper bounds on Perron values and their combinatorial applications
Authors:
Sooyeong Kim,
Minho Song
Abstract:
In this paper, we present monotone sequences of lower and upper bounds on the Perron value of a nonnegative matrix, and we study their strict monotonicity. Using those sequences, we provide two combinatorial applications. One is to improve bounds on Perron values of rooted trees in combinatorial settings, in order to find characteristic sets of trees. The other is to generate log-concave and log-convex sequences through the monotone sequences.
Submitted 13 July, 2022;
originally announced July 2022.
-
On globally solving nonconvex trust region subproblem via projected gradient method
Authors:
Mengmeng Song,
Yong Xia,
Jinyang Zheng
Abstract:
The trust region subproblem (TRS) is to minimize a possibly nonconvex quadratic function over a Euclidean ball. There are typically two cases for (TRS), the so-called ``easy case'' and ``hard case''. Even in the ``easy case'', the sequence generated by the classical projected gradient method (PG) may converge to a saddle point at a sublinear local rate, when the initial point is arbitrarily selected from a nonzero measure feasible set. To our surprise, when applying (PG) to solve a cheap and possibly nonconvex reformulation of (TRS), the generated sequence initialized with {\it any} feasible point almost always converges to its global minimizer. The local convergence rate is at least linear for the ``easy case'', without assuming that we have possessed the information that the ``easy case'' holds. We also consider how to use (PG) to globally solve equality-constrained (TRS).
Submitted 12 July, 2022;
originally announced July 2022.
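As a point of reference for this abstract, here is a minimal sketch of the classical projected gradient iteration on a trust region subproblem over the unit ball. It is applied to a small convex instance chosen so the answer can be checked by hand; the function names, step size, and data are assumptions, and the sketch does not implement the nonconvex reformulation studied in the paper.

```python
import math


def project_to_unit_ball(x):
    """Euclidean projection onto {x : ||x|| <= 1}."""
    norm = math.sqrt(sum(v * v for v in x))
    return list(x) if norm <= 1.0 else [v / norm for v in x]


def projected_gradient_trs(A, b, x0, step, iterations):
    """Classical projected gradient for min 0.5 x^T A x + b^T x subject to ||x|| <= 1."""
    x = project_to_unit_ball(x0)
    n = len(x)
    for _ in range(iterations):
        # Gradient of the quadratic objective: A x + b.
        grad = [sum(A[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]
        x = project_to_unit_ball([x[i] - step * grad[i] for i in range(n)])
    return x


# Convex toy instance (assumed): the unconstrained minimizer (4, 0) lies
# outside the ball, so the constrained minimizer is the boundary point (1, 0).
A = [[1.0, 0.0], [0.0, 2.0]]
b = [-4.0, 0.0]
x_star = projected_gradient_trs(A, b, [0.0, 0.5], step=0.1, iterations=200)
```

On this instance the second coordinate contracts by a factor of at most 0.8 per step while the first is pushed to the boundary, so the iterates settle at (1, 0).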
-
Restriction theorems and Strichartz inequalities for the Laguerre operator involving orthonormal functions
Authors:
Guoxia Feng,
Manli Song
Abstract:
In this paper, we prove restriction theorems for the Fourier-Laguerre transform and establish Strichartz estimates for the Schrödinger propagator $e^{-itL_α}$ for the Laguerre operator $L_α=-Δ-\sum_{j=1}^{n}(\dfrac{2α_j+1}{x_j}\dfrac{\partial}{\partial x_j})+\dfrac{|x|^2}{4}$, $α=(α_1,α_2,\cdots,α_n)\in{(-\frac{1}{2},\infty)^n}$ on $\mathbb{R}_+^n$ involving systems of orthonormal functions.
Submitted 19 May, 2022;
originally announced May 2022.
-
Decay estimates for a class of wave equations on the Heisenberg group
Authors:
Manli Song,
Jiale Yang
Abstract:
In this paper, we study a class of dispersive wave equations on the Heisenberg group $H^n$. Based on the group Fourier transform on $H^n$, the properties of the Laguerre functions and the stationary phase lemma, we establish decay estimates for a class of dispersive semigroups on $H^n$ given by $e^{itφ(\mathcal{L})}$, where $φ: \mathbb{R}^+ \to \mathbb{R}$ is smooth, and $\mathcal{L}$ is the sub-Laplacian on $H^n$. Finally, using duality arguments, we apply the obtained results to derive Strichartz inequalities for the solutions of some specific equations, such as the fractional Schrödinger equation, the fractional wave equation and the fourth-order Schrödinger equation.
Submitted 20 May, 2022; v1 submitted 9 May, 2022;
originally announced May 2022.
-
Negative moments of orthogonal polynomials
Authors:
Jihyeug Jang,
Donghyun Kim,
Jang Soo Kim,
Minho Song,
U-Keun Song
Abstract:
If a sequence indexed by nonnegative integers satisfies a linear recurrence without constant terms, one can extend the indices of the sequence to negative integers using the recurrence. Recently, Cigler and Krattenthaler showed that the negative version of the number of bounded Dyck paths is the number of bounded alternating sequences. In this paper we provide two methods to compute the negative versions of sequences related to moments of orthogonal polynomials. We give a combinatorial model for the negative version of the number of bounded Motzkin paths. We also prove two conjectures of Cigler and Krattenthaler on reciprocity between determinants.
Submitted 7 March, 2023; v1 submitted 27 January, 2022;
originally announced January 2022.
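The first sentence of the abstract above describes a mechanism that is easy to see on a small case. The sketch below is an illustration only and is not taken from the paper: it assumes a recurrence of the form a_n = c_1 a_{n-1} + ... + c_k a_{n-k} with c_k nonzero, and the function name extend_recurrence and its arguments are hypothetical. Running the recurrence backwards means solving it for a_n in terms of the k terms that follow it.

```python
from fractions import Fraction


def extend_recurrence(coeffs, initial, lo, hi):
    """Return {n: a_n} for lo <= n <= hi.

    The sequence satisfies a_n = coeffs[0]*a_{n-1} + ... + coeffs[k-1]*a_{n-k}
    with initial = [a_0, ..., a_{k-1}]. coeffs[-1] must be nonzero so that the
    recurrence can be solved for the oldest term.
    """
    k = len(coeffs)
    if coeffs[-1] == 0:
        raise ValueError("last coefficient must be nonzero to go backwards")
    a = {i: Fraction(v) for i, v in enumerate(initial)}
    # Forward direction: the usual use of the recurrence.
    for n in range(k, hi + 1):
        a[n] = sum(coeffs[j] * a[n - 1 - j] for j in range(k))
    # Backward direction: a_{n+k} = c_1 a_{n+k-1} + ... + c_k a_n, solved for a_n.
    for n in range(-1, lo - 1, -1):
        known = sum(coeffs[j] * a[n + k - 1 - j] for j in range(k - 1))
        a[n] = (a[n + k] - known) / coeffs[-1]
    return {n: a[n] for n in range(lo, hi + 1)}


# Fibonacci numbers: a_n = a_{n-1} + a_{n-2}, a_0 = 0, a_1 = 1.
fib = extend_recurrence([1, 1], [0, 1], -6, 6)
```

For the Fibonacci numbers this gives the values 1, -1, 2, -3 at n = -1, -2, -3, -4, which agrees with the classical identity F_{-n} = (-1)^{n+1} F_n. Exact rational arithmetic is used because dividing by the last coefficient need not stay in the integers in general.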
-
Lattice-Based Methods Surpass Sum-of-Squares in Clustering
Authors:
Ilias Zadik,
Min Jae Song,
Alexander S. Wein,
Joan Bruna
Abstract:
Clustering is a fundamental primitive in unsupervised learning which gives rise to a rich class of computationally-challenging inference tasks. In this work, we focus on the canonical task of clustering d-dimensional Gaussian mixtures with unknown (and possibly degenerate) covariance. Recent works (Ghosh et al. '20; Mao, Wein '21; Davis, Diaz, Wang '21) have established lower bounds against the class of low-degree polynomial methods and the sum-of-squares (SoS) hierarchy for recovering certain hidden structures planted in Gaussian clustering instances. Prior work on many similar inference tasks portends that such lower bounds strongly suggest the presence of an inherent statistical-to-computational gap for clustering, that is, a parameter regime where the clustering task is statistically possible but no polynomial-time algorithm succeeds.
One special case of the clustering task we consider is equivalent to the problem of finding a planted hypercube vector in an otherwise random subspace. We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap, even though the aforementioned low-degree and SoS lower bounds continue to apply in this case. To achieve this, we give a polynomial-time algorithm based on the Lenstra--Lenstra--Lovász lattice basis reduction method which achieves the statistically-optimal sample complexity of d+1 samples. This result extends the class of problems whose conjectured statistical-to-computational gaps can be "closed" by "brittle" polynomial-time algorithms, highlighting the crucial but subtle role of noise in the onset of statistical-to-computational gaps.
Submitted 7 January, 2022; v1 submitted 7 December, 2021;
originally announced December 2021.
-
Local Optimality Conditions for a Class of Hidden Convex Optimization
Authors:
Mengmeng Song,
Yong Xia,
Hongying Liu
Abstract:
Hidden convex optimization is a class of nonconvex optimization problems that can be globally solved in polynomial time via equivalent convex programming reformulations. In this paper, we focus on checking local optimality in hidden convex optimization. We first introduce a class of hidden convex optimization problems by joining the classical nonconvex trust-region subproblem (TRS) with convex optimization (CO), and then present a comprehensive study of local optimality conditions. In order to guarantee the existence of a necessary and sufficient condition for local optimality, we need more restrictive assumptions. To our surprise, while (TRS) has at most one local non-global minimizer and (CO) has no local non-global minimizer, their joint problem could have more than one local non-global minimizer.
Submitted 7 September, 2021;
originally announced September 2021.
-
On local minimizers of generalized trust-region subproblem
Authors:
Jiulin Wang,
Mengmeng Song,
Yong Xia
Abstract:
The generalized trust-region subproblem (GT) is a nonconvex quadratic optimization problem with a single quadratic constraint. It reduces to the classical trust-region subproblem (T) if the constraint set is a Euclidean ball. (GT) is polynomially solvable based on its inherent hidden convexity. In this paper, we study local minimizers of (GT). Unlike (T), which has at most one local nonglobal minimizer, we can prove that two-dimensional (GT) has at most two local nonglobal minimizers, a bound that is shown by example to be attainable. The main contribution of this paper is to prove that, at any local nonglobal minimizer of (GT), not only does the strict complementarity condition hold, but the standard second-order sufficient optimality condition also remains necessary. As a corollary, finding all local nonglobal minimizers of (GT) or proving their nonexistence can be done in polynomial time. Finally, for (GT) in the complex domain, we prove that there is no local nonglobal minimizer, which demonstrates that a real-valued optimization problem may be more difficult to solve than its complex version.
Submitted 13 September, 2021; v1 submitted 31 August, 2021;
originally announced August 2021.
-
Polyak's convexity theorem, Yuan's lemma and S-lemma: extensions and applications
Authors:
Mengmeng Song,
Yong Xia
Abstract:
We extend Polyak's theorem on the convexity of the joint numerical range from three to any number of quadratic forms, on condition that they can be generated by three quadratic forms with a positive definite linear combination. Our new result covers the fundamental Dines's theorem. As applications, we further extend Yuan's lemma and the S-lemma, respectively. Our extended Yuan's lemma is used to build a more general assumption than that of Haeser (J. Optim. Theory Appl. 174(3): 641-649, 2017), under which the standard second-order necessary optimality condition holds at a local minimizer. The extended S-lemma reveals strong duality for the homogeneous quadratic optimization problem with two bilateral quadratic constraints.
Submitted 19 August, 2021;
originally announced August 2021.
-
Trust-region and $p$-regularized subproblems: local nonglobal minimum is the second smallest objective function value among all first-order stationary points
Authors:
Jiulin Wang,
Mengmeng Song,
Yong Xia
Abstract:
The local nonglobal minimizer of the trust-region subproblem, if it exists, is shown to have the second smallest objective function value among all KKT points. This new property is extended to the $p$-regularized subproblem. As a corollary, we show for the first time that finding the local nonglobal minimizer of the Nesterov-Polyak subproblem corresponds to a generalized eigenvalue problem.
Submitted 17 August, 2021;
originally announced August 2021.
-
On Local Minimizers of Quadratically Constrained Nonconvex Homogeneous Quadratic Optimization with at Most Two Constraints
Authors:
Mengmeng Song,
Hongying Liu,
Yong Xia
Abstract:
We study nonconvex homogeneous quadratically constrained quadratic optimization with one or two constraints, denoted by (QQ1) and (QQ2), respectively. (QQ2) contains (QQ1), the trust region subproblem (TRS) and the ellipsoid regularized total least squares problem as special cases. It is known that there is a necessary and sufficient optimality condition for the global minimizer of (QQ2). In this paper, we first show that any local minimizer of (QQ1) is globally optimal. Unlike its special case (TRS), which has at most one local non-global minimizer, (QQ2) may have infinitely many local non-global minimizers. At any local non-global minimizer of (QQ2), both the linear independence constraint qualification and the strict complementarity condition hold, and the Hessian of the Lagrangian has exactly one negative eigenvalue. As a main contribution, we prove that the standard second-order sufficient optimality condition for any strict local non-global minimizer of (QQ2) remains necessary. Applications and the impossibility of further extension are discussed.
Submitted 20 July, 2021; v1 submitted 12 July, 2021;
originally announced July 2021.
-
On the Cryptographic Hardness of Learning Single Periodic Neurons
Authors:
Min Jae Song,
Ilias Zadik,
Joan Bruna
Abstract:
We show a simple reduction which demonstrates the cryptographic hardness of learning a single periodic neuron over isotropic Gaussian distributions in the presence of noise. More precisely, our reduction shows that any polynomial-time algorithm (not necessarily gradient-based) for learning such functions under small noise implies a polynomial-time quantum algorithm for solving worst-case lattice problems, whose hardness forms the foundation of lattice-based cryptography. Our core hard family of functions, which are well-approximated by one-layer neural networks, takes the general form of a univariate periodic function applied to an affine projection of the data. These functions have appeared in previous seminal works which demonstrate their hardness against gradient-based (Shamir'18) and Statistical Query (SQ) algorithms (Song et al.'17). We show that if (polynomially) small noise is added to the labels, the intractability of learning these functions applies to all polynomial-time algorithms, beyond gradient-based and SQ algorithms, under the aforementioned cryptographic assumptions. Moreover, we demonstrate the necessity of noise in the hardness result by designing a polynomial-time algorithm for learning certain families of such functions under exponentially small adversarial noise. Our proposed algorithm is not a gradient-based or an SQ algorithm, but is rather based on the celebrated Lenstra-Lenstra-Lovász (LLL) lattice basis reduction algorithm. Furthermore, in the absence of noise, this algorithm can be directly applied to solve CLWE detection (Bruna et al.'21) and phase retrieval with an optimal sample complexity of $d+1$ samples. In the former case, this improves upon the quadratic-in-$d$ sample complexity required in (Bruna et al.'21).
Submitted 16 September, 2021; v1 submitted 20 June, 2021;
originally announced June 2021.