-
Pogorelov type interior $C^2$ estimate for Hessian quotient equation and its application
Authors:
Siyuan Lu,
Yi-Lin Tsai
Abstract:
In this paper, we derive a Pogorelov type interior $C^2$ estimate for the Hessian quotient equation $\frac{σ_n}{σ_k}\left( D^2u\right) =f$. As an application, we show that convex viscosity solutions are regular for $k\leq n-3$ if $u\in C^{1,α}$ with $α>1-\frac{2}{n-k}$ or $u\in W^{2,p}$ with $p\geq\frac{(n-1)(n-k)}{2}$. Both exponents are sharp in view of the example in arXiv:2401.12229.
Submitted 15 May, 2025;
originally announced May 2025.
-
Preasymptotic error estimates of higher-order EEM for the time-harmonic Maxwell equations with large wave number
Authors:
Shuaishuai Lu,
Haijun Wu
Abstract:
The time-harmonic Maxwell equations with impedance boundary condition and large wave number are discretized using the second-type Nédélec's edge element method (EEM). Preasymptotic error bounds are derived, showing that, under the mesh condition $κ^{2p+1}h^{2p}$ being sufficiently small, the error of the EEM of order $p$ in the energy norm is bounded by $\mathcal{O}\big(κ^{p}h^p + κ^{2p+1}h^{2p}\big)$, while the error in the $κ$-scaled $\boldsymbol{L}^2$ norm is bounded by $\mathcal{O}\big((κh)^{p+1} + κ^{2p+1} h^{2p}\big)$. Here, $κ$ is the wave number and $h$ is the mesh size. Numerical tests are provided to illustrate our theoretical results.
Submitted 28 April, 2025;
originally announced April 2025.
-
A Class of Optimal Directed Graphs for Network Synchronization
Authors:
Susie Lu,
Ji Liu
Abstract:
In a paper by Nishikawa and Motter, a quantity called the normalized spread of the Laplacian eigenvalues is used to measure the synchronizability of certain network dynamics. Through simulations, and without theoretical validation, it is conjectured that among all simple directed graphs with a fixed number of vertices and arcs, the optimal value of this quantity is achieved if the Laplacian spectrum satisfies a specific pattern. This paper proves that the conjectured Laplacian spectrum is always achievable by a class of almost regular directed graphs. For a few special cases, it is also shown that the corresponding value of the quantity is indeed optimal.
Submitted 30 March, 2025;
originally announced March 2025.
-
Stability for an inverse flux and an inverse boundary coefficient problems
Authors:
Mourad Choulli,
Shuai Lu,
Hiroshi Takase
Abstract:
We establish both Lipschitz and logarithmic stability estimates for an inverse flux problem and subsequently apply these results to an inverse boundary coefficient problem. Furthermore, we demonstrate how the stability inequalities derived for the inverse boundary coefficient problem can be utilized in solving an inverse corrosion problem. This involves determining the unknown corrosion coefficient on an inaccessible part of the boundary based on measurements taken on the accessible part of the boundary.
Submitted 21 February, 2025;
originally announced February 2025.
-
Stochastic tamed 3D Navier-Stokes equations with locally weak monotonicity coefficients: existence, uniqueness and averaging principle
Authors:
Shuaishuai Lu,
Xue Yang,
Yong Li
Abstract:
This paper investigates the stochastic tamed 3D Navier-Stokes equations with locally weak monotonicity coefficients in the whole space as well as in the three-dimensional torus, which play a crucial role in turbulent flows analysis. A significant issue is addressed in this work, specifically, the reduced regularity of the coefficients and the inapplicability of Gronwall's lemma complicates the establishment of pathwise uniqueness for weak solutions. Initially, the existence of a martingale solution for the system is established via Galerkin approximation; thereafter, the pathwise uniqueness of this martingale solution is confirmed by constructing a specialized control function. Ultimately, the Yamada-Watanabe theorem is employed to establish the existence and uniqueness of the strong solution to the system. Furthermore, an averaging principle, referred to as the first Bogolyubov theorem, is established for stochastic tamed 3D Navier-Stokes equations with highly oscillating components, where the coefficients satisfy the assumptions of linear growth and locally weak monotonicity. This result is achieved using classical Khasminskii time discretization, which illustrates the convergence of the solution from the original Cauchy problem to the averaged equation over a finite interval [0, T].
Submitted 19 February, 2025;
originally announced February 2025.
-
Interpolation and inverse problems in spectral Barron spaces
Authors:
Shuai Lu,
Peter Mathé
Abstract:
Spectral Barron spaces, which quantify the absolute value of weighted Fourier coefficients of a function, have gained considerable attention due to their capability for universal approximation across certain function classes. By establishing a connection between these spaces and a specific positive linear operator, we investigate the interpolation and scaling relationships among diverse spectral Barron spaces. Furthermore, we introduce a link condition by relating the spectral Barron space to inverse problems, illustrating this with three exemplary cases. We revisit the notion of universal approximation within the context of spectral Barron spaces and validate an error bound for Tikhonov regularization, penalized by the spectral Barron norm.
Submitted 6 February, 2025;
originally announced February 2025.
-
The moduli space of HCMU surfaces
Authors:
Sicheng Lu,
Bin Xu
Abstract:
HCMU surfaces are compact Riemann surfaces equipped with an extremal Kähler metric and a finite number of singularities. Research on these surfaces was initiated by E. Calabi and X.-X. Chen over thirty years ago. We provide a detailed description of the geometric structure of HCMU surfaces, building on the classical football decomposition introduced by Chen-Chen-Wu. From this perspective, most HCMU surfaces can be uniquely described by a set of data that includes both discrete topological information and continuous geometric parameters. This data representation is effective for studying the moduli space of HCMU surfaces with specified genus and conical angles, suggesting a topological approach to this topic. As a first application, we present a unified proof of the angle constraints on HCMU surfaces. Using the same approach, we establish an existence theorem for HCMU surfaces of any genus with a single conical point, which is also a saddle point. Finally, we determine the dimension of the moduli space, defined as the number of independent continuous parameters. This is achieved by examining several geometric deformations of HCMU surfaces and the various relationships between the quantities in the data set representation.
Submitted 1 January, 2025;
originally announced January 2025.
-
Stable inversion of potential in nonlinear wave equations with cubic nonlinearity
Authors:
Xi Chen,
Shuai Lu,
Ruochong Zhang
Abstract:
This paper investigates inverse potential problems of wave equations with cubic nonlinearity. We develop a methodology for establishing stability estimates for inversion of lower order coefficients. The new ingredients of our approach include trilinear approximations of nonlinear response operators, symbol estimates of distorted plane waves, and lower order symbol calculus.
Submitted 20 January, 2025; v1 submitted 1 January, 2025;
originally announced January 2025.
-
Dynamical Behaviors of the Gradient Flows for In-Context Learning
Authors:
Songtao Lu,
Yingdong Lu,
Tomasz Nowicki
Abstract:
We derive the system of differential equations for the gradient flow characterizing the training process of linear in-context learning in full generality. Next, we explore the geometric structure of the gradient flows in two instances, including identifying its invariants, optimum, and saddle points. This understanding allows us to quantify the behavior of the two gradient flows under the full generality of parameters and data.
Submitted 21 December, 2024;
originally announced December 2024.
-
On primality and atomicity of numerical power monoids
Authors:
Anay Aggarwal,
Felix Gotti,
Susie Lu
Abstract:
In the first part of this paper, we establish a variation of a recent result by Bienvenu and Geroldinger on the (almost) non-existence of absolute irreducibles in (restricted) power monoids of numerical monoids: we argue the (almost) non-existence of primal elements in the same class of power monoids. The second part of this paper, devoted to the study of the atomic density of $\mathcal{P}_{\text{fin}, 0}(\mathbb{N}_0)$, is motivated by work of Shitov, a recent paper by Bienvenu and Geroldinger, and some questions pointed out by Geroldinger and Tringali. There, we study atomic density through the lens of the natural partition $\{ \mathcal{A}_{n,k} : k \in \mathbb{N}_0\}$ of $\mathcal{A}_n$, the set of atoms of $\mathcal{P}_{\text{fin}, 0}(\mathbb{N}_0)$ with maximum at most $n$: \[ \mathcal{A}_{n,k} = \{A \in \mathcal{A} : \max A \le n \text{ and } |A| = k\} \] for all $n,k \in \mathbb{N}$, where $\mathcal{A}$ is the set of atoms of $\mathcal{P}_{\text{fin}, 0}(\mathbb{N}_0)$. We pay special attention to the sequence $(α_{n,k})_{n,k \ge 1}$, where $α_{n,k}$ denotes the size of the block $\mathcal{A}_{n,k}$. First, we establish some bounds and provide some asymptotic results for $(α_{n,k})_{n,k \ge 1}$. Then, we take a probabilistic approach to argue that, for each $n \in \mathbb{N}$, the sequence $(α_{n,k})_{k \ge 1}$ is almost unimodal. Finally, for each $n \in \mathbb{N}$, we consider the random variable $X_n : \mathcal{A}_n \to \mathbb{N}_0$ defined by the assignments $X_n : A \mapsto |A|$, whose probability mass function is $\mathbb{P}(X_n=k) = α_{n,k}/| \mathcal{A}_n|$. We conclude by proving that, for each $m \in \mathbb{N}$, the sequence of moments $(\mathbb{E}(X_n^m))_{n \ge 1}$ behaves asymptotically as that of a sequence $(\mathbb{E}(Y_n^m))_{n \ge 1}$, where $Y_n$ is a binomially distributed random variable with parameters $n$ and $\frac12$.
Submitted 8 December, 2024;
originally announced December 2024.
-
A simple proof of curvature estimates for the n-1 Hessian equation
Authors:
Siyuan Lu,
Yi-Lin Tsai
Abstract:
In [Amer. J. Math. 141 (2019), no. 5, 1281-1315], Ren and Wang proved the curvature estimates for the $n-1$ curvature equation. The purpose of this note is to give a simple proof of their theorem.
Submitted 30 November, 2024;
originally announced December 2024.
-
SPARKLE: A Unified Single-Loop Primal-Dual Framework for Decentralized Bilevel Optimization
Authors:
Shuchen Zhu,
Boao Kong,
Songtao Lu,
Xinmeng Huang,
Kun Yuan
Abstract:
This paper studies decentralized bilevel optimization, in which multiple agents collaborate to solve problems involving nested optimization structures with neighborhood communications. Most existing literature primarily utilizes gradient tracking to mitigate the influence of data heterogeneity, without exploring other well-known heterogeneity-correction techniques such as EXTRA or Exact Diffusion. Additionally, these studies often employ identical decentralized strategies for both upper- and lower-level problems, neglecting to leverage distinct mechanisms across different levels. To address these limitations, this paper proposes SPARKLE, a unified Single-loop Primal-dual AlgoRithm frameworK for decentraLized bilEvel optimization. SPARKLE offers the flexibility to incorporate various heterogeneity-correction strategies into the algorithm. Moreover, SPARKLE allows for different strategies to solve upper- and lower-level problems. We present a unified convergence analysis for SPARKLE, applicable to all its variants, with state-of-the-art convergence rates compared to existing decentralized bilevel algorithms. Our results further reveal that EXTRA and Exact Diffusion are more suitable for decentralized bilevel optimization, and using mixed strategies in bilevel algorithms brings more benefits than relying solely on gradient tracking.
Submitted 17 December, 2024; v1 submitted 21 November, 2024;
originally announced November 2024.
-
FADAS: Towards Federated Adaptive Asynchronous Optimization
Authors:
Yujia Wang,
Shiqiang Wang,
Songtao Lu,
Jinghui Chen
Abstract:
Federated learning (FL) has emerged as a widely adopted training paradigm for privacy-preserving machine learning. While the SGD-based FL algorithms have demonstrated considerable success in the past, there is a growing trend towards adopting adaptive federated optimization methods, particularly for training large-scale models. However, the conventional synchronous aggregation design poses a significant challenge to the practical deployment of those adaptive federated optimization methods, particularly in the presence of straggler clients. To fill this research gap, this paper introduces federated adaptive asynchronous optimization, named FADAS, a novel method that incorporates asynchronous updates into adaptive federated optimization with provable guarantees. To further enhance the efficiency and resilience of our proposed method in scenarios with significant asynchronous delays, we also extend FADAS with a delay-adaptive learning adjustment strategy. We rigorously establish the convergence rate of the proposed algorithms and empirical results demonstrate the superior performance of FADAS over other asynchronous FL baselines.
Submitted 25 July, 2024;
originally announced July 2024.
-
Convergences of Combinatorial Ricci Flows to Degenerated Circle Packings in Hyperbolic Background Geometry
Authors:
Guangming Hu,
Sicheng Lu,
Dong Tan,
Youliang Zhong,
Puchun Zhou
Abstract:
This paper investigates a kind of degenerated circle packings in hyperbolic background geometry. A main problem is whether a prescribed total geodesic curvature data can be realized by a degenerated circle packing or not. We fully characterize the sufficient and necessary conditions and show the uniqueness. Furthermore, we introduce the combinatorial Ricci flow to find the desired degenerated circle packed surface, analogous to the methods of Chow-Luo and Takatsu.
Submitted 11 July, 2024;
originally announced July 2024.
-
Preasymptotic error estimates of EEM and CIP-EEM for the time-harmonic Maxwell equations with large wave number
Authors:
Shuaishuai Lu,
Haijun Wu
Abstract:
Preasymptotic error estimates are derived for the linear edge element method (EEM) and the linear $\boldsymbol{H}(\boldsymbol{\mathrm{curl}})$-conforming interior penalty edge element method (CIP-EEM) for the time-harmonic Maxwell equations with large wave number. It is shown that under the mesh condition that $κ^3 h^2$ is sufficiently small, the errors of the solutions to both methods are bounded by $\mathcal{O} (κh + κ^3 h^2 )$ in the energy norm and $\mathcal{O} (κh^2 + κ^2 h^2 )$ in the $\boldsymbol{L}^2$ norm, where $κ$ is the wave number and $h$ is the mesh size. Numerical tests are provided to verify our theoretical results and to illustrate the potential of CIP-EEM in significantly reducing the pollution effect.
Submitted 9 July, 2024;
originally announced July 2024.
-
Function and derivative approximation by shallow neural networks
Authors:
Yuanyuan Li,
Shuai Lu
Abstract:
We investigate a Tikhonov regularization scheme specifically tailored for shallow neural networks within the context of solving a classic inverse problem: approximating an unknown function and its derivatives within a unit cubic domain based on noisy measurements. The proposed Tikhonov regularization scheme incorporates a penalty term that takes three distinct yet intricately related network (semi)norms: the extended Barron norm, the variation norm, and the Radon-BV seminorm. These choices of the penalty term are contingent upon the specific architecture of the neural network being utilized. We establish the connection between various network norms and particularly trace the dependence of the dimensionality index, aiming to deepen our understanding of how these norms interplay with each other. We revisit the universality of function approximation through various norms, establish rigorous error-bound analysis for the Tikhonov regularization scheme, and explicitly elucidate the dependency of the dimensionality index, providing a clearer understanding of how the dimensionality affects the approximation performance and how one designs a neural network with diverse approximating tasks.
Submitted 14 December, 2024; v1 submitted 6 July, 2024;
originally announced July 2024.
-
The weak averaging principle of stochastic functional partial differential equations with Hölder continuous coefficients and infinite delay
Authors:
Shuaishuai Lu,
Xue Yang,
Yong Li
Abstract:
In this paper, we establish the weak averaging principle for stochastic functional partial differential equations (in short, SFPDEs) with Hölder continuous coefficients and infinite delay by a new generalized coupling approach. Firstly, we rigorously establish the existence and uniqueness of weak solutions for a specific class of finite-dimensional systems by the generalized coupling approach. Then we extend these results to their infinite-dimensional counterparts using the variational approach and Galerkin projection technique. Subsequently, we establish the averaging principle for SFPDEs with infinite delay in the weak sense, i.e., we prove that the solution of the original system converges in law to that of the averaged system on a finite interval $[0,T]$ as the small parameter $\varepsilon\to 0$. To illustrate our findings, we present two applications: stochastic generalized porous media equations and stochastic reaction-diffusion equations.
Submitted 28 March, 2025; v1 submitted 5 July, 2024;
originally announced July 2024.
-
Forward and backward problems for coupled subdiffusion systems
Authors:
Dian Feng,
Yikan Liu,
Shuai Lu
Abstract:
In this article, we investigate both forward and backward problems for coupled systems of time-fractional diffusion equations, encompassing scenarios of strong coupling. For the forward problem, we establish the well-posedness of the system, leveraging the eigensystem of the corresponding elliptic system as the foundation. When considering the backward problem, specifically the determination of initial values through final time observations, we demonstrate a Lipschitz stability estimate, which is consistent with the stability observed in the case of a single equation. To numerically address this backward problem, we refer to the explicit formulation of Tikhonov regularization to devise a multi-channel neural network architecture. This innovative architecture offers a versatile approach, exhibiting its efficacy in multidimensional settings through numerical examples and its robustness in handling initial values that have not been trained.
Submitted 3 February, 2025; v1 submitted 30 June, 2024;
originally announced July 2024.
-
Efficient Hardware Accelerator Based on Medium Granularity Dataflow for SpTRSV
Authors:
Qian Chen,
Xiaofeng Yang,
Shengli Lu
Abstract:
Sparse triangular solve (SpTRSV) is widely used in various domains. Numerous studies have been conducted using CPUs, GPUs, and specific hardware accelerators, where dataflows can be categorized into coarse and fine granularity. Coarse dataflows offer good spatial locality but suffer from low parallelism, while fine dataflows provide high parallelism but disrupt the spatial structure, leading to increased nodes and poor data reuse. This paper proposes a novel hardware accelerator for SpTRSV or SpTRSV-like DAGs. The accelerator implements a medium granularity dataflow through hardware-software codesign and achieves both excellent spatial locality and high parallelism. Additionally, a partial sum caching mechanism is introduced to reduce the blocking frequency of processing elements (PEs), and a reordering algorithm of intra-node edges computation is developed to enhance data reuse. Experimental results on 245 benchmarks with node counts reaching up to 85,392 demonstrate that this work achieves average performance improvements of 7.0$\times$ (up to 27.8$\times$) over CPUs and 5.8$\times$ (up to 98.8$\times$) over GPUs. Compared to the state-of-the-art technique (DPU-v2), this work shows a 2.5$\times$ (up to 5.9$\times$) average performance improvement and 1.7$\times$ (up to 4.1$\times$) average energy efficiency enhancement.
Submitted 17 March, 2025; v1 submitted 15 June, 2024;
originally announced June 2024.
-
Distributed Bilevel Optimization with Communication Compression
Authors:
Yutong He,
Jie Hu,
Xinmeng Huang,
Songtao Lu,
Bin Wang,
Kun Yuan
Abstract:
Stochastic bilevel optimization tackles challenges involving nested optimization structures. Its fast-growing scale nowadays necessitates efficient distributed algorithms. In conventional distributed bilevel methods, each worker must transmit full-dimensional stochastic gradients to the server every iteration, leading to significant communication overhead and thus hindering efficiency and scalability. To resolve this issue, we introduce the first family of distributed bilevel algorithms with communication compression. The primary challenge in algorithmic development is mitigating bias in hypergradient estimation caused by the nested structure. We first propose C-SOBA, a simple yet effective approach with unbiased compression and provable linear speedup convergence. However, it relies on strong assumptions on bounded gradients. To address this limitation, we explore the use of moving average, error feedback, and multi-step compression in bilevel optimization, resulting in a series of advanced algorithms with relaxed assumptions and improved convergence properties. Numerical experiments show that our compressed bilevel algorithms can achieve $10\times$ reduction in communication overhead without severe performance degradation.
Submitted 29 May, 2024;
originally announced May 2024.
-
Stochastic functional partial differential equations with monotone coefficients: Poisson stability measures, exponential mixing and limit theorems
Authors:
Shuaishuai Lu,
Xue Yang,
Yong Li
Abstract:
This paper examines Poisson stable (including stationary, periodic, almost periodic, Levitan almost periodic, Bohr almost automorphic, pseudo-periodic, Birkhoff recurrent, pseudo-recurrent, etc.) measures and limit theorems for stochastic functional partial differential equations (SFPDEs) with monotone coefficients. We first show the existence and uniqueness of entrance measure $μ_{t}$ for SFPDEs by dissipative method (or remoting start). Then, with the help of Shcherbakov's comparability method in character of recurrence, we prove that the entrance measure inherits the same recurrence of coefficients. Thirdly, we show the tightness of the set of measures $μ_{t}$. As a result, any sequence of the average of $\{μ_{t}\}_{t\in\mathbb{R} }$ has a limit point $μ^{*}$. Further, we study the uniform exponential mixing of the measure $μ^{*}$ in the sense of Wasserstein metric. Fourthly, under uniform exponential mixing and Markov property, we establish the strong law of large numbers, the central limit theorem and estimate the corresponding rates of convergence for solution maps of SFPDEs. Finally, we give applications of stochastic generalized porous media equations with delay to illustrate our results.
Submitted 11 May, 2024;
originally announced May 2024.
-
McKean-Vlasov SPDEs with coefficients exhibiting locally weak monotonicity: existence, uniqueness, ergodicity, exponential mixing and limit theorems
Authors:
Shuaishuai Lu,
Xue Yang,
Yong Li
Abstract:
This paper investigates the existence and uniqueness of solutions, as well as the ergodicity and exponential mixing to invariant measures, and limit theorems for a class of McKean-Vlasov SPDEs with locally weak monotonicity. In particular, for a class of weak monotonicity conditions, including Hölder continuity, we rigorously establish the existence and uniqueness of weak solutions to McKean-Vlasov SPDEs by employing the Galerkin projection technique and the generalized coupling approach. Additionally, we explore the properties of the solutions, including time homogeneity, the Markov property and the Feller property. Building upon these properties, we examine the exponential ergodicity and mixing of invariant measures under Lyapunov conditions. Finally, within the framework of coefficients meeting the criteria of locally weak monotonicity and Lyapunov conditions, alongside the uniform mixing property of invariant measures, we establish the strong law of large numbers and the central limit theorem for the solution and obtain estimates of corresponding convergence rates.
Submitted 9 March, 2025; v1 submitted 9 May, 2024;
originally announced May 2024.
-
Interval-valued fuzzy soft $β$-covering approximation spaces
Authors:
Shizhan Lu
Abstract:
The concept of interval-valued fuzzy soft $β$-covering approximation spaces (IFS$β$CASs) is introduced to combine the theories of soft sets, rough sets and interval-valued fuzzy sets, and some fundamental propositions concerning interval-valued fuzzy soft $β$-neighborhoods and soft $β$-neighborhoods of IFS$β$CASs are explored. Four kinds of interval-valued fuzzy soft $β$-covering-based fuzzy rough sets are then studied. Finally, the relationships among these four kinds of interval-valued fuzzy soft $β$-covering-based fuzzy rough sets are investigated.
Submitted 3 April, 2024;
originally announced April 2024.
-
Fast Consensus Topology Design via Minimizing Laplacian Energy
Authors:
Susie Lu,
Ji Liu
Abstract:
This paper characterizes the graphical properties of an optimal topology with minimal Laplacian energy under the constraint of fixed numbers of vertices and edges, and devises an algorithm to construct such connected optimal graphs. These constructed graphs possess maximum vertex and edge connectivity, and more importantly, exhibit large algebraic connectivity of an optimal order provided they are not sparse. These properties guarantee fast and resilient consensus processes over these graphs.
Submitted 23 March, 2024;
originally announced March 2024.
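As a rough illustration of the quantities named in the abstract above, the following sketch compares two graphs with the same numbers of vertices and edges. It assumes the Laplacian energy is the sum of squared Laplacian eigenvalues, which is one common definition and may differ from the paper's; the helper names `laplacian`, `laplacian_energy` and `algebraic_connectivity` are hypothetical, and this is not the paper's construction algorithm.

```python
import numpy as np

def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected simple graph."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, j] -= 1.0
        L[j, i] -= 1.0
        L[i, i] += 1.0
        L[j, j] += 1.0
    return L

def laplacian_energy(L):
    """Sum of squared Laplacian eigenvalues (assumed definition)."""
    eig = np.linalg.eigvalsh(L)
    return float(np.sum(eig ** 2))

def algebraic_connectivity(L):
    """Second-smallest Laplacian eigenvalue (Fiedler value)."""
    return float(np.linalg.eigvalsh(L)[1])

# Two graphs on 4 vertices with 4 edges each.
cycle4 = laplacian(4, [(0, 1), (1, 2), (2, 3), (3, 0)])  # regular degrees
paw = laplacian(4, [(0, 1), (1, 2), (2, 0), (2, 3)])     # triangle plus pendant

for name, L in [("cycle", cycle4), ("paw", paw)]:
    print(name, laplacian_energy(L), algebraic_connectivity(L))
```

In this small case the more degree-balanced graph (the cycle) has both the lower energy and the larger algebraic connectivity, which is consistent with the direction of the abstract's claim but does not demonstrate it in general.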
-
Decentralized Bilevel Optimization: A Perspective from Transient Iteration Complexity
Authors:
Boao Kong,
Shuchen Zhu,
Songtao Lu,
Xinmeng Huang,
Kun Yuan
Abstract:
Stochastic bilevel optimization (SBO) is becoming increasingly essential in machine learning due to its versatility in handling nested structures. To address large-scale SBO, decentralized approaches have emerged as effective paradigms in which nodes communicate with immediate neighbors without a central server, thereby improving communication efficiency and enhancing algorithmic robustness. However, most decentralized SBO algorithms focus solely on asymptotic convergence rates, overlooking transient iteration complexity (the number of iterations required before asymptotic rates dominate), which results in limited understanding of the influence of network topology, data heterogeneity, and the nested bilevel algorithmic structures. To address this issue, this paper introduces D-SOBA, a Decentralized Stochastic One-loop Bilevel Algorithm framework. D-SOBA comprises two variants: D-SOBA-SO, which incorporates second-order Hessian and Jacobian matrices, and D-SOBA-FO, which relies entirely on first-order gradients. We provide a comprehensive non-asymptotic convergence analysis and establish the transient iteration complexity of D-SOBA. This provides the first theoretical understanding of how network topology, data heterogeneity, and nested bilevel structures influence decentralized SBO. Extensive experimental results demonstrate the efficiency and theoretical advantages of D-SOBA.
Submitted 31 March, 2025; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Regularization of linear inverse problems with irregular noise using embedding operators
Authors:
Xinyan Li,
Simon Hubmer,
Shuai Lu,
Ronny Ramlau
Abstract:
In this paper, we investigate regularization of linear inverse problems with irregular noise. In particular, we consider the case that the noise can be preprocessed by certain adjoint embedding operators. By introducing the consequent preprocessed problem, we provide convergence analysis for general regularization schemes under standard assumptions. Furthermore, for a special case of Tikhonov regularization in Computerized Tomography, we show that our approach leads to a novel (Fourier-based) filtered backprojection algorithm. Numerical examples with different parameter choice rules verify the efficiency of our proposed algorithm.
Submitted 29 January, 2024;
originally announced January 2024.
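For readers unfamiliar with the Tikhonov regularization mentioned in the abstract above, here is a minimal finite-dimensional sketch. It solves the standard penalized least-squares problem and does not implement the paper's embedding-operator preprocessing or its filtered backprojection algorithm; the function name `tikhonov` and the toy operator are assumptions made for illustration.

```python
import numpy as np

def tikhonov(A, y, alpha):
    """Minimize ||A x - y||^2 + alpha * ||x||^2 via the normal equations."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y)

# Toy ill-conditioned problem: the second singular value is tiny,
# so a small data error is strongly amplified by naive inversion.
A = np.diag([1.0, 1e-3])
x_true = np.array([1.0, 1.0])
y_noisy = A @ x_true + np.array([0.0, 1e-3])

x_naive = np.linalg.solve(A, y_noisy)      # amplifies the noise
x_reg = tikhonov(A, y_noisy, alpha=1e-6)   # damps the unstable component

print("naive:", x_naive)
print("regularized:", x_reg)
```

The value of `alpha` here is picked by hand; the abstract refers to parameter choice rules, which this sketch does not attempt to reproduce.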
-
Interior $C^2$ estimate for Hessian quotient equation in general dimension
Authors:
Siyuan Lu
Abstract:
In this paper, we study the interior $C^2$ regularity problem for the Hessian quotient equation $\left(\frac{σ_n}{σ_k}\right)(D^2u)=f$. We give a complete answer to this longstanding problem: for $k=n-1,n-2$, we establish an interior $C^2$ estimate; for $k\leq n-3$, we show that interior $C^2$ estimate fails by finding a singular solution.
Submitted 17 January, 2024;
originally announced January 2024.
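To make the operator in the abstract above concrete, the sketch below evaluates $\frac{σ_n}{σ_k}$ on the eigenvalues of a symmetric matrix, where $σ_k$ is the $k$-th elementary symmetric polynomial. The helper names `sigma` and `hessian_quotient` are hypothetical, and the constant Hessian is just a toy input, not an example from the paper.

```python
import numpy as np

def sigma(k, eigenvalues):
    """k-th elementary symmetric polynomial of the eigenvalues."""
    # np.poly gives the coefficients of prod_i (x - lam_i);
    # the coefficient at index k equals (-1)^k * sigma_k.
    coeffs = np.poly(eigenvalues)
    return float((-1) ** k * coeffs[k])

def hessian_quotient(hessian, k):
    """Evaluate sigma_n / sigma_k on the eigenvalues of a symmetric matrix."""
    lam = np.linalg.eigvalsh(hessian)
    n = len(lam)
    return sigma(n, lam) / sigma(k, lam)

# Hessian of the convex quadratic u(x) = (x1^2 + 2*x2^2 + 3*x3^2) / 2.
H = np.diag([1.0, 2.0, 3.0])
print(hessian_quotient(H, 1))  # sigma_3 / sigma_1
print(hessian_quotient(H, 2))  # sigma_3 / sigma_2
```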
-
Moduli Space of Dihedral Spherical Surfaces and Measured Foliations
Authors:
Sicheng Lu,
Bin Xu
Abstract:
Cone spherical surfaces are orientable Riemannian surfaces with constant curvature one and a finite set of conical singularities. A subset of these surfaces, referred to as dihedral surfaces, is characterized by their monodromy groups, which notably preserve a pair of antipodal points on the unit two-sphere within three-dimensional Euclidean space. On each dihedral surface, we define a pair of transverse measured foliations that, in turn, comprehensively characterize the original dihedral surface. Furthermore, we introduce a variety of geometric decompositions and deformations specific to dihedral surfaces. As a practical application, we ascertain the dimension of the moduli space for dihedral surfaces given specified cone angles and topological types. This dimension acts as an indicator of the independent geometric parameters that determine the isometric classes of these surfaces.
Submitted 2 April, 2024; v1 submitted 4 December, 2023;
originally announced December 2023.
-
Interior $C^2$ estimate for Hessian quotient equation in dimension three
Authors:
Siyuan Lu
Abstract:
In this paper, we establish an interior $C^2$ estimate for the Hessian quotient equation $\left(\frac{σ_3}{σ_1}\right)(D^2u)=f$ in dimension three. A crucial ingredient in our proof is a Jacobi inequality.
Submitted 9 November, 2023;
originally announced November 2023.
-
Quadratic Differentials as Stability Conditions of Graded Skew-gentle Algebras
Authors:
Suiqi Lu,
Yu Qiu,
Dongjian Wu
Abstract:
We prove that the principal component of the exchange graph of hearts of a graded skew-gentle algebra can be identified with the corresponding exchange graph of S-graphs, using the geometric models and $\operatorname{Int}=\operatorname{dim}\operatorname{Hom}$ formula in Qiu-Zhang-Zhou. Using the same argument in Bridgeland-Smith, Barbieri-Möller-Qiu-So and Christ-Haiden-Qiu, we extend this identification to an isomorphism between the spaces of stability conditions and of quadratic differentials.
Submitted 31 October, 2023;
originally announced October 2023.
-
Finiteness of pointed maps to moduli spaces of polarized varieties
Authors:
Ariyan Javanpeykar,
Steven Lu,
Ruiran Sun,
Kang Zuo
Abstract:
We prove a finiteness result for pointed maps to the base space of a family of polarized varieties with maximal variation in moduli. A key ingredient is a new criterion for the rigidity of pointed maps.
Submitted 10 October, 2023;
originally announced October 2023.
-
Principal Stratification with Continuous Post-Treatment Variables: Nonparametric Identification and Semiparametric Estimation
Authors:
Sizhu Lu,
Zhichao Jiang,
Peng Ding
Abstract:
Post-treatment variables often complicate causal inference. They appear in many scientific problems, including noncompliance, truncation by death, mediation, and surrogate endpoint evaluation. Principal stratification is a strategy to address these challenges by adjusting for the potential values of the post-treatment variables, defined as the principal strata. It allows for characterizing treatment effect heterogeneity across principal strata and unveiling the mechanism of the treatment's impact on the outcome related to post-treatment variables. However, the existing literature has primarily focused on binary post-treatment variables, leaving the case with continuous post-treatment variables largely unexplored. This gap persists due to the complexity of infinitely many principal strata, which present challenges to both the identification and estimation of causal effects. We fill this gap by providing nonparametric identification and semiparametric estimation theory for principal stratification with continuous post-treatment variables. We propose to use working models to approximate the underlying causal effect surfaces and derive the efficient influence functions of the corresponding model parameters. Based on the theory, we construct doubly robust estimators and implement them in an R package.
Submitted 3 April, 2024; v1 submitted 21 September, 2023;
originally announced September 2023.
-
A Generalized Alternating Method for Bilevel Learning under the Polyak-Łojasiewicz Condition
Authors:
Quan Xiao,
Songtao Lu,
Tianyi Chen
Abstract:
Bilevel optimization has recently regained interest owing to its applications in emerging machine learning fields such as hyperparameter optimization, meta-learning, and reinforcement learning. Recent results have shown that simple alternating (implicit) gradient-based algorithms can match the convergence rate of single-level gradient descent (GD) when addressing bilevel problems with a strongly convex lower-level objective. However, it remains unclear whether this result can be generalized to bilevel problems beyond this basic setting. In this paper, we first introduce a stationary metric for the considered bilevel problems, which generalizes the existing metric, for a nonconvex lower-level objective that satisfies the Polyak-Łojasiewicz (PL) condition. We then propose a Generalized ALternating mEthod for bilevel opTimization (GALET) tailored to BLO with convex PL LL problem and establish that GALET achieves an $ε$-stationary point for the considered problem within $\tilde{\cal O}(ε^{-1})$ iterations, which matches the iteration complexity of GD for single-level smooth nonconvex problems.
Submitted 5 October, 2023; v1 submitted 4 June, 2023;
originally announced June 2023.
-
Harmonic Measures and Numerical Computation of Cauchy Problems for Laplace Equations
Authors:
Yu Chen,
Jin Cheng,
Shuai Lu,
Masahiro Yamamoto
Abstract:
It is well known that the Cauchy problem for Laplace equations is ill-posed in Hadamard's sense: small deviations in the Cauchy data may lead to large errors in the solutions. It is observed that if a bound is imposed on the solution, there exists a conditional stability estimate. This gives a reasonable way to construct stable algorithms. However, it is impossible to have good results at all points in the domain. Although numerical methods for Cauchy problems for Laplace equations have been widely studied for quite a long time, some points remain unclear: for example, how to evaluate the numerical solutions, that is, whether we can approximate the Cauchy data well while keeping the bound on the solution, and at which points the numerical results are reliable. In this paper, we prove a conditional stability estimate which is quantitatively related to harmonic measures. The harmonic measure can be used as an indicator function to evaluate the numerical result pointwise, which further enables us to find a reliable subdomain where the local convergence rate is higher than a certain order.
Submitted 23 May, 2023;
originally announced May 2023.
-
Compressed Decentralized Proximal Stochastic Gradient Method for Nonconvex Composite Problems with Heterogeneous Data
Authors:
Yonggui Yan,
Jie Chen,
Pin-Yu Chen,
Xiaodong Cui,
Songtao Lu,
Yangyang Xu
Abstract:
We first propose a decentralized proximal stochastic gradient tracking method (DProxSGT) for nonconvex stochastic composite problems, with data heterogeneously distributed on multiple workers in a decentralized connected network. To save communication cost, we then extend DProxSGT to a compressed method by compressing the communicated information. Both methods need only $\mathcal{O}(1)$ samples per worker for each proximal update, which is important to achieve good generalization performance on training deep neural networks. With a smoothness condition on the expected loss function (but not on each sample function), the proposed methods can achieve an optimal sample complexity result to produce a near-stationary point. Numerical experiments on training neural networks demonstrate the significantly better generalization performance of our methods over large-batch training methods and momentum variance-reduction methods and also, the ability of handling heterogeneous data by the gradient tracking scheme.
Submitted 27 February, 2023;
originally announced February 2023.
-
Curvature estimates for semi-convex solutions of Hessian equations in hyperbolic space
Authors:
Siyuan Lu
Abstract:
In this paper, we establish a curvature estimate for semi-convex solutions of Hessian equations in hyperbolic space. We also obtain a curvature estimate for admissible solutions to prescribed curvature measure type problem in hyperbolic space. A crucial ingredient in both estimates is a concavity inequality for Hessian operator.
Submitted 23 February, 2023;
originally announced February 2023.
-
Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks
Authors:
Shuai Zhang,
Meng Wang,
Pin-Yu Chen,
Sijia Liu,
Songtao Lu,
Miao Liu
Abstract:
Due to the significant computational challenge of training large-scale graph neural networks (GNNs), various sparse learning techniques have been exploited to reduce memory and storage costs. Examples include \textit{graph sparsification} that samples a subgraph to reduce the amount of data aggregation and \textit{model sparsification} that prunes the neural network to reduce the number of trainable weights. Despite the empirical successes in reducing the training cost while maintaining the test accuracy, the theoretical generalization analysis of sparse learning for GNNs remains elusive. To the best of our knowledge, this paper provides the first theoretical characterization of joint edge-model sparse learning from the perspective of sample complexity and convergence rate in achieving zero generalization error. It proves analytically that both sampling important nodes and pruning neurons with the lowest-magnitude can reduce the sample complexity and improve convergence without compromising the test accuracy. Although the analysis is centered on two-layer GNNs with structural constraints on data, the insights are applicable to more general setups and justified by both synthetic and practical citation datasets.
Submitted 6 February, 2023;
originally announced February 2023.
-
Increasing stability of a linearized inverse boundary value problem for a nonlinear Schrödinger equation on transversally anisotropic manifolds
Authors:
Shuai Lu,
Jian Zhai
Abstract:
We consider the problem of recovering a nonlinear potential function in a nonlinear Schrödinger equation on transversally anisotropic manifolds from the linearized Dirichlet-to-Neumann map at a large wavenumber. By calibrating the complex geometric optics (CGO) solutions according to the wavenumber, we prove the increasing stability of recovering the coefficient of a cubic term as the wavenumber becomes large.
Submitted 18 January, 2023;
originally announced January 2023.
-
Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation Constrained Optimization
Authors:
Zichong Li,
Pin-Yu Chen,
Sijia Liu,
Songtao Lu,
Yangyang Xu
Abstract:
Many real-world problems not only have complicated nonconvex functional constraints but also use a large number of data points. This motivates the design of efficient stochastic methods on finite-sum or expectation constrained problems. In this paper, we design and analyze stochastic inexact augmented Lagrangian methods (Stoc-iALM) to solve problems involving a nonconvex composite (i.e. smooth+nonsmooth) objective and nonconvex smooth functional constraints. We adopt the standard iALM framework and design a subroutine by using the momentum-based variance-reduced proximal stochastic gradient method (PStorm) and a postprocessing step. Under certain regularity conditions (assumed also in existing works), to reach an $\varepsilon$-KKT point in expectation, we establish an oracle complexity result of $O(\varepsilon^{-5})$, which is better than the best-known $O(\varepsilon^{-6})$ result. Numerical experiments on the fairness constrained problem and the Neyman-Pearson classification problem with real data demonstrate that our proposed method outperforms an existing method with the previously best-known complexity result.
Submitted 19 December, 2022;
originally announced December 2022.
-
Increasing stability of the first order linearized inverse Schrödinger potential problem with integer power type nonlinearities
Authors:
Sen Zou,
Shuai Lu,
Boxi Xu
Abstract:
We investigate the increasing stability of the inverse Schrödinger potential problem with integer power type nonlinearities at a large wavenumber. By considering the first order linearized system with respect to the unknown potential function, a combination formula of the first order linearization is proposed, which provides a Lipschitz type stability for the recovery of the Fourier coefficients of the unknown potential function in low frequency mode. These stability results highlight the advantage of nonlinearity in solving this inverse potential problem by explicitly quantifying the dependence on the wavenumber and the nonlinearity index. A reconstruction algorithm for general power type nonlinearities is also provided. Several numerical examples illustrate the efficiency of our proposed algorithm.
Submitted 24 November, 2022;
originally announced November 2022.
-
On a Dynamic Variant of the Iteratively Regularized Gauss-Newton Method with Sequential Data
Authors:
Neil K. Chada,
Marco A. Iglesias,
Shuai Lu,
Frank Werner
Abstract:
For numerous parameter and state estimation problems, assimilating new data as they become available can help produce accurate and fast inference of unknown quantities. While most existing algorithms for solving these kinds of ill-posed inverse problems can only be used with a single instance of the observed data, in this work we propose a new framework that enables existing algorithms to invert multiple instances of data in a sequential fashion. Specifically, we work with the well-known iteratively regularized Gauss-Newton method (IRGNM), a variational methodology for solving nonlinear inverse problems. We develop a theory of convergence analysis for a proposed dynamic IRGNM algorithm in the presence of Gaussian white noise. We combine this algorithm with the classical IRGNM to deliver a practical (hybrid) algorithm that can invert data sequentially while producing fast estimates. Our work includes the proof of well-definedness of the proposed iterative scheme, as well as various error bounds that rely on standard assumptions for nonlinear inverse problems. We use several numerical experiments to verify our theoretical findings, and to highlight the benefits of incorporating sequential data. The numerical experiments comprise various parameter identification problems, including a Darcy flow elliptic PDE example and an electrical impedance tomography example.
Submitted 27 July, 2022;
originally announced July 2022.
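The classical IRGNM referred to in the abstract above can be sketched as follows. This is the textbook single-data iteration with a geometrically decaying regularization parameter, not the dynamic or hybrid variant proposed in the paper; the function name `irgnm`, the toy forward map and the parameter values `alpha0` and `q` are assumptions chosen for illustration.

```python
import numpy as np

def irgnm(F, jac, y, x0, alpha0=1.0, q=0.5, iters=30):
    """Classical IRGNM: each step solves a Tikhonov-regularized
    linearization of F(x) = y, penalizing distance to the initial guess x0."""
    x = x0.copy()
    for k in range(iters):
        J = jac(x)
        alpha = alpha0 * q ** k  # regularization decays geometrically
        lhs = J.T @ J + alpha * np.eye(len(x))
        rhs = J.T @ (y - F(x)) + alpha * (x0 - x)
        x = x + np.linalg.solve(lhs, rhs)
    return x

# Toy nonlinear forward map acting componentwise (hypothetical example).
F = lambda x: x ** 3 + x
jac = lambda x: np.diag(3 * x ** 2 + 1)

x_true = np.array([1.0, -0.5])
x_rec = irgnm(F, jac, F(x_true), x0=np.zeros(2))
print(x_rec)
```

With noisy data the iteration would normally be stopped early by a rule such as the discrepancy principle; the fixed iteration count here is only reasonable because the toy data are noise-free.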
-
INTERACT: Achieving Low Sample and Communication Complexities in Decentralized Bilevel Learning over Networks
Authors:
Zhuqing Liu,
Xin Zhang,
Prashant Khanduri,
Songtao Lu,
Jia Liu
Abstract:
In recent years, decentralized bilevel optimization problems have received increasing attention in the networking and machine learning communities thanks to their versatility in modeling decentralized learning problems over peer-to-peer networks (e.g., multi-agent meta-learning, multi-agent reinforcement learning, personalized training, and Byzantine-resilient learning). However, for decentralized bilevel optimization over peer-to-peer networks with limited computation and communication capabilities, how to achieve low sample and communication complexities are two fundamental challenges that remain under-explored so far. In this paper, we make the first attempt to investigate the class of decentralized bilevel optimization problems with nonconvex and strongly-convex structure corresponding to the outer and inner subproblems, respectively. Our main contributions in this paper are two-fold: i) We first propose a deterministic algorithm called INTERACT (inner-gradient-descent-outer-tracked-gradient) that requires the sample complexity of $\mathcal{O}(n ε^{-1})$ and communication complexity of $\mathcal{O}(ε^{-1})$ to solve the bilevel optimization problem, where $n$ and $ε> 0$ are the number of samples at each agent and the desired stationarity gap, respectively. ii) To relax the need for full gradient evaluations in each iteration, we propose a stochastic variance-reduced version of INTERACT (SVR-INTERACT), which improves the sample complexity to $\mathcal{O}(\sqrt{n} ε^{-1})$ while achieving the same communication complexity as the deterministic algorithm. To our knowledge, this work is the first that achieves both low sample and communication complexities for solving decentralized bilevel optimization problems over networks. Our numerical experiments also corroborate our theoretical findings.
Submitted 5 October, 2022; v1 submitted 27 July, 2022;
originally announced July 2022.
-
A Single-Loop Gradient Descent and Perturbed Ascent Algorithm for Nonconvex Functional Constrained Optimization
Authors:
Songtao Lu
Abstract:
Nonconvex constrained optimization problems can be used to model a number of machine learning problems, such as multi-class Neyman-Pearson classification and constrained Markov decision processes. However, such kinds of problems are challenging because both the objective and constraints are possibly nonconvex, so it is difficult to balance the reduction of the loss value and reduction of constraint violation. Although there are a few methods that solve this class of problems, all of them are double-loop or triple-loop algorithms, and they require oracles to solve some subproblems up to certain accuracy by tuning multiple hyperparameters at each iteration. In this paper, we propose a novel gradient descent and perturbed ascent (GDPA) algorithm to solve a class of smooth nonconvex inequality constrained problems. The GDPA is a primal-dual algorithm, which only exploits the first-order information of both the objective and constraint functions to update the primal and dual variables in an alternating way. The key feature of the proposed algorithm is that it is a single-loop algorithm, where only two step-sizes need to be tuned. We show that under a mild regularity condition GDPA is able to find Karush-Kuhn-Tucker (KKT) points of nonconvex functional constrained problems with convergence rate guarantees. To the best of our knowledge, it is the first single-loop algorithm that can solve the general nonconvex smooth problems with nonconvex inequality constraints. Numerical results also showcase the superiority of GDPA compared with the best-known algorithms (in terms of both stationarity measure and feasibility of the obtained solutions).
Submitted 2 December, 2024; v1 submitted 12 July, 2022;
originally announced July 2022.
-
Understanding Benign Overfitting in Gradient-Based Meta Learning
Authors:
Lisha Chen,
Songtao Lu,
Tianyi Chen
Abstract:
Meta learning has demonstrated tremendous success in few-shot learning with limited supervised data. In those settings, the meta model is usually overparameterized. While the conventional statistical learning theory suggests that overparameterized models tend to overfit, empirical evidence reveals that overparameterized meta learning methods still work well -- a phenomenon often called "benign overfitting." To understand this phenomenon, we focus on the meta learning settings with a challenging bilevel structure that we term the gradient-based meta learning, and analyze its generalization performance under an overparameterized meta linear regression model. While our analysis uses the relatively tractable linear models, our theory contributes to understanding the delicate interplay among data heterogeneity, model adaptation and benign overfitting in gradient-based meta learning tasks. We corroborate our theoretical claims through numerical simulations.
Submitted 9 November, 2022; v1 submitted 27 June, 2022;
originally announced June 2022.
-
On the asymptotic Plateau problem in hyperbolic space
Authors:
Siyuan Lu
Abstract:
In this paper, we solve the asymptotic Plateau problem in hyperbolic space for constant $σ_{n-1}$ curvature, i.e. the existence of a complete hypersurface in $\mathbb{H}^{n+1}$ satisfying $σ_{n-1}(κ)=σ\in (0,n)$ with a prescribed asymptotic boundary $Γ$. The key ingredient is the curvature estimates. Previously, this was only known for $σ_0<σ<n$, where $σ_0$ is a positive constant.
Submitted 12 February, 2023; v1 submitted 31 May, 2022;
originally announced June 2022.
-
Discrimination-Based Double Auction for Maximizing Social Welfare in the Electricity and Heating Market Considering Privacy Preservation
Authors:
Lu Wang,
Wei Gu,
Shuai Lu,
Haifeng Qiu,
Zhi Wu
Abstract:
This paper proposes a double-sided auction mechanism with price discrimination for social welfare (SW) maximization in the electricity and heating market. In this mechanism, energy service providers (ESPs) submit offers and load aggregators (LAs) submit bids to an energy trading center (ETC) to maximize their utility; in turn, the selfless ETC as an auctioneer leverages discriminatory price weights to regulate the behaviors of ESPs and LAs, which combines the individual benefits of each stakeholder with the overall social welfare to achieve the global optimum. Nash games are employed to describe the interactions between players with the same market role. Theoretically, we first prove the existence and uniqueness of the Nash equilibrium; then, considering the requirement of game players to preserve privacy, a distributed algorithm based on the alternating direction method of multipliers is developed to implement distributed bidding, and an analytical target cascading algorithm is applied to reach the balance of demand and supply. We validated the proposed mechanism using case studies on a city-level distribution system. The results indicated that the achieved SW improved by 4%-15% compared with other mechanisms, and also verified the effectiveness of the distributed algorithm.
Submitted 28 May, 2022;
originally announced May 2022.
-
On the Dirichlet problem for Lagrangian phase equation with critical and supercritical phase
Authors:
Siyuan Lu
Abstract:
In this paper, we solve the Dirichlet problem for Lagrangian phase equation with critical and supercritical phase. A crucial ingredient is the interior $C^2$ estimate. Our result is sharp in the sense that there exist singular solutions in the subcritical phase case.
Submitted 12 February, 2023; v1 submitted 11 April, 2022;
originally announced April 2022.
-
Min-Max Bilevel Multi-objective Optimization with Applications in Machine Learning
Authors:
Alex Gu,
Songtao Lu,
Parikshit Ram,
Lily Weng
Abstract:
We consider a generic min-max multi-objective bilevel optimization problem with applications in robust machine learning such as representation learning and hyperparameter optimization. We design MORBiT, a novel single-loop gradient descent-ascent bilevel optimization algorithm, to solve the generic problem and present a novel analysis showing that MORBiT converges to the first-order stationary point at a rate of $\widetilde{\mathcal{O}}(n^{1/2} K^{-2/5})$ for a class of weakly convex problems with $n$ objectives upon $K$ iterations of the algorithm. Our analysis utilizes novel results to handle the non-smooth min-max multi-objective setup and to obtain a sublinear dependence in the number of objectives $n$. Experimental results on robust representation learning and robust hyperparameter optimization showcase (i) the advantages of considering the min-max multi-objective setup, and (ii) convergence properties of the proposed MORBiT. Our code is at https://github.com/minimario/MORBiT.
Submitted 7 March, 2023; v1 submitted 3 March, 2022;
originally announced March 2022.
-
On the Exactness of an Energy-efficient Train Control model based on Convex Optimization
Authors:
Shaofeng Lu,
Minling Feng,
Kunpeng Wu
Abstract:
In this paper, we demonstrate the exactness proof for the energy-efficient train control (EETC) model based on convex optimization. The proof of exactness shows that the convex optimization model will share the same optimization results with the initial model on which the convex relaxations are conducted. We first show how the relaxation on the initial non-convex model is conducted and provide analysis to show that the relaxations are convex constraints and the relaxed model is thus a convex model. Subsequently, we prove that the relaxed convex model will always achieve its optimal solution on the initial equality constraints, that the optimal solution achieved by convex optimization will be the same as the one obtained by the initial non-convex model, and that the relaxations applied are exact. A numerical verification has been conducted based on a typical urban rail system with a steep gradient. The results of this paper shed light on further applications of convex optimization to energy-efficient train control and relevant areas related to operation and control of low-carbon transportation systems.
Submitted 13 February, 2022;
originally announced February 2022.
-
The double contravariant powerset monad in the Goguen category of fuzzy sets
Authors:
Sijia Lu,
Dexue Zhang
Abstract:
A monad is constructed in the Goguen category of fuzzy sets valued in a unital quantale, which is an analog of the double contravariant powerset monad in the category of sets. With the help of this monad, it is proved that the Goguen category of fuzzy sets is dually monadic over itself.
Submitted 3 August, 2022; v1 submitted 13 February, 2022;
originally announced February 2022.