Search | arXiv e-print repository

Remarks on the Polyak-Lojasiewicz inequality and the convergence of gradient systems

Authors: Arthur Castello B. de Oliveira, Leilei Cui, Eduardo D. Sontag

Abstract: This work explores generalizations of the Polyak-Lojasiewicz inequality (PLI) and their implications for the convergence behavior of gradient flows in optimization problems. Motivated by the continuous-time linear quadratic regulator (CT-LQR) policy optimization problem -- where only a weaker version of the PLI is characterized in the literature -- this work shows that while weaker conditions are sufficient for global convergence to, and optimality of, the set of critical points of the cost function, the "profile" of the gradient flow solution can change significantly depending on which "flavor" of inequality the cost satisfies. After a general theoretical analysis, we focus on fitting the CT-LQR policy optimization problem to the proposed framework, showing that, in fact, it can never satisfy a PLI in its strongest form. We follow up our analysis with a brief discussion on the difference between continuous- and discrete-time LQR policy optimization, and end the paper with some intuition on the extension of this framework to optimization problems with L1 regularization and solved through proximal gradient flows.

Submitted 30 March, 2025; originally announced March 2025.

arXiv:2305.09904

On the ISS Property of the Gradient Flow for Single Hidden-Layer Neural Networks with Linear Activations

Authors: Arthur Castello B. de Oliveira, Milad Siami, Eduardo D. Sontag

Abstract: Recent research in neural networks and machine learning suggests that using many more parameters than strictly required by the initial complexity of a regression problem can result in more accurate or faster-converging models -- contrary to classical statistical belief. This phenomenon, sometimes known as "benign overfitting", raises questions about the other ways in which overparameterization might affect the properties of a learning problem. In this work, we investigate the effects of overfitting on the robustness of gradient-descent training when subject to uncertainty on the gradient estimation. This uncertainty arises naturally if the gradient is estimated from noisy data or directly measured. Our object of study is a linear neural network with a single, arbitrarily wide, hidden layer and an arbitrary number of inputs and outputs. In this paper we solve the problem for the case where the input and output of our neural network are one-dimensional, deriving sufficient conditions for robustness of our system based on necessary and sufficient conditions for convergence in the undisturbed case. We then show that the general overparameterized formulation introduces a set of spurious equilibria which lie outside the set where the loss function is minimized, and discuss directions of future work that might extend our current results for more general formulations.

Submitted 16 May, 2023; originally announced May 2023.

Comments: 10 pages, 1 figure, extended conference version

arXiv:2209.06932

doi: 10.1016/j.neunet.2025.107486

Optimizing Connectivity through Network Gradients for Restricted Boltzmann Machines

Authors: A. C. N. de Oliveira, D. R. Figueiredo

Abstract: Leveraging sparse networks to connect successive layers in deep neural networks has recently been shown to provide benefits to large-scale state-of-the-art models. However, network connectivity also plays a significant role in the learning performance of shallow networks, such as the classic Restricted Boltzmann Machine (RBM). Efficiently finding sparse connectivity patterns that improve the learning performance of shallow networks is a fundamental problem. While recent principled approaches explicitly include network connections as model parameters that must be optimized, they often rely on explicit penalization or network sparsity as a hyperparameter. This work presents the Network Connectivity Gradients (NCG), an optimization method to find optimal connectivity patterns for RBMs. NCG leverages the idea of network gradients: given a specific connection pattern, it determines the gradient of every possible connection and uses the gradient to drive a continuous connection strength parameter that in turn is used to determine the connection pattern. Thus, learning RBM parameters and learning network connections are performed truly jointly, albeit with different learning rates, and without changes to the model's classic energy-based objective function. The proposed method is applied to the MNIST and other data sets, showing that better RBM models are found for the benchmark tasks of sample generation and classification. Results also show that NCG is robust to network initialization and is capable of both adding and removing network connections while learning.

Submitted 29 May, 2025; v1 submitted 14 September, 2022; originally announced September 2022.

Journal ref: Neural Networks, 2025

arXiv:2006.03748

Thruster-assisted center manifold shaping in bipedal legged locomotion

Authors: Arthur C. B. de Oliveira, Alireza Ramezani

Abstract: This work tries to contribute to the design of legged robots with capabilities boosted through thruster-assisted locomotion. Our long-term goal is the development of robots capable of negotiating unstructured environments, including land and air, by leveraging legs and thrusters collaboratively. These robots could be used in a broad number of applications including search and rescue operations, space exploration, automated package handling in residential spaces and digital agriculture, to name a few. In all of these examples, the unique capability of thruster-assisted mobility greatly broadens the locomotion design possibilities for these systems. In an effort to demonstrate thrusters' effectiveness in the robustification and efficiency of bipedal locomotion gaits, this work explores their effects on the gait limit cycles and proposes new design paradigms based on shaping these center manifolds with strong foliations. Unilateral contact force feasibility conditions are resolved in an optimal control scheme.

Submitted 5 June, 2020; originally announced June 2020.

Comments: 6 pages, accepted in International Conference on Advanced Intelligent Mechatronics (AIM) 2020

Showing 1–4 of 4 results for author: de Oliveira, A C