-
Efficient Path Planning with Soft Homology Constraints
Authors:
Carlos A. Taveras,
Santiago Segarra,
César A. Uribe
Abstract:
We study the problem of path planning with soft homology constraints on a surface topologically equivalent to a disk with punctures. Specifically, we propose an algorithm, named $\Hstar$, for the efficient computation of a path homologous to a user-provided reference path. We show that the algorithm can generate a suite of paths in distinct homology classes, from the overall shortest path to the s…
▽ More
We study the problem of path planning with soft homology constraints on a surface topologically equivalent to a disk with punctures. Specifically, we propose an algorithm, named $\Hstar$, for the efficient computation of a path homologous to a user-provided reference path. We show that the algorithm can generate a suite of paths in distinct homology classes, from the overall shortest path to the shortest path homologous to the reference path, ordered both by path length and similarity to the reference path. Rollout is shown to improve the results produced by the algorithm. Experiments demonstrate that $\Hstar$ can be an efficient alternative to optimal methods, especially for configuration spaces with many obstacles.
△ Less
Submitted 27 June, 2024;
originally announced June 2024.
-
An Optimal Transport Approach for Network Regression
Authors:
Alex G. Zalles,
Kai M. Hung,
Ann E. Finneran,
Lydia Beaudrot,
César A. Uribe
Abstract:
We study the problem of network regression, where one is interested in how the topology of a network changes as a function of Euclidean covariates. We build upon recent developments in generalized regression models on metric spaces based on Fréchet means and propose a network regression method using the Wasserstein metric. We show that when representing graphs as multivariate Gaussian distribution…
▽ More
We study the problem of network regression, where one is interested in how the topology of a network changes as a function of Euclidean covariates. We build upon recent developments in generalized regression models on metric spaces based on Fréchet means and propose a network regression method using the Wasserstein metric. We show that when representing graphs as multivariate Gaussian distributions, the network regression problem requires the computation of a Riemannian center of mass (i.e., Fréchet means). Fréchet means with non-negative weights translates into a barycenter problem and can be efficiently computed using fixed point iterations. Although the convergence guarantees of fixed-point iterations for the computation of Wasserstein affine averages remain an open problem, we provide evidence of convergence in a large number of synthetic and real-data scenarios. Extensive numerical results show that the proposed approach improves existing procedures by accurately accounting for graph size, topology, and sparsity in synthetic experiments. Additionally, real-world experiments using the proposed approach result in higher Coefficient of Determination ($R^{2}$) values and lower mean squared prediction error (MSPE), cementing improved prediction capabilities in practice.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
A Moreau Envelope Approach for LQR Meta-Policy Estimation
Authors:
Ashwin Aravind,
Mohammad Taha Toghani,
César A. Uribe
Abstract:
We study the problem of policy estimation for the Linear Quadratic Regulator (LQR) in discrete-time linear time-invariant uncertain dynamical systems. We propose a Moreau Envelope-based surrogate LQR cost, built from a finite set of realizations of the uncertain system, to define a meta-policy efficiently adjustable to new realizations. Moreover, we design an algorithm to find an approximate first…
▽ More
We study the problem of policy estimation for the Linear Quadratic Regulator (LQR) in discrete-time linear time-invariant uncertain dynamical systems. We propose a Moreau Envelope-based surrogate LQR cost, built from a finite set of realizations of the uncertain system, to define a meta-policy efficiently adjustable to new realizations. Moreover, we design an algorithm to find an approximate first-order stationary point of the meta-LQR cost function. Numerical results show that the proposed approach outperforms naive averaging of controllers on new realizations of the linear system. We also provide empirical evidence that our method has better sample complexity than Model-Agnostic Meta-Learning (MAML) approaches.
△ Less
Submitted 7 October, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
Decentralized and Equitable Optimal Transport
Authors:
Ivan Lau,
Shiqian Ma,
César A. Uribe
Abstract:
This paper considers the decentralized (discrete) optimal transport (D-OT) problem. In this setting, a network of agents seeks to design a transportation plan jointly, where the cost function is the sum of privately held costs for each agent. We reformulate the D-OT problem as a constraint-coupled optimization problem and propose a single-loop decentralized algorithm with an iteration complexity o…
▽ More
This paper considers the decentralized (discrete) optimal transport (D-OT) problem. In this setting, a network of agents seeks to design a transportation plan jointly, where the cost function is the sum of privately held costs for each agent. We reformulate the D-OT problem as a constraint-coupled optimization problem and propose a single-loop decentralized algorithm with an iteration complexity of O(1/ε) that matches existing centralized first-order approaches. Moreover, we propose the decentralized equitable optimal transport (DE-OT) problem. In DE-OT, in addition to cooperatively designing a transportation plan that minimizes transportation costs, agents seek to ensure equity in their individual costs. The iteration complexity of the proposed method to solve DE-OT is also O(1/ε). This rate improves existing centralized algorithms, where the best iteration complexity obtained is O(1/ε^2).
△ Less
Submitted 12 March, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
PIDformer: Transformer Meets Control Theory
Authors:
Tam Nguyen,
César A. Uribe,
Tan M. Nguyen,
Richard G. Baraniuk
Abstract:
In this work, we address two main shortcomings of transformer architectures: input corruption and rank collapse in their output representation. We unveil self-attention as an autonomous state-space model that inherently promotes smoothness in its solutions, leading to lower-rank outputs and diminished representation capacity. Moreover, the steady-state solution of the model is sensitive to input p…
▽ More
In this work, we address two main shortcomings of transformer architectures: input corruption and rank collapse in their output representation. We unveil self-attention as an autonomous state-space model that inherently promotes smoothness in its solutions, leading to lower-rank outputs and diminished representation capacity. Moreover, the steady-state solution of the model is sensitive to input perturbations. We incorporate a Proportional-Integral-Derivative (PID) closed-loop feedback control system with a reference point into the model to improve robustness and representation capacity. This integration aims to preserve high-frequency details while bolstering model stability, rendering it more noise-resilient. The resulting controlled state-space model is theoretically proven robust and adept at addressing the rank collapse. Motivated by this control framework, we derive a novel class of transformers, PID-controlled Transformer (PIDformer), aimed at improving robustness and mitigating the rank-collapse issue inherent in softmax transformers. We empirically evaluate the model for advantages and robustness against baseline transformers across various practical tasks, including object classification, image segmentation, and language modeling.
△ Less
Submitted 25 February, 2024;
originally announced February 2024.
-
Improving Denoising Diffusion Probabilistic Models via Exploiting Shared Representations
Authors:
Delaram Pirhayatifard,
Mohammad Taha Toghani,
Guha Balakrishnan,
César A. Uribe
Abstract:
In this work, we address the challenge of multi-task image generation with limited data for denoising diffusion probabilistic models (DDPM), a class of generative models that produce high-quality images by reversing a noisy diffusion process. We propose a novel method, SR-DDPM, that leverages representation-based techniques from few-shot learning to effectively learn from fewer samples across diff…
▽ More
In this work, we address the challenge of multi-task image generation with limited data for denoising diffusion probabilistic models (DDPM), a class of generative models that produce high-quality images by reversing a noisy diffusion process. We propose a novel method, SR-DDPM, that leverages representation-based techniques from few-shot learning to effectively learn from fewer samples across different tasks. Our method consists of a core meta architecture with shared parameters, i.e., task-specific layers with exclusive parameters. By exploiting the similarity between diverse data distributions, our method can scale to multiple tasks without compromising the image quality. We evaluate our method on standard image datasets and show that it outperforms both unconditional and conditional DDPM in terms of FID and SSIM metrics.
△ Less
Submitted 27 November, 2023;
originally announced November 2023.
-
Adaptive Federated Learning with Auto-Tuned Clients
Authors:
Junhyung Lyle Kim,
Mohammad Taha Toghani,
César A. Uribe,
Anastasios Kyrillidis
Abstract:
Federated learning (FL) is a distributed machine learning framework where the global model of a central server is trained via multiple collaborative steps by participating clients without sharing their data. While being a flexible framework, where the distribution of local data, participation rate, and computing power of each client can greatly vary, such flexibility gives rise to many new challen…
▽ More
Federated learning (FL) is a distributed machine learning framework where the global model of a central server is trained via multiple collaborative steps by participating clients without sharing their data. While being a flexible framework, where the distribution of local data, participation rate, and computing power of each client can greatly vary, such flexibility gives rise to many new challenges, especially in the hyperparameter tuning on the client side. We propose $Δ$-SGD, a simple step size rule for SGD that enables each client to use its own step size by adapting to the local smoothness of the function each client is optimizing. We provide theoretical and empirical results where the benefit of the client adaptivity is shown in various FL scenarios.
△ Less
Submitted 2 May, 2024; v1 submitted 19 June, 2023;
originally announced June 2023.
-
On First-Order Meta-Reinforcement Learning with Moreau Envelopes
Authors:
Mohammad Taha Toghani,
Sebastian Perez-Salazar,
César A. Uribe
Abstract:
Meta-Reinforcement Learning (MRL) is a promising framework for training agents that can quickly adapt to new environments and tasks. In this work, we study the MRL problem under the policy gradient formulation, where we propose a novel algorithm that uses Moreau envelope surrogate regularizers to jointly learn a meta-policy that is adjustable to the environment of each individual task. Our algorit…
▽ More
Meta-Reinforcement Learning (MRL) is a promising framework for training agents that can quickly adapt to new environments and tasks. In this work, we study the MRL problem under the policy gradient formulation, where we propose a novel algorithm that uses Moreau envelope surrogate regularizers to jointly learn a meta-policy that is adjustable to the environment of each individual task. Our algorithm, called Moreau Envelope Meta-Reinforcement Learning (MEMRL), learns a meta-policy that can adapt to a distribution of tasks by efficiently updating the policy parameters using a combination of gradient-based optimization and Moreau Envelope regularization. Moreau Envelopes provide a smooth approximation of the policy optimization problem, which enables us to apply standard optimization techniques and converge to an appropriate stationary point. We provide a detailed analysis of the MEMRL algorithm, where we show a sublinear convergence rate to a first-order stationary point for non-convex policy gradient optimization. We finally show the effectiveness of MEMRL on a multi-task 2D-navigation problem.
△ Less
Submitted 20 May, 2023;
originally announced May 2023.
-
On the Performance of Gradient Tracking with Local Updates
Authors:
Edward Duc Hien Nguyen,
Sulaiman A. Alghunaim,
Kun Yuan,
César A. Uribe
Abstract:
We study the decentralized optimization problem where a network of $n$ agents seeks to minimize the average of a set of heterogeneous non-convex cost functions distributedly. State-of-the-art decentralized algorithms like Exact Diffusion~(ED) and Gradient Tracking~(GT) involve communicating every iteration. However, communication is expensive, resource intensive, and slow. In this work, we analyze…
▽ More
We study the decentralized optimization problem where a network of $n$ agents seeks to minimize the average of a set of heterogeneous non-convex cost functions distributedly. State-of-the-art decentralized algorithms like Exact Diffusion~(ED) and Gradient Tracking~(GT) involve communicating every iteration. However, communication is expensive, resource intensive, and slow. In this work, we analyze a locally updated GT method (LU-GT), where agents perform local recursions before interacting with their neighbors. While local updates have been shown to reduce communication overhead in practice, their theoretical influence has not been fully characterized. We show LU-GT has the same communication complexity as the Federated Learning setting but allows arbitrary network topologies. In addition, we prove that the number of local updates does not degrade the quality of the solution achieved by LU-GT. Numerical examples reveal that local updates can lower communication costs in certain regimes (e.g., well-connected graphs).
△ Less
Submitted 12 October, 2022; v1 submitted 10 October, 2022;
originally announced October 2022.
-
PersA-FL: Personalized Asynchronous Federated Learning
Authors:
Mohammad Taha Toghani,
Soomin Lee,
César A. Uribe
Abstract:
We study the personalized federated learning problem under asynchronous updates. In this problem, each client seeks to obtain a personalized model that simultaneously outperforms local and global models. We consider two optimization-based frameworks for personalization: (i) Model-Agnostic Meta-Learning (MAML) and (ii) Moreau Envelope (ME). MAML involves learning a joint model adapted for each clie…
▽ More
We study the personalized federated learning problem under asynchronous updates. In this problem, each client seeks to obtain a personalized model that simultaneously outperforms local and global models. We consider two optimization-based frameworks for personalization: (i) Model-Agnostic Meta-Learning (MAML) and (ii) Moreau Envelope (ME). MAML involves learning a joint model adapted for each client through fine-tuning, whereas ME requires a bi-level optimization problem with implicit gradients to enforce personalization via regularized losses. We focus on improving the scalability of personalized federated learning by removing the synchronous communication assumption. Moreover, we extend the studied function class by removing boundedness assumptions on the gradient norm. Our main technical contribution is a unified proof for asynchronous federated learning with bounded staleness that we apply to MAML and ME personalization frameworks. For the smooth and non-convex functions class, we show the convergence of our method to a first-order stationary point. We illustrate the performance of our method and its tolerance to staleness through experiments for classification tasks over heterogeneous datasets.
△ Less
Submitted 4 October, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Unbounded Gradients in Federated Learning with Buffered Asynchronous Aggregation
Authors:
Mohammad Taha Toghani,
César A. Uribe
Abstract:
Synchronous updates may compromise the efficiency of cross-device federated learning once the number of active clients increases. The \textit{FedBuff} algorithm (Nguyen et al., 2022) alleviates this problem by allowing asynchronous updates (staleness), which enhances the scalability of training while preserving privacy via secure aggregation. We revisit the \textit{FedBuff} algorithm for asynchron…
▽ More
Synchronous updates may compromise the efficiency of cross-device federated learning once the number of active clients increases. The \textit{FedBuff} algorithm (Nguyen et al., 2022) alleviates this problem by allowing asynchronous updates (staleness), which enhances the scalability of training while preserving privacy via secure aggregation. We revisit the \textit{FedBuff} algorithm for asynchronous federated learning and extend the existing analysis by removing the boundedness assumptions from the gradient norm. This paper presents a theoretical analysis of the convergence rate of this algorithm when heterogeneity in data, batch size, and delay are considered.
△ Less
Submitted 3 October, 2022;
originally announced October 2022.
-
On Arbitrary Compression for Decentralized Consensus and Stochastic Optimization over Directed Networks
Authors:
Mohammad Taha Toghani,
César A. Uribe
Abstract:
We study the decentralized consensus and stochastic optimization problems with compressed communications over static directed graphs. We propose an iterative gradient-based algorithm that compresses messages according to a desired compression ratio. The proposed method provably reduces the communication overhead on the network at every communication round. Contrary to existing literature, we allow…
▽ More
We study the decentralized consensus and stochastic optimization problems with compressed communications over static directed graphs. We propose an iterative gradient-based algorithm that compresses messages according to a desired compression ratio. The proposed method provably reduces the communication overhead on the network at every communication round. Contrary to existing literature, we allow for arbitrary compression ratios in the communicated messages. We show a linear convergence rate for the proposed method on the consensus problem. Moreover, we provide explicit convergence rates for decentralized stochastic optimization problems on smooth functions that are either (i) strongly convex, (ii) convex, or (iii) non-convex. Finally, we provide numerical experiments to illustrate convergence under arbitrary compression ratios and the communication efficiency of our algorithm.
△ Less
Submitted 18 April, 2022;
originally announced April 2022.
-
On Acceleration of Gradient-Based Empirical Risk Minimization using Local Polynomial Regression
Authors:
Ekaterina Trimbach,
Edward Duc Hien Nguyen,
César A. Uribe
Abstract:
We study the acceleration of the Local Polynomial Interpolation-based Gradient Descent method (LPI-GD) recently proposed for the approximate solution of empirical risk minimization problems (ERM). We focus on loss functions that are strongly convex and smooth with condition number $σ$. We additionally assume the loss function is $η$-Hölder continuous with respect to the data. The oracle complexity…
▽ More
We study the acceleration of the Local Polynomial Interpolation-based Gradient Descent method (LPI-GD) recently proposed for the approximate solution of empirical risk minimization problems (ERM). We focus on loss functions that are strongly convex and smooth with condition number $σ$. We additionally assume the loss function is $η$-Hölder continuous with respect to the data. The oracle complexity of LPI-GD is $\tilde{O}\left(σm^d \log(1/\varepsilon)\right)$ for a desired accuracy $\varepsilon$, where $d$ is the dimension of the parameter space, and $m$ is the cardinality of an approximation grid. The factor $m^d$ can be shown to scale as $O((1/\varepsilon)^{d/2η})$. LPI-GD has been shown to have better oracle complexity than gradient descent (GD) and stochastic gradient descent (SGD) for certain parameter regimes. We propose two accelerated methods for the ERM problem based on LPI-GD and show an oracle complexity of $\tilde{O}\left(\sqrtσ m^d \log(1/\varepsilon)\right)$. Moreover, we provide the first empirical study on local polynomial interpolation-based gradient methods and corroborate that LPI-GD has better performance than GD and SGD in some scenarios, and the proposed methods achieve acceleration.
△ Less
Submitted 15 April, 2022;
originally announced April 2022.
-
On Distributed Exact Sparse Linear Regression over Networks
Authors:
Tu Anh-Nguyen,
César A. Uribe
Abstract:
In this work, we propose an algorithm for solving exact sparse linear regression problems over a network in a distributed manner. Particularly, we consider the problem where data is stored among different computers or agents that seek to collaboratively find a common regressor with a specified sparsity k, i.e., the L0-norm is less than or equal to k. Contrary to existing literature that uses L1 re…
▽ More
In this work, we propose an algorithm for solving exact sparse linear regression problems over a network in a distributed manner. Particularly, we consider the problem where data is stored among different computers or agents that seek to collaboratively find a common regressor with a specified sparsity k, i.e., the L0-norm is less than or equal to k. Contrary to existing literature that uses L1 regularization to approximate sparseness, we solve the problem with exact sparsity k. The main novelty in our proposal lies in showing a problem formulation with zero duality gap for which we adopt a dual approach to solve the problem in a decentralized way. This sets a foundational approach for the study of distributed optimization with explicit sparsity constraints. We show theoretically and empirically that, under appropriate assumptions, where each agent solves smaller and local integer programming problems, all agents will eventually reach a consensus on the same sparse optimal regressor.
△ Less
Submitted 1 April, 2022;
originally announced April 2022.
-
Local Stochastic Factored Gradient Descent for Distributed Quantum State Tomography
Authors:
Junhyung Lyle Kim,
Mohammad Taha Toghani,
César A. Uribe,
Anastasios Kyrillidis
Abstract:
We propose a distributed Quantum State Tomography (QST) protocol, named Local Stochastic Factored Gradient Descent (Local SFGD), to learn the low-rank factor of a density matrix over a set of local machines. QST is the canonical procedure to characterize the state of a quantum system, which we formulate as a stochastic nonconvex smooth optimization problem. Physically, the estimation of a low-rank…
▽ More
We propose a distributed Quantum State Tomography (QST) protocol, named Local Stochastic Factored Gradient Descent (Local SFGD), to learn the low-rank factor of a density matrix over a set of local machines. QST is the canonical procedure to characterize the state of a quantum system, which we formulate as a stochastic nonconvex smooth optimization problem. Physically, the estimation of a low-rank density matrix helps characterizing the amount of noise introduced by quantum computation. Theoretically, we prove the local convergence of Local SFGD for a general class of restricted strongly convex/smooth loss functions, i.e., Local SFGD converges locally to a small neighborhood of the global optimum at a linear rate with a constant step size, while it locally converges exactly at a sub-linear rate with diminishing step sizes. With a proper initialization, local convergence results imply global convergence. We validate our theoretical findings with numerical simulations of QST on the Greenberger-Horne-Zeilinger (GHZ) state.
△ Less
Submitted 1 June, 2022; v1 submitted 22 March, 2022;
originally announced March 2022.
-
The Role of Local Steps in Local SGD
Authors:
Tiancheng Qin,
S. Rasoul Etesami,
César A. Uribe
Abstract:
We consider the distributed stochastic optimization problem where $n$ agents want to minimize a global function given by the sum of agents' local functions, and focus on the heterogeneous setting when agents' local functions are defined over non-i.i.d. data sets. We study the Local SGD method, where agents perform a number of local stochastic gradient steps and occasionally communicate with a cent…
▽ More
We consider the distributed stochastic optimization problem where $n$ agents want to minimize a global function given by the sum of agents' local functions, and focus on the heterogeneous setting when agents' local functions are defined over non-i.i.d. data sets. We study the Local SGD method, where agents perform a number of local stochastic gradient steps and occasionally communicate with a central node to improve their local optimization tasks. We analyze the effect of local steps on the convergence rate and the communication complexity of Local SGD. In particular, instead of assuming a fixed number of local steps across all communication rounds, we allow the number of local steps during the $i$-th communication round, $H_i$, to be different and arbitrary numbers. Our main contribution is to characterize the convergence rate of Local SGD as a function of $\{H_i\}_{i=1}^R$ under various settings of strongly convex, convex, and nonconvex local functions, where $R$ is the total number of communication rounds. Based on this characterization, we provide sufficient conditions on the sequence $\{H_i\}_{i=1}^R$ such that Local SGD can achieve linear speed-up with respect to the number of workers. Furthermore, we propose a new communication strategy with increasing local steps superior to existing communication strategies for strongly convex local functions. On the other hand, for convex and nonconvex local functions, we argue that fixed local steps are the best communication strategy for Local SGD and recover state-of-the-art convergence rate results. Finally, we justify our theoretical results through extensive numerical experiments.
△ Less
Submitted 29 September, 2022; v1 submitted 13 March, 2022;
originally announced March 2022.
-
Faster Convergence of Local SGD for Over-Parameterized Models
Authors:
Tiancheng Qin,
S. Rasoul Etesami,
César A. Uribe
Abstract:
Modern machine learning architectures are often highly expressive. They are usually over-parameterized and can interpolate the data by driving the empirical loss close to zero. We analyze the convergence of Local SGD (or FedAvg) for such over-parameterized models in the heterogeneous data setting and improve upon the existing literature by establishing the following convergence rates. For general…
▽ More
Modern machine learning architectures are often highly expressive. They are usually over-parameterized and can interpolate the data by driving the empirical loss close to zero. We analyze the convergence of Local SGD (or FedAvg) for such over-parameterized models in the heterogeneous data setting and improve upon the existing literature by establishing the following convergence rates. For general convex loss functions, we establish an error bound of $Ø(1/T)$ under a mild data similarity assumption and an error bound of $Ø(K/T)$ otherwise, where $K$ is the number of local steps and $T$ is the total number of iterations. For non-convex loss functions we prove an error bound of $Ø(K/T)$. These bounds improve upon the best previous bound of $Ø(1/\sqrt{nT})$ in both cases, where $n$ is the number of nodes, when no assumption on the model being over-parameterized is made. We complete our results by providing problem instances in which our established convergence rates are tight to a constant factor with a reasonably small stepsize. Finally, we validate our theoretical results by performing large-scale numerical experiments that reveal the convergence behavior of Local SGD for practical over-parameterized deep learning models, in which the $Ø(1/T)$ convergence rate of Local SGD is clearly shown.
△ Less
Submitted 10 June, 2024; v1 submitted 29 January, 2022;
originally announced January 2022.
-
Scalable Average Consensus with Compressed Communications
Authors:
Mohammad Taha Toghani,
César A. Uribe
Abstract:
We propose a new decentralized average consensus algorithm with compressed communication that scales linearly with the network size n. We prove that the proposed method converges to the average of the initial values held locally by the agents of a network when agents are allowed to communicate with compressed messages. The proposed algorithm works for a broad class of compression operators (possib…
▽ More
We propose a new decentralized average consensus algorithm with compressed communication that scales linearly with the network size n. We prove that the proposed method converges to the average of the initial values held locally by the agents of a network when agents are allowed to communicate with compressed messages. The proposed algorithm works for a broad class of compression operators (possibly biased), where agents interact over arbitrary static, undirected, and connected networks. We further present numerical experiments that confirm our theoretical results and illustrate the scalability and communication efficiency of our algorithm.
△ Less
Submitted 14 September, 2021;
originally announced September 2021.
-
Robust Distributed Optimization With Randomly Corrupted Gradients
Authors:
Berkay Turan,
Cesar A. Uribe,
Hoi-To Wai,
Mahnoosh Alizadeh
Abstract:
In this paper, we propose a first-order distributed optimization algorithm that is provably robust to Byzantine failures-arbitrary and potentially adversarial behavior, where all the participating agents are prone to failure. We model each agent's state over time as a two-state Markov chain that indicates Byzantine or trustworthy behaviors at different time instants. We set no restrictions on the…
▽ More
In this paper, we propose a first-order distributed optimization algorithm that is provably robust to Byzantine failures-arbitrary and potentially adversarial behavior, where all the participating agents are prone to failure. We model each agent's state over time as a two-state Markov chain that indicates Byzantine or trustworthy behaviors at different time instants. We set no restrictions on the maximum number of Byzantine agents at any given time. We design our method based on three layers of defense: 1) temporal robust aggregation, 2) spatial robust aggregation, and 3) gradient normalization. We study two settings for stochastic optimization, namely Sample Average Approximation and Stochastic Approximation. We provide convergence guarantees of our method for strongly convex and smooth non-convex cost functions.
△ Less
Submitted 17 June, 2022; v1 submitted 28 June, 2021;
originally announced June 2021.
-
Communication-efficient Distributed Cooperative Learning with Compressed Beliefs
Authors:
Mohammad Taha Toghani,
César A. Uribe
Abstract:
We study the problem of distributed cooperative learning, where a group of agents seeks to agree on a set of hypotheses that best describes a sequence of private observations. In the scenario where the set of hypotheses is large, we propose a belief update rule where agents share compressed (either sparse or quantized) beliefs with an arbitrary positive compression rate. Our algorithm leverages a…
▽ More
We study the problem of distributed cooperative learning, where a group of agents seeks to agree on a set of hypotheses that best describes a sequence of private observations. In the scenario where the set of hypotheses is large, we propose a belief update rule where agents share compressed (either sparse or quantized) beliefs with an arbitrary positive compression rate. Our algorithm leverages a unified communication rule that enables agents to access wide-ranging compression operators as black-box modules. We prove the almost sure asymptotic exponential convergence of beliefs around the set of optimal hypotheses. Additionally, we show a non-asymptotic, explicit, and linear concentration rate in probability of the beliefs on the optimal hypothesis set. We provide numerical experiments to illustrate the communication benefits of our method. The simulation results show that the number of transmitted bits can be reduced to 5-10% of the non-compressed method in the studied scenarios.
△ Less
Submitted 21 September, 2021; v1 submitted 14 February, 2021;
originally announced February 2021.
-
A General Framework for Distributed Inference with Uncertain Models
Authors:
James Z. Hare,
Cesar A. Uribe,
Lance Kaplan,
Ali Jadbabaie
Abstract:
This paper studies the problem of distributed classification with a network of heterogeneous agents. The agents seek to jointly identify the underlying target class that best describes a sequence of observations. The problem is first abstracted to a hypothesis-testing framework, where we assume that the agents seek to agree on the hypothesis (target class) that best matches the distribution of obs…
▽ More
This paper studies the problem of distributed classification with a network of heterogeneous agents. The agents seek to jointly identify the underlying target class that best describes a sequence of observations. The problem is first abstracted to a hypothesis-testing framework, where we assume that the agents seek to agree on the hypothesis (target class) that best matches the distribution of observations. Non-Bayesian social learning theory provides a framework that solves this problem in an efficient manner by allowing the agents to sequentially communicate and update their beliefs for each hypothesis over the network. Most existing approaches assume that agents have access to exact statistical models for each hypothesis. However, in many practical applications, agents learn the likelihood models based on limited data, which induces uncertainty in the likelihood function parameters. In this work, we build upon the concept of uncertain models to incorporate the agents' uncertainty in the likelihoods by identifying a broad set of parametric distribution that allows the agents' beliefs to converge to the same result as a centralized approach. Furthermore, we empirically explore extensions to non-parametric models to provide a generalized framework of uncertain models in non-Bayesian social learning.
△ Less
Submitted 20 November, 2020;
originally announced November 2020.
-
Communication-efficient Decentralized Local SGD over Undirected Networks
Authors:
Tiancheng Qin,
S. Rasoul Etesami,
César A. Uribe
Abstract:
We consider the distributed learning problem where a network of $n$ agents seeks to minimize a global function $F$. Agents have access to $F$ through noisy gradients, and they can locally communicate with their neighbors a network. We study the Decentralized Local SDG method, where agents perform a number of local gradient steps and occasionally exchange information with their neighbors. Previous…
▽ More
We consider the distributed learning problem where a network of $n$ agents seeks to minimize a global function $F$. Agents have access to $F$ through noisy gradients, and they can locally communicate with their neighbors a network. We study the Decentralized Local SDG method, where agents perform a number of local gradient steps and occasionally exchange information with their neighbors. Previous algorithmic analysis efforts have focused on the specific network topology (star topology) where a leader node aggregates all agents' information. We generalize that setting to an arbitrary network by analyzing the trade-off between the number of communication rounds and the computational effort of each agent. We bound the expected optimality gap in terms of the number of iterates $T$, the number of workers $n$, and the spectral gap of the underlying network. Our main results show that by using only $R=Ω(n)$ communication rounds, one can achieve an error that scales as $O({1}/{nT})$, where the number of communication rounds is independent of $T$ and only depends on the number of agents. Finally, we provide numerical evidence of our theoretical results through experiments on real and synthetic data.
△ Less
Submitted 6 November, 2020;
originally announced November 2020.
-
Robust Asynchronous and Network-Independent Cooperative Learning
Authors:
Eduardo Mojica-Nava,
David Yanguas-Rojas,
César A. Uribe
Abstract:
We consider the model of cooperative learning via distributed non-Bayesian learning, where a network of agents tries to jointly agree on a hypothesis that best described a sequence of locally available observations. Building upon recently proposed weak communication network models, we propose a robust cooperative learning rule that allows asynchronous communications, message delays, unpredictable…
▽ More
We consider the model of cooperative learning via distributed non-Bayesian learning, where a network of agents tries to jointly agree on a hypothesis that best described a sequence of locally available observations. Building upon recently proposed weak communication network models, we propose a robust cooperative learning rule that allows asynchronous communications, message delays, unpredictable message losses, and directed communication among nodes. We show that our proposed learning dynamics guarantee that all agents in the network will have an asymptotic exponential decay of their beliefs on the wrong hypothesis, indicating that the beliefs of all agents will concentrate on the optimal hypotheses. Numerical experiments provide evidence on a number of network setups.
△ Less
Submitted 19 October, 2020;
originally announced October 2020.
-
A Distributed Cubic-Regularized Newton Method for Smooth Convex Optimization over Networks
Authors:
César A. Uribe,
Ali Jadbabaie
Abstract:
We propose a distributed, cubic-regularized Newton method for large-scale convex optimization over networks. The proposed method requires only local computations and communications and is suitable for federated learning applications over arbitrary network topologies. We show a $O(k^{{-}3})$ convergence rate when the cost function is convex with Lipschitz gradient and Hessian, with $k$ being the nu…
▽ More
We propose a distributed, cubic-regularized Newton method for large-scale convex optimization over networks. The proposed method requires only local computations and communications and is suitable for federated learning applications over arbitrary network topologies. We show a $O(k^{{-}3})$ convergence rate when the cost function is convex with Lipschitz gradient and Hessian, with $k$ being the number of iterations. We further provide network-dependent bounds for the communication required in each step of the algorithm. We provide numerical experiments that validate our theoretical results.
△ Less
Submitted 7 July, 2020;
originally announced July 2020.
-
Generalized Self-concordant Hessian-barrier algorithms
Authors:
Pavel Dvurechensky,
Mathias Staudigl,
César A. Uribe
Abstract:
Many problems in statistical learning, imaging, and computer vision involve the optimization of a non-convex objective function with singularities at the boundary of the feasible set. For such challenging instances, we develop a new interior-point technique building on the Hessian-barrier algorithm recently introduced in Bomze, Mertikopoulos, Schachinger and Staudigl, [SIAM J. Opt. 2019 29(3), pp.…
▽ More
Many problems in statistical learning, imaging, and computer vision involve the optimization of a non-convex objective function with singularities at the boundary of the feasible set. For such challenging instances, we develop a new interior-point technique building on the Hessian-barrier algorithm recently introduced in Bomze, Mertikopoulos, Schachinger and Staudigl, [SIAM J. Opt. 2019 29(3), pp. 2100-2127], where the Riemannian metric is induced by a generalized self-concordant function. This class of functions is sufficiently general to include most of the commonly used barrier functions in the literature of interior point methods. We prove global convergence to an approximate stationary point of the method, and in cases where the feasible set admits an easily computable self-concordant barrier, we verify worst-case optimal iteration complexity of the method. Applications in non-convex statistical estimation and $L^{p}$-minimization are discussed to given the efficiency of the method.
△ Less
Submitted 6 November, 2019; v1 submitted 4 November, 2019;
originally announced November 2019.
-
Non-Bayesian Social Learning with Uncertain Models
Authors:
James Z. Hare,
Cesar A. Uribe,
Lance Kaplan,
Ali Jadbabaie
Abstract:
Non-Bayesian social learning theory provides a framework that models distributed inference for a group of agents interacting over a social network. In this framework, each agent iteratively forms and communicates beliefs about an unknown state of the world with their neighbors using a learning rule. Existing approaches assume agents have access to precise statistical models (in the form of likelih…
▽ More
Non-Bayesian social learning theory provides a framework that models distributed inference for a group of agents interacting over a social network. In this framework, each agent iteratively forms and communicates beliefs about an unknown state of the world with their neighbors using a learning rule. Existing approaches assume agents have access to precise statistical models (in the form of likelihoods) for the state of the world. However in many situations, such models must be learned from finite data. We propose a social learning rule that takes into account uncertainty in the statistical models using second-order probabilities. Therefore, beliefs derived from uncertain models are sensitive to the amount of past evidence collected for each hypothesis. We characterize how well the hypotheses can be tested on a social network, as consistent or not with the state of the world. We explicitly show the dependency of the generated beliefs with respect to the amount of prior evidence. Moreover, as the amount of prior evidence goes to infinity, learning occurs and is consistent with traditional social learning theory.
△ Less
Submitted 27 September, 2019; v1 submitted 9 September, 2019;
originally announced September 2019.
-
Non-Bayesian Social Learning with Uncertain Models over Time-Varying Directed Graphs
Authors:
César A. Uribe,
James Z. Hare,
Lance Kaplan,
Ali Jadbabaie
Abstract:
We study the problem of non-Bayesian social learning with uncertain models, in which a network of agents seek to cooperatively identify the state of the world based on a sequence of observed signals. In contrast with the existing literature, we focus our attention on the scenario where the statistical models held by the agents about possible states of the world are built from finite observations.…
▽ More
We study the problem of non-Bayesian social learning with uncertain models, in which a network of agents seek to cooperatively identify the state of the world based on a sequence of observed signals. In contrast with the existing literature, we focus our attention on the scenario where the statistical models held by the agents about possible states of the world are built from finite observations. We show that existing non-Bayesian social learning approaches may select a wrong hypothesis with non-zero probability under these conditions. Therefore, we propose a new algorithm to iteratively construct a set of beliefs that indicate whether a certain hypothesis is supported by the empirical evidence. This new algorithm can be implemented over time-varying directed graphs, with non{-}doubly stochastic weights.
△ Less
Submitted 9 September, 2019;
originally announced September 2019.
-
On Increasing Self-Confidence in Non-Bayesian Social Learning over Time-Varying Directed Graphs
Authors:
César A. Uribe,
Ali Jadbabaie
Abstract:
We study the convergence of the log-linear non-Bayesian social learning update rule, for a group of agents that collectively seek to identify a parameter that best describes a joint sequence of observations. Contrary to recent literature, we focus on the case where agents assign decaying weights to its neighbors, and the network is not connected at every time instant but over some finite time inte…
▽ More
We study the convergence of the log-linear non-Bayesian social learning update rule, for a group of agents that collectively seek to identify a parameter that best describes a joint sequence of observations. Contrary to recent literature, we focus on the case where agents assign decaying weights to its neighbors, and the network is not connected at every time instant but over some finite time intervals. We provide a necessary and sufficient condition for the rate at which agents decrease the weights and still guarantees social learning.
△ Less
Submitted 23 December, 2018;
originally announced December 2018.
-
Graph-Theoretic Analysis of Belief System Dynamics under Logic Constraints
Authors:
Angelia Nedić,
Alex Olshevsky,
César A. Uribe
Abstract:
Opinion formation cannot be modeled solely as an ideological deduction from a set of principles; rather, repeated social interactions and logic constraints among statements are consequential in the construct of belief systems. We address three basic questions in the analysis of social opinion dynamics: (i) Will a belief system converge? (ii) How long does it take to converge? (iii) Where does it c…
▽ More
Opinion formation cannot be modeled solely as an ideological deduction from a set of principles; rather, repeated social interactions and logic constraints among statements are consequential in the construct of belief systems. We address three basic questions in the analysis of social opinion dynamics: (i) Will a belief system converge? (ii) How long does it take to converge? (iii) Where does it converge? We provide graph-theoretic answers to these questions for a model of opinion dynamics of a belief system with logic constraints. Our results make plain the implicit dependence of the convergence properties of a belief system on the underlying social network and on the set of logic constraints that relate beliefs on different statements. Moreover, we provide an explicit analysis of a variety of commonly used large-scale network models.
△ Less
Submitted 30 December, 2018; v1 submitted 4 October, 2018;
originally announced October 2018.
-
A Dual Approach for Optimal Algorithms in Distributed Optimization over Networks
Authors:
César A. Uribe,
Soomin Lee,
Alexander Gasnikov,
Angelia Nedić
Abstract:
We study dual-based algorithms for distributed convex optimization problems over networks, where the objective is to minimize a sum $\sum_{i=1}^{m}f_i(z)$ of functions over in a network. We provide complexity bounds for four different cases, namely: each function $f_i$ is strongly convex and smooth, each function is either strongly convex or smooth, and when it is convex but neither strongly conve…
▽ More
We study dual-based algorithms for distributed convex optimization problems over networks, where the objective is to minimize a sum $\sum_{i=1}^{m}f_i(z)$ of functions over in a network. We provide complexity bounds for four different cases, namely: each function $f_i$ is strongly convex and smooth, each function is either strongly convex or smooth, and when it is convex but neither strongly convex nor smooth. Our approach is based on the dual of an appropriately formulated primal problem, which includes a graph that models the communication restrictions. We propose distributed algorithms that achieve the same optimal rates as their centralized counterparts (up to constant and logarithmic factors), with an additional optimal cost related to the spectral properties of the network. Initially, we focus on functions for which we can explicitly minimize its Legendre-Fenchel conjugate, i.e., admissible or dual friendly functions. Then, we study distributed optimization algorithms for non-dual friendly functions, as well as a method to improve the dependency on the parameters of the functions involved. Numerical analysis of the proposed algorithms is also provided.
△ Less
Submitted 16 March, 2020; v1 submitted 3 September, 2018;
originally announced September 2018.
-
Decentralize and Randomize: Faster Algorithm for Wasserstein Barycenters
Authors:
Pavel Dvurechensky,
Darina Dvinskikh,
Alexander Gasnikov,
César A. Uribe,
Angelia Nedić
Abstract:
We study the decentralized distributed computation of discrete approximations for the regularized Wasserstein barycenter of a finite set of continuous probability measures distributedly stored over a network. We assume there is a network of agents/machines/computers, and each agent holds a private continuous probability measure and seeks to compute the barycenter of all the measures in the network…
▽ More
We study the decentralized distributed computation of discrete approximations for the regularized Wasserstein barycenter of a finite set of continuous probability measures distributedly stored over a network. We assume there is a network of agents/machines/computers, and each agent holds a private continuous probability measure and seeks to compute the barycenter of all the measures in the network by getting samples from its local measure and exchanging information with its neighbors. Motivated by this problem, we develop, and analyze, a novel accelerated primal-dual stochastic gradient method for general stochastic convex optimization problems with linear equality constraints. Then, we apply this method to the decentralized distributed optimization setting to obtain a new algorithm for the distributed semi-discrete regularized Wasserstein barycenter problem. Moreover, we show explicit non-asymptotic complexity for the proposed algorithm.
△ Less
Submitted 19 February, 2020; v1 submitted 11 June, 2018;
originally announced June 2018.
-
Distributed Computation of Wasserstein Barycenters over Networks
Authors:
César A. Uribe,
Darina Dvinskikh,
Pavel Dvurechensky,
Alexander Gasnikov,
Angelia Nedić
Abstract:
We propose a new \cu{class-optimal} algorithm for the distributed computation of Wasserstein Barycenters over networks. Assuming that each node in a graph has a probability distribution, we prove that every node can reach the barycenter of all distributions held in the network by using local interactions compliant with the topology of the graph. We provide an estimate for the minimum number of com…
▽ More
We propose a new \cu{class-optimal} algorithm for the distributed computation of Wasserstein Barycenters over networks. Assuming that each node in a graph has a probability distribution, we prove that every node can reach the barycenter of all distributions held in the network by using local interactions compliant with the topology of the graph. We provide an estimate for the minimum number of communication rounds required for the proposed method to achieve arbitrary relative precision both in the optimality of the solution and the consensus among all agents for undirected fixed networks.
△ Less
Submitted 20 September, 2018; v1 submitted 7 March, 2018;
originally announced March 2018.
-
Optimal Algorithms for Distributed Optimization
Authors:
César A. Uribe,
Soomin Lee,
Alexander Gasnikov,
Angelia Nedić
Abstract:
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\xb) \triangleq \sum_{i=1}^{m}f_i(\xb)$ is strongly convex and smooth, either strongly convex or smooth or…
▽ More
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\xb) \triangleq \sum_{i=1}^{m}f_i(\xb)$ is strongly convex and smooth, either strongly convex or smooth or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal friendly functions, time-varying graphs, improvement of the condition numbers.
△ Less
Submitted 14 November, 2018; v1 submitted 1 December, 2017;
originally announced December 2017.
-
Distributed Learning for Cooperative Inference
Authors:
Angelia Nedić,
Alex Olshevsky,
César A. Uribe
Abstract:
We study the problem of cooperative inference where a group of agents interact over a network and seek to estimate a joint parameter that best explains a set of observations. Agents do not know the network topology or the observations of other agents. We explore a variational interpretation of the Bayesian posterior density, and its relation to the stochastic mirror descent algorithm, to propose a…
▽ More
We study the problem of cooperative inference where a group of agents interact over a network and seek to estimate a joint parameter that best explains a set of observations. Agents do not know the network topology or the observations of other agents. We explore a variational interpretation of the Bayesian posterior density, and its relation to the stochastic mirror descent algorithm, to propose a new distributed learning algorithm. We show that, under appropriate assumptions, the beliefs generated by the proposed algorithm concentrate around the true parameter exponentially fast. We provide explicit non-asymptotic bounds for the convergence rate. Moreover, we develop explicit and computationally efficient algorithms for observation models belonging to exponential families.
△ Less
Submitted 10 April, 2017;
originally announced April 2017.
-
Distributed Gaussian Learning over Time-varying Directed Graphs
Authors:
Angelia Nedić,
Alex Olshevsky,
César A. Uribe
Abstract:
We present a distributed (non-Bayesian) learning algorithm for the problem of parameter estimation with Gaussian noise. The algorithm is expressed as explicit updates on the parameters of the Gaussian beliefs (i.e. means and precision). We show a convergence rate of $O(1/k)$ with the constant term depending on the number of agents and the topology of the network. Moreover, we show almost sure conv…
▽ More
We present a distributed (non-Bayesian) learning algorithm for the problem of parameter estimation with Gaussian noise. The algorithm is expressed as explicit updates on the parameters of the Gaussian beliefs (i.e. means and precision). We show a convergence rate of $O(1/k)$ with the constant term depending on the number of agents and the topology of the network. Moreover, we show almost sure convergence to the optimal solution of the estimation problem for the general case of time-varying directed graphs.
△ Less
Submitted 6 December, 2016; v1 submitted 5 December, 2016;
originally announced December 2016.
-
A Tutorial on Distributed (Non-Bayesian) Learning: Problem, Algorithms and Results
Authors:
Angelia Nedić,
Alex Olshevsky,
César A. Uribe
Abstract:
We overview some results on distributed learning with focus on a family of recently proposed algorithms known as non-Bayesian social learning. We consider different approaches to the distributed learning problem and its algorithmic solutions for the case of finitely many hypotheses. The original centralized problem is discussed at first, and then followed by a generalization to the distributed set…
▽ More
We overview some results on distributed learning with focus on a family of recently proposed algorithms known as non-Bayesian social learning. We consider different approaches to the distributed learning problem and its algorithmic solutions for the case of finitely many hypotheses. The original centralized problem is discussed at first, and then followed by a generalization to the distributed setting. The results on convergence and convergence rate are presented for both asymptotic and finite time regimes. Various extensions are discussed such as those dealing with directed time-varying networks, Nesterov's acceleration technique and a continuum sets of hypothesis.
△ Less
Submitted 23 September, 2016;
originally announced September 2016.