
Distributed gradient methods under heavy-tailed communication noise

Authors: Manojlo Vukovic, Dusan Jakovetic, Dragana Bajovic, Soummya Kar

Abstract: We consider a standard distributed optimization problem in which networked nodes collaboratively minimize the sum of their locally known convex costs. For this setting, we address, for the first time, the design and analysis of distributed methods that solve this problem when inter-node communication is subject to heavy-tailed noise. Heavy-tailed noise is highly relevant in practice and frequently arises in densely deployed wireless sensor and Internet of Things (IoT) networks. Specifically, we design a distributed gradient-type method that features carefully balanced, mixed time-scale, time-varying step sizes for the consensus and gradient contributions, together with a bounded nonlinear operator on the consensus update that limits the effect of heavy-tailed noise. Assuming heterogeneous strongly convex local costs whose minimizers differ and may be arbitrarily far apart, we show that the proposed method converges in the mean squared error (MSE) sense to a neighborhood of the network-wide problem solution, and we characterize the corresponding convergence rate. We further show that the asymptotic MSE can be made arbitrarily small through consensus step-size tuning, possibly at the cost of slower transient error decay. Numerical experiments corroborate our findings and demonstrate the resilience of the proposed method to heavy-tailed (and infinite-variance) communication noise. They also show that existing distributed methods, designed for settings with finite communication-noise variance, fail in the presence of infinite-variance noise.

Submitted 30 May, 2025; originally announced May 2025.

Comments: This work has been submitted to the IEEE for possible publication

MSC Class: 90C25; 65K05
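The update described in this abstract can be illustrated on a toy problem. This is a minimal sketch under our own assumptions, not the authors' algorithm: scalar quadratic local costs $f_i(x) = \frac{1}{2}(x - b_i)^2$ on a 4-node ring, standard Cauchy noise (infinite variance) added to every received neighbour value, clipping as the bounded consensus nonlinearity, and hypothetical step sizes $\alpha_k = 1/(k+2)$ for the gradient term and $\beta_k = 0.3/(k+2)^{0.55}$ for the consensus term (so the consensus step decays more slowly). All function names are hypothetical.

```python
import math
import random


def cauchy(rng: random.Random) -> float:
    """Standard Cauchy sample via the inverse CDF (no finite mean or variance)."""
    return math.tan(math.pi * (rng.random() - 0.5))


def clip(v: float, tau: float) -> float:
    """Bounded nonlinearity applied to each received consensus difference."""
    return max(-tau, min(tau, v))


def distributed_clipped_gradient(b, neighbors, iters=20000, tau=1.0, seed=0):
    """Hypothetical sketch: node i holds f_i(x) = 0.5 * (x - b[i])**2.

    The network-wide minimizer of sum_i f_i is mean(b).
    """
    rng = random.Random(seed)
    x = [0.0] * len(b)
    for k in range(iters):
        alpha = 1.0 / (k + 2)          # gradient step size (decays faster)
        beta = 0.3 / (k + 2) ** 0.55   # consensus step size (decays slower)
        new_x = []
        for i, xi in enumerate(x):
            # Each neighbour value arrives corrupted by fresh heavy-tailed noise;
            # clipping bounds the influence of any single noisy message.
            consensus = sum(clip(x[j] + cauchy(rng) - xi, tau) for j in neighbors[i])
            gradient = xi - b[i]
            new_x.append(xi + beta * consensus - alpha * gradient)
        x = new_x
    return x


if __name__ == "__main__":
    ring = [[1, 3], [0, 2], [1, 3], [2, 0]]
    estimates = distributed_clipped_gradient([1.0, 2.0, 3.0, 4.0], ring)
    print(estimates)  # entries are expected to lie near mean(b) = 2.5
```

Replacing `clip` with the identity gives an unclipped consensus update, whose iterates can be thrown far off by a single extreme Cauchy draw.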

arXiv:2410.15637

Large Deviation Upper Bounds and Improved MSE Rates of Nonlinear SGD: Heavy-tailed Noise and Power of Symmetry

Authors: Aleksandar Armacki, Shuhua Yu, Dragana Bajovic, Dusan Jakovetic, Soummya Kar

Abstract: We study large deviation upper bounds and mean-squared error (MSE) guarantees of a general framework of nonlinear stochastic gradient methods in the online setting, in the presence of heavy-tailed noise. Unlike existing works that rely on the closed form of a nonlinearity (typically clipping), our framework treats the nonlinearity in a black-box manner, allowing us to provide unified guarantees for a broad class of bounded nonlinearities, including many popular ones such as sign, quantization, normalization, and component-wise and joint clipping. We provide several strong results for a broad range of step-sizes in the presence of heavy-tailed noise with a symmetric probability density function that is positive in a neighbourhood of zero and may have unbounded moments. In particular, for non-convex costs we provide a large deviation upper bound for the minimum norm-squared of gradients, showing an asymptotic tail decay on an exponential scale at a rate $\sqrt{t} / \log(t)$. We establish the accompanying rate function, showing an explicit dependence on the choice of step-size, nonlinearity, noise, and problem parameters. Next, for non-convex costs and the minimum norm-squared of gradients, we derive the optimal MSE rate $\widetilde{\mathcal{O}}(t^{-1/2})$. Moreover, for strongly convex costs and the last iterate, we provide an MSE rate that can be made arbitrarily close to the optimal rate $\mathcal{O}(t^{-1})$, improving on the state-of-the-art results in the presence of heavy-tailed noise.
Finally, we establish almost sure convergence of the minimum norm-squared of gradients, providing an explicit rate, which can be made arbitrarily close to $o(t^{-1/4})$.

Submitted 21 March, 2025; v1 submitted 21 October, 2024; originally announced October 2024.

Comments: 35 pages. arXiv admin note: text overlap with arXiv:2410.13954
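To make the black-box view concrete, here is a minimal sketch of several bounded nonlinearities named in the abstract, each mapping a (noisy) gradient vector to a vector of bounded size. The threshold `tau`, the grid spacing `delta`, and all function names are our own illustrative choices; in particular, the quantizer is one simple bounded choice and not necessarily the one analyzed in the paper.

```python
import math
from typing import List

Vector = List[float]


def sign(g: Vector) -> Vector:
    """Component-wise sign; a zero entry maps to 0."""
    return [float((u > 0) - (u < 0)) for u in g]


def clip_componentwise(g: Vector, tau: float = 1.0) -> Vector:
    """Clip each coordinate to [-tau, tau]."""
    return [max(-tau, min(tau, u)) for u in g]


def clip_joint(g: Vector, tau: float = 1.0) -> Vector:
    """Rescale the whole vector so that its Euclidean norm is at most tau."""
    norm = math.sqrt(sum(u * u for u in g))
    scale = 1.0 if norm <= tau else tau / norm
    return [scale * u for u in g]


def normalize(g: Vector) -> Vector:
    """Unit-norm direction; the zero vector is returned unchanged."""
    norm = math.sqrt(sum(u * u for u in g))
    return list(g) if norm == 0.0 else [u / norm for u in g]


def quantize(g: Vector, tau: float = 1.0, delta: float = 0.5) -> Vector:
    """Illustrative bounded quantizer: clip to [-tau, tau], then round to a grid of spacing delta."""
    return [delta * round(u / delta) for u in clip_componentwise(g, tau)]


if __name__ == "__main__":
    g = [3.0, -4.0]
    for psi in (sign, clip_componentwise, clip_joint, normalize, quantize):
        print(psi.__name__, psi(g))
```

Because each function only sees the noisy gradient vector, an SGD loop can call any of them interchangeably, which is the sense in which the nonlinearity is treated as a black box.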

arXiv:2410.13954

Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees

Authors: Aleksandar Armacki, Shuhua Yu, Pranay Sharma, Gauri Joshi, Dragana Bajovic, Dusan Jakovetic, Soummya Kar

Abstract: We study high-probability convergence in online learning in the presence of heavy-tailed noise. To combat the heavy tails, we consider a general framework of nonlinear SGD methods, subsuming several popular nonlinearities such as sign, quantization, and component-wise and joint clipping. In our work the nonlinearity is treated in a black-box manner, allowing us to establish unified guarantees for a broad range of nonlinear methods. For symmetric noise and non-convex costs we establish convergence of the gradient norm-squared at a rate $\widetilde{\mathcal{O}}(t^{-1/4})$, while for the last iterate of strongly convex costs we establish convergence to the population optimum at a rate $\mathcal{O}(t^{-\zeta})$, where $\zeta \in (0,1)$ depends on noise and problem parameters. Further, if the noise is a (biased) mixture of symmetric and non-symmetric components, we show convergence to a neighbourhood of stationarity, whose size depends on the mixture coefficient, the nonlinearity, and the noise. Compared to the state of the art, which only considers clipping and requires unbiased noise with bounded $p$-th moments, $p \in (1,2]$, we provide guarantees for a broad class of nonlinearities, without any assumptions on noise moments. While the rate exponents in the state of the art depend on noise moments and vanish as $p \rightarrow 1$, our exponents are constant and strictly better whenever $p < 6/5$ for non-convex and $p < 8/7$ for strongly convex costs.
Experiments validate our theory, showing that clipping is not always the optimal nonlinearity, which further underlines the value of a general framework.

Submitted 20 March, 2025; v1 submitted 17 October, 2024; originally announced October 2024.

Comments: 40 pages, 6 figures
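A minimal sketch of nonlinear SGD on a strongly convex problem, under our own assumptions: a separable quadratic $f(x) = \frac{1}{2}\sum_j a_j (x_j - c_j)^2$ with hypothetical $a = (1, 2)$ and $c = (1, -2)$, gradient noise drawn independently per coordinate from a standard Cauchy distribution (symmetric, with no finite moments), and a hypothetical step size $\alpha_t = 2/(t+2)$. The update is $x_{t+1} = x_t - \alpha_t \Psi(\nabla f(x_t) + z_t)$ with $\Psi$ either the component-wise sign or component-wise clipping, and the function returns the last iterate.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def cauchy(rng: random.Random) -> float:
    """Standard Cauchy sample: symmetric density, no finite mean or variance."""
    return math.tan(math.pi * (rng.random() - 0.5))


def sign(g: Vector) -> Vector:
    return [float((u > 0) - (u < 0)) for u in g]


def clip(g: Vector, tau: float = 1.0) -> Vector:
    return [max(-tau, min(tau, u)) for u in g]


def nonlinear_sgd(
    nonlinearity: Callable[[Vector], Vector],
    a: Sequence[float] = (1.0, 2.0),
    c: Sequence[float] = (1.0, -2.0),
    iters: int = 20000,
    step0: float = 2.0,
    seed: int = 0,
) -> Vector:
    """Last iterate of x_{t+1} = x_t - alpha_t * Psi(grad f(x_t) + z_t)."""
    rng = random.Random(seed)
    x = [0.0] * len(a)
    for t in range(iters):
        # Noisy gradient of f(x) = 0.5 * sum_j a_j * (x_j - c_j)^2.
        noisy_grad = [aj * (xj - cj) + cauchy(rng) for aj, xj, cj in zip(a, x, c)]
        step = step0 / (t + 2)
        x = [xj - step * dj for xj, dj in zip(x, nonlinearity(noisy_grad))]
    return x


if __name__ == "__main__":
    for psi in (sign, clip):
        print(psi.__name__, nonlinear_sgd(psi))  # both are expected to end near c = (1, -2)
```

Since every step moves each coordinate by at most the current step size, a single huge noise sample cannot throw the iterate far away, which is the intuition behind using bounded nonlinearities here.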

arXiv:2310.18784

High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise

Authors: Aleksandar Armacki, Pranay Sharma, Gauri Joshi, Dragana Bajovic, Dusan Jakovetic, Soummya Kar

Abstract: We study high-probability convergence guarantees of learning on streaming data in the presence of heavy-tailed noise. In the proposed scenario, the model is updated in an online fashion as new information is observed, without storing any additional data. To combat the heavy-tailed noise, we consider a general framework of nonlinear stochastic gradient descent (SGD), providing several strong results. First, for non-convex costs and component-wise nonlinearities, we establish a convergence rate arbitrarily close to $\mathcal{O}\left(t^{-\frac{1}{4}}\right)$, whose exponent is independent of noise and problem parameters. Second, for strongly convex costs and component-wise nonlinearities, we establish a rate arbitrarily close to $\mathcal{O}\left(t^{-\frac{1}{2}}\right)$ for the weighted average of iterates, with the exponent again independent of noise and problem parameters. Finally, for strongly convex costs and a broader class of nonlinearities, we establish convergence of the last iterate, with a rate $\mathcal{O}\left(t^{-\zeta}\right)$, where $\zeta \in (0,1)$ depends on problem parameters, noise, and nonlinearity. As we show analytically and numerically, $\zeta$ can be used to inform the preferred choice of nonlinearity for given problem settings.
Compared to the state of the art, which only considers clipping, requires bounded noise moments of order $\eta \in (1,2]$, and establishes convergence rates whose exponents go to zero as $\eta \rightarrow 1$, we provide high-probability guarantees for a much broader class of nonlinearities and for noise with a symmetric density, with convergence rates whose exponents are bounded away from zero even when the noise has only a finite first moment. Moreover, in the case of strongly convex functions, we demonstrate analytically and numerically that clipping is not always the optimal nonlinearity, further underlining the value of our general framework.

Submitted 30 April, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

Comments: 30 pages, 3 figures

arXiv:2212.11959

Nonlinear consensus+innovations under correlated heavy-tailed noises: Mean square convergence rate and asymptotics

Authors: Manojlo Vukovic, Dusan Jakovetic, Dragana Bajovic, Soummya Kar

Abstract: We consider distributed recursive estimation of consensus+innovations type in the presence of heavy-tailed sensing and communication noises. We allow the sensing and communication noises to be mutually correlated while being independent and identically distributed (i.i.d.) in time, and both may have infinite moments of order higher than one (hence infinite variances). Such heavy-tailed, infinite-variance noises are highly relevant in practice and have been shown to occur, e.g., in dense Internet of Things (IoT) deployments. We develop a consensus+innovations distributed estimator that employs a general nonlinearity in both the consensus and innovations steps to combat the noise. We establish the estimator's almost sure convergence, asymptotic normality, and mean squared error (MSE) convergence. Moreover, we establish and explicitly quantify a sublinear MSE convergence rate for the estimator. We then quantify, through analytical examples, the effects of the nonlinearity choices and of the noise correlation on system performance. Finally, numerical examples corroborate our findings and verify that the proposed method works under simultaneous heavy-tailed communication and sensing noise, while existing methods fail under the same noise conditions.

Submitted 9 November, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

MSC Class: 93E10; 93E35; 60G35; 94A13; 62M05
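As an illustrative analytical example of how a nonlinearity choice interacts with heavy-tailed noise (our own worked example, not necessarily a quantity used in the paper), consider a scalar error $d$ passed through symmetric noise $n$ with density $p$ and CDF $F$, and the mean effect $\phi(d) = \mathbb{E}[\Psi(d + n)]$ of an odd nonlinearity $\Psi$. The slope $\phi'(0)$ measures how much of a small error survives the nonlinearity; for standard Cauchy noise it is finite for both sign and clipping, even though the noise has infinite variance.

```latex
\begin{aligned}
\Psi(u) = \operatorname{sign}(u):&\quad
  \phi(d) = \Pr(n > -d) - \Pr(n < -d) = 2F(d) - 1,
  \qquad \phi'(0) = 2p(0),\\
\Psi(u) = \max(-\tau, \min(\tau, u)):&\quad
  \phi'(d) = \Pr(|d + n| < \tau),
  \qquad \phi'(0) = \Pr(|n| < \tau),\\[4pt]
\text{standard Cauchy, } p(u) = \tfrac{1}{\pi(1+u^2)}:&\quad
  \phi'_{\mathrm{sign}}(0) = \tfrac{2}{\pi} \approx 0.64,
  \qquad \phi'_{\mathrm{clip}}(0) = \tfrac{2}{\pi}\arctan\tau
  \;\;\bigl(= \tfrac{1}{2} \text{ for } \tau = 1\bigr).
\end{aligned}
```

In a linearized view, a larger slope means a stronger restoring drift toward the true parameter for the same step size.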

arXiv:2211.00969

Large deviations rates for stochastic gradient descent with strongly convex functions

Authors: Dragana Bajovic, Dusan Jakovetic, Soummya Kar

Abstract: Recent works have shown that high-probability metrics for stochastic gradient descent (SGD) are informative and, in some cases, advantageous compared with the commonly adopted mean-square-error-based ones. In this work we provide a formal framework for the study of general high-probability bounds for SGD, based on the theory of large deviations. The framework allows for generic (not necessarily bounded) gradient noise satisfying mild technical assumptions, including dependence of the noise distribution on the current iterate. Under these assumptions, we find an upper large deviations bound for SGD with strongly convex functions. The corresponding rate function captures the analytical dependence on the noise distribution and other problem parameters. This contrasts with conventional mean-square error analysis, which captures the noise dependence only through the variance and captures neither the effect of higher-order moments nor the interplay between the noise geometry and the shape of the cost function. We also derive exact large deviation rates for the case when the objective function is quadratic and show that the obtained rate function matches the one from the general upper bound, hence showing that the general upper bound is tight. Numerical examples illustrate and corroborate the theoretical findings.

Submitted 2 November, 2022; originally announced November 2022.

Comments: 32 pages, 2 figures

arXiv:2204.02593

Nonlinear gradient mappings and stochastic optimization: A general framework with applications to heavy-tail noise

Authors: Dusan Jakovetic, Dragana Bajovic, Anit Kumar Sahu, Soummya Kar, Nemanja Milosevic, Dusan Stamenkovic

Abstract: We introduce a general framework for nonlinear stochastic gradient descent (SGD) for scenarios in which the gradient noise exhibits heavy tails. The proposed framework subsumes several popular nonlinearity choices, such as clipped, normalized, signed, or quantized gradients, and also includes novel nonlinearity choices. For the considered class of methods, we establish strong convergence guarantees assuming a strongly convex cost function with Lipschitz continuous gradients, under very general assumptions on the gradient noise. Most notably, we show that, for a nonlinearity with bounded outputs and for gradient noise that may not have finite moments of order greater than one, the nonlinear SGD's mean squared error (MSE), or equivalently, the expected cost function's optimality gap, converges to zero at rate $O(1/t^{\zeta})$, $\zeta \in (0,1)$. In contrast, for the same noise setting, linear SGD generates a sequence with unbounded variances. Furthermore, for nonlinearities that can be decoupled component-wise, such as sign gradient or component-wise clipping, we show that nonlinear SGD asymptotically (locally) achieves an $O(1/t)$ rate in the weak convergence sense, and we explicitly quantify the corresponding asymptotic variance. Experiments show that, while our framework is more general than existing studies of SGD under heavy-tailed noise, several easy-to-implement nonlinearities from our framework are competitive with state-of-the-art alternatives on real data sets with heavy-tailed noise.

Submitted 6 April, 2022; originally announced April 2022.

Comments: Submitted for publication Nov 2021
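The contrast with linear SGD can be seen in a short worked example (our own illustration, assuming $f(x) = x^2/2$, i.i.d. standard Cauchy gradient noise $z_t$, and step size $\alpha_t = 1/(t+1)$): the linear iterate is exactly a sample mean of Cauchy variables, which never concentrates.

```latex
\begin{aligned}
x_{t+1} &= x_t - \tfrac{1}{t+1}\,(x_t + z_t)
  \;\Longrightarrow\; (t+1)\,x_{t+1} = t\,x_t - z_t, \\
T\,x_T &= 0 \cdot x_0 - \sum_{t=0}^{T-1} z_t
  \;\Longrightarrow\; x_T = -\frac{1}{T}\sum_{t=0}^{T-1} z_t \sim \mathrm{Cauchy}(0,1)
  \quad \text{for every } T \ge 1.
\end{aligned}
```

Hence $\mathbb{E}[x_T^2] = \infty$ for all $T$ and $x_T$ does not converge to the minimizer $0$; with a bounded nonlinearity such as $\operatorname{sign}(x_t + z_t)$, by contrast, each step moves the iterate by at most $\alpha_t$.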

arXiv:1912.08546

Primal-dual methods for large-scale and distributed convex optimization and data analytics

Authors: Dusan Jakovetic, Dragana Bajovic, Joao Xavier, Jose M. F. Moura

Abstract: The augmented Lagrangian method (ALM) is a classical optimization tool that solves a given "difficult" (constrained) problem by finding solutions of a sequence of "easier" (often unconstrained) sub-problems with respect to the original (primal) variable, wherein constraint satisfaction is controlled via the so-called dual variables. ALM is highly flexible with respect to how the primal sub-problems can be solved, giving rise to a plethora of different primal-dual methods. The powerful ALM mechanism has recently proved very successful in various large-scale and distributed applications. In addition, several significant advances have appeared, primarily precise complexity results with respect to computational and communication costs in the presence of inexact updates, and the design and analysis of novel optimal methods for distributed consensus optimization. We provide a tutorial-style introduction to ALM and its variants for solving convex optimization problems in large-scale and distributed settings. We describe control-theoretic tools for the algorithms' analysis and design, survey recent results, and provide novel insights in the context of two emerging applications: federated learning and distributed energy trading.

Submitted 14 April, 2020; v1 submitted 18 December, 2019; originally announced December 2019.
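The ALM loop described in this abstract can be illustrated on a toy equality-constrained problem whose primal sub-problem has a closed form. This is a minimal sketch under our own assumptions: minimize $\frac{1}{2}\|x\|^2$ subject to $a^\top x = b$, with augmented Lagrangian $L_\rho(x,\lambda) = \frac{1}{2}\|x\|^2 + \lambda(a^\top x - b) + \frac{\rho}{2}(a^\top x - b)^2$, exact primal minimization, and the dual update $\lambda \leftarrow \lambda + \rho(a^\top x - b)$. The function name and the penalty value $\rho$ are hypothetical.

```python
from typing import List, Sequence, Tuple


def alm_min_norm(a: Sequence[float], b: float, rho: float = 1.0,
                 iters: int = 50) -> Tuple[List[float], float]:
    """Augmented Lagrangian method for: minimize 0.5*||x||^2 subject to a^T x = b."""
    a_sq = sum(ai * ai for ai in a)
    lam = 0.0  # dual variable (Lagrange multiplier)
    x = [0.0] * len(a)
    for _ in range(iters):
        # Primal step: the minimizer of L_rho(., lam) satisfies
        # x = -a * (lam + rho * (a^T x - b)); solving for s = a^T x gives:
        s = a_sq * (rho * b - lam) / (1.0 + rho * a_sq)
        x = [-ai * (lam + rho * (s - b)) for ai in a]
        # Dual step: ascend on the constraint violation.
        residual = sum(ai * xi for ai, xi in zip(a, x)) - b
        lam += rho * residual
    return x, lam


if __name__ == "__main__":
    x, lam = alm_min_norm([1.0, 1.0], 1.0)
    print(x, lam)  # approaches x = [0.5, 0.5], lam = -0.5
```

For this particular problem, each dual step shrinks the multiplier error by the factor $1/(1 + \rho\|a\|^2)$, so a larger penalty $\rho$ speeds up the outer loop.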

arXiv:1809.02920

Communication-Efficient Distributed Strongly Convex Stochastic Optimization: Non-Asymptotic Rates

Authors: Anit Kumar Sahu, Dusan Jakovetic, Dragana Bajovic, Soummya Kar

Abstract: We examine fundamental tradeoffs in iterative distributed zeroth- and first-order stochastic optimization in multi-agent networks in terms of communication cost (number of per-node transmissions) and computational cost, measured by the number of per-node noisy function (respectively, gradient) evaluations with zeroth-order (respectively, first-order) methods. Specifically, we develop novel distributed stochastic optimization methods for zeroth- and first-order strongly convex optimization by utilizing a probabilistic inter-agent communication protocol that increasingly sparsifies communications among agents as time progresses. Under standard assumptions on the cost functions and the noise statistics, we establish for the proposed method the $O(1/(C_{\mathrm{comm}})^{4/3-\zeta})$ and $O(1/(C_{\mathrm{comm}})^{8/9-\zeta})$ mean square error convergence rates for first- and zeroth-order optimization, respectively, where $C_{\mathrm{comm}}$ is the expected number of network communications and $\zeta>0$ is arbitrarily small. The methods are shown to achieve order-optimal convergence rates in terms of the computational cost $C_{\mathrm{comp}}$, namely $O(1/C_{\mathrm{comp}})$ (first-order optimization) and $O(1/(C_{\mathrm{comp}})^{2/3})$ (zeroth-order optimization), while achieving order-optimal convergence rates in terms of iterations. Experiments on real-life datasets illustrate the efficacy of the proposed algorithms.

Submitted 9 September, 2018; originally announced September 2018.

Comments: 32 pages. Submitted for journal publication. Initial Submission: September 2018

arXiv:1803.07844

Distributed Zeroth Order Optimization Over Random Networks: A Kiefer-Wolfowitz Stochastic Approximation Approach

Authors: Anit Kumar Sahu, Dusan Jakovetic, Dragana Bajovic, Soummya Kar

Abstract: We study a standard distributed optimization framework where $N$ networked nodes collaboratively minimize the sum of their local convex costs. The main body of existing work considers this problem when the underlying network is either static or deterministically varying, and the distributed optimization algorithm is of first or second order, i.e., it involves the local costs' gradients and possibly the local Hessians. In this paper, we consider the currently understudied but highly relevant scenarios in which: 1) only noisy estimates of function values are available (neither gradients nor Hessians can be evaluated); and 2) the underlying network is randomly varying (according to an independent, identically distributed process). For this random-network, zeroth-order optimization setting, we develop a distributed stochastic approximation method of the Kiefer-Wolfowitz type. Furthermore, under standard smoothness and strong convexity assumptions on the local costs, we establish the $O(1/k^{1/2})$ mean square convergence rate for the method, a rate that matches that of the method's centralized counterpart under equivalent conditions.

Submitted 21 March, 2018; originally announced March 2018.

Comments: Submitted to CDC 2018
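A minimal sketch of the two ingredients in this abstract, under our own assumptions: a Kiefer-Wolfowitz (two-sided finite-difference) gradient estimate built only from possibly noisy function values, and a distributed loop over a 4-node ring in which each link independently works with probability 0.8 at every iteration. The local costs $f_i(x) = \frac{1}{2}(x - b_i)^2$, the Gaussian evaluation noise, and the schedules for the step size, consensus weight, and difference width are hypothetical choices, not the paper's.

```python
import random
from typing import Callable, List, Sequence, Tuple


def kw_gradient(f: Callable[[List[float]], float], x: Sequence[float], c: float) -> List[float]:
    """Kiefer-Wolfowitz estimate: central differences of (possibly noisy) function values."""
    grad = []
    for j in range(len(x)):
        plus = list(x)
        plus[j] += c
        minus = list(x)
        minus[j] -= c
        grad.append((f(plus) - f(minus)) / (2.0 * c))
    return grad


def distributed_kw(b: Sequence[float], edges: Sequence[Tuple[int, int]], iters: int = 5000,
                   link_prob: float = 0.8, noise: float = 0.1, seed: int = 0) -> List[float]:
    """Hypothetical zeroth-order distributed loop over a randomly varying network."""
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    for k in range(iters):
        alpha = 1.0 / (k + 1)          # gradient-estimate step size
        beta = 0.2                     # consensus weight per working link
        c = 0.5 / (k + 1) ** 0.25      # shrinking finite-difference width
        # Random network: each link works independently with probability link_prob.
        live = [(i, j) for (i, j) in edges if rng.random() < link_prob]
        mix = [0.0] * n
        for i, j in live:
            mix[i] += beta * (x[j] - x[i])
            mix[j] += beta * (x[i] - x[j])
        new_x = []
        for i in range(n):
            # Node i only observes noisy values of its own cost f_i(y) = 0.5 * (y - b_i)^2.
            def noisy_f(y: List[float], i: int = i) -> float:
                return 0.5 * (y[0] - b[i]) ** 2 + rng.gauss(0.0, noise)

            g = kw_gradient(noisy_f, [x[i]], c)[0]
            new_x.append(x[i] + mix[i] - alpha * g)
        x = new_x
    return x


if __name__ == "__main__":
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(distributed_kw([1.0, 2.0, 3.0, 4.0], ring))  # expected near mean(b) = 2.5
```

The finite-difference width $c$ trades bias (large $c$) against noise amplification (the estimate's noise scales like $1/c$), which is why it is shrunk slowly over time in this sketch.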

arXiv:1803.07836

Convergence rates for distributed stochastic optimization over random networks

Authors: Dusan Jakovetic, Dragana Bajovic, Anit Kumar Sahu, Soummya Kar

Abstract: We establish the $O(1/k)$ convergence rate for distributed stochastic gradient methods that operate over strongly convex costs and random networks. The considered class of methods is standard: each node performs a weighted average of its own and its neighbors' solution estimates (consensus) and takes a negative step with respect to a noisy version of its local function's gradient (innovation). The underlying communication network is modeled through a sequence of temporally independent and identically distributed (i.i.d.) Laplacian matrices that are connected on average, while the local gradient noises are also i.i.d. in time, have finite second moments, and may have unbounded support. We show that, after a careful setting of the consensus and innovations potentials (weights), the distributed stochastic gradient method achieves an (order-optimal) $O(1/k)$ convergence rate in the mean square distance from the solution. This is the first order-optimal convergence rate result on distributed strongly convex stochastic optimization when the network is random and/or the gradient noises have unbounded support. Simulation examples confirm the theoretical findings.

Submitted 21 March, 2018; originally announced March 2018.

Comments: Submitted to CDC 2018

arXiv:1709.01307

Distributed second order methods with increasing number of working nodes

Authors: Natasa Krklec Jerinkic, Dusan Jakovetic, Natasa Krejic, Dragana Bajovic

Abstract: Recently, an idling mechanism has been introduced in the context of distributed first-order methods for minimizing a sum of nodes' local convex costs over a generic, connected network. With the idling mechanism, each node $i$, at each iteration $k$, is active (updates its solution estimate and exchanges messages with its network neighborhood) with probability $p_k$, and stays idle with probability $1-p_k$, with activations independent both across nodes and across iterations. In this paper, we demonstrate that the idling mechanism can also be successfully incorporated into distributed second-order methods. Specifically, we apply the idling mechanism to the recently proposed Distributed Quasi Newton method (DQN). We first show theoretically that, when $p_k$ grows to one across iterations in a controlled manner, DQN with idling exhibits convergence and convergence rate properties very similar to those of the standard DQN method, thus achieving the same order of convergence rate (R-linear) as standard DQN, but with significantly cheaper updates. Simulation examples confirm the benefits of incorporating the idling mechanism, demonstrate the method's flexibility with respect to the choice of the $p_k$'s, and compare the proposed idling method with related algorithms from the literature.

Submitted 20 September, 2018; v1 submitted 5 September, 2017; originally announced September 2017.

arXiv:1509.01703

Newton-like method with diagonal correction for distributed optimization

Authors: Dragana Bajovic, Dusan Jakovetic, Natasa Krejic, Natasa Krklec Jerinkic

Abstract: We consider distributed optimization problems where networked nodes cooperatively minimize the sum of their locally known convex costs. A popular class of methods for these problems is distributed gradient methods, which are attractive due to their inexpensive iterations but suffer from slow convergence rates. This motivates incorporating second-order information into distributed methods, but this task is challenging: although the Hessians that arise in the algorithm design respect the sparsity of the network, their inverses are dense, which renders distributed implementations difficult. We overcome this challenge and propose a class of distributed Newton-like methods, which we refer to as Distributed Quasi Newton (DQN). The DQN family approximates the Hessian inverse by: 1) splitting the Hessian into its diagonal and off-diagonal parts, 2) inverting the diagonal part, and 3) approximating the inverse of the off-diagonal part through a weighted linear function. The approximation is parameterized by tuning variables that correspond to different splittings of the Hessian and to different weightings of the off-diagonal Hessian part. Specific choices of the tuning variables give rise to different variants of the proposed general DQN method, dubbed DQN-0, DQN-1, and DQN-2, which trade off communication and computational costs against convergence. Simulations demonstrate the effectiveness of the proposed DQN methods.

Submitted 20 February, 2017; v1 submitted 5 September, 2015; originally announced September 2015.

Comments: authors' order is alphabetical; last revision of the paper on Feb 7, 2017
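The diagonal/off-diagonal splitting idea can be illustrated with a generic truncated Neumann series, which we substitute here for DQN's tuned weighted approximation (this is not the DQN formula). Writing $H = D - B$ with $D$ the diagonal part, $H^{-1} = \sum_{s \ge 0} (D^{-1}B)^s D^{-1}$ whenever the spectral radius of $D^{-1}B$ is below one, and keeping only the first few terms requires only the diagonal inverse and products with the sparse off-diagonal part. The sketch uses a hypothetical 3-by-3 diagonally dominant matrix and reports the residual $\|\hat H^{-1} H - I\|_F$, which equals $\|(D^{-1}B)^{K}\|_F$ when $K$ terms are kept.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_inverse(H: Matrix, terms: int) -> Matrix:
    """Approximate H^{-1} by sum_{s=0}^{terms-1} (D^{-1} B)^s D^{-1}, where H = D - B."""
    n = len(H)
    D_inv = [[1.0 / H[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    B = [[0.0 if i == j else -H[i][j] for j in range(n)] for i in range(n)]
    M = matmul(D_inv, B)
    power = identity(n)                      # (D^{-1} B)^0
    approx = [[0.0] * n for _ in range(n)]
    for _ in range(terms):
        term = matmul(power, D_inv)          # (D^{-1} B)^s D^{-1}
        approx = [[approx[i][j] + term[i][j] for j in range(n)] for i in range(n)]
        power = matmul(power, M)
    return approx


def residual(H: Matrix, approx: Matrix) -> float:
    """Frobenius norm of approx @ H - I."""
    n = len(H)
    P = matmul(approx, H)
    I = identity(n)
    return math.sqrt(sum((P[i][j] - I[i][j]) ** 2 for i in range(n) for j in range(n)))


if __name__ == "__main__":
    H = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
    for terms in (1, 2, 3):
        print(terms, residual(H, neumann_inverse(H, terms)))
```

Keeping one term uses only the diagonal inverse; each extra term costs one more multiplication by the off-diagonal part, which in a network setting corresponds to one more round of neighbour communication.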

arXiv:1504.04049

doi 10.1109/TSP.2016.2560133

Distributed Gradient Methods with Variable Number of Working Nodes

Authors: Dusan Jakovetic, Dragana Bajovic, Natasa Krejic, Natasa Krklec-Jerinkic

Abstract: We consider distributed optimization where $N$ nodes in a connected network minimize the sum of their local costs subject to a common constraint set. We propose a distributed projected gradient method where each node, at each iteration $k$, performs an update (is active) with probability $p_k$, and stays idle (is inactive) with probability $1-p_k$. Whenever active, each node performs an update by weight-averaging its solution estimate with the estimates of its active neighbors, taking a negative gradient step with respect to its local cost, and performing a projection onto the constraint set; inactive nodes perform no updates. Assuming that the nodes' local costs are strongly convex with Lipschitz continuous gradients, we show that, as long as the activation probability $p_k$ grows to one asymptotically, our algorithm converges in the mean square sense (MSS) to the same solution as the standard distributed gradient method, i.e., as if all the nodes were active at all iterations. Moreover, when $p_k$ grows to one linearly, with an appropriately set convergence factor, the algorithm converges linearly in the MSS, with practically the same factor as the standard distributed gradient method. Simulations on both synthetic and real-world data sets demonstrate that, compared with the standard distributed gradient method, the proposed algorithm significantly reduces the overall number of per-node communications and per-node gradient evaluations (computational cost) for the same required accuracy.

Submitted 10 March, 2016; v1 submitted 15 April, 2015; originally announced April 2015.

Comments: submitted to a journal on April 15, 2015; revised on September 23, 2015, and March 10, 2016
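The activation mechanism can be sketched as follows, under our own simplifying assumptions: unconstrained scalar costs $f_i(x) = \frac{1}{2}(x - b_i)^2$ (so the projection step is omitted), a 4-node ring, a hypothetical constant step size $\alpha = 0.1$ and mixing weight $w = 0.25$, and an activation probability $p_k = 1 - (1 - p_0)\,\delta^k$ that converges to one linearly, with hypothetical $p_0 = 0.5$ and $\delta = 0.9$. Comparing against an always-active run shows both reaching the same limit while the idling run performs fewer node updates.

```python
import random
from typing import List, Sequence, Tuple


def dgm_with_idling(
    b: Sequence[float],
    neighbors: Sequence[Sequence[int]],
    alpha: float = 0.1,
    w: float = 0.25,
    p0: float = 0.5,
    delta: float = 0.9,
    iters: int = 2000,
    idling: bool = True,
    seed: int = 0,
) -> Tuple[List[float], int]:
    """Distributed gradient method where each node is active with probability p_k.

    Returns the final estimates and the total number of node updates performed.
    """
    rng = random.Random(seed)
    x = [0.0] * len(b)
    updates = 0
    for k in range(iters):
        p_k = 1.0 - (1.0 - p0) * delta ** k if idling else 1.0
        active = [rng.random() < p_k for _ in b]
        new_x = list(x)  # idle nodes keep their current estimates
        for i in range(len(b)):
            if not active[i]:
                continue
            updates += 1
            # Weight-average only with neighbours that are active in this iteration.
            mix = sum(w * (x[j] - x[i]) for j in neighbors[i] if active[j])
            new_x[i] = x[i] + mix - alpha * (x[i] - b[i])
        x = new_x
    return x, updates


if __name__ == "__main__":
    ring = [[1, 3], [0, 2], [1, 3], [2, 0]]
    b = [1.0, 2.0, 3.0, 4.0]
    print(dgm_with_idling(b, ring, idling=False))
    print(dgm_with_idling(b, ring))
```

With a constant step size, the common limit is the fixed point of the standard distributed gradient method: its network average equals the minimizer $\bar b$ here, although individual node estimates differ slightly from it.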

arXiv:1202.6389

doi 10.1109/TSP.2013.2248003

Consensus and Products of Random Stochastic Matrices: Exact Rate for Convergence in Probability

Authors: Dragana Bajovic, Joao Xavier, Jose M. F. Moura, Bruno Sinopoli

Abstract: Distributed consensus and other linear systems with stochastic system matrices $W_k$ emerge in various settings, such as opinion formation in social networks, rendezvous of robots, and distributed inference in sensor networks. The matrices $W_k$ are often random, due to, e.g., random packet dropouts in wireless sensor networks. A key step in analyzing the performance of such systems is studying the convergence of the matrix products $W_k W_{k-1} \cdots W_1$. In this paper, we find the exact exponential rate $I$ for the convergence in probability of the product of such matrices as time $k$ grows large, under the assumption that the $W_k$'s are symmetric and independent and identically distributed in time. Further, for commonly used random models such as gossip and link failure, we show that the rate $I$ is found by solving a min-cut problem and is, hence, easily computable. Finally, we apply our results to optimally allocate the sensors' transmission power in consensus+innovations distributed detection.

Submitted 28 February, 2012; originally announced February 2012.
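The matrix products studied in this abstract can be simulated for the gossip model, under our own assumptions: a 4-node ring and, at each time $k$, one uniformly chosen edge $\{i, j\}$ whose endpoints average their values, i.e. $W_k = I - \frac{1}{2}(e_i - e_j)(e_i - e_j)^\top$, which is symmetric and doubly stochastic. The sketch forms $W_k W_{k-1} \cdots W_1$ and measures its entry-wise distance to the averaging matrix $J = \frac{1}{N}\mathbf{1}\mathbf{1}^\top$; the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def gossip_matrix(n: int, i: int, j: int) -> Matrix:
    """W = I - (e_i - e_j)(e_i - e_j)^T / 2: nodes i and j replace their values by their average."""
    W = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    W[i][i] = W[j][j] = 0.5
    W[i][j] = W[j][i] = 0.5
    return W


def gossip_product(n: int, edges: Sequence[Tuple[int, int]], steps: int, seed: int = 0) -> Matrix:
    """Return W_k W_{k-1} ... W_1 for i.i.d. uniformly chosen gossip edges."""
    rng = random.Random(seed)
    product = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for _ in range(steps):
        i, j = rng.choice(edges)
        product = matmul(gossip_matrix(n, i, j), product)  # left-multiply by the newest W_k
    return product


def distance_to_average(product: Matrix) -> float:
    """Largest entry-wise deviation from J = (1/n) * ones * ones^T."""
    n = len(product)
    return max(abs(product[r][c] - 1.0 / n) for r in range(n) for c in range(n))


if __name__ == "__main__":
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
    for steps in (5, 20, 100):
        print(steps, distance_to_average(gossip_product(4, ring, steps)))
```

Repeating the experiment over many seeds and recording how often `distance_to_average` exceeds a fixed threshold gives an empirical view of the convergence-in-probability behaviour whose exact exponential rate the paper characterizes.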

Showing 1–15 of 15 results for author: Bajovic, D