Showing 1–15 of 15 results for author: Bajovic, D

  1. arXiv:2505.24464

    math.OC cs.LG

    Distributed gradient methods under heavy-tailed communication noise

    Authors: Manojlo Vukovic, Dusan Jakovetic, Dragana Bajovic, Soummya Kar

    Abstract: We consider a standard distributed optimization problem in which networked nodes collaboratively minimize the sum of their locally known convex costs. For this setting, we address for the first time the fundamental problem of design and analysis of distributed methods to solve the above problem when inter-node communication is subject to heavy-tailed noise. Heavy-tailed noise is highly rele…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: This work has been submitted to the IEEE for possible publication

    MSC Class: 90C25; 65K05

  2. arXiv:2410.15637

    cs.LG math.OC math.PR

    Large Deviation Upper Bounds and Improved MSE Rates of Nonlinear SGD: Heavy-tailed Noise and Power of Symmetry

    Authors: Aleksandar Armacki, Shuhua Yu, Dragana Bajovic, Dusan Jakovetic, Soummya Kar

    Abstract: We study large deviation upper bounds and mean-squared error (MSE) guarantees of a general framework of nonlinear stochastic gradient methods in the online setting, in the presence of heavy-tailed noise. Unlike existing works that rely on the closed form of a nonlinearity (typically clipping), our framework treats the nonlinearity in a black-box manner, allowing us to provide unified guarantees fo…

    Submitted 21 March, 2025; v1 submitted 21 October, 2024; originally announced October 2024.

    Comments: 35 pages. arXiv admin note: text overlap with arXiv:2410.13954

  3. arXiv:2410.13954

    cs.LG math.OC

    Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees

    Authors: Aleksandar Armacki, Shuhua Yu, Pranay Sharma, Gauri Joshi, Dragana Bajovic, Dusan Jakovetic, Soummya Kar

    Abstract: We study high-probability convergence in online learning, in the presence of heavy-tailed noise. To combat the heavy tails, a general framework of nonlinear SGD methods is considered, subsuming several popular nonlinearities like sign, quantization, component-wise and joint clipping. In our work the nonlinearity is treated in a black-box manner, allowing us to establish unified guarantees for a br…

    Submitted 20 March, 2025; v1 submitted 17 October, 2024; originally announced October 2024.

    Comments: 40 pages, 6 figures

  4. arXiv:2310.18784

    cs.LG math.OC math.ST stat.ML

    High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise

    Authors: Aleksandar Armacki, Pranay Sharma, Gauri Joshi, Dragana Bajovic, Dusan Jakovetic, Soummya Kar

    Abstract: We study high-probability convergence guarantees of learning on streaming data in the presence of heavy-tailed noise. In the proposed scenario, the model is updated in an online fashion, as new information is observed, without storing any additional data. To combat the heavy-tailed noise, we consider a general framework of nonlinear stochastic gradient descent (SGD), providing several strong resul…

    Submitted 30 April, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

    Comments: 30 pages, 3 figures

  5. arXiv:2212.11959

    math.OC cs.IT

    Nonlinear consensus+innovations under correlated heavy-tailed noises: Mean square convergence rate and asymptotics

    Authors: Manojlo Vukovic, Dusan Jakovetic, Dragana Bajovic, Soummya Kar

    Abstract: We consider distributed recursive estimation of consensus+innovations type in the presence of heavy-tailed sensing and communication noises. We allow that the sensing and communication noises are mutually correlated while independent identically distributed (i.i.d.) in time, and that they may both have infinite moments of order higher than one (hence having infinite variances). Such heavy-tailed,…

    Submitted 9 November, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

    MSC Class: 93E10; 93E35; 60G35; 94A13; 62M05

  6. arXiv:2211.00969

    cs.LG cs.IT math.OC stat.ML

    Large deviations rates for stochastic gradient descent with strongly convex functions

    Authors: Dragana Bajovic, Dusan Jakovetic, Soummya Kar

    Abstract: Recent works have shown that high probability metrics with stochastic gradient descent (SGD) exhibit informativeness and in some cases advantage over the commonly adopted mean-square error-based ones. In this work we provide a formal framework for the study of general high probability bounds with SGD, based on the theory of large deviations. The framework allows for a generic (not-necessarily boun…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: 32 pages, 2 figures

  7. arXiv:2204.02593

    math.OC cs.IT cs.LG

    Nonlinear gradient mappings and stochastic optimization: A general framework with applications to heavy-tail noise

    Authors: Dusan Jakovetic, Dragana Bajovic, Anit Kumar Sahu, Soummya Kar, Nemanja Milosevic, Dusan Stamenkovic

    Abstract: We introduce a general framework for nonlinear stochastic gradient descent (SGD) for the scenarios when gradient noise exhibits heavy tails. The proposed framework subsumes several popular nonlinearity choices, like clipped, normalized, signed or quantized gradient, but we also consider novel nonlinearity choices. We establish for the considered class of methods strong convergence guarantees assum…

    Submitted 6 April, 2022; originally announced April 2022.

    Comments: Submitted for publication Nov 2021

  8. arXiv:1912.08546

    math.OC cs.IT

    Primal-dual methods for large-scale and distributed convex optimization and data analytics

    Authors: Dusan Jakovetic, Dragana Bajovic, Joao Xavier, Jose M. F. Moura

    Abstract: The augmented Lagrangian method (ALM) is a classical optimization tool that solves a given "difficult" (constrained) problem via finding solutions of a sequence of "easier" (often unconstrained) sub-problems with respect to the original (primal) variable, wherein constraints satisfaction is controlled via the so-called dual variables. ALM is highly flexible with respect to how primal sub-problems c…

    Submitted 14 April, 2020; v1 submitted 18 December, 2019; originally announced December 2019.

  9. arXiv:1809.02920

    math.OC

    Communication-Efficient Distributed Strongly Convex Stochastic Optimization: Non-Asymptotic Rates

    Authors: Anit Kumar Sahu, Dusan Jakovetic, Dragana Bajovic, Soummya Kar

    Abstract: We examine fundamental tradeoffs in iterative distributed zeroth and first order stochastic optimization in multi-agent networks in terms of communication cost (number of per-node transmissions) and computational cost, measured by the number of per-node noisy function (respectively, gradient) evaluations with zeroth order (respectively, first order) methods. Specifically, we develop…

    Submitted 9 September, 2018; originally announced September 2018.

    Comments: 32 pages. Submitted for journal publication. Initial Submission: September 2018

  10. arXiv:1803.07844

    math.OC

    Distributed Zeroth Order Optimization Over Random Networks: A Kiefer-Wolfowitz Stochastic Approximation Approach

    Authors: Anit Kumar Sahu, Dusan Jakovetic, Dragana Bajovic, Soummya Kar

    Abstract: We study a standard distributed optimization framework where $N$ networked nodes collaboratively minimize the sum of their local convex costs. The main body of existing work considers the described problem when the underlying network is either static or deterministically varying, and the distributed optimization algorithm is of first or second order, i.e., it involves the local costs' gradients and…

    Submitted 21 March, 2018; originally announced March 2018.

    Comments: Submitted to CDC 2018

  11. arXiv:1803.07836

    math.OC

    Convergence rates for distributed stochastic optimization over random networks

    Authors: Dusan Jakovetic, Dragana Bajovic, Anit Kumar Sahu, Soummya Kar

    Abstract: We establish the $O(1/k)$ convergence rate for distributed stochastic gradient methods that operate over strongly convex costs and random networks. The considered class of methods is standard: each node performs a weighted average of its own and its neighbors' solution estimates (consensus), and takes a negative step with respect to a noisy version of its local function's gradient (innovation…

    Submitted 21 March, 2018; originally announced March 2018.

    Comments: Submitted to CDC 2018

  12. arXiv:1709.01307

    cs.IT math.OC

    Distributed second order methods with increasing number of working nodes

    Authors: Natasa Krklec Jerinkic, Dusan Jakovetic, Natasa Krejic, Dragana Bajovic

    Abstract: Recently, an idling mechanism has been introduced in the context of distributed first order methods for minimization of a sum of nodes' local convex costs over a generic, connected network. With the idling mechanism, each node $i$, at each iteration $k$, is active -- updates its solution estimate and exchanges messages with its network neighborhood -- with probability $p_k$, and it stays id…

    Submitted 20 September, 2018; v1 submitted 5 September, 2017; originally announced September 2017.

  13. arXiv:1509.01703

    cs.IT math.OC

    Newton-like method with diagonal correction for distributed optimization

    Authors: Dragana Bajovic, Dusan Jakovetic, Natasa Krejic, Natasa Krklec Jerinkic

    Abstract: We consider distributed optimization problems where networked nodes cooperatively minimize the sum of their locally known convex costs. A popular class of methods to solve these problems are the distributed gradient methods, which are attractive due to their inexpensive iterations, but have a drawback of slow convergence rates. This motivates the incorporation of second-order information in the di…

    Submitted 20 February, 2017; v1 submitted 5 September, 2015; originally announced September 2015.

    Comments: authors' order is alphabetical; last revision of the paper on Feb 7, 2017

  14. Distributed Gradient Methods with Variable Number of Working Nodes

    Authors: Dusan Jakovetic, Dragana Bajovic, Natasa Krejic, Natasa Krklec-Jerinkic

    Abstract: We consider distributed optimization where $N$ nodes in a connected network minimize the sum of their local costs subject to a common constraint set. We propose a distributed projected gradient method where each node, at each iteration $k$, performs an update (is active) with probability $p_k$, and stays idle (is inactive) with probability $1-p_k$. Whenever active, each node performs an update by…

    Submitted 10 March, 2016; v1 submitted 15 April, 2015; originally announced April 2015.

    Comments: submitted to a journal on April 15, 2015; revised on September 23, 2015, and March 10, 2016

  15. arXiv:1202.6389

    math.PR cs.IT cs.SI

    Consensus and Products of Random Stochastic Matrices: Exact Rate for Convergence in Probability

    Authors: Dragana Bajovic, Joao Xavier, Jose M. F. Moura, Bruno Sinopoli

    Abstract: Distributed consensus and other linear systems with system stochastic matrices $W_k$ emerge in various settings, like opinion formation in social networks, rendezvous of robots, and distributed inference in sensor networks. The matrices $W_k$ are often random, due to, e.g., random packet dropouts in wireless sensor networks. Key in analyzing the performance of such systems is studying convergence…

    Submitted 28 February, 2012; originally announced February 2012.