-
The Dynamics of Gradient Descent for Overparametrized Neural Networks
Authors:
Siddhartha Satpathi,
R Srikant
Abstract:
We consider the dynamics of gradient descent (GD) in overparameterized single hidden layer neural networks with a squared loss function. Recently, it has been shown that, under some conditions, the parameter values obtained using GD achieve zero training error and generalize well if the initial conditions are chosen appropriately. Here, through a Lyapunov analysis, we show that the dynamics of neural network weights under GD converge to a point that is close to the minimum norm solution subject to the condition that there is no training error when using the linear approximation to the neural network. To illustrate the application of this result, we show that GD converges to a prediction function that generalizes well, thereby providing an alternative proof of the generalization results in Arora et al. (2019).
Submitted 13 May, 2021;
originally announced May 2021.
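The abstract above relates GD to a minimum norm solution with zero training error under a linear approximation. A minimal sketch of that idea follows, for a plain linear model only and not for the neural network or the Lyapunov analysis of the paper. The function names, the step size rule, and the choice to measure the norm as distance from the initial point `w0` are assumptions of this sketch.

```python
import numpy as np

def gd_least_squares(A, y, w0, steps=5000):
    """Run plain gradient descent on 0.5 * ||A w - y||^2 starting from w0."""
    # Step size 1 / sigma_max(A)^2 keeps the iteration stable (assumed choice).
    lr = 1.0 / np.linalg.norm(A, 2) ** 2
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w -= lr * A.T @ (A @ w - y)
    return w

def min_norm_interpolant(A, y, w0):
    """Point closest to w0 (in Euclidean norm) among all w with A w = y."""
    return w0 + np.linalg.pinv(A) @ (y - A @ w0)

if __name__ == "__main__":
    # Overparameterized toy problem: 2 equations, 3 unknowns.
    A = np.array([[1.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0]])
    y = np.array([1.0, 2.0])
    w0 = np.array([0.0, 0.0, 5.0])
    print(gd_least_squares(A, y, w0))
    print(min_norm_interpolant(A, y, w0))
```

In this toy setting GD only moves the weights within the row space of `A`, so the component of `w0` outside that space is left untouched, which is why the limit is the interpolating point nearest to the initialization.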
-
Sample Complexity and Overparameterization Bounds for Temporal Difference Learning with Neural Network Approximation
Authors:
Semih Cayci,
Siddhartha Satpathi,
Niao He,
R. Srikant
Abstract:
In this paper, we study the dynamics of temporal difference learning with neural network-based value function approximation over a general state space, namely, Neural TD learning. We consider two practically used algorithms, projection-free and max-norm regularized Neural TD learning, and establish the first convergence bounds for these algorithms. An interesting observation from our results is that max-norm regularization can dramatically improve the performance of TD learning algorithms, both in terms of sample complexity and overparameterization. In particular, we prove that max-norm regularization improves state-of-the-art sample complexity and overparameterization bounds. The results in this work rely on a novel Lyapunov drift analysis of the network parameters as a stopped and controlled random process.
Submitted 5 August, 2021; v1 submitted 1 March, 2021;
originally announced March 2021.
-
Learning Latent Events from Network Message Logs
Authors:
Siddhartha Satpathi,
Supratim Deb,
R Srikant,
He Yan
Abstract:
We consider the problem of separating error messages generated in large distributed data center networks into error events. In such networks, each error event leads to a stream of messages generated by hardware and software components affected by the event. These messages are stored in a giant message log. We consider the unsupervised learning problem of identifying the signatures of events that generated these messages; here, the signature of an error event refers to the mixture of messages generated by the event. One of the main contributions of the paper is a novel mapping of our problem which transforms it into a problem of topic discovery in documents. Events in our problem correspond to topics and messages in our problem correspond to words in the topic discovery problem. However, there is no direct analog of documents. Therefore, we use a non-parametric change-point detection algorithm, which has linear computational complexity in the number of messages, to divide the message log into smaller subsets called episodes, which serve as the equivalents of documents. After this mapping has been done, we use a well-known algorithm for topic discovery, called LDA, to solve our problem. We theoretically analyze the change-point detection algorithm, and show that it is consistent and has low sample complexity. We also demonstrate the scalability of our algorithm on a real data set consisting of $97$ million messages collected over a period of $15$ days, from a distributed data center network which supports the operations of a large wireless service provider.
Submitted 17 July, 2019; v1 submitted 10 April, 2018;
originally announced April 2018.
-
A Modified Multiple OLS (m$^2$OLS) Algorithm for Signal Recovery in Compressive Sensing
Authors:
Samrat Mukhopadhyay,
Siddhartha Satpathi,
Mrityunjoy Chakraborty
Abstract:
Orthogonal least squares (OLS) is an important sparse signal recovery algorithm for compressive sensing, which enjoys a superior probability of success over other well-known recovery algorithms under conditions of correlated measurement matrices. Multiple OLS (mOLS) is a recently proposed improved version of OLS which selects multiple candidates per iteration by generalizing the greedy selection principle used in OLS and enjoys faster convergence than OLS. In this paper, we present a refined version of the mOLS algorithm where, at each step of the iteration, we first suitably preselect a submatrix of the measurement matrix and then apply the mOLS computations to the chosen submatrix. Since mOLS now works only on a submatrix and not on the overall matrix, the computational cost drops drastically. Convergence of the algorithm, however, requires ensuring passage of true candidates through the two stages of preselection and mOLS-based selection successively. This paper presents convergence conditions for both noisy and noise-free signal models. The proposed algorithm enjoys fast convergence properties similar to those of mOLS, at a much reduced computational complexity.
Submitted 1 August, 2018; v1 submitted 27 November, 2015;
originally announced November 2015.
-
Optimal Offline and Competitive Online Strategies for Transmitter-Receiver Energy Harvesting
Authors:
Siddhartha Satpathi,
Rushil Nagda,
Rahul Vaze
Abstract:
A joint transmitter-receiver energy harvesting model is considered, where both the transmitter and the receiver are powered by a (renewable) energy harvesting source. Given a fixed number of bits, the problem is to find the optimal transmission power profile at the transmitter and the ON-OFF profile at the receiver that minimize the transmission time. With infinite battery capacity at both the transmitter and receiver, optimal offline and optimal online policies are derived. The optimal online policy is shown to be two-competitive in the arbitrary input case. With finite battery capacities at both ends, only random energy arrival sequences with a given distribution are considered, for which an online policy with bounded expected competitive ratio is proposed.
Submitted 4 March, 2015; v1 submitted 8 December, 2014;
originally announced December 2014.
-
Optimal Offline and Competitive Online Strategies for Transmitter-Receiver Energy Harvesting
Authors:
Rushil Nagda,
Siddharth Satpathi,
Rahul Vaze
Abstract:
A transmitter-receiver energy harvesting model is assumed, where both the transmitter and the receiver are powered by a random energy source. Given a fixed number of bits, the problem is to find the optimal transmission power profile at the transmitter and the ON-OFF profile at the receiver that minimize the transmission time. The structure of the optimal offline strategy is derived, together with an optimal offline policy. An online policy with a competitive ratio strictly less than two is also derived.
Submitted 6 October, 2014;
originally announced October 2014.
-
On the Number of Iterations for Convergence of CoSaMP and Subspace Pursuit Algorithms
Authors:
Siddhartha Satpathi,
Mrityunjoy Chakraborty
Abstract:
In compressive sensing, one important parameter that characterizes the various greedy recovery algorithms is the iteration bound, which provides the maximum number of iterations within which the algorithm is guaranteed to converge. In this letter, we present a new iteration bound for CoSaMP, derived through mathematical manipulations that include formulating appropriate sufficient conditions that ensure passage of a chosen support through the two selection stages of CoSaMP, Augment and Update. Subsequently, we extend the treatment to the subspace pursuit (SP) algorithm. The proposed iteration bounds for both CoSaMP and SP algorithms are seen to be improvements over their existing counterparts, revealing that both CoSaMP and SP algorithms converge in fewer iterations than suggested by results available in the literature.
Submitted 23 November, 2016; v1 submitted 19 April, 2014;
originally announced April 2014.
-
Group-Sparse Model Selection: Hardness and Relaxations
Authors:
Luca Baldassarre,
Nirav Bhan,
Volkan Cevher,
Anastasios Kyrillidis,
Siddhartha Satpathi
Abstract:
Group-based sparsity models have proven instrumental in linear regression problems for recovering signals from far fewer measurements than standard compressive sensing. The main promise of these models is the recovery of "interpretable" signals through the identification of their constituent groups. In this paper, we establish a combinatorial framework for group-model selection problems and highlight the underlying tractability issues. In particular, we show that the group-model selection problem is equivalent to the well-known NP-hard weighted maximum coverage problem (WMC). Leveraging a graph-based understanding of group models, we describe group structures which enable correct model selection in polynomial time via dynamic programming. Furthermore, group structures that lead to totally unimodular constraints have tractable discrete as well as convex relaxations. We also present a generalization of the group-model that allows for within-group sparsity, which can be used to model hierarchical sparsity. Finally, we study the Pareto frontier of group-sparse approximations for two tractable models, including the tree sparsity model, and illustrate selection and computation trade-offs between our framework and the existing convex relaxations.
Submitted 4 March, 2015; v1 submitted 13 March, 2013;
originally announced March 2013.
-
Improved Bounds on RIP for Generalized Orthogonal Matching Pursuit
Authors:
Siddhartha Satpathi,
Rajib Lochan Das,
Mrityunjoy Chakraborty
Abstract:
Generalized Orthogonal Matching Pursuit (gOMP) is a natural extension of the OMP algorithm which, unlike OMP, may select $N (\geq1)$ atoms in each iteration. In this paper, we demonstrate that gOMP can successfully reconstruct a $K$-sparse signal from a compressed measurement $ {\bf y}={\bf Φx}$ by the $K^{th}$ iteration if the sensing matrix ${\bf Φ}$ satisfies the restricted isometry property (RIP) of order $NK$ with $δ_{NK} < \frac {\sqrt{N}}{\sqrt{K}+2\sqrt{N}}$. Our bound offers an improvement over the very recent result shown in \cite{wang_2012b}. Moreover, we present another bound for gOMP of order $NK+1$ with $δ_{NK+1} < \frac {\sqrt{N}}{\sqrt{K}+\sqrt{N}}$, which exactly matches the near-optimal bound of $δ_{K+1} < \frac {1}{\sqrt{K}+1}$ for OMP ($N=1$) as shown in \cite{wang_2012a}.
Submitted 3 February, 2013;
originally announced February 2013.
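The gOMP abstract above describes selecting $N$ atoms per iteration and stopping by the $K^{th}$ iteration. A minimal sketch of that selection loop follows, assuming the usual greedy recipe (pick the $N$ columns most correlated with the residual, then refit by least squares on the accumulated support). The function name `gomp`, the tolerance `tol`, and the early-stopping rule are assumptions of this sketch and are not taken from the paper; it does not check the RIP conditions stated in the abstract.

```python
import numpy as np

def gomp(Phi, y, K, N=1, tol=1e-10):
    """Sketch of generalized OMP: N atoms per iteration, at most K iterations."""
    n = Phi.shape[1]
    y = np.asarray(y, dtype=float)
    support = []                      # indices of selected columns (atoms)
    residual = y.copy()
    x_hat = np.zeros(n)
    for _ in range(K):
        if np.linalg.norm(residual) <= tol:
            break                     # measurement already explained
        corr = np.abs(Phi.T @ residual)
        if support:
            corr[support] = -np.inf   # never reselect an atom
        picks = np.argsort(corr)[-N:] # N largest correlations
        support.extend(picks.tolist())
        # Least-squares refit on all atoms chosen so far.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
        x_hat = np.zeros(n)
        x_hat[support] = coef
    return x_hat, sorted(support)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Phi = rng.standard_normal((40, 100)) / np.sqrt(40)
    x = np.zeros(100)
    x[[3, 17, 58]] = [1.0, -2.0, 0.5]
    x_hat, support = gomp(Phi, Phi @ x, K=3, N=2)
    print(support, np.linalg.norm(x - x_hat))
```

Setting `N=1` reduces the loop to ordinary OMP, which is the case the abstract's second bound is compared against.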