-
On the Approximability of Stationary Processes using the ARMA Model
Authors:
Anand Ganesh,
Babhrubahan Bose,
Anand Rajagopalan
Abstract:
Within the theoretical literature on stationary random variables, pure Moving Average models and pure Autoregressive models have a rich body of work, but the corresponding literature on Autoregressive Moving Average (ARMA) models is very sparse. We attempt to fill certain gaps in this sparse line of work. Central to our observations is the spectral lemma connecting sup-norm-based function approximation on the unit circle to random variable approximation. This method allows us to provide quantitative approximation bounds, in contrast with the qualitative boundedness and stability guarantees associated with unit root tests. Using the spectral lemma, we first identify a class of stationary processes where approximation guarantees are feasible. This turns a known heuristic argument motivating ARMA models based on rational approximations into a rigorous result. Second, we identify an idealized stationary random process for which we conjecture that a good ARMA approximation is not possible. Third, we calculate exact approximation bounds for an example process, and give a constructive proof that, for a given order, Padé approximations do not always correspond to the best ARMA approximation. Unlike prior literature, our approach uses the generating function of the random process rather than the spectral measure, and our results focus on the approximation error of the random variable rather than the prediction error, as in some classical infimum results by Szegő, Kolmogorov, and Wiener.
Submitted 19 March, 2025; v1 submitted 20 August, 2024;
originally announced August 2024.
-
Faster Differentially Private Convex Optimization via Second-Order Methods
Authors:
Arun Ganesh,
Mahdi Haghifam,
Thomas Steinke,
Abhradeep Thakurta
Abstract:
Differentially private (stochastic) gradient descent is the workhorse of differentially private (DP) machine learning in both the convex and non-convex settings. Without privacy constraints, second-order methods, like Newton's method, converge faster than first-order methods like gradient descent. In this work, we investigate the prospect of using second-order information from the loss function to accelerate DP convex optimization. We first develop a private variant of the regularized cubic Newton method of Nesterov and Polyak, and show that for the class of strongly convex loss functions, our algorithm has quadratic convergence and achieves the optimal excess loss. We then design a practical second-order DP algorithm for the unconstrained logistic regression problem. We theoretically and empirically study the performance of our algorithm. Empirical results show our algorithm consistently achieves the best excess loss compared to other baselines and is 10-40x faster than DP-GD/DP-SGD.
Submitted 22 May, 2023;
originally announced May 2023.
-
Private (Stochastic) Non-Convex Optimization Revisited: Second-Order Stationary Points and Excess Risks
Authors:
Arun Ganesh,
Daogao Liu,
Sewoong Oh,
Abhradeep Thakurta
Abstract:
We consider the problem of minimizing a non-convex objective while preserving the privacy of the examples in the training data. Building upon the previous variance-reduced algorithm SpiderBoost, we introduce a new framework that utilizes two different kinds of gradient oracles. The first kind of oracle can estimate the gradient at one point, and the second kind, less precise and more cost-effective, can estimate the gradient difference between two points. SpiderBoost uses the first kind periodically, once every few steps, while our framework proposes using the first oracle whenever the total drift has become large and relies on the second oracle otherwise. This new framework ensures the gradient estimates remain accurate at all times, resulting in improved rates for finding second-order stationary points.
Moreover, we address a more challenging task of finding the global minima of a non-convex objective using the exponential mechanism. Our findings indicate that the regularized exponential mechanism can closely match previous empirical and population risk bounds, without requiring smoothness assumptions for algorithms with polynomial running time. Furthermore, by disregarding running time considerations, we show that the exponential mechanism can achieve a good population risk bound and provide a nearly matching lower bound.
Submitted 19 February, 2023;
originally announced February 2023.
-
On the geometry of generalization and memorization in deep neural networks
Authors:
Cory Stephenson,
Suchismita Padhy,
Abhinav Ganesh,
Yue Hui,
Hanlin Tang,
SueYeon Chung
Abstract:
Understanding how large neural networks avoid memorizing training data is key to explaining their high generalization performance. To examine the structure of when and where memorization occurs in a deep network, we use a recently developed replica-based mean field theoretic geometric analysis method. We find that all layers preferentially learn from examples which share features, and link this behavior to generalization performance. Memorization predominantly occurs in the deeper layers, due to the decreasing radius and dimension of object manifolds, whereas early layers are minimally affected. This predicts that generalization can be restored by reverting the final few layer weights to earlier epochs before significant memorization occurred, which is confirmed by the experiments. Additionally, by studying generalization under different model sizes, we reveal the connection between the double descent phenomenon and the underlying model geometry. Finally, theoretical analysis shows that networks avoid memorization early in training because, close to initialization, the gradient contribution from permuted examples is small. These findings provide quantitative evidence for the structure of memorization across layers of a deep neural network, the drivers for such structure, and its connection to manifold geometric properties.
Submitted 30 May, 2021;
originally announced May 2021.
-
The Gossiping Insert-Eliminate Algorithm for Multi-Agent Bandits
Authors:
Ronshee Chawla,
Abishek Sankararaman,
Ayalvadi Ganesh,
Sanjay Shakkottai
Abstract:
We consider a decentralized multi-agent Multi-Armed Bandit (MAB) setup consisting of $N$ agents, solving the same MAB instance to minimize individual cumulative regret. In our model, agents collaborate by exchanging messages through pairwise gossip-style communications on an arbitrary connected graph. We develop two novel algorithms, where each agent only plays from a subset of all the arms. Agents use the communication medium to recommend only arm-IDs (not samples), and thus update the set of arms from which they play. We establish that, if agents communicate $Ω(\log(T))$ times through any connected pairwise gossip mechanism, then every agent's regret is a factor of order $N$ smaller compared to the case of no collaborations. Furthermore, we show that the communication constraints only have a second-order effect on the regret of our algorithm. We then analyze this second-order term of the regret to derive bounds on the regret-communication tradeoffs. Finally, we empirically evaluate our algorithm and conclude that the insights are fundamental and not artifacts of our bounds. We also show a lower bound which implies that the regret scaling obtained by our algorithm cannot be improved even in the absence of any communication constraints. Our results thus demonstrate that even a minimal level of collaboration among agents greatly reduces regret for all agents.
Submitted 2 July, 2024; v1 submitted 15 January, 2020;
originally announced January 2020.
-
Social Learning in Multi Agent Multi Armed Bandits
Authors:
Abishek Sankararaman,
Ayalvadi Ganesh,
Sanjay Shakkottai
Abstract:
In this paper, we introduce a distributed version of the classical stochastic Multi-Arm Bandit (MAB) problem. Our setting consists of a large number of agents $n$ that collaboratively and simultaneously solve the same instance of a $K$-armed MAB to minimize the average cumulative regret over all agents. The agents can communicate and collaborate with each other \emph{only} through a pairwise asynchronous gossip-based protocol that exchanges a limited number of bits. In our model, agents at each point decide (i) which arm to play, (ii) whether to communicate, and if so (iii) what to communicate and with whom. Agents in our model are decentralized; that is, their actions depend only on their own observed history.
We develop a novel algorithm in which agents, whenever they choose, communicate only arm-ids and not samples, with another agent chosen uniformly and independently at random. The per-agent regret scaling achieved by our algorithm is $O \left( \frac{\lceil\frac{K}{n}\rceil+\log(n)}{Δ} \log(T) + \frac{\log^3(n) \log \log(n)}{Δ^2} \right)$. Furthermore, any agent in our algorithm communicates only a total of $Θ(\log(T))$ times over a time interval of $T$.
We compare our results to two benchmarks: one where there is no communication among agents and one corresponding to complete interaction. We show, both theoretically and empirically, that our algorithm achieves a significant reduction both in per-agent regret, compared to the case where agents do not collaborate, and in communication complexity, compared to the full-interaction setting, which requires $T$ communication attempts by an agent over $T$ arm pulls.
Submitted 4 November, 2019; v1 submitted 4 October, 2019;
originally announced October 2019.
-
Monaural Audio Speaker Separation with Source Contrastive Estimation
Authors:
Cory Stephenson,
Patrick Callier,
Abhinav Ganesh,
Karl Ni
Abstract:
We propose an algorithm to separate simultaneously speaking persons from each other, the "cocktail party problem", using a single microphone. Our approach involves deep recurrent neural network regression to a vector space that is descriptive of independent speakers. Such a vector space can embed empirically determined speaker characteristics and is optimized by distinguishing between speaker masks. We call this technique source-contrastive estimation. The methodology is inspired by negative sampling, which has seen success in natural language processing, where an embedding is learned by correlating and de-correlating a given input vector with output weights. Although the matrix determined by the output weights is dependent on a set of known speakers, we only use the input vectors during inference. Doing so ensures that source separation is explicitly speaker-independent. Our approach is similar to recent deep neural network clustering and permutation-invariant training research; we use weighted spectral features and masks to augment individual speaker frequencies while filtering out other speakers. Our technique, however, avoids the severe computational burden of other approaches. Furthermore, by training a vector space rather than combinations of different speakers or differences thereof, we avoid the so-called permutation problem during training. Our algorithm offers an intuitive, computationally efficient response to the cocktail party problem, and most importantly boasts better empirical performance than other current techniques.
Submitted 12 May, 2017;
originally announced May 2017.
-
Non-parametric change-point detection using string matching algorithms
Authors:
Oliver Johnson,
Dino Sejdinovic,
James Cruise,
Ayalvadi Ganesh,
Robert Piechocki
Abstract:
Given the output of a data source taking values in a finite alphabet, we wish to detect change-points, that is, times when the statistical properties of the source change. Motivated by ideas of match lengths in information theory, we introduce a novel non-parametric estimator which we call CRECHE (CRossings Enumeration CHange Estimator). We present simulation evidence that this estimator performs well, both for simulated sources and for real data formed by concatenating text sources. For example, we show that we can accurately detect the point at which a source changes from a Markov chain to an IID source with the same stationary distribution. Our estimator requires no assumptions about the form of the source distribution, and avoids the need to estimate its probabilities. Further, we establish consistency of the CRECHE estimator under a related toy model, by establishing a fluid limit and using martingale arguments.
Submitted 28 July, 2011; v1 submitted 28 June, 2011;
originally announced June 2011.