-
Stationary Proportional Hazard Processes via Complementary Power Function Distribution Processes
Authors:
Barry C. Arnold,
B. G. Manjunath,
S. Sachdeva
Abstract:
In the following, we introduce new proportional hazard (PH) processes, which are derived by a marginal transformation applied to complementary power function distribution (CPFD) processes. Also, we introduce two new Pareto processes, which are derived from the proportional hazard family. We discuss distributional features of such processes, explore inferential aspects and include an example of applications of the new processes to real-life data.
Submitted 9 March, 2024;
originally announced March 2024.
-
How are cities pledging net zero? A computational approach to analyzing subnational climate strategies
Authors:
Siddharth Sachdeva,
Angel Hsu,
Ian French,
Elwin Lim
Abstract:
Cities have become primary actors on climate change and are increasingly setting goals aimed at net-zero emissions. The rapid proliferation of subnational governments "racing to zero" emissions and articulating their own climate mitigation plans warrants closer examination to understand how these actors intend to meet these goals. The scattered, incomplete and heterogeneous nature of city climate policy documents, however, has made their systemic analysis challenging. We analyze 318 climate action documents from cities that have pledged net-zero targets or joined a transnational climate initiative with this goal using machine learning-based natural language processing (NLP) techniques. We use these approaches to accomplish two primary goals: 1) determine text patterns that predict "ambitious" net-zero targets, where we define an ambitious target as one that encompasses a subnational government's economy-wide emissions; and 2) perform a sectoral analysis to identify patterns and trade-offs in climate action themes (e.g., land-use, industry, buildings). We find that cities that have defined ambitious climate actions tend to emphasize quantitative metrics and specific high-emitting sectors in their plans, supported by mentions of governance and citizen participation. Cities predominantly emphasize energy-related actions in their plans, particularly in the buildings, transport and heating sectors, but often at the expense of other sectors, including land-use and climate impacts. The method presented in this paper provides a replicable, scalable approach to analyzing climate action plans and a first step towards facilitating cross-city learning.
Submitted 14 December, 2021;
originally announced December 2021.
-
Regularized linear autoencoders recover the principal components, eventually
Authors:
Xuchan Bao,
James Lucas,
Sushant Sachdeva,
Roger Grosse
Abstract:
Our understanding of learning input-output relationships with neural nets has improved rapidly in recent years, but little is known about the convergence of the underlying representations, even in the simple case of linear autoencoders (LAEs). We show that when trained with proper regularization, LAEs can directly learn the optimal representation -- ordered, axis-aligned principal components. We analyze two such regularization schemes: non-uniform $\ell_2$ regularization and a deterministic variant of nested dropout [Rippel et al., ICML 2014]. Though both regularization schemes converge to the optimal representation, we show that this convergence is slow due to ill-conditioning that worsens with increasing latent dimension. We show that the inefficiency of learning the optimal representation is not inevitable -- we present a simple modification to the gradient descent update that greatly speeds up convergence empirically.
Submitted 1 October, 2021; v1 submitted 13 July, 2020;
originally announced July 2020.
-
Faster Graph Embeddings via Coarsening
Authors:
Matthew Fahrbach,
Gramoz Goranci,
Richard Peng,
Sushant Sachdeva,
Chi Wang
Abstract:
Graph embeddings are a ubiquitous tool for machine learning tasks, such as node classification and link prediction, on graph-structured data. However, computing the embeddings for large-scale graphs is prohibitively inefficient even if we are interested only in a small subset of relevant vertices. To address this, we present an efficient graph coarsening approach, based on Schur complements, for computing the embedding of the relevant vertices. We prove that these embeddings are preserved exactly by the Schur complement graph that is obtained via Gaussian elimination on the non-relevant vertices. As computing Schur complements is expensive, we give a nearly-linear time algorithm that generates a coarsened graph on the relevant vertices that provably matches the Schur complement in expectation in each iteration. Our experiments involving prediction tasks on graphs demonstrate that computing embeddings on the coarsened graph, rather than the entire graph, leads to significant time savings without sacrificing accuracy.
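To make the Schur complement step in this abstract concrete, here is a minimal sketch of exact Gaussian elimination of one non-relevant vertex from a weighted undirected graph. It is not the paper's nearly-linear time randomized algorithm, only the exact operation that algorithm is said to match in expectation. The function name `eliminate_vertex` and the dict-of-dicts graph representation are assumptions made for illustration, and the sketch assumes a symmetric adjacency with no self-loops.

```python
from fractions import Fraction
from itertools import combinations


def eliminate_vertex(graph, v):
    """Return the Schur complement of `graph` onto every vertex except `v`.

    `graph` maps each vertex to a dict {neighbor: weight}; it is assumed to be
    symmetric and free of self-loops (hypothetical representation).
    """
    nbrs = graph[v]
    degree = sum(nbrs.values())  # weighted degree of the eliminated vertex

    # Copy the graph without v and without any edge that touches v.
    reduced = {
        u: {x: w for x, w in adj.items() if x != v}
        for u, adj in graph.items()
        if u != v
    }

    # Eliminating v adds a clique on its neighbors: the pair (a, b) gains
    # weight w(v, a) * w(v, b) / degree(v).
    for a, b in combinations(nbrs, 2):
        w = nbrs[a] * nbrs[b] / degree
        reduced[a][b] = reduced[a].get(b, 0) + w
        reduced[b][a] = reduced[b].get(a, 0) + w
    return reduced


# Path a - b - c with unit weights; b is the non-relevant vertex.
one = Fraction(1)
path = {"a": {"b": one}, "b": {"a": one, "c": one}, "c": {"b": one}}
coarse = eliminate_vertex(path, "b")
print(coarse["a"]["c"])  # 1/2
```

The result matches the series rule for electrical networks: two unit conductances in series behave like a single edge of conductance 1/2. Repeating the elimination over all non-relevant vertices yields the exact Schur complement, which is the costly computation the abstract says the coarsening algorithm avoids.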
Submitted 22 October, 2020; v1 submitted 6 July, 2020;
originally announced July 2020.
-
A Convergent and Dimension-Independent Min-Max Optimization Algorithm
Authors:
Vijay Keswani,
Oren Mangoubi,
Sushant Sachdeva,
Nisheeth K. Vishnoi
Abstract:
We study a variant of a recently introduced min-max optimization framework where the max-player is constrained to update its parameters in a greedy manner until it reaches a first-order stationary point. Our equilibrium definition for this framework depends on a proposal distribution which the min-player uses to choose directions in which to update its parameters. We show that, given a smooth and bounded nonconvex-nonconcave objective function, access to any proposal distribution for the min-player's updates, and a stochastic gradient oracle for the max-player, our algorithm converges to the aforementioned approximate local equilibrium in a number of iterations that does not depend on the dimension. The equilibrium point found by our algorithm depends on the proposal distribution, and when applying our algorithm to train GANs we choose the proposal distribution to be a distribution of stochastic gradients. We empirically evaluate our algorithm on challenging nonconvex-nonconcave test functions and loss functions arising in GAN training. Our algorithm converges on these test functions and, when used to train GANs, trains stably on synthetic and real-world datasets and avoids mode collapse.
Submitted 30 June, 2022; v1 submitted 22 June, 2020;
originally announced June 2020.
-
Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model
Authors:
Guodong Zhang,
Lala Li,
Zachary Nado,
James Martens,
Sushant Sachdeva,
George E. Dahl,
Christopher J. Shallue,
Roger Grosse
Abstract:
Increasing the batch size is a popular way to speed up neural network training, but beyond some critical batch size, larger batch sizes yield diminishing returns. In this work, we study how the critical batch size changes based on properties of the optimization algorithm, including acceleration and preconditioning, through two different lenses: large scale experiments, and analysis of a simple noisy quadratic model (NQM). We experimentally demonstrate that optimization algorithms that employ preconditioning, specifically Adam and K-FAC, result in much larger critical batch sizes than stochastic gradient descent with momentum. We also demonstrate that the NQM captures many of the essential features of real neural network training, despite being drastically simpler to work with. The NQM predicts our results with preconditioned optimizers, previous results with accelerated gradient descent, and other results around optimal learning rates and large batch training, making it a useful tool to generate testable predictions about neural network optimization.
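The diminishing returns described in this abstract can be illustrated with a one-dimensional toy. The abstract does not spell out the noisy quadratic model, so everything below is an assumption: the loss is 0.5 * h * theta^2, each stochastic gradient is h * theta plus zero-mean noise of variance c / B for batch size B, and plain SGD is used with a learning rate tuned per batch size from a fixed grid. The names `steps_to_target` and `best_steps` are hypothetical. Under these assumptions the second moment of theta follows an exact recursion, so no random sampling is needed.

```python
def steps_to_target(batch_size, lr, h=1.0, c=1.0, m2=1.0,
                    target=0.01, max_steps=2000):
    """Count SGD steps until the expected loss 0.5 * h * E[theta^2] <= target.

    m2 tracks E[theta^2]. One SGD step on the assumed model gives
    E[theta^2] <- (1 - lr*h)^2 * E[theta^2] + lr^2 * c / batch_size.
    Returns None if the target is not reached within max_steps.
    """
    for step in range(1, max_steps + 1):
        m2 = (1.0 - lr * h) ** 2 * m2 + lr ** 2 * c / batch_size
        if 0.5 * h * m2 <= target:
            return step
    return None


def best_steps(batch_size, lrs=tuple(k / 100 for k in range(1, 101))):
    """Fewest steps to the target over a grid of learning rates (0.01 to 1.00)."""
    reached = [s for s in (steps_to_target(batch_size, lr) for lr in lrs)
               if s is not None]
    return min(reached) if reached else None


batch_sizes = (1, 4, 16, 64, 256, 1024)
curve = [best_steps(b) for b in batch_sizes]
for b, s in zip(batch_sizes, curve):
    print(f"batch size {b:5d}: {s} steps")
```

In this toy, small batches force a small learning rate to keep the noise floor below the target, so the step count falls as the batch grows. Once the batch is large enough that a single step with learning rate 1 reaches the target, further increases buy nothing, which is a crude one-coordinate analogue of a critical batch size.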
Submitted 28 October, 2019; v1 submitted 9 July, 2019;
originally announced July 2019.
-
Iterative Refinement for $\ell_p$-norm Regression
Authors:
Deeksha Adil,
Rasmus Kyng,
Richard Peng,
Sushant Sachdeva
Abstract:
We give improved algorithms for the $\ell_{p}$-regression problem, $\min_{x} \|x\|_{p}$ such that $A x=b,$ for all $p \in (1,2) \cup (2,\infty).$ Our algorithms obtain a high accuracy solution in $\tilde{O}_{p}(m^{\frac{|p-2|}{2p + |p-2|}}) \le \tilde{O}_{p}(m^{\frac{1}{3}})$ iterations, where each iteration requires solving an $m \times m$ linear system, $m$ being the dimension of the ambient space.
By maintaining an approximate inverse of the linear systems that we solve in each iteration, we give algorithms for solving $\ell_{p}$-regression to $1 / \text{poly}(n)$ accuracy that run in time $\tilde{O}_p(m^{\max\{\omega, 7/3\}}),$ where $\omega$ is the matrix multiplication constant. For the current best value of $\omega > 2.37$, we can thus solve $\ell_{p}$ regression as fast as $\ell_{2}$ regression, for all constant $p$ bounded away from $1.$
Our algorithms can be combined with fast graph Laplacian linear equation solvers to give minimum $\ell_{p}$-norm flow / voltage solutions to $1 / \text{poly}(n)$ accuracy on an undirected graph with $m$ edges in $\tilde{O}_{p}(m^{1 + \frac{|p-2|}{2p + |p-2|}}) \le \tilde{O}_{p}(m^{\frac{4}{3}})$ time.
For sparse graphs and for matrices with similar dimensions, our iteration counts and running times improve on the $p$-norm regression algorithm by [Bubeck-Cohen-Lee-Li STOC'18] and general-purpose convex optimization algorithms. At the core of our algorithms is an iterative refinement scheme for $\ell_{p}$-norms, using the smoothed $\ell_{p}$-norms introduced in the work of Bubeck et al. Given an initial solution, we construct a problem that seeks to minimize a quadratically-smoothed $\ell_{p}$ norm over a subspace, such that a crude solution to this problem allows us to improve the initial solution by a constant factor, leading to algorithms with fast convergence.
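The iteration bound quoted in this abstract has exponent |p-2| / (2p + |p-2|), which the abstract states is at most 1/3. The short sketch below evaluates that exponent exactly for a few values of p so the bound can be checked by hand. The function name `iteration_exponent` is an assumption for illustration, and the sketch only evaluates the stated formula; it does not implement the regression algorithm.

```python
from fractions import Fraction


def iteration_exponent(p):
    """Exponent of m in the stated iteration bound, |p-2| / (2p + |p-2|).

    Defined for p in (1, 2) or (2, infinity), the range given in the abstract.
    """
    p = Fraction(p)
    if p <= 1 or p == 2:
        raise ValueError("p must lie in (1, 2) or (2, infinity)")
    gap = abs(p - 2)
    return gap / (2 * p + gap)


for p in (Fraction(3, 2), 3, 4, 100):
    e = iteration_exponent(p)
    # The min-l_p flow running time in the abstract has exponent 1 + e.
    print(f"p = {p}: iterations ~ m^({e}), flow time ~ m^({1 + e})")
```

For p > 2 the exponent equals (p-2)/(3p-2), which increases toward 1/3 as p grows; for 1 < p < 2 it equals (2-p)/(2+p), which approaches 1/3 as p decreases to 1. For example, p = 4 gives exponent 1/5, well under the worst-case 1/3.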
Submitted 20 January, 2019;
originally announced January 2019.