-
Extending Spike-Timing Dependent Plasticity to Learning Synaptic Delays
Authors:
Marissa Dominijanni,
Alexander Ororbia,
Kenneth W. Regan
Abstract:
Synaptic delays play a crucial role in biological neuronal networks, where their modulation has been observed in mammalian learning processes. In the realm of neuromorphic computing, although spiking neural networks (SNNs) aim to emulate biology more closely than traditional artificial neural networks do, synaptic delays are rarely incorporated into their simulation. We introduce a novel learning rule for simultaneously learning synaptic connection strengths and delays, by extending spike-timing dependent plasticity (STDP), a Hebbian method commonly used for learning synaptic weights. We validate our approach by extending a widely used SNN model for classification trained with unsupervised learning. Then we demonstrate the effectiveness of our new method by comparing it against other existing methods for co-learning synaptic weights and delays as well as against STDP without synaptic delays. Results demonstrate that our proposed method consistently achieves superior performance across a variety of test scenarios. Furthermore, our experimental results yield insight into the interplay between synaptic efficacy and delay.
Submitted 17 June, 2025;
originally announced June 2025.
-
Data Efficiency for Large Recommendation Models
Authors:
Kshitij Jain,
Jingru Xie,
Kevin Regan,
Cheng Chen,
Jie Han,
Steve Li,
Zhuoshu Li,
Todd Phillips,
Myles Sussman,
Matt Troup,
Angel Yu,
Jia Zhuo
Abstract:
Large recommendation models (LRMs) are fundamental to the multi-billion dollar online advertising industry, processing massive datasets of hundreds of billions of examples before transitioning to continuous online training to adapt to rapidly changing user behavior. The massive scale of data directly impacts both computational costs and the speed at which new methods can be evaluated (R&D velocity). This paper presents actionable principles and high-level frameworks to guide practitioners in optimizing training data requirements. These strategies have been successfully deployed in Google's largest Ads CTR prediction models and are broadly applicable beyond LRMs. We outline the concept of data convergence, describe methods to accelerate this convergence, and finally, detail how to optimally balance training data volume with model size.
Submitted 25 October, 2024; v1 submitted 8 October, 2024;
originally announced October 2024.
-
On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models
Authors:
Rohan Anil,
Sandra Gadanho,
Da Huang,
Nijith Jacob,
Zhuoshu Li,
Dong Lin,
Todd Phillips,
Cristina Pop,
Kevin Regan,
Gil I. Shamir,
Rakesh Shivanna,
Qiqi Yan
Abstract:
For industrial-scale advertising systems, prediction of ad click-through rate (CTR) is a central problem. Ad clicks constitute a significant class of user engagements and are often used as the primary signal for the usefulness of ads to users. Additionally, in cost-per-click advertising systems where advertisers are charged per click, click rate expectations feed directly into value estimation. Accordingly, CTR model development is a significant investment for most Internet advertising companies. Engineering for such problems requires many machine learning (ML) techniques suited to online learning that go well beyond traditional accuracy improvements, especially concerning efficiency, reproducibility, calibration, and credit attribution. We present a case study of practical techniques deployed in Google's search ads CTR model. This paper provides an industry case study highlighting important areas of current ML research and illustrating how impactful new ML methods are evaluated and made useful in a large-scale industrial setting.
Submitted 12 September, 2022;
originally announced September 2022.
-
Multi-Structural Games and Number of Quantifiers
Authors:
Ronald Fagin,
Jonathan Lenchner,
Kenneth W. Regan,
Nikhil Vyas
Abstract:
We study multi-structural games, played on two sets $\mathcal{A}$ and $\mathcal{B}$ of structures. These games generalize Ehrenfeucht-Fraïssé games. Whereas Ehrenfeucht-Fraïssé games capture the quantifier rank of a first-order sentence, multi-structural games capture the number of quantifiers, in the sense that Spoiler wins the $r$-round game if and only if there is a first-order sentence $φ$ with at most $r$ quantifiers, where every structure in $\mathcal{A}$ satisfies $φ$ and no structure in $\mathcal{B}$ satisfies $φ$. We use these games to give a complete characterization of the number of quantifiers required to distinguish linear orders of different sizes, and develop machinery for analyzing structures beyond linear orders.
Submitted 28 January, 2025; v1 submitted 29 April, 2021;
originally announced April 2021.
-
Scalable Second Order Optimization for Deep Learning
Authors:
Rohan Anil,
Vineet Gupta,
Tomer Koren,
Kevin Regan,
Yoram Singer
Abstract:
Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent. Second-order optimization methods, which involve second derivatives and/or second-order statistics of the data, are far less prevalent despite strong theoretical properties, due to their prohibitive computation, memory and communication costs. In an attempt to bridge this gap between theoretical and practical optimization, we present a scalable implementation of a second-order preconditioned method (concretely, a variant of full-matrix Adagrad) that, along with several critical algorithmic and numerical improvements, provides significant convergence and wall-clock time improvements compared to conventional first-order methods on state-of-the-art deep models. Our novel design effectively utilizes the prevalent heterogeneous hardware architecture for training deep models, consisting of a multicore CPU coupled with multiple accelerator units. We demonstrate superior performance compared to state-of-the-art on very large learning tasks such as machine translation with Transformers, language modeling with BERT, click-through rate prediction on Criteo, and image classification on ImageNet with ResNet-50.
Submitted 5 March, 2021; v1 submitted 20 February, 2020;
originally announced February 2020.
-
Stabilizer Circuits, Quadratic Forms, and Computing Matrix Rank
Authors:
Chaowen Guan,
Kenneth W. Regan
Abstract:
We show that a form of strong simulation for $n$-qubit quantum stabilizer circuits $C$ is computable in $O(s + n^ω)$ time, where $ω$ is the exponent of matrix multiplication. Solution counting for quadratic forms over $\mathbb{F}_2$ is also placed into $O(n^ω)$ time. This improves previous $O(n^3)$ bounds. Our methods in fact show an $O(n^2)$-time reduction from matrix rank over $\mathbb{F}_2$ to computing $p = |\langle \; 0^n \;|\; C \;|\; 0^n \;\rangle|^2$ (hence also to solution counting) and a converse reduction that is $O(s + n^2)$ except for matrix multiplications used to decide whether $p > 0$. The current best-known worst-case time for matrix rank is $O(n^ω)$ over $\mathbb{F}_2$, indeed over any field, while $ω$ is currently upper-bounded by $2.3728\dots$. Our methods draw on properties of classical quadratic forms over $\mathbb{Z}_4$. We study possible distributions of Feynman paths in the circuits and prove that the differences in $+1$ vs. $-1$ counts and $+i$ vs. $-i$ counts are always $0$ or a power of $2$. Further properties of quantum graph states and connections to graph theory are discussed.
Submitted 5 April, 2019; v1 submitted 29 March, 2019;
originally announced April 2019.
-
Large Margin Deep Networks for Classification
Authors:
Gamaleldin F. Elsayed,
Dilip Krishnan,
Hossein Mobahi,
Kevin Regan,
Samy Bengio
Abstract:
We present a formulation of deep learning that aims at producing a large margin classifier. The notion of margin, minimum distance to a decision boundary, has served as the foundation of several theoretically profound and empirically successful results for both classification and regression tasks. However, most large margin algorithms are applicable only to shallow models with a preset feature representation; and conventional margin methods for neural networks only enforce margin at the output layer. Such methods are therefore not well suited for deep networks.
In this work, we propose a novel loss function to impose a margin on any chosen set of layers of a deep network (including input and hidden layers). Our formulation allows choosing any norm on the metric measuring the margin. We demonstrate that the decision boundary obtained by our loss has nice properties compared to standard classification loss functions. Specifically, we show improved empirical results on the MNIST, CIFAR-10 and ImageNet datasets on multiple tasks: generalization from small training sets, corrupted labels, and robustness against adversarial perturbations. The resulting loss is general and complementary to existing data augmentation (such as random/adversarial input transform) and regularization techniques (such as weight decay, dropout, and batch norm).
Submitted 3 December, 2018; v1 submitted 15 March, 2018;
originally announced March 2018.
-
Polynomials Modulo Composite Numbers: Ax-Katz type theorems for the structure of their solution sets
Authors:
Robert L. Surowka,
Kenneth W. Regan
Abstract:
We extend the Ax-Katz theorem for a single polynomial from finite fields to the rings Z_m with m composite. This extension not only yields the analogous result, but gives significantly higher divisibility bounds. We conjecture what computer runs suggest is the optimal result for any m, and prove a special case of it. The special case is for m = 2^r and polynomials of degree 2. Our results also yield further properties of the solution spaces. Polynomials modulo composites are the focus of some computational complexity lower bound frontiers, while those modulo 2^r arise in the simulation of quantum circuits. We give some prospective applications of this research.
Submitted 18 August, 2014; v1 submitted 18 April, 2014;
originally announced April 2014.
-
Regret-based Reward Elicitation for Markov Decision Processes
Authors:
Kevin Regan,
Craig Boutilier
Abstract:
The specification of a Markov decision process (MDP) can be difficult. Reward function specification is especially problematic; in practice, it is often cognitively complex and time-consuming for users to precisely specify rewards. This work casts the problem of specifying rewards as one of preference elicitation and aims to minimize the degree of precision with which a reward function must be specified while still allowing optimal or near-optimal policies to be produced. We first discuss how robust policies can be computed for MDPs given only partial reward information using the minimax regret criterion. We then demonstrate how regret can be reduced by efficiently eliciting reward information using bound queries, using regret reduction as a means for choosing suitable queries. Empirical results demonstrate that regret-based reward elicitation offers an effective way to produce near-optimal policies without resorting to the precise specification of the entire reward function.
Submitted 9 May, 2012;
originally announced May 2012.
-
Simulating Special but Natural Quantum Circuits
Authors:
Richard J. Lipton,
Kenneth W. Regan,
Atri Rudra
Abstract:
We identify a sub-class of BQP that captures certain structural commonalities among many quantum algorithms including Shor's algorithms. This class does not contain all of BQP (e.g. Grover's algorithm does not fall into this class). Our main result is that any algorithm in this class that measures at most O(log n) qubits can be simulated by classical randomized polynomial time algorithms. This does not dequantize Shor's algorithm (as the latter measures n qubits) but our work also highlights a new potentially hard function for cryptographic applications.
Our main technical contribution is (to the best of our knowledge) a new exact characterization of certain sums of Fourier-type coefficients (with exponentially many summands).
Submitted 16 January, 2012;
originally announced January 2012.