Decoupling Gating from Linearity

Fiat, Jonathan; Malach, Eran; Shalev-Shwartz, Shai

arXiv:1906.05032 (cs)

[Submitted on 12 Jun 2019]

Abstract: ReLU neural networks have been the focus of many recent theoretical works that try to explain their empirical success. Nonetheless, a gap remains between current theoretical results and empirical observations, even for shallow (one-hidden-layer) networks. For example, in the task of memorizing a random sample of size $m$ and dimension $d$, the best theoretical result requires the size of the network to be $\tilde{\Omega}(\frac{m^2}{d})$, while empirically a network of size slightly larger than $\frac{m}{d}$ suffices. To bridge this gap, we study a simplified model of ReLU networks. We observe that a ReLU neuron is the product of a linear function with a gate (which determines whether the neuron is active), where both share a single, jointly trained weight vector. In this spirit, we introduce the Gated Linear Unit (GaLU), which decouples the linearity from the gating by assigning a separate weight vector to each role. We show that GaLU networks admit optimization and generalization results much stronger than those available for ReLU networks. Specifically, we show a memorization result for networks of size $\tilde{\Omega}(\frac{m}{d})$, together with improved generalization bounds. Finally, we show that in some scenarios GaLU networks behave similarly to ReLU networks, which makes them a good choice of simplified model.
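The decoupling described in the abstract can be made concrete in a few lines. The sketch below is an illustration, not code from the paper. It writes a ReLU neuron as $(w^\top x)\,\mathbb{1}[w^\top x > 0]$ and a GaLU neuron as $(w^\top x)\,\mathbb{1}[u^\top x > 0]$, and it assumes a one-hidden-layer network $f(x) = \sum_j a_j (w_j^\top x)\,\mathbb{1}[u_j^\top x > 0]$ with output weights $a_j$. The helper names (`galu_neuron`, `galu_net`, `relu_net`, etc.), the sizes, and the random seed are hypothetical choices. The script checks two things: a ReLU neuron is the special case $u = w$, and once the gate vectors $U$ are held fixed, the GaLU output is linear in the weights $W$, a property the ReLU network lacks. This linearity is one plausible reason the decoupled model is easier to analyze.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def relu_neuron(w, x):
    # ReLU(w.x) = (w.x) * 1[w.x > 0]: one vector w controls both the
    # linear value and the gate that decides whether the neuron is active.
    z = dot(w, x)
    return z if z > 0 else 0.0


def galu_neuron(w, u, x):
    # GaLU: the linear part uses w, the gate uses a separate vector u.
    gate = 1.0 if dot(u, x) > 0 else 0.0
    return dot(w, x) * gate


def galu_net(W, U, a, x):
    # Hypothetical one-hidden-layer GaLU net: sum_j a_j * (w_j.x) * 1[u_j.x > 0]
    return sum(a_j * galu_neuron(w_j, u_j, x) for a_j, w_j, u_j in zip(a, W, U))


def relu_net(W, a, x):
    # Same architecture with ReLU neurons: sum_j a_j * ReLU(w_j.x)
    return sum(a_j * relu_neuron(w_j, x) for a_j, w_j in zip(a, W))


def add(A, B):
    """Elementwise sum of two matrices stored as lists of rows."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(A, B)]


def neg(A):
    """Elementwise negation of a matrix stored as a list of rows."""
    return [[-p for p in row] for row in A]


def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d, k = 5, 8                          # input dimension and hidden width (arbitrary)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    a = [1.0] * k                        # output weights, held fixed
    U = random_matrix(rng, k, d)         # gate vectors, held fixed
    W1 = random_matrix(rng, k, d)
    W2 = random_matrix(rng, k, d)

    # With U fixed, the GaLU output is linear in W (up to float rounding).
    lhs = galu_net(add(W1, W2), U, a, x)
    rhs = galu_net(W1, U, a, x) + galu_net(W2, U, a, x)
    print(abs(lhs - rhs) < 1e-9)

    # A ReLU neuron is the GaLU special case u = w.
    print(all(galu_neuron(w, w, x) == relu_neuron(w, x) for w in W1))

    # The ReLU net is not linear in W: f(W) + f(-W) = sum_j |w_j.x|,
    # while f(W + (-W)) = f(0) = 0.
    print(relu_net(W1, a, x) + relu_net(neg(W1), a, x) > 0)
```

Each of the three checks prints `True`. Holding $U$ fixed is what turns fitting $W$ into a linear (least-squares) problem in this sketch. Whether and how the gate vectors are trained in the paper's own analysis is not stated in the abstract.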

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1906.05032 [cs.LG]
	(or arXiv:1906.05032v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1906.05032

Submission history

From: Eran Malach
[v1] Wed, 12 Jun 2019 09:43:05 UTC (52 KB)