Search | arXiv e-print repository

Better Neural Network Expressivity: Subdividing the Simplex

Authors: Egor Bakaev, Florestan Brunck, Christoph Hertrich, Jack Stade, Amir Yehudayoff

Abstract: This work studies the expressivity of ReLU neural networks with a focus on their depth. A sequence of previous works showed that $\lceil \log_2(n+1) \rceil$ hidden layers are sufficient to compute all continuous piecewise linear (CPWL) functions on $\mathbb{R}^n$. Hertrich, Basu, Di Summa, and Skutella (NeurIPS'21) conjectured that this result is optimal in the sense that there are CPWL functions on $\mathbb{R}^n$, like the maximum function, that require this depth. We disprove the conjecture and show that $\lceil\log_3(n-1)\rceil+1$ hidden layers are sufficient to compute all CPWL functions on $\mathbb{R}^n$. A key step in the proof is that ReLU neural networks with two hidden layers can exactly represent the maximum function of five inputs. More generally, we show that $\lceil\log_3(n-2)\rceil+1$ hidden layers are sufficient to compute the maximum of $n\geq 4$ numbers. Our constructions almost match the $\lceil\log_3(n)\rceil$ lower bound of Averkov, Hojny, and Merkert (ICLR'25) in the special case of ReLU networks with weights that are decimal fractions. The constructions have a geometric interpretation via polyhedral subdivisions of the simplex into ``easier'' polytopes.
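To make these depth counts concrete, the Python sketch below builds the standard pairwise-maximum ReLU network, which uses $\lceil\log_2 n\rceil$ hidden layers for the maximum of $n$ numbers, and evaluates the MAX$_n$ formulas quoted in the abstract. It is the classical tree construction, not the paper's simplex-subdivision construction, and the function names (`relu_max_tree`, `ceil_log`, `max_depth_bounds`) are hypothetical. The formulas are only evaluated for $n \geq 4$, the range the abstract states for the maximum result.

```python
def relu(z):
    return z if z > 0 else 0

def relu_max_tree(xs):
    """Maximum of xs via a ReLU network shaped as a balanced tree of pairwise maxima.

    Each round is one hidden layer. A pair (a, b) uses three ReLU neurons,
        max(a, b) = relu(a - b) + relu(b) - relu(-b),
    and an unpaired value v is carried through as relu(v) - relu(-v) = v.
    Every value passed on is a linear combination of the current layer's
    neurons, so the next round can read it directly.
    Returns (maximum, number_of_hidden_layers).
    """
    values = list(xs)
    layers = 0
    while len(values) > 1:
        nxt = [relu(values[i] - values[i + 1]) + relu(values[i + 1]) - relu(-values[i + 1])
               for i in range(0, len(values) - 1, 2)]
        if len(values) % 2 == 1:
            v = values[-1]
            nxt.append(relu(v) - relu(-v))
        values = nxt
        layers += 1
    return values[0], layers

def ceil_log(base, x):
    """Smallest integer k >= 0 with base**k >= x, in exact integer arithmetic (x >= 1)."""
    k, p = 0, 1
    while p < x:
        p *= base
        k += 1
    return k

def max_depth_bounds(n):
    """Hidden-layer counts for MAX_n (n >= 4) from the formulas in the abstract."""
    return {
        "binary_tree": ceil_log(2, n),        # the pairwise tree above
        "new_upper": ceil_log(3, n - 2) + 1,  # ceil(log3(n - 2)) + 1
        "decimal_lower": ceil_log(3, n),      # lower bound, decimal-fraction weights only
    }

m, depth = relu_max_tree([3, -7, 12, 0, 5, 12, -1, 8, 2, 9, 4])
print(m, depth)             # 12 4
print(max_depth_bounds(5))  # {'binary_tree': 3, 'new_upper': 2, 'decimal_lower': 2}
```

For $n = 5$ the pairwise tree needs three hidden layers, while the abstract's upper bound gives two, which is the five-input result it highlights.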

Submitted 20 May, 2025; originally announced May 2025.

Comments: 11 pages, 1 figure

arXiv:2505.06169

On the Depth of Monotone ReLU Neural Networks and ICNNs

Authors: Egor Bakaev, Florestan Brunck, Christoph Hertrich, Daniel Reichman, Amir Yehudayoff

Abstract: We study two models of ReLU neural networks: monotone networks (ReLU$^+$) and input convex neural networks (ICNN). Our focus is on expressivity, mostly in terms of depth, and we prove the following lower bounds. For the maximum function MAX$_n$ computing the maximum of $n$ real numbers, we show that ReLU$^+$ networks cannot compute MAX$_n$, or even approximate it. We prove a sharp $n$ lower bound on the ICNN depth complexity of MAX$_n$. We also prove depth separations between ReLU networks and ICNNs; for every $k$, there is a depth-2 ReLU network of size $O(k^2)$ that cannot be simulated by a depth-$k$ ICNN. The proofs are based on deep connections between neural networks and polyhedral geometry, and also use isoperimetric properties of triangulations.
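The ICNN constraint can be illustrated on MAX$_n$ itself. Assuming the usual ICNN definition (each hidden layer is relu(W_z z + W_x x + b), where the hidden-to-hidden weights W_z are entrywise nonnegative and the skip weights W_x from the raw input are unconstrained), the sketch below computes MAX$_n$ by folding in one input at a time, using $n-1$ width-1 hidden layers plus a linear output layer. This is a minimal illustration of depth growing linearly in $n$, in contrast with the logarithmic depth available to unrestricted ReLU networks; the abstract does not fix its layer-counting convention, and the helper names (`icnn_layer`, `skip_row`, `icnn_max`) are hypothetical.

```python
def relu(z):
    return z if z > 0 else 0.0

def icnn_layer(z, x, Wz, Wx, b):
    """One hidden layer, assuming the usual ICNN form
        z_next = relu(Wz z + Wx x + b),
    where Wz (weights on the previous hidden layer) must be entrywise
    nonnegative and Wx (skip weights on the raw input x) may have any sign."""
    if any(w < 0 for row in Wz for w in row):
        raise ValueError("ICNN hidden-to-hidden weights must be nonnegative")
    return [relu(b[i]
                 + sum(Wz[i][j] * z[j] for j in range(len(z)))
                 + sum(Wx[i][j] * x[j] for j in range(len(x))))
            for i in range(len(b))]

def skip_row(n, plus, minus):
    """Hypothetical helper: a 1 x n skip-weight matrix computing x[plus] - x[minus]."""
    row = [0.0] * n
    row[plus] += 1.0
    row[minus] -= 1.0
    return [row]

def icnn_max(x):
    """MAX_n with width-1 ICNN hidden layers; returns (value, hidden_layers), n >= 2.

    Invariant: after hidden layer k, x[k] + z[0] == max(x[0], ..., x[k]),
    because max(m, v) = v + relu(m - v) and m enters with weight +1."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two inputs")
    z = icnn_layer([], x, [[]], skip_row(n, 0, 1), [0.0])  # relu(x0 - x1)
    hidden = 1
    for k in range(1, n - 1):
        # relu(z + x[k] - x[k+1]) = relu(max(x[0..k]) - x[k+1])
        z = icnn_layer(z, x, [[1.0]], skip_row(n, k, k + 1), [0.0])
        hidden += 1
    # Linear output layer: weight 1 (nonnegative) on z, skip weight 1 on x[n-1].
    return x[n - 1] + z[0], hidden

print(icnn_max([3.0, -1.0, 7.0, 2.0, 5.0]))  # (7.0, 4)
```

Only the skip weights carry negative signs here; the chain of hidden units is combined with weight +1 throughout. Assuming ReLU$^+$ forbids negative weights everywhere, those signed skip weights would not be available in a monotone network.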

Submitted 9 May, 2025; originally announced May 2025.

Comments: 27 pages, 17 figures

arXiv:2312.13893

Linear Families of Triangles and Orthology

Authors: Egor Bakaev, Pavel Kozhevnikov

Abstract: Two triangles are called orthologic if the perpendiculars from the vertices of one of them to the sides of the other are concurrent. In this paper, we explore the concept of orthology from various points of view. Mostly we work in terms of elementary geometry in $\mathbb{R}^2$. In the final section, we relate the discussed concepts to $\mathbb{R}^4$. A key idea in this paper is that two orthologic triangles generate a one-parameter (linear) family in which any two triangles are orthologic. Working with such a family can provide a natural approach to certain questions.
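The orthology condition has a compact algebraic form. For $T = ABC$ and $T' = A'B'C'$, the perpendicular from $A$ to $B'C'$ is the line $P\cdot(C'-B') = A\cdot(C'-B')$, and similarly for $B$ and $C$. The three direction vectors sum to zero, so when $T'$ is non-degenerate the perpendiculars are concurrent exactly when $f(T,T') = A\cdot(C'-B') + B\cdot(A'-C') + C\cdot(B'-A') = 0$. The form $f$ is bilinear and antisymmetric, hence $f(T_s, T_t) = (t-s)\,f(T_0,T_1)$ for $T_t = (1-t)T_0 + tT_1$. The abstract does not say how its family is parametrized; the sketch below assumes this vertex-wise affine family and checks it with exact rational arithmetic on one hand-picked pair. All function names are hypothetical.

```python
from fractions import Fraction as F

def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])

def dot(p, q):
    return p[0] * q[0] + p[1] * q[1]

def orthology_form(T, U):
    """f(T, U) = A.(C'-B') + B.(A'-C') + C.(B'-A') for T = (A, B, C), U = (A', B', C')."""
    (A, B, C), (A2, B2, C2) = T, U
    return dot(A, sub(C2, B2)) + dot(B, sub(A2, C2)) + dot(C, sub(B2, A2))

def perpendicular_meet(T, U):
    """Intersection of the perpendicular from A to B'C' with the one from B to C'A'.
    Line i is P.u_i = c_i; the 2 x 2 system is solved by Cramer's rule."""
    (A, B, _), (A2, B2, C2) = T, U
    u1, u2 = sub(C2, B2), sub(A2, C2)
    c1, c2 = dot(A, u1), dot(B, u2)
    det = u1[0] * u2[1] - u1[1] * u2[0]
    if det == 0:
        raise ValueError("second triangle is degenerate")
    return ((c1 * u2[1] - u1[1] * c2) / det, (u1[0] * c2 - c1 * u2[0]) / det)

def is_orthologic(T, U):
    """True if the third perpendicular (from C to A'B') passes through the meet point."""
    P = perpendicular_meet(T, U)
    C, (A2, B2, _) = T[2], U
    return dot(sub(P, C), sub(B2, A2)) == 0

def affine_triangle(T0, T1, t):
    """Vertex-wise (1 - t) * T0 + t * T1 (assumed form of the linear family)."""
    return tuple(((1 - t) * p[0] + t * q[0], (1 - t) * p[1] + t * q[1])
                 for p, q in zip(T0, T1))

T0 = ((F(0), F(0)), (F(4), F(0)), (F(1), F(3)))
# C' is chosen so that f(T0, T1) = 0: here f = 14 - 4 * x_{C'}, so x_{C'} = 7/2.
T1 = ((F(0), F(1)), (F(2), F(5)), (F(7, 2), F(0)))

print(orthology_form(T0, T1))      # 0
print(perpendicular_meet(T0, T1))  # (Fraction(35, 8), Fraction(21, 16))
ts = [F(k, 3) for k in range(-3, 7)]
print(all(is_orthologic(affine_triangle(T0, T1, s), affine_triangle(T0, T1, t))
          for s in ts for t in ts))  # True
```

Exact fractions avoid false negatives from floating-point rounding in the final `== 0` test. For this particular pair, the triangles $T_t$ degenerate only at irrational $t$, so every sampled member of the family is a genuine triangle.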

Submitted 21 December, 2023; originally announced December 2023.

MSC Class: 51N20 (Primary); 51M04

Showing 1–3 of 3 results for author: Bakaev, E