Search | arXiv e-print repository

The Computational Complexity of Counting Linear Regions in ReLU Neural Networks

Authors: Moritz Stargalla, Christoph Hertrich, Daniel Reichman

Abstract: An established measure of the expressive power of a given ReLU neural network is the number of linear regions into which it partitions the input space. There exist many different, non-equivalent definitions of what a linear region actually is. We systematically assess which papers use which definitions and discuss how they relate to each other. We then analyze the computational complexity of counting the number of such regions for the various definitions. Generally, this turns out to be an intractable problem. We prove NP- and #P-hardness results already for networks with one hidden layer and strong hardness of approximation results for two or more hidden layers. Finally, on the algorithmic side, we demonstrate that counting linear regions can at least be achieved in polynomial space for some common definitions.

Submitted 22 May, 2025; originally announced May 2025.

Comments: 25 pages

arXiv:2505.14338

Better Neural Network Expressivity: Subdividing the Simplex

Authors: Egor Bakaev, Florestan Brunck, Christoph Hertrich, Jack Stade, Amir Yehudayoff

Abstract: This work studies the expressivity of ReLU neural networks with a focus on their depth. A sequence of previous works showed that $\lceil \log_2(n+1) \rceil$ hidden layers are sufficient to compute all continuous piecewise linear (CPWL) functions on $\mathbb{R}^n$. Hertrich, Basu, Di Summa, and Skutella (NeurIPS'21) conjectured that this result is optimal in the sense that there are CPWL functions on $\mathbb{R}^n$, like the maximum function, that require this depth. We disprove the conjecture and show that $\lceil\log_3(n-1)\rceil+1$ hidden layers are sufficient to compute all CPWL functions on $\mathbb{R}^n$. A key step in the proof is that ReLU neural networks with two hidden layers can exactly represent the maximum function of five inputs. More generally, we show that $\lceil\log_3(n-2)\rceil+1$ hidden layers are sufficient to compute the maximum of $n\geq 4$ numbers. Our constructions almost match the $\lceil\log_3(n)\rceil$ lower bound of Averkov, Hojny, and Merkert (ICLR'25) in the special case of ReLU networks with weights that are decimal fractions. The constructions have a geometric interpretation via polyhedral subdivisions of the simplex into ``easier'' polytopes.

Submitted 20 May, 2025; originally announced May 2025.

Comments: 11 pages, 1 figure

arXiv:2505.06169

On the Depth of Monotone ReLU Neural Networks and ICNNs

Authors: Egor Bakaev, Florestan Brunck, Christoph Hertrich, Daniel Reichman, Amir Yehudayoff

Abstract: We study two models of ReLU neural networks: monotone networks (ReLU$^+$) and input convex neural networks (ICNN). Our focus is on expressivity, mostly in terms of depth, and we prove the following lower bounds. For the maximum function MAX$_n$ computing the maximum of $n$ real numbers, we show that ReLU$^+$ networks cannot compute MAX$_n$, or even approximate it. We prove a sharp $n$ lower bound on the ICNN depth complexity of MAX$_n$. We also prove depth separations between ReLU networks and ICNNs; for every $k$, there is a depth-2 ReLU network of size $O(k^2)$ that cannot be simulated by a depth-$k$ ICNN. The proofs are based on deep connections between neural networks and polyhedral geometry, and also use isoperimetric properties of triangulations.

Submitted 9 May, 2025; originally announced May 2025.

Comments: 27 pages, 17 figures

arXiv:2502.09324

Depth-Bounds for Neural Networks via the Braid Arrangement

Authors: Moritz Grillo, Christoph Hertrich, Georg Loho

Abstract: We contribute towards resolving the open question of how many hidden layers are required in ReLU networks for exactly representing all continuous and piecewise linear functions on $\mathbb{R}^d$. While the question has been resolved in special cases, the best known lower bound in general is still 2. We focus on neural networks that are compatible with certain polyhedral complexes, more precisely with the braid fan. For such neural networks, we prove a non-constant lower bound of $\Omega(\log\log d)$ hidden layers required to exactly represent the maximum of $d$ numbers. Additionally, under our assumption, we provide a combinatorial proof that 3 hidden layers are necessary to compute the maximum of 5 numbers; this had only been verified with an excessive computation so far. Finally, we show that a natural generalization of the best known upper bound to maxout networks is not tight, by demonstrating that a rank-3 maxout layer followed by a rank-2 maxout layer is sufficient to represent the maximum of 7 numbers.

Submitted 13 February, 2025; originally announced February 2025.

arXiv:2411.03006

Neural Networks and (Virtual) Extended Formulations

Authors: Christoph Hertrich, Georg Loho

Abstract: Neural networks with piecewise linear activation functions, such as rectified linear units (ReLU) or maxout, are among the most fundamental models in modern machine learning. We make a step towards proving lower bounds on the size of such neural networks by linking their representative capabilities to the notion of the extension complexity $\mathrm{xc}(P)$ of a polytope $P$. This is a well-studied quantity in combinatorial optimization and polyhedral geometry describing the number of inequalities needed to model $P$ as a linear program. We show that $\mathrm{xc}(P)$ is a lower bound on the size of any monotone or input-convex neural network that solves the linear optimization problem over $P$. This implies exponential lower bounds on such neural networks for a variety of problems, including the polynomially solvable maximum weight matching problem. In an attempt to prove similar bounds also for general neural networks, we introduce the notion of virtual extension complexity $\mathrm{vxc}(P)$, which generalizes $\mathrm{xc}(P)$ and describes the number of inequalities needed to represent the linear optimization problem over $P$ as a difference of two linear programs. We prove that $\mathrm{vxc}(P)$ is a lower bound on the size of any neural network that optimizes over $P$. While it remains an open question to derive useful lower bounds on $\mathrm{vxc}(P)$, we argue that this quantity deserves to be studied independently from neural networks by proving that one can efficiently optimize over a polytope $P$ using a small virtual extended formulation.

Submitted 11 February, 2025; v1 submitted 5 November, 2024; originally announced November 2024.

arXiv:2410.04907

Decomposition Polyhedra of Piecewise Linear Functions

Authors: Marie-Charlotte Brandenburg, Moritz Grillo, Christoph Hertrich

Abstract: In this paper we contribute to the frequently studied question of how to decompose a continuous piecewise linear (CPWL) function into a difference of two convex CPWL functions. Every CPWL function has infinitely many such decompositions, but for applications in optimization and neural network theory, it is crucial to find decompositions with as few linear pieces as possible. This is a highly challenging problem, as we further demonstrate by disproving a recently proposed approach by Tran and Wang [Minimal representations of tropical rational functions. Algebraic Statistics, 15(1):27-59, 2024]. To make the problem more tractable, we propose to fix an underlying polyhedral complex determining the possible locus of nonlinearity. Under this assumption, we prove that the set of decompositions forms a polyhedron that arises as intersection of two translated cones. We prove that irreducible decompositions correspond to the bounded faces of this polyhedron and minimal solutions must be vertices. We then identify cases with a unique minimal decomposition, and illustrate how our insights have consequences in the theory of submodular functions. Finally, we improve upon previous constructions of neural networks for a given convex CPWL function and apply our framework to obtain results in the nonconvex case.

Submitted 7 October, 2024; originally announced October 2024.

arXiv:2311.01959

A First Order Method for Linear Programming Parameterized by Circuit Imbalance

Authors: Richard Cole, Christoph Hertrich, Yixin Tao, László A. Végh

Abstract: Various first order approaches have been proposed in the literature to solve Linear Programming (LP) problems, recently leading to practically efficient solvers for large-scale LPs. From a theoretical perspective, linear convergence rates have been established for first order LP algorithms, despite the fact that the underlying formulations are not strongly convex. However, the convergence rate typically depends on the Hoffman constant of a large matrix that contains the constraint matrix, as well as the right hand side, cost, and capacity vectors. We introduce a first order approach for LP optimization with a convergence rate depending polynomially on the circuit imbalance measure, which is a geometric parameter of the constraint matrix, and depending logarithmically on the right hand side, capacity, and cost vectors. This provides much stronger convergence guarantees. For example, if the constraint matrix is totally unimodular, we obtain polynomial-time algorithms, whereas the convergence guarantees for approaches based on primal-dual formulations may have arbitrarily slow convergence rates for this class. Our approach is based on a fast gradient method due to Necoara, Nesterov, and Glineur (Math. Prog. 2019); this algorithm is called repeatedly in a framework that gradually fixes variables to the boundary. This technique is based on a new approximate version of Tardos's method, that was used to obtain a strongly polynomial algorithm for combinatorial LPs (Oper. Res. 1986).

Submitted 27 March, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

arXiv:2305.11005

Mode Connectivity in Auction Design

Authors: Christoph Hertrich, Yixin Tao, László A. Végh

Abstract: Optimal auction design is a fundamental problem in algorithmic game theory. This problem is notoriously difficult already in very simple settings. Recent work in differentiable economics showed that neural networks can efficiently learn known optimal auction mechanisms and discover interesting new ones. In an attempt to theoretically justify their empirical success, we focus on one of the first such networks, RochetNet, and a generalized version for affine maximizer auctions. We prove that they satisfy mode connectivity, i.e., locally optimal solutions are connected by a simple, piecewise linear path such that every solution on the path is almost as good as one of the two local optima. Mode connectivity has been recently investigated as an intriguing empirical and theoretically justifiable property of neural networks used for prediction problems. Our results give the first such analysis in the context of differentiable economics, where neural networks are used directly for solving non-convex optimization problems.

Submitted 17 July, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: Conference paper published at NeurIPS 2023

arXiv:2303.17045

Training Neural Networks is NP-Hard in Fixed Dimension

Authors: Vincent Froese, Christoph Hertrich

Abstract: We study the parameterized complexity of training two-layer neural networks with respect to the dimension of the input data and the number of hidden neurons, considering ReLU and linear threshold activation functions. Albeit the computational complexity of these problems has been studied numerous times in recent years, several questions are still open. We answer questions by Arora et al. [ICLR '18] and Khalife and Basu [IPCO '22] showing that both problems are NP-hard for two dimensions, which excludes any polynomial-time algorithm for constant dimension. We also answer a question by Froese et al. [JAIR '22] proving W[1]-hardness for four ReLUs (or two linear threshold neurons) with zero training error. Finally, in the ReLU case, we show fixed-parameter tractability for the combined parameter number of dimensions and number of ReLUs if the network is assumed to compute a convex map. Our results settle the complexity status regarding these parameters almost completely.

Submitted 18 January, 2024; v1 submitted 29 March, 2023; originally announced March 2023.

Comments: Paper accepted at NeurIPS 2023

arXiv:2302.12553

Lower Bounds on the Depth of Integral ReLU Neural Networks via Lattice Polytopes

Authors: Christian Haase, Christoph Hertrich, Georg Loho

Abstract: We prove that the set of functions representable by ReLU neural networks with integer weights strictly increases with the network depth while allowing arbitrary width. More precisely, we show that $\lceil\log_2(n)\rceil$ hidden layers are indeed necessary to compute the maximum of $n$ numbers, matching known upper bounds. Our results are based on the known duality between neural networks and Newton polytopes via tropical geometry. The integrality assumption implies that these Newton polytopes are lattice polytopes. Then, our depth lower bounds follow from a parity argument on the normalized volume of faces of such polytopes.

Submitted 24 February, 2023; originally announced February 2023.

Comments: ICLR 2023 conference paper

arXiv:2204.01368

Training Fully Connected Neural Networks is $\exists\mathbb{R}$-Complete

Authors: Daniel Bertschinger, Christoph Hertrich, Paul Jungeblut, Tillmann Miltzow, Simon Weber

Abstract: We consider the problem of finding weights and biases for a two-layer fully connected neural network to fit a given set of data points as well as possible, also known as EmpiricalRiskMinimization. Our main result is that the associated decision problem is $\exists\mathbb{R}$-complete, that is, polynomial-time equivalent to determining whether a multivariate polynomial with integer coefficients has any real roots. Furthermore, we prove that algebraic numbers of arbitrarily large degree are required as weights to be able to train some instances to optimality, even if all data points are rational. Our result already applies to fully connected instances with two inputs, two outputs, and one hidden layer of ReLU neurons. Thereby, we strengthen a result by Abrahamsen, Kleist and Miltzow [NeurIPS 2021]. A consequence of this is that a combinatorial search algorithm like the one by Arora, Basu, Mianjy and Mukherjee [ICLR 2018] is impossible for networks with more than one output dimension, unless $\mathsf{NP}=\exists\mathbb{R}$.

Submitted 22 March, 2024; v1 submitted 4 April, 2022; originally announced April 2022.

Comments: 39 pages, 17 figures. Changes in version 2: Added algebraic universality result, improved interpretation of results. Changes in version 3: Improved exposition by formalizing properties of gadgets

arXiv:2105.14835

doi 10.1137/22M1489332

Towards Lower Bounds on the Depth of ReLU Neural Networks

Authors: Christoph Hertrich, Amitabh Basu, Marco Di Summa, Martin Skutella

Abstract: We contribute to a better understanding of the class of functions that can be represented by a neural network with ReLU activations and a given architecture. Using techniques from mixed-integer optimization, polyhedral theory, and tropical geometry, we provide a mathematical counterbalance to the universal approximation theorems which suggest that a single hidden layer is sufficient for learning any function. In particular, we investigate whether the class of exactly representable functions strictly increases by adding more layers (with no restrictions on size). As a by-product of our investigations, we settle an old conjecture about piecewise linear functions by Wang and Sun (2005) in the affirmative. We also present upper bounds on the sizes of neural networks required to represent functions with logarithmic depth.

Submitted 17 July, 2024; v1 submitted 31 May, 2021; originally announced May 2021.

Comments: Authors' accepted manuscript for SIAM Journal on Discrete Mathematics. A preliminary conference version appeared at NeurIPS 2021

Journal ref: SIAM Journal on Discrete Mathematics 2023 37:2, 997-1029

arXiv:2105.08675

doi 10.1613/jair.1.13547

The Computational Complexity of ReLU Network Training Parameterized by Data Dimensionality

Authors: Vincent Froese, Christoph Hertrich, Rolf Niedermeier

Abstract: Understanding the computational complexity of training simple neural networks with rectified linear units (ReLUs) has recently been a subject of intensive research. Closing gaps and complementing results from the literature, we present several results on the parameterized complexity of training two-layer ReLU networks with respect to various loss functions. After a brief discussion of other parameters, we focus on analyzing the influence of the dimension $d$ of the training data on the computational complexity. We provide running time lower bounds in terms of W[1]-hardness for parameter $d$ and prove that known brute-force strategies are essentially optimal (assuming the Exponential Time Hypothesis). In comparison with previous work, our results hold for a broad(er) range of loss functions, including $\ell^p$-loss for all $p\in[0,\infty]$. In particular, we extend a known polynomial-time algorithm for constant $d$ and convex loss functions to a more general class of loss functions, matching our running time lower bounds also in these cases.

Submitted 23 August, 2022; v1 submitted 18 May, 2021; originally announced May 2021.

Journal ref: Journal of Artificial Intelligence Research 74 (2022): 1775-1790

arXiv:2102.06635

doi 10.1007/s10107-024-02096-x

ReLU Neural Networks of Polynomial Size for Exact Maximum Flow Computation

Authors: Christoph Hertrich, Leon Sering

Abstract: This paper studies the expressive power of artificial neural networks with rectified linear units. In order to study them as a model of real-valued computation, we introduce the concept of Max-Affine Arithmetic Programs and show equivalence between them and neural networks concerning natural complexity measures. We then use this result to show that two fundamental combinatorial optimization problems can be solved with polynomial-size neural networks. First, we show that for any undirected graph with $n$ nodes, there is a neural network (with fixed weights and biases) of size $\mathcal{O}(n^3)$ that takes the edge weights as input and computes the value of a minimum spanning tree of the graph. Second, we show that for any directed graph with $n$ nodes and $m$ arcs, there is a neural network of size $\mathcal{O}(m^2n^2)$ that takes the arc capacities as input and computes a maximum flow. Our results imply that these two problems can be solved with strongly polynomial time algorithms that solely use affine transformations and maxima computations, but no comparison-based branchings.

Submitted 17 July, 2024; v1 submitted 12 February, 2021; originally announced February 2021.

Comments: Authors' accepted manuscript for Mathematical Programming (2024). A short version appeared in the proceedings of IPCO 2023

arXiv:2008.09692

doi 10.37236/9808

Coloring Drawings of Graphs

Authors: Christoph Hertrich, Felix Schröder, Raphael Steiner

Abstract: We consider cell colorings of drawings of graphs in the plane. Given a multi-graph $G$ together with a drawing $\Gamma(G)$ in the plane with only finitely many crossings, we define a cell $k$-coloring of $\Gamma(G)$ to be a coloring of the maximal connected regions of the drawing, the cells, with $k$ colors such that adjacent cells have different colors. By the $4$-color theorem, every drawing of a bridgeless graph has a cell $4$-coloring. A drawing of a graph is cell $2$-colorable if and only if the underlying graph is Eulerian. We show that every graph without degree 1 vertices admits a cell $3$-colorable drawing. This leads to the natural question which abstract graphs have the property that each of their drawings has a cell $3$-coloring. We say that such a graph is universally cell $3$-colorable. We show that every $4$-edge-connected graph and every graph admitting a nowhere-zero $3$-flow is universally cell $3$-colorable. We also discuss circumstances under which universal cell $3$-colorability guarantees the existence of a nowhere-zero $3$-flow. On the negative side, we present an infinite family of universally cell $3$-colorable graphs without a nowhere-zero $3$-flow. On the positive side, we formulate a conjecture which has a surprising relation to a famous open problem by Tutte known as the $3$-flow-conjecture. We prove our conjecture for subcubic and for $K_{3,3}$-minor-free graphs.

Submitted 26 August, 2022; v1 submitted 21 August, 2020; originally announced August 2020.

Comments: 35 pages, 23 figures

MSC Class: 05C10 05C15 (Primary) 05C45 (Secondary)

ACM Class: F.2.2; G.2.2

Journal ref: The Electronic Journal of Combinatorics 29(1) (2022), #P1.17

arXiv:2006.09872

doi 10.1007/s10951-020-00667-2

Scheduling a Proportionate Flow Shop of Batching Machines

Authors: Christoph Hertrich, Christian Weiß, Heiner Ackermann, Sandy Heydrich, Sven O. Krumke

Abstract: In this paper we study a proportionate flow shop of batching machines with release dates and a fixed number $m \geq 2$ of machines. The scheduling problem has so far barely received any attention in the literature, but recently its importance has increased significantly, due to applications in the industrial scaling of modern bio-medicine production processes. We show that for any fixed number of machines, the makespan and the sum of completion times can be minimized in polynomial time. Furthermore, we show that the obtained algorithm can also be used to minimize the weighted total completion time, maximum lateness, total tardiness and (weighted) number of late jobs in polynomial time if all release dates are $0$. Previously, polynomial time algorithms have only been known for two machines.

Submitted 26 November, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

Comments: Version 2: replace initial preprint with authors' accepted manuscript

Journal ref: Journal of Scheduling 23, 575-593 (2020)

arXiv:2005.14105

doi 10.1287/ijoc.2021.0225

Provably Good Solutions to the Knapsack Problem via Neural Networks of Bounded Size

Authors: Christoph Hertrich, Martin Skutella

Abstract: The development of a satisfying and rigorous mathematical understanding of the performance of neural networks is a major challenge in artificial intelligence. Against this background, we study the expressive power of neural networks through the example of the classical NP-hard Knapsack Problem. Our main contribution is a class of recurrent neural networks (RNNs) with rectified linear units that are iteratively applied to each item of a Knapsack instance and thereby compute optimal or provably good solution values. We show that an RNN of depth four and width depending quadratically on the profit of an optimum Knapsack solution is sufficient to find optimum Knapsack solutions. We also prove the following tradeoff between the size of an RNN and the quality of the computed Knapsack solution: for Knapsack instances consisting of $n$ items, an RNN of depth five and width $w$ computes a solution of value at least $1-\mathcal{O}(n^2/\sqrt{w})$ times the optimum solution value. Our results build upon a classical dynamic programming formulation of the Knapsack Problem as well as a careful rounding of profit values that are also at the core of the well-known fully polynomial-time approximation scheme for the Knapsack Problem. A carefully conducted computational study qualitatively supports our theoretical size bounds. Finally, we point out that our results can be generalized to many other combinatorial optimization problems that admit dynamic programming solution methods, such as various Shortest Path Problems, the Longest Common Subsequence Problem, and the Traveling Salesperson Problem.

Submitted 11 July, 2024; v1 submitted 28 May, 2020; originally announced May 2020.

Comments: Authors' accepted manuscript for the INFORMS Journal on Computing. A short version of this paper appeared in the proceedings of AAAI 2021

Journal ref: INFORMS Journal on Computing 35(5):1079-1097 (2023)

arXiv:2005.03552

doi 10.1007/s10951-022-00732-y

Online Algorithms to Schedule a Proportionate Flexible Flow Shop of Batching Machines

Authors: Christoph Hertrich, Christian Weiß, Heiner Ackermann, Sandy Heydrich, Sven O. Krumke

Abstract: This paper is the first to consider online algorithms to schedule a proportionate flexible flow shop of batching machines (PFFB). The scheduling model is motivated by manufacturing processes of individualized medicaments, which are used in modern medicine to treat some serious illnesses. We provide two different online algorithms, proving also lower bounds for the offline problem to compute their competitive ratios. The first algorithm is an easy-to-implement, general local scheduling heuristic. It is 2-competitive for PFFBs with an arbitrary number of stages and for several natural scheduling objectives. We also show that for total/average flow time, no deterministic algorithm with better competitive ratio exists. For the special case with two stages and the makespan or total completion time objective, we describe an improved algorithm that achieves the best possible competitive ratio $\varphi=\frac{1+\sqrt{5}}{2}$, the golden ratio. All our results also hold for proportionate (non-flexible) flow shops of batching machines (PFB) for which this is also the first paper to study online algorithms.

Submitted 17 July, 2024; v1 submitted 7 May, 2020; originally announced May 2020.

Comments: Authors' accepted manuscript

Journal ref: Journal of Scheduling 25, 643-657 (2022)

arXiv:1901.02771

doi 10.1007/978-3-030-18500-8_17

Sweep Algorithms for the Capacitated Vehicle Routing Problem with Structured Time Windows

Authors: Christoph Hertrich, Philipp Hungerländer, Christian Truden

Abstract: The capacitated Vehicle Routing Problem with structured Time Windows (cVRPsTW) is concerned with finding optimal tours for vehicles with given capacity constraints to deliver goods to customers within assigned time windows. In our problem variant these time windows have a special structure, namely they are non-overlapping and each time window holds several customers. This is a reasonable assumption for Attended Home Delivery services. Sweep algorithms are known as simple, yet effective heuristics for the classical capacitated Vehicle Routing Problem. We propose variants of the sweep algorithm that are not only able to deal with time windows, but also exploit the additional structure of the time windows in a cVRPsTW. Afterwards we suggest local improvement heuristics to decrease our objective function even further. A carefully constructed benchmark set that resembles real-world data is used to prove the efficacy of our algorithms in a computational study.

Submitted 9 January, 2019; originally announced January 2019.

Showing 1–19 of 19 results for author: Hertrich, C