-
Optimization of Next-Day Delivery Coverage using Constraint Programming and Random Key Optimizers
Authors:
Kyle Brubaker,
Kyle E. C. Booth,
Martin J. A. Schuetz,
Philipp Loick,
Jian Shen,
Arun Ramamurthy,
Georgios Paschos
Abstract:
We consider the logistics network of an e-commerce retailer, specifically the so-called "middle mile" network, that routes inventory from supply warehouses to distribution stations to be ingested into the terminal ("last mile") delivery network. The speed of packages through this middle mile network is a key determinant for the ultimate delivery speed to the end user. An important target for a retailer is to maximize the fraction of user orders that can be serviced within one day, i.e., next-day delivery. As such, we formulate the maximization of expected next-day delivery coverage within the middle-mile network as an optimization problem, involving a set of temporal and capacity-based constraints on the network and requiring the use of a black-box model to evaluate the objective function. We design both exact constraint programming (CP) and heuristic random-key optimizer (RKO) approaches, the former of which uses a proxy objective function. We perform experiments on large-scale, real-world problem instances and show that both approaches have merit, in that they can match or outperform the baseline solution, a bespoke greedy solver with integrated local search, in expected next-day delivery coverage. Our experiments focus on two high-level problem definitions, starting with a base problem and then adding more complexity, and also explore the generalization of the solvers across a range of problem instance sizes. We find that a hybrid model using RKO and a bespoke local search protocol performs best on the full problem definition with respect to expected next-day delivery (increase of +50 basis points [bps] over baseline) but can take days to run, whereas the hybrid model using CP and local search is slightly less competitive (+20 bps) but takes only hours to run.
Submitted 25 April, 2025;
originally announced April 2025.
-
Scalable iterative pruning of large language and vision models using block coordinate descent
Authors:
Gili Rosenberg,
J. Kyle Brubaker,
Martin J. A. Schuetz,
Elton Yechao Zhu,
Serdar Kadıoğlu,
Sima E. Borujeni,
Helmut G. Katzgraber
Abstract:
Pruning neural networks, which involves removing a fraction of their weights, can often maintain high accuracy while significantly reducing model complexity, at least up to a certain limit. We present a neural network pruning technique that builds upon the Combinatorial Brain Surgeon, but solves an optimization problem over a subset of the network weights in an iterative, block-wise manner using block coordinate descent. The iterative, block-based nature of this pruning technique, which we dub "iterative Combinatorial Brain Surgeon" (iCBS), allows for scalability to very large models, including large language models (LLMs), that may not be feasible with a one-shot combinatorial optimization approach. When applied to large models like Mistral and DeiT, iCBS achieves higher performance metrics at the same density levels compared to existing pruning methods such as Wanda. This demonstrates the effectiveness of this iterative, block-wise pruning method in compressing and optimizing the performance of large deep learning models, even while optimizing over only a small fraction of the weights. Moreover, our approach allows for a quality-time (or cost) tradeoff that is not available when using a one-shot pruning technique alone. The block-wise formulation of the optimization problem enables the use of hardware accelerators, potentially offsetting the increased computational costs compared to one-shot pruning methods like Wanda. In particular, the optimization problem solved for each block is quantum-amenable in that it could, in principle, be solved by a quantum computer.
Submitted 26 November, 2024;
originally announced November 2024.
-
A Random-Key Optimizer for Combinatorial Optimization
Authors:
Antonio A. Chaves,
Mauricio G. C. Resende,
Martin J. A. Schuetz,
J. Kyle Brubaker,
Helmut G. Katzgraber,
Edilson F. de Arruda,
Ricardo M. A. Silva
Abstract:
This paper presents the Random-Key Optimizer (RKO), a versatile and efficient stochastic local search method tailored for combinatorial optimization problems. Using the random-key concept, RKO encodes solutions as vectors of random keys that are subsequently decoded into feasible solutions via problem-specific decoders. The RKO framework is able to combine a plethora of classic metaheuristics, each capable of operating independently or in parallel, with solution sharing facilitated through an elite solution pool. This modular approach allows for the adaptation of various metaheuristics, including simulated annealing, iterated local search, and greedy randomized adaptive search procedures, among others. The efficacy of the RKO framework, implemented in C++, is demonstrated through its application to three NP-hard combinatorial optimization problems: the alpha-neighborhood p-median problem, the tree of hubs location problem, and the node-capacitated graph partitioning problem. The results highlight the framework's ability to produce high-quality solutions across diverse problem domains, underscoring its potential as a robust tool for combinatorial optimization.
Submitted 15 November, 2024; v1 submitted 6 November, 2024;
originally announced November 2024.
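The random-key encoding described in the abstract above can be illustrated with a minimal Python sketch. This is not the RKO framework itself (which the abstract says is implemented in C++) and it does not address any of the three problems studied in the paper. The toy tour-length objective, the permutation decoder, and the function names `decode_permutation`, `tour_length`, and `random_key_search` are all assumptions made for illustration. The only idea taken from the abstract is that a vector of keys in [0, 1) is mapped to a feasible solution by a problem-specific decoder, so the search can operate on the keys alone.

```python
import random


def decode_permutation(keys):
    """Decoder (illustrative): sort indices by key value to get a permutation."""
    return sorted(range(len(keys)), key=lambda i: keys[i])


def tour_length(perm, dist):
    """Toy objective (assumed): length of the closed tour visiting perm in order."""
    n = len(perm)
    return sum(dist[perm[i]][perm[(i + 1) % n]] for i in range(n))


def random_key_search(dist, iterations=1000, seed=0):
    """Very simple stochastic local search in random-key space.

    Each step resamples one key, decodes the candidate, and keeps it if the
    decoded solution is no worse. Every key vector decodes to a valid
    permutation, so feasibility never has to be checked separately.
    """
    rng = random.Random(seed)
    n = len(dist)
    best_keys = [rng.random() for _ in range(n)]
    best_cost = tour_length(decode_permutation(best_keys), dist)
    for _ in range(iterations):
        candidate = list(best_keys)
        candidate[rng.randrange(n)] = rng.random()  # perturb a single key
        cost = tour_length(decode_permutation(candidate), dist)
        if cost <= best_cost:
            best_keys, best_cost = candidate, cost
    return decode_permutation(best_keys), best_cost
```

For example, `decode_permutation([0.7, 0.1, 0.4])` yields `[1, 2, 0]`. Swapping in a different decoder and objective changes the problem being solved without touching the search loop, which is the separation the abstract attributes to the random-key concept.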
-
SimLOD: Simultaneous LOD Generation and Rendering
Authors:
Markus Schütz,
Lukas Herzberger,
Michael Wimmer
Abstract:
About: We propose an incremental LOD generation approach for point clouds that allows us to simultaneously load points from disk, update an octree-based level-of-detail representation, and render the intermediate results in real time while additional points are still being loaded from disk. LOD construction and rendering are both implemented in CUDA and share the GPU's processing power, but each incremental update is lightweight enough to leave enough time to maintain real-time frame rates.
Background: LOD construction is typically implemented as a preprocessing step that requires users to wait before they are able to view the results in real time. This approach allows users to view intermediate results right away.
Results: Our approach is able to stream points from an SSD and update the octree on the GPU at rates of up to 580 million points per second (~9.3GB/s from a PCIe 5.0 SSD) on an RTX 4090. Depending on the data set, our approach spends an average of about 1 to 2 ms to incrementally insert 1 million points into the octree, allowing us to insert several million points per frame into the LOD structure and render the intermediate results within the same frame.
Discussion/Limitations: We aim to provide near-instant, real-time visualization of large data sets without preprocessing. Out-of-core processing of arbitrarily large data sets and color-filtering for higher-quality LODs are subject to future work.
Submitted 5 October, 2023;
originally announced October 2023.
-
Explainable AI using expressive Boolean formulas
Authors:
Gili Rosenberg,
J. Kyle Brubaker,
Martin J. A. Schuetz,
Grant Salton,
Zhihuai Zhu,
Elton Yechao Zhu,
Serdar Kadıoğlu,
Sima E. Borujeni,
Helmut G. Katzgraber
Abstract:
We propose and implement an interpretable machine learning classification model for Explainable AI (XAI) based on expressive Boolean formulas. Potential applications include credit scoring and diagnosis of medical conditions. The Boolean formula defines a rule with tunable complexity (or interpretability), according to which input data are classified. Such a formula can include any operator that can be applied to one or more Boolean variables, thus providing higher expressivity compared to more rigid rule-based and tree-based approaches. The classifier is trained using native local optimization techniques, efficiently searching the space of feasible formulas. Shallow rules can be determined by fast Integer Linear Programming (ILP) or Quadratic Unconstrained Binary Optimization (QUBO) solvers, potentially powered by special purpose hardware or quantum devices. We combine the expressivity and efficiency of the native local optimizer with the fast operation of these devices by executing non-local moves that optimize over subtrees of the full Boolean formula. We provide extensive numerical benchmarking results featuring several baselines on well-known public datasets. Based on the results, we find that the native local rule classifier is generally competitive with the other classifiers. The addition of non-local moves achieves similar results with fewer iterations, and therefore using specialized or quantum hardware could lead to a speedup by fast proposal of non-local moves.
Submitted 6 June, 2023;
originally announced June 2023.
-
Reply to: Inability of a graph neural network heuristic to outperform greedy algorithms in solving combinatorial optimization problems
Authors:
Martin J. A. Schuetz,
J. Kyle Brubaker,
Helmut G. Katzgraber
Abstract:
We provide a comprehensive reply to the comment written by Stefan Boettcher [arXiv:2210.00623] and argue that the comment singles out one particular non-representative example problem, entirely focusing on the maximum cut problem (MaxCut) on sparse graphs, for which greedy algorithms are expected to perform well. Conversely, we highlight the broader algorithmic development underlying our original work, and (within our original framework) provide additional numerical results showing sizable improvements over our original data, thereby refuting the comment's original performance statements. Furthermore, it has already been shown that physics-inspired graph neural networks (PI-GNNs) can outperform greedy algorithms, in particular on hard, dense instances. We also argue that the internal (parallel) anatomy of graph neural networks is very different from the (sequential) nature of greedy algorithms, and (based on their usage at the scale of real-world social networks) point out that graph neural networks have demonstrated their potential for superior scalability compared to existing heuristics such as extremal optimization. Finally, we conclude highlighting the conceptual novelty of our work and outline some potential extensions.
Submitted 3 February, 2023;
originally announced March 2023.
-
GPU-Accelerated LOD Generation for Point Clouds
Authors:
Markus Schütz,
Bernhard Kerbl,
Philip Klaus,
Michael Wimmer
Abstract:
About: We introduce a GPU-accelerated LOD construction process that creates a hybrid voxel-point-based variation of the widely used layered point cloud (LPC) structure for LOD rendering and streaming. The massive performance improvements provided by the GPU allow us to improve the quality of lower LODs via color filtering while still increasing construction speed compared to the non-filtered, CPU-based state of the art.
Background: LOD structures are required to render hundreds of millions to trillions of points, but constructing them takes time.
Results: LOD structures suitable for rendering and streaming are constructed at rates of about 1 billion points per second (with color filtering) to 4 billion points per second (sample-picking/random sampling, state of the art) on an RTX 3090 -- an improvement of a factor of 80 to 400 times over the CPU-based state of the art (12 million points per second). Due to being in-core, model sizes are limited to about 500 million points per 24GB memory.
Discussion: Our method currently focuses on maximizing in-core construction speed on the GPU. Issues such as out-of-core construction of arbitrarily large data sets are not addressed, but we expect it to be suitable as a component of bottom-up out-of-core LOD construction schemes.
Submitted 28 February, 2023;
originally announced February 2023.
-
Reply to: Modern graph neural networks do worse than classical greedy algorithms in solving combinatorial optimization problems like maximum independent set
Authors:
Martin J. A. Schuetz,
J. Kyle Brubaker,
Helmut G. Katzgraber
Abstract:
We provide a comprehensive reply to the comment written by Chiara Angelini and Federico Ricci-Tersenghi [arXiv:2206.13211] and argue that the comment singles out one particular non-representative example problem, entirely focusing on the maximum independent set (MIS) on sparse graphs, for which greedy algorithms are expected to perform well. Conversely, we highlight the broader algorithmic development underlying our original work, and (within our original framework) provide additional numerical results showing sizable improvements over our original results, thereby refuting the comment's performance statements. We also provide results showing run-time scaling superior to the results provided by Angelini and Ricci-Tersenghi. Furthermore, we show that the proposed set of random d-regular graphs does not provide a universal set of benchmark instances, nor do greedy heuristics provide a universal algorithmic baseline. Finally, we argue that the internal (parallel) anatomy of graph neural networks is very different from the (sequential) nature of greedy algorithms and emphasize that graph neural networks have demonstrated their potential for superior scalability compared to existing heuristics such as parallel tempering. We conclude by discussing the conceptual novelty of our work and outline some potential extensions.
Submitted 3 February, 2023;
originally announced February 2023.
-
Optimization of Robot Trajectory Planning with Nature-Inspired and Hybrid Quantum Algorithms
Authors:
Martin J. A. Schuetz,
J. Kyle Brubaker,
Henry Montagu,
Yannick van Dijk,
Johannes Klepsch,
Philipp Ross,
Andre Luckow,
Mauricio G. C. Resende,
Helmut G. Katzgraber
Abstract:
We solve robot trajectory planning problems at industry-relevant scales. Our end-to-end solution integrates highly versatile random-key algorithms with model stacking and ensemble techniques, as well as path relinking for solution refinement. The core optimization module consists of a biased random-key genetic algorithm. Through a distinct separation of problem-independent and problem-dependent modules, we achieve an efficient problem representation, with a native encoding of constraints. We show that generalizations to alternative algorithmic paradigms such as simulated annealing are straightforward. We provide numerical benchmark results for industry-scale data sets. Our approach is found to consistently outperform greedy baseline results. To assess the capabilities of today's quantum hardware, we complement the classical approach with results obtained on quantum annealing hardware, using qbsolv on Amazon Braket. Finally, we show how the latter can be integrated into our larger pipeline, providing a quantum-ready hybrid solution to the problem.
Submitted 7 June, 2022;
originally announced June 2022.
-
Software Rasterization of 2 Billion Points in Real Time
Authors:
Markus Schütz,
Bernhard Kerbl,
Michael Wimmer
Abstract:
We propose a software rasterization pipeline for point clouds that is capable of brute-force rendering up to two billion points in real time (60fps). Improvements over the state of the art are achieved by batching points in a way that a number of batch-level optimizations can be computed before rasterizing the points within the same rendering pass. These optimizations include frustum culling, level-of-detail rendering, and choosing the appropriate coordinate precision for a given batch of points directly within a compute workgroup. Adaptive coordinate precision, in conjunction with visibility buffers, reduces the number of loaded bytes for the majority of points down to 4, thus making our approach several times faster than the bandwidth-limited state of the art. Furthermore, support for LOD rendering makes our software-rasterization approach suitable for rendering arbitrarily large point clouds and for meeting the increased performance demands of virtual reality rendering.
Submitted 4 April, 2022;
originally announced April 2022.
-
Graph Coloring with Physics-Inspired Graph Neural Networks
Authors:
Martin J. A. Schuetz,
J. Kyle Brubaker,
Zhihuai Zhu,
Helmut G. Katzgraber
Abstract:
We show how graph neural networks can be used to solve the canonical graph coloring problem. We frame graph coloring as a multi-class node classification problem and utilize an unsupervised training strategy based on the statistical physics Potts model. Generalizations to other multi-class problems such as community detection, data clustering, and the minimum clique cover problem are straightforward. We provide numerical benchmark results and illustrate our approach with an end-to-end application for a real-world scheduling use case within a comprehensive encode-process-decode framework. Our optimization approach performs on par with or outperforms existing solvers, with the ability to scale to problems with millions of variables.
Submitted 23 November, 2022; v1 submitted 3 February, 2022;
originally announced February 2022.
-
Combinatorial Optimization with Physics-Inspired Graph Neural Networks
Authors:
Martin J. A. Schuetz,
J. Kyle Brubaker,
Helmut G. Katzgraber
Abstract:
Combinatorial optimization problems are pervasive across science and industry. Modern deep learning tools are poised to solve these problems at unprecedented scales, but a unifying framework that incorporates insights from statistical physics is still outstanding. Here we demonstrate how graph neural networks can be used to solve combinatorial optimization problems. Our approach is broadly applicable to canonical NP-hard problems in the form of quadratic unconstrained binary optimization problems, such as maximum cut, minimum vertex cover, maximum independent set, as well as Ising spin glasses and higher-order generalizations thereof in the form of polynomial unconstrained binary optimization problems. We apply a relaxation strategy to the problem Hamiltonian to generate a differentiable loss function with which we train the graph neural network and apply a simple projection to integer variables once the unsupervised training process has completed. We showcase our approach with numerical results for the canonical maximum cut and maximum independent set problems. We find that the graph neural network optimizer performs on par with or outperforms existing solvers, with the ability to scale beyond the state of the art to problems with millions of variables.
Submitted 22 April, 2022; v1 submitted 2 July, 2021;
originally announced July 2021.
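The relaxation-then-projection idea in the abstract above can be sketched for maximum cut in a few lines of Python. The graph neural network is deliberately omitted here: the soft assignments `p` stand in for whatever per-node probabilities a trained network would output. The specific loss form, a common QUBO formulation of MaxCut, and the names `maxcut_relaxed_loss`, `project`, and `cut_size` are assumptions for illustration and may differ in detail from the paper's implementation.

```python
def maxcut_relaxed_loss(p, edges):
    """Relaxed MaxCut objective on soft assignments p[i] in [0, 1].

    Assumed QUBO form: sum over edges of (2*p_i*p_j - p_i - p_j).
    For binary p this equals minus the number of cut edges, so
    minimizing the loss maximizes the cut.
    """
    return sum(2.0 * p[i] * p[j] - p[i] - p[j] for i, j in edges)


def project(p, threshold=0.5):
    """Simple projection of soft assignments back to binary variables."""
    return [1 if value >= threshold else 0 for value in p]


def cut_size(x, edges):
    """Number of edges whose endpoints lie on different sides of the cut."""
    return sum(1 for i, j in edges if x[i] != x[j])


# Usage on a 4-cycle with hypothetical soft outputs.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
soft = [0.9, 0.1, 0.8, 0.2]
binary = project(soft)
```

Because the loss is a polynomial in `p`, it is differentiable, which is what allows it to serve as an unsupervised training signal. On the 4-cycle above, the projected assignment alternates sides and cuts all four edges.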
-
Deep Learning Frameworks Applied For Audio-Visual Scene Classification
Authors:
Lam Pham,
Alexander Schindler,
Mina Schütz,
Jasmin Lampert,
Sven Schlarb,
Ross King
Abstract:
In this paper, we present deep learning frameworks for audio-visual scene classification (SC) and indicate how individual visual and audio features as well as their combination affect SC performance. Our extensive experiments, which are conducted on the DCASE (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1B development dataset, achieve best classification accuracies of 82.2%, 91.1%, and 93.9% with audio input only, visual input only, and combined audio-visual input, respectively. The highest classification accuracy of 93.9%, obtained from an ensemble of audio-based and visual-based frameworks, shows an improvement of 16.5% compared with the DCASE baseline.
Submitted 12 June, 2021;
originally announced June 2021.
-
Automatic Sexism Detection with Multilingual Transformer Models
Authors:
Mina Schütz,
Jaqueline Boeck,
Daria Liakhovets,
Djordje Slijepčević,
Armin Kirchknopf,
Manuel Hecht,
Johannes Bogensperger,
Sven Schlarb,
Alexander Schindler,
Matthias Zeppelzauer
Abstract:
Sexism has become an increasingly major problem on social networks in recent years. The first shared task on sEXism Identification in Social neTworks (EXIST) at IberLEF 2021 is an international competition in the field of Natural Language Processing (NLP) that aims to automatically identify sexism in social media content by applying machine learning methods. Sexism detection is formulated as a coarse (binary) classification problem and as a fine-grained classification task that distinguishes multiple types of sexist content (e.g., dominance, stereotyping, and objectification). This paper presents the contribution of the AIT_FHSTP team at the EXIST2021 benchmark for both tasks. To solve the tasks, we applied two multilingual transformer models, one based on multilingual BERT and one based on XLM-R. Our approach uses two different strategies to adapt the transformers to the detection of sexist content: first, unsupervised pre-training with additional data, and second, supervised fine-tuning with additional and augmented data. For both tasks, our best model is XLM-R with unsupervised pre-training on the EXIST data and additional datasets and fine-tuning on the provided dataset. The best run for the binary classification (task 1) achieves a macro F1-score of 0.7752 and ranks 5th in the benchmark; for the multiclass classification (task 2), our best submission ranks 6th with a macro F1-score of 0.5589.
Submitted 8 February, 2022; v1 submitted 9 June, 2021;
originally announced June 2021.
-
Rendering Point Clouds with Compute Shaders and Vertex Order Optimization
Authors:
Markus Schütz,
Bernhard Kerbl,
Michael Wimmer
Abstract:
While commodity GPUs provide a continuously growing range of features and sophisticated methods for accelerating compute jobs, many state-of-the-art solutions for point cloud rendering still rely on the provided point primitives (GL_POINTS, POINTLIST, ...) of graphics APIs for image synthesis. In this paper, we present several compute-based point cloud rendering approaches that outperform the hardware pipeline by up to an order of magnitude and achieve significantly better frame times than previous compute-based methods. Beyond basic closest-point rendering, we also introduce a fast, high-quality variant to reduce aliasing. We present and evaluate several variants of our proposed methods with different flavors of optimization, in order to ensure their applicability and achieve optimal performance on a range of platforms and architectures with varying support for novel GPU hardware features. During our experiments, the observed peak performance was reached rendering 796 million points (12.7GB) at rates of 62 to 64 frames per second (50 billion points per second, 802GB/s) on an RTX 3090 without the use of level-of-detail structures.
We further introduce an optimized vertex order for point clouds to boost the efficiency of GL_POINTS by a factor of 5x in cases where hardware rendering is compulsory. We compare different orderings and show that Morton-sorted buffers are faster for some viewpoints, while shuffled vertex buffers are faster in others. In contrast, combining both approaches by first sorting according to Morton code and then shuffling the resulting sequence in batches of 128 points leads to a vertex buffer layout with high rendering performance and low sensitivity to viewpoint changes.
Submitted 15 April, 2021;
originally announced April 2021.
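The vertex ordering described in the abstract above (Morton sort followed by shuffling in batches of 128 points) can be sketched on the CPU in plain Python. This is an illustrative reading rather than the authors' implementation: it assumes that "shuffling in batches" means keeping each run of 128 Morton-adjacent points together and randomizing the order of the runs. The 10-bit quantization grid and the names `morton3`, `morton_order`, and `shuffle_in_batches` are likewise assumptions.

```python
import random


def morton3(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of three integer coordinates."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code


def morton_order(points, bits=10):
    """Sort (x, y, z) points by the Morton code of their quantized cell."""
    if not points:
        return []
    lo = [min(p[a] for p in points) for a in range(3)]
    hi = [max(p[a] for p in points) for a in range(3)]
    top = (1 << bits) - 1

    def cell(p):
        # Map each coordinate into the integer range [0, top].
        return tuple(
            int((p[a] - lo[a]) / (hi[a] - lo[a]) * top) if hi[a] > lo[a] else 0
            for a in range(3)
        )

    return sorted(points, key=lambda p: morton3(*cell(p), bits=bits))


def shuffle_in_batches(ordered, batch_size=128, seed=0):
    """Keep runs of `batch_size` points intact and shuffle the runs."""
    rng = random.Random(seed)
    batches = [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
    rng.shuffle(batches)
    return [p for batch in batches for p in batch]
```

A vertex buffer would then be filled from `shuffle_in_batches(morton_order(points))`. Under this reading, points within a batch stay spatially coherent while the batches themselves are spread across the buffer.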
-
Rendering Point Clouds with Compute Shaders
Authors:
Markus Schütz,
Michael Wimmer
Abstract:
We propose a compute-shader-based point cloud rasterizer with up to 10 times higher performance than classic point-based rendering with the GL_POINT primitive. In addition, our rasterizer offers 5-byte depth-buffer precision with uniform or customizable distribution, and we show that it is possible to implement a high-quality splatting method that blends together overlapping fragments while still maintaining higher frame rates than the traditional approach.
Submitted 7 August, 2019;
originally announced August 2019.