-
Efficient parallel implementation of the multiplicative weight update method for graph-based linear programs
Authors:
Caleb Ju,
Serif Yesil,
Mengyuan Sun,
Chandra Chekuri,
Edgar Solomonik
Abstract:
Positive linear programs (LPs) model many graph and operations research problems. One can solve for a $(1+ε)$-approximation for positive LPs, for any selected $ε$, in polylogarithmic depth and near-linear work via variations of the multiplicative weight update (MWU) method. Despite extensive theoretical work on these algorithms through the decades, their empirical performance is not well understood.
In this work, we implement and test an efficient parallel algorithm for solving positive LP relaxations, and apply it to graph problems such as densest subgraph, bipartite matching, vertex cover and dominating set. We accelerate the algorithm via a new step size search heuristic. Our implementation uses sparse linear algebra optimization techniques such as fusion of vector operations and use of sparse formats. Furthermore, we devise an implicit representation for graph incidence constraints. We demonstrate parallel scalability using OpenMP threading and MPI on the Stampede2 supercomputer. We compare this implementation with exact libraries and specialized libraries for the above problems in order to evaluate MWU's practical standing in both accuracy and performance relative to other methods. Our results show that this implementation is faster than general-purpose LP solvers (IBM CPLEX, Gurobi) in all of our experiments and, in some instances, outperforms state-of-the-art specialized parallel graph algorithms.
Submitted 12 February, 2024; v1 submitted 6 July, 2023;
originally announced July 2023.
-
SENSEi: Input-Sensitive Compilation for Accelerating GNNs
Authors:
Damitha Lenadora,
Vimarsh Sathia,
Gerasimos Gerogiannis,
Serif Yesil,
Josep Torrellas,
Charith Mendis
Abstract:
Over the years, many frameworks and optimization techniques have been proposed to accelerate graph neural networks (GNNs). Compared to the optimizations explored in these systems, we observe that different matrix re-associations of GNN computations lead to novel input-sensitive performance behavior. We leverage this observation to propose SENSEi, a system that exposes different sparse and dense matrix primitive compositions based on different matrix re-associations of GNN computations and selects the best among them based on input attributes. SENSEi executes in two stages: (1) an offline compilation stage that enumerates all valid re-associations leading to different sparse-dense matrix compositions and uses input-oblivious pruning techniques to prune away clearly unprofitable candidates and (2) an online runtime system that explores the remaining candidates and uses light-weight cost models to select the best re-association based on the input graph and the embedding sizes on a given hardware platform. On a wide range of configurations, SENSEi achieves speedups of up to $2.012\times$ and $1.85\times$ on graph convolutional networks and up to $6.294\times$ and $16.274\times$ on graph attention networks, on GPUs and CPUs respectively. We also show that its technique generalizes to GNN variants, including those that require sampling. Furthermore, we show that SENSEi's techniques are agnostic to the underlying GNN system, and can be used to yield synergistic improvements across a diverse set of implementations.
Submitted 8 March, 2024; v1 submitted 26 June, 2023;
originally announced June 2023.
-
On Dynamic Precision Scaling
Authors:
Serif Yesil,
Ismail Akturk,
Ulya R. Karpuzcu
Abstract:
Based on the observation that application phases exhibit varying degrees of sensitivity to noise (i.e., accuracy loss) in computation during execution, this paper explores how Dynamic Precision Scaling (DPS) can maximize power efficiency by tailoring the precision of computation adaptively to temporal changes in algorithmic noise tolerance. DPS can decrease the arithmetic precision of noise-tolerant phases to result in power savings at the same operating speed (or faster execution within the same power budget), while keeping the overall loss in accuracy due to precision reduction bounded.
Submitted 18 September, 2017;
originally announced September 2017.
-
FPGA Implementation of Erasure-Only Reed Solomon Decoders for Hybrid-ARQ Systems
Authors:
Cansu Sen,
Soner Yesil,
Ertugrul Kolagasioglu
Abstract:
This paper presents the use of Reed Solomon (RS) codes as the Forward Error Correction (FEC) unit of Hybrid Automatic Repeat Request (ARQ) methods. Parametric and flexible FPGA implementation details of such Erasure-Only RS decoders with high symbol lengths (e.g. GF(2^32)) are presented. The design is based on a GF(2^m) multiplier logic core operating in a single clock cycle, where the resource utilization and throughput are both directly proportional to the number of these cores. For a fixed implementation, the throughput decreases with the number of erasures to be corrected. Implementation on a Zynq7020 SoC device of an example GF(2^32) RS decoder capable of correcting 64 erasures with a single multiplier resulted in 1641 LUTs and 188 FFs, achieving 15 Mbps, whereas the design with 8 multipliers resulted in 6128 LUTs and 628 FFs, achieving 100 Mbps.
Submitted 30 March, 2016;
originally announced March 2016.