Search | arXiv e-print repository

Scalable iterative pruning of large language and vision models using block coordinate descent

Authors: Gili Rosenberg, J. Kyle Brubaker, Martin J. A. Schuetz, Elton Yechao Zhu, Serdar Kadıoğlu, Sima E. Borujeni, Helmut G. Katzgraber

Abstract: Pruning neural networks, which involves removing a fraction of their weights, can often maintain high accuracy while significantly reducing model complexity, at least up to a certain limit. We present a neural network pruning technique that builds upon the Combinatorial Brain Surgeon, but solves an optimization problem over a subset of the network weights in an iterative, block-wise manner using block coordinate descent. The iterative, block-based nature of this pruning technique, which we dub "iterative Combinatorial Brain Surgeon" (iCBS), allows for scalability to very large models, including large language models (LLMs), that may not be feasible with a one-shot combinatorial optimization approach. When applied to large models like Mistral and DeiT, iCBS achieves higher performance metrics at the same density levels compared to existing pruning methods such as Wanda. This demonstrates the effectiveness of this iterative, block-wise pruning method in compressing and optimizing the performance of large deep learning models, even while optimizing over only a small fraction of the weights. Moreover, our approach allows for a quality-time (or cost) tradeoff that is not available when using a one-shot pruning technique alone. The block-wise formulation of the optimization problem enables the use of hardware accelerators, potentially offsetting the increased computational costs compared to one-shot pruning methods like Wanda. In particular, the optimization problem solved for each block is quantum-amenable in that it could, in principle, be solved by a quantum computer.

Submitted 26 November, 2024; originally announced November 2024.

Comments: 16 pages, 6 figures, 5 tables

arXiv:2407.20185 [pdf, other]

Solving QUBOs with a quantum-amenable branch and bound method

Authors: Thomas Häner, Kyle E. C. Booth, Sima E. Borujeni, Elton Yechao Zhu

Abstract: Due to the expected disparity in quantum vs. classical clock speeds, quantum advantage for branch and bound algorithms is more likely achievable in settings involving large search trees and low operator evaluation costs. Therefore, in this paper, we describe and experimentally validate an exact classical branch and bound solver for quadratic unconstrained binary optimization (QUBO) problems that matches these criteria. Our solver leverages cheap-to-implement bounds from the literature previously proposed for Ising models, including that of Hartwig, Daske, and Kobe from 1984. We detail a variety of techniques from high-performance computing and operations research used to boost solver performance, including a global variable reordering heuristic, a primal heuristic based on simulated annealing, and a truncated computation of the recursive bound. We also outline a number of simple and inexpensive bound extrapolation techniques. Finally, we conduct an extensive empirical analysis of our solver, comparing its performance to state-of-the-art QUBO and MaxCut solvers, and discuss the challenges of a speedup via quantum branch and bound beyond those faced by any quadratic quantum speedup.

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2306.03976 [pdf, other]

doi 10.3390/make5040086

Explainable AI using expressive Boolean formulas

Authors: Gili Rosenberg, J. Kyle Brubaker, Martin J. A. Schuetz, Grant Salton, Zhihuai Zhu, Elton Yechao Zhu, Serdar Kadıoğlu, Sima E. Borujeni, Helmut G. Katzgraber

Abstract: We propose and implement an interpretable machine learning classification model for Explainable AI (XAI) based on expressive Boolean formulas. Potential applications include credit scoring and diagnosis of medical conditions. The Boolean formula defines a rule with tunable complexity (or interpretability), according to which input data are classified. Such a formula can include any operator that can be applied to one or more Boolean variables, thus providing higher expressivity compared to more rigid rule-based and tree-based approaches. The classifier is trained using native local optimization techniques, efficiently searching the space of feasible formulas. Shallow rules can be determined by fast Integer Linear Programming (ILP) or Quadratic Unconstrained Binary Optimization (QUBO) solvers, potentially powered by special purpose hardware or quantum devices. We combine the expressivity and efficiency of the native local optimizer with the fast operation of these devices by executing non-local moves that optimize over subtrees of the full Boolean formula. We provide extensive numerical benchmarking results featuring several baselines on well-known public datasets. Based on the results, we find that the native local rule classifier is generally competitive with the other classifiers. The addition of non-local moves achieves similar results with fewer iterations, and therefore using specialized or quantum hardware could lead to a speedup by fast proposal of non-local moves.

Submitted 6 June, 2023; originally announced June 2023.

Comments: 28 pages, 16 figures, 4 tables

Journal ref: Mach. Learn. Knowl. Extr. 2023, 5(4), 1760-1795

arXiv:2209.10615 [pdf, other]

doi 10.22331/q-2024-10-10-1495

Iteration Complexity of Variational Quantum Algorithms

Authors: Vyacheslav Kungurtsev, Georgios Korpas, Jakub Marecek, Elton Yechao Zhu

Abstract: There has been much recent interest in near-term applications of quantum computers, i.e., using quantum circuits that have short decoherence times due to hardware limitations. Variational quantum algorithms (VQA), wherein an optimization algorithm implemented on a classical computer evaluates a parametrized quantum circuit as an objective function, are a leading framework in this space. An enormous breadth of algorithms in this framework have been proposed for solving a range of problems in machine learning, forecasting, applied physics, and combinatorial optimization, among others. In this paper, we analyze the iteration complexity of VQA, that is, the number of steps that VQA requires until its iterates satisfy a surrogate measure of optimality. We argue that although VQA procedures incorporate algorithms that can, in the idealized case, be modeled as classic procedures in the optimization literature, the particular nature of noise in near-term devices invalidates the claim of applicability of off-the-shelf analyses of these algorithms. Specifically, noise makes the evaluations of the objective function via quantum circuits biased. Commonly used optimization procedures, such as SPSA and the parameter shift rule, can thus be seen as derivative-free optimization algorithms with biased function evaluations, for which there are currently no iteration complexity guarantees in the literature. We derive the missing guarantees and find that the rate of convergence is unaffected. However, the level of bias contributes unfavorably to both the constant therein, and the asymptotic distance to stationarity, i.e., the more bias, the farther one is guaranteed, at best, to reach a stationary point of the VQA objective.

Submitted 8 September, 2024; v1 submitted 21 September, 2022; originally announced September 2022.

Comments: 45 pages, 13 figures

Report number: 1048209.1.0

Journal ref: Quantum 8, 1495 (2024)

Showing 1–4 of 4 results for author: Zhu, E Y