-
A General-Purpose Transferable Predictor for Neural Architecture Search
Authors: Fred X. Han, Keith G. Mills, Fabian Chudak, Parsa Riahi, Mohammad Salameh, Jialin Zhang, Wei Lu, Shangling Jui, Di Niu
Abstract:
Understanding and modelling the performance of neural architectures is key to Neural Architecture Search (NAS). Performance predictors have seen widespread use in low-cost NAS and achieve high ranking correlations between predicted and ground truth performance in several NAS benchmarks. However, existing predictors are often designed based on network encodings specific to a predefined search space and are therefore not generalizable to other search spaces or new architecture families. In this paper, we propose a general-purpose neural predictor for NAS that can transfer across search spaces, by representing any given candidate Convolutional Neural Network (CNN) with a Computation Graph (CG) that consists of primitive operators. We further combine our CG network representation with Contrastive Learning (CL) and propose a graph representation learning procedure that leverages the structural information of unlabeled architectures from multiple families to train CG embeddings for our performance predictor. Experimental results on NAS-Bench-101, 201 and 301 demonstrate the efficacy of our scheme as we achieve strong positive Spearman Rank Correlation Coefficient (SRCC) on every search space, outperforming several Zero-Cost Proxies, including Synflow and Jacov, which are also generalizable predictors across search spaces. Moreover, when using our proposed general-purpose predictor in an evolutionary neural architecture search algorithm, we can find high-performance architectures on NAS-Bench-101 and find a MobileNetV3 architecture that attains 79.2% top-1 accuracy on ImageNet.
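The abstract evaluates predictors by Spearman Rank Correlation Coefficient (SRCC) between predicted and ground-truth performance. As a minimal sketch of that metric only (not the authors' evaluation code), SRCC can be computed as the Pearson correlation of the two rank vectors. The function names `rank` and `spearman` and the sample numbers below are illustrative assumptions.

```python
def rank(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(predicted, actual):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rp, ra = rank(predicted), rank(actual)
    n = len(rp)
    mean_p, mean_a = sum(rp) / n, sum(ra) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(rp, ra))
    var_p = sum((p - mean_p) ** 2 for p in rp)
    var_a = sum((a - mean_a) ** 2 for a in ra)
    return cov / (var_p * var_a) ** 0.5


# Hypothetical predictor scores and ground-truth accuracies for 4 architectures.
predicted_scores = [0.9, 0.5, 0.7, 0.1]
true_accuracies = [93.0, 91.0, 90.0, 85.0]
srcc = spearman(predicted_scores, true_accuracies)  # 0.8 for this toy data
```

Only the ordering of the scores matters here, which is why a predictor can be useful for search even when its raw outputs are on a different scale from accuracy.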
Submitted 21 February, 2023;
originally announced February 2023.
-
GENNAPE: Towards Generalized Neural Architecture Performance Estimators
Authors: Keith G. Mills, Fred X. Han, Jialin Zhang, Fabian Chudak, Ali Safari Mamaghani, Mohammad Salameh, Wei Lu, Shangling Jui, Di Niu
Abstract:
Predicting neural architecture performance is a challenging task and is crucial to neural architecture design and search. Existing approaches either rely on neural performance predictors which are limited to modeling architectures in a predefined design space involving specific sets of operators and connection rules, and cannot generalize to unseen architectures, or resort to zero-cost proxies which are not always accurate. In this paper, we propose GENNAPE, a Generalized Neural Architecture Performance Estimator, which is pretrained on open neural architecture benchmarks, and aims to generalize to completely unseen architectures through combined innovations in network representation, contrastive pretraining, and fuzzy clustering-based predictor ensemble. Specifically, GENNAPE represents a given neural network as a Computation Graph (CG) of atomic operations which can model an arbitrary architecture. It first learns a graph encoder via Contrastive Learning to encourage network separation by topological features, and then trains multiple predictor heads, which are soft-aggregated according to the fuzzy membership of a neural network. Experiments show that GENNAPE pretrained on NAS-Bench-101 can achieve superior transferability to 5 different public neural network benchmarks, including NAS-Bench-201, NAS-Bench-301, MobileNet and ResNet families under no or minimum fine-tuning. We further introduce 3 challenging newly labelled neural network benchmarks: HiAML, Inception and Two-Path, which can concentrate in narrow accuracy ranges. Extensive experiments show that GENNAPE can correctly discern high-performance architectures in these families. Finally, when paired with a search algorithm, GENNAPE can find architectures that improve accuracy while reducing FLOPs on three families.
Submitted 24 April, 2023; v1 submitted 30 November, 2022;
originally announced November 2022.
-
Profiling Neural Blocks and Design Spaces for Mobile Neural Architecture Search
Authors: Keith G. Mills, Fred X. Han, Jialin Zhang, Seyed Saeed Changiz Rezaei, Fabian Chudak, Wei Lu, Shuo Lian, Shangling Jui, Di Niu
Abstract:
Neural architecture search automates neural network design and has achieved state-of-the-art results in many deep learning applications. While recent literature has focused on designing networks to maximize accuracy, little work has been conducted to understand the compatibility of architecture design spaces to varying hardware. In this paper, we analyze the neural blocks used to build Once-for-All (MobileNetV3), ProxylessNAS and ResNet families, in order to understand their predictive power and inference latency on various devices, including Huawei Kirin 9000 NPU, RTX 2080 Ti, AMD Threadripper 2990WX, and Samsung Note10. We introduce a methodology to quantify the friendliness of neural blocks to hardware and the impact of their placement in a macro network on overall network performance via only end-to-end measurements. Based on extensive profiling results, we derive design insights and apply them to hardware-specific search space reduction. We show that searching in the reduced search space generates better accuracy-latency Pareto frontiers than searching in the original search spaces, customizing architecture search according to the hardware. Moreover, insights derived from measurements lead to notably higher ImageNet top-1 scores on all search spaces investigated.
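The abstract compares accuracy-latency Pareto frontiers. As a rough sketch of what such a frontier is (not the paper's search or profiling code), the snippet below keeps only the candidates that no other candidate beats on both latency and accuracy. The function name `pareto_front` and the sample measurements are hypothetical.

```python
def pareto_front(candidates):
    """Return the non-dominated (latency_ms, accuracy) points.

    A point is dominated if another point is no slower and no less
    accurate, and strictly better on at least one of the two.
    """
    front = []
    best_accuracy = float("-inf")
    # Visit points from fastest to slowest; on equal latency, the more
    # accurate point comes first.
    for latency, accuracy in sorted(candidates, key=lambda c: (c[0], -c[1])):
        # A slower point survives only if it is more accurate than
        # every faster point seen so far.
        if accuracy > best_accuracy:
            front.append((latency, accuracy))
            best_accuracy = accuracy
    return front


# Hypothetical (latency in ms, top-1 accuracy in %) measurements.
measurements = [(10, 70.0), (12, 69.0), (15, 75.0), (20, 74.0), (8, 60.0)]
frontier = pareto_front(measurements)
# frontier == [(8, 60.0), (10, 70.0), (15, 75.0)]
```

One frontier is better than another when, for any latency budget, it offers an architecture with equal or higher accuracy.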
Submitted 25 September, 2021;
originally announced September 2021.
-
Solving SAT and MaxSAT with a Quantum Annealer: Foundations, Encodings, and Preliminary Results
Authors: Zhengbing Bian, Fabian Chudak, William Macready, Aidan Roy, Roberto Sebastiani, Stefano Varotti
Abstract:
Quantum annealers (QAs) are specialized quantum computers that minimize objective functions over discrete variables by physically exploiting quantum effects. Current QA platforms allow for the optimization of quadratic objectives defined over binary variables (qubits), also known as Ising problems. In the last decade, QA systems as implemented by D-Wave have scaled with Moore-like growth. Current architectures provide 2048 sparsely-connected qubits, and continued exponential growth is anticipated, together with increased connectivity. We explore the feasibility of such architectures for solving SAT and MaxSAT problems as QA systems scale. We develop techniques for effectively encoding SAT (and, with some limitations, MaxSAT) into Ising problems compatible with sparse QA architectures. We provide the theoretical foundations for this mapping, and present encoding techniques that combine offline Satisfiability and Optimization Modulo Theories with on-the-fly placement and routing. Preliminary empirical tests on a current generation 2048-qubit D-Wave system support the feasibility of the approach for certain SAT and MaxSAT problems.
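To make the idea of a quadratic objective over binary variables concrete, here is a toy sketch, not the paper's encoding: it ignores sparse hardware connectivity, placement and routing, and clauses with more than two literals. For a two-literal clause, the product of the two "literal is false" indicators is 1 exactly when the clause is violated, so summing these products gives a quadratic penalty whose minimum counts the fewest violated clauses (MAX-2-SAT). The literal representation `(variable_index, is_positive)` and all function names are assumptions for illustration, and the minimum is found by classical brute force rather than by an annealer.

```python
from itertools import product


def literal_value(assignment, literal):
    """Value (0 or 1) of a literal (variable_index, is_positive)."""
    var, is_positive = literal
    return assignment[var] if is_positive else 1 - assignment[var]


def energy(assignment, clauses):
    """Number of violated 2-literal clauses, as a quadratic polynomial.

    (1 - a) * (1 - b) equals 1 only when both literals are false.
    """
    return sum(
        (1 - literal_value(assignment, a)) * (1 - literal_value(assignment, b))
        for a, b in clauses
    )


def brute_force_minimum(num_vars, clauses):
    """Classically enumerate all assignments and return a lowest-energy one."""
    return min(product((0, 1), repeat=num_vars), key=lambda x: energy(x, clauses))


# (x0 or x1) and (not x0 or x1) and (x0 or not x1): satisfied only by x0 = x1 = 1.
clauses = [((0, True), (1, True)), ((0, False), (1, True)), ((0, True), (1, False))]
best = brute_force_minimum(2, clauses)  # (1, 1), with energy 0
```

Substituting x = (s + 1) / 2 with spins s in {-1, +1} turns the same polynomial into Ising form; a minimum energy of 0 means the formula is satisfiable.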
Submitted 6 November, 2018;
originally announced November 2018.
-
Benchmarking Quantum Hardware for Training of Fully Visible Boltzmann Machines
Authors: Dmytro Korenkevych, Yanbo Xue, Zhengbing Bian, Fabian Chudak, William G. Macready, Jason Rolfe, Evgeny Andriyash
Abstract:
Quantum annealing (QA) is a hardware-based heuristic optimization and sampling method applicable to discrete undirected graphical models. While similar to simulated annealing, QA relies on quantum, rather than thermal, effects to explore complex search spaces. For many classes of problems, QA is known to offer computational advantages over simulated annealing. Here we report on the ability of recent QA hardware to accelerate training of fully visible Boltzmann machines. We characterize the sampling distribution of QA hardware, and show that in many cases, the quantum distributions differ significantly from classical Boltzmann distributions. In spite of this difference, training (which seeks to match data and model statistics) using standard classical gradient updates is still effective. We investigate the use of QA for seeding Markov chains as an alternative to contrastive divergence (CD) and persistent contrastive divergence (PCD). Using $k=50$ Gibbs steps, we show that for problems with high-energy barriers between modes, QA-based seeds can improve upon chains with CD and PCD initializations. For these hard problems, QA gradient estimates are more accurate, and allow for faster learning. Furthermore, and interestingly, even the case of raw QA samples (that is, $k=0$) achieved similar improvements. We argue that this relates to the fact that we are training a quantum rather than classical Boltzmann distribution in this case. The learned parameters give rise to hardware QA distributions closely approximating classical Boltzmann distributions that are hard to train with CD/PCD.
Submitted 14 November, 2016;
originally announced November 2016.
-
Investigating the Performance of an Adiabatic Quantum Optimization Processor
Authors: Kamran Karimi, Neil G. Dickson, Firas Hamze, M. H. S. Amin, Marshall Drew-Brook, Fabian A. Chudak, Paul I. Bunyk, William G. Macready, Geordie Rose
Abstract:
Adiabatic quantum optimization offers a new method for solving hard optimization problems. In this paper we calculate median adiabatic times (in seconds) determined by the minimum gap during the adiabatic quantum optimization for an NP-hard Ising spin glass instance class with up to 128 binary variables. Using parameters obtained from a realistic superconducting adiabatic quantum processor, we extract the minimum gap and matrix elements using high performance Quantum Monte Carlo simulations on a large-scale Internet-based computing platform. We compare the median adiabatic times with the median running times of two classical solvers and find that, for the considered problem sizes, the adiabatic times for the simulated processor architecture are about 4 and 6 orders of magnitude shorter than the two classical solvers' times. This shows that if the adiabatic time scale were to determine the computation time, adiabatic quantum optimization would be significantly superior to those classical solvers for median spin glass problems of at least up to 128 qubits. We also discuss important additional constraints that affect the performance of a realistic system.
Submitted 27 January, 2011; v1 submitted 21 June, 2010;
originally announced June 2010.
-
Fast optical layer mesh protection using pre-cross-connected trails
Authors: Timothy Y. Chow, Fabian Chudak, Anthony M. Ffrench
Abstract:
Conventional optical networks are based on SONET rings, but since rings are known to use bandwidth inefficiently, there has been much research into shared mesh protection, which promises significant bandwidth savings. Unfortunately, most shared mesh protection schemes cannot guarantee that failed traffic will be restored within the 50 ms timeframe that SONET standards specify. A notable exception is the p-cycle scheme of Grover and Stamatelakis. We argue, however, that p-cycles have certain limitations, e.g., there is no easy way to adapt p-cycles to a path-based protection scheme, and p-cycles seem more suited to static traffic than to dynamic traffic. In this paper we show that the key to fast restoration times is not a ring-like topology per se, but rather the ability to pre-cross-connect protection paths. This leads to the concept of a pre-cross-connected trail or PXT, which is a structure that is more flexible than rings and that adapts readily to both path-based and link-based schemes and to both static and dynamic traffic. The PXT protection scheme achieves fast restoration speeds, and our simulations, which have been carefully chosen using ideas from experimental design theory, show that the bandwidth efficiency of the PXT protection scheme is comparable to that of conventional shared mesh protection schemes.
Submitted 27 July, 2004; v1 submitted 3 September, 2002;
originally announced September 2002.