-
Scalable Wavelength Arbitration for Microring-based DWDM Transceivers
Authors:
Sunjin Choi,
Vladimir Stojanović
Abstract:
This paper introduces the concept of autonomous microring arbitration, or wavelength arbitration, to address the challenge of multi-microring initialization in microring-based Dense-Wavelength-Division-Multiplexed (DWDM) transceivers. This arbitration is inherently policy-driven, defining critical system characteristics such as the spectral ordering of microrings. Furthermore, to facilitate large-scale deployment, the arbitration algorithms must operate independently of specific wavelength information and be resilient to system variability. Addressing these complexities requires a holistic approach that encompasses the entire system, from device-level variabilities to the transceiver electrical-to-optical interface - this system-wide perspective is the focus of this paper. To support efficient analysis, we develop a hierarchical framework incorporating an ideal, wavelength-aware arbitration model to examine arbitration failures at both the policy and algorithmic levels. The effectiveness of this approach is demonstrated in two ways: by analyzing the robustness of each policy in relation to device variabilities, and by developing an algorithm that achieves near-perfect alignment with the ideal model, offering superior robustness compared to the traditional sequential tuning method. The simulator code used in this paper is available at https://github.com/wdmsim/wdm-simulator.
Submitted 7 February, 2025; v1 submitted 22 November, 2024;
originally announced November 2024.
-
Pretraining Graph Neural Networks for few-shot Analog Circuit Modeling and Design
Authors:
Kourosh Hakhamaneshi,
Marcel Nassar,
Mariano Phielipp,
Pieter Abbeel,
Vladimir Stojanović
Abstract:
Being able to predict the performance of circuits without running expensive simulations is a desired capability that can catalyze automated design. In this paper, we present a supervised pretraining approach to learn circuit representations that can be adapted to new circuit topologies or unseen prediction tasks. We hypothesize that if we train a neural network (NN) that can predict the output DC voltages of a wide range of circuit instances, it will be forced to learn generalizable knowledge about the role of each circuit element and how they interact with each other. The dataset for this supervised learning objective can be easily collected at scale, since the DC simulation required to get ground-truth labels is relatively cheap. This representation would then be helpful for few-shot generalization to unseen circuit metrics that require more time-consuming simulations for obtaining the ground-truth labels. To cope with the variable topological structure of different circuits, we describe each circuit as a graph and use graph neural networks (GNNs) to learn node embeddings. We show that pretraining GNNs on prediction of output node voltages can encourage learning representations that can be adapted to new unseen topologies or prediction of new circuit-level properties with up to 10x more sample efficiency compared to a randomly initialized model. We further show that we can improve sample efficiency of prior SoTA model-based optimization methods by 2x (almost as good as using an oracle model) via fine-tuning pretrained GNNs as the feature extractor of the learned models.
Submitted 1 April, 2022; v1 submitted 29 March, 2022;
originally announced March 2022.
-
JUMBO: Scalable Multi-task Bayesian Optimization using Offline Data
Authors:
Kourosh Hakhamaneshi,
Pieter Abbeel,
Vladimir Stojanovic,
Aditya Grover
Abstract:
The goal of Multi-task Bayesian Optimization (MBO) is to minimize the number of queries required to accurately optimize a target black-box function, given access to offline evaluations of other auxiliary functions. When offline datasets are large, the scalability of prior approaches comes at the expense of expressivity and inference quality. We propose JUMBO, an MBO algorithm that sidesteps these limitations by querying additional data based on a combination of acquisition signals derived from training two Gaussian Processes (GPs): a cold-GP operating directly in the input domain and a warm-GP that operates in the feature space of a deep neural network pretrained using the offline data. Such a decomposition can dynamically control the reliability of information derived from the online and offline data, and the use of pretrained neural networks permits scalability to large offline datasets. Theoretically, we derive regret bounds for JUMBO and show that it achieves no-regret under conditions analogous to GP-UCB (Srinivas et al. 2010). Empirically, we demonstrate significant performance improvements over existing approaches on two real-world optimization problems: hyper-parameter optimization and automated circuit design.
Submitted 10 March, 2022; v1 submitted 2 June, 2021;
originally announced June 2021.
-
GACEM: Generalized Autoregressive Cross Entropy Method for Multi-Modal Black Box Constraint Satisfaction
Authors:
Kourosh Hakhamaneshi,
Keertana Settaluri,
Pieter Abbeel,
Vladimir Stojanovic
Abstract:
In this work we present a new method of black-box optimization and constraint satisfaction. Existing algorithms that have attempted to solve this problem are unable to consider multiple modes, and are not able to adapt to changes in environment dynamics. To address these issues, we developed a modified Cross-Entropy Method (CEM) that uses a masked auto-regressive neural network for modeling uniform distributions over the solution space. We train the model using maximum entropy policy gradient methods from Reinforcement Learning. Our algorithm is able to express complicated solution spaces, thus allowing it to track a variety of different solution regions. We empirically compare our algorithm with variations of CEM, including one with a Gaussian prior with fixed variance, and demonstrate better performance in terms of: number of diverse solutions, better mode discovery in multi-modal problems, and better sample efficiency in certain cases.
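As a point of reference for the abstract above, the following is a minimal sketch of a plain Gaussian Cross-Entropy Method, the kind of baseline the abstract compares against. It is not the GACEM algorithm and not the authors' code: the function name `cross_entropy_method`, the hyper-parameters, and the quadratic score are all illustrative assumptions. A single Gaussian can only follow one solution region at a time, which is the limitation the masked auto-regressive model is meant to address.

```python
import numpy as np

def cross_entropy_method(score, dim, iters=30, pop=200, elite_frac=0.1, seed=0):
    """Plain Gaussian CEM (hypothetical baseline sketch, not GACEM).

    Repeatedly samples candidates, keeps the highest-scoring fraction,
    and refits a diagonal Gaussian to those elites.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)          # assumed starting point
    std = np.full(dim, 2.0)       # assumed initial spread
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, dim))
        scores = np.array([score(x) for x in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]  # top scorers
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6  # small floor avoids a zero spread
    return mean

# Illustrative single-mode objective: higher is better, peak at `target`.
target = np.array([1.0, -2.0])
best = cross_entropy_method(lambda x: -np.sum((x - target) ** 2), dim=2)
print(best)
```

On an objective with several separated peaks, this unimodal sampler typically settles on one of them, which is why the number of distinct solutions found is a natural comparison metric.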
Submitted 17 February, 2020;
originally announced February 2020.
-
Tuning Algorithms and Generators for Efficient Edge Inference
Authors:
Rawan Naous,
Lazar Supic,
Yoonhwan Kang,
Ranko Sredojevic,
Anish Singhani,
Vladimir Stojanovic
Abstract:
A surge in artificial intelligence and autonomous technologies has increased the demand for enhanced edge-processing capabilities. Computational complexity and size of state-of-the-art Deep Neural Networks (DNNs) are rising exponentially with diverse network models and larger datasets. This growth limits the performance scaling and energy efficiency of both distributed and embedded inference platforms. Embedded designs at the edge are constrained by the energy and speed limitations of available processor substrates and by the processor-to-memory communication required to fetch the model coefficients. While many hardware accelerator and network deployment frameworks have been in development, a framework is needed to allow the variety of existing architectures, and those in development, to be expressed in critical parts of the flow that perform various optimization steps. Moreover, premature architecture-blind network selection and optimization diminish the effectiveness of schedule optimizations and hardware-specific mappings. In this paper, we address these issues by creating a cross-layer software-hardware design framework that encompasses network training and model compression that is aware of and tuned to the underlying hardware architecture. This approach leverages the available degrees of DNN structure and sparsity to create a converged network that can be partitioned and efficiently scheduled on the target hardware platform, minimizing data movement and improving the overall throughput and energy. To further streamline the design, we leverage the high-level, flexible SoC generator platform based on the RISC-V ROCC framework. This integration allows seamless extensions of the RISC-V instruction set and Chisel-based rapid generator design. Utilizing this approach, we implemented a silicon prototype in a 16 nm TSMC process node, achieving record processing efficiency of up to 18 TOPS/W.
Submitted 10 May, 2020; v1 submitted 30 July, 2019;
originally announced August 2019.
-
BagNet: Berkeley Analog Generator with Layout Optimizer Boosted with Deep Neural Networks
Authors:
Kourosh Hakhamaneshi,
Nick Werblun,
Pieter Abbeel,
Vladimir Stojanovic
Abstract:
The discrepancy between post-layout and schematic simulation results continues to widen in analog design due in part to the domination of layout parasitics. This paradigm shift is forcing designers to adopt design methodologies that seamlessly integrate layout effects into the standard design flow. Hence, any simulation-based optimization framework should take into account time-consuming post-layout simulation results. This work presents a learning framework that learns to reduce the number of simulations of evolutionary-based combinatorial optimizers, using a DNN that discriminates against generated samples, before running simulations. Using this approach, the discriminator achieves at least two orders of magnitude improvement on sample efficiency for several large circuit examples including an optical link receiver layout.
Submitted 23 July, 2019;
originally announced July 2019.
-
MPDCompress - Matrix Permutation Decomposition Algorithm for Deep Neural Network Compression
Authors:
Lazar Supic,
Rawan Naous,
Ranko Sredojevic,
Aleksandra Faust,
Vladimir Stojanovic
Abstract:
Deep neural networks (DNNs) have become the state-of-the-art technique for machine learning tasks in various applications. However, due to their size and the computational complexity, large DNNs are not readily deployable on edge devices in real-time. To manage complexity and accelerate computation, network compression techniques based on pruning and quantization have been proposed and shown to be effective in reducing network size. However, such network compression can result in irregular matrix structures that are mismatched with modern hardware-accelerated platforms, such as graphics processing units (GPUs) designed to perform the DNN matrix multiplications in a structured (block-based) way. We propose MPDCompress, a DNN compression algorithm based on matrix permutation decomposition via random mask generation. In-training application of the masks molds the synaptic weight connection matrix to a sub-graph separation format. Aided by the random permutations, a hardware-desirable block matrix is generated, allowing for a more efficient implementation and compression of the network. To show versatility, we empirically verify MPDCompress on several network models, compression rates, and image datasets. On the LeNet 300-100 model (MNIST dataset), Deep MNIST, and CIFAR10, we achieve 10 X network compression with less than 1% accuracy loss compared to non-compressed accuracy performance. On AlexNet for the full ImageNet ILSVRC-2012 dataset, we achieve 8 X network compression with less than 1% accuracy loss, with top-5 and top-1 accuracies of 79.6% and 56.4%, respectively. Finally, we observe that the algorithm can offer inference speedups across various hardware platforms, with 4 X faster operation achieved on several mobile GPUs.
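The idea of a random-looking mask that is a permuted block matrix can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the helper names `block_diagonal_mask` and `permuted_mask`, the layer shape, and the choice of row and column permutations are hypothetical. With `blocks` equal blocks the mask keeps roughly a `1/blocks` fraction of the weights, so 4 blocks corresponds to about 4x compression in this toy setting.

```python
import numpy as np

def block_diagonal_mask(rows, cols, blocks):
    """Boolean mask that is True only inside `blocks` diagonal blocks."""
    mask = np.zeros((rows, cols), dtype=bool)
    r_edges = np.linspace(0, rows, blocks + 1).astype(int)
    c_edges = np.linspace(0, cols, blocks + 1).astype(int)
    for b in range(blocks):
        mask[r_edges[b]:r_edges[b + 1], c_edges[b]:c_edges[b + 1]] = True
    return mask

def permuted_mask(rows, cols, blocks, rng):
    """Scatter a block-diagonal mask with random row/column permutations."""
    row_perm = rng.permutation(rows)
    col_perm = rng.permutation(cols)
    base = block_diagonal_mask(rows, cols, blocks)
    # mask[i, j] == base[row_perm[i], col_perm[j]]
    return base[np.ix_(row_perm, col_perm)], row_perm, col_perm

rng = np.random.default_rng(0)
mask, row_perm, col_perm = permuted_mask(8, 12, 4, rng)

# Undoing the permutations recovers the hardware-friendly block structure.
recovered = np.empty_like(mask)
recovered[np.ix_(row_perm, col_perm)] = mask

print(mask.astype(int))
print(recovered.astype(int))
```

In a training loop, such a mask would be multiplied element-wise with the layer's weight matrix; at inference time the stored permutations allow the layer to be evaluated as a set of small dense blocks.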
Submitted 30 May, 2018;
originally announced May 2018.
-
Structured Deep Neural Network Pruning via Matrix Pivoting
Authors:
Ranko Sredojevic,
Shaoyi Cheng,
Lazar Supic,
Rawan Naous,
Vladimir Stojanovic
Abstract:
Deep Neural Networks (DNNs) are the key to state-of-the-art machine vision, sensor fusion and audio/video signal processing. Unfortunately, their computational complexity and the tight resource constraints on the Edge make them hard to leverage on mobile, embedded and IoT devices. Due to the great diversity of Edge devices, DNN designers have to take into account the hardware platform and application requirements during network training. In this work we introduce pruning via matrix pivoting as a way to improve network pruning by compromising between the design flexibility of architecture-oblivious pruning and the performance efficiency of architecture-aware pruning, the two dominant techniques for obtaining resource-efficient DNNs. We also describe local and global network optimization techniques for efficient implementation of the resulting pruned networks. In combination, the proposed pruning and implementation result in close to linear speedup with the reduction of network coefficients during pruning.
Submitted 30 November, 2017;
originally announced December 2017.
-
On U-Statistics and Compressed Sensing II: Non-Asymptotic Worst-Case Analysis
Authors:
Fabian Lim,
Vladimir Stojanovic
Abstract:
In another related work, U-statistics were used for non-asymptotic "average-case" analysis of random compressed sensing matrices. In this companion paper the same analytical tool is adopted differently - here we perform non-asymptotic "worst-case" analysis.
Simple union bounds are a natural choice for "worst-case" analyses; however, their tightness is an issue (and questioned in previous works). Here we focus on a theoretical U-statistical result, which potentially allows us to prove that these union bounds are tight. To our knowledge, this kind of (powerful) result is completely new in the context of CS. This general result applies to a wide variety of parameters, and is related to (Stein-Chen) Poisson approximation. In this paper, we consider i) restricted isometries, and ii) mutual coherence. For the bounded case, we show that k-th order restricted isometry constants have tight union bounds when the number of measurements is $m = \mathcal{O}(k (1 + \log(n/k)))$. Here we require the restricted isometries to grow linearly in k; however, we conjecture that this result can be improved to allow them to be fixed. Also, we show that mutual coherence (with the standard estimate $\sqrt{(4\log n)/m}$) has very tight union bounds.
For coherence, the normalization complicates general discussion, and we consider only Gaussian and Bernoulli cases here.
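To make the coherence quantity above concrete, the sketch below computes the mutual coherence of a random Gaussian matrix and evaluates the standard estimate sqrt((4 log n)/m) quoted in the abstract. It assumes the usual definition of mutual coherence (largest absolute inner product between distinct unit-normalized columns) and the natural logarithm; the function names and the sizes m = 128, n = 512 are illustrative choices, not taken from the paper.

```python
import numpy as np

def mutual_coherence(A):
    """Largest |<a_i, a_j>| over distinct unit-normalized columns of A."""
    cols = A / np.linalg.norm(A, axis=0, keepdims=True)
    gram = np.abs(cols.T @ cols)
    np.fill_diagonal(gram, 0.0)   # ignore each column paired with itself
    return float(gram.max())

def coherence_estimate(m, n):
    """Standard estimate sqrt((4 log n) / m), natural log assumed."""
    return float(np.sqrt(4.0 * np.log(n) / m))

rng = np.random.default_rng(0)
m, n = 128, 512                   # hypothetical matrix size
A = rng.standard_normal((m, n))   # Gaussian case
mu = mutual_coherence(A)
est = coherence_estimate(m, n)
print(f"empirical coherence: {mu:.3f}, estimate: {est:.3f}")
```

Repeating the draw over many seeds gives an empirical distribution of the coherence that can be set against the estimate.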
Submitted 30 October, 2012;
originally announced October 2012.
-
On U-Statistics and Compressed Sensing I: Non-Asymptotic Average-Case Analysis
Authors:
Fabian Lim,
Vladimir Marko Stojanovic
Abstract:
Hoeffding's U-statistics model combinatorial-type matrix parameters (appearing in CS theory) in a natural way. This paper proposes using these statistics for analyzing random compressed sensing matrices, in the non-asymptotic regime (relevant to practice). The aim is to address certain pessimisms of "worst-case" restricted isometry analyses, as observed by both Blanchard & Dossal, et al.
We show how U-statistics can obtain "average-case" analyses, by relating to statistical restricted isometry property (StRIP) type recovery guarantees. However, unlike standard StRIP, random signal models are not required; the analysis here holds in the almost sure (probabilistic) sense. For Gaussian/bounded entry matrices, we show that both l1-minimization and LASSO essentially require on the order of $k \cdot [\log((n-k)/u) + \sqrt{2(k/n) \log(n/k)}]$ measurements to respectively recover at least 1-5u fraction, and 1-4u fraction, of the signals. Noisy conditions are considered. Empirical evidence suggests that our analysis compares well with Donoho & Tanner's recent large deviation bounds for l0/l1-equivalence, in the regime of block lengths 1000-3000 with high undersampling (50-150 measurements); similar system sizes are found in recent CS implementations.
In this work, it is assumed throughout that matrix columns are independently sampled.
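The measurement-count expression in the abstract can be evaluated directly for a feel of its scale. The sketch below assumes natural logarithms and ignores the unstated constant factor implied by "on the order of"; the function name `measurement_order` and the example values n = 2000, k = 5, u = 0.1 are hypothetical, chosen only to fall in the block-length regime the abstract mentions.

```python
import math

def measurement_order(n, k, u):
    """Evaluate k * [log((n - k) / u) + sqrt(2 * (k / n) * log(n / k))].

    n: block length, k: sparsity, u: parameter controlling the fraction
    of signals recovered (at least 1 - 5u for l1, 1 - 4u for LASSO).
    Natural log is an assumption; constant factors are not included.
    """
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    if not 0 < u < 1:
        raise ValueError("need 0 < u < 1")
    return k * (math.log((n - k) / u)
                + math.sqrt(2.0 * (k / n) * math.log(n / k)))

order = measurement_order(n=2000, k=5, u=0.1)
print(f"order of measurements: {order:.1f}")
```

For these illustrative values the expression evaluates to roughly 50, at the low end of the 50-150 measurement range quoted for block lengths of 1000-3000.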
Submitted 30 October, 2012;
originally announced October 2012.