-
TinyIREE: An ML Execution Environment for Embedded Systems from Compilation to Deployment
Authors:
Hsin-I Cindy Liu,
Marius Brehler,
Mahesh Ravishankar,
Nicolas Vasilache,
Ben Vanik,
Stella Laurenzo
Abstract:
Machine learning model deployment for training and execution has been an important topic for industry and academic research in the last decade. Much of the attention has been focused on developing specific toolchains to support acceleration hardware. In this paper, we present IREE, a unified compiler and runtime stack with the explicit goal to scale down machine learning programs to the smallest f…
▽ More
Machine learning model deployment for training and execution has been an important topic for industry and academic research in the last decade. Much of the attention has been focused on developing specific toolchains to support acceleration hardware. In this paper, we present IREE, a unified compiler and runtime stack with the explicit goal to scale down machine learning programs to the smallest footprints for mobile and edge devices, while maintaining the ability to scale up to larger deployment targets. IREE adopts a compiler-based approach and optimizes for heterogeneous hardware accelerators through the use of the MLIR compiler infrastructure which provides the means to quickly design and implement multi-level compiler intermediate representations (IR). More specifically, this paper is focused on TinyIREE, which is a set of deployment options in IREE that accommodate the limited memory and computation resources in embedded systems and bare-metal platforms, while also demonstrating IREE's intuitive workflow that generates workloads for different ISA extensions and ABIs through LLVM.
△ Less
Submitted 28 May, 2022;
originally announced May 2022.
-
Composable and Modular Code Generation in MLIR: A Structured and Retargetable Approach to Tensor Compiler Construction
Authors:
Nicolas Vasilache,
Oleksandr Zinenko,
Aart J. C. Bik,
Mahesh Ravishankar,
Thomas Raoux,
Alexander Belyaev,
Matthias Springer,
Tobias Gysi,
Diego Caballero,
Stephan Herhut,
Stella Laurenzo,
Albert Cohen
Abstract:
Despite significant investment in software infrastructure, machine learning systems, runtimes and compilers do not compose properly. We propose a new design aiming at providing unprecedented degrees of modularity, composability and genericity. This paper discusses a structured approach to the construction of domain-specific code generators for tensor compilers, with the stated goal of improving th…
▽ More
Despite significant investment in software infrastructure, machine learning systems, runtimes and compilers do not compose properly. We propose a new design aiming at providing unprecedented degrees of modularity, composability and genericity. This paper discusses a structured approach to the construction of domain-specific code generators for tensor compilers, with the stated goal of improving the productivity of both compiler engineers and end-users. The approach leverages the natural structure of tensor algebra. It has been the main driver for the design of progressive lowering paths in \MLIR. The proposed abstractions and transformations span data structures and control flow with both functional (SSA form) and imperative (side-effecting) semantics. We discuss the implications of this infrastructure on compiler construction and present preliminary experimental results.
△ Less
Submitted 7 February, 2022;
originally announced February 2022.
-
Streaming keyword spotting on mobile devices
Authors:
Oleg Rybakov,
Natasha Kononenko,
Niranjan Subrahmanya,
Mirko Visontai,
Stella Laurenzo
Abstract:
In this work we explore the latency and accuracy of keyword spotting (KWS) models in streaming and non-streaming modes on mobile phones. NN model conversion from non-streaming mode (model receives the whole input sequence and then returns the classification result) to streaming mode (model receives portion of the input sequence and classifies it incrementally) may require manual model rewriting. W…
▽ More
In this work we explore the latency and accuracy of keyword spotting (KWS) models in streaming and non-streaming modes on mobile phones. NN model conversion from non-streaming mode (model receives the whole input sequence and then returns the classification result) to streaming mode (model receives portion of the input sequence and classifies it incrementally) may require manual model rewriting. We address this by designing a Tensorflow/Keras based library which allows automatic conversion of non-streaming models to streaming ones with minimum effort. With this library we benchmark multiple KWS models in both streaming and non-streaming modes on mobile phones and demonstrate different tradeoffs between latency and accuracy. We also explore novel KWS models with multi-head attention which reduce the classification error over the state-of-art by 10% on Google speech commands data sets V2. The streaming library with all experiments is open-sourced.
△ Less
Submitted 28 July, 2020; v1 submitted 14 May, 2020;
originally announced May 2020.
-
Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling
Authors:
Jonathan Shen,
Patrick Nguyen,
Yonghui Wu,
Zhifeng Chen,
Mia X. Chen,
Ye Jia,
Anjuli Kannan,
Tara Sainath,
Yuan Cao,
Chung-Cheng Chiu,
Yanzhang He,
Jan Chorowski,
Smit Hinsu,
Stella Laurenzo,
James Qin,
Orhan Firat,
Wolfgang Macherey,
Suyog Gupta,
Ankur Bapna,
Shuyuan Zhang,
Ruoming Pang,
Ron J. Weiss,
Rohit Prabhavalkar,
Qiao Liang,
Benoit Jacob
, et al. (66 additional authors not shown)
Abstract:
Lingvo is a Tensorflow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible and easily extensible, and experiment configurations are centralized and highly customizable. Distributed training and quantized inference are supported directly w…
▽ More
Lingvo is a Tensorflow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible and easily extensible, and experiment configurations are centralized and highly customizable. Distributed training and quantized inference are supported directly within the framework, and it contains existing implementations of a large number of utilities, helper functions, and the newest research ideas. Lingvo has been used in collaboration by dozens of researchers in more than 20 papers over the last two years. This document outlines the underlying design of Lingvo and serves as an introduction to the various pieces of the framework, while also offering examples of advanced features that showcase the capabilities of the framework.
△ Less
Submitted 21 February, 2019;
originally announced February 2019.
-
Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation
Authors:
Ye Jia,
Melvin Johnson,
Wolfgang Macherey,
Ron J. Weiss,
Yuan Cao,
Chung-Cheng Chiu,
Naveen Ari,
Stella Laurenzo,
Yonghui Wu
Abstract:
End-to-end Speech Translation (ST) models have many potential advantages when compared to the cascade of Automatic Speech Recognition (ASR) and text Machine Translation (MT) models, including lowered inference latency and the avoidance of error compounding. However, the quality of end-to-end ST is often limited by a paucity of training data, since it is difficult to collect large parallel corpora…
▽ More
End-to-end Speech Translation (ST) models have many potential advantages when compared to the cascade of Automatic Speech Recognition (ASR) and text Machine Translation (MT) models, including lowered inference latency and the avoidance of error compounding. However, the quality of end-to-end ST is often limited by a paucity of training data, since it is difficult to collect large parallel corpora of speech and translated transcript pairs. Previous studies have proposed the use of pre-trained components and multi-task learning in order to benefit from weakly supervised training data, such as speech-to-transcript or text-to-foreign-text pairs. In this paper, we demonstrate that using pre-trained MT or text-to-speech (TTS) synthesis models to convert weakly supervised data into speech-to-translation pairs for ST training can be more effective than multi-task learning. Furthermore, we demonstrate that a high quality end-to-end ST model can be trained using only weakly supervised datasets, and that synthetic data sourced from unlabeled monolingual text or speech can be used to improve performance. Finally, we discuss methods for avoiding overfitting to synthetic speech with a quantitative ablation study.
△ Less
Submitted 10 February, 2019; v1 submitted 5 November, 2018;
originally announced November 2018.