Skip to main content

Showing 1–41 of 41 results for author: Beaufays, F

Searching in archive cs. Search in all archives.
.
  1. arXiv:2408.10443  [pdf, other

    cs.LG cs.CL cs.SD eess.AS

    Federated Learning of Large ASR Models in the Real World

    Authors: Yonghui Xiao, Yuxin Ding, Changwan Ryu, Petr Zadrazil, Francoise Beaufays

    Abstract: Federated learning (FL) has shown promising results on training machine learning models with privacy preservation. However, for large models with over 100 million parameters, the training resource requirement becomes an obstacle for FL because common devices do not have enough memory and computation power to finish the FL tasks. Although efficient training methods have been proposed, it is still a… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  2. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1112 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 16 December, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  3. arXiv:2402.18932  [pdf, other

    eess.AS cs.SD

    Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data

    Authors: Takaaki Saeki, Gary Wang, Nobuyuki Morioka, Isaac Elias, Kyle Kastner, Fadi Biadsy, Andrew Rosenberg, Bhuvana Ramabhadran, Heiga Zen, Françoise Beaufays, Hadar Shemtov

    Abstract: Collecting high-quality studio recordings of audio is challenging, which limits the language coverage of text-to-speech (TTS) systems. This paper proposes a framework for scaling a multilingual TTS model to 100+ languages using found data without supervision. The proposed framework combines speech-text encoder pretraining with unsupervised training using untranscribed speech and unspoken text data… ▽ More

    Submitted 16 July, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: To appear in ICASSP 2024. Demo page: https://google.github.io/tacotron/publications/extending_tts/

  4. arXiv:2309.09996  [pdf, other

    eess.AS cs.CL cs.LG cs.SD

    Improving Speech Recognition for African American English With Audio Classification

    Authors: Shefali Garg, Zhouyuan Huo, Khe Chai Sim, Suzan Schwartz, Mason Chua, Alëna Aksënova, Tsendsuren Munkhdalai, Levi King, Darryl Wright, Zion Mengesha, Dongseong Hwang, Tara Sainath, Françoise Beaufays, Pedro Moreno Mengibar

    Abstract: Automatic speech recognition (ASR) systems have been shown to have large quality disparities between the language varieties they are intended or expected to recognize. One way to mitigate this is to train or fine-tune models with more representative datasets. But this approach can be hindered by limited in-domain data for training and evaluation. We propose a new way to improve the robustness of a… ▽ More

    Submitted 16 September, 2023; originally announced September 2023.

  5. arXiv:2305.15663  [pdf, other

    cs.CL cs.SD eess.AS

    Mixture-of-Expert Conformer for Streaming Multilingual ASR

    Authors: Ke Hu, Bo Li, Tara N. Sainath, Yu Zhang, Francoise Beaufays

    Abstract: End-to-end models with large capacity have significantly improved multilingual automatic speech recognition, but their computation cost poses challenges for on-device applications. We propose a streaming truly multilingual Conformer incorporating mixture-of-expert (MoE) layers that learn to only activate a subset of parameters in training and inference. The MoE layer consists of a softmax gate whi… ▽ More

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted to Interspeech 2023

  6. arXiv:2304.00173  [pdf, other

    cs.CL cs.AI cs.SD eess.AS

    Lego-Features: Exporting modular encoder features for streaming and deliberation ASR

    Authors: Rami Botros, Rohit Prabhavalkar, Johan Schalkwyk, Ciprian Chelba, Tara N. Sainath, Françoise Beaufays

    Abstract: In end-to-end (E2E) speech recognition models, a representational tight-coupling inevitably emerges between the encoder and the decoder. We build upon recent work that has begun to explore building encoders with modular encoded representations, such that encoders and decoders from different models can be stitched together in a zero-shot manner without further fine-tuning. While previous research o… ▽ More

    Submitted 31 March, 2023; originally announced April 2023.

  7. arXiv:2303.01037  [pdf, other

    cs.CL cs.SD eess.AS

    Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

    Authors: Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen, Bo Li, Vera Axelrod, Gary Wang, Zhong Meng, Ke Hu, Andrew Rosenberg, Rohit Prabhavalkar, Daniel S. Park, Parisa Haghani, Jason Riesa, Ginger Perng, Hagen Soltau, Trevor Strohman, Bhuvana Ramabhadran, Tara Sainath, Pedro Moreno, Chung-Cheng Chiu, Johan Schalkwyk , et al. (2 additional authors not shown)

    Abstract: We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million (M) hours spanning over 300 languages, and fine-tuning on a smaller labeled dataset. We use multilingual pre-training with random-projection quant… ▽ More

    Submitted 24 September, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: 20 pages, 7 figures, 8 tables

  8. arXiv:2302.01496  [pdf, ps, other

    cs.CL cs.LG cs.SD eess.AS

    Efficient Domain Adaptation for Speech Foundation Models

    Authors: Bo Li, Dongseong Hwang, Zhouyuan Huo, Junwen Bai, Guru Prakash, Tara N. Sainath, Khe Chai Sim, Yu Zhang, Wei Han, Trevor Strohman, Francoise Beaufays

    Abstract: Foundation models (FMs), that are trained on broad data at scale and are adaptable to a wide range of downstream tasks, have brought large interest in the research community. Benefiting from the diverse data sources such as different modalities, languages and application domains, foundation models have demonstrated strong generalization and knowledge transfer capabilities. In this paper, we presen… ▽ More

    Submitted 2 February, 2023; originally announced February 2023.

  9. arXiv:2209.06359  [pdf, other

    cs.LG cs.AI

    Federated Pruning: Improving Neural Network Efficiency with Federated Learning

    Authors: Rongmei Lin, Yonghui Xiao, Tien-Ju Yang, Ding Zhao, Li Xiong, Giovanni Motta, Françoise Beaufays

    Abstract: Automatic Speech Recognition models require large amount of speech data for training, and the collection of such data often leads to privacy concerns. Federated learning has been widely used and is considered to be an effective decentralized technique by collaboratively learning a shared prediction model while keeping the data local on different clients devices. However, the limited computation an… ▽ More

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: To appear in INTERSPEECH 2022

  10. arXiv:2205.03494  [pdf, other

    cs.LG

    Online Model Compression for Federated Learning with Large Models

    Authors: Tien-Ju Yang, Yonghui Xiao, Giovanni Motta, Françoise Beaufays, Rajiv Mathews, Mingqing Chen

    Abstract: This paper addresses the challenges of training large neural network models under federated learning settings: high on-device memory usage and communication cost. The proposed Online Model Compression (OMC) provides a framework that stores model parameters in a compressed format and decompresses them only when needed. We use quantization as the compression method in this paper and propose three me… ▽ More

    Submitted 6 May, 2022; originally announced May 2022.

    Comments: Submitted to INTERSPEECH 2022

  11. arXiv:2204.08345  [pdf, other

    cs.SD cs.CR cs.LG eess.AS

    Extracting Targeted Training Data from ASR Models, and How to Mitigate It

    Authors: Ehsan Amid, Om Thakkar, Arun Narayanan, Rajiv Mathews, Françoise Beaufays

    Abstract: Recent work has designed methods to demonstrate that model updates in ASR training can leak potentially sensitive attributes of the utterances used in computing the updates. In this work, we design the first method to demonstrate information leakage about training data from trained ASR models. We design Noise Masking, a fill-in-the-blank style method for extracting targeted parts of training data… ▽ More

    Submitted 27 June, 2022; v1 submitted 18 April, 2022; originally announced April 2022.

    Comments: Accepted to appear at Interspeech'22

  12. arXiv:2204.06322  [pdf, other

    eess.AS cs.CL cs.LG cs.SD

    Production federated keyword spotting via distillation, filtering, and joint federated-centralized training

    Authors: Andrew Hard, Kurt Partridge, Neng Chen, Sean Augenstein, Aishanee Shah, Hyun Jin Park, Alex Park, Sara Ng, Jessica Nguyen, Ignacio Lopez Moreno, Rajiv Mathews, Françoise Beaufays

    Abstract: We trained a keyword spotting model using federated learning on real user devices and observed significant improvements when the model was deployed for inference on phones. To compensate for data domains that are missing from on-device training caches, we employed joint federated-centralized training. And to learn in the absence of curated labels on-device, we formulated a confidence filtering str… ▽ More

    Submitted 29 June, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted to Interspeech 2022

  13. arXiv:2201.06469  [pdf, ps, other

    cs.CL

    Handling Compounding in Mobile Keyboard Input

    Authors: Andreas Kabel, Keith Hall, Tom Ouyang, David Rybach, Daan van Esch, Françoise Beaufays

    Abstract: This paper proposes a framework to improve the typing experience of mobile users in morphologically rich languages. Smartphone keyboards typically support features such as input decoding, corrections and predictions that all rely on language models. For latency reasons, these operations happen on device, so the models are of limited size and cannot easily cover all the words needed by users for th… ▽ More

    Submitted 17 January, 2022; originally announced January 2022.

    Comments: 7 pages

  14. arXiv:2111.00556  [pdf, other

    cs.LG cs.CL cs.CR

    Revealing and Protecting Labels in Distributed Training

    Authors: Trung Dang, Om Thakkar, Swaroop Ramaswamy, Rajiv Mathews, Peter Chin, Françoise Beaufays

    Abstract: Distributed learning paradigms such as federated learning often involve transmission of model updates, or gradients, over a network, thereby avoiding transmission of private data. However, it is possible for sensitive information about the training data to be revealed from such gradients. Prior works have demonstrated that labels can be revealed analytically from the last layer of certain models (… ▽ More

    Submitted 31 October, 2021; originally announced November 2021.

  15. arXiv:2110.05607  [pdf, other

    cs.LG cs.SD eess.AS

    Partial Variable Training for Efficient On-Device Federated Learning

    Authors: Tien-Ju Yang, Dhruv Guliani, Françoise Beaufays, Giovanni Motta

    Abstract: This paper aims to address the major challenges of Federated Learning (FL) on edge devices: limited memory and expensive communication. We propose a novel method, called Partial Variable Training (PVT), that only trains a small subset of variables on edge devices to reduce memory usage and communication cost. With PVT, we show that network accuracy can be maintained by utilizing more local trainin… ▽ More

    Submitted 11 October, 2021; originally announced October 2021.

  16. arXiv:2110.04267  [pdf, other

    cs.LG cs.CL cs.SD eess.AS

    Exploring Heterogeneous Characteristics of Layers in ASR Models for More Efficient Training

    Authors: Lillian Zhou, Dhruv Guliani, Andreas Kabel, Giovanni Motta, Françoise Beaufays

    Abstract: Transformer-based architectures have been the subject of research aimed at understanding their overparameterization and the non-uniform importance of their layers. Applying these approaches to Automatic Speech Recognition, we demonstrate that the state-of-the-art Conformer models generally have multiple ambient layers. We study the stability of these layers across runs and model sizes, propose tha… ▽ More

    Submitted 4 February, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

    Comments: \c{opyright} 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    MSC Class: 68T10 ACM Class: I.2.7

  17. arXiv:2110.03634  [pdf, other

    cs.LG cs.DC

    Enabling On-Device Training of Speech Recognition Models with Federated Dropout

    Authors: Dhruv Guliani, Lillian Zhou, Changwan Ryu, Tien-Ju Yang, Harry Zhang, Yonghui Xiao, Francoise Beaufays, Giovanni Motta

    Abstract: Federated learning can be used to train machine learning models on the edge on local data that never leave devices, providing privacy by default. This presents a challenge pertaining to the communication and computation costs associated with clients' devices. These costs are strongly correlated with the size of the model being trained, and are significant for state-of-the-art automatic speech reco… ▽ More

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: \c{opyright} 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses

    MSC Class: 68T10 ACM Class: I.2.7

  18. arXiv:2110.02220  [pdf, other

    eess.AS cs.AI cs.CL cs.LG cs.NE

    Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition

    Authors: Tsendsuren Munkhdalai, Khe Chai Sim, Angad Chandorkar, Fan Gao, Mason Chua, Trevor Strohman, Françoise Beaufays

    Abstract: Fast contextual adaptation has shown to be effective in improving Automatic Speech Recognition (ASR) of rare words and when combined with an on-device personalized training, it can yield an even better recognition result. However, the traditional re-scoring approaches based on an external language model is prone to diverge during the personalized training. In this work, we introduce a model-based… ▽ More

    Submitted 6 October, 2021; v1 submitted 4 October, 2021; originally announced October 2021.

    Comments: 5 pages, 3 figures, 3 tables

  19. arXiv:2110.00165  [pdf, other

    eess.AS cs.CL cs.LG cs.SD

    Large-scale ASR Domain Adaptation using Self- and Semi-supervised Learning

    Authors: Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He

    Abstract: Self- and semi-supervised learning methods have been actively investigated to reduce labeled training data or enhance the model performance. However, the approach mostly focus on in-domain performance for public datasets. In this study, we utilize the combination of self- and semi-supervised learning methods to solve unseen domain adaptation problem in a large-scale production setting for online A… ▽ More

    Submitted 15 February, 2022; v1 submitted 30 September, 2021; originally announced October 2021.

    Comments: ICASSP 2022 accepted, 5 pages, 2 figures, 5 tables

  20. arXiv:2110.00155  [pdf, other

    cs.SD cs.LG eess.AS

    Incremental Layer-wise Self-Supervised Learning for Efficient Speech Domain Adaptation On Device

    Authors: Zhouyuan Huo, Dongseong Hwang, Khe Chai Sim, Shefali Garg, Ananya Misra, Nikhil Siddhartha, Trevor Strohman, Françoise Beaufays

    Abstract: Streaming end-to-end speech recognition models have been widely applied to mobile devices and show significant improvement in efficiency. These models are typically trained on the server using transcribed speech data. However, the server data distribution can be very different from the data distribution on user devices, which could affect the model performance. There are two main challenges for on… ▽ More

    Submitted 30 September, 2021; originally announced October 2021.

    Comments: 5 pages

  21. arXiv:2109.13226  [pdf, other

    eess.AS cs.CL cs.LG cs.SD

    BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

    Authors: Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang , et al. (1 additional authors not shown)

    Abstract: We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled da… ▽ More

    Submitted 21 July, 2022; v1 submitted 27 September, 2021; originally announced September 2021.

    Comments: 14 pages, 7 figures, 13 tables; v2: minor corrections, reference baselines and bibliography updated; v3: corrections based on reviewer feedback, bibliography updated

  22. arXiv:2106.10259  [pdf, other

    eess.AS cs.CL cs.LG cs.SD

    On-Device Personalization of Automatic Speech Recognition Models for Disordered Speech

    Authors: Katrin Tomanek, Françoise Beaufays, Julie Cattiau, Angad Chandorkar, Khe Chai Sim

    Abstract: While current state-of-the-art Automatic Speech Recognition (ASR) systems achieve high accuracy on typical speech, they suffer from significant performance degradation on disordered speech and other atypical speech patterns. Personalization of ASR models, a commonly applied solution to this problem, is usually performed in a server-based training environment posing problems around data privacy, de… ▽ More

    Submitted 18 June, 2021; originally announced June 2021.

  23. arXiv:2104.07815  [pdf, other

    cs.CL cs.CR cs.LG

    A Method to Reveal Speaker Identity in Distributed ASR Training, and How to Counter It

    Authors: Trung Dang, Om Thakkar, Swaroop Ramaswamy, Rajiv Mathews, Peter Chin, Françoise Beaufays

    Abstract: End-to-end Automatic Speech Recognition (ASR) models are commonly trained over spoken utterances using optimization methods like Stochastic Gradient Descent (SGD). In distributed settings like Federated Learning, model training requires transmission of gradients over a network. In this work, we design the first method for revealing the identity of the speaker of a training utterance with access on… ▽ More

    Submitted 15 April, 2021; originally announced April 2021.

  24. Training Speech Recognition Models with Federated Learning: A Quality/Cost Framework

    Authors: Dhruv Guliani, Francoise Beaufays, Giovanni Motta

    Abstract: We propose using federated learning, a decentralized on-device learning paradigm, to train speech recognition models. By performing epochs of training on a per-user basis, federated learning must incur the cost of dealing with non-IID data distributions, which are expected to negatively affect the quality of the trained model. We propose a framework by which the degree of non-IID-ness can be varie… ▽ More

    Submitted 14 May, 2021; v1 submitted 29 October, 2020; originally announced October 2020.

    Comments: Paper published at ICASSP 2021

    MSC Class: 68T10 ACM Class: I.2.7

    Journal ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 3080-3084

  25. arXiv:2009.10031  [pdf, other

    cs.LG cs.CR stat.ML

    Training Production Language Models without Memorizing User Data

    Authors: Swaroop Ramaswamy, Om Thakkar, Rajiv Mathews, Galen Andrew, H. Brendan McMahan, Françoise Beaufays

    Abstract: This paper presents the first consumer-scale next-word prediction (NWP) model trained with Federated Learning (FL) while leveraging the Differentially Private Federated Averaging (DP-FedAvg) technique. There has been prior work on building practical FL infrastructure, including work demonstrating the feasibility of training language models on mobile devices using such infrastructure. It has also b… ▽ More

    Submitted 21 September, 2020; originally announced September 2020.

  26. arXiv:2006.07490  [pdf, other

    cs.LG cs.CL stat.ML

    Understanding Unintended Memorization in Federated Learning

    Authors: Om Thakkar, Swaroop Ramaswamy, Rajiv Mathews, Françoise Beaufays

    Abstract: Recent works have shown that generative sequence models (e.g., language models) have a tendency to memorize rare or unique sequences in the training data. Since useful models are often trained on sensitive data, to ensure the privacy of the training data it is critical to identify and mitigate such unintended memorization. Federated Learning (FL) has emerged as a novel framework for large-scale di… ▽ More

    Submitted 12 June, 2020; originally announced June 2020.

  27. arXiv:2006.01416  [pdf, other

    cs.CL cs.SD eess.AS

    Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer

    Authors: Yuan Shangguan, Kate Knister, Yanzhang He, Ian McGraw, Francoise Beaufays

    Abstract: The demand for fast and accurate incremental speech recognition increases as the applications of automatic speech recognition (ASR) proliferate. Incremental speech recognizers output chunks of partially recognized words while the user is still talking. Partial results can be revised before the ASR finalizes its hypothesis, causing instability issues. We analyze the quality and stability of on-devi… ▽ More

    Submitted 14 August, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

    Comments: Accepted at Interspeech 2020

  28. arXiv:2001.08885  [pdf, other

    eess.AS cs.LG cs.SD stat.ML

    Low-rank Gradient Approximation For Memory-Efficient On-device Training of Deep Neural Network

    Authors: Mary Gooneratne, Khe Chai Sim, Petr Zadrazil, Andreas Kabel, Françoise Beaufays, Giovanni Motta

    Abstract: Training machine learning models on mobile devices has the potential of improving both privacy and accuracy of the models. However, one of the major obstacles to achieving this goal is the memory limitation of mobile devices. Reducing training memory enables models with high-dimensional weight matrices, like automatic speech recognition (ASR) models, to be trained on-device. In this paper, we prop… ▽ More

    Submitted 24 January, 2020; originally announced January 2020.

  29. arXiv:1912.09251  [pdf, other

    eess.AS cs.LG cs.SD stat.ML

    Personalization of End-to-end Speech Recognition On Mobile Devices For Named Entities

    Authors: Khe Chai Sim, Françoise Beaufays, Arnaud Benard, Dhruv Guliani, Andreas Kabel, Nikhil Khare, Tamar Lucassen, Petr Zadrazil, Harry Zhang, Leif Johnson, Giovanni Motta, Lillian Zhou

    Abstract: We study the effectiveness of several techniques to personalize end-to-end speech models and improve the recognition of proper names relevant to the user. These techniques differ in the amounts of user effort required to provide supervision, and are evaluated on how they impact speech recognition performance. We propose using keyword-dependent precision and recall metrics to measure vocabulary acq… ▽ More

    Submitted 14 December, 2019; originally announced December 2019.

  30. arXiv:1912.01218  [pdf

    cs.HC cs.CL

    Writing Across the World's Languages: Deep Internationalization for Gboard, the Google Keyboard

    Authors: Daan van Esch, Elnaz Sarbar, Tamar Lucassen, Jeremy O'Brien, Theresa Breiner, Manasa Prasad, Evan Crew, Chieu Nguyen, Françoise Beaufays

    Abstract: This technical report describes our deep internationalization program for Gboard, the Google Keyboard. Today, Gboard supports 900+ language varieties across 70+ writing systems, and this report describes how and why we have been adding support for hundreds of language varieties from around the globe. Many languages of the world are increasingly used in writing on an everyday basis, and we describe… ▽ More

    Submitted 3 December, 2019; originally announced December 2019.

  31. arXiv:1910.10252  [pdf, other

    cs.LG stat.ML

    Federated Evaluation of On-device Personalization

    Authors: Kangkang Wang, Rajiv Mathews, Chloé Kiddon, Hubert Eichner, Françoise Beaufays, Daniel Ramage

    Abstract: Federated learning is a distributed, on-device computation framework that enables training global models without exporting sensitive user data to servers. In this work, we describe methods to extend the federation framework to evaluate strategies for personalization of global models. We present tools to analyze the effects of personalization and evaluate conditions under which personalization yiel… ▽ More

    Submitted 22 October, 2019; originally announced October 2019.

    Comments: 4 pages, 4 figures

  32. arXiv:1910.03432  [pdf, other

    cs.CL cs.LG

    Federated Learning of N-gram Language Models

    Authors: Mingqing Chen, Ananda Theertha Suresh, Rajiv Mathews, Adeline Wong, Cyril Allauzen, Françoise Beaufays, Michael Riley

    Abstract: We propose algorithms to train production-quality n-gram language models using federated learning. Federated learning is a distributed computation platform that can be used to train global models for portable devices such as smart phones. Federated learning is especially relevant for applications handling privacy-sensitive data, such as virtual keyboards, because training is performed without the… ▽ More

    Submitted 8 October, 2019; originally announced October 2019.

    Comments: 10 pages

  33. arXiv:1909.06678  [pdf, other

    eess.AS cs.LG cs.SD stat.ML

    An Investigation Into On-device Personalization of End-to-end Automatic Speech Recognition Models

    Authors: Khe Chai Sim, Petr Zadrazil, Françoise Beaufays

    Abstract: Speaker-independent speech recognition systems trained with data from many users are generally robust against speaker variability and work well for a large population of speakers. However, these systems do not always generalize well for users with very different speech characteristics. This issue can be addressed by building personalized systems that are designed to work well for each specific use… ▽ More

    Submitted 14 September, 2019; originally announced September 2019.

  34. arXiv:1906.04329  [pdf, other

    cs.CL cs.LG

    Federated Learning for Emoji Prediction in a Mobile Keyboard

    Authors: Swaroop Ramaswamy, Rajiv Mathews, Kanishka Rao, Françoise Beaufays

    Abstract: We show that a word-level recurrent neural network can predict emoji from text typed on a mobile keyboard. We demonstrate the usefulness of transfer learning for predicting emoji by pretraining the model using a language modeling task. We also propose mechanisms to trigger emoji and tune the diversity of candidates. The model is trained using a distributed on-device learning framework called feder… ▽ More

    Submitted 10 June, 2019; originally announced June 2019.

  35. arXiv:1903.10635  [pdf, other

    cs.CL

    Federated Learning Of Out-Of-Vocabulary Words

    Authors: Mingqing Chen, Rajiv Mathews, Tom Ouyang, Françoise Beaufays

    Abstract: We demonstrate that a character-level recurrent neural network is able to learn out-of-vocabulary (OOV) words under federated learning settings, for the purpose of expanding the vocabulary of a virtual keyboard for smartphones without exporting sensitive text to servers. High-frequency words can be sampled from the trained generative model by drawing from the joint posterior directly. We study the… ▽ More

    Submitted 25 March, 2019; originally announced March 2019.

  36. arXiv:1812.02903  [pdf, other

    cs.LG stat.ML

    Applied Federated Learning: Improving Google Keyboard Query Suggestions

    Authors: Timothy Yang, Galen Andrew, Hubert Eichner, Haicheng Sun, Wei Li, Nicholas Kong, Daniel Ramage, Françoise Beaufays

    Abstract: Federated learning is a distributed form of machine learning where both the training data and model training are decentralized. In this paper, we use federated learning in a commercial, global-scale setting to train, evaluate and deploy a model to improve virtual keyboard search suggestion quality without direct access to the underlying user data. We describe our observations in federated training… ▽ More

    Submitted 6 December, 2018; originally announced December 2018.

  37. arXiv:1811.03604  [pdf, other

    cs.CL

    Federated Learning for Mobile Keyboard Prediction

    Authors: Andrew Hard, Kanishka Rao, Rajiv Mathews, Swaroop Ramaswamy, Françoise Beaufays, Sean Augenstein, Hubert Eichner, Chloé Kiddon, Daniel Ramage

    Abstract: We train a recurrent neural network language model using a distributed, on-device learning framework called federated learning for the purpose of next-word prediction in a virtual keyboard for smartphones. Server-based training using stochastic gradient descent is compared with training on client devices using the Federated Averaging algorithm. The federated algorithm, which enables training on a… ▽ More

    Submitted 28 February, 2019; v1 submitted 8 November, 2018; originally announced November 2018.

    Comments: 7 pages, 4 figures

  38. arXiv:1704.03987  [pdf, other

    cs.CL

    Mobile Keyboard Input Decoding with Finite-State Transducers

    Authors: Tom Ouyang, David Rybach, Françoise Beaufays, Michael Riley

    Abstract: We propose a finite-state transducer (FST) representation for the models used to decode keyboard inputs on mobile devices. Drawing from learnings from the field of speech recognition, we describe a decoding framework that can satisfy the strict memory and latency constraints of keyboard input. We extend this framework to support functionalities typically not present in speech recognition, such as… ▽ More

    Submitted 13 April, 2017; originally announced April 2017.

  39. arXiv:1603.03185  [pdf, other

    cs.CL cs.LG cs.SD

    Personalized Speech recognition on mobile devices

    Authors: Ian McGraw, Rohit Prabhavalkar, Raziel Alvarez, Montse Gonzalez Arenas, Kanishka Rao, David Rybach, Ouais Alsharif, Hasim Sak, Alexander Gruenstein, Francoise Beaufays, Carolina Parada

    Abstract: We describe a large vocabulary speech recognition system that is accurate, has low latency, and yet has a small enough memory and computational footprint to run faster than real-time on a Nexus 5 Android smartphone. We employ a quantized Long Short-Term Memory (LSTM) acoustic model trained with connectionist temporal classification (CTC) to directly predict phoneme targets, and further reduce its… ▽ More

    Submitted 11 March, 2016; v1 submitted 10 March, 2016; originally announced March 2016.

  40. arXiv:1507.06947  [pdf, other

    cs.CL cs.LG cs.NE stat.ML

    Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition

    Authors: Haşim Sak, Andrew Senior, Kanishka Rao, Françoise Beaufays

    Abstract: We have recently shown that deep Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) outperform feed forward deep neural networks (DNNs) as acoustic models for speech recognition. More recently, we have shown that the performance of sequence trained context dependent (CD) hidden Markov model (HMM) acoustic models using such LSTM RNNs can be equaled by sequence trained phone models initi… ▽ More

    Submitted 24 July, 2015; originally announced July 2015.

    Comments: To be published in the INTERSPEECH 2015 proceedings

  41. arXiv:1402.1128  [pdf, other

    cs.NE cs.CL cs.LG stat.ML

    Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition

    Authors: Haşim Sak, Andrew Senior, Françoise Beaufays

    Abstract: Long Short-Term Memory (LSTM) is a recurrent neural network (RNN) architecture that has been designed to address the vanishing and exploding gradient problems of conventional RNNs. Unlike feedforward neural networks, RNNs have cyclic connections making them powerful for modeling sequences. They have been successfully used for sequence labeling and sequence prediction tasks, such as handwriting rec… ▽ More

    Submitted 5 February, 2014; originally announced February 2014.