Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

Amodei, Dario; Anubhai, Rishita; Battenberg, Eric; Case, Carl; Casper, Jared; Catanzaro, Bryan; Chen, Jingdong; Chrzanowski, Mike; Coates, Adam; Diamos, Greg; Elsen, Erich; Engel, Jesse; Fan, Linxi; Fougner, Christopher; Han, Tony; Hannun, Awni; Jun, Billy; LeGresley, Patrick; Lin, Libby; Narang, Sharan; Ng, Andrew; Ozair, Sherjil; Prenger, Ryan; Raiman, Jonathan; Satheesh, Sanjeev; Seetapun, David; Sengupta, Shubho; Wang, Yi; Wang, Zhiqian; Wang, Chong; Xiao, Bo; Yogatama, Dani; Zhan, Jun; Zhu, Zhenyao

arXiv:1512.02595 (cs)

[Submitted on 8 Dec 2015]

Abstract: We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our application of HPC techniques, resulting in a 7x speedup over our previous system. Because of this efficiency, experiments that previously took weeks now run in days. This enables us to iterate more quickly to identify superior architectures and algorithms. As a result, in several cases, our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1512.02595 [cs.CL]
	(or arXiv:1512.02595v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1512.02595

Submission history

From: Awni Hannun
[v1] Tue, 8 Dec 2015 19:13:50 UTC (871 KB)
