A Study of Different Ways to Use The Conformer Model For Spoken Language Understanding

Wang, Nick J. C.; Wang, Shaojun; Xiao, Jing

arXiv:2204.03879 (cs)

[Submitted on 8 Apr 2022]

Abstract: Spoken language understanding (SLU) combines automatic speech recognition (ASR) and natural language understanding (NLU) to accomplish speech-to-intent understanding. In this paper, we compare different ways to combine ASR and NLU, in particular using a single Conformer model whose components are used in different ways, to better understand the strengths and weaknesses of each approach. We find that the choice between two-stage decoding and an end-to-end system does not by itself determine the best system for research or application: system optimization still entails carefully improving the performance of each component, and it is difficult to prove that one direction is conclusively better than the other. We also propose a novel connectionist temporal summarization (CTS) method that reduces the length of acoustic encoding sequences while improving the accuracy and processing speed of end-to-end models. This method matches the intent accuracy of the best two-stage SLU system, which relies on complicated and time-consuming decoding, at lower computational cost. The resulting stacked end-to-end SLU system yields an intent accuracy of 93.97% on the SmartLights far-field set, 95.18% on the SmartLights close-field set, and 99.71% on FluentSpeech.
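
The abstract does not say how CTS shortens the acoustic encoding sequence. As a rough illustration only, the sketch below assumes one plausible reading by analogy with CTC greedy decoding: frames whose per-frame label is blank are dropped, and each run of consecutive frames sharing the same non-blank label is averaged into a single vector. The function name `summarize_frames`, the `BLANK` index, and the averaging rule are all assumptions for this example, not details taken from the paper.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)


def summarize_frames(
    frames: Sequence[Sequence[float]],
    labels: Sequence[int],
    blank: int = BLANK,
) -> List[List[float]]:
    """Shorten a frame sequence using per-frame labels.

    Runs of consecutive frames with the same non-blank label are averaged
    into one vector; frames labelled as blank are discarded. This is an
    assumed summarization rule, not the paper's definition of CTS.
    """
    if len(frames) != len(labels):
        raise ValueError("frames and labels must have the same length")

    summarized: List[List[float]] = []
    start = 0
    for label, run in groupby(labels):
        run_length = len(list(run))
        segment = frames[start:start + run_length]
        start += run_length
        if label == blank:
            continue  # blank frames contribute nothing to the summary
        dim = len(segment[0])
        # Element-wise mean over the frames in this run.
        summarized.append(
            [sum(vec[d] for vec in segment) / run_length for d in range(dim)]
        )
    return summarized


# Toy input: five 2-dimensional encoder frames with per-frame argmax labels.
toy_frames = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [5.0, 6.0], [9.0, 9.0]]
toy_labels = [7, 7, 0, 7, 3]
toy_summary = summarize_frames(toy_frames, toy_labels)
```

In this toy case five frames shrink to three vectors, so any downstream intent classifier would process a shorter sequence. Whether the paper averages, selects, or otherwise pools frames is not stated in the abstract.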

Comments:	Submitted to INTERSPEECH 2022. (5 pages, 1 figure.)
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2204.03879 [cs.CL]
	(or arXiv:2204.03879v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2204.03879

Submission history

From: Nick Wang J.C.
[v1] Fri, 8 Apr 2022 07:12:11 UTC (91 KB)
