Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands

Dai, Wenliang; Cahyawijaya, Samuel; Yu, Tiezheng; Barezi, Elham J; Fung, Pascale

arXiv:2207.02663 (cs)

[Submitted on 6 Jul 2022]

Abstract: With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component that facilitates driving and provides extra functionality. In-car smart assistants should be able to process general as well as car-related commands and perform the corresponding actions, which eases driving and improves safety. However, most datasets in this research field are in major languages, such as English and Chinese. Low-resource languages suffer from severe data scarcity, which hinders the development of research and applications for broader communities. It is therefore crucial to have more benchmarks to raise awareness and motivate research on low-resource languages. To mitigate this problem, we collect a new dataset, Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR), for in-car speech recognition in Cantonese with video and audio data. Alongside it, we propose Cantonese Audio-Visual Speech Recognition for In-car Commands as a new challenge for the community to tackle low-resource speech recognition in in-car scenarios.

Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2207.02663 [cs.CL]
	(or arXiv:2207.02663v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2207.02663

Submission history

From: Wenliang Dai
[v1] Wed, 6 Jul 2022 13:31:56 UTC (1,250 KB)
