Deep Learning for Lip Reading using Audio-Visual Information for Urdu Language

Faisal, M; Manzoor, Sanaullah

arXiv:1802.05521 (cs)

[Submitted on 15 Feb 2018]

Abstract: Human lip reading is a challenging task. It requires not only knowledge of the underlying language but also visual cues to predict spoken words. Even experts need a certain level of experience and an understanding of visual expressions to decode spoken words. With the help of deep learning, it is now possible to translate lip sequences into meaningful words. Visual information can improve speech recognition in noisy environments [1]. To demonstrate this, in this project we attempted to train two different deep-learning models for lip reading: the first operates on video sequences, using a spatiotemporal convolutional neural network, a bidirectional gated recurrent neural network, and Connectionist Temporal Classification (CTC) loss; the second operates on audio, feeding MFCC features to a layer of LSTM cells that outputs the sequence. We also collected a small audio-visual dataset to train and test our models. Our goal is to integrate both models to improve speech recognition in noisy environments.
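
The abstract names CTC loss for the video model but does not say how per-frame outputs become a word sequence. As a minimal sketch only, and not the authors' implementation, the following shows best-path (greedy) CTC decoding: take the top class per frame, merge consecutive repeats, then drop blanks. The blank index `0`, the toy alphabet, the function name `ctc_greedy_decode`, and the score values are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(frame_scores, alphabet):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring class index at each time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Merge runs of identical consecutive indices into one.
    collapsed = [idx for idx, _ in groupby(best_path)]
    # Remove blanks and map the remaining indices to symbols.
    return "".join(alphabet[idx] for idx in collapsed if idx != BLANK)


# Toy alphabet: index 0 is the blank, followed by two symbols.
ALPHABET = ["", "a", "b"]

# Made-up per-frame scores for six video frames (not real model output).
scores = [
    [0.10, 0.80, 0.10],  # "a"
    [0.10, 0.70, 0.20],  # "a" again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank, separates the two "a" symbols
    [0.20, 0.70, 0.10],  # "a", kept because a blank came before it
    [0.10, 0.20, 0.70],  # "b"
    [0.80, 0.10, 0.10],  # blank
]

decoded = ctc_greedy_decode(scores, ALPHABET)
print(decoded)
```

The blank between the second and fourth frames is what allows a doubled symbol to survive the merge step; without it, the repeated "a" frames would collapse into a single symbol.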

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:1802.05521 [cs.CV]
	(or arXiv:1802.05521v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1802.05521

Submission history

From: Sanaullah Manzoor [view email]
[v1] Thu, 15 Feb 2018 13:28:19 UTC (651 KB)
