Alternative Visual Units for an Optimized Phoneme-Based Lipreading System

Bear, Helen; Harvey, Richard

doi:10.3390/app9183870

Electrical Engineering and Systems Science > Image and Video Processing

arXiv:1909.07147 (eess)

[Submitted on 16 Sep 2019]

Abstract: Lipreading is understanding speech from observed lip movements. An observed series of lip motions is an ordered sequence of visual lip gestures. These gestures are commonly known as 'visemes', although they are not yet formally defined. In this article, we describe a structured approach which allows us to create speaker-dependent viseme sets, each containing a fixed number of visemes. We create viseme sets of every size from two to 45. Each set is based upon clustering phonemes, so each set has a unique phoneme-to-viseme mapping. We first present an experiment using these maps and the Resource Management Audio-Visual (RMAV) dataset, which shows the effect of changing the viseme map size in speaker-dependent machine lipreading and demonstrates that word recognition with phoneme classifiers is possible. Furthermore, we show that there are intermediate units between visemes and phonemes which are better still. Second, we present a novel two-pass training scheme for phoneme classifiers. This approach uses our new intermediary visual units from our first experiment in the first pass as classifiers; before using the phoneme-to-viseme maps, we retrain these into phoneme classifiers. This method significantly improves on previous lipreading results with RMAV speakers.
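The idea of clustering phonemes into a viseme set of a chosen size can be illustrated with a minimal sketch. The abstract does not say which clustering procedure is used, so the greedy merge below is only an assumption, as are the toy confusion counts and the names `cluster_phonemes`, `PHONEMES`, and `CONFUSION`; none of them come from the paper. The sketch repeatedly merges the two clusters whose phonemes are most often confused with each other until the requested number of clusters remains, then returns the resulting phoneme-to-viseme map.

```python
from itertools import combinations

# Hypothetical toy data: six phonemes and made-up confusion counts.
# CONFUSION[(x, y)] is how often phoneme x was recognised as phoneme y.
PHONEMES = ["p", "b", "m", "f", "v", "t"]
CONFUSION = {
    ("p", "b"): 9, ("b", "p"): 8, ("p", "m"): 6, ("m", "b"): 5,
    ("f", "v"): 7, ("v", "f"): 9, ("t", "f"): 1,
}


def cluster_phonemes(phonemes, confusion, n_visemes):
    """Greedily merge phoneme clusters until n_visemes clusters remain.

    This is an assumed illustration, not the authors' algorithm.
    Returns a dict mapping each phoneme to a viseme label such as 'V01'.
    """
    clusters = [[p] for p in phonemes]

    def similarity(a, b):
        # Mean two-way confusion count between members of clusters a and b.
        total = sum(
            confusion.get((x, y), 0) + confusion.get((y, x), 0)
            for x in a
            for y in b
        )
        return total / (len(a) * len(b))

    while len(clusters) > n_visemes:
        # Pick the pair of clusters that is most mutually confusable.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: similarity(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected

    return {p: f"V{k + 1:02d}" for k, c in enumerate(clusters) for p in c}


print(cluster_phonemes(PHONEMES, CONFUSION, 3))
# {'p': 'V01', 'b': 'V01', 'm': 'V01', 'f': 'V02', 'v': 'V02', 't': 'V03'}
```

Calling the function once for each target size would give one phoneme-to-viseme map per set size, which mirrors the range of set sizes the abstract describes; with the same number of clusters as phonemes, every phoneme keeps its own class.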

Comments:	Accepted and published in Applied Sciences, 22 pages plus appendices and references
Subjects:	Image and Video Processing (eess.IV); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:1909.07147 [eess.IV]
	(or arXiv:1909.07147v1 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.1909.07147
Journal reference:	Applied Sciences 2019, 9(18), 3870
Related DOI:	https://doi.org/10.3390/app9183870

Submission history

From: Helen L Bear
[v1] Mon, 16 Sep 2019 12:20:54 UTC (12,128 KB)
