-
Finding phonemes: improving machine lip-reading
Abstract: In machine lip-reading there is continued debate and research around the correct classes to be used for recognition. In this paper we use a structured approach for devising speaker-dependent viseme classes, which enables the creation of a set of phoneme-to-viseme maps where each has a different quantity of visemes ranging from two to 45. Viseme classes are based upon the mapping of articulated pho…
Submitted 3 October, 2017; originally announced October 2017.
Journal ref: Helen L. Bear, Richard W. Harvey, Yuxuan Lan. Finding phonemes: improving machine lip-reading. Audio-Visual Speech Processing (AVSP), 2015, p115-120
-
Speaker-independent machine lip-reading with speaker-dependent viseme classifiers
Abstract: In machine lip-reading, which is identification of speech from visual-only information, there is evidence to show that visual speech is highly dependent upon the speaker [1]. Here, we use a phoneme-clustering method to form new phoneme-to-viseme maps for both individual and multiple speakers. We use these maps to examine how similarly speakers talk visually. We conclude that broadly speaking, spea…
Submitted 3 October, 2017; originally announced October 2017.
Journal ref: Helen L. Bear, Stephen J. Cox, Richard W. Harvey. Speaker-independent machine lip-reading with speaker-dependent viseme classifiers. Audio-Visual Speech Processing (AVSP), 2015, p190-195
-
Which phoneme-to-viseme maps best improve visual-only computer lip-reading?
Abstract: A critical assumption of all current visual speech recognition systems is that there are visual speech units called visemes which can be mapped to units of acoustic speech, the phonemes. Despite there being a number of published maps it is infrequent to see the effectiveness of these tested, particularly on visual-only lip-reading (many works use audio-visual speech). Here we examine 120 mappings…
Submitted 3 October, 2017; originally announced October 2017.
Journal ref: Helen L. Bear, Richard W. Harvey, Barry-John Theobald, Yuxuan Lan. Which phoneme-to-viseme maps best improve visual-only computer lip-reading? Advances in Visual Computing, 2014, p230-239