
Language Proficiency and F0 Entrainment: A Study of L2 English Imitation in Italian, French, and Slovak Speakers

Authors: Zheng Yuan, Štefan Beňuš, Alessandro D'Ausilio

Abstract: This study explores F0 entrainment in second language (L2) English speech imitation during an Alternating Reading Task (ART). Participants with Italian, French, and Slovak native languages imitated English utterances, and their F0 entrainment was quantified using the Dynamic Time Warping (DTW) distance between the parameterized F0 contours of the imitated utterances and those of the model utterances. Results indicate a nuanced relationship between L2 English proficiency and entrainment: speakers with higher proficiency generally exhibit less entrainment in pitch variation and declination. However, within dyads, the more proficient speakers demonstrate a greater ability to mimic pitch range, leading to increased entrainment. This suggests that proficiency influences entrainment differently at individual and dyadic levels, highlighting the complex interplay between language skill and prosodic adaptation.

Submitted 16 April, 2024; originally announced April 2024.

Comments: Accepted at Speech Prosody 2024
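
The abstract above measures entrainment as a DTW distance between F0 contours. As a rough illustration only, the following sketch computes a textbook DTW distance between two short numeric sequences. It is not the authors' pipeline: the function name `dtw_distance`, the absolute-difference local cost, and the toy contour values are all assumptions, and the paper's contour parameterization is not reproduced here.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two 1-D sequences.

    Assumption: local cost is the absolute difference between samples,
    and no warping-window constraint is applied.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # prev[j] holds the best accumulated cost for aligning a[:i-1] with b[:j].
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


# Hypothetical F0 contours (arbitrary units), for illustration only.
model_contour = [1.0, 2.0, 3.0, 2.0]
imitation_contour = [1.0, 2.0, 2.0, 3.0, 2.0]
distance = dtw_distance(model_contour, imitation_contour)
```

Under this kind of measure a smaller distance means the imitated contour follows the model contour more closely, which is the sense in which a lower DTW distance would indicate more entrainment.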

arXiv:2404.02710 [pdf, other]

ART: The Alternating Reading Task Corpus for Speech Entrainment and Imitation

Authors: Zheng Yuan, Dorina de Jong, Štefan Beňuš, Noël Nguyen, Ruitao Feng, Róbert Sabo, Luciano Fadiga, Alessandro D'Ausilio

Abstract: We introduce the Alternating Reading Task (ART) Corpus, a collection of dyadic sentence reading for studying the entrainment and imitation behaviour in speech communication. The ART corpus features three experimental conditions - solo reading, alternating reading, and deliberate imitation - as well as three sub-corpora encompassing French-, Italian-, and Slovak-accented English. This design allows systematic investigation of speech entrainment in a controlled and less-spontaneous setting. Alongside detailed transcriptions, it includes English proficiency scores, demographics, and in-experiment questionnaires for probing linguistic, personal and interpersonal influences on entrainment. Our presentation covers its design, collection, annotation processes, initial analysis, and future research prospects.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 15 pages, 2 figures, 7 tables, accepted at LREC-COLING 2024 conference

arXiv:2312.16599 [pdf, ps, other]

doi 10.21437/Interspeech.2023-1947

Relationship between auditory and semantic entrainment using Deep Neural Networks (DNN)

Authors: Jay Kejriwal, Štefan Beňuš

Abstract: The tendency of people to engage in similar, matching, or synchronized behaviour when interacting is known as entrainment. Many studies examined linguistic (syntactic and lexical structures) and paralinguistic (pitch, intensity) entrainment, but less attention was given to finding the relationship between them. In this study, we utilized state-of-the-art DNN embeddings such as BERT and TRIpLet Loss network (TRILL) vectors to extract features for measuring semantic and auditory similarities of turns within dialogues in two comparable spoken corpora of two different languages. We found people's tendency to entrain on semantic features more when compared to auditory features. Additionally, we found that entrainment in semantic and auditory linguistic features are positively correlated. The findings of this study might assist in implementing the mechanism of entrainment in human-machine interaction (HMI).

Submitted 27 December, 2023; originally announced December 2023.

Comments: Interspeech 2023
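
The abstract above compares turns within a dialogue by the similarity of their embeddings. One common way to score such similarity is the cosine between the embedding vectors of adjacent turns. The sketch below is a minimal illustration under that assumption. The abstract does not state which similarity measure the authors used, the helper names are hypothetical, and the tiny vectors stand in for real BERT or TRILL embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def adjacent_turn_similarity(turn_embeddings):
    """Similarity of each turn to the turn that immediately follows it."""
    return [
        cosine_similarity(a, b)
        for a, b in zip(turn_embeddings, turn_embeddings[1:])
    ]


# Hypothetical 2-D "embeddings" for three consecutive turns.
turns = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
similarities = adjacent_turn_similarity(turns)
```

With real embeddings, the same scores could be computed separately for the semantic and the auditory vectors of each turn pair and then correlated, which is the kind of relationship the abstract reports.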

arXiv:2312.15098 [pdf, other]

doi 10.21437/Interspeech.2023-1929

Unsupervised Auditory and Semantic Entrainment Models with Deep Neural Networks

Authors: Jay Kejriwal, Stefan Benus, Lina M. Rojas-Barahona

Abstract: Speakers tend to engage in adaptive behavior, known as entrainment, when they become similar to their interlocutor in various aspects of speaking. We present an unsupervised deep learning framework that derives meaningful representation from textual features for developing semantic entrainment. We investigate the model's performance by extracting features using different variations of the BERT model (DistilBERT and XLM-RoBERTa) and Google's universal sentence encoder (USE) embeddings on two human-human (HH) corpora (The Fisher Corpus English Part 1, Columbia games corpus) and one human-machine (HM) corpus (Voice Assistant Conversation Corpus (VACC)). In addition to semantic features we also trained DNN-based models utilizing two auditory embeddings (TRIpLet Loss network (TRILL) vectors, Low-level descriptors (LLD) features) and two units of analysis (Inter pausal unit and Turn). The results show that semantic entrainment can be assessed with our model, that models can distinguish between HH and HM interactions and that the two units of analysis for extracting acoustic features provide comparable findings.

Submitted 22 December, 2023; originally announced December 2023.

Comments: Interspeech 2023

arXiv:1805.11564 [pdf, other]

doi 10.1016/j.specom.2018.04.009

Entrainment profiles: Comparison by gender, role, and feature set

Authors: Uwe D. Reichel, Štefan Beňuš, Katalin Mády

Abstract: We examine prosodic entrainment in cooperative game dialogs for new feature sets describing register, pitch accent shape, and rhythmic aspects of utterances. For these as well as for established features we present entrainment profiles to detect within- and across-dialog entrainment by the speakers' gender and role in the game. It turned out that feature sets undergo entrainment in different quantitative and qualitative ways, which can partly be attributed to their different functions. Furthermore, interactions between speaker gender and role (describer vs. follower) suggest gender-dependent strategies in cooperative solution-oriented interactions: female describers entrain most, male describers least. Our data suggests a slight advantage of the latter strategy on task success.

Submitted 29 May, 2018; originally announced May 2018.

Comments: Accepted Manuscript for Speech Communication (Elsevier), 25 April 2018

Journal ref: U.D. Reichel, Š. Beňuš, K. Mády. Entrainment profiles: Comparison by gender, role, and feature set. Speech Communication, 100:46-57, 2018

arXiv:1601.05991 [pdf, other]

doi 10.1016/j.csl.2016.10.001

Speech vocoding for laboratory phonology

Authors: Milos Cernak, Stefan Benus, Alexandros Lazaridis

Abstract: Using phonological speech vocoding, we propose a platform for exploring relations between phonology and speech processing, and in broader terms, for exploring relations between the abstract and physical structures of a speech signal. Our goal is to make a step towards bridging phonology and speech processing and to contribute to the program of Laboratory Phonology. We show three application examples for laboratory phonology: compositional phonological speech modelling, a comparison of phonological systems and an experimental phonological parametric text-to-speech (TTS) system. The featural representations of the following three phonological systems are considered in this work: (i) Government Phonology (GP), (ii) the Sound Pattern of English (SPE), and (iii) the extended SPE (eSPE). Comparing GP- and eSPE-based vocoded speech, we conclude that the latter achieves slightly better results than the former. However, GP - the most compact phonological speech representation - performs comparably to the systems with a higher number of phonological features. The parametric TTS based on phonological speech representation, and trained from an unlabelled audiobook in an unsupervised manner, achieves intelligibility of 85% of the state-of-the-art parametric speech synthesis. We envision that the presented approach paves the way for researchers in both fields to form meaningful hypotheses that are explicitly testable using the concepts developed and exemplified in this paper. On the one hand, laboratory phonologists might test the applied concepts of their theoretical models, and on the other hand, the speech processing community may utilize the concepts developed for the theoretical phonological models for improvements of the current state-of-the-art applications.

Submitted 15 September, 2016; v1 submitted 22 January, 2016; originally announced January 2016.

Report number: Idiap-RR-07-2016

Journal ref: Computer Speech & Language, Volume 42, March 2017, Pages 100-121

Showing 1–6 of 6 results for author: Benus, S