Towards human-like spoken dialogue generation between AI agents from written dialogue

Mitsui, Kentaro; Hono, Yukiya; Sawada, Kei

arXiv:2310.01088 (cs)

[Submitted on 2 Oct 2023]

Abstract: The advent of large language models (LLMs) has made it possible to generate natural written dialogues between two agents. However, generating human-like spoken dialogues from these written dialogues remains challenging. Spoken dialogues have several unique characteristics: they frequently include backchannels and laughter, and the smoothness of turn-taking significantly influences the fluidity of conversation. This study proposes CHATS - CHatty Agents Text-to-Speech - a discrete token-based system designed to generate spoken dialogues based on written dialogues. Our system can generate speech for both the speaker side and the listener side simultaneously, using only the transcription from the speaker side, which eliminates the need for transcriptions of backchannels or laughter. Moreover, CHATS facilitates natural turn-taking; it determines the appropriate duration of silence after each utterance in the absence of overlap, and it initiates the generation of overlapping speech based on the phoneme sequence of the next utterance in case of overlap. Experimental evaluations indicate that CHATS outperforms the text-to-speech baseline, producing spoken dialogues that are more interactive and fluid while retaining clarity and intelligibility.

Comments:	18 pages, 8 figures, 9 tables, audio samples: this https URL
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2310.01088 [cs.CL]
	(or arXiv:2310.01088v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.01088

Submission history

From: Kentaro Mitsui
[v1] Mon, 2 Oct 2023 11:03:20 UTC (765 KB)
