ReverBERT: A State Space Model for Efficient Text-Driven Speech Style Transfer

Brown, Michael; Martinez, Sofia; Singh, Priya

Abstract: Text-driven speech style transfer aims to reshape the intonation, pace, and timbre of a spoken utterance to match stylistic cues given in a text description. Existing methods rely on large-scale neural architectures or pre-trained language models, and their computational cost often remains high. In this paper, we present \emph{ReverBERT}, an efficient framework for text-driven speech style transfer built on the state space model (SSM) paradigm and loosely motivated by the image-based method of Wang and Liu~\cite{wang2024stylemamba}. Unlike image-domain techniques, our method operates in the speech domain and applies a discrete Fourier transform to latent speech features to enable smooth, continuous style modulation. We also propose a novel \emph{Transformer-based SSM} layer that bridges textual style descriptors and acoustic attributes, dramatically reducing inference time while preserving high-quality speech characteristics. Extensive experiments on benchmark speech corpora demonstrate that \emph{ReverBERT} significantly outperforms baselines in naturalness, expressiveness, and computational efficiency. We release our model and code publicly to foster further research in text-driven speech style transfer.
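
The abstract does not specify how the discrete Fourier transform is applied to the latent features, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a single latent channel stored as a list of floats over time, and a hypothetical scalar `gain` (standing in for a text-derived style strength) that scales the low-frequency DFT bins up to a hypothetical `cutoff` index. The names `dft`, `idft`, and `modulate` are invented for this example.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(X):
    """Inverse of dft(); returns a list of complex samples."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def modulate(latent, gain, cutoff):
    """Scale the non-DC DFT bins with frequency index <= cutoff by `gain`.

    `gain` and `cutoff` are hypothetical style controls (assumptions of this
    sketch). Bins k and n - k are scaled together so the spectrum stays
    conjugate-symmetric and the output remains real-valued.
    """
    n = len(latent)
    X = dft(latent)
    for k in range(1, n):          # skip k = 0 so the mean is left unchanged
        freq = min(k, n - k)       # symmetric frequency index of bin k
        if freq <= cutoff:
            X[k] *= gain
    return [v.real for v in idft(X)]


# One latent channel over 8 time steps: a pure oscillation at frequency index 2.
latent = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
stronger = modulate(latent, gain=1.5, cutoff=2)
```

Because `gain` is a continuous scalar, sweeping it between 1.0 (no change) and larger or smaller values changes the amplitude of the selected frequency band gradually, which is one simple way to read "smooth and continuous style modulation". A practical system would use an FFT routine and operate on multi-channel latents.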

Subjects:	Graphics (cs.GR); Computation and Language (cs.CL)
Cite as:	arXiv:2503.20992 [cs.GR]
	(or arXiv:2503.20992v1 [cs.GR] for this version)
	https://doi.org/10.48550/arXiv.2503.20992
