Mitigating Unauthorized Speech Synthesis for Voice Protection

Zhang, Zhisheng; Yang, Qianyi; Wang, Derui; Huang, Pengyang; Cao, Yuxin; Ye, Kai; Hao, Jie

doi:10.1145/3689217.3690615

arXiv:2410.20742 (cs)

[Submitted on 28 Oct 2024]

Abstract:In recent years it has become possible to perfectly replicate a speaker's voice from just a few speech samples, and malicious voice exploitation (e.g., telecom fraud for illegal financial gain) has brought serious hazards to daily life. It is therefore crucial to protect publicly accessible speech data that contains sensitive information, such as personal voiceprints. Most previous defense methods have focused on spoofing speaker verification systems in terms of timbre similarity, but the synthesized deepfake speech remains of high quality. In response to these rising hazards, we devise an effective, transferable, and robust proactive protection technology named Pivotal Objective Perturbation (POP), which applies imperceptible error-minimizing noise to original speech samples so that text-to-speech (TTS) synthesis models cannot learn from them effectively and high-quality deepfake speech cannot be generated. We conduct extensive experiments on state-of-the-art (SOTA) TTS models, using objective and subjective metrics to comprehensively evaluate the proposed method. The experimental results demonstrate outstanding effectiveness and transferability across various models. Voice synthesizers trained on samples without protection yield a speech unclarity score of 21.94%, whereas POP-protected samples significantly increase it to 127.31%. Moreover, our method is robust against noise reduction and data augmentation techniques, thereby greatly reducing potential hazards.
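
The abstract does not spell out POP's objective, so the following is only a minimal sketch of the general idea of error-minimizing noise, not the authors' method. It assumes a toy linear model with fixed weights and a squared-error loss, and searches for a bounded perturbation `delta` that lowers the loss on the perturbed samples, which is the opposite direction of an adversarial attack. The function names (`error_minimizing_noise`, `mse`) and the parameters `eps`, `step`, and `iters` are hypothetical and chosen for illustration.

```python
import numpy as np

def error_minimizing_noise(x, y, w, eps=0.05, step=0.01, iters=50):
    """Toy sketch (assumption, not POP itself): find delta with
    ||delta||_inf <= eps that lowers the squared error of a fixed
    linear model w on the perturbed inputs x + delta."""
    delta = np.zeros_like(x)
    for _ in range(iters):
        residual = (x + delta) @ w - y        # per-sample prediction error, shape (n,)
        grad = 2.0 * np.outer(residual, w)    # gradient of residual**2 w.r.t. delta, shape (n, d)
        delta -= step * np.sign(grad)         # signed step that DEcreases the loss
        delta = np.clip(delta, -eps, eps)     # keep the noise within the eps budget
    return delta

def mse(x, y, w):
    """Mean squared error of the linear model w on (x, y)."""
    return float(np.mean((x @ w - y) ** 2))

# Synthetic stand-ins for speech features and targets (hypothetical data).
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
w = rng.normal(size=16)
y = rng.normal(size=8)

delta = error_minimizing_noise(x, y, w)
loss_clean = mse(x, y, w)
loss_protected = mse(x + delta, y, w)
print(f"clean loss: {loss_clean:.4f}, protected loss: {loss_protected:.4f}")
```

In this toy setting the perturbed samples already look "easy" to the model, so training on them provides little useful gradient signal. A real system would presumably optimize the noise against a TTS training objective and alternate with model updates; those details are not given in the abstract.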

Comments:	Accepted to ACM CCS Workshop (LAMPS) 2024
Subjects:	Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2410.20742 [cs.SD]
	(or arXiv:2410.20742v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2410.20742
Related DOI:	https://doi.org/10.1145/3689217.3690615

Submission history

From: Zhisheng Zhang
[v1] Mon, 28 Oct 2024 05:16:37 UTC (345 KB)