Audio and Speech Processing

Authors and titles for recent submissions

Total of 90 entries : 1-50 51-90

[1] arXiv:2507.01888 [pdf, other]: Title: Perceptual Ratings Predict Speech Inversion Articulatory Kinematics in Childhood Speech Sound Disorders

Nina R. Benway, Saba Tabatabaee, Dongliang Wang, Benjamin Munson, Jonathan L. Preston, Carol Espy-Wilson

Comments: This manuscript is in submission for publication. It has not yet been peer reviewed

Subjects: Audio and Speech Processing (eess.AS)
[2] arXiv:2507.01821 [pdf, html, other]: Title: Low-Complexity Neural Wind Noise Reduction for Audio Recordings

Hesam Eftekhari, Srikanth Raj Chetupalli, Shrishti Saha Shetu, Emanuël A. P. Habets, Oliver Thiergart

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[3] arXiv:2507.01765 [pdf, html, other]: Title: First Steps Towards Voice Anonymization for Code-Switching Speech

Sarina Meyer, Ekaterina Kolos, Ngoc Thang Vu

Comments: accepted at Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS)
[4] arXiv:2507.01750 [pdf, html, other]: Title: Generalizable Detection of Audio Deepfakes

Jose A. Lopez, Georg Stemmer, Héctor Cordourier Maruri

Comments: 8 pages, 3 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[5] arXiv:2507.01611 [pdf, html, other]: Title: QHARMA-GAN: Quasi-Harmonic Neural Vocoder based on Autoregressive Moving Average Model

Shaowen Chen, Tomoki Toda

Comments: This manuscript is currently under review for publication in the IEEE Transactions on Audio, Speech, and Language Processing. This work has been submitted to the IEEE for possible publication

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[6] arXiv:2507.01356 [pdf, html, other]: Title: Voice Conversion for Likability Control via Automated Rating of Speech Synthesis Corpora

Hitoshi Suda, Shinnosuke Takamichi, Satoru Fukayama

Comments: Accepted at Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[7] arXiv:2507.01349 [pdf, html, other]: Title: IdolSongsJp Corpus: A Multi-Singer Song Corpus in the Style of Japanese Idol Groups

Hitoshi Suda, Junya Koguchi, Shunsuke Yoshida, Tomohiko Nakamura, Satoru Fukayama, Jun Ogata

Comments: Accepted at ISMIR 2025

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8] arXiv:2507.01348 [pdf, html, other]: Title: SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech

Cheng Zhuangfei, Zhang Guangyan, Tu Zehai, Song Yangyang, Mao Shuiyang, Jiao Xiaoqi, Li Jingyu, Guo Yiwen, Wu Jiasong

Comments: 10 pages, includes references, 4 figures, 4 tables

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[9] arXiv:2507.01172 [pdf, html, other]: Title: Classical Guitar Duet Separation using GuitarDuets -- a Dataset of Real and Synthesized Guitar Recordings

Marios Glytsos, Christos Garoufis, Athanasia Zlatintsi, Petros Maragos

Comments: In Proceedings of the 25th International Society for Music Information Retrieval Conference (ISMIR 2024), San Francisco, USA, November 2024. The dataset is available at: this https URL

Journal-ref: Proc. of the 25th Int. Society for Music Information Retrieval Conf. (ISMIR), San Francisco, USA, 2024, pp. 95-102

Subjects: Audio and Speech Processing (eess.AS)
[10] arXiv:2507.01024 [pdf, other]: Title: Hello Afrika: Speech Commands in Kinyarwanda

George Igwegbe, Martins Awojide, Mboh Bless, Nirel Kadzo

Comments: Data Science Africa, 2024

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[11] arXiv:2507.01022 [pdf, html, other]: Title: Workflow-Based Evaluation of Music Generation Systems

Shayan Dadman, Bernt Arild Bremdal, Andreas Bergsland

Comments: 54 pages, 3 figures, 6 tables, 5 appendices

Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)
[12] arXiv:2507.01021 [pdf, html, other]: Title: Scalable Offline ASR for Command-Style Dictation in Courtrooms

Kumarmanas Nethil, Vaibhav Mishra, Kriti Anandan, Kavya Manohar

Comments: Accepted to Interspeech 2025 Show & Tell

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[13] arXiv:2507.01931 (cross-list from cs.CL) [pdf, html, other]: Title: Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla

Md Sazzadul Islam Ridoy, Sumi Akter, Md. Aminur Rahman

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14] arXiv:2507.01805 (cross-list from cs.SD) [pdf, html, other]: Title: A Dataset for Automatic Assessment of TTS Quality in Spanish

Alejandro Sosa Welford, Leonardo Pepino

Comments: 5 pages, 2 figures. Accepted at Interspeech 2025

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[15] arXiv:2507.01582 (cross-list from cs.SD) [pdf, html, other]: Title: Exploring Classical Piano Performance Generation with Expressive Music Variational AutoEncoder

Jing Luo, Xinyu Yang, Jie Wei

Comments: Accepted by IEEE SMC 2025

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[16] arXiv:2507.01563 (cross-list from cs.SD) [pdf, html, other]: Title: Real-Time Emergency Vehicle Siren Detection with Efficient CNNs on Embedded Hardware

Marco Giordano, Stefano Giacomelli, Claudia Rinaldi, Fabio Graziosi

Comments: 10 pages, 10 figures, submitted to this https URL. arXiv admin note: text overlap with arXiv:2506.23437

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[17] arXiv:2507.01339 (cross-list from cs.SD) [pdf, other]: Title: User-guided Generative Source Separation

Yutong Wen, Minje Kim, Paris Smaragdis

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[18] arXiv:2507.01143 (cross-list from cs.RO) [pdf, html, other]: Title: A Review on Sound Source Localization in Robotics: Focusing on Deep Learning Methods

Reza Jalayer, Masoud Jalayer, Amirali Baniasadi

Comments: 35 pages

Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[19] arXiv:2507.00874 [pdf, html, other]: Title: Improving Stereo 3D Sound Event Localization and Detection: Perceptual Features, Stereo-specific Data Augmentation, and Distance Normalization

Jun-Wei Yeow, Ee-Leng Tan, Santi Peksi, Woon-Seng Gan

Comments: Technical report for DCASE 2025 Challenge Task 3

Subjects: Audio and Speech Processing (eess.AS)
[20] arXiv:2507.00755 [pdf, other]: Title: LearnAFE: Circuit-Algorithm Co-design Framework for Learnable Audio Analog Front-End

Jinhai Hu, Zhongyi Zhang, Cong Sheng Leow, Wang Ling Goh, Yuan Gao

Comments: 11 pages, 15 figures, accepted for publication on IEEE Transactions on Circuits and Systems I: Regular Papers

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[21] arXiv:2507.00458 [pdf, html, other]: Title: Mitigating Language Mismatch in SSL-Based Speaker Anonymization

Zhe Zhang, Wen-Chin Huang, Xin Wang, Xiaoxiao Miao, Junichi Yamagishi

Comments: Accepted to Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[22] arXiv:2507.00324 [pdf, html, other]: Title: Collecting, Curating, and Annotating Good Quality Speech deepfake dataset for Famous Figures: Process and Challenges

Hashim Ali, Surya Subramani, Raksha Varahamurthy, Nithin Adupa, Lekha Bollinani, Hafiz Malik

Subjects: Audio and Speech Processing (eess.AS)
[23] arXiv:2507.00227 [pdf, html, other]: Title: Investigating Stochastic Methods for Prosody Modeling in Speech Synthesis

Paul Mayer, Florian Lux, Alejandro Pérez-González-de-Martos, Angelina Elizarova, Lindsey Vanderlyn, Dirk Väth, Ngoc Thang Vu

Comments: Accepted at Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
[24] arXiv:2507.00155 [pdf, html, other]: Title: Do Music Source Separation Models Preserve Spatial Information in Binaural Audio?

Richa Namballa, Agnieszka Roginska, Magdalena Fuentes

Comments: 6 pages + references, 4 figures, 2 tables, 26th International Society for Music Information Retrieval (ISMIR) Conference

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[25] arXiv:2507.00966 (cross-list from cs.SD) [pdf, html, other]: Title: MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement

Nikolai Lund Kühne, Jesper Jensen, Jan Østergaard, Zheng-Hua Tan

Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing for possible publication

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[26] arXiv:2507.00808 (cross-list from cs.SD) [pdf, html, other]: Title: Multi-interaction TTS toward professional recording reproduction

Hiroki Kanagawa, Kenichi Fujita, Aya Watanabe, Yusuke Ijima

Comments: 7 pages, 6 figures, Accepted to Speech Synthesis Workshop 2025 (SSW13)

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[27] arXiv:2507.00693 (cross-list from cs.SD) [pdf, html, other]: Title: Leveraging Large Language Models for Spontaneous Speech-Based Suicide Risk Detection

Yifan Gao, Jiao Fu, Long Guo, Hong Liu

Comments: Accepted to Interspeech 2025

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[28] arXiv:2507.00498 (cross-list from cs.SD) [pdf, html, other]: Title: MuteSwap: Silent Face-based Voice Conversion

Yifan Liu, Yu Fang, Zhouhan Lin

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[29] arXiv:2507.00475 (cross-list from cs.SD) [pdf, html, other]: Title: AudioBERTScore: Objective Evaluation of Environmental Sound Synthesis Based on Similarity of Audio embedding Sequences

Minoru Kishi, Ryosuke Sakai, Shinnosuke Takamichi, Yusuke Kanamori, Yuki Okamoto

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[30] arXiv:2507.00466 (cross-list from cs.SD) [pdf, html, other]: Title: Beat and Downbeat Tracking in Performance MIDI Using an End-to-End Transformer Architecture

Sebastian Murgul, Michael Heizmann

Comments: Accepted to the 22nd Sound and Music Computing Conference (SMC), 2025

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[31] arXiv:2507.00229 (cross-list from cs.SD) [pdf, html, other]: Title: A High-Fidelity Speech Super Resolution Network using a Complex Global Attention Module with Spectro-Temporal Loss

Tarikul Islam Tamiti, Biraj Joshi, Rida Hasan, Rashedul Hasan, Taieba Athay, Nursad Mamun, Anomadarshi Barua

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[32] arXiv:2507.00055 (cross-list from cs.LG) [pdf, html, other]: Title: Leveraging Unlabeled Audio-Visual Data in Speech Emotion Recognition using Knowledge Distillation

Varsha Pendyala, Pedro Morgado, William Sethares

Comments: Accepted at INTERSPEECH 2025

Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV); Signal Processing (eess.SP)

[33] arXiv:2506.23874 [pdf, html, other]: Title: URGENT-PK: Perceptually-Aligned Ranking Model Designed for Speech Enhancement Competition

Jiahe Wang, Chenda Li, Wei Wang, Wangyou Zhang, Samuele Cornell, Marvin Sach, Robin Scheibler, Kohei Saijo, Yihui Fu, Zhaoheng Ni, Anurag Kumar, Tim Fingscheidt, Shinji Watanabe, Yanmin Qian

Comments: Submitted to ASRU2025

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[34] arXiv:2506.23859 [pdf, html, other]: Title: Less is More: Data Curation Matters in Scaling Speech Enhancement

Chenda Li, Wangyou Zhang, Wei Wang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Yihui Fu, Marvin Sach, Zhaoheng Ni, Anurag Kumar, Tim Fingscheidt, Shinji Watanabe, Yanmin Qian

Comments: Submitted to ASRU2025

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[35] arXiv:2506.23553 [pdf, html, other]: Title: Human-CLAP: Human-perception-based contrastive language-audio pretraining

Taisei Takano, Yuki Okamoto, Yusuke Kanamori, Yuki Saito, Ryotaro Nagase, Hiroshi Saruwatari

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[36] arXiv:2506.23371 [pdf, html, other]: Title: Investigating an Overfitting and Degeneration Phenomenon in Self-Supervised Multi-Pitch Estimation

Frank Cwitkowitz, Zhiyao Duan

Comments: Accepted to ISMIR 2025

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[37] arXiv:2506.22972 [pdf, html, other]: Title: Adaptable Non-parametric Approach for Speech-based Symptom Assessment: Isolating Private Medical Data in a Retrieval Datastore

Yu-Wen Chen, Julia Hirschberg

Comments: IEEE MLSP 2025

Subjects: Audio and Speech Processing (eess.AS)
[38] arXiv:2506.22646 [pdf, html, other]: Title: Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR

Weiqing Wang, Taejin Park, Ivan Medennikov, Jinhan Wang, Kunal Dhawan, He Huang, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg

Comments: Accepted by INTERSPEECH 2025

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[39] arXiv:2506.23986 (cross-list from cs.SD) [pdf, html, other]: Title: StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding

Dake Guo, Jixun Yao, Linhan Ma, He Wang, Lei Xie

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[40] arXiv:2506.23873 (cross-list from cs.SD) [pdf, html, other]: Title: Emergent musical properties of a transformer under contrastive self-supervised learning

Yuexuan Kong, Gabriel Meseguer-Brocal, Vincent Lostanlen, Mathieu Lagrange, Romain Hennequin

Comments: Accepted at ISMIR 2025

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[41] arXiv:2506.23869 (cross-list from cs.SD) [pdf, html, other]: Title: Scaling Self-Supervised Representation Learning for Symbolic Piano Performance

Louis Bradshaw, Honglu Fan, Alexander Spangher, Stella Biderman, Simon Colton

Comments: ISMIR (2025)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[42] arXiv:2506.23670 (cross-list from cs.SD) [pdf, html, other]: Title: Efficient Interleaved Speech Modeling through Knowledge Distillation

Mohammadmahdi Nouriborji, Morteza Rohanian

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[43] arXiv:2506.23582 (cross-list from cs.SD) [pdf, html, other]: Title: RELATE: Subjective evaluation dataset for automatic evaluation of relevance between text and audio

Yusuke Kanamori, Yuki Okamoto, Taisei Takano, Shinnosuke Takamichi, Yuki Saito, Hiroshi Saruwatari

Comments: Accepted to INTERSPEECH2025

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[44] arXiv:2506.23552 (cross-list from cs.CV) [pdf, html, other]: Title: JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching

Mingi Kwon, Joonghyuk Shin, Jaeseok Jung, Jaesik Park, Youngjung Uh

Comments: project page: this https URL. Under review. Preprint published on arXiv

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[45] arXiv:2506.23437 (cross-list from cs.SD) [pdf, html, other]: Title: From Large-scale Audio Tagging to Real-Time Explainable Emergency Vehicle Sirens Detection

Stefano Giacomelli, Marco Giordano, Claudia Rinaldi, Fabio Graziosi

Comments: pre-print (submitted to the IEEE/ACM Transactions on Audio, Speech, and Language Processing)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[46] arXiv:2506.23367 (cross-list from cs.SD) [pdf, html, other]: Title: You Sound a Little Tense: L2 Tailored Clear TTS Using Durational Vowel Properties

Paige Tuttösí, H. Henny Yeung, Yue Wang, Jean-Julien Aucouturier, Angelica Lim

Comments: Accepted to ISCA Speech Synthesis Workshop, 2025

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[47] arXiv:2506.23325 (cross-list from cs.SD) [pdf, html, other]: Title: XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs

Yitian Gong, Luozhijie Jin, Ruifan Deng, Dong Zhang, Xin Zhang, Qinyuan Cheng, Zhaoye Fei, Shimin Li, Xipeng Qiu

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[48] arXiv:2506.23130 (cross-list from cs.SD) [pdf, html, other]: Title: The Florence Price Art Song Dataset and Piano Accompaniment Generator

Tao-Tao He, Martin E. Malandro, Douglas Shadle

Comments: 8 pages, 4 figures. To appear in the proceedings of ISMIR 2025

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[49] arXiv:2506.23094 (cross-list from cs.SD) [pdf, html, other]: Title: TOMI: Transforming and Organizing Music Ideas for Multi-Track Compositions with Full-Song Structure

Qi He, Gus Xia, Ziyu Wang

Comments: 9 pages, 4 figures, 2 tables. To be published in ISMIR 2025

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[50] arXiv:2506.23049 (cross-list from cs.AI) [pdf, html, other]: Title: AURA: Agent for Understanding, Reasoning, and Automated Tool Use in Voice-Driven Tasks

Leander Melroy Maben, Gayathri Ganesh Lakshmy, Srijith Radhakrishnan, Siddhant Arora, Shinji Watanabe

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Thu, 3 Jul 2025 (showing 18 of 18 entries)

Wed, 2 Jul 2025 (showing 14 of 14 entries)

Tue, 1 Jul 2025 (showing first 18 of 26 entries)