Sound

Authors and titles for recent submissions

Total of 80 entries : 1-50 51-80

[1] arXiv:2507.02666 [pdf, html, other]: Title: ASDA: Audio Spectrogram Differential Attention Mechanism for Self-Supervised Representation Learning

Junyu Wang, Tianrui Wang, Meng Ge, Longbiao Wang, Jianwu Dang

Comments: Accepted at Interspeech2025

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[2] arXiv:2507.02606 [pdf, html, other]: Title: De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks

Wei Fan, Kejiang Chen, Chang Liu, Weiming Zhang, Nenghai Yu

Comments: Accepted by ICML 2025

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[3] arXiv:2507.02391 [pdf, other]: Title: Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement

Mostafa Sadeghi (MULTISPEECH), Jean-Eudes Ayilo (MULTISPEECH), Romain Serizel (MULTISPEECH), Xavier Alameda-Pineda (ROBOTLEARN)

Journal-ref: IEEE Signal Processing Letters, pp.1-5

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[4] arXiv:2507.02380 [pdf, html, other]: Title: JoyTTS: LLM-based Spoken Chatbot With Voice Cloning

Fangru Zhou, Jun Zhao, Guoxin Wang

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[5] arXiv:2507.02273 [pdf, html, other]: Title: Fx-Encoder++: Extracting Instrument-Wise Audio Effects Representations from Mixtures

Yen-Tung Yeh, Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yi-Hsuan Yang, Yuki Mitsufuji

Comments: ISMIR 2025

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[6] arXiv:2507.02176 [pdf, html, other]: Title: Analyzing and Improving Speaker Similarity Assessment for Speech Synthesis

Marc-André Carbonneau, Benjamin van Niekerk, Hugo Seuté, Jean-Philippe Letendre, Herman Kamper, Julian Zaïdi

Comments: Accepted at SSW13 - Interspeech 2025 Speech Synthesis Workshop

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[7] arXiv:2507.01974 [pdf, other]: Title: Acoustic evaluation of a neural network dedicated to the detection of animal vocalisations

Jérémy Rouch (CRNL-ENES), M Ducrettet (CRNL-ENES, ISYEB), S Haupert (ISYEB), R Emonet (LabHC), F Sèbe (CRNL-ENES, OFB - DRAS)

Journal-ref: 17e Congrès Français d'Acoustique, société française d'acoustique, Apr 2025, Paris Université Sorbonne Nouvelle, France

Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[8] arXiv:2507.02815 (cross-list from eess.AS) [pdf, html, other]: Title: Towards Perception-Informed Latent HRTF Representations

You Zhang, Andrew Francl, Ruohan Gao, Paul Calamia, Zhiyao Duan, Ishwarya Ananthabhotla

Comments: Accepted by IEEE WASPAA 2025, camera-ready version

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[9] arXiv:2507.02791 (cross-list from eess.AS) [pdf, html, other]: Title: Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance

Jakob Kienegger, Alina Mannanova, Huajian Fang, Timo Gerkmann

Comments: Accepted at IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2025

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[10] arXiv:2507.02768 (cross-list from eess.AS) [pdf, html, other]: Title: DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment

Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, Chao-Han Huck Yang, Sung-Feng Huang, Chih-Kai Yang, Chee-En Yu, Chun-Wei Chen, Wei-Chih Chen, Chien-yu Huang, Yi-Cheng Lin, Yu-Xiang Lin, Chi-An Fu, Chun-Yi Kuan, Wenze Ren, Xuanjun Chen, Wei-Ping Huang, En-Pei Hu, Tzu-Quan Lin, Yuan-Kuei Wu, Kuan-Po Huang, Hsiao-Ying Huang, Huang-Cheng Chou, Kai-Wei Chang, Cheng-Han Chiang, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee

Comments: Model and code available at: this https URL

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[11] arXiv:2507.02599 (cross-list from cs.LG) [pdf, html, other]: Title: Padé Approximant Neural Networks for Enhanced Electric Motor Fault Diagnosis Using Vibration and Acoustic Data

Sertac Kilickaya, Levent Eren

Comments: Submitted to the Journal of Vibration Engineering & Technologies

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Systems and Control (eess.SY)
[12] arXiv:2507.02562 (cross-list from eess.AS) [pdf, html, other]: Title: Multi-Utterance Speech Separation and Association Trained on Short Segments

Yuzhu Wang, Archontis Politis, Konstantinos Drossos, Tuomas Virtanen

Comments: 5 pages, accepted by WASPAA 2025

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[13] arXiv:2507.02407 (cross-list from cs.CL) [pdf, other]: Title: Benchmarking Akan ASR Models Across Domain-Specific Datasets: A Comparative Evaluation of Performance, Scalability, and Adaptability

Mark Atta Mensah, Isaac Wiafe, Akon Ekpezu, Justice Kwame Appati, Jamal-Deen Abdulai, Akosua Nyarkoa Wiafe-Akenten, Frank Ernest Yeboah, Gifty Odame

Comments: This version has been reviewed and accepted for presentation at the Future Technologies Conference (FTC) 2025, to be held on 6 & 7 November 2025 in Munich, Germany. 17 pages, 4 figures, 1 table

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14] arXiv:2507.02109 (cross-list from cs.LG) [pdf, html, other]: Title: Parametric Neural Amp Modeling with Active Learning

Florian Grötschla, Luca A. Lanzendörfer, Longxiang Jiao, Roger Wattenhofer

Comments: Accepted at ISMIR 2025 as Late-Breaking Demo (LBD)

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[15] arXiv:2507.02080 (cross-list from cs.MM) [pdf, html, other]: Title: TAGF: Time-aware Gated Fusion for Multimodal Valence-Arousal Estimation

Yubeen Lee, Sangeun Lee, Chaewon Park, Junyeop Cha, Eunil Park

Comments: 9 pages, 2 figures, 2 tables

Subjects: Multimedia (cs.MM); Sound (cs.SD)

[16] arXiv:2507.01805 [pdf, html, other]: Title: A Dataset for Automatic Assessment of TTS Quality in Spanish

Alejandro Sosa Welford, Leonardo Pepino

Comments: 5 pages, 2 figures. Accepted at Interspeech 2025

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[17] arXiv:2507.01582 [pdf, html, other]: Title: Exploring Classical Piano Performance Generation with Expressive Music Variational AutoEncoder

Jing Luo, Xinyu Yang, Jie Wei

Comments: Accepted by IEEE SMC 2025

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[18] arXiv:2507.01563 [pdf, html, other]: Title: Real-Time Emergency Vehicle Siren Detection with Efficient CNNs on Embedded Hardware

Marco Giordano, Stefano Giacomelli, Claudia Rinaldi, Fabio Graziosi

Comments: 10 pages, 10 figures, submitted to this https URL. arXiv admin note: text overlap with arXiv:2506.23437

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[19] arXiv:2507.01339 [pdf, other]: Title: User-guided Generative Source Separation

Yutong Wen, Minje Kim, Paris Smaragdis

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[20] arXiv:2507.01931 (cross-list from cs.CL) [pdf, html, other]: Title: Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla

Md Sazzadul Islam Ridoy, Sumi Akter, Md. Aminur Rahman

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21] arXiv:2507.01821 (cross-list from eess.AS) [pdf, html, other]: Title: Low-Complexity Neural Wind Noise Reduction for Audio Recordings

Hesam Eftekhari, Srikanth Raj Chetupalli, Shrishti Saha Shetu, Emanuël A. P. Habets, Oliver Thiergart

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[22] arXiv:2507.01750 (cross-list from eess.AS) [pdf, html, other]: Title: Generalizable Detection of Audio Deepfakes

Jose A. Lopez, Georg Stemmer, Héctor Cordourier Maruri

Comments: 8 pages, 3 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[23] arXiv:2507.01611 (cross-list from eess.AS) [pdf, html, other]: Title: QHARMA-GAN: Quasi-Harmonic Neural Vocoder based on Autoregressive Moving Average Model

Shaowen Chen, Tomoki Toda

Comments: This manuscript is currently under review for publication in the IEEE Transactions on Audio, Speech, and Language Processing. This work has been submitted to the IEEE for possible publication

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[24] arXiv:2507.01356 (cross-list from eess.AS) [pdf, html, other]: Title: Voice Conversion for Likability Control via Automated Rating of Speech Synthesis Corpora

Hitoshi Suda, Shinnosuke Takamichi, Satoru Fukayama

Comments: Accepted at Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[25] arXiv:2507.01349 (cross-list from eess.AS) [pdf, html, other]: Title: IdolSongsJp Corpus: A Multi-Singer Song Corpus in the Style of Japanese Idol Groups

Hitoshi Suda, Junya Koguchi, Shunsuke Yoshida, Tomohiko Nakamura, Satoru Fukayama, Jun Ogata

Comments: Accepted at ISMIR 2025

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[26] arXiv:2507.01348 (cross-list from eess.AS) [pdf, html, other]: Title: SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech

Cheng Zhuangfei, Zhang Guangyan, Tu Zehai, Song Yangyang, Mao Shuiyang, Jiao Xiaoqi, Li Jingyu, Guo Yiwen, Wu Jiasong

Comments: 10 pages, includes references, 4 figures, 4 tables

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[27] arXiv:2507.01143 (cross-list from cs.RO) [pdf, html, other]: Title: A Review on Sound Source Localization in Robotics: Focusing on Deep Learning Methods

Reza Jalayer, Masoud Jalayer, Amirali Baniasadi

Comments: 35 pages

Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[28] arXiv:2507.01024 (cross-list from eess.AS) [pdf, other]: Title: Hello Afrika: Speech Commands in Kinyarwanda

George Igwegbe, Martins Awojide, Mboh Bless, Nirel Kadzo

Comments: Data Science Africa, 2024

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[29] arXiv:2507.01022 (cross-list from eess.AS) [pdf, html, other]: Title: Workflow-Based Evaluation of Music Generation Systems

Shayan Dadman, Bernt Arild Bremdal, Andreas Bergsland

Comments: 54 pages, 3 figures, 6 tables, 5 appendices

Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)
[30] arXiv:2507.01021 (cross-list from eess.AS) [pdf, html, other]: Title: Scalable Offline ASR for Command-Style Dictation in Courtrooms

Kumarmanas Nethil, Vaibhav Mishra, Kriti Anandan, Kavya Manohar

Comments: Accepted to Interspeech 2025 Show & Tell

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)

[31] arXiv:2507.00966 [pdf, html, other]: Title: MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement

Nikolai Lund Kühne, Jesper Jensen, Jan Østergaard, Zheng-Hua Tan

Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing for possible publication

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[32] arXiv:2507.00808 [pdf, html, other]: Title: Multi-interaction TTS toward professional recording reproduction

Hiroki Kanagawa, Kenichi Fujita, Aya Watanabe, Yusuke Ijima

Comments: 7 pages, 6 figures, Accepted to Speech Synthesis Workshop 2025 (SSW13)

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[33] arXiv:2507.00693 [pdf, html, other]: Title: Leveraging Large Language Models for Spontaneous Speech-Based Suicide Risk Detection

Yifan Gao, Jiao Fu, Long Guo, Hong Liu

Comments: Accepted to Interspeech 2025

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[34] arXiv:2507.00498 [pdf, html, other]: Title: MuteSwap: Silent Face-based Voice Conversion

Yifan Liu, Yu Fang, Zhouhan Lin

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[35] arXiv:2507.00475 [pdf, html, other]: Title: AudioBERTScore: Objective Evaluation of Environmental Sound Synthesis Based on Similarity of Audio embedding Sequences

Minoru Kishi, Ryosuke Sakai, Shinnosuke Takamichi, Yusuke Kanamori, Yuki Okamoto

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[36] arXiv:2507.00466 [pdf, html, other]: Title: Beat and Downbeat Tracking in Performance MIDI Using an End-to-End Transformer Architecture

Sebastian Murgul, Michael Heizmann

Comments: Accepted to the 22nd Sound and Music Computing Conference (SMC), 2025

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[37] arXiv:2507.00229 [pdf, html, other]: Title: A High-Fidelity Speech Super Resolution Network using a Complex Global Attention Module with Spectro-Temporal Loss

Tarikul Islam Tamiti, Biraj Joshi, Rida Hasan, Rashedul Hasan, Taieba Athay, Nursad Mamun, Anomadarshi Barua

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[38] arXiv:2507.00755 (cross-list from eess.AS) [pdf, other]: Title: LearnAFE: Circuit-Algorithm Co-design Framework for Learnable Audio Analog Front-End

Jinhai Hu, Zhongyi Zhang, Cong Sheng Leow, Wang Ling Goh, Yuan Gao

Comments: 11 pages, 15 figures, accepted for publication on IEEE Transactions on Circuits and Systems I: Regular Papers

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[39] arXiv:2507.00458 (cross-list from eess.AS) [pdf, html, other]: Title: Mitigating Language Mismatch in SSL-Based Speaker Anonymization

Zhe Zhang, Wen-Chin Huang, Xin Wang, Xiaoxiao Miao, Junichi Yamagishi

Comments: Accepted to Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[40] arXiv:2507.00155 (cross-list from eess.AS) [pdf, html, other]: Title: Do Music Source Separation Models Preserve Spatial Information in Binaural Audio?

Richa Namballa, Agnieszka Roginska, Magdalena Fuentes

Comments: 6 pages + references, 4 figures, 2 tables, 26th International Society for Music Information Retrieval (ISMIR) Conference

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[41] arXiv:2503.04995 (cross-list from eess.AS) [pdf, html, other]: Title: Musical Source Separation of Brazilian Percussion

Richa Namballa, Giovana Morais, Magdalena Fuentes

Comments: 2 pages + references, 1 figure, 1 table, Extended Abstracts for the Late-Breaking Demo Session of the 25th International Society for Music Information Retrieval Conference

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)

[42] arXiv:2506.23986 [pdf, html, other]: Title: StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding

Dake Guo, Jixun Yao, Linhan Ma, He Wang, Lei Xie

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[43] arXiv:2506.23873 [pdf, html, other]: Title: Emergent musical properties of a transformer under contrastive self-supervised learning

Yuexuan Kong, Gabriel Meseguer-Brocal, Vincent Lostanlen, Mathieu Lagrange, Romain Hennequin

Comments: Accepted at ISMIR 2025

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[44] arXiv:2506.23869 [pdf, html, other]: Title: Scaling Self-Supervised Representation Learning for Symbolic Piano Performance

Louis Bradshaw, Honglu Fan, Alexander Spangher, Stella Biderman, Simon Colton

Comments: ISMIR (2025)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[45] arXiv:2506.23670 [pdf, html, other]: Title: Efficient Interleaved Speech Modeling through Knowledge Distillation

Mohammadmahdi Nouriborji, Morteza Rohanian

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[46] arXiv:2506.23582 [pdf, html, other]: Title: RELATE: Subjective evaluation dataset for automatic evaluation of relevance between text and audio

Yusuke Kanamori, Yuki Okamoto, Taisei Takano, Shinnosuke Takamichi, Yuki Saito, Hiroshi Saruwatari

Comments: Accepted to INTERSPEECH2025

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[47] arXiv:2506.23437 [pdf, html, other]: Title: From Large-scale Audio Tagging to Real-Time Explainable Emergency Vehicle Sirens Detection

Stefano Giacomelli, Marco Giordano, Claudia Rinaldi, Fabio Graziosi

Comments: pre-print (submitted to the IEEE/ACM Transactions on Audio, Speech, and Language Processing)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[48] arXiv:2506.23367 [pdf, html, other]: Title: You Sound a Little Tense: L2 Tailored Clear TTS Using Durational Vowel Properties

Paige Tuttösí, H. Henny Yeung, Yue Wang, Jean-Julien Aucouturier, Angelica Lim

Comments: Accepted to ISCA Speech Synthesis Workshop, 2025

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[49] arXiv:2506.23325 [pdf, html, other]: Title: XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs

Yitian Gong, Luozhijie Jin, Ruifan Deng, Dong Zhang, Xin Zhang, Qinyuan Cheng, Zhaoye Fei, Shimin Li, Xipeng Qiu

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[50] arXiv:2506.23130 [pdf, html, other]: Title: The Florence Price Art Song Dataset and Piano Accompaniment Generator

Tao-Tao He, Martin E. Malandro, Douglas Shadle

Comments: 8 pages, 4 figures. To appear in the proceedings of ISMIR 2025

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Fri, 4 Jul 2025 (showing 15 of 15 entries)

Thu, 3 Jul 2025 (showing 15 of 15 entries)

Wed, 2 Jul 2025 (showing 11 of 11 entries)

Tue, 1 Jul 2025 (showing first 9 of 25 entries)