Audio and Speech Processing

Authors and titles for April 2025

Total of 149 entries : 1-50 51-100 101-149
[101] arXiv:2504.10650 (cross-list from cs.CY) [pdf, html, other]
Title: Will AI shape the way we speak? The emerging sociolinguistic influence of synthetic voices
Éva Székely, Jūra Miniota, Míša (Michaela) Hejná
Comments: 5 pages, 0 figures, International Workshop on Spoken Dialogue Systems Technology (IWSDS) 2025
Journal-ref: Proceedings of the 2025 International Workshop on Spoken Dialogue Systems (IWSDS), pages 357-368, Bilbao, Spain, May 27-30 2025. ACL Anthology, https://aclanthology.org/2025.iwsds-1.37
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[102] arXiv:2504.10746 (cross-list from cs.CV) [pdf, html, other]
Title: Hearing Anywhere in Any Environment
Xiulong Liu, Anurag Kumar, Paul Calamia, Sebastia V. Amengual, Calvin Murdock, Ishwarya Ananthabhotla, Philip Robinson, Eli Shlizerman, Vamsi Krishna Ithapu, Ruohan Gao
Comments: CVPR 2025; Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[103] arXiv:2504.10782 (cross-list from cs.SD) [pdf, html, other]
Title: Deep Audio Watermarks are Shallow: Limitations of Post-Hoc Watermarking Techniques for Speech
Patrick O'Reilly, Zeyu Jin, Jiaqi Su, Bryan Pardo
Comments: ICLR 2025 Workshop on GenAI Watermarking
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[104] arXiv:2504.10793 (cross-list from cs.SD) [pdf, html, other]
Title: SonicSieve: Bringing Directional Speech Extraction to Smartphones Using Acoustic Microstructures
Kuang Yuan, Yifeng Wang, Xiyuxing Zhang, Chengyi Shen, Swarun Kumar, Justin Chan
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[105] arXiv:2504.10819 (cross-list from cs.SD) [pdf, html, other]
Title: Generalized Audio Deepfake Detection Using Frame-level Latent Information Entropy
Botao Zhao, Zuheng Kang, Yayun He, Xiaoyang Qu, Junqing Peng, Jing Xiao, Jianzong Wang
Comments: Accepted by IEEE International Conference on Multimedia & Expo 2025 (ICME 2025)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[106] arXiv:2504.10821 (cross-list from cs.SD) [pdf, html, other]
Title: Progressive Rock Music Classification
Arpan Nagar, Joseph Bensabat, Jokent Gaza, Moinak Dey
Comments: 20 pages
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[107] arXiv:2504.10826 (cross-list from cs.SD) [pdf, html, other]
Title: SteerMusic: Enhanced Musical Consistency for Zero-shot Text-Guided and Personalized Music Editing
Xinlei Niu, Kin Wai Cheuk, Jing Zhang, Naoki Murata, Chieh-Hsin Lai, Michele Mancusi, Woosung Choi, Giorgio Fabbro, Wei-Hsiang Liao, Charles Patrick Martin, Yuki Mitsufuji
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[108] arXiv:2504.10849 (cross-list from cs.HC) [pdf, html, other]
Title: Real-Time Word-Level Temporal Segmentation in Streaming Speech Recognition
Naoto Nishida, Hirotaka Hiraki, Jun Rekimoto, Yoshio Ishiguro
Comments: 3 pages, 1 figure
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[109] arXiv:2504.11002 (cross-list from cs.SD) [pdf, html, other]
Title: Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Immersive Audiobook Generation
Yan Rong, Shan Yang, Chenxing Li, Dong Yu, Li Liu
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[110] arXiv:2504.11622 (cross-list from cs.CR) [pdf, html, other]
Title: Making Acoustic Side-Channel Attacks on Noisy Keyboards Viable with LLM-Assisted Spectrograms' "Typo" Correction
Seyyed Ali Ayati, Jin Hyun Park, Yichen Cai, Marcus Botacin
Comments: Length: 13 pages Figures: 5 figures Tables: 7 tables Keywords: Acoustic side-channel attacks, machine learning, Visual Transformers, Large Language Models (LLMs), security Conference: Accepted at the 19th USENIX WOOT Conference on Offensive Technologies (WOOT '25). Licensing: This paper is submitted under the CC BY Creative Commons Attribution license. arXiv admin note: text overlap with arXiv:2502.09782
Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[111] arXiv:2504.12005 (cross-list from cs.SD) [pdf, other]
Title: Voice Conversion with Diverse Intonation using Conditional Variational Auto-Encoder
Soobin Suh, Dabi Ahn, Heewoong Park, Jonghun Park
Comments: 2 pages, Machine Learning in Speech and Language Processing Workshop (MLSLP) 2018
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[112] arXiv:2504.12272 (cross-list from cs.SD) [pdf, other]
Title: Edge Intelligence for Wildlife Conservation: Real-Time Hornbill Call Classification Using TinyML
Kong Ka Hing, Mehran Behjati
Comments: This is a preprint version of a paper accepted and published in Springer Lecture Notes in Networks and Systems. The final version is available at this https URL
Journal-ref: Selected Proceedings from the 2nd ICIMR 2024. Lecture Notes in Networks and Systems, vol 1316. Springer, Singapore
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[113] arXiv:2504.12279 (cross-list from cs.SD) [pdf, html, other]
Title: Dysarthria Normalization via Local Lie Group Transformations for Robust ASR
Mikhail Osipov
Comments: Preprint. 15 pages, 6 figures, 6 tables, 11 appendices. Code and data available upon request
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[114] arXiv:2504.12339 (cross-list from cs.CL) [pdf, html, other]
Title: GOAT-TTS: Expressive and Realistic Speech Generation via A Dual-Branch LLM
Yaodong Song, Hongjie Chen, Jie Lian, Yuxin Zhang, Guangmin Xia, Zehan Li, Genliang Zhao, Jian Kang, Jie Li, Yongxiang Li, Xuelong Li
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[115] arXiv:2504.12398 (cross-list from cs.SD) [pdf, html, other]
Title: An accurate measurement of parametric array using a spurious sound filter topologically equivalent to a half-wavelength resonator
Woongji Kim, Beomseok Oh, Junsuk Rho, Wonkyu Moon
Comments: 12 pages, 11 figures. Published in Applied Acoustics
Journal-ref: Applied Acoustics, 240, 2025, 110910
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Applied Physics (physics.app-ph)
[116] arXiv:2504.12796 (cross-list from cs.MM) [pdf, html, other]
Title: A Survey on Cross-Modal Interaction Between Music and Multimodal Data
Sifei Li, Mining Tan, Feier Shen, Minyan Luo, Zijiao Yin, Fan Tang, Weiming Dong, Changsheng Xu
Comments: 34 pages, 7 figures
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[117] arXiv:2504.12880 (cross-list from cs.LG) [pdf, html, other]
Title: Can Masked Autoencoders Also Listen to Birds?
Lukas Rauch, René Heinrich, Ilyass Moummad, Alexis Joly, Bernhard Sick, Christoph Scholz
Comments: accepted @TMLR: this https URL
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[118] arXiv:2504.13102 (cross-list from cs.SD) [pdf, other]
Title: A Multi-task Learning Balanced Attention Convolutional Neural Network Model for Few-shot Underwater Acoustic Target Recognition
Wei Huang, Shumeng Sun, Junpeng Lu, Zhenpeng Xu, Zhengyang Xiu, Hao Zhang
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[119] arXiv:2504.13308 (cross-list from cs.SD) [pdf, html, other]
Title: Acoustic to Articulatory Inversion of Speech; Data Driven Approaches, Challenges, Applications, and Future Scope
Leena G Pillai, D. Muhammad Noorul Mubarak
Comments: Review paper on acoustic-to-articulatory inversion of speech, presented at an international conference. 8 pages, 2 figures
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[120] arXiv:2504.13535 (cross-list from cs.SD) [pdf, html, other]
Title: MusFlow: Multimodal Music Generation via Conditional Flow Matching
Jiahao Song, Yuzhao Wang
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[121] arXiv:2504.13791 (cross-list from cs.SD) [pdf, html, other]
Title: Collective Learning Mechanism based Optimal Transport Generative Adversarial Network for Non-parallel Voice Conversion
Sandipan Dhar, Md. Tousin Akhter, Nanda Dulal Jana, Swagatam Das
Comments: 7 pages, 2 figures, 3 tables
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[122] arXiv:2504.14076 (cross-list from cs.SD) [pdf, html, other]
Title: Transformation of audio embeddings into interpretable, concept-based representations
Alice Zhang, Edison Thomaz, Lie Lu
Comments: Accepted to International Joint Conference on Neural Networks (IJCNN) 2025
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[123] arXiv:2504.14735 (cross-list from cs.SD) [pdf, html, other]
Title: DiffVox: A Differentiable Model for Capturing and Analysing Vocal Effects Distributions
Chin-Yun Yu, Marco A. Martínez-Ramírez, Junghyun Koo, Ben Hayes, Wei-Hsiang Liao, György Fazekas, Yuki Mitsufuji
Comments: Accepted at DAFx 2025
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[124] arXiv:2504.15052 (cross-list from cs.CL) [pdf, html, other]
Title: Testing LLMs' Capabilities in Annotating Translations Based on an Error Typology Designed for LSP Translation: First Experiments with ChatGPT
Joachim Minder, Guillaume Wisniewski, Natalie Kübler
Comments: Accepted for publication in the proceedings of MT Summit 2025
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[125] arXiv:2504.15509 (cross-list from cs.CL) [pdf, html, other]
Title: SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
Keqi Deng, Wenxi Chen, Xie Chen, Philip C. Woodland
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[126] arXiv:2504.15822 (cross-list from cs.SD) [pdf, html, other]
Title: Quantifying Source Speaker Leakage in One-to-One Voice Conversion
Scott Wellington, Xuechen Liu, Junichi Yamagishi
Comments: Accepted at IEEE 23rd International Conference of the Biometrics Special Interest Group (BIOSIG 2024)
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[127] arXiv:2504.16213 (cross-list from cs.SD) [pdf, html, other]
Title: TinyML for Speech Recognition
Andrew Barovic, Armin Moin
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[128] arXiv:2504.16234 (cross-list from cs.LG) [pdf, other]
Title: Using Phonemes in cascaded S2S translation pipeline
Rene Pilz, Johannes Schneider
Comments: Accepted at Swiss NLP Conference 2025
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[129] arXiv:2504.16936 (cross-list from cs.MM) [pdf, html, other]
Title: Multifaceted Evaluation of Audio-Visual Capability for MLLMs: Effectiveness, Efficiency, Generalizability and Robustness
Yusheng Zhao, Junyu Luo, Xiao Luo, Weizhi Zhang, Zhiping Xiao, Wei Ju, Philip S. Yu, Ming Zhang
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[130] arXiv:2504.17912 (cross-list from cs.SD) [pdf, html, other]
Title: STNet: Prediction of Underwater Sound Speed Profiles with An Advanced Semi-Transformer Neural Network
Wei Huang, Jiajun Lu, Hao Zhang, Tianhe Xu
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[131] arXiv:2504.18099 (cross-list from cs.SD) [pdf, html, other]
Title: Tracking Articulatory Dynamics in Speech with a Fixed-Weight BiLSTM-CNN Architecture
Leena G Pillai, D. Muhammad Noorul Mubarak, Elizabeth Sherly
Comments: 10 pages, 8 figures. Presented at an international conference
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[132] arXiv:2504.18283 (cross-list from cs.CV) [pdf, html, other]
Title: Seeing Soundscapes: Audio-Visual Generation and Separation from Soundscapes Using Audio-Visual Separator
Minjae Kang, Martim Brandão
Comments: Originally submitted to CVPR 2025 on 2024-11-15 with paper ID 15808
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[133] arXiv:2504.18582 (cross-list from cs.SD) [pdf, other]
Title: Speaker Diarization for Low-Resource Languages Through Wav2vec Fine-Tuning
Abdulhady Abas Abdullah, Sarkhel H. Taher Karim, Sara Azad Ahmed, Kanar R. Tariq, Tarik A. Rashid
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[134] arXiv:2504.18650 (cross-list from cs.LG) [pdf, other]
Title: Unsupervised outlier detection to improve bird audio dataset labels
Bruce Collins
Comments: 27 pages, 9 figures
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[135] arXiv:2504.18715 (cross-list from cs.CL) [pdf, html, other]
Title: Spatial Speech Translation: Translating Across Space With Binaural Hearables
Tuochao Chen, Qirui Wang, Runlin He, Shyam Gollakota
Comments: Accepted by CHI2025
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[136] arXiv:2504.18799 (cross-list from cs.MM) [pdf, html, other]
Title: A Survey on Multimodal Music Emotion Recognition
Rashini Liyanarachchi, Aditya Joshi, Erik Meijering
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[137] arXiv:2504.18950 (cross-list from cs.SD) [pdf, html, other]
Title: Speaker Retrieval in the Wild: Challenges, Effectiveness and Robustness
Erfan Loweimi, Mengjie Qian, Kate Knill, Mark Gales
Comments: 13 pages, 10 figures, 10 tables, 76 references
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[138] arXiv:2504.19030 (cross-list from cs.SD) [pdf, html, other]
Title: Improving Pretrained YAMNet for Enhanced Speech Command Detection via Transfer Learning
Sidahmed Lachenani, Hamza Kheddar, Mohamed Ouldzmirli
Journal-ref: 2024 International Conference on Telecommunications and Intelligent Systems (ICTIS)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[139] arXiv:2504.19146 (cross-list from cs.SD) [pdf, html, other]
Title: Muyan-TTS: A Trainable Text-to-Speech Model Optimized for Podcast Scenarios with a $50K Budget
Xin Li, Kaikai Jia, Hao Sun, Jun Dai, Ziyang Jiang
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[140] arXiv:2504.19197 (cross-list from cs.SD) [pdf, html, other]
Title: Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements
Sandipan Dhar, Nanda Dulal Jana, Swagatam Das
Comments: 19 pages, 12 figures, 1 table
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[141] arXiv:2504.20447 (cross-list from cs.SD) [pdf, html, other]
Title: APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech
Zhicheng Lian, Lizhi Wang, Hua Huang
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[142] arXiv:2504.20532 (cross-list from cs.MM) [pdf, html, other]
Title: TriniMark: A Robust Generative Speech Watermarking Method for Trinity-Level Attribution
Yue Li, Weizhi Liu, Dongdong Lin
Subjects: Multimedia (cs.MM); Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[143] arXiv:2504.20625 (cross-list from cs.SD) [pdf, html, other]
Title: DiffusionRIR: Room Impulse Response Interpolation using Diffusion Models
Sagi Della Torre, Mirco Pezzoli, Fabio Antonacci, Sharon Gannot
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[144] arXiv:2504.20678 (cross-list from cs.CL) [pdf, html, other]
Title: Non-native Children's Automatic Speech Assessment Challenge (NOCASA)
Yaroslav Getman, Tamás Grósz, Mikko Kurimo, Giampiero Salvi
Comments: Final version of the baseline paper for the NOCASA competition (this https URL), Accepted at IEEE MLSP 2025
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[145] arXiv:2504.20776 (cross-list from cs.SD) [pdf, other]
Title: ECOSoundSet: a finely annotated dataset for the automated acoustic identification of Orthoptera and Cicadidae in North, Central and temperate Western Europe
David Funosas, Elodie Massol, Yves Bas, Svenja Schmidt, Dominik Arend, Alexander Gebhard, Luc Barbaro, Sebastian König, Rafael Carbonell Font, David Sannier, Fernand Deroussen, Jérôme Sueur, Christian Roesti, Tomi Trilar, Wolfgang Forstmeier, Lucas Roger, Eloïsa Matheu, Piotr Guzik, Julien Barataud, Laurent Pelozuelo, Stéphane Puissant, Sandra Mueller, Björn Schuller, Jose M. Montoya, Andreas Triantafyllopoulos, Maxime Cauchoix
Comments: 3 Figures + 2 Supplementary Figures, 2 Tables + 3 Supplementary Tables
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[146] arXiv:2504.20835 (cross-list from cs.SD) [pdf, html, other]
Title: Enhancing Non-Core Language Instruction-Following in Speech LLMs via Semi-Implicit Cross-Lingual CoT Reasoning
Hongfei Xue, Yufeng Tang, Hexin Liu, Jun Zhang, Xuelong Geng, Lei Xie
Comments: 10 pages, 6 figures, Submitted to ACM MM 2025
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[147] arXiv:2504.20923 (cross-list from cs.SD) [pdf, html, other]
Title: End-to-end Audio Deepfake Detection from RAW Waveforms: a RawNet-Based Approach with Cross-Dataset Evaluation
Andrea Di Pierno (1 and 2), Luca Guarnera (2), Dario Allegra (2), Sebastiano Battiato (2) ((1) IMT School of Advanced Studies, Lucca, Italy, (2) Department of Mathematics and Computer Science, University of Catania, Italy)
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[148] arXiv:2504.21171 (cross-list from cs.SD) [pdf, html, other]
Title: Design, analysis, and experimental validation of a stepped plate parametric array loudspeaker
Woongji Kim, Beomseok Oh, Chayeong Kim, Wonkyu Moon
Comments: 51 pages, 18 figures, arXiv:this http URL(N) format preferred, submitted to The Journal of the Acoustical Society of America (AIP)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Applied Physics (physics.app-ph)
[149] arXiv:2504.21214 (cross-list from cs.CL) [pdf, html, other]
Title: Pretraining Large Brain Language Model for Active BCI: Silent Speech
Jinzhao Zhou, Zehong Cao, Yiqun Duan, Connor Barkley, Daniel Leong, Xiaowei Jiang, Quoc-Toan Nguyen, Ziyi Zhao, Thomas Do, Yu-Cheng Chang, Sheng-Fu Liang, Chin-teng Lin
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)