Audio and Speech Processing

Authors and titles for March 2025

Total of 213 entries : 1-50 51-100 101-150 151-200 201-213

[101] arXiv:2503.06346 (cross-list from cs.SD) [pdf, other]: Title: Accompaniment Prompt Adherence: A Measure for Evaluating Music Accompaniment Systems

Maarten Grachten, Javier Nistal

Comments: Accepted for publication at the 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[102] arXiv:2503.06348 (cross-list from cs.SD) [pdf, html, other]: Title: A Neural Score Follower for Computer Accompaniment of Polyphonic Musical Instruments

Ashwin Pillay

Comments: Master's Thesis

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[103] arXiv:2503.06362 (cross-list from cs.CV) [pdf, html, other]: Title: Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs

Umberto Cappellazzo, Minsu Kim, Stavros Petridis

Comments: Accepted to IEEE ASRU 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[104] arXiv:2503.06405 (cross-list from cs.SD) [pdf, html, other]: Title: Heterogeneous bimodal attention fusion for speech emotion recognition

Jiachen Luo, Huy Phan, Lin Wang, Joshua Reiss

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[105] arXiv:2503.06805 (cross-list from cs.CV) [pdf, html, other]: Title: Multimodal Emotion Recognition and Sentiment Analysis in Multi-Party Conversation Contexts

Aref Farhadipour, Hossein Ranjbar, Masoumeh Chapariniya, Teodora Vukovic, Sarah Ebling, Volker Dellwo

Comments: 5 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[106] arXiv:2503.06924 (cross-list from cs.CL) [pdf, other]: Title: Automatic Speech Recognition for Non-Native English: Accuracy and Disfluency Handling

Michael McGuire

Comments: 26 pages, 10 figures

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[107] arXiv:2503.06984 (cross-list from cs.SD) [pdf, html, other]: Title: Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition

Juncheng Wang, Chao Xu, Cheng Yu, Lei Shang, Zhe Hu, Shujun Wang, Liefeng Bo

Comments: Accepted to CVPR-25

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[108] arXiv:2503.07078 (cross-list from cs.CL) [pdf, html, other]: Title: Linguistic Knowledge Transfer Learning for Speech Enhancement

Kuo-Hsuan Hung, Xugang Lu, Szu-Wei Fu, Huan-Hsin Tseng, Hsin-Yi Lin, Chii-Wann Lin, Yu Tsao

Comments: 11 pages, 6 figures

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[109] arXiv:2503.07977 (cross-list from cs.SD) [pdf, html, other]: Title: Boundary Regression for Leitmotif Detection in Music Audio

Sihun Lee, Dasaem Jeong

Comments: 2 pages, 1 figure; presented at the 2024 ISMIR conference Late-Breaking Demo

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[110] arXiv:2503.08147 (cross-list from cs.CV) [pdf, html, other]: Title: FilmComposer: LLM-Driven Music Production for Silent Film Clips

Zhifeng Xie, Qile He, Youjia Zhu, Qiwei He, Mengtian Li

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[111] arXiv:2503.08533 (cross-list from cs.CL) [pdf, html, other]: Title: ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems

Siddhant Arora, Yifan Peng, Jiatong Shi, Jinchuan Tian, William Chen, Shikhar Bharadwaj, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Shuichiro Shimizu, Vaibhav Srivastav, Shinji Watanabe

Comments: Accepted at NAACL 2025 Demo Track

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[112] arXiv:2503.08540 (cross-list from cs.SD) [pdf, html, other]: Title: Mellow: a small audio language model for reasoning

Soham Deshmukh, Satvik Dixit, Rita Singh, Bhiksha Raj

Comments: Checkpoint and dataset available at: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[113] arXiv:2503.08798 (cross-list from cs.SD) [pdf, html, other]: Title: Contextual Speech Extraction: Leveraging Textual History as an Implicit Cue for Target Speech Extraction

Minsu Kim, Rodrigo Mira, Honglie Chen, Stavros Petridis, Maja Pantic

Comments: Accepted to ICASSP 2025

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[114] arXiv:2503.08806 (cross-list from cs.SD) [pdf, html, other]: Title: Learning Control of Neural Sound Effects Synthesis from Physically Inspired Models

Yisu Zong, Joshua Reiss

Comments: ICASSP 2025

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[115] arXiv:2503.09053 (cross-list from cs.SD) [pdf, other]: Title: Control Surfaces: Using the Commodore 64 and Analog Synthesizer to Expand Musical Boundaries

Daniel McKemie

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[116] arXiv:2503.09055 (cross-list from cs.SD) [pdf, other]: Title: Zero to 16383 Through the Wire: Transmitting High-Resolution MIDI with WebSockets and the Browser

Daniel McKemie

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[117] arXiv:2503.09205 (cross-list from cs.MM) [pdf, other]: Title: Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model

Ali Vosoughi, Dimitra Emmanouilidou, Hannes Gamper

Comments: We are withdrawing this version due to the need for substantial updates in scope and organization, which affect the clarity and completeness of the manuscript. We plan to submit a revised version that incorporates these changes

Subjects: Multimedia (cs.MM); Computation and Language (cs.CL); Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[118] arXiv:2503.09349 (cross-list from eess.SP) [pdf, html, other]: Title: Performance Modeling for Correlation-based Neural Decoding of Auditory Attention to Speech

Simon Geirnaert, Jonas Vanthornhout, Tom Francart, Alexander Bertrand

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[119] arXiv:2503.09905 (cross-list from cs.SD) [pdf, html, other]: Title: Quantization for OpenAI's Whisper Models: A Comparative Analysis

Allison Andreyev

Comments: 7 pages

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[120] arXiv:2503.10086 (cross-list from cs.SD) [pdf, html, other]: Title: Efficient Adapter Tuning for Joint Singing Voice Beat and Downbeat Tracking with Self-supervised Learning Features

Jiajun Deng, Yaolong Ju, Jing Yang, Simon Lui, Xunying Liu

Comments: Accepted by ISMIR2024

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[121] arXiv:2503.10211 (cross-list from cs.CL) [pdf, html, other]: Title: Adaptive Inner Speech-Text Alignment for LLM-based Speech Translation

Henglyu Liu, Andong Chen, Kehai Chen, Xuefeng Bai, Meizhi Zhong, Yuan Qiu, Min Zhang

Comments: 12 pages, 7 figures

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[122] arXiv:2503.10287 (cross-list from cs.SD) [pdf, html, other]: Title: MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment

Hao Zhou, Xiaobao Guo, Yuzhe Zhu, Adams Wai-Kin Kong

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Audio and Speech Processing (eess.AS)
[123] arXiv:2503.10446 (cross-list from cs.SD) [pdf, html, other]: Title: Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings

Jakaria Islam Emon, Md Abu Salek, Kazi Tamanna Alam

Comments: 6 pages

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[124] arXiv:2503.10522 (cross-list from cs.MM) [pdf, html, other]: Title: AudioX: Diffusion Transformer for Anything-to-Audio Generation

Zeyue Tian, Yizhu Jin, Zhaoyang Liu, Ruibin Yuan, Xu Tan, Qifeng Chen, Wei Xue, Yike Guo

Comments: The code and datasets will be available at this https URL

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125] arXiv:2503.11080 (cross-list from cs.CL) [pdf, html, other]: Title: Joint Training And Decoding for Multilingual End-to-End Simultaneous Speech Translation

Wuwei Huang, Renren Jin, Wen Zhang, Jian Luan, Bin Wang, Deyi Xiong

Comments: ICASSP 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[126] arXiv:2503.11190 (cross-list from cs.SD) [pdf, html, other]: Title: Cross-Modal Learning for Music-to-Music-Video Description Generation

Zhuoyuan Mao, Mengjie Zhao, Qiyu Wu, Zhi Zhong, Wei-Hsiang Liao, Hiromi Wakaki, Yuki Mitsufuji

Comments: Accepted by RepL4NLP 2025 @ NAACL 2025

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[127] arXiv:2503.11197 (cross-list from cs.SD) [pdf, html, other]: Title: Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering

Gang Li, Jizhong Liu, Heinrich Dinkel, Yadong Niu, Junbo Zhang, Jian Luan

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[128] arXiv:2503.11206 (cross-list from cs.SD) [pdf, html, other]: Title: Comparative Study of Spike Encoding Methods for Environmental Sound Classification

Andres Larroza, Javier Naranjo-Alcazar, Vicent Ortiz Castelló, Pedro Zuccarello

Comments: Under review EUSIPCO 2025

Subjects: Sound (cs.SD); Emerging Technologies (cs.ET); Audio and Speech Processing (eess.AS)
[129] arXiv:2503.11229 (cross-list from cs.SD) [pdf, html, other]: Title: Exploring the Potential of Large Multimodal Models as Effective Alternatives for Pronunciation Assessment

Ke Wang, Lei He, Kun Liu, Yan Deng, Wenning Wei, Sheng Zhao

Comments: 7 pages

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[130] arXiv:2503.11312 (cross-list from eess.SP) [pdf, html, other]: Title: A Data-Driven Exploration of Elevation Cues in HRTFs: An Explainable AI Perspective Across Multiple Datasets

Juan Antonio De Rus, Mario Montagud, Jesus Lopez-Ballester, Francesc J. Ferri, Maximo Cobos

Comments: 14 pages, 9 figures

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[131] arXiv:2503.11315 (cross-list from cs.CV) [pdf, html, other]: Title: MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens

Jeong Hun Yeo, Hyeongseop Rha, Se Jin Park, Yong Man Ro

Comments: Accepted at Findings of ACL 2025. The code and models are available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[132] arXiv:2503.11363 (cross-list from cs.SD) [pdf, html, other]: Title: Creating a Good Teacher for Knowledge Distillation in Acoustic Scene Classification

Tobias Morocutti, Florian Schmid, Khaled Koutini, Gerhard Widmer

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[133] arXiv:2503.11373 (cross-list from cs.SD) [pdf, html, other]: Title: Exploring Performance-Complexity Trade-Offs in Sound Event Detection Models

Tobias Morocutti, Florian Schmid, Jonathan Greif, Francesco Foscarin, Gerhard Widmer

Comments: In Proceedings of the 33rd European Signal Processing Conference (EUSIPCO 2025), Palermo, Italy

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[134] arXiv:2503.11562 (cross-list from cs.SD) [pdf, html, other]: Title: Designing Neural Synthesizers for Low-Latency Interaction

Franco Caspe, Jordie Shier, Mark Sandler, Charalampos Saitis, Andrew McPherson

Comments: See website at this http URL - 13 pages, 5 figures, accepted to the Journal of the Audio Engineering Society, LaTeX; Corrected typos, added hyphen to title to reflect JAES version

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[135] arXiv:2503.11627 (cross-list from cs.SD) [pdf, html, other]: Title: Are Deep Speech Denoising Models Robust to Adversarial Noise?

Will Schwarzer, Philip S. Thomas, Andrea Fanelli, Xiaoyu Liu

Comments: 13 pages, 5 figures

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[136] arXiv:2503.11896 (cross-list from cs.SD) [pdf, html, other]: Title: Expressive Music Data Processing and Generation

Jingwei Liu

Comments: 7 pages, 4 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[137] arXiv:2503.11956 (cross-list from cs.SD) [pdf, html, other]: Title: Computational Extraction of Intonation and Tuning Systems from Multiple Microtonal Monophonic Vocal Recordings with Diverse Modes

Sepideh Shafiei, Shapour Hakam

Journal-ref: Sound and Music Computing Conference 2025

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[138] arXiv:2503.12042 (cross-list from cs.SD) [pdf, html, other]: Title: Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing

Zhedong Zhang, Liang Li, Chenggang Yan, Chunshan Liu, Anton van den Hengel, Yuankai Qi

Comments: Accepted by CVPR2025

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[139] arXiv:2503.12115 (cross-list from cs.SD) [pdf, html, other]: Title: Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations

Xue Jiang, Xiulian Peng, Yuan Zhang, Yan Lu

Comments: Accepted by IEEE Journal of Selected Topics in Signal Processing (JSTSP)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[140] arXiv:2503.12131 (cross-list from cs.CV) [pdf, html, other]: Title: DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap

Shentong Mo, Zehua Chen, Fan Bao, Jun Zhu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[141] arXiv:2503.12261 (cross-list from cs.CV) [pdf, html, other]: Title: United we stand, Divided we fall: Handling Weak Complementary Relationships for Audio-Visual Emotion Recognition in Valence-Arousal Space

R. Gnana Praveen, Jahangir Alam, Eric Charton

Comments: Achieved 2nd place in the valence-arousal challenge. Submission to CVPR2025 Workshop

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[142] arXiv:2503.12388 (cross-list from cs.SD) [pdf, html, other]: Title: Serenade: A Singing Style Conversion Framework Based On Audio Infilling

Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda

Comments: Accepted to EUSIPCO 2025

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[143] arXiv:2503.12506 (cross-list from cs.SD) [pdf, html, other]: Title: A General Close-loop Predictive Coding Framework for Auditory Working Memory

Zhongju Yuan, Geraint Wiggins, Dick Botteldooren

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[144] arXiv:2503.12589 (cross-list from cs.SD) [pdf, html, other]: Title: Context-Aware Two-Step Training Scheme for Domain Invariant Speech Separation

Wupeng Wang, Zexu Pan, Jingru Lin, Shuai Wang, Haizhou Li

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[145] arXiv:2503.12806 (cross-list from cs.MM) [pdf, html, other]: Title: AV-Surf: Surface-Enhanced Geometry-Aware Novel-View Acoustic Synthesis

Hadam Baek, Hannie Shin, Jiyoung Seo, Chanwoo Kim, Saerom Kim, Hyeongbok Kim, Sangpil Kim

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[146] arXiv:2503.12840 (cross-list from cs.SD) [pdf, html, other]: Title: Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics

Chen Liu, Liying Yang, Peike Li, Dadong Wang, Lincheng Li, Xin Yu

Comments: Accepted by CVPR2025

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[147] arXiv:2503.13763 (cross-list from cs.LG) [pdf, html, other]: Title: Neural Edge Histogram Descriptors for Underwater Acoustic Target Recognition

Atharva Agashe, Davelle Carreiro, Alexandra Van Dine, Joshua Peeples

Comments: 6 pages, 5 figures. This work has been accepted to IEEE OCEANS 2025

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[148] arXiv:2503.14185 (cross-list from cs.CL) [pdf, html, other]: Title: AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation

Wuwei Huang, Dexin Wang, Deyi Xiong

Comments: ACL 2021 Findings

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[149] arXiv:2503.14545 (cross-list from cs.LG) [pdf, html, other]: Title: PANDORA: Diffusion Policy Learning for Dexterous Robotic Piano Playing

Yanjia Huang, Renjie Li, Zhengzhong Tu

Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[150] arXiv:2503.14928 (cross-list from cs.CV) [pdf, html, other]: Title: Shushing! Let's Imagine an Authentic Speech from the Silent Video

Jiaxin Ye, Hongming Shan

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
