Sound

Authors and titles for December 2024

Total of 231 entries (showing 126-150)

[126] arXiv:2412.01092 (cross-list from eess.AS) [pdf, html, other]: Title: Deep Learning-Based Approach for Identification and Compensation of Nonlinear Distortions in Parametric Array Loudspeakers

Mengtong Li, Tao Zhuang, Kai Chen, Jia-Xin Zhong, Jing Lu

Comments: 5 pages, 7 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Systems and Control (eess.SY)
[127] arXiv:2412.01169 (cross-list from cs.MM) [pdf, html, other]: Title: OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

Shufan Li, Konstantinos Kallidromitis, Akash Gokul, Zichun Liao, Yusuke Kato, Kazuki Kozuka, Aditya Grover

Comments: 19 pages, 14 figures

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[128] arXiv:2412.01195 (cross-list from eess.AS) [pdf, html, other]: Title: Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification

Bei Liu, Yanmin Qian

Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[129] arXiv:2412.01401 (cross-list from eess.SP) [pdf, html, other]: Title: Linear stimulus reconstruction works on the KU Leuven audiovisual, gaze-controlled auditory attention decoding dataset

Simon Geirnaert, Iustina Rotaru, Tom Francart, Alexander Bertrand

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[130] arXiv:2412.01861 (cross-list from eess.AS) [pdf, html, other]: Title: Late fusion ensembles for speech recognition on diverse input audio representations

Marin Jezidžić, Matej Mihelčić

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[131] arXiv:2412.01996 (cross-list from eess.AS) [pdf, html, other]: Title: A Machine Hearing System for Robust Cough Detection Based on a High-Level Representation of Band-Specific Audio Features

Jesús Monge-Alvarez, Carlos Hoyos-Barceló, Luis M. San-José-Revuelta, Pablo Casaseca-de-la-Higuera

Comments: 12 pages, 11 figures, 5 tables

Journal-ref: IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 66, NO. 8, AUGUST 2019, pp. 2319-2330

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[132] arXiv:2412.02164 (cross-list from eess.AS) [pdf, html, other]: Title: A Theoretical Framework for Acoustic Neighbor Embeddings

Woojay Jeon

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[133] arXiv:2412.02327 (cross-list from cs.AI) [pdf, html, other]: Title: Switchable deep beamformer for high-quality and real-time passive acoustic mapping

Yi Zeng, Jinwei Li, Hui Zhu, Shukuan Lu, Jianfeng Li, Xiran Cai

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[134] arXiv:2412.02611 (cross-list from cs.CV) [pdf, html, other]: Title: AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?

Kaixiong Gong, Kaituo Feng, Bohao Li, Yibing Wang, Mofan Cheng, Shijia Yang, Jiaming Han, Benyou Wang, Yutong Bai, Zhuoran Yang, Xiangyu Yue

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[135] arXiv:2412.02612 (cross-list from cs.CL) [pdf, other]: Title: GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot

Aohan Zeng, Zhengxiao Du, Mingdao Liu, Kedong Wang, Shengmin Jiang, Lei Zhao, Yuxiao Dong, Jie Tang

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[136] arXiv:2412.03074 (cross-list from cs.CL) [pdf, html, other]: Title: Analytic Study of Text-Free Speech Synthesis for Raw Audio using a Self-Supervised Learning Model

Joonyong Park, Daisuke Saito, Nobuaki Minematsu

Comments: APSIPA ASC 2024

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[137] arXiv:2412.03075 (cross-list from cs.CL) [pdf, html, other]: Title: ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction

Victor Junqiu Wei, Weicheng Wang, Di Jiang, Yuanfeng Song, Lu Wang

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[138] arXiv:2412.03430 (cross-list from cs.CV) [pdf, html, other]: Title: SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model

Yan Li, Ziya Zhou, Zhiqiang Wang, Wei Xue, Wenhan Luo, Yike Guo

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[139] arXiv:2412.04266 (cross-list from cs.CL) [pdf, html, other]: Title: Representation Purification for End-to-End Speech Translation

Chengwei Zhang, Yue Zhou, Rui Zhao, Yidong Chen, Xiaodong Shi

Comments: Accepted by COLING 2025

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[140] arXiv:2412.04425 (cross-list from eess.AS) [pdf, html, other]: Title: CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing

Yen-Ju Lu, Jing Liu, Thomas Thebaud, Laureano Moro-Velazquez, Ariya Rastrow, Najim Dehak, Jesus Villalba

Comments: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[141] arXiv:2412.04724 (cross-list from eess.AS) [pdf, html, other]: Title: StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching

Jixun Yao, Yuguang Yang, Yu Pan, Ziqian Ning, Jiaohao Ye, Hongbin Zhou, Lei Xie

Comments: Accepted by AAAI 2025

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[142] arXiv:2412.05015 (cross-list from eess.AS) [pdf, html, other]: Title: Perceptually Transparent Binaural Auralization of Simulated Sound Fields

Jens Ahrens

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[143] arXiv:2412.05167 (cross-list from cs.AI) [pdf, html, other]: Title: Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models

Kuofeng Gao, Shu-Tao Xia, Ke Xu, Philip Torr, Jindong Gu

Comments: Accepted by ACL 2025

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[144] arXiv:2412.05296 (cross-list from cs.AI) [pdf, html, other]: Title: Revisiting Your Memory: Reconstruction of Affect-Contextualized Memory via EEG-guided Audiovisual Generation

Joonwoo Kwon, Heehwan Wang, Jinwoo Lee, Sooyoung Kim, Shinjae Yoo, Yuewei Lin, Jiook Cha

Comments: Codes and the dataset will be released upon acceptance

Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[145] arXiv:2412.05589 (cross-list from eess.AS) [pdf, html, other]: Title: SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR

Pengcheng Guo, Xuankai Chang, Hang Lv, Shinji Watanabe, Lei Xie

Comments: Accepted by IEEE/ACM TASLP

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[146] arXiv:2412.05694 (cross-list from cs.MM) [pdf, html, other]: Title: Combining Genre Classification and Harmonic-Percussive Features with Diffusion Models for Music-Video Generation

Leonardo Pina, Yongmin Li

Subjects: Multimedia (cs.MM); Graphics (cs.GR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[147] arXiv:2412.05831 (cross-list from cs.MM) [pdf, html, other]: Title: Semi-Supervised Contrastive Learning for Controllable Video-to-Music Retrieval

Shanti Stewart, Gouthaman KV, Lie Lu, Andrea Fanelli

Comments: Accepted at ICASSP 2025

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[148] arXiv:2412.06209 (cross-list from cs.CV) [pdf, html, other]: Title: Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment

Kim Sung-Bin, Arda Senocak, Hyunwoo Ha, Tae-Hyun Oh

Comments: Under-review

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[149] arXiv:2412.06259 (cross-list from eess.AS) [pdf, html, other]: Title: Leveraging Prompt Learning and Pause Encoding for Alzheimer's Disease Detection

Yin-Long Liu, Rui Feng, Jia-Hong Yuan, Zhen-Hua Ling

Comments: Accepted by ISCSLP 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[150] arXiv:2412.06602 (cross-list from cs.CL) [pdf, html, other]: Title: Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey

Tianxin Xie, Yan Rong, Pengfei Zhang, Wenwu Wang, Li Liu

Comments: A comprehensive survey on controllable TTS, 26 pages, 7 tables, 6 figures, 317 references. Under review

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
