Sound

Authors and titles for December 2024

Total of 231 entries (showing 26-75)
[26] arXiv:2412.05123 [pdf, html, other]
Title: Applying Automatic Differentiation to Optimize Differential Microphone Array Designs
Siminfar Samakoush Galougah, Ramani Duraiswami
Comments: 6 pages, 9 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[27] arXiv:2412.05436 [pdf, html, other]
Title: pyAMPACT: A Score-Audio Alignment Toolkit for Performance Data Estimation and Multi-modal Processing
Johanna Devaney, Daniel McKemie, Alex Morgan
Comments: International Society for Music Information Retrieval, Late Breaking Demo
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[28] arXiv:2412.05558 [pdf, html, other]
Title: WavFusion: Towards wav2vec 2.0 Multimodal Speech Emotion Recognition
Feng Li, Jiusong Luo, Wanjun Xia
Comments: Accepted by 31st International Conference on MultiMedia Modeling (MMM2025)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[29] arXiv:2412.05951 [pdf, html, other]
Title: When Vision Models Meet Parameter Efficient Look-Aside Adapters Without Large-Scale Audio Pretraining
Juan Yeo, Jinkwan Jang, Kyubyung Chae, Seongkyu Mun, Taesup Kim
Comments: 5 pages, 3 figures
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[30] arXiv:2412.06001 [pdf, html, other]
Title: M6: Multi-generator, Multi-domain, Multi-lingual and cultural, Multi-genres, Multi-instrument Machine-Generated Music Detection Databases
Yupei Li, Hanqian Li, Lucia Specia, Björn W. Schuller
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[31] arXiv:2412.06208 [pdf, html, other]
Title: Pilot-guided Multimodal Semantic Communication for Audio-Visual Event Localization
Fei Yu, Zhe Xiang, Nan Che, Zhuoran Zhang, Yuandi Li, Junxiao Xue, Zhiguo Wan
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[32] arXiv:2412.06296 [pdf, html, other]
Title: VidMusician: Video-to-Music Generation with Semantic-Rhythmic Alignment via Hierarchical Visual Features
Sifei Li, Binxin Yang, Chunji Yin, Chong Sun, Yuxin Zhang, Weiming Dong, Chen Li
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33] arXiv:2412.06581 [pdf, other]
Title: EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations
Weizhen Bian, Yubo Zhou, Kaitai Zhang, Xiaohan Gu
Comments: I did not obtain the necessary approval from my academic supervisor prior to submission and there are issues with my current paper
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[34] arXiv:2412.06617 [pdf, html, other]
Title: AI TrackMate: Finally, Someone Who Will Give Your Music More Than Just "Sounds Great!"
Yi-Lin Jiang, Chia-Ho Hsiung, Yen-Tung Yeh, Lu-Rong Chen, Bo-Yu Chen
Comments: Accepted for the NeurIPS 2024 Creative AI Track
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[35] arXiv:2412.06660 [pdf, html, other]
Title: MuMu-LLaMA: Multi-modal Music Understanding and Generation via Large Language Models
Shansong Liu, Atin Sakkeer Hussain, Qilong Wu, Chenshuo Sun, Ying Shan
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[36] arXiv:2412.06703 [pdf, html, other]
Title: Source Separation & Automatic Transcription for Music
Bradford Derby, Lucas Dunker, Samarth Galchar, Shashank Jarmale, Akash Setti
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[37] arXiv:2412.06965 [pdf, html, other]
Title: Improving Source Extraction with Diffusion and Consistency Models
Tornike Karchkhadze, Mohammad Rasool Izadi, Shuo Zhang
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[38] arXiv:2412.07316 [pdf, html, other]
Title: Preserving Speaker Information in Direct Speech-to-Speech Translation with Non-Autoregressive Generation and Pretraining
Rui Zhou, Akinori Ito, Takashi Nose
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[39] arXiv:2412.07948 [pdf, html, other]
Title: Frechet Music Distance: A Metric For Generative Symbolic Music Evaluation
Jan Retkowski, Jakub Stępniak, Mateusz Modrzejewski
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[40] arXiv:2412.08112 [pdf, html, other]
Title: Aligner-Guided Training Paradigm: Advancing Text-to-Speech Models with Aligner Guided Duration
Haowei Lou, Helen Paik, Wen Hu, Lina Yao
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[41] arXiv:2412.08117 [pdf, html, other]
Title: LatentSpeech: Latent Diffusion for Text-To-Speech Generation
Haowei Lou, Helen Paik, Pari Delir Haghighi, Wen Hu, Lina Yao
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[42] arXiv:2412.08237 [pdf, html, other]
Title: TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch
Xingchen Song, Mengtao Xing, Changwei Ma, Shengqiang Li, Di Wu, Binbin Zhang, Fuping Pan, Dinghao Zhou, Yuekai Zhang, Shun Lei, Zhendong Peng, Zhiyong Wu
Comments: Technical Report
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[43] arXiv:2412.08247 [pdf, html, other]
Title: MoMuSE: Momentum Multi-modal Target Speaker Extraction for Real-time Scenarios with Impaired Visual Cues
Junjie Li, Ke Zhang, Shuai Wang, Kong Aik Lee, Man-Wai Mak, Haizhou Li
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[44] arXiv:2412.08312 [pdf, html, other]
Title: A Unified Model For Voice and Accent Conversion In Speech and Singing using Self-Supervised Learning and Feature Extraction
Sowmya Cheripally
Comments: 7 pages, 5 figures, 2 tables
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[45] arXiv:2412.08356 [pdf, html, other]
Title: Zero-Shot Mono-to-Binaural Speech Synthesis
Alon Levkovitch, Julian Salazar, Soroosh Mariooryad, RJ Skerry-Ryan, Nadav Bar, Bastiaan Kleijn, Eliya Nachmani
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[46] arXiv:2412.08504 [pdf, html, other]
Title: PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis
Yifan Xie, Tao Feng, Xin Zhang, Xiangyang Luo, Zixuan Guo, Weijiang Yu, Heng Chang, Fei Ma, Fei Richard Yu
Comments: 9 pages, accepted by AAAI 2025
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Graphics (cs.GR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[47] arXiv:2412.08550 [pdf, html, other]
Title: Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations
Hugo Flores García, Oriol Nieto, Justin Salamon, Bryan Pardo, Prem Seetharaman
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[48] arXiv:2412.08577 [pdf, html, other]
Title: Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation
Hongming Guo, Ruibo Fu, Yizhong Geng, Shuai Liu, Shuchen Shi, Tao Wang, Chunyu Qiang, Chenxing Li, Ya Li, Zhengqi Wen, Yukun Liu, Xuefei Liu
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[49] arXiv:2412.08608 [pdf, html, other]
Title: AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models
Mintong Kang, Chejian Xu, Bo Li
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[50] arXiv:2412.08683 [pdf, html, other]
Title: Emotional Vietnamese Speech-Based Depression Diagnosis Using Dynamic Attention Mechanism
Quang-Anh N.D., Manh-Hung Ha, Thai Kim Dinh, Minh-Duc Pham, Ninh Nguyen Van
Comments: 9 pages, 5 figures
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[51] arXiv:2412.08856 [pdf, html, other]
Title: Complex-Cycle-Consistent Diffusion Model for Monaural Speech Enhancement
Yi Li, Yang Sun, Plamen Angelov
Comments: AAAI 2025
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[52] arXiv:2412.08944 [pdf, html, other]
Title: Interpreting Graphic Notation with MusicLDM: An AI Improvisation of Cornelius Cardew's Treatise
Tornike Karchkhadze, Keren Shao, Shlomo Dubnov
Journal-ref: 2024 IEEE International Conference on Big Data (Big Data)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[53] arXiv:2412.08988 [pdf, html, other]
Title: EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
Gaoxiang Cong, Jiadong Pan, Liang Li, Yuankai Qi, Yuxin Peng, Anton van den Hengel, Jian Yang, Qingming Huang
Comments: Accepted to CVPR 2025
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[54] arXiv:2412.09032 [pdf, html, other]
Title: Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment and Analysis
Zhoulin Ji, Chenhao Lin, Hang Wang, Chao Shen
Comments: IJCAI 2024
Journal-ref: Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, 2024, pp. 413-421
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[55] arXiv:2412.09168 [pdf, html, other]
Title: YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls
Zihao Chen, Haomin Zhang, Xinhan Di, Haoyu Wang, Sizhe Shan, Junjie Zheng, Yunming Liang, Yihan Fan, Xinfa Zhu, Wenjie Tian, Yihua Wang, Chaofan Ding, Lei Xie
Comments: 16 pages, 4 figures
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[56] arXiv:2412.09195 [pdf, html, other]
Title: On the Generation and Removal of Speaker Adversarial Perturbation for Voice-Privacy Protection
Chenyang Guo, Liping Chen, Zhuhai Li, Kong Aik Lee, Zhen-Hua Ling, Wu Guo
Comments: 6 pages, 3 figures, published to IEEE SLT Workshop 2024
Journal-ref: 2024 IEEE Spoken Language Technology Workshop (SLT), 2024, pp. 1197-1202
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[57] arXiv:2412.09317 [pdf, html, other]
Title: Multimodal Sentiment Analysis based on Video and Audio Inputs
Antonio Fernandez, Suzan Awinat
Comments: Presented as a full paper in the 15th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (EUSPN 2024) October 28-30, 2024, Leuven, Belgium
Journal-ref: Procedia Computer Science, Volume 251, 2024, Pages 41-48, ISSN 1877-0509
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[58] arXiv:2412.09467 [pdf, html, other]
Title: Audios Don't Lie: Multi-Frequency Channel Attention Mechanism for Audio Deepfake Detection
Yangguang Feng
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[59] arXiv:2412.09789 [pdf, html, other]
Title: SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation
Sonal Kumar, Prem Seetharaman, Justin Salamon, Dinesh Manocha, Oriol Nieto
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[60] arXiv:2412.09928 [pdf, html, other]
Title: Leveraging Multimodal Methods and Spontaneous Speech for Alzheimer's Disease Identification
Yifan Gao, Long Guo, Hong Liu
Comments: ICASSP 2025
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[61] arXiv:2412.10011 [pdf, html, other]
Title: Enhanced Speech Emotion Recognition with Efficient Channel Attention Guided Deep CNN-BiLSTM Framework
Niloy Kumar Kundu, Sarah Kobir, Md. Rayhan Ahmed, Tahmina Aktar, Niloya Roy
Comments: 42 pages, 10 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[62] arXiv:2412.10117 [pdf, html, other]
Title: CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models
Zhihao Du, Yuxuan Wang, Qian Chen, Xian Shi, Xiang Lv, Tianyu Zhao, Zhifu Gao, Yexin Yang, Changfeng Gao, Hui Wang, Fan Yu, Huadai Liu, Zhengyan Sheng, Yue Gu, Chong Deng, Wen Wang, Shiliang Zhang, Zhijie Yan, Jingren Zhou
Comments: Tech report, work in progress
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[63] arXiv:2412.10469 [pdf, other]
Title: Comparative Analysis of Mel-Frequency Cepstral Coefficients and Wavelet Based Audio Signal Processing for Emotion Detection and Mental Health Assessment in Spoken Speech
Idoko Agbo, Dr Hoda El-Sayed, M.D Kamruzzan Sarker
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[64] arXiv:2412.10481 [pdf, other]
Title: Tipping Points, Pulse Elasticity and Tonal Tension: An Empirical Study on What Generates Tipping Points
Canishk Naik (CAM, LSE), Elaine Chew (Repmus, CNRS, STMS)
Comments: International Society for Music Information Retrieval Conference, Oct 2017, Suzhou, China
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[65] arXiv:2412.10649 [pdf, html, other]
Title: Hidden Echoes Survive Training in Audio To Audio Generative Instrument Models
Christopher J. Tralie, Matt Amery, Benjamin Douglas, Ian Utz
Comments: 8 pages, 11 Figures, Proceedings of 2025 AAAI Workshop on AI for Music
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[66] arXiv:2412.10792 [pdf, html, other]
Title: Audio-based Anomaly Detection in Industrial Machines Using Deep One-Class Support Vector Data Description
Sertac Kilickaya, Mete Ahishali, Cansu Celebioglu, Fahad Sohrab, Levent Eren, Turker Ince, Murat Askar, Moncef Gabbouj
Comments: To be published in 2025 IEEE Symposium Series on Computational Intelligence
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[67] arXiv:2412.10857 [pdf, html, other]
Title: Robust Persian Digit Recognition in Noisy Environments Using Hybrid CNN-BiGRU Model
Ali Nasr-Esfahani, Mehdi Bekrani, Roozbeh Rajabi
Comments: 6 pages, two columns, submitted to Pattern Recognition Letters
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[68] arXiv:2412.10968 [pdf, html, other]
Title: Composers' Evaluations of an AI Music Tool: Insights for Human-Centred Design
Eleanor Row, György Fazekas
Comments: Accepted to NeurIPS 2024 Workshop on Generative AI and Creativity: A dialogue between machine learning researchers and creative professionals in Vancouver, Canada
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[69] arXiv:2412.11272 [pdf, html, other]
Title: WhisperFlow: speech foundation models in real time
Rongxiang Wang, Zhiming Xu, Felix Xiaozhu Lin
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[70] arXiv:2412.11449 [pdf, html, other]
Title: Whisper-GPT: A Hybrid Representation Audio Large Language Model
Prateek Verma
Comments: 6 pages, 3 figures. 50th International Conference on Acoustics, Speech and Signal Processing, Hyderabad, India
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[71] arXiv:2412.11551 [pdf, html, other]
Title: Region-Based Optimization in Continual Learning for Audio Deepfake Detection
Yujie Chen, Jiangyan Yi, Cunhang Fan, Jianhua Tao, Yong Ren, Siding Zeng, Chu Yuan Zhang, Xinrui Yan, Hao Gu, Jun Xue, Chenglong Wang, Zhao Lv, Xiaohui Zhang
Comments: Accepted by AAAI 2025
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[72] arXiv:2412.11769 [pdf, other]
Title: Does it Chug? Towards a Data-Driven Understanding of Guitar Tone Description
Pratik Sutar, Jason Naradowsky, Yusuke Miyao
Comments: Accepted for publication at the 3rd Workshop on NLP for Music and Audio (NLP4MusA 2024)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[73] arXiv:2412.11907 [pdf, html, other]
Title: AudioCIL: A Python Toolbox for Audio Class-Incremental Learning with Multiple Scenes
Qisheng Xu, Yulin Sun, Yi Su, Qian Zhu, Xiaoyi Tan, Hongyu Wen, Zijian Gao, Kele Xu, Yong Dou, Dawei Feng
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74] arXiv:2412.11943 [pdf, html, other]
Title: autrainer: A Modular and Extensible Deep Learning Toolkit for Computer Audition Tasks
Simon Rampp, Andreas Triantafyllopoulos, Manuel Milling, Björn W. Schuller
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[75] arXiv:2412.12111 [pdf, html, other]
Title: Voice Biomarker Analysis and Automated Severity Classification of Dysarthric Speech in a Multilingual Context
Eunjung Yeo
Comments: SNU Doctoral thesis
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)