Sound

Authors and titles for June 2025

Total of 438 entries; showing 126-150

[126] arXiv:2506.12573 [pdf, html, other]: Title: Video-Guided Text-to-Music Generation Using Public Domain Movie Collections

Haven Kim, Zachary Novack, Weihan Xu, Julian McAuley, Hao-Wen Dong

Comments: ISMIR 2025 regular paper. Dataset, code, and demo available at this https URL

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[127] arXiv:2506.12665 [pdf, html, other]: Title: ANIRA: An Architecture for Neural Network Inference in Real-Time Audio Applications

Valentin Ackva, Fares Schulz

Comments: 8 pages, accepted to the Proceedings of the 5th IEEE International Symposium on the Internet of Sounds (2024) - repository: this http URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[128] arXiv:2506.12672 [pdf, html, other]: Title: SC-SOT: Conditioning the Decoder on Diarized Speaker Information for End-to-End Overlapped Speech Recognition

Yuta Hirano, Sakriani Sakti

Comments: Accepted by Interspeech 2025

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[129] arXiv:2506.13001 [pdf, html, other]: Title: Personalizable Long-Context Symbolic Music Infilling with MIDI-RWKV

Christian Zhou-Zheng, Philippe Pasquier

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[130] arXiv:2506.13127 [pdf, html, other]: Title: I$^2$S-TFCKD: Intra-Inter Set Knowledge Distillation with Time-Frequency Calibration for Speech Enhancement

Jiaming Cheng, Ruiyu Liang, Chao Xu, Ye Ni, Wei Zhou, Björn W. Schuller, Xiaoshuai Hao

Comments: submitted to IEEE Transactions on Neural Networks and Learning Systems

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[131] arXiv:2506.13272 [pdf, html, other]: Title: SONIC: Sound Optimization for Noise In Crowds

Pranav M N, Gandham Sai Santhosh, Tejas Joshi, S Sriniketh Desikan, Eswar Gupta

Subjects: Sound (cs.SD); Signal Processing (eess.SP)
[132] arXiv:2506.13595 [pdf, html, other]: Title: Persistent Homology of Music Network with Three Different Distances

Eunwoo Heo, Byeongchan Choi, Myung ock Kim, Mai Lan Tran, Jae-Hun Jung

Subjects: Sound (cs.SD); Computational Geometry (cs.CG); Audio and Speech Processing (eess.AS)
[133] arXiv:2506.13833 [pdf, html, other]: Title: A Survey on World Models Grounded in Acoustic Physical Information

Xiaoliang Chen, Le Chang, Xin Yu, Yunhe Huang, Xianling Tu

Comments: 28 pages, 11 equations

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Robotics (cs.RO); Audio and Speech Processing (eess.AS); Applied Physics (physics.app-ph)
[134] arXiv:2506.13969 [pdf, other]: Title: Set theoretic solution for the tuning problem

Vsevolod Vladimirovich Deriushkin

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[135] arXiv:2506.13970 [pdf, other]: Title: Making deep neural networks work for medical audio: representation, compression and domain adaptation

Charles C Onu

Comments: PhD Thesis

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[136] arXiv:2506.14148 [pdf, html, other]: Title: Acoustic scattering AI for non-invasive object classifications: A case study on hair assessment

Long-Vu Hoang, Tuan Nguyen, Tran Huy Dat

Comments: Accepted to Interspeech 2025

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[137] arXiv:2506.14153 [pdf, html, other]: Title: Pushing the Performance of Synthetic Speech Detection with Kolmogorov-Arnold Networks and Self-Supervised Learning Models

Tuan Dat Phuong, Long-Vu Hoang, Huy Dat Tran

Comments: Accepted to Interspeech 2025

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[138] arXiv:2506.14223 [pdf, html, other]: Title: Fretting-Transformer: Encoder-Decoder Model for MIDI to Tablature Transcription

Anna Hamberger, Sebastian Murgul, Jochen Schmidt, Michael Heizmann

Comments: Accepted to the 50th International Computer Music Conference (ICMC), 2025

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[139] arXiv:2506.14226 [pdf, html, other]: Title: Investigation of Zero-shot Text-to-Speech Models for Enhancing Short-Utterance Speaker Verification

Yiyang Zhao, Shuai Wang, Guangzhi Sun, Zehua Chen, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

Subjects: Sound (cs.SD)
[140] arXiv:2506.14293 [pdf, html, other]: Title: SLEEPING-DISCO 9M: A large-scale pre-training dataset for generative music modeling

Tawsif Ahmed, Andrej Radonjic, Gollam Rabby

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[141] arXiv:2506.14396 [pdf, html, other]: Title: Manipulated Regions Localization For Partially Deepfake Audio: A Survey

Jiayi He, Jiangyan Yi, Jianhua Tao, Siding Zeng, Hao Gu

Comments: Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence

Subjects: Sound (cs.SD); Multimedia (cs.MM)
[142] arXiv:2506.14398 [pdf, html, other]: Title: A Comparative Study on Proactive and Passive Detection of Deepfake Speech

Chia-Hua Wu, Wanying Ge, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang

Subjects: Sound (cs.SD)
[143] arXiv:2506.14434 [pdf, html, other]: Title: Unifying Streaming and Non-streaming Zipformer-based ASR

Bidisha Sharma, Karthik Pandia Durai, Shankar Venkatesan, Jeena J Prakash, Shashi Kumar, Malolan Chetlur, Andreas Stolcke

Comments: Accepted in ACL2025 Industry track

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[144] arXiv:2506.14503 [pdf, other]: Title: An Open Research Dataset of the 1932 Cairo Congress of Arab Music

Baris Bozkurt (College of Interdisciplinary Studies, Zayed University, Dubai, United Arab Emirates)

Comments: 14 pages, 4 figures, 4 tables

Subjects: Sound (cs.SD); Digital Libraries (cs.DL); Audio and Speech Processing (eess.AS)
[145] arXiv:2506.14504 [pdf, html, other]: Title: Evolving music theory for emerging musical languages

Emmanuel Deruty

Comments: In Music 2025, Innovation in Music Conference. 20-22 June, 2025, Bath Spa University, Bath, UK

Subjects: Sound (cs.SD)
[146] arXiv:2506.14684 [pdf, html, other]: Title: Refining music sample identification with a self-supervised graph neural network

Aditya Bhattacharjee, Ivan Meresman Higgs, Mark Sandler, Emmanouil Benetos

Comments: Accepted at International Conference for Music Information Retrieval (ISMIR) 2025

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
[147] arXiv:2506.14723 [pdf, html, other]: Title: Adaptive Accompaniment with ReaLchords

Yusong Wu, Tim Cooijmans, Kyle Kastner, Adam Roberts, Ian Simon, Alexander Scarlatos, Chris Donahue, Cassie Tarakajian, Shayegan Omidshafiei, Aaron Courville, Pablo Samuel Castro, Natasha Jaques, Cheng-Zhi Anna Huang

Comments: Accepted by ICML 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[148] arXiv:2506.14750 [pdf, html, other]: Title: Exploring Speaker Diarization with Mixture of Experts

Gaobin Yang, Maokui He, Shutong Niu, Ruoyu Wang, Hang Chen, Jun Du

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[149] arXiv:2506.14864 [pdf, other]: Title: pycnet-audio: A Python package to support bioacoustics data processing

Zachary J. Ruff, Damon B. Lesmeister

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[150] arXiv:2506.15000 [pdf, html, other]: Title: A Comparative Evaluation of Deep Learning Models for Speech Enhancement in Real-World Noisy Environments

Md Jahangir Alam Khondkar, Ajan Ahmed, Masudul Haider Imtiaz, Stephanie Schuckers

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
