Sound

Authors and titles for October 2024

Total of 304 entries (1-100 shown)

[1] arXiv:2410.00210 [pdf, html, other]: Title: End-to-end Piano Performance-MIDI to Score Conversion with Transformers

Tim Beyer, Angela Dai

Comments: 6 pages, to appear at ISMIR 2024

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[2] arXiv:2410.00344 [pdf, html, other]: Title: Integrating Text-to-Music Models with Language Models: Composing Long Structured Music Pieces

Lilac Atassi

Comments: arXiv admin note: substantial text overlap with arXiv:2404.11976

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[3] arXiv:2410.00667 [pdf, other]: Title: Contribution of soundscape appropriateness to soundscape quality assessment in space: a mediating variable affecting acoustic comfort

Xinhao Yang, Guangyu Zhang, Xiaodong Lu, Yuan Zhang, Jian Kang

Comments: Accepted by Journal of Environmental Management

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Classical Physics (physics.class-ph)
[4] arXiv:2410.00767 [pdf, html, other]: Title: Zero-Shot Text-to-Speech from Continuous Text Streams

Trung Dang, David Aponte, Dung Tran, Tianyi Chen, Kazuhito Koishida

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[5] arXiv:2410.00811 [pdf, html, other]: Title: Improving curriculum learning for target speaker extraction with synthetic speakers

Yun Liu, Xuechen Liu, Junichi Yamagishi

Comments: Accepted by SLT2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[6] arXiv:2410.00822 [pdf, html, other]: Title: VHASR: A Multimodal Speech Recognition System With Vision Hotwords

Jiliang Hu, Zuchao Li, Ping Wang, Haojun Ai, Lefei Zhang, Hai Zhao

Comments: 14 pages, 6 figures, accepted by EMNLP 2024

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[7] arXiv:2410.00872 [pdf, other]: Title: Do Music Generation Models Encode Music Theory?

Megan Wei, Michael Freeman, Chris Donahue, Chen Sun

Comments: Accepted at ISMIR 2024. Dataset: this https URL Code: this https URL Website: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[8] arXiv:2410.00980 [pdf, html, other]: Title: Heterogeneous sound classification with the Broad Sound Taxonomy and Dataset

Panagiota Anastasopoulou, Jessica Torrey, Xavier Serra, Frederic Font

Comments: DCASE2024, post-print, 5 pages, 2 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[9] arXiv:2410.01350 [pdf, html, other]: Title: Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling

Yuguang Yang, Yu Pan, Jixun Yao, Xiang Zhang, Jianhao Ye, Hongbin Zhou, Lei Xie, Lei Ma, Jianjun Zhao

Comments: Work in Progress; Under Review

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[10] arXiv:2410.01469 [pdf, html, other]: Title: TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation

Mohan Xu, Kai Li, Guo Chen, Xiaolin Hu

Comments: Accepted by ICLR 2025, demo page: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[11] arXiv:2410.01481 [pdf, html, other]: Title: SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios

Kai Li, Wendi Sang, Chang Zeng, Runxuan Yang, Guo Chen, Xiaolin Hu

Comments: Accepted by ICLR 2025

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[12] arXiv:2410.02060 [pdf, html, other]: Title: PerTok: Expressive Encoding and Modeling of Symbolic Musical Ideas and Variations

Julian Lenz, Anirudh Mani

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[13] arXiv:2410.02084 [pdf, html, other]: Title: Generating Symbolic Music from Natural Language Prompts using an LLM-Enhanced Dataset

Weihan Xu, Julian McAuley, Taylor Berg-Kirkpatrick, Shlomo Dubnov, Hao-Wen Dong

Comments: Accepted at ISMIR 2025

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14] arXiv:2410.02130 [pdf, html, other]: Title: MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation

Trung X. Pham, Tri Ton, Chang D. Yoo

Comments: ICLR 2025

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[15] arXiv:2410.02144 [pdf, html, other]: Title: SoundMorpher: Perceptually-Uniform Sound Morphing with Diffusion Model

Xinlei Niu, Jing Zhang, Charles Patrick Martin

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[16] arXiv:2410.02239 [pdf, html, other]: Title: A Pilot Study of Applying Sequence-to-Sequence Voice Conversion to Evaluate the Intelligibility of L2 Speech Using a Native Speaker's Shadowings

Haopeng Geng, Daisuke Saito, Nobuaki Minematsu

Comments: Accepted by APSIPA ASC 2024. arXiv admin note: text overlap with arXiv:2409.11742

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[17] arXiv:2410.02271 [pdf, html, other]: Title: CoLLAP: Contrastive Long-form Language-Audio Pretraining with Musical Temporal Structure Augmentation

Junda Wu, Warren Li, Zachary Novack, Amit Namburi, Carol Chen, Julian McAuley

Comments: 4 pages

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[18] arXiv:2410.02560 [pdf, html, other]: Title: Convolutional Variational Autoencoders for Spectrogram Compression in Automatic Speech Recognition

Olga Iakovenko, Ivan Bondarenko

Comments: Theory and Practice of Natural Computing 9th International Conference, TPNC 2020, Taoyuan, Taiwan, 2020, Proceedings 9

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[19] arXiv:2410.03264 [pdf, html, other]: Title: Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval

SeungHeon Doh, Minhee Lee, Dasaem Jeong, Juhan Nam

Comments: Accepted for publication at the IEEE ICASSP 2024

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[20] arXiv:2410.03335 [pdf, html, other]: Title: Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition

Zixuan Wang, Chi-Keung Tang, Yu-Wing Tai

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[21] arXiv:2410.03375 [pdf, other]: Title: SoundSignature: What Type of Music Do You Like?

Brandon James Carone, Pablo Ripollés

Comments: 10 pages, 1 figure, to be published in the 2024 International Symposium on the IEEE Internet of Sounds Proceedings

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[22] arXiv:2410.03427 [pdf, html, other]: Title: Biodenoising: Animal Vocalization Denoising without Access to Clean Data

Marius Miron, Sara Keen, Jen-Yu Liu, Benjamin Hoffman, Masato Hagiwara, Olivier Pietquin, Felix Effenberger, Maddie Cusimano

Comments: 5 pages, 2 tables

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23] arXiv:2410.03459 [pdf, html, other]: Title: Generative Semantic Communication for Text-to-Speech Synthesis

Jiahao Zheng, Jinke Ren, Peng Xu, Zhihao Yuan, Jie Xu, Fangxin Wang, Gui Gui, Shuguang Cui

Comments: The paper has been accepted by IEEE Globecom Workshop

Subjects: Sound (cs.SD); Information Theory (cs.IT); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[24] arXiv:2410.03676 [pdf, html, other]: Title: A quest through interconnected datasets: lessons from highly-cited ICASSP papers

Cynthia C. S. Liem, Doğa Taşcılar, Andrew M. Demetriou

Comments: in Proceedings of the 21st International Conference on Content-based Multimedia Indexing, September 18-20 2024, Reykjavik, Iceland

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[25] arXiv:2410.03734 [pdf, html, other]: Title: Accent conversion using discrete units with parallel data synthesized from controllable accented TTS

Tuan Nam Nguyen, Ngoc Quan Pham, Alexander Waibel

Comments: Accepted at Syndata4genAI

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[26] arXiv:2410.03752 [pdf, html, other]: Title: Efficient Streaming LLM for Speech Recognition

Junteng Jia, Gil Keren, Wei Zhou, Egor Lakomkin, Xiaohui Zhang, Chunyang Wu, Frank Seide, Jay Mahadeokar, Ozlem Kalinli

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[27] arXiv:2410.03879 [pdf, html, other]: Title: SONIQUE: Video Background Music Generation Using Unpaired Audio-Visual Data

Liqian Zhang, Magdalena Fuentes

Comments: The paper has been accepted for ICASSP 2025, updating the latest camera-ready version

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[28] arXiv:2410.03904 [pdf, html, other]: Title: Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection

Ksheeraja Raghavan, Samiran Gode, Ankit Shah, Surabhi Raghavan, Wolfram Burgard, Bhiksha Raj, Rita Singh

Comments: 9 pages, under review

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[29] arXiv:2410.04098 [pdf, html, other]: Title: The OCON model: an old but green solution for distributable supervised classification for acoustic monitoring in smart cities

Stefano Giacomelli, Marco Giordano, Claudia Rinaldi

Comments: Accepted at "IEEE 5th International Symposium on the Internet of Sounds, 30 Sep / 2 Oct 2024, Erlangen, Germany"

Journal-ref: in Proceedings of the 5th IEEE International Symposium on the Internet of Sounds (IEEE IS2 2024, https://internetofsounds.net/is2_2024/)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[30] arXiv:2410.04159 [pdf, html, other]: Title: Efficient and Robust Long-Form Speech Recognition with Hybrid H3-Conformer

Tomoki Honda, Shinsuke Sakai, Tatsuya Kawahara

Comments: Submitted to InterSpeech2024, Sample code is available at this https URL

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[31] arXiv:2410.04324 [pdf, html, other]: Title: Where are we in audio deepfake detection? A systematic analysis over generative and detection models

Xiang Li, Pin-Yu Chen, Wenqi Wei

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[32] arXiv:2410.04478 [pdf, html, other]: Title: Configurable Multilingual ASR with Speech Summary Representations

Harrison Zhu, Ivan Fung, Yingke Zhu, Lahiru Samarakoon

Comments: A preprint

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[33] arXiv:2410.04534 [pdf, html, other]: Title: UniMuMo: Unified Text, Music and Motion Generation

Han Yang, Kun Su, Yutong Zhang, Jiaben Chen, Kaizhi Qian, Gaowen Liu, Chuang Gan

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[34] arXiv:2410.04702 [pdf, html, other]: Title: Demo of Zero-Shot Guitar Amplifier Modelling: Enhancing Modeling with Hyper Neural Networks

Yu-Hua Chen, Yuan-Chiao Cheng, Yen-Tung Yeh, Jui-Te Wu, Yu-Hsiang Ho, Jyh-Shing Roger Jang, Yi-Hsuan Yang

Comments: demo of the ISMIR paper

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[35] arXiv:2410.04704 [pdf, html, other]: Title: Modeling and Estimation of Vocal Tract and Glottal Source Parameters Using ARMAX-LF Model

Kai Li, Masato Akagi, Yongwei Li, Masashi Unoki

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[36] arXiv:2410.04797 [pdf, html, other]: Title: Attentive-based Multi-level Feature Fusion for Voice Disorder Diagnosis

Lipeng Shen, Yifan Xiong, Dongyue Guo, Wei Mo, Lingyu Yu, Hui Yang, Yi Lin

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[37] arXiv:2410.04990 [pdf, html, other]: Title: Stage-Wise and Prior-Aware Neural Speech Phase Prediction

Fei Liu, Yang Ai, Hui-Peng Du, Ye-Xin Lu, Rui-Chen Zheng, Zhen-Hua Ling

Comments: Accepted by SLT2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[38] arXiv:2410.05019 [pdf, html, other]: Title: RelUNet: Relative Channel Fusion U-Net for Multichannel Speech Enhancement

Ibrahim Aldarmaki, Thamar Solorio, Bhiksha Raj, Hanan Aldarmaki

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[39] arXiv:2410.05037 [pdf, html, other]: Title: Improving Speaker Representations Using Contrastive Losses on Multi-scale Features

Satvik Dixit, Massa Baali, Rita Singh, Bhiksha Raj

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[40] arXiv:2410.05167 [pdf, html, other]: Title: Presto! Distilling Steps and Layers for Accelerating Music Generation

Zachary Novack, Ge Zhu, Jonah Casebeer, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan

Comments: Accepted as Spotlight at ICLR 2025

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[41] arXiv:2410.05301 [pdf, other]: Title: Diffusion-based Unsupervised Audio-visual Speech Enhancement

Jean-Eudes Ayilo (MULTISPEECH), Mostafa Sadeghi (MULTISPEECH), Romain Serizel (MULTISPEECH), Xavier Alameda-Pineda (ROBOTLEARN)

Journal-ref: International Conference on Acoustics Speech and Signal Processing (ICASSP), IEEE, Apr 2025, Hyderabad, India

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[42] arXiv:2410.05423 [pdf, html, other]: Title: Incorporating Talker Identity Aids With Improving Speech Recognition in Adversarial Environments

Sagarika Alavilli, Annesya Banerjee, Gasser Elbanna, Annika Magaro

Comments: Submitted to ICASSP 2025

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[43] arXiv:2410.05647 [pdf, html, other]: Title: FGCL: Fine-grained Contrastive Learning For Mandarin Stuttering Event Detection

Han Jiang, Wenyu Wang, Yiquan Zhou, Hongwu Ding, Jiacheng Xu, Jihua Zhu

Comments: Accepted to SLT 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[44] arXiv:2410.05739 [pdf, html, other]: Title: End-to-end multi-channel speaker extraction and binaural speech synthesis

Cheng Chi, Xiaoyu Li, Yuxuan Ke, Qunping Ni, Yao Ge, Xiaodong Li, Chengshi Zheng

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[45] arXiv:2410.05920 [pdf, html, other]: Title: FINALLY: fast and universal speech enhancement with studio-like quality

Nicholas Babaev, Kirill Tamogashev, Azat Saginbaev, Ivan Shchekotov, Hanbin Bae, Hosang Sung, WonJun Lee, Hoon-Young Cho, Pavel Andreev

Comments: Accepted to NeurIPS 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[46] arXiv:2410.06016 [pdf, html, other]: Title: Variable Bitrate Residual Vector Quantization for Audio Coding

Yunkee Chae, Woosung Choi, Yuhta Takida, Junghyun Koo, Yukara Ikemiya, Zhi Zhong, Kin Wai Cheuk, Marco A. Martínez-Ramírez, Kyogu Lee, Wei-Hsiang Liao, Yuki Mitsufuji

Comments: ICASSP 2025 camera ready version

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[47] arXiv:2410.06221 [pdf, html, other]: Title: POLIPHONE: A Dataset for Smartphone Model Identification from Audio Recordings

Davide Salvi, Daniele Ugo Leonzio, Antonio Giganti, Claudio Eutizi, Sara Mandelli, Paolo Bestagini, Stefano Tubaro

Comments: Submitted to IEEE Access

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[48] arXiv:2410.06459 [pdf, html, other]: Title: Mamba-based Segmentation Model for Speaker Diarization

Alexis Plaquet, Naohiro Tawara, Marc Delcroix, Shota Horiguchi, Atsushi Ando, Shoko Araki

Comments: 5 pages, 4 figures. Submitted to ICASSP 2025. Code at this https URL

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[49] arXiv:2410.06544 [pdf, html, other]: Title: SRC-gAudio: Sampling-Rate-Controlled Audio Generation

Chenxing Li, Manjie Xu, Dong Yu

Comments: Accepted by APSIPA2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50] arXiv:2410.06572 [pdf, other]: Title: Can DeepFake Speech be Reliably Detected?

Hongbin Liu, Youzheng Chen, Arun Narayanan, Athula Balachandran, Pedro J. Moreno, Lun Wang

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[51] arXiv:2410.06608 [pdf, html, other]: Title: Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS

Onkar Kishor Susladkar, Vishesh Tripathi, Biddwan Ahmed

Journal-ref: EMNLP 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[52] arXiv:2410.06675 [pdf, html, other]: Title: SCOREQ: Speech Quality Assessment with Contrastive Regression

Alessandro Ragano, Jan Skoglund, Andrew Hines

Comments: Accepted at NeurIPS 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[53] arXiv:2410.06927 [pdf, html, other]: Title: Spectral and Rhythm Features for Audio Classification with Deep Convolutional Neural Networks

Friedrich Wolf-Monheim

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[54] arXiv:2410.07530 [pdf, html, other]: Title: Audio Explanation Synthesis with Generative Foundation Models

Alican Akman, Qiyang Sun, Björn W. Schuller

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[55] arXiv:2410.07771 [pdf, html, other]: Title: Full-Rank No More: Low-Rank Weight Training for Modern Speech Recognition Models

Adriana Fernandez-Lopez, Shiwei Liu, Lu Yin, Stavros Petridis, Maja Pantic

Comments: Submitted to ICASSP 2025

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[56] arXiv:2410.08035 [pdf, html, other]: Title: IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities

Xin Zhang, Xiang Lyu, Zhihao Du, Qian Chen, Dong Zhang, Hangrui Hu, Chaohong Tan, Tianyu Zhao, Yuxuan Wang, Bin Zhang, Heng Lu, Yaqian Zhou, Xipeng Qiu

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[57] arXiv:2410.08235 [pdf, html, other]: Title: A Recurrent Neural Network Approach to the Answering Machine Detection Problem

Kemal Altwlkany, Sead Delalic, Elmedin Selmanovic, Adis Alihodzic, Ivica Lovric

Comments: 6 pages, 2 figures, 2024 47th MIPRO ICT and Electronics Convention (MIPRO)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[58] arXiv:2410.08321 [pdf, html, other]: Title: Music Genre Classification using Large Language Models

Mohamed El Amine Meguenani, Alceu de Souza Britto Jr., Alessandro Lameiras Koerich

Comments: 7 pages

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[59] arXiv:2410.08435 [pdf, html, other]: Title: Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation

Tingyu Zhu, Haoyu Liu, Ziyu Wang, Zhimin Jiang, Zeyu Zheng

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[60] arXiv:2410.08626 [pdf, html, other]: Title: Small Tunes Transformer: Exploring Macro & Micro-Level Hierarchies for Skeleton-Conditioned Melody Generation

Yishan Lv, Jing Luo, Boyuan Ju, Xinyu Yang

Comments: Accepted by 31st International Conference on MultiMedia Modeling (MMM2025)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[61] arXiv:2410.09250 [pdf, html, other]: Title: Quantum-Trained Convolutional Neural Network for Deepfake Audio Detection

Chu-Hsuan Abraham Lin, Chen-Yu Liu, Samuel Yen-Chi Chen, Kuan-Cheng Chen

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Quantum Physics (quant-ph)
[62] arXiv:2410.09289 [pdf, html, other]: Title: Multimodal Audio-based Disease Prediction with Transformer-based Hierarchical Fusion Network

Jinjin Cai, Ruiqi Wang, Dezhong Zhao, Ziqin Yuan, Victoria McKenna, Aaron Friedman, Rachel Foot, Susan Storey, Ryan Boente, Sudip Vhaduri, Byung-Cheol Min

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[63] arXiv:2410.09360 [pdf, html, other]: Title: Towards the Synthesis of Non-speech Vocalizations

Enjamamul Hoq, Ifeoma Nwogu

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[64] arXiv:2410.09396 [pdf, html, other]: Title: ExpGest: Expressive Speaker Generation Using Diffusion Model and Hybrid Audio-Text Guidance

Yongkang Cheng, Mingjiang Liang, Shaoli Huang, Jifeng Ning, Wei Liu

Comments: Accepted by ICME 2024

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[65] arXiv:2410.09472 [pdf, html, other]: Title: DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning

Xiquan Li, Wenxi Chen, Ziyang Ma, Xuenan Xu, Yuzhe Liang, Zhisheng Zheng, Qiuqiang Kong, Xie Chen

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[66] arXiv:2410.09578 [pdf, html, other]: Title: Objective Measurements of Voice Quality

Hira Dhamyal, Rita Singh

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[67] arXiv:2410.09778 [pdf, html, other]: Title: LEAD Dataset: How Can Labels for Sound Event Detection Vary Depending on Annotators?

Naoki Koga, Yoshiaki Bando, Keisuke Imoto

Comments: Accepted to APSIPA ASC 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[68] arXiv:2410.09869 [pdf, html, other]: Title: Prompt Tuning for Audio Deepfake Detection: Computationally Efficient Test-time Domain Adaptation with Limited Target Dataset

Hideyuki Oiso, Yuto Matsunaga, Kazuya Kakizaki, Taiki Miyagawa

Comments: Accepted at Interspeech 2024. Hideyuki Oiso and Yuto Matsunaga contributed equally

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[69] arXiv:2410.09928 [pdf, html, other]: Title: M2M-Gen: A Multimodal Framework for Automated Background Music Generation in Japanese Manga Using Large Language Models

Megha Sharma, Muhammad Taimoor Haseeb, Gus Xia, Yoshimasa Tsuruoka

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[70] arXiv:2410.10125 [pdf, html, other]: Title: Generative Deep Learning and Signal Processing for Data Augmentation of Cardiac Auscultation Signals: Improving Model Robustness Using Synthetic Audio

Leigh Abbott, Milan Marocchi, Matthew Fynn, Yue Rong, Sven Nordholm

Comments: 21 pages, 8 figures, 10 tables

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[71] arXiv:2410.10515 [pdf, html, other]: Title: Do we need more complex representations for structure? A comparison of note duration representation for Music Transformers

Gabriel Souza, Flavio Figueiredo, Alexei Machado, Deborah Guimarães

Comments: Presented at the Music for Machine Learning Workshop with ECML/PKDD. To be published by Springer

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[72] arXiv:2410.10537 [pdf, html, other]: Title: Reproducible Machine Learning-based Voice Pathology Detection: Introducing the Pitch Difference Feature

Jan Vrba, Jakub Steinbach, Tomáš Jirsa, Laura Verde, Roberta De Fazio, Yuwen Zeng, Kei Ichiji, Lukáš Hájek, Zuzana Sedláková, Zuzana Urbániová, Martin Chovanec, Jan Mareš, Noriyasu Homma

Comments: Code repository: this https URL, Supplementary materials: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[73] arXiv:2410.10676 [pdf, html, other]: Title: Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

Peiwen Sun, Sitong Cheng, Xiangtai Li, Zhen Ye, Huadai Liu, Honggang Zhang, Wei Xue, Yike Guo

Comments: Accepted by ICLR 2025

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[74] arXiv:2410.10913 [pdf, html, other]: Title: Enhancing Retrieval-Augmented Audio Captioning with Generation-Assisted Multimodal Querying and Progressive Learning

Choi Changin, Lim Sungjun, Rhee Wonjong

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[75] arXiv:2410.10994 [pdf, html, other]: Title: GraFPrint: A GNN-Based Approach for Audio Identification

Aditya Bhattacharjee, Shubhr Singh, Emmanouil Benetos

Comments: Submitted to IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025)

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[76] arXiv:2410.11062 [pdf, html, other]: Title: CleanUMamba: A Compact Mamba Network for Speech Denoising using Channel Pruning

Sjoerd Groot, Qinyu Chen, Jan C. van Gemert, Chang Gao

Comments: This paper has been accepted to be presented at the 2025 International Symposium on Circuits and Systems (ISCAS)

Journal-ref: 2025 IEEE International Symposium on Circuits and Systems (ISCAS)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[77] arXiv:2410.11120 [pdf, html, other]: Title: Audio-based Kinship Verification Using Age Domain Conversion

Qiyang Sun, Alican Akman, Xin Jing, Manuel Milling, Björn W. Schuller

Comments: 4 pages, 2 figures, submitted to IEEE Signal Processing Letters

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[78] arXiv:2410.11243 [pdf, html, other]: Title: Investigation of Speaker Representation for Target-Speaker Speech Processing

Takanori Ashihara, Takafumi Moriya, Shota Horiguchi, Junyi Peng, Tsubasa Ochiai, Marc Delcroix, Kohei Matsuura, Hiroshi Sato

Comments: Accepted at IEEE SLT 2024

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[79] arXiv:2410.11299 [pdf, html, other]: Title: Diff-SAGe: End-to-End Spatial Audio Generation Using Diffusion Models

Saksham Singh Kushwaha, Jianbo Ma, Mark R. P. Thomas, Yapeng Tian, Avery Bruni

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[80] arXiv:2410.11522 [pdf, other]: Title: Leveraging LLM Embeddings for Cross Dataset Label Alignment and Zero Shot Music Emotion Prediction

Renhang Liu, Abhinaba Roy, Dorien Herremans

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[81] arXiv:2410.12028 [pdf, html, other]: Title: EmotionCaps: Enhancing Audio Captioning Through Emotion-Augmented Data Generation

Mithun Manivannan, Vignesh Nethrapalli, Mark Cartwright (New Jersey Institute of Technology)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[82] arXiv:2410.12082 [pdf, html, other]: Title: Learning to rumble: Automated elephant call classification, detection and endpointing using deep architectures

Christiaan M. Geldenhuys, Thomas R. Niesler

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
[83] arXiv:2410.12399 [pdf, html, other]: Title: SF-Speech: Straightened Flow for Zero-Shot Voice Clone

Xuyuan Li, Zengqiang Shang, Hua Hua, Peiyang Shi, Chen Yang, Li Wang, Pengyuan Zhang

Comments: Accepted by IEEE Transactions on Audio, Speech and Language Processing

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[84] arXiv:2410.12416 [pdf, other]: Title: Enhancing Speech Emotion Recognition through Segmental Average Pooling of Self-Supervised Learning Features

Jonghwan Hyeon, Yung-Hwan Oh, Ho-Jin Choi

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[85] arXiv:2410.12668 [pdf, html, other]: Title: HeightCeleb - an enrichment of VoxCeleb dataset with speaker height information

Stanisław Kacprzak, Konrad Kowalczyk

Comments: Accepted at IEEE SLT 2024

Journal-ref: 2024 IEEE Spoken Language Technology Workshop (SLT), Macao, 2024, pp. 857-862

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[86] arXiv:2410.12956 [pdf, html, other]: Title: Towards Computational Analysis of Pansori Singing

Sangheon Park, Danbinaerin Han, Dasaem Jeong

Comments: Late-Breaking Demo Session of the 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[87] arXiv:2410.12957 [pdf, html, other]: Title: MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization

Ruiqi Li, Siqi Zheng, Xize Cheng, Ziang Zhang, Shengpeng Ji, Zhou Zhao

Comments: Work in progress

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[88] arXiv:2410.13059 [pdf, html, other]: Title: AADNet: An End-to-End Deep Learning Model for Auditory Attention Decoding

Nhan Duc Thanh Nguyen, Huy Phan, Simon Geirnaert, Kaare Mikkelsen, Preben Kidmose

Comments: 11 pages, 6 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[89] arXiv:2410.13114 [pdf, html, other]: Title: Sound Check: Auditing Audio Datasets

William Agnew, Julia Barnett, Annie Chu, Rachel Hong, Michael Feffer, Robin Netzorg, Harry H. Jiang, Ezra Awumey, Sauvik Das

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Audio and Speech Processing (eess.AS)
[90] arXiv:2410.13179 [pdf, other]: Title: EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning

Ashish Seth, Ramaneswaran Selvakumar, S Sakshi, Sonal Kumar, Sreyan Ghosh, Dinesh Manocha

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[91] arXiv:2410.13267 [pdf, html, other]: Title: CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models

Shangda Wu, Yashan Wang, Ruibin Yuan, Zhancheng Guo, Xu Tan, Ge Zhang, Monan Zhou, Jing Chen, Xuefeng Mu, Yuejie Gao, Yuanliang Dong, Jiafeng Liu, Xiaobing Li, Feng Yu, Maosong Sun

Comments: 17 pages, 10 figures, 4 tables, accepted by NAACL 2025

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[92] arXiv:2410.13282 [pdf, html, other]: Title: End-to-End Integration of Speech Emotion Recognition with Voice Activity Detection using Self-Supervised Learning Features

Natsuo Yamashita, Masaaki Yamamoto, Yohei Kawaguchi

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[93] arXiv:2410.13328 [pdf, html, other]: Title: Enhancing 1-Second 3D SELD Performance with Filter Bank Analysis and SCConv Integration in CST-Former

Zhehui Zhang

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[94] arXiv:2410.13419 [pdf, html, other]: Title: MeloTrans: A Text to Symbolic Music Generation Model Following Human Composition Habit

Yutian Wang, Wanyin Yang, Zhenrong Dai, Yilong Zhang, Kun Zhao, Hui Wang

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[95] arXiv:2410.13581 [pdf, html, other]: Title: Dynamic Range Compression and Its Effect on Music Genre Classification

Arlyn Reese Madsen III

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[96] arXiv:2410.13839 [pdf, html, other]: Title: Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding

Tan Dat Nguyen, Ji-Hoon Kim, Jeongsoo Choi, Shukjae Choi, Jinseok Park, Younglo Lee, Joon Son Chung

Comments: Submitted to IEEE ICASSP 2025

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[97] arXiv:2410.14101 [pdf, html, other]: Title: Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech

Shuwei He, Rui Liu

Comments: 5 pages, 1 figure, accepted by ICASSP 2025

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[98] arXiv:2410.14122 [pdf, html, other]: Title: Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation

Yonghyun Kim, Alexander Lerch

Comments: Accepted to the Late-Breaking Demo Session of the 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[99] arXiv:2410.14411 [pdf, html, other]: Title: SNAC: Multi-Scale Neural Audio Codec

Hubert Siuzdak, Florian Grötschla, Luca A. Lanzendörfer

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[100] arXiv:2410.14590 [pdf, html, other]: Title: Embodied Exploration of Latent Spaces and Explainable AI

Elizabeth Wilson, Mika Satomi, Alex McLean, Deva Schubert, Juan Felipe Amaya Gonzalez

Comments: In Proceedings of Explainable AI for the Arts Workshop 2024 (XAIxArts 2024) arXiv:2406.14485

Subjects: Sound (cs.SD); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)