Audio and Speech Processing

Authors and titles for October 2024

Total of 358 entries

[151] arXiv:2410.03427 (cross-list from cs.SD) [pdf, html, other]: Title: Biodenoising: Animal Vocalization Denoising without Access to Clean Data

Marius Miron, Sara Keen, Jen-Yu Liu, Benjamin Hoffman, Masato Hagiwara, Olivier Pietquin, Felix Effenberger, Maddie Cusimano

Comments: 5 pages, 2 tables

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[152] arXiv:2410.03459 (cross-list from cs.SD) [pdf, html, other]: Title: Generative Semantic Communication for Text-to-Speech Synthesis

Jiahao Zheng, Jinke Ren, Peng Xu, Zhihao Yuan, Jie Xu, Fangxin Wang, Gui Gui, Shuguang Cui

Comments: The paper has been accepted by IEEE Globecom Workshop

Subjects: Sound (cs.SD); Information Theory (cs.IT); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[153] arXiv:2410.03676 (cross-list from cs.SD) [pdf, html, other]: Title: A quest through interconnected datasets: lessons from highly-cited ICASSP papers

Cynthia C. S. Liem, Doğa Taşcılar, Andrew M. Demetriou

Comments: in Proceedings of the 21st International Conference on Content-based Multimedia Indexing, September 18-20 2024, Reykjavik, Iceland

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[154] arXiv:2410.03719 (cross-list from cs.CL) [pdf, html, other]: Title: FluentEditor2: Text-based Speech Editing by Modeling Multi-Scale Acoustic and Prosody Consistency

Rui Liu, Jiatian Xi, Ziyue Jiang, Haizhou Li

Comments: submitted for an IEEE publication

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[155] arXiv:2410.03734 (cross-list from cs.SD) [pdf, html, other]: Title: Accent conversion using discrete units with parallel data synthesized from controllable accented TTS

Tuan Nam Nguyen, Ngoc Quan Pham, Alexander Waibel

Comments: Accepted at Syndata4genAI

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[156] arXiv:2410.03751 (cross-list from cs.CL) [pdf, html, other]: Title: Recent Advances in Speech Language Models: A Survey

Wenqian Cui, Dianzhi Yu, Xiaoqi Jiao, Ziqiao Meng, Guangyan Zhang, Qichao Wang, Yiwen Guo, Irwin King

Comments: The reduced version of this paper has been accepted at ACL 2025

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[157] arXiv:2410.03752 (cross-list from cs.SD) [pdf, html, other]: Title: Efficient Streaming LLM for Speech Recognition

Junteng Jia, Gil Keren, Wei Zhou, Egor Lakomkin, Xiaohui Zhang, Chunyang Wu, Frank Seide, Jay Mahadeokar, Ozlem Kalinli

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[158] arXiv:2410.03791 (cross-list from cs.HC) [pdf, html, other]: Title: People are poorly equipped to detect AI-powered voice clones

Sarah Barrington, Emily A. Cooper, Hany Farid

Journal-ref: Scientific Reports 15, 11004, 2025

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[159] arXiv:2410.03798 (cross-list from cs.CL) [pdf, html, other]: Title: Self-Powered LLM Modality Expansion for Large Speech-Text Models

Tengfei Yu, Xuebo Liu, Zhiyi Hou, Liang Ding, Dacheng Tao, Min Zhang

Comments: Accepted to EMNLP 2024

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[160] arXiv:2410.03813 (cross-list from cs.LG) [pdf, html, other]: Title: SOI: Scaling Down Computational Complexity by Estimating Partial States of the Model

Grzegorz Stefański, Paweł Daniluk, Artur Szumaczuk, Jakub Tkaczuk

Comments: NeurIPS 2024

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[161] arXiv:2410.03879 (cross-list from cs.SD) [pdf, html, other]: Title: SONIQUE: Video Background Music Generation Using Unpaired Audio-Visual Data

Liqian Zhang, Magdalena Fuentes

Comments: The paper has been accepted for ICASSP 2025, updating the latest camera-ready version

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[162] arXiv:2410.03904 (cross-list from cs.SD) [pdf, html, other]: Title: Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection

Ksheeraja Raghavan, Samiran Gode, Ankit Shah, Surabhi Raghavan, Wolfram Burgard, Bhiksha Raj, Rita Singh

Comments: 9 pages, under review

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[163] arXiv:2410.03930 (cross-list from cs.CL) [pdf, html, other]: Title: Reverb: Open-Source ASR and Diarization from Rev

Nishchal Bhandari, Danny Chen, Miguel Ángel del Río Fernández, Natalie Delworth, Jennifer Drexler Fox, Migüel Jetté, Quinten McNamara, Corey Miller, Ondřej Novotný, Ján Profant, Nan Qin, Martin Ratajczak, Jean-Philippe Robichaud

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[164] arXiv:2410.04029 (cross-list from cs.CL) [pdf, html, other]: Title: SyllableLM: Learning Coarse Semantic Units for Speech Language Models

Alan Baade, Puyuan Peng, David Harwath

Comments: 10 pages, 2 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[165] arXiv:2410.04091 (cross-list from cs.LG) [pdf, html, other]: Title: Cross-Lingual Query-by-Example Spoken Term Detection: A Transformer-Based Approach

Allahdadi Fatemeh, Mahdian Toroghi Rahil, Zareian Hassan

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[166] arXiv:2410.04098 (cross-list from cs.SD) [pdf, html, other]: Title: The OCON model: an old but green solution for distributable supervised classification for acoustic monitoring in smart cities

Stefano Giacomelli, Marco Giordano, Claudia Rinaldi

Comments: Accepted at "IEEE 5th International Symposium on the Internet of Sounds, 30 Sep / 2 Oct 2024, Erlangen, Germany"

Journal-ref: in Proceedings of the 5th IEEE International Symposium on the Internet of Sounds (IEEE IS2 2024, https://internetofsounds.net/is2_2024/)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[167] arXiv:2410.04159 (cross-list from cs.SD) [pdf, html, other]: Title: Efficient and Robust Long-Form Speech Recognition with Hybrid H3-Conformer

Tomoki Honda, Shinsuke Sakai, Tatsuya Kawahara

Comments: Submitted to InterSpeech2024. Sample code is available at this https URL

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[168] arXiv:2410.04324 (cross-list from cs.SD) [pdf, html, other]: Title: Where are we in audio deepfake detection? A systematic analysis over generative and detection models

Xiang Li, Pin-Yu Chen, Wenqi Wei

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[169] arXiv:2410.04478 (cross-list from cs.SD) [pdf, html, other]: Title: Configurable Multilingual ASR with Speech Summary Representations

Harrison Zhu, Ivan Fung, Yingke Zhu, Lahiru Samarakoon

Comments: A preprint

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[170] arXiv:2410.04534 (cross-list from cs.SD) [pdf, html, other]: Title: UniMuMo: Unified Text, Music and Motion Generation

Han Yang, Kun Su, Yutong Zhang, Jiaben Chen, Kaizhi Qian, Gaowen Liu, Chuang Gan

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[171] arXiv:2410.04702 (cross-list from cs.SD) [pdf, html, other]: Title: Demo of Zero-Shot Guitar Amplifier Modelling: Enhancing Modeling with Hyper Neural Networks

Yu-Hua Chen, Yuan-Chiao Cheng, Yen-Tung Yeh, Jui-Te Wu, Yu-Hsiang Ho, Jyh-Shing Roger Jang, Yi-Hsuan Yang

Comments: demo of the ISMIR paper

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[172] arXiv:2410.04704 (cross-list from cs.SD) [pdf, html, other]: Title: Modeling and Estimation of Vocal Tract and Glottal Source Parameters Using ARMAX-LF Model

Kai Li, Masato Akagi, Yongwei Li, Masashi Unoki

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[173] arXiv:2410.04797 (cross-list from cs.SD) [pdf, html, other]: Title: Attentive-based Multi-level Feature Fusion for Voice Disorder Diagnosis

Lipeng Shen, Yifan Xiong, Dongyue Guo, Wei Mo, Lingyu Yu, Hui Yang, Yi Lin

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[174] arXiv:2410.04906 (cross-list from cs.MM) [pdf, html, other]: Title: Art2Mus: Bridging Visual Arts and Music through Cross-Modal Generation

Ivan Rinaldi, Nicola Fanelli, Giovanna Castellano, Gennaro Vessio

Comments: Presented at the AI for Visual Arts (AI4VA) workshop at ECCV 2024

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[175] arXiv:2410.04990 (cross-list from cs.SD) [pdf, html, other]: Title: Stage-Wise and Prior-Aware Neural Speech Phase Prediction

Fei Liu, Yang Ai, Hui-Peng Du, Ye-Xin Lu, Rui-Chen Zheng, Zhen-Hua Ling

Comments: Accepted by SLT2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[176] arXiv:2410.05019 (cross-list from cs.SD) [pdf, html, other]: Title: RelUNet: Relative Channel Fusion U-Net for Multichannel Speech Enhancement

Ibrahim Aldarmaki, Thamar Solorio, Bhiksha Raj, Hanan Aldarmaki

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[177] arXiv:2410.05037 (cross-list from cs.SD) [pdf, html, other]: Title: Improving Speaker Representations Using Contrastive Losses on Multi-scale Features

Satvik Dixit, Massa Baali, Rita Singh, Bhiksha Raj

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[178] arXiv:2410.05146 (cross-list from cs.CL) [pdf, html, other]: Title: CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation

Rui Zhao, Jinyu Li, Ruchao Fan, Matt Post

Comments: Accepted by IEEE Spoken Language Technology Workshop (SLT 2024)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[179] arXiv:2410.05167 (cross-list from cs.SD) [pdf, html, other]: Title: Presto! Distilling Steps and Layers for Accelerating Music Generation

Zachary Novack, Ge Zhu, Jonah Casebeer, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan

Comments: Accepted as Spotlight at ICLR 2025

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[180] arXiv:2410.05301 (cross-list from cs.SD) [pdf, other]: Title: Diffusion-based Unsupervised Audio-visual Speech Enhancement

Jean-Eudes Ayilo (MULTISPEECH), Mostafa Sadeghi (MULTISPEECH), Romain Serizel (MULTISPEECH), Xavier Alameda-Pineda (ROBOTLEARN)

Journal-ref: International Conference on Acoustics Speech and Signal Processing (ICASSP), IEEE, Apr 2025, Hyderabad, India

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[181] arXiv:2410.05361 (cross-list from cs.LG) [pdf, html, other]: Title: RespLLM: Unifying Audio and Text with Multimodal LLMs for Generalized Respiratory Health Prediction

Yuwei Zhang, Tong Xia, Aaqib Saeed, Cecilia Mascolo

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[182] arXiv:2410.05423 (cross-list from cs.SD) [pdf, html, other]: Title: Incorporating Talker Identity Aids With Improving Speech Recognition in Adversarial Environments

Sagarika Alavilli, Annesya Banerjee, Gasser Elbanna, Annika Magaro

Comments: Submitted to ICASSP 2025

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[183] arXiv:2410.05455 (cross-list from cs.LG) [pdf, html, other]: Title: Dynamic HumTrans: Humming Transcription Using CNNs and Dynamic Programming

Shubham Gupta, Isaac Neri Gomez-Sarmiento, Faez Amjed Mezdari, Mirco Ravanelli, Cem Subakan

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[184] arXiv:2410.05647 (cross-list from cs.SD) [pdf, html, other]: Title: FGCL: Fine-grained Contrastive Learning For Mandarin Stuttering Event Detection

Han Jiang, Wenyu Wang, Yiquan Zhou, Hongwu Ding, Jiacheng Xu, Jihua Zhu

Comments: Accepted to SLT 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[185] arXiv:2410.05739 (cross-list from cs.SD) [pdf, html, other]: Title: End-to-end multi-channel speaker extraction and binaural speech synthesis

Cheng Chi, Xiaoyu Li, Yuxuan Ke, Qunping Ni, Yao Ge, Xiaodong Li, Chengshi Zheng

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[186] arXiv:2410.05791 (cross-list from cs.GR) [pdf, html, other]: Title: FürElise: Capturing and Physically Synthesizing Hand Motions of Piano Performance

Ruocheng Wang, Pei Xu, Haochen Shi, Elizabeth Schumann, C. Karen Liu

Comments: SIGGRAPH Asia 2024. Project page: this https URL

Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[187] arXiv:2410.05920 (cross-list from cs.SD) [pdf, html, other]: Title: FINALLY: fast and universal speech enhancement with studio-like quality

Nicholas Babaev, Kirill Tamogashev, Azat Saginbaev, Ivan Shchekotov, Hanbin Bae, Hosang Sung, WonJun Lee, Hoon-Young Cho, Pavel Andreev

Comments: Accepted to NeurIPS 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[188] arXiv:2410.06016 (cross-list from cs.SD) [pdf, html, other]: Title: Variable Bitrate Residual Vector Quantization for Audio Coding

Yunkee Chae, Woosung Choi, Yuhta Takida, Junghyun Koo, Yukara Ikemiya, Zhi Zhong, Kin Wai Cheuk, Marco A. Martínez-Ramírez, Kyogu Lee, Wei-Hsiang Liao, Yuki Mitsufuji

Comments: ICASSP 2025 camera ready version

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[189] arXiv:2410.06221 (cross-list from cs.SD) [pdf, html, other]: Title: POLIPHONE: A Dataset for Smartphone Model Identification from Audio Recordings

Davide Salvi, Daniele Ugo Leonzio, Antonio Giganti, Claudio Eutizi, Sara Mandelli, Paolo Bestagini, Stefano Tubaro

Comments: Submitted to IEEE Access

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[190] arXiv:2410.06459 (cross-list from cs.SD) [pdf, html, other]: Title: Mamba-based Segmentation Model for Speaker Diarization

Alexis Plaquet, Naohiro Tawara, Marc Delcroix, Shota Horiguchi, Atsushi Ando, Shoko Araki

Comments: 5 pages, 4 figures. Submitted to ICASSP 2025. Code at this https URL

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[191] arXiv:2410.06543 (cross-list from cs.CR) [pdf, html, other]: Title: Gumbel Rao Monte Carlo based Bi-Modal Neural Architecture Search for Audio-Visual Deepfake Detection

Aravinda Reddy PN, Raghavendra Ramachandra, Krothapalli Sreenivasa Rao, Pabitra Mitra, Vinod Rathod

Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[192] arXiv:2410.06544 (cross-list from cs.SD) [pdf, html, other]: Title: SRC-gAudio: Sampling-Rate-Controlled Audio Generation

Chenxing Li, Manjie Xu, Dong Yu

Comments: Accepted by APSIPA2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[193] arXiv:2410.06608 (cross-list from cs.SD) [pdf, html, other]: Title: Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS

Onkar Kishor Susladkar, Vishesh Tripathi, Biddwan Ahmed

Journal-ref: EMNLP 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[194] arXiv:2410.06675 (cross-list from cs.SD) [pdf, html, other]: Title: SCOREQ: Speech Quality Assessment with Contrastive Regression

Alessandro Ragano, Jan Skoglund, Andrew Hines

Comments: Accepted NeurIPS 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[195] arXiv:2410.06846 (cross-list from cs.CL) [pdf, html, other]: Title: Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity

Mutian He, Philip N. Garner

Comments: 18 pages, 5 figures; ICLR 2025 camera ready. Code: this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[196] arXiv:2410.06927 (cross-list from cs.SD) [pdf, html, other]: Title: Spectral and Rhythm Features for Audio Classification with Deep Convolutional Neural Networks

Friedrich Wolf-Monheim

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[197] arXiv:2410.07168 (cross-list from cs.CL) [pdf, html, other]: Title: Sylber: Syllabic Embedding Representation of Speech from Raw Audio

Cheol Jun Cho, Nicholas Lee, Akshat Gupta, Dhruv Agarwal, Ethan Chen, Alan W Black, Gopala K. Anumanchipalli

Comments: Accepted at ICLR 2025

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[198] arXiv:2410.07400 (cross-list from cs.CL) [pdf, other]: Title: Advocating Character Error Rate for Multilingual ASR Evaluation

Thennal D K, Jesin James, Deepa P Gopinath, Muhammed Ashraf K

Comments: 4 pages

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[199] arXiv:2410.07436 (cross-list from cs.LG) [pdf, html, other]: Title: Toward Robust Real-World Audio Deepfake Detection: Closing the Explainability Gap

Georgia Channing, Juil Sock, Ronald Clark, Philip Torr, Christian Schroeder de Witt

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[200] arXiv:2410.07491 (cross-list from cs.CL) [pdf, html, other]: Title: Transducer Consistency Regularization for Speech to Text Applications

Cindy Tseng, Yun Tang, Vijendra Raj Apsingekar

Comments: 8 pages, 4 figures. Accepted in IEEE Spoken Language Technology Workshop 2024

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
