Audio and Speech Processing

Authors and titles for March 2024

Total of 213 entries (entries 1-100 listed here)

[1] arXiv:2403.00293 [pdf, html, other]: Title: Efficient Adapter Tuning of Pre-trained Speech Models for Automatic Speaker Verification

Mufan Sang, John H.L. Hansen

Comments: Accepted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[2] arXiv:2403.00379 [pdf, html, other]: Title: The Impact of Frequency Bands on Acoustic Anomaly Detection of Machines using Deep Learning Based Model

Tin Nguyen, Lam Pham, Phat Lam, Dat Ngo, Hieu Tang, Alexander Schindler

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[3] arXiv:2403.00887 [pdf, html, other]: Title: SEGAA: A Unified Approach to Predicting Age, Gender, and Emotion in Speech

Aron R, Indra Sigicharla, Chirag Periwal, Mohanaprasad K, Nithya Darisini P S, Sourabh Tiwari, Shivani Arora

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[4] arXiv:2403.01130 [pdf, html, other]: Title: Advanced Signal Analysis in Detecting Replay Attacks for Automatic Speaker Verification Systems

Lee Shih Kuang

Comments: this https URL

Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[5] arXiv:2403.01355 [pdf, html, other]: Title: a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification

Hye-jin Shim, Jee-weon Jung, Tomi Kinnunen, Nicholas Evans, Jean-Francois Bonastre, Itshak Lapidot

Comments: published at ISCA Speaker Odyssey 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[6] arXiv:2403.01369 [pdf, html, other]: Title: A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement

Ravi Shankar, Ke Tan, Buye Xu, Anurag Kumar

Comments: 8 pages; Shorter form accepted in ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[7] arXiv:2403.01494 [pdf, html, other]: Title: PAVITS: Exploring Prosody-aware VITS for End-to-End Emotional Voice Conversion

Tianhua Qi, Wenming Zheng, Cheng Lu, Yuan Zong, Hailun Lian

Comments: Accepted to ICASSP2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[8] arXiv:2403.01670 [pdf, html, other]: Title: 6DoF SELD: Sound Event Localization and Detection Using Microphones and Motion Tracking Sensors on self-motioning human

Masahiro Yasuda, Shoichiro Saito, Akira Nakayama, Noboru Harada

Comments: ICASSP2024 accepted

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[9] arXiv:2403.02167 [pdf, html, other]: Title: EMOVOME: A Dataset for Emotion Recognition in Spontaneous Real-Life Speech

Lucía Gómez-Zaragozá, Rocío del Amor, María José Castro-Bleda, Valery Naranjo, Mariano Alcañiz Raya, Javier Marín-Morales

Comments: This article is a merged version of the description of the EMOVOME database in arXiv:2402.17496v1 and the speech emotion recognition models in arXiv:2403.02167v1. This work has been submitted to the IEEE for possible publication

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[10] arXiv:2403.02288 [pdf, html, other]: Title: PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings

Joonas Kalda, Clément Pagés, Ricard Marxer, Tanel Alumäe, Hervé Bredin

Comments: Speaker Odyssey 2024

Subjects: Audio and Speech Processing (eess.AS)
[11] arXiv:2403.02371 [pdf, html, other]: Title: NeuroVoz: a Castillian Spanish corpus of parkinsonian speech

Janaína Mendes-Laureano, Jorge A. Gómez-García, Alejandro Guerrero-López, Elisa Luque-Buzo, Julián D. Arias-Londoño, Francisco J. Grandas-Pérez, Juan I. Godino-Llorente

Comments: Paper accepted at Scientific Data

Journal-ref: 2024, Scientific Data, 11(1), 1367

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[12] arXiv:2403.03100 [pdf, html, other]: Title: NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

Comments: Achieving human-level quality and naturalness on multi-speaker datasets (e.g., LibriSpeech) in a zero-shot way

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[13] arXiv:2403.03611 [pdf, html, other]: Title: Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task

Dang Thoai Phan

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[14] arXiv:2403.04433 [pdf, other]: Title: Tweaking autoregressive methods for inpainting of gaps in audio signals

Ondřej Mokrý, Pavel Rajmic

Comments: Accepted to EUSIPCO 2025

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[15] arXiv:2403.04743 [pdf, html, other]: Title: Speech Emotion Recognition Via CNN-Transformer and Multidimensional Attention Mechanism

Xiaoyu Tang, Yixin Lin, Ting Dang, Yuanfang Zhang, Jintao Cheng

Subjects: Audio and Speech Processing (eess.AS)
[16] arXiv:2403.04800 [pdf, html, other]: Title: (Un)paired signal-to-signal translation with 1D conditional GANs

Eric Easthope

Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[17] arXiv:2403.04804 [pdf, html, other]: Title: AttentionStitch: How Attention Solves the Speech Editing Problem

Antonios Alexos, Pierre Baldi

Comments: Accepted in Machine Learning for Audio workshop in NeurIPS 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[18] arXiv:2403.05187 [pdf, html, other]: Title: Robust Semantic Communications for Speech Transmission

Zhenzi Weng, Zhijin Qin, Geoffrey Ye Li

Subjects: Audio and Speech Processing (eess.AS)
[19] arXiv:2403.05393 [pdf, html, other]: Title: Binaural Speech Enhancement Using Deep Complex Convolutional Transformer Networks

Vikas Tokala, Eric Grinstein, Mike Brookes, Simon Doclo, Jesper Jensen, Patrick A. Naylor

Comments: Accepted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS)
[20] arXiv:2403.05791 [pdf, html, other]: Title: Asynchronous Microphone Array Calibration using Hybrid TDOA Information

Chengjie Zhang, Jiang Wang, He Kong

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[21] arXiv:2403.05887 [pdf, html, other]: Title: Aligning Speech to Languages to Enhance Code-switching Speech Recognition

Hexin Liu, Xiangyu Zhang, Haoyang Zhang, Leibny Paola Garcia, Andy W. H. Khong, Eng Siong Chng, Shinji Watanabe

Comments: Manuscript submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

Subjects: Audio and Speech Processing (eess.AS)
[22] arXiv:2403.06847 [pdf, html, other]: Title: SonoTraceLab -- A Raytracing-Based Acoustic Modelling System for Simulating Echolocation Behavior of Bats

Wouter Jansen, Jan Steckel

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[23] arXiv:2403.06856 [pdf, html, other]: Title: Concurrent Speaker Detection: A multi-microphone Transformer-Based Approach

Amit Eliav, Sharon Gannot

Comments: 5 pages, 6 tables, 2 figures

Subjects: Audio and Speech Processing (eess.AS)
[24] arXiv:2403.07579 [pdf, html, other]: Title: On HRTF Notch Frequency Prediction Using Anthropometric Features and Neural Networks

Lior Arbel, Ishwarya Ananthabhotla, Zamir Ben-Hur, David Lou Alon, Boaz Rafaely

Subjects: Audio and Speech Processing (eess.AS)
[25] arXiv:2403.07661 [pdf, html, other]: Title: Gender-ambiguous voice generation through feminine speaking style transfer in male voices

Maria Koutsogiannaki, Shafel Mc Dowall, Ioannis Agiomyrgiannakis

Comments: submitted to Interspeech

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[26] arXiv:2403.07767 [pdf, html, other]: Title: Beyond the Labels: Unveiling Text-Dependency in Paralinguistic Speech Recognition Datasets

Jan Pešán, Santosh Kesiraju, Lukáš Burget, Jan "Honza" Černocký

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Signal Processing (eess.SP)
[27] arXiv:2403.07937 [pdf, html, other]: Title: Speech Robust Bench: A Robustness Benchmark For Speech Recognition

Muhammad A. Shah, David Solans Noguero, Mikko A. Heikkila, Bhiksha Raj, Nicolas Kourtellis

Comments: submitted to NeurIPS datasets and benchmark track 2025

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[28] arXiv:2403.07947 [pdf, other]: Title: The evaluation of a code-switched Sepedi-English automatic speech recognition system

Amanda Phaladi, Thipe Modipa

Comments: 13 pages, 2 figures, 2nd International Conference on NLP & AI (NLPAI 2024)

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG)
[29] arXiv:2403.08654 [pdf, html, other]: Title: An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning

Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk

Comments: Under review on IEEE Transactions on Audio, Speech, and Language Processing (2024)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[30] arXiv:2403.09524 [pdf, html, other]: Title: Physics-Informed Neural Network for Volumetric Sound field Reconstruction of Speech Signals

Marco Olivieri, Xenofon Karakonstantis, Mirco Pezzoli, Fabio Antonacci, Augusto Sarti, Efren Fernandez-Grande

Subjects: Audio and Speech Processing (eess.AS)
[31] arXiv:2403.09527 [pdf, html, other]: Title: WavCraft: Audio Editing and Generation with Large Language Models

Jinhua Liang, Huan Zhang, Haohe Liu, Yin Cao, Qiuqiang Kong, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos

Subjects: Audio and Speech Processing (eess.AS)
[32] arXiv:2403.09789 [pdf, html, other]: Title: Audiosockets: A Python socket package for Real-Time Audio Processing

Nicolas Shu, David V. Anderson

Comments: 4 pages, 2 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[33] arXiv:2403.10271 [pdf, html, other]: Title: SuperM2M: Supervised and Mixture-to-Mixture Co-Learning for Speech Enhancement and Noise-Robust ASR

Zhong-Qiu Wang

Comments: in Neural Networks

Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[34] arXiv:2403.10420 [pdf, html, other]: Title: Hearing-Loss Compensation Using Deep Neural Networks: A Framework and Results From a Listening Test

Peter Leer, Jesper Jensen, Laurel H. Carney, Zheng-Hua Tan, Jan Østergaard, Lars Bramsløw

Subjects: Audio and Speech Processing (eess.AS)
[35] arXiv:2403.10428 [pdf, html, other]: Title: How to train your ears: Auditory-model emulation for large-dynamic-range inputs and mild-to-severe hearing losses

Peter Leer, Jesper Jensen, Zheng-Hua Tan, Jan Østergaard, Lars Bramsløw

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing. This version is the authors' version and may vary from the final publication in details

Subjects: Audio and Speech Processing (eess.AS)
[36] arXiv:2403.10548 [pdf, html, other]: Title: Two-sided Acoustic Metascreen for Broadband and Individual Reflection and Transmission Control

Ao Chen, Xin Zhang

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[37] arXiv:2403.10565 [pdf, html, other]: Title: PTSD-MDNN : Fusion tardive de réseaux de neurones profonds multimodaux pour la détection du trouble de stress post-traumatique

Long Nguyen-Phuoc, Renald Gaboriau, Dimitri Delacroix, Laurent Navarro

Comments: in French language. GRETSI 2023

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Image and Video Processing (eess.IV); Neurons and Cognition (q-bio.NC)
[38] arXiv:2403.10756 [pdf, html, other]: Title: Refining Knowledge Transfer on Audio-Image Temporal Agreement for Audio-Text Cross Retrieval

Shunsuke Tsubaki, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Keisuke Imoto

Comments: Submitted to EUSIPCO2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[39] arXiv:2403.10937 [pdf, html, other]: Title: Initial Decoding with Minimally Augmented Language Model for Improved Lattice Rescoring in Low Resource ASR

Savitha Murthy, Dinkar Sitaram

Comments: 14 pages, 7 figures, Accepted in Sadhana Journal

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG)
[40] arXiv:2403.11037 [pdf, html, other]: Title: Fine-Grained Engine Fault Sound Event Detection Using Multimodal Signals

Dennis Fedorishin, Livio Forte III, Philip Schneider, Srirangaraj Setlur, Venu Govindaraju

Comments: Accepted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[41] arXiv:2403.11508 [pdf, html, other]: Title: Discriminative Neighborhood Smoothing for Generative Anomalous Sound Detection

Takuya Fujimura, Keisuke Imoto, Tomoki Toda

Comments: Submitted to EUSIPCO 2024

Subjects: Audio and Speech Processing (eess.AS)
[42] arXiv:2403.11578 [pdf, html, other]: Title: AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition

SooHwan Eom, Eunseop Yoon, Hee Suk Yoon, Chanwoo Kim, Mark Hasegawa-Johnson, Chang D. Yoo

Subjects: Audio and Speech Processing (eess.AS)
[43] arXiv:2403.12182 [pdf, html, other]: Title: Latent CLAP Loss for Better Foley Sound Synthesis

Tornike Karchkhadze, Hassan Salami Kavaki, Mohammad Rasool Izadi, Bryce Irvin, Mikolaj Kegler, Ari Hertz, Shuo Zhang, Marko Stamenovic

Journal-ref: EUSIPCO 2024 Proceedings, ISBN: 978-9-4645-9361-7

Subjects: Audio and Speech Processing (eess.AS)
[44] arXiv:2403.12258 [pdf, html, other]: Title: A Multi-loudspeaker Binaural Room Impulse Response Dataset with High-Resolution Translational and Rotational Head Coordinates in a Listening Room

Yue Qiao, Ryan Miguel Gonzales, Edgar Choueiri

Comments: Submitted to Frontiers in Signal Processing

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[45] arXiv:2403.12630 [pdf, html, other]: Title: Reproducing the Acoustic Velocity Vectors in a Circular Listening Area

Jiarui Wang, Thushara Abhayapala, Jihui Aimee Zhang, Prasanga Samarasinghe

Comments: Submitted to the 17th International Conference on Signal Processing and Communication System (ICSPCS 2024)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[46] arXiv:2403.13332 [pdf, other]: Title: TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer

Yu Xi, Hao Li, Baochen Yang, Haoyu Li, Hainan Xu, Kai Yu

Comments: Accepted by ICASSP2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[47] arXiv:2403.13356 [pdf, html, other]: Title: KunquDB: An Attempt for Speaker Verification in the Chinese Opera Scenario

Huali Zhou, Yuke Lin, Dong Liu, Ming Li

Comments: Accepted by ICPR 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Image and Video Processing (eess.IV)
[48] arXiv:2403.13465 [pdf, html, other]: Title: BanglaNum -- A Public Dataset for Bengali Digit Recognition from Speech

Mir Sayeed Mohammad, Azizul Zahid, Md Asif Iqbal

Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[49] arXiv:2403.13643 [pdf, other]: Title: Vibration Sensitivity of one-port and two-port MEMS microphones

Francis Doyon-D'Amour, Carly Stalder, Timothy Hodges, Michel Stephan, Lixiue Wu, Triantafillos Koukoulas, Stephane Leahy, Raphael St-Gelais

Comments: 8 pages, 14 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[50] arXiv:2403.14179 [pdf, html, other]: Title: AdaProj: Adaptively Scaled Angular Margin Subspace Projections for Anomalous Sound Detection with Auxiliary Classification Tasks

Kevin Wilkinghoff

Comments: Accepted for presentation at DCASE 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[51] arXiv:2403.14246 [pdf, html, other]: Title: CATSE: A Context-Aware Framework for Causal Target Sound Extraction

Shrishail Baligar, Mikolaj Kegler, Bryce Irvin, Marko Stamenovic, Shawn Newsam

Comments: Submitted to EUSIPCO 2024

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
[52] arXiv:2403.14268 [pdf, other]: Title: Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by Attention Constraints

PeiYing Lee, HauYun Guo, Berlin Chen

Comments: Accepted to The 28th International Conference on Technologies and Applications of Artificial Intelligence (TAAI), in Chinese language

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[53] arXiv:2403.14817 [pdf, html, other]: Title: Crowdsourced Multilingual Speech Intelligibility Testing

Laura Lechler, Kamil Wojcicki

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
[54] arXiv:2403.15336 [pdf, html, other]: Title: Dialogue Understandability: Why are we streaming movies with subtitles?

Helard Becerra Martinez, Alessandro Ragano, Diptasree Debnath, Asad Ullah, Crisron Rudolf Lucas, Martin Walsh, Andrew Hines

Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM)
[55] arXiv:2403.15442 [pdf, html, other]: Title: Artificial Intelligence for Cochlear Implants: Review of Strategies, Challenges, and Perspectives

Billel Essaid, Hamza Kheddar, Noureddine Batel, Muhammad E.H. Chowdhury, Abderrahmane Lakas

Journal-ref: IEEE Access, 2024

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[56] arXiv:2403.16610 [pdf, html, other]: Title: Distributed collaborative anomalous sound detection by embedding sharing

Kota Dohi, Yohei Kawaguchi

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD)
[57] arXiv:2403.16973 [pdf, html, other]: Title: VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

Puyuan Peng, Po-Yao Huang, Shang-Wen Li, Abdelrahman Mohamed, David Harwath

Comments: ACL 2024. Data, code, and model weights are available at this https URL

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[58] arXiv:2403.17402 [pdf, html, other]: Title: Infrastructure-less Localization from Indoor Environmental Sounds Based on Spectral Decomposition and Spatial Likelihood Model

Satoki Ogiso, Yoshiaki Bando, Takeshi Kurata, Takashi Okuma

Comments: 6 pages, 6 figures, accepted to IEEE/SICE SII 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[59] arXiv:2403.17514 [pdf, html, other]: Title: Speaker Distance Estimation in Enclosures from Single-Channel Audio

Michael Neri, Archontis Politis, Daniel Krause, Marco Carli, Tuomas Virtanen

Comments: Accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[60] arXiv:2403.17864 [pdf, html, other]: Title: Synthetic training set generation using text-to-audio models for environmental sound classification

Francesca Ronchini, Luca Comanducci, Fabio Antonacci

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[61] arXiv:2403.18257 [pdf, other]: Title: Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation

Xilin Jiang, Cong Han, Nima Mesgarani

Comments: work in progress

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[62] arXiv:2403.18560 [pdf, html, other]: Title: Noise-Robust Keyword Spotting through Self-supervised Pretraining

Jacob Mørk, Holger Severin Bovbjerg, Gergely Kiss, Zheng-Hua Tan

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[63] arXiv:2403.18636 [pdf, html, other]: Title: A Diffusion-Based Generative Equalizer for Music Restoration

Eloi Moliner, Maija Turunen, Filip Elvander, Vesa Välimäki

Comments: Presented at DAFx24. Historical music restoration examples are available at: this http URL

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[64] arXiv:2403.18638 [pdf, html, other]: Title: Mind the Domain Gap: a Systematic Analysis on Bioacoustic Sound Event Detection

Jinhua Liang, Ines Nolasco, Burooj Ghani, Huy Phan, Emmanouil Benetos, Dan Stowell

Subjects: Audio and Speech Processing (eess.AS)
[65] arXiv:2403.19207 [pdf, html, other]: Title: LV-CTC: Non-autoregressive ASR with CTC and latent variable models

Yuya Fujita, Shinji Watanabe, Xuankai Chang, Takashi Maekaku

Subjects: Audio and Speech Processing (eess.AS)
[66] arXiv:2403.19217 [pdf, html, other]: Title: Blind Identification of Binaural Room Impulse Responses from Smart Glasses

Thomas Deppisch, Nils Meyer-Kahlen, Sebastià V. Amengual Garí

Comments: Please use the published version of this article when citing: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 4052-4065, 2024, doi: https://doi.org/10.1109/TASLP.2024.3454964

Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 4052-4065, 2024

Subjects: Audio and Speech Processing (eess.AS)
[67] arXiv:2403.19709 [pdf, html, other]: Title: Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models

Tsendsuren Munkhdalai, Youzheng Chen, Khe Chai Sim, Fadi Biadsy, Tara Sainath, Pedro Moreno Mengibar

Comments: 5 pages, 3 figures, 5 tables

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[68] arXiv:2403.19971 [pdf, html, other]: Title: 3D-Speaker-Toolkit: An Open-Source Toolkit for Multimodal Speaker Verification and Diarization

Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Tinglong Zhu, Rongjie Huang, Chong Deng, Qian Chen, Shiliang Zhang, Wen Wang, Xihao Li

Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[69] arXiv:2403.20090 [pdf, html, other]: Title: Non-Exponential Reverberation Modeling Using Dark Velvet Noise

Jon Fagerström, Sebastian J. Schlecht, Vesa Välimäki

Comments: Accepted for publication in the Journal of Audio Engineering Society

Subjects: Audio and Speech Processing (eess.AS)
[70] arXiv:2403.20184 [pdf, html, other]: Title: Exploring Pathological Speech Quality Assessment with ASR-Powered Wav2Vec2 in Data-Scarce Context

Tuan Nguyen, Corinne Fredouille, Alain Ghio, Mathieu Balaguer, Virginie Woisard

Comments: Accepted at LREC-COLING 2024

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[71] arXiv:2403.00212 (cross-list from cs.CL) [pdf, html, other]: Title: Transcription and translation of videos using fine-tuned XLSR Wav2Vec2 on custom dataset and mBART

Aniket Tathe, Anand Kamble, Suyash Kumbharkar, Atharva Bhandare, Anirban C. Mitra

Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[72] arXiv:2403.00274 (cross-list from cs.CV) [pdf, html, other]: Title: CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation

Xi Liu, Ying Guo, Cheng Zhen, Tong Li, Yingying Ao, Pengfei Yan

Comments: Accepted by CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[73] arXiv:2403.00370 (cross-list from cs.CL) [pdf, html, other]: Title: Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview

Heyang Liu, Yu Wang, Yanfeng Wang

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74] arXiv:2403.00529 (cross-list from cs.SD) [pdf, html, other]: Title: VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis

Weiwei Lin, Chenhang He, Man-Wai Mak, Jiachen Lian, Kong Aik Lee

Comments: preprint

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[75] arXiv:2403.00790 (cross-list from cs.SD) [pdf, other]: Title: Structuring Concept Space with the Musical Circle of Fifths by Utilizing Music Grammar Based Activations

Tofara Moyo

Comments: Inaccuracies in script

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[76] arXiv:2403.00854 (cross-list from q-bio.NC) [pdf, html, other]: Title: Speaker-Independent Dysarthria Severity Classification using Self-Supervised Transformers and Multi-Task Learning

Lauren Stumpf, Balasundaram Kadirvelu, Sigourney Waibel, A. Aldo Faisal

Comments: 17 pages, 2 tables, 4 main figures, 2 supplemental figures, prepared for journal submission

Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[77] arXiv:2403.00977 (cross-list from cs.SD) [pdf, html, other]: Title: Scaling Up Adaptive Filter Optimizers

Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[78] arXiv:2403.01087 (cross-list from cs.MM) [pdf, html, other]: Title: Towards Accurate Lip-to-Speech Synthesis in-the-Wild

Sindhu Hegde, Rudrabha Mukhopadhyay, C.V. Jawahar, Vinay Namboodiri

Comments: 8 pages of content, 1 page of references and 4 figures

Journal-ref: In Proceedings of the 31st ACM International Conference on Multimedia, 2023

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[79] arXiv:2403.01132 (cross-list from cs.LG) [pdf, other]: Title: MPIPN: A Multi Physics-Informed PointNet for solving parametric acoustic-structure systems

Chu Wang, Jinhong Wu, Yanzhi Wang, Zhijian Zha, Qi Zhou

Comments: The number of figures is 16. The number of tables is 5. The number of words is 9717

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[80] arXiv:2403.01255 (cross-list from cs.SD) [pdf, html, other]: Title: Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey

Hamza Kheddar, Mustapha Hemis, Yassine Himeur

Journal-ref: Information Fusion, Elsevier, 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[81] arXiv:2403.01278 (cross-list from cs.SD) [pdf, html, other]: Title: Enhancing Audio Generation Diversity with Visual Information

Zeyu Xie, Baihan Li, Xuenan Xu, Mengyue Wu, Kai Yu

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[82] arXiv:2403.01699 (cross-list from cs.CL) [pdf, other]: Title: Brilla AI: AI Contestant for the National Science and Maths Quiz

George Boateng, Jonathan Abrefah Mensah, Kevin Takyi Yeboah, William Edor, Andrew Kojo Mensah-Onumah, Naafi Dasana Ibrahim, Nana Sam Yeboah

Comments: 14 pages. Accepted for the WideAIED track at the 25th International Conference on AI in Education (AIED 2024)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[83] arXiv:2403.01700 (cross-list from cs.SD) [pdf, html, other]: Title: Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer

Haoxu Wang, Ming Cheng, Qiang Fu, Ming Li

Comments: Accepted by ICASSP 2024

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[84] arXiv:2403.01785 (cross-list from cs.SD) [pdf, html, other]: Title: What do neural networks listen to? Exploring the crucial bands in Speech Enhancement using Sinc-convolution

Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[85] arXiv:2403.01792 (cross-list from cs.SD) [pdf, html, other]: Title: ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning

Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[86] arXiv:2403.01960 (cross-list from cs.SD) [pdf, html, other]: Title: A robust audio deepfake detection system via multi-view feature

Yujie Yang, Haochen Qin, Hang Zhou, Chengcheng Wang, Tianyu Guo, Kai Han, Yunhe Wang

Comments: 5 pages, 2 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[87] arXiv:2403.02002 (cross-list from cs.SD) [pdf, html, other]: Title: Fine-Grained Quantitative Emotion Editing for Speech Generation

Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li

Comments: This is accepted to IEEE APSIPA ASC 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[88] arXiv:2403.02010 (cross-list from cs.SD) [pdf, html, other]: Title: SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR

Zhiyun Fan, Linhao Dong, Jun Zhang, Lu Lu, Zejun Ma

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[89] arXiv:2403.02687 (cross-list from cs.HC) [pdf, other]: Title: Enhanced DareFightingICE Competitions: Sound Design and AI Competitions

Ibrahim Khan, Chollakorn Nimpattanavong, Thai Van Nguyen, Kantinan Plupattanakit, Ruck Thawonmas

Comments: This paper describes a new competition platform using Unity for our competitions at the 2024 IEEE Conference on Games (CoG 2024). It was accepted for presentation at CoG 2024. However, we recently discovered a much more effective way to do this task without using Unity, leading to our decision to withdraw the paper from CoG 2024 and ArXiv

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[90] arXiv:2403.02701 (cross-list from cs.SD) [pdf, html, other]: Title: Fighting Game Adaptive Background Music for Improved Gameplay

Ibrahim Khan, Thai Van Nguyen, Chollakorn Nimpattanavong, Ruck Thawonmas

Comments: This is an updated version of our IEEE CoG 2023 paper (this https URL). This version has revised the description of the association between the distance between the two players (PD) and the instrument's volume on page 2. arXiv admin note: substantial text overlap with arXiv:2303.15734

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[91] arXiv:2403.02918 (cross-list from cs.RO) [pdf, html, other]: Title: Single-Channel Robot Ego-Speech Filtering during Human-Robot Interaction

Yue Li, Koen V Hindriks, Florian Kunneman

Comments: Accepted by ACM Technological Advances in Human-Robot Interaction. 9 pages

Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[92] arXiv:2403.02938 (cross-list from cs.CL) [pdf, html, other]: Title: AIx Speed: Playback Speed Optimization Using Listening Comprehension of Speech Recognition Models

Kazuki Kawamura, Jun Rekimoto

Journal-ref: AHs '23: Proceedings of the Augmented Humans International Conference 2023

Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[93] arXiv:2403.03095 (cross-list from cs.CV) [pdf, html, other]: Title: Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization

Yuxin Guo, Shijie Ma, Yuhao Zhao, Hu Su, Wei Zou

Comments: Accepted To ICASSP2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[94] arXiv:2403.03145 (cross-list from cs.CV) [pdf, html, other]: Title: Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization

Yuxin Guo, Shijie Ma, Hu Su, Zhiqing Wang, Yuhao Zhao, Wei Zou, Siyang Sun, Yun Zheng

Comments: Accepted to NeurIPS2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[95] arXiv:2403.03224 (cross-list from physics.soc-ph) [pdf, html, other]: Title: Reinforcement Learning Jazz Improvisation: When Music Meets Game Theory

Vedant Tapiavala, Joshua Piesner, Sourjyamoy Barman, Feng Fu

Comments: 16 pages, 4 figures

Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[96] arXiv:2403.03395 (cross-list from cs.SD) [pdf, other]: Title: Interactive Melody Generation System for Enhancing the Creativity of Musicians

So Hirawata, Noriko Otani

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[97] arXiv:2403.03411 (cross-list from cs.SD) [pdf, html, other]: Title: CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation

Vahid Ahmadi Kalkhorani, DeLiang Wang

Comments: 9 pages

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[98] arXiv:2403.03510 (cross-list from cs.SD) [pdf, html, other]: Title: METAMAT 01: A semi-analytic Solution for Benchmarking Wave Propagation Simulations of homogeneous Absorbers in 1D/3D and 2D

Stefan Schoder, Paul Maurerlehner

Comments: 4

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Classical Physics (physics.class-ph)
[99] arXiv:2403.03522 (cross-list from cs.SD) [pdf, html, other]: Title: Non-verbal information in spontaneous speech -- towards a new framework of analysis

Tirza Biron, Moshe Barboy, Eran Ben-Artzy, Alona Golubchik, Yanir Marmor, Smadar Szekely, Yaron Winter, David Harel

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[100] arXiv:2403.03538 (cross-list from cs.SD) [pdf, html, other]: Title: RADIA -- Radio Advertisement Detection with Intelligent Analytics

Jorge Álvarez, Juan Carlos Armenteros, Camilo Torrón, Miguel Ortega-Martín, Alfonso Ardoiz, Óscar García, Ignacio Arranz, Íñigo Galdeano, Ignacio Garrido, Adrián Alonso, Fernando Bayón, Oleg Vorontsov

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
