Audio and Speech Processing

Authors and titles for January 2024

Total of 278 entries

[1] arXiv:2401.00197 [pdf, html, other]: Title: ODAQ: Open Dataset of Audio Quality

Matteo Torcoli, Chih-Wei Wu, Sascha Dick, Phillip A. Williams, Mhd Modar Halimeh, William Wolcott, Emanuel A. P. Habets

Comments: Accepted paper. IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Seoul, Korea, April 2024

Subjects: Audio and Speech Processing (eess.AS)
[2] arXiv:2401.00225 [pdf, other]: Title: Enhancing dysarthria speech feature representation with empirical mode decomposition and Walsh-Hadamard transform

Ting Zhu, Shufei Duan, Camille Dingam, Huizhi Liang, Wei Zhang

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
[3] arXiv:2401.00273 [pdf, html, other]: Title: Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision

Chih-Kai Yang, Kuan-Po Huang, Ke-Han Lu, Chun-Yi Kuan, Chi-Yuan Hsiao, Hung-yi Lee

Comments: Submitted to ICASSP 2024 Self-supervision in Audio, Speech and Beyond workshop

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[4] arXiv:2401.00813 [pdf, html, other]: Title: Ultraspherical/Gegenbauer polynomials to unify 2D/3D Ambisonic directivity designs

Franz Zotter

Comments: 56 pages, 9 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[5] arXiv:2401.00900 [pdf, html, other]: Title: Detecting the presence of sperm whales echolocation clicks in noisy environments

Guy Gubnitsky, Roee Diamant

Comments: 10 pages and 10 figures

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[6] arXiv:2401.00936 [pdf, html, other]: Title: The role of direct sound spherical harmonics representation in externalization using binaural reproduction

Eran Miller, Boaz Rafaely

Journal-ref: Applied Acoustics, Volume 148, 2019, Pages 40-45

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[7] arXiv:2401.01099 [pdf, html, other]: Title: Efficient Parallel Audio Generation using Group Masked Language Modeling

Myeonghun Jeong, Minchan Kim, Joun Yeop Lee, Nam Soo Kim

Comments: This work has been submitted to the IEEE for possible publication

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[8] arXiv:2401.01145 [pdf, html, other]: Title: HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids

Dyah A. M. G. Wisnu, Stefano Rini, Ryandhimas E. Zezario, Hsin-Min Wang, Yu Tsao

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2025

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[9] arXiv:2401.01206 [pdf, html, other]: Title: Room impulse response reconstruction with physics-informed deep learning

Xenofon Karakonstantis, Diego Caviedes-Nozal, Antoine Richard, Efren Fernandez-Grande

Comments: Submitted to Journal of Acoustical Society of America (JASA)

Subjects: Audio and Speech Processing (eess.AS)
[10] arXiv:2401.01255 [pdf, html, other]: Title: On the Parameter Estimation of Sinusoidal Models for Speech and Audio Signals

George P. Kafentzis

Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[11] arXiv:2401.01473 [pdf, other]: Title: Self-supervised Reflective Learning through Self-distillation and Online Clustering for Speaker Representation Learning

Danwei Cai, Zexin Cai, Ze Li, Ming Li

Journal-ref: IEEE Transactions on Audio, Speech and Language Processing, vol. 33, pp. 1535-1550, 2025

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[12] arXiv:2401.01498 [pdf, html, other]: Title: Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction

Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Semin Kim, Joun Yeop Lee, Nam Soo Kim

Comments: This work has been submitted to the IEEE for possible publication

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[13] arXiv:2401.01792 [pdf, html, other]: Title: CoMoSVC: Consistency Model-based Singing Voice Conversion

Yiwen Lu, Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[14] arXiv:2401.02046 [pdf, html, other]: Title: CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based Speech Recognition

Junfeng Hou, Peiyao Wang, Jincheng Zhang, Meng Yang, Minwei Feng, Jingcheng Yin

Comments: accepted by ASRU 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[15] arXiv:2401.02164 [pdf, html, other]: Title: Listening broadband physical model for microphones: a first step

Laurent Millot (IDEAT), Antoine Valette, Manuel Lopes, Gérard Pelé (IDEAT), Mohammed Elliq, Dominique Lambert (IDEAT)

Journal-ref: 120th Convention of the Audio Engineering Society, Audio Engineering Society, May 2006, Paris, France

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[16] arXiv:2401.02285 [pdf, html, other]: Title: Optimal Real-Weighted Beamforming With Application to Linear and Spherical Arrays

V. Tourbabin, M. Agmon, B. Rafaely, J. Tabrikian

Journal-ref: in IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 9, pp. 2575-2585, Nov. 2012

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[17] arXiv:2401.02386 [pdf, html, other]: Title: Direction of Arrival Estimation Using Microphone Array Processing for Moving Humanoid Robots

Vladimir Tourbabin, Boaz Rafaely

Journal-ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 11, pp. 2046-2058, Nov. 2015

Subjects: Audio and Speech Processing (eess.AS); Robotics (cs.RO); Sound (cs.SD)
[18] arXiv:2401.02417 [pdf, html, other]: Title: Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition

David M. Chan, Shalini Ghosh, Hitesh Tulsiani, Ariya Rastrow, Björn Hoffmeister

Comments: To appear in ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[19] arXiv:2401.02463 [pdf, html, other]: Title: Some clues to build a sound analysis relevant to hearing

Laurent Millot (ACTE)

Journal-ref: 116th Convention of the Audio Engineering Society, Audio Engineering Society, May 2004, Berlin, Germany

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[20] arXiv:2401.02673 [pdf, html, other]: Title: A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model

Dongdi Zhao, Jianbo Ma, Lu Lu, Jinke Li, Xuan Ji, Lei Zhu, Fuming Fang, Ming Liu, Feijun Jiang

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[21] arXiv:2401.02839 [pdf, html, other]: Title: Pheme: Efficient and Conversational Speech Generation

Paweł Budzianowski, Taras Sereda, Tomasz Cichy, Ivan Vulić

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[22] arXiv:2401.03078 [pdf, html, other]: Title: StreamVC: Real-Time Low-Latency Voice Conversion

Yang Yang, Yury Kartynnik, Yunpeng Li, Jiuqiang Tang, Xing Li, George Sung, Matthias Grundmann

Comments: Accepted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[23] arXiv:2401.03251 [pdf, html, other]: Title: TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR

Nagarathna Ravi, Thishyan Raj T, Vipul Arora

Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[24] arXiv:2401.03286 [pdf, html, other]: Title: Theoretical Framework for the Optimization of Microphone Array Configuration for Humanoid Robot Audition

Vladimir Tourbabin, Boaz Rafaely

Journal-ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 12, 1803-1814, 2014

Subjects: Audio and Speech Processing (eess.AS); Robotics (cs.RO); Sound (cs.SD)
[25] arXiv:2401.03291 [pdf, html, other]: Title: Design framework for spherical microphone and loudspeaker arrays in a multiple-input multiple-output system

Hai Morgenstern, Boaz Rafaely, Markus Noisternig

Journal-ref: J. Acoust. Soc. Am. 2017, vol 141, no 3, 2024-2038

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[26] arXiv:2401.03441 [pdf, html, other]: Title: Spatial Reverberation and Dereverberation using an Acoustic Multiple-Input Multiple-Output System

Hai Morgenstern, Boaz Rafaely

Journal-ref: J. Audio Eng. Soc, vol. 65, no. 1/2, pp. 42-55, 2017

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[27] arXiv:2401.03448 [pdf, html, other]: Title: Single-Microphone Speaker Separation and Voice Activity Detection in Noisy and Reverberant Environments

Renana Opochinsky, Mordehay Moradi, Sharon Gannot

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[28] arXiv:2401.03458 [pdf, html, other]: Title: Modal smoothing for analysis of room reflections measured with spherical microphone and loudspeaker arrays

Hai Morgenstern, Boaz Rafaely

Journal-ref: J. Acoust. Soc. Am., vol. 143, no. 2, pp. 1008-1018, 2018

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[29] arXiv:2401.03468 [pdf, html, other]: Title: Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation

Qiushi Zhu, Jie Zhang, Yu Gu, Yuchen Hu, Lirong Dai

Comments: Accepted by AAAI 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[30] arXiv:2401.03493 [pdf, html, other]: Title: Theory and investigation of acoustic multiple-input multiple-output systems based on spherical arrays in a room

Hai Morgenstern, Boaz Rafaely, Franz Zotter

Journal-ref: J. Acoust. Soc. Am., vol. 138, no. 5, pp. 2998-3009, November 2015

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[31] arXiv:2401.03497 [pdf, html, other]: Title: EAT: Self-Supervised Pre-Training with Efficient Audio Transformer

Wenxi Chen, Yuzhe Liang, Ziyang Ma, Zhisheng Zheng, Xie Chen

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[32] arXiv:2401.03506 [pdf, html, other]: Title: DiarizationLM: Speaker Diarization Post-Processing with Large Language Models

Quan Wang, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao

Journal-ref: Proc. Interspeech 2024, 3754-3758 (2024)

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[33] arXiv:2401.03567 [pdf, html, other]: Title: Hyperbolic Distance-Based Speech Separation

Darius Petermann, Minje Kim

Comments: To be published at ICASSP2024, 14th of April 2024, Seoul, South Korea. Copyright (c) 2023 IEEE. 5 pages, 2 figures, 3 tables

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[34] arXiv:2401.03650 [pdf, html, other]: Title: DDD: A Perceptually Superior Low-Response-Time DNN-based Declipper

Jayeon Yi, Junghyun Koo, Kyogu Lee

Comments: To appear, ICASSP 2024. Demo samples at this https URL, repo at this https URL

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[35] arXiv:2401.03687 [pdf, html, other]: Title: BS-PLCNet: Band-split Packet Loss Concealment Network with Multi-task Learning Framework and Multi-discriminators

Zihan Zhang, Jiayao Sun, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie

Comments: submitted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[36] arXiv:2401.03689 [pdf, html, other]: Title: LUPET: Incorporating Hierarchical Information Path into Multilingual ASR

Wei Liu, Jingyong Hou, Dong Yang, Muyong Cao, Tan Lee

Comments: Accepted by Interspeech 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[37] arXiv:2401.03816 [pdf, html, other]: Title: Creating Personalized Synthetic Voices from Articulation Impaired Speech Using Augmented Reconstruction Loss

Yusheng Tian, Jingyu Li, Tan Lee

Comments: Accepted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[38] arXiv:2401.03850 [pdf, html, other]: Title: Inverse Nonlinearity Compensation of Hyperelastic Deformation in Dielectric Elastomer for Acoustic Actuation

Jin Woo Lee, Gwang Seok An, Jeong-Yun Sun, Kyogu Lee

Journal-ref: IEEE Access 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[39] arXiv:2401.03936 [pdf, other]: Title: Exploratory Evaluation of Speech Content Masking

Jennifer Williams, Karla Pizzi, Paul-Gauthier Noe, Sneha Das

Comments: Accepted to ITG Speech Conference 2023

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD)
[40] arXiv:2401.03963 [pdf, other]: Title: Geodesic interpolation of frame-wise speaker embeddings for the diarization of meeting scenarios

Tobias Cord-Landwehr, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla, Reinhold Haeb-Umbach

Comments: Accepted at ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS)
[41] arXiv:2401.04127 [pdf, html, other]: Title: Using perceptive subbands analysis to perform audio scenes cartography

Laurent Millot (IDEAC), Gérard Pelé (IDEAC), Mohammed Elliq

Journal-ref: 118th Convention of the Audio Engineering Society, Audio Engineering Society, May 2005, Barcelona, Spain

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP); Classical Physics (physics.class-ph)
[42] arXiv:2401.04283 [pdf, html, other]: Title: FADI-AEC: Fast Score Based Diffusion Model Guided by Far-end Signal for Acoustic Echo Cancellation

Yang Liu, Li Wan, Yun Li, Yiteng Huang, Ming Sun, James Luan, Yangyang Shi, Xin Lei

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[43] arXiv:2401.04447 [pdf, html, other]: Title: Class-Incremental Learning for Multi-Label Audio Classification

Manjunath Mulimani, Annamaria Mesaros

Comments: Accepted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[44] arXiv:2401.04511 [pdf, html, other]: Title: Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement

Soumya Dutta, Sriram Ganapathy

Comments: 5 pages, 3 figures, accepted at ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[45] arXiv:2401.04976 [pdf, html, other]: Title: Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection

Haobo Yue, Zhicheng Zhang, Da Mu, Yonghao Dang, Jianqin Yin, Jin Tang

Comments: Accepted by ICPR2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[46] arXiv:2401.05187 [pdf, html, other]: Title: Comparison of linear and nonlinear methods for decoding selective attention to speech from ear-EEG recordings

Mike Thornton, Danilo Mandic, Tobias Reichenbach

Subjects: Audio and Speech Processing (eess.AS)
[47] arXiv:2401.05314 [pdf, html, other]: Title: ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video

Kevin Cai, Chonghua Liu, David M. Chan

Comments: To appear in ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[48] arXiv:2401.05717 [pdf, html, other]: Title: Segment Boundary Detection via Class Entropy Measurements in Connectionist Phoneme Recognition

Giampiero Salvi

Journal-ref: Speech Communication Volume 48, Issue 12, December 2006, Pages 1666-1676

Subjects: Audio and Speech Processing (eess.AS); Information Theory (cs.IT); Machine Learning (cs.LG); Sound (cs.SD)
[49] arXiv:2401.05809 [pdf, html, other]: Title: Localizing Acoustic Energy in Sound Field Synthesis by Directionally Weighted Exterior Radiation Suppression

Yoshihide Tomita, Shoichi Koyama, Hiroshi Saruwatari

Comments: Accepted to International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[50] arXiv:2401.05916 [pdf, html, other]: Title: Neural Ambisonics encoding for compact irregular microphone arrays

Mikko Heikkinen, Archontis Politis, Tuomas Virtanen

Comments: Accepted for publication in Proceedings of the 2024 IEEE International Conference on Acoustics, Speech and Signal Processing

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[51] arXiv:2401.06183 [pdf, html, other]: Title: End to end Hindi to English speech conversion using Bark, mBART and a finetuned XLSR Wav2Vec2

Aniket Tathe, Anand Kamble, Suyash Kumbharkar, Atharva Bhandare, Anirban C. Mitra

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[52] arXiv:2401.06203 [pdf, html, other]: Title: Remixing Music for Hearing Aids Using Ensemble of Fine-Tuned Source Separators

Matthew Daly

Comments: 2 pages, ICASSP 2024, Cadenza Grand Challenge

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[53] arXiv:2401.06387 [pdf, html, other]: Title: Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction

Ye-Xin Lu, Yang Ai, Hui-Peng Du, Zhen-Hua Ling

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[54] arXiv:2401.06485 [pdf, html, other]: Title: Contrastive Learning With Audio Discrimination For Customizable Keyword Spotting In Continuous Speech

Yu Xi, Baochen Yang, Hao Li, Jiaqi Guo, Kai Yu

Comments: Accepted by ICASSP2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[55] arXiv:2401.06588 [pdf, html, other]: Title: Dynamic Behaviour of Connectionist Speech Recognition with Strong Latency Constraints

Giampiero Salvi

Journal-ref: Speech Communication Volume 48, Issue 7, July 2006, Pages 802-818

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[56] arXiv:2401.06788 [pdf, html, other]: Title: The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023

He Wang, Pengcheng Guo, Wei Chen, Pan Zhou, Lei Xie

Comments: Included in CNVSRC Workshop 2023, NCMMSC 2023

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[57] arXiv:2401.06897 [pdf, html, other]: Title: Maximum-Entropy Adversarial Audio Augmentation for Keyword Spotting

Zuzhao Ye, Gregory Ciccarelli, Brian Kulis

Comments: 5 pages, 2 figures

Subjects: Audio and Speech Processing (eess.AS)
[58] arXiv:2401.07336 [pdf, other]: Title: Construction and Evaluation of Mandarin Multimodal Emotional Speech Database

Zhu Ting, Li Liangqi, Duan Shufei, Zhang Xueying, Xiao Zhongzhe, Jia Hairng, Liang Huizhi

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD); Signal Processing (eess.SP)
[59] arXiv:2401.07342 [pdf, html, other]: Title: Who Said What? An Automated Approach to Analyzing Speech in Preschool Classrooms

Anchen Sun, Juan J Londono, Batya Elbaum, Luis Estrada, Roberto Jose Lazo, Laura Vitale, Hugo Gonzalez Villasanti, Riccardo Fusaroli, Lynn K Perry, Daniel S Messinger

Comments: 8 pages, 4 figures, 3 tables, The paper has been accepted to 2024 IEEE International Conference on Development and Learning (ICDL) as a full oral presentation and will appear in the IEEE ICDL proceedings

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[60] arXiv:2401.07506 [pdf, html, other]: Title: SeMaScore : a new evaluation metric for automatic speech recognition tasks

Zitha Sasindran, Harsha Yelchuri, T. V. Prabhakar

Comments: Accepted at Interspeech 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[61] arXiv:2401.07681 [pdf, other]: Title: Effect of target signals and delays on spatially selective active noise control for open-fitting hearables

Tong Xiao, Simon Doclo

Comments: ICASSP 2024. (c) 2024 IEEE

Subjects: Audio and Speech Processing (eess.AS); Systems and Control (eess.SY)
[62] arXiv:2401.07849 [pdf, other]: Title: Comparison of Frequency-Fusion Mechanisms for Binaural Direction-of-Arrival Estimation for Multiple Speakers

Daniel Fejgin, Elior Hadad, Sharon Gannot, Zbyněk Koldovský, Simon Doclo

Comments: Accepted for ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[63] arXiv:2401.08052 [pdf, html, other]: Title: Multi-Input Multi-Output Target-Speaker Voice Activity Detection For Unified, Flexible, and Robust Audio-Visual Speaker Diarization

Ming Cheng, Ming Li

Comments: Accepted by IEEE Transactions on Audio, Speech, and Language Processing

Subjects: Audio and Speech Processing (eess.AS)
[64] arXiv:2401.08166 [pdf, other]: Title: ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis

Haobin Tang, Xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang

Comments: Accepted by 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2024)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[65] arXiv:2401.08268 [pdf, html, other]: Title: An Explainable Proxy Model for Multilabel Audio Segmentation

Théo Mariotte, Antonio Almudévar, Marie Tahon, Alfonso Ortega

Comments: Accepted at ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[66] arXiv:2401.08342 [pdf, other]: Title: ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings

Jenthe Thienpondt, Kris Demuynck

Comments: proceedings of ASRU 2023

Subjects: Audio and Speech Processing (eess.AS)
[67] arXiv:2401.08486 [pdf, other]: Title: Microphone Subset Selection for the Weighted Prediction Error Algorithm using a Group Sparsity Penalty

Anselm Lohmann, Toon van Waterschoot, Joerg Bitzer, Simon Doclo

Subjects: Audio and Speech Processing (eess.AS)
[68] arXiv:2401.08678 [pdf, html, other]: Title: Sub-band and Full-band Interactive U-Net with DPRNN for Demixing Cross-talk Stereo Music

Han Yin, Mou Wang, Jisheng Bai, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen

Comments: Submitted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[69] arXiv:2401.08833 [pdf, html, other]: Title: Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective

Alexander H. Liu, Sung-Lin Yeh, James Glass

Comments: ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[70] arXiv:2401.08864 [pdf, html, other]: Title: Binaural Angular Separation Network

Yang Yang, George Sung, Shao-Fu Shih, Hakan Erdogan, Chehung Lee, Matthias Grundmann

Comments: Accepted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[71] arXiv:2401.08916 [pdf, html, other]: Title: Two-pass Endpoint Detection for Speech Recognition

Anirudh Raju, Aparna Khare, Di He, Ilya Sklyar, Long Chen, Sam Alptekin, Viet Anh Trinh, Zhe Zhang, Colin Vaz, Venkatesh Ravichandran, Roland Maas, Ariya Rastrow

Comments: ASRU 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[72] arXiv:2401.09308 [pdf, html, other]: Title: Can Synthetic Data Boost the Training of Deep Acoustic Vehicle Counting Networks?

Stefano Damiano, Luca Bondi, Shabnam Ghaffarzadegan, Andre Guntoro, Toon van Waterschoot

Comments: Accepted paper: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[73] arXiv:2401.09315 [pdf, html, other]: Title: On Speech Pre-emphasis as a Simple and Inexpensive Method to Boost Speech Enhancement

Iván López-Espejo, Aditya Joglekar, Antonio M. Peinado, Jesper Jensen

Subjects: Audio and Speech Processing (eess.AS)
[74] arXiv:2401.09354 [pdf, other]: Title: Transcending Controlled Environments Assessing the Transferability of ASRRobust NLU Models to Real-World Applications

Hania Khan, Aleena Fatima Khalid, Zaryab Hassan

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[75] arXiv:2401.09686 [pdf, html, other]: Title: An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement

Qiquan Zhang, Meng Ge, Hongxu Zhu, Eliathamby Ambikairajah, Qi Song, Zhaoheng Ni, Haizhou Li

Comments: Accepted by ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[76] arXiv:2401.09717 [pdf, html, other]: Title: Parameter Selection for Analyzing Conversations with Autism Spectrum Disorder

Tahiya Chowdhury, Veronica Romero, Amanda Stent

Comments: 5 pages, 4 tables, Proceedings of INTERSPEECH 2023

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[77] arXiv:2401.09802 [pdf, html, other]: Title: Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation

Minsu Kim, Jeong Hun Yeo, Se Jin Park, Hyeongseop Rha, Yong Man Ro

Comments: ACMMM 2024

Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[78] arXiv:2401.10032 [pdf, html, other]: Title: FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder

Tan Dat Nguyen, Ji-Hoon Kim, Youngjoon Jang, Jaehun Kim, Joon Son Chung

Comments: Accepted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
[79] arXiv:2401.10411 [pdf, html, other]: Title: AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition

Ju Lin, Niko Moritz, Yiteng Huang, Ruiming Xie, Ming Sun, Christian Fuegen, Frank Seide

Comments: Accepted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[80] arXiv:2401.10449 [pdf, html, other]: Title: Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search

Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Yifan Peng, Shinji Watanabe

Comments: accepted by ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[81] arXiv:2401.10453 [pdf, other]: Title: 3D Room Geometry Inference from Multichannel Room Impulse Response using Deep Neural Network

Inmo Yeon, Jung-Woo Choi

Comments: 5 pages, 2 figures, Proceedings of the 24th International Congress on Acoustics

Journal-ref: Proceedings of the 24th International Congress on Acoustics, ICA 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[82] arXiv:2401.10494 [pdf, html, other]: Title: A Two-Stage Framework in Cross-Spectrum Domain for Real-Time Speech Enhancement

Yuewei Zhang, Huanbin Zou, Jie Zhu

Comments: Accepted by ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[83] arXiv:2401.10543 [pdf, html, other]: Title: Multilingual acoustic word embeddings for zero-resource languages

Christiaan Jacobs

Comments: PhD thesis

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[84] arXiv:2401.11017 [pdf, html, other]: Title: Revealing Emotional Clusters in Speaker Embeddings: A Contrastive Learning Strategy for Speech Emotion Recognition

Ismail Rasim Ulgen, Zongyang Du, Carlos Busso, Berrak Sisman

Comments: Accepted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[85] arXiv:2401.11053 [pdf, html, other]: Title: StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion

Zhichao Wang, Yuanzhe Chen, Xinsheng Wang, Lei Xie, Yuping Wang

Comments: Accepted by ACL2024 (Main)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[86] arXiv:2401.11645 [pdf, html, other]: Title: Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax

Aditya Patil, Vikas Joshi, Purvi Agrawal, Rupesh Mehta

Comments: Published in IEEE's Spoken Language Technology (SLT) 2022, 8 pages (6 + 2 for references), 5 figures

Journal-ref: 2022 IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar, 2023, pp. 252-259

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[87] arXiv:2401.11771 [pdf, other]: Title: Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis

Vinotha R, Hepsiba D, L. D. Vijay Anand, Deepak John Reji

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[88] arXiv:2401.11829 [pdf, other]: Title: Harmonic Detection from Noisy Speech with Auditory Frame Gain for Intelligibility Enhancement

A. Queiroz, R. Coelho

Comments: 9 pages, 6 figures, 4 tables

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[89] arXiv:2401.11832 [pdf, html, other]: Title: Acoustic Disturbance Sensing Level Detection for ASD Diagnosis and Intelligibility Enhancement

Marcelo Pillonetto, Anderson Queiroz, Rosângela Coelho

Comments: 4 pages, 3 figures, 2 tables

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[90] arXiv:2401.11857 [pdf, other]: Title: Adversarial speech for voice privacy protection from Personalized Speech generation

Shihao Chen, Liping Chen, Jie Zhang, KongAik Lee, Zhenhua Ling, Lirong Dai

Comments: Accepted by ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[91] arXiv:2401.12085 [pdf, other]: Title: Consistency Based Unsupervised Self-training For ASR Personalisation

Jisi Zhang, Vandana Rajan, Haaris Mehmood, David Tuckey, Pablo Peso Parada, Md Asif Jalal, Karthikeyan Saravanan, Gil Ho Lee, Jungin Lee, Seokyeong Jung

Comments: Accepted for IEEE ASRU 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[92] arXiv:2401.12160 [pdf, other]: Title: ScoreDec: A Phase-preserving High-Fidelity Audio Codec with A Generalized Score-based Diffusion Post-filter

Yi-Chiao Wu, Dejan Marković, Steven Krenn, Israel D. Gebru, Alexander Richard

Comments: 5 pages, 3 figures, 2 tables. Proc. ICASSP, 2024

Subjects: Audio and Speech Processing (eess.AS)
[93] arXiv:2401.12238 [pdf, html, other]: Title: Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms

Iran R. Roman, Christopher Ick, Sivan Ding, Adrian S. Roman, Brian McFee, Juan P. Bello

Comments: 5 pages, 4 figures, 1 table, to be presented at ICASSP 2024 in Seoul, South Korea

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[94] arXiv:2401.12264 [pdf, html, other]: Title: CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing

Xianghu Yue, Xiaohai Tian, Lu Lu, Malu Zhang, Zhizheng Wu, Haizhou Li

Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD); Image and Video Processing (eess.IV)
[95] arXiv:2401.12440 [pdf, html, other]: Title: Post-Training Embedding Alignment for Decoupling Enrollment and Runtime Speaker Recognition Models

Chenyang Gao, Brecht Desplanques, Chelsea J.-T. Ju, Aman Chadha, Andreas Stolcke

Comments: Accepted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[96] arXiv:2401.12473 [pdf, html, other]: Title: Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor

Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhong-Qiu Wang, Shinji Watanabe

Comments: 5 pages, 4 figures, accepted by ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[97] arXiv:2401.12570 [pdf, html, other]: Title: DiffMoog: a Differentiable Modular Synthesizer for Sound Matching

Noy Uzrad, Oren Barkan, Almog Elharar, Shlomi Shvartzman, Moshe Laufer, Lior Wolf, Noam Koenigstein

Comments: 5 pages, 7 figures, 1 table, Our code is released at this https URL

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[98] arXiv:2401.12850 [pdf, html, other]: Title: End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization

Prachi Singh, Sriram Ganapathy

Comments: 11 pages. Under review IEEE TASLP. © 2024 IEEE

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[99] arXiv:2401.13146 [pdf, html, other]: Title: Locality enhanced dynamic biasing and sampling strategies for contextual ASR

Md Asif Jalal, Pablo Peso Parada, George Pavlidis, Vasileios Moschopoulos, Karthikeyan Saravanan, Chrysovalantis-Giorgos Kontoulis, Jisi Zhang, Anastasios Drosou, Gil Ho Lee, Jungin Lee, Seokyeong Jung

Comments: Accepted for IEEE ASRU 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[100] arXiv:2401.13249 [pdf, html, other]: Title: MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction

Wangjin Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li, Raj Dabre, Yi Zhao, Tatsuya Kawahara

Comments: Accepted in ICASSP2024

Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM)
