Audio and Speech Processing

Authors and titles for April 2022

Total of 320 entries; showing entries 26-50.
[26] arXiv:2204.02166 [pdf, other]
Title: Disentangled Speech Representation Learning Based on Factorized Hierarchical Variational Autoencoder with Self-Supervised Objective
Yuying Xie, Thomas Arildsen, Zheng-Hua Tan
Comments: Published in: 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)
Subjects: Audio and Speech Processing (eess.AS)
[27] arXiv:2204.02195 [pdf, other]
Title: Complex Recurrent Variational Autoencoder with Application to Speech Enhancement
Yuying Xie, Thomas Arildsen, Zheng-Hua Tan
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Audio and Speech Processing (eess.AS)
[28] arXiv:2204.02249 [pdf, other]
Title: A Comparison of Deep Learning MOS Predictors for Speech Synthesis Quality
Alessandro Ragano, Emmanouil Benetos, Michael Chinen, Helard B. Martinez, Chandan K.A. Reddy, Jan Skoglund, Andrew Hines
Comments: Accepted ISSC 2023
Subjects: Audio and Speech Processing (eess.AS)
[29] arXiv:2204.02263 [pdf, other]
Title: Multilingual and Multimodal Abuse Detection
Rini Sharon, Heet Shah, Debdoot Mukherjee, Vikram Gupta
Comments: Submitted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[30] arXiv:2204.02281 [pdf, other]
Title: Design Guidelines for Inclusive Speaker Verification Evaluation Datasets
Wiebke Toussaint Hutiri, Lauriane Gorce, Aaron Yi Ding
Comments: Accepted to INTERSPEECH 2022 (submitted version)
Subjects: Audio and Speech Processing (eess.AS); Computers and Society (cs.CY); Machine Learning (cs.LG)
[31] arXiv:2204.02306 [pdf, other]
Title: Low-Latency Speech Separation Guided Diarization for Telephone Conversations
Giovanni Morrone, Samuele Cornell, Desh Raj, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini
Comments: Accepted for Presentation at IEEE Spoken Language Technology Workshop (SLT) 2022
Subjects: Audio and Speech Processing (eess.AS)
[32] arXiv:2204.02381 [pdf, other]
Title: Hear No Evil: Towards Adversarial Robustness of Automatic Speech Recognition via Multi-Task Learning
Nilaksh Das, Duen Horng Chau
Comments: Submitted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[33] arXiv:2204.02385 [pdf, other]
Title: Learning Speech Emotion Representations in the Quaternion Domain
Eric Guizzo, Tillman Weyde, Simone Scardapane, Danilo Comminiello
Comments: Accepted for Publication in IEEE/ACM Transactions on Audio, Speech and Language Processing
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[34] arXiv:2204.02637 [pdf, other]
Title: Global HRTF Interpolation via Learned Affine Transformation of Hyper-conditioned Features
Jin Woo Lee, Sungho Lee, Kyogu Lee
Comments: Submitted to ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[35] arXiv:2204.02639 [pdf, other]
Title: Representation Selective Self-distillation and wav2vec 2.0 Feature Exploration for Spoof-aware Speaker Verification
Jin Woo Lee, Eungbeom Kim, Junghyun Koo, Kyogu Lee
Comments: Accepted to be published in the Proceedings of Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS)
[36] arXiv:2204.02694 [pdf, other]
Title: Customizable End-to-end Optimization of Online Neural Network-supported Dereverberation for Hearing Devices
Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann
Comments: ©2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Journal-ref: ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[37] arXiv:2204.02741 [pdf, other]
Title: Neural Network-augmented Kalman Filtering for Robust Online Speech Dereverberation in Noisy Reverberant Environments
Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann
Comments: accepted to INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[38] arXiv:2204.02841 [pdf, other]
Title: Spectral Denoising for Microphone Classification
L. Cuccovillo, A. Giganti, P. Bestagini, P. Aichroth, S. Tubaro
Journal-ref: in ACM International Workshop on Multimedia AI against Disinformation (MAD), Newark, NJ, USA, 2022, pp. 10-17
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[39] arXiv:2204.02978 [pdf, other]
Title: A neural network-supported two-stage algorithm for lightweight dereverberation on hearing devices
Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann
Comments: Accepted for publication in EURASIP Journal on Audio, Speech and Music Processing
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[40] arXiv:2204.03166 [pdf, other]
Title: Musical Information Extraction from the Singing Voice
Preeti Rao
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[41] arXiv:2204.03219 [pdf, other]
Title: DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores
Wei-Cheng Tseng, Wei-Tsung Kao, Hung-yi Lee
Comments: Accepted to Interspeech 2022. Code will be available in the future
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[42] arXiv:2204.03232 [pdf, other]
Title: Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation
Xiaofei Wang, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka
Comments: Submitted to INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
[43] arXiv:2204.03238 [pdf, other]
Title: Unsupervised Quantized Prosody Representation for Controllable Speech Synthesis
Yutian Wang, Yuankun Xie, Kun Zhao, Hui Wang, Qin Zhang
Comments: accepted by IEEE International Conference on Multimedia and Expo 2022 (ICME2022)
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM)
[44] arXiv:2204.03305 [pdf, other]
Title: MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids
Ryandhimas E. Zezario, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao
Comments: Accepted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[45] arXiv:2204.03306 [pdf, other]
Title: Music-robust Automatic Lyrics Transcription of Polyphonic Music
Xiaoxue Gao, Chitralekha Gupta, Haizhou Li
Comments: 7 pages, 2 figures, accepted by 2022 Sound and Music Computing
Subjects: Audio and Speech Processing (eess.AS)
[46] arXiv:2204.03310 [pdf, other]
Title: MTI-Net: A Multi-Target Speech Intelligibility Prediction Model
Ryandhimas E. Zezario, Szu-wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao
Comments: Accepted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[47] arXiv:2204.03339 [pdf, other]
Title: Boosting Self-Supervised Embeddings for Speech Enhancement
Kuo-Hsuan Hung, Szu-wei Fu, Huan-Hsin Tseng, Hsin-Tien Chiang, Yu Tsao, Chii-Wann Lin
Comments: accepted to INTERSPEECH-2022
Subjects: Audio and Speech Processing (eess.AS)
[48] arXiv:2204.03379 [pdf, other]
Title: Correcting Mispronunciations in Speech using Spectrogram Inpainting
Talia Ben-Simon, Felix Kreuk, Faten Awwad, Jacob T. Cohen, Joseph Keshet
Comments: Accepted for publication at Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[49] arXiv:2204.03417 [pdf, other]
Title: Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0
Sebastian P. Bayerl, Dominik Wagner, Elmar Nöth, Korbinian Riedhammer
Comments: Accepted at Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[50] arXiv:2204.03428 [pdf, other]
Title: Detecting Vocal Fatigue with Neural Embeddings
Sebastian P. Bayerl, Dominik Wagner, Ilja Baumann, Korbinian Riedhammer, Tobias Bocklet
Comments: Accepted for Publication in the Journal of Voice
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)