Audio and Speech Processing

Authors and titles for February 2021

Total of 208 entries : 1-50 51-100 101-150 151-200 201-208
[51] arXiv:2102.07961 [pdf, other]
Title: Semi-Supervised Singing Voice Separation with Noisy Self-Training
Zhepei Wang, Ritwik Giri, Umut Isik, Jean-Marc Valin, Arvindh Krishnaswamy
Comments: Accepted at 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021)
Subjects: Audio and Speech Processing (eess.AS)
[52] arXiv:2102.08075 [pdf, other]
Title: Axial Residual Networks for CycleGAN-based Voice Conversion
Jaeseong You, Gyuhyeon Nam, Dalhyun Kim, Gyeongsu Chae
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[53] arXiv:2102.08328 [pdf, other]
Title: Context-Aware Prosody Correction for Text-Based Speech Editing
Max Morrison, Lucas Rencker, Zeyu Jin, Nicholas J. Bryan, Juan-Pablo Caceres, Bryan Pardo
Comments: To appear in proceedings of ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[54] arXiv:2102.08706 [pdf, other]
Title: Variational Autoencoder for Speech Enhancement with a Noise-Aware Encoder
Huajian Fang, Guillaume Carbajal, Stefan Wermter, Timo Gerkmann
Comments: ICASSP 2021. (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Journal-ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[55] arXiv:2102.09106 [pdf, other]
Title: Fundamental Frequency Feature Normalization and Data Augmentation for Child Speech Recognition
Gary Yeung, Ruchao Fan, Abeer Alwan
Comments: To be published in IEEE ICASSP
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[56] arXiv:2102.09168 [pdf, other]
Title: Gaussian Kernelized Self-Attention for Long Sequence Data and Its Application to CTC-based Speech Recognition
Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe
Comments: Accepted to ICASSP2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[57] arXiv:2102.09660 [pdf, other]
Title: Generative Speech Coding with Predictive Variance Regularization
W. Bastiaan Kleijn, Andrew Storus, Michael Chinen, Tom Denton, Felicia S. C. Lim, Alejandro Luebs, Jan Skoglund, Hengchin Yeh
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[58] arXiv:2102.09666 [pdf, other]
Title: Dynamic curriculum learning via data parameters for noise robust keyword spotting
Takuya Higuchi, Shreyas Saxena, Mehrez Souden, Tien Dung Tran, Masood Delfarah, Chandra Dhir
Comments: Accepted at ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[59] arXiv:2102.09838 [pdf, other]
Title: A Robust Maximum Likelihood Distortionless Response Beamformer based on a Complex Generalized Gaussian Distribution
Weixin Meng, Chengshi Zheng, Xiaodong Li
Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[60] arXiv:2102.09853 [pdf, other]
Title: Direction of Arrival Estimation of Noisy Speech Using Convolutional Recurrent Neural Networks with Higher-Order Ambisonics Signals
Nils Poschadel, Robert Hupke, Stephan Preihs, Jürgen Peissig
Comments: 5 pages, 6 figures. Accepted to EUSIPCO 2021
Subjects: Audio and Speech Processing (eess.AS)
[61] arXiv:2102.09918 [pdf, other]
Title: End-to-End Neural Systems for Automatic Children Speech Recognition: An Empirical Study
Prashanth Gurunath Shivakumar, Shrikanth Narayanan
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[62] arXiv:2102.09928 [pdf, other]
Title: Do End-to-End Speech Recognition Models Care About Context?
Lasse Borgholt, Jakob Drachmann Havtorn, Željko Agić, Anders Søgaard, Lars Maaløe, Christian Igel
Comments: Published in the proceedings of INTERSPEECH 2020, pp. 4352-4356
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[63] arXiv:2102.09939 [pdf, other]
Title: ABSP System for The Third DIHARD Challenge
A Kishore Kumar, Shefali Waldekar, Goutam Saha, Md Sahidullah
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[64] arXiv:2102.09959 [pdf, other]
Title: Artificially Synthesising Data for Audio Classification and Segmentation to Improve Speech and Music Detection in Radio Broadcast
Satvik Venkatesh, David Moffat, Alexis Kirke, Gözel Shakeri, Stephen Brewster, Jörg Fachner, Helen Odell-Miller, Alex Street, Nicolas Farina, Sube Banerjee, Eduardo Reck Miranda
Comments: 5 pages, 3 figures, Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[65] arXiv:2102.10345 [pdf, other]
Title: Model architectures to extrapolate emotional expressions in DNN-based text-to-speech
Katsuki Inoue, Sunao Hara, Masanobu Abe, Nobukatsu Hojo, Yusuke Ijima
Comments: This is the author's final draft. Accepted by Speech Communication. Please refer to the journal if you want
Subjects: Audio and Speech Processing (eess.AS)
[66] arXiv:2102.10376 [pdf, other]
Title: The Use of Voice Source Features for Sung Speech Recognition
Gerardo Roa Dabike, Jon Barker
Comments: Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
[67] arXiv:2102.10449 [pdf, other]
Title: WARP-Q: Quality Prediction For Generative Neural Speech Codecs
Wissam A. Jassim, Jan Skoglund, Michael Chinen, Andrew Hines
Comments: Accepted for presentation at IEEE ICASSP 2021. Source code and data can be found on this https URL
Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[68] arXiv:2102.10815 [pdf, other]
Title: LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation
Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao
Comments: Accepted to ICASSP 2021. arXiv admin note: text overlap with arXiv:2012.01684
Subjects: Audio and Speech Processing (eess.AS)
[69] arXiv:2102.11265 [pdf, other]
Title: Automated Evaluation Of Psychotherapy Skills Using Speech And Language Technologies
Nikolaos Flemotomos, Victor R. Martinez, Zhuohao Chen, Karan Singla, Victor Ardulov, Raghuveer Peri, Derek D. Caperton, James Gibson, Michael J. Tanana, Panayiotis Georgiou, Jake Van Epps, Sarah P. Lord, Tad Hirsch, Zac E. Imel, David C. Atkins, Shrikanth Narayanan
Comments: new version has an updated title
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[70] arXiv:2102.11480 [pdf, other]
Title: Evolutionary optimization of contexts for phonetic correction in speech recognition systems
Rafael Viana-Cámara, Diego Campos-Sobrino, Mario Campos-Soberanis
Comments: 13 pages, 4 figures, This article is a translation of the paper "Optimización evolutiva de contextos para la corrección fonética en sistemas de reconocimiento del habla" presented in COMIA 2019
Journal-ref: Research in Computing Science Issue 148(8), 2019, pp. 293-306. ISSN 1870-4069
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[71] arXiv:2102.11525 [pdf, other]
Title: End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend
Wangyou Zhang, Christoph Boeddeker, Shinji Watanabe, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Naoyuki Kamo, Reinhold Haeb-Umbach, Yanmin Qian
Comments: 5 pages, 1 figure, accepted by ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[72] arXiv:2102.11594 [pdf, other]
Title: Unidirectional Memory-Self-Attention Transducer for Online Speech Recognition
Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao
Comments: Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[73] arXiv:2102.11634 [pdf, other]
Title: Dual-Path Modeling for Long Recording Speech Separation in Meetings
Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, Yanmin Qian
Comments: Accepted by ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[74] arXiv:2102.11906 [pdf, other]
Title: Handling Background Noise in Neural Speech Generation
Tom Denton, Alejandro Luebs, Felicia S. C. Lim, Andrew Storus, Hengchin Yeh, W. Bastiaan Kleijn, Jan Skoglund
Comments: 5 pages, 3 figures, presented at the Asilomar Conference on Signals, Systems, and Computers 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[75] arXiv:2102.12078 [pdf, other]
Title: Speech Enhancement Using Multi-Stage Self-Attentive Temporal Convolutional Networks
Ju Lin, Adriaan J. van Wijngaarden, Kuang-Ching Wang, Melissa C. Smith
Comments: Preprint
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[76] arXiv:2102.12394 [pdf, other]
Title: SEP-28k: A Dataset for Stuttering Event Detection From Podcasts With People Who Stutter
Colin Lea, Vikramjit Mitra, Aparna Joshi, Sachin Kajarekar, Jeffrey P. Bigham
Comments: Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[77] arXiv:2102.12397 [pdf, other]
Title: Thoughts on the potential to compensate a hearing loss in noise
Marc René Schädler
Comments: 26 pages, 22 figures, related code this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[78] arXiv:2102.12624 [pdf, other]
Title: Meta-Learning for improving rare word recognition in end-to-end ASR
Florian Lux, Ngoc Thang Vu
Comments: Revised version to be published in the proceedings of ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[79] arXiv:2102.12829 [pdf, other]
Title: Automatic Classification of OSA related Snoring Signals from Nocturnal Audio Recordings
Arun Sebastian, Peter A. Cistulli, Gary Cohen, Philip de Chazal
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[80] arXiv:2102.13334 [pdf, other]
Title: Integration of deep learning with expectation maximization for spatial cue based speech separation in reverberant conditions
Sania Gul, Muhammad Salman Khan, Syed Waqar Shah
Subjects: Audio and Speech Processing (eess.AS)
[81] arXiv:2102.13397 [pdf, other]
Title: Underwater Acoustic Communication Receiver Using Deep Belief Network
Abigail Lee-Leon, Chau Yuen, Dorien Herremans
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[82] arXiv:2102.13468 [pdf, other]
Title: The INTERSPEECH 2021 Computational Paralinguistics Challenge: COVID-19 Cough, COVID-19 Speech, Escalation & Primates
Björn W. Schuller, Anton Batliner, Christian Bergler, Cecilia Mascolo, Jing Han, Iulia Lefter, Heysem Kaya, Shahin Amiriparian, Alice Baird, Lukas Stappen, Sandra Ottl, Maurice Gerczuk, Panagiotis Tzirakis, Chloë Brown, Jagmohan Chauhan, Andreas Grammenos, Apinan Hasthanasombat, Dimitris Spathis, Tong Xia, Pietro Cicuta, Leon J. M. Rothkrantz, Joeri Zwerts, Jelle Treep, Casper Kaandorp
Comments: 5 pages
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[83] arXiv:2102.00151 (cross-list from cs.SD) [pdf, other]
Title: Expressive Neural Voice Cloning
Paarth Neekhara, Shehzeen Hussain, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley
Comments: 12 pages, 2 figures, 2 tables
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[84] arXiv:2102.00201 (cross-list from cs.SD) [pdf, other]
Title: Melon Playlist Dataset: a public dataset for audio-based playlist generation and music tagging
Andres Ferraro, Yuntae Kim, Soohyeon Lee, Biho Kim, Namjun Jo, Semi Lim, Suyon Lim, Jungtaek Jang, Sehwan Kim, Xavier Serra, Dmitry Bogdanov
Comments: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[85] arXiv:2102.00247 (cross-list from cs.CL) [pdf, other]
Title: Triple M: A Practical Text-to-speech Synthesis System With Multi-guidance Attention And Multi-band Multi-time LPCNet
Shilun Lin, Fenglong Xie, Li Meng, Xinhui Li, Li Lu
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[86] arXiv:2102.00291 (cross-list from cs.SD) [pdf, other]
Title: Speech Recognition by Simply Fine-tuning BERT
Wen-Chin Huang, Chia-Hua Wu, Shang-Bao Luo, Kuan-Yu Chen, Hsin-Min Wang, Tomoki Toda
Comments: Accepted to ICASSP 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[87] arXiv:2102.00313 (cross-list from cs.SD) [pdf, other]
Title: Cortical Features for Defense Against Adversarial Audio Attacks
Ilya Kavalerov, Ruijie Zheng, Wojciech Czaja, Rama Chellappa
Comments: Co-author legal name changed
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[88] arXiv:2102.00382 (cross-list from cs.SD) [pdf, other]
Title: Structure-Aware Audio-to-Score Alignment using Progressively Dilated Convolutional Neural Networks
Ruchit Agrawal, Daniel Wolff, Simon Dixon
Comments: ICASSP 2021 camera-ready version. Copyrights belong to IEEE
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[89] arXiv:2102.00429 (cross-list from cs.SD) [pdf, other]
Title: High Fidelity Speech Regeneration with Application to Speech Enhancement
Adam Polyak, Lior Wolf, Yossi Adi, Ori Kabeli, Yaniv Taigman
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[90] arXiv:2102.00550 (cross-list from cs.SD) [pdf, other]
Title: Boosting the Predictive Accurary of Singer Identification Using Discrete Wavelet Transform For Feature Extraction
Victoire Djimna Noyum, Younous Perieukeu Mofenjou, Cyrille Feudjio, Alkan Göktug, Ernest Fokoué
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[91] arXiv:2102.00616 (cross-list from cs.SD) [pdf, other]
Title: Neural Network architectures to classify emotions in Indian Classical Music
Uddalok Sarkar, Sayan Nag, Medha Basu, Archi Banerjee, Shankha Sanyal, Ranjan Sengupta, Dipak Ghosh
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[92] arXiv:2102.01013 (cross-list from cs.CL) [pdf, other]
Title: End2End Acoustic to Semantic Transduction
Valentin Pelloin, Nathalie Camelin, Antoine Laurent, Renato De Mori, Antoine Caubrière, Yannick Estève, Sylvain Meignier
Comments: Accepted at IEEE ICASSP 2021
Journal-ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[93] arXiv:2102.01133 (cross-list from cs.SD) [pdf, other]
Title: Deep Music Information Dynamics
Shlomo Dubnov
Journal-ref: The 2020 Joint Conference on AI Music Creativity, October 19-23, 2020, Royal Institute of Technology (KTH), Stockholm, Sweden
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[94] arXiv:2102.01243 (cross-list from cs.SD) [pdf, other]
Title: PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation
Yuan Gong, Yu-An Chung, James Glass
Comments: Published in IEEE/ACM Transactions on Audio Speech and Language Processing. Code at this https URL
Journal-ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 3292-3306, 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[95] arXiv:2102.01547 (cross-list from cs.SD) [pdf, other]
Title: WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit
Zhuoyuan Yao, Di Wu, Xiong Wang, Binbin Zhang, Fan Yu, Chao Yang, Zhendong Peng, Xiaoyu Chen, Lei Xie, Xin Lei
Comments: 5 pages, 2 figures, 4 tables
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[96] arXiv:2102.01640 (cross-list from cs.SD) [pdf, other]
Title: SPEAK WITH YOUR HANDS Using Continuous Hand Gestures to control Articulatory Speech Synthesizer
Pramit Saha, Debasish Ray Mohapatra, Sidney Fels
Comments: 2 pages, 1 figure
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[97] arXiv:2102.01692 (cross-list from cs.SD) [pdf, other]
Title: Generacion de voces artificiales infantiles en castellano con acento costarricense
Ana Lilia Alvarez-Blanco, Eugenia Cordoba-Warner, Marvin Coto-Jimenez, Vivian Fallas-Lopez, Maribel Morales Rodriguez
Comments: 12 pages, in Spanish
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[98] arXiv:2102.01813 (cross-list from cs.SD) [pdf, other]
Title: Speech Emotion Recognition with Multiscale Area Attention and Data Augmentation
Mingke Xu, Fan Zhang, Xiaodong Cui, Wei Zhang
Comments: Accepted by ICASSP 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[99] arXiv:2102.01927 (cross-list from cs.SD) [pdf, other]
Title: Impact of Sound Duration and Inactive Frames on Sound Event Detection Performance
Keisuke Imoto, Sakiko Mishima, Yumi Arai, Reishi Kondo
Comments: Accepted to ICASSP 2021. arXiv admin note: text overlap with arXiv:2006.15253
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[100] arXiv:2102.01930 (cross-list from cs.SD) [pdf, other]
Title: General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework
Yucheng Zhao, Dacheng Yin, Chong Luo, Zhiyuan Zhao, Chuanxin Tang, Wenjun Zeng, Zheng-Jun Zha
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)