Sound

Authors and titles for March 2021

Total of 160 entries
[1] arXiv:2103.00110 [pdf, other]
Title: MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network
Yichong Leng, Xu Tan, Sheng Zhao, Frank Soong, Xiang-Yang Li, Tao Qin
Comments: Accepted by ICASSP 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[2] arXiv:2103.00255 [pdf, other]
Title: Expert Decision Support System for aeroacoustic source type identification using clustering
Armin Goudarzi, Carsten Spehr, Steffen Herbold
Comments: Preprint for JASA Journal
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[3] arXiv:2103.00383 [pdf, other]
Title: Brain Signals to Rescue Aphasia, Apraxia and Dysarthria Speech Recognition
Gautam Krishna, Mason Carnahan, Shilpa Shamapant, Yashitha Surendranath, Saumya Jain, Arundhati Ghosh, Co Tran, Jose del R Millan, Ahmed H Tewfik
Comments: Accepted to IEEE EMBC 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
[4] arXiv:2103.00417 [pdf, other]
Title: Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization
Christopher Schymura, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa
Comments: Published in Proceedings of the 28th European Signal Processing Conference (EUSIPCO), 2020
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[5] arXiv:2103.01173 [pdf, other]
Title: Unsupervised Classification of Voiced Speech and Pitch Tracking Using Forward-Backward Kalman Filtering
Benedikt Boenninghoff, Robert M. Nickel, Steffen Zeiler, Dorothea Kolossa
Comments: Speech Communication; 12. ITG Symposium, 5-7 Oct. 2016
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[6] arXiv:2103.01463 [pdf, other]
Title: Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss
Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, Ryo Masumura
Comments: Accepted to ICASSP 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[7] arXiv:2103.01599 [pdf, other]
Title: Open Range Pitch Tracking for Carrier Frequency Difference Estimation from HF Transmitted Speech
Joerg Schmalenstroeer, Jens Heitkaemper, Joerg Ullmann, Reinhold Haeb-Umbach
Comments: Submitted to EUSIPCO 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[8] arXiv:2103.01806 [pdf, other]
Title: Virufy: A Multi-Branch Deep Learning Network for Automated Detection of COVID-19
Ahmed Fakhry, Xinyi Jiang, Jaclyn Xiao, Gunvant Chaudhari, Asriel Han, Amil Khanzada
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[9] arXiv:2103.01830 [pdf, other]
Title: Audio scene monitoring using redundant ad-hoc microphone array networks
Peter Gerstoft, Yihan Hu, Michael J. Bianco, Chaitanya Patil, Ardel Alegre, Yoav Freund, Francois Grondin
Comments: In press, IEEE Internet of Things Journal
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[10] arXiv:2103.01893 [pdf, other]
Title: Listen, Read, and Identify: Multimodal Singing Language Identification of Music
Keunwoo Choi, Yuxuan Wang
Comments: ISMIR 2021 camera-ready
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[11] arXiv:2103.01894 [pdf, other]
Title: Investigations on Audiovisual Emotion Recognition in Noisy Conditions
Michael Neumann, Ngoc Thang Vu
Comments: Published at the IEEE workshop on Spoken Language Technology (SLT) 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[12] arXiv:2103.01929 [pdf, other]
Title: SoundCLR: Contrastive Learning of Representations For Improved Environmental Sound Classification
Alireza Nasiri, Jianjun Hu
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[13] arXiv:2103.02378 [pdf, other]
Title: Continuous Speech Separation with Ad Hoc Microphone Arrays
Dongmei Wang, Takuya Yoshioka, Zhuo Chen, Xiaofei Wang, Tianyan Zhou, Zhong Meng
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[14] arXiv:2103.02420 [pdf, other]
Title: Multi-view Audio and Music Classification
Huy Phan, Huy Le Nguyen, Oliver Y. Chén, Lam Pham, Philipp Koch, Ian McLoughlin, Alfred Mertins
Comments: Accepted to ICASSP 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[15] arXiv:2103.02644 [pdf, other]
Title: Compute and memory efficient universal sound source separation
Efthymios Tzinis, Zhepei Wang, Xilin Jiang, Paris Smaragdis
Comments: Accepted to Journal of Signal Processing Systems this https URL. arXiv admin note: substantial text overlap with arXiv:2007.06833
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[16] arXiv:2103.02993 [pdf, other]
Title: Speech Emotion Recognition using Semantic Information
Panagiotis Tzirakis, Anh Nguyen, Stefanos Zafeiriou, Björn W. Schuller
Comments: ICASSP 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[17] arXiv:2103.03142 [pdf, other]
Title: Error-driven Fixed-Budget ASR Personalization for Accented Speakers
Abhijeet Awasthi, Aman Kansal, Sunita Sarawagi, Preethi Jyothi
Comments: In ICASSP 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[18] arXiv:2103.03483 [pdf, other]
Title: Environmental Sound Classification on the Edge: A Pipeline for Deep Acoustic Networks on Extremely Resource-Constrained Devices
Md Mohaimenuzzaman, Christoph Bergmeir, Ian Thomas West, Bernd Meyer
Journal-ref: Pattern Recognition, p.109025 (2022)
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[19] arXiv:2103.03516 [pdf, other]
Title: Slow-Fast Auditory Streams For Audio Recognition
Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen
Comments: Accepted for presentation at ICASSP 2021
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[20] arXiv:2103.03927 [pdf, other]
Title: AudioVisual Speech Synthesis: A brief literature review
Efthymios Georgiou, Athanasios Katsamanis
Comments: review is written in Greek
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[21] arXiv:2103.04097 [pdf, other]
Title: Analysis and Assessment of Controllability of an Expressive Deep Learning-based TTS system
Noé Tits, Kevin El Haddad, Thierry Dutoit
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[22] arXiv:2103.05236 [pdf, other]
Title: GAN Vocoder: Multi-Resolution Discriminator Is All You Need
Jaeseong You, Dalhyun Kim, Gyuhyeon Nam, Geumbyeol Hwang, Gyeongsu Chae
Comments: Accepted to Interspeech 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[23] arXiv:2103.05719 [pdf, other]
Title: Spheroidal Ambisonics: a Spatial Audio Framework Using Spheroidal Bases
Shoken Kaneko
Journal-ref: JASA Express Letters 1.8 (2021): 084803
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[24] arXiv:2103.06049 [pdf, other]
Title: Search Disaster Victims using Sound Source Localization
Abhish Khanal, Deepak Chand, Prakash Chaudhary, Subash Timilsina, Sanjeeb Prasad Panday, Aman Shakya, Rom Kant Pandey
Comments: 9 pages, 17 figures, 17th ISCRAM Conference Blacksburg, VA, USA
Journal-ref: Iscram 2020 1022-1030
Subjects: Sound (cs.SD); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[25] arXiv:2103.06157 [pdf, other]
Title: Automatic Speaker Independent Dysarthric Speech Intelligibility Assessment System
Ayush Tripathi, Swapnil Bhosale, Sunil Kumar Kopparapu
Comments: 29 pages, 2 figures, Computer Speech & Language 2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[26] arXiv:2103.06508 [pdf, other]
Title: Multi-Format Contrastive Learning of Audio Representations
Luyu Wang, Aaron van den Oord
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[27] arXiv:2103.06620 [pdf, other]
Title: Topological Data Analysis of Korean Music in Jeongganbo: A Cycle Structure
Mai Lan Tran, Changbom Park, Jae-Hun Jung
Subjects: Sound (cs.SD); Computational Geometry (cs.CG); Audio and Speech Processing (eess.AS)
[28] arXiv:2103.07125 [pdf, other]
Title: Learning spectro-temporal representations of complex sounds with parameterized neural networks
Rachid Riad, Julien Karadayi, Anne-Catherine Bachoud-Lévi, Emmanuel Dupoux
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[29] arXiv:2103.07197 [pdf, other]
Title: Latent Space Explorations of Singing Voice Synthesis using DDSP
Juan Alonso, Cumhur Erkut
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[30] arXiv:2103.07220 [pdf, other]
Title: Real-time Timbre Transfer and Sound Synthesis using DDSP
Francesco Ganis, Erik Frej Knudesn, Søren V. K. Lyster, Robin Otterbein, David Südholt, Cumhur Erkut
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[31] arXiv:2103.07276 [pdf, other]
Title: Modelling Animal Biodiversity Using Acoustic Monitoring and Deep Learning
C. Chalmers, P. Fergus, S. Wich, S. N. Longmore
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[32] arXiv:2103.07656 [pdf, other]
Title: Optimal Embedding Calibration for Symbolic Music Similarity
Xinran Zhang, Maosong Sun, Jiafeng Liu, Xiaobing Li
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[33] arXiv:2103.07904 [pdf, other]
Title: Blind Estimation of Room Acoustic Parameters and Speech Transmission Index using MTF-based CNNs
Suradej Duangpummet, Jessada Karnjana, Waree Kongprawechnon, Masashi Unoki
Comments: 5 pages, 10 figures, IEEEtran class
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[34] arXiv:2103.08086 [pdf, other]
Title: Multi-Discriminator Sobolev Defense-GAN Against Adversarial Attacks for End-to-End Speech Systems
Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
Comments: 10 pages
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[35] arXiv:2103.08095 [pdf, other]
Title: Towards Robust Speech-to-Text Adversarial Attack
Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
Comments: 5 pages
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[36] arXiv:2103.08203 [pdf, other]
Title: Computational timbre and tonal system similarity analysis of the music of Northern Myanmar-based Kachin compared to Xinjiang-based Uyghur ethnic groups
Rolf Bader, Michael Blaß, Jonas Franke
Comments: 12 pages, 9 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[37] arXiv:2103.08310 [pdf, other]
Title: EmoNet: A Transfer Learning Framework for Multi-Corpus Speech Emotion Recognition
Maurice Gerczuk, Shahin Amiriparian, Sandra Ottl, Björn Schuller
Comments: 18 pages, 7 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[38] arXiv:2103.08569 [pdf, other]
Title: DHASP: Differentiable Hearing Aid Speech Processing
Zehai Tu, Ning Ma, Jon Barker
Comments: To appear at ICASSP 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[39] arXiv:2103.08993 [pdf, other]
Title: Fast Development of ASR in African Languages using Self Supervised Speech Representation Learning
Jama Hussein Mohamud, Lloyd Acquaye Thompson, Aissatou Ndoye, Laurent Besacier
Comments: Accepted at AfricaNLP2021 workshop at EACL 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[40] arXiv:2103.09063 [pdf, other]
Title: An Asynchronous WFST-Based Decoder For Automatic Speech Recognition
Hang Lv, Zhehuai Chen, Hainan Xu, Daniel Povey, Lei Xie, Sanjeev Khudanpur
Comments: 5 pages, 5 figures, icassp
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[41] arXiv:2103.09410 [pdf, other]
Title: Contrastive Learning of Musical Representations
Janne Spijkervet, John Ashley Burgoyne
Comments: 15 pages, 8 figures. In Proceedings of the 22nd International Society for Music Information Retrieval Conference, ISMIR, 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[42] arXiv:2103.09879 [pdf, other]
Title: Self-Supervised Learning of Audio Representations from Permutations with Differentiable Ranking
Andrew N Carr, Quentin Berthet, Mathieu Blondel, Olivier Teboul, Neil Zeghidour
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[43] arXiv:2103.10018 [pdf, other]
Title: Audio Description from Image by Modal Translation Network
Hailong Ning, Xiangtao Zheng, Yuan Yuan, Xiaoqiang Lu
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[44] arXiv:2103.10661 [pdf, other]
Title: USTC-NELSLIP System Description for DIHARD-III Challenge
Yuxuan Wang, Maokui He, Shutong Niu, Lei Sun, Tian Gao, Xin Fang, Jia Pan, Jun Du, Chin-Hui Lee
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[45] arXiv:2103.11730 [pdf, other]
Title: Reduced basis methods for numerical room acoustic simulations with parametrized boundaries
Hermes Sampedro Llopis, Allan P. Engsig-Karup, Cheol-Ho Jeong, Finnur Pind, Jan S. Hesthaven
Comments: 10 figures and 2 tables
Subjects: Sound (cs.SD); Computational Engineering, Finance, and Science (cs.CE); Audio and Speech Processing (eess.AS)
[46] arXiv:2103.11988 [pdf, other]
Title: Self-paced ensemble learning for speech and audio classification
Nicolae-Catalin Ristea, Radu Tudor Ionescu
Comments: Accepted at INTERSPEECH 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[47] arXiv:2103.12152 [pdf, other]
Title: Musical Mix Clarity Predication using Decomposition and Perceptual Masking Thresholds
Andrew Parker, Steven Fenton
Comments: 18 pages, 7 figures, 2 tables
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[48] arXiv:2103.12157 [pdf, other]
Title: Tiny Transformers for Environmental Sound Classification at the Edge
David Elliott, Carlos E. Otero, Steven Wyatt, Evan Martino
Comments: 12 pages, submitted to IEEE Journal of Internet of Things
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[49] arXiv:2103.12306 [pdf, other]
Title: GISE-51: A scalable isolated sound events dataset
Sarthak Yadav, Mary Ellen Foster
Comments: Technical Report
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50] arXiv:2103.12864 [pdf, other]
Title: Learned complex masks for multi-instrument source separation
Andreas Jansson, Rachel M. Bittner, Nicola Montecchio, Tillman Weyde
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[51] arXiv:2103.13219 [pdf, other]
Title: Transfer Learning for Piano Sustain-Pedal Detection
Beici Liang, György Fazekas, Mark Sandler
Comments: Published in 2019 International Joint Conference on Neural Networks (IJCNN)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[52] arXiv:2103.13300 [pdf, other]
Title: Automatic Cough Classification for Tuberculosis Screening in a Real-World Environment
Madhurananda Pahar, Marisa Klopper, Byron Reeve, Grant Theron, Rob Warren, Thomas Niesler
Comments: This paper has been accepted in Physiological Measurement (2021)
Journal-ref: Physiological Measurement, 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[53] arXiv:2103.13443 [pdf, other]
Title: Blind Speech Separation and Dereverberation using Neural Beamforming
Lukas Pfeifenberger, Franz Pernkopf
Comments: 13 pages, 9 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[54] arXiv:2103.13620 [pdf, other]
Title: SubSpectral Normalization for Neural Audio Data Processing
Simyung Chang, Hyoungwoo Park, Janghoon Cho, Hyunsin Park, Sungrack Yun, Kyuwoong Hwang
Comments: 4 pages, ICASSP '21 accepted
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[55] arXiv:2103.14201 [pdf, other]
Title: Image2Reverb: Cross-Modal Reverb Impulse Response Synthesis
Nikhil Singh, Jeff Mentch, Jerry Ng, Matthew Beveridge, Iddo Drori
Comments: ICCV 2021. Project page: this https URL
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[56] arXiv:2103.14206 [pdf, other]
Title: Three dimensional higher-order raypath separation in a shallow-water waveguide
Jiang Longyu, Zhang Zhe, Roux Philippe
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[57] arXiv:2103.14208 [pdf, other]
Title: Modeling the Compatibility of Stem Tracks to Generate Music Mashups
Jiawen Huang, Ju-Chiang Wang, Jordan B. L. Smith, Xuchen Song, Yuxuan Wang
Comments: This is a preprint of the paper accepted by AAAI-21. Please cite the version included in the Proceedings of the 35th AAAI Conference on Artificial Intelligence
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[58] arXiv:2103.14236 [pdf, other]
Title: Subspace-based compressive sensing algorithm for raypath separation in a shallow-water waveguide
Longyu Jiang, Zhe Zhang, Rui Jin, Xiao Zhou, Philippe Roux
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[59] arXiv:2103.14245 [pdf, other]
Title: Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN
Congyi Wang, Yu Chen, Bin Wang, Yi Shi
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[60] arXiv:2103.14330 [pdf, other]
Title: Guided Training: A Simple Method for Single-channel Speaker Separation
Hao Li, Xueliang Zhang, Guanglai Gao
Comments: 5 pages
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[61] arXiv:2103.14574 [pdf, other]
Title: Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, RJ Skerry-Ryan, Yonghui Wu
Comments: Submitted to INTERSPEECH 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[62] arXiv:2103.14717 [pdf, other]
Title: Cyclic Defense GAN Against Speech Adversarial Attacks
Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
Comments: 5
Journal-ref: IEEE Signal Processing Letters (2021) 1-5
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[63] arXiv:2103.14736 [pdf, other]
Title: Construction of a Large-scale Japanese ASR Corpus on TV Recordings
Shintaro Ando, Hiromasa Fujihara
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[64] arXiv:2103.14882 [pdf, other]
Title: On TasNet for Low-Latency Single-Speaker Speech Enhancement
Morten Kolbæk, Zheng-Hua Tan, Søren Holdt Jensen, Jesper Jensen
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[65] arXiv:2103.14895 [pdf, other]
Title: Feature-based Representation for Violin Bridge Admittances
R. Malvermi, S. Gonzalez, M. Quintavalla, F. Antonacci, A. Sarti, J. A. Torres, R. Corradi
Comments: 8 pages, 6 figures, submitted to "The 27th International Congress on Sound and Vibration" (ICSV)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[66] arXiv:2103.15722 [pdf, other]
Title: Transformer-based end-to-end speech recognition with residual Gaussian-based self-attention
Chengdong Liang, Menglong Xu, Xiao-Lei Zhang
Comments: There is an error in the description of section 3.2.1
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[67] arXiv:2103.15999 [pdf, other]
Title: Audio classification of the content of food containers and drinking glasses
Santiago Donaher, Alessio Xompero, Andrea Cavallaro
Comments: Camera-ready version. Paper accepted to EUSIPCO21. 5 pages, 4 figures, 3 tables. Minor improvements to the paper presentation
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[68] arXiv:2103.16079 [pdf, other]
Title: Environmental sound analysis with mixup based multitask learning and cross-task fusion
Weiping Zheng, Dacan Jiang, Gansen Zhao
Comments: 5 pages, 1 figure
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[69] arXiv:2103.16091 [pdf, other]
Title: Symbolic Music Generation with Diffusion Models
Gautam Mittal, Jesse Engel, Curtis Hawthorne, Ian Simon
Comments: ISMIR 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[70] arXiv:2103.16149 [pdf, other]
Title: Time-domain Speech Enhancement with Generative Adversarial Learning
Feiyang Xiao, Jian Guan, Qiuqiang Kong, Wenwu Wang
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[71] arXiv:2103.16804 [pdf, other]
Title: TS-RIR: Translated synthetic room impulse responses for speech augmentation
Anton Ratnarajah, Zhenyu Tang, Dinesh Manocha
Comments: Accepted to IEEE ASRU 2021. Source code is available at this https URL
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[72] arXiv:2103.16935 [pdf, other]
Title: Near field Acoustic Holography on arbitrary shapes using Convolutional Neural Network
Marco Olivieri, Mirco Pezzoli, Fabio Antonacci, Augusto Sarti
Comments: accepted for publication in EUSIPCO21
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[73] arXiv:2103.16988 [pdf, other]
Title: Towards Citizen Science for Smart Cities: A Framework for a Collaborative Game of Bird Call Recognition Based on Internet of Sound Practices
Emmanuel Rovithis, Nikolaos Moustakas, Konstantinos Vogklis, Konstantinos Drossos, Andreas Floros
Subjects: Sound (cs.SD)
[74] arXiv:2103.17139 [pdf, other]
Title: Privacy Enhanced Speech Emotion Communication using Deep Learning Aided Edge Computing
Hafiz Shehbaz Ali, Fakhar ul Hassan, Siddique Latif, Habib Ullah Manzoor, Junaid Qadir
Comments: accepted in ICC 2021 AffectiveSense workshop
Subjects: Sound (cs.SD)
[75] arXiv:2103.00129 (cross-list from cs.HC) [pdf, other]
Title: Music Genre Bars
Swaroop Panda, S. T. Roy
Comments: A version of this paper was presented at IEEE VIS 2020
Journal-ref: IEEE VIS 2020 posters
Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[76] arXiv:2103.00324 (cross-list from eess.AS) [pdf, other]
Title: Exploiting ultrasound tongue imaging for the automatic detection of speech articulation errors
Manuel Sam Ribeiro, Joanne Cleland, Aciel Eshky, Korin Richmond, Steve Renals
Comments: 15 pages, 9 figures, 6 tables
Journal-ref: Speech Communication, Volume 128, April 2021, Pages 24-34
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD); Neurons and Cognition (q-bio.NC)
[77] arXiv:2103.00333 (cross-list from eess.AS) [pdf, other]
Title: Silent versus modal multi-speaker speech recognition from ultrasound and video
Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals
Comments: 5 pages, 5 figures, Submitted to Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[78] arXiv:2103.00422 (cross-list from eess.AS) [pdf, other]
Title: Alignment Knowledge Distillation for Online Streaming Attention-based Speech Recognition
Hirofumi Inaguma, Tatsuya Kawahara
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[79] arXiv:2103.00484 (cross-list from cs.CR) [pdf, other]
Title: Deepfakes Generation and Detection: State-of-the-art, open challenges, countermeasures, and way forward
Momina Masood, Marriam Nawaz, Khalid Mahmood Malik, Ali Javed, Aun Irtaza
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[80] arXiv:2103.00816 (cross-list from eess.AS) [pdf, other]
Title: Contrastive Separative Coding for Self-supervised Representation Learning
Jun Wang, Max W. Y. Lam, Dan Su, Dong Yu
Comments: Accepted in ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[81] arXiv:2103.00819 (cross-list from eess.AS) [pdf, other]
Title: Sandglasset: A Light Multi-Granularity Self-attentive Network For Time-Domain Speech Separation
Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu
Comments: Accepted in ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[82] arXiv:2103.00833 (cross-list from cs.AI) [pdf, other]
Title: Fast threshold optimization for multi-label audio tagging using Surrogate gradient learning
Thomas Pellegrini (IRIT-SAMoVA), Timothée Masquelier (CERCO)
Journal-ref: IEEE International Conference on Acoustics, Speech and Signal Processing, Jun 2021, Toronto, Canada
Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[83] arXiv:2103.00993 (cross-list from eess.AS) [pdf, other]
Title: AdaSpeech: Adaptive Text to Speech for Custom Voice
Mingjian Chen, Xu Tan, Bohan Li, Yanqing Liu, Tao Qin, Sheng Zhao, Tie-Yan Liu
Comments: Accepted by ICLR 2021
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[84] arXiv:2103.01032 (cross-list from cs.CL) [pdf, other]
Title: Inductive biases, pretraining and fine-tuning jointly account for brain responses to speech
Juliette Millet, Jean-Remi King
Comments: 10 pages, 3 figures
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[85] arXiv:2103.01059 (cross-list from eess.AS) [pdf, other]
Title: Comparing acoustic analyses of speech data collected remotely
Cong Zhang, Kathleen Jepson, Georg Lohfink, Amalia Arvaniti
Journal-ref: The Journal of the Acoustical Society of America 149, 3910-3916 (2021)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[86] arXiv:2103.01461 (cross-list from eess.AS) [pdf, other]
Title: Tune-In: Training Under Negative Environments with Interference for Attention Networks Simulating Cocktail Party Effect
Jun Wang, Max W. Y. Lam, Dan Su, Dong Yu
Comments: Accepted in AAAI 2021
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[87] arXiv:2103.01661 (cross-list from eess.AS) [pdf, other]
Title: Incorporating VAD into ASR System by Multi-task Learning
Meng Li, Xia Yan, Feng Lin
Comments: 5 pages, 2 figures
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[88] arXiv:2103.02147 (cross-list from eess.AS) [pdf, other]
Title: Reverb Conversion of Mixed Vocal Tracks Using an End-to-end Convolutional Deep Neural Network
Junghyun Koo, Seungryeol Paik, Kyogu Lee
Comments: To appear in ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[89] arXiv:2103.02183 (cross-list from eess.SP) [pdf, other]
Title: Auditory Attention Decoding from EEG using Convolutional Recurrent Neural Network
Zhen Fu, Bo Wang, Xihong Wu, Jing Chen
Comments: 5 pages, 4 figures, submitted to EUSIPCO 2021
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[90] arXiv:2103.02313 (cross-list from eess.AS) [pdf, other]
Title: Open community platform for hearing aid algorithm research: open Master Hearing Aid (openMHA)
Hendrik Kayser, Tobias Herzke, Paul Maanen, Max Zimmermann, Giso Grimm, Volker Hohmann
Comments: 10 pages, 5 figures
Journal-ref: SoftwareX, Volume 17, 2022, 100953, ISSN 2352-7110
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[91] arXiv:2103.02421 (cross-list from eess.AS) [pdf, other]
Title: The effect of speech and noise levels on the quality perceived by cochlear implant and normal hearing listeners
Sara Akbarzadeh, Sungmin Lee, Fei Chen, Chin-Tuan Tan
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[92] arXiv:2103.02552 (cross-list from eess.AS) [pdf, other]
Title: Multi-Channel and Multi-Microphone Acoustic Echo Cancellation Using A Deep Learning Based Approach
Hao Zhang, DeLiang Wang
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[93] arXiv:2103.02703 (cross-list from eess.AS) [pdf, other]
Title: The Spatial Selective Auditory Attention of Cochlear Implant Users in Different Conversational Sound Levels
Sara Akbarzadeh, Sungmin Lee, Chin-Tuan Tan
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[94] arXiv:2103.02858 (cross-list from eess.AS) [pdf, other]
Title: crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder
Kazuhiro Kobayashi, Wen-Chin Huang, Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda
Comments: Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[95] arXiv:2103.02899 (cross-list from eess.AS) [pdf, other]
Title: End-to-end acoustic modelling for phone recognition of young readers
Lucile Gelin, Morgane Daniel, Julien Pinquier, Thomas Pellegrini
Comments: 16 pages, 8 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[96] arXiv:2103.03023 (cross-list from eess.AS) [pdf, other]
Title: End-to-End Mispronunciation Detection and Diagnosis From Raw Waveforms
Bi-Cheng Yan, Berlin Chen
Comments: Preprint. Under review 5 pages, 3 figures
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)
[97] arXiv:2103.03049 (cross-list from eess.AS) [pdf, other]
Title: A Neural Text-to-Speech Model Utilizing Broadcast Data Mixed with Background Music
Hanbin Bae, Jae-Sung Bae, Young-Sun Joo, Young-Ik Kim, Hoon-Young Cho
Comments: Accepted at ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[98] arXiv:2103.03206 (cross-list from cs.CV) [pdf, other]
Title: Perceiver: General Perception with Iterative Attention
Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, Joao Carreira
Comments: ICML 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[99] arXiv:2103.03215 (cross-list from eess.AS) [pdf, other]
Title: Front-end Diarization for Percussion Separation in Taniavartanam of Carnatic Music Concerts
Nauman Dawalatabad, Jilt Sebastian, Jom Kuriakose, C. Chandra Sekhar, Shrikanth Narayanan, Hema A. Murthy
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[100] arXiv:2103.03344 (cross-list from cs.CR) [pdf, other]
Title: WaveGuard: Understanding and Mitigating Audio Adversarial Examples
Shehzeen Hussain, Paarth Neekhara, Shlomo Dubnov, Julian McAuley, Farinaz Koushanfar
Comments: Published as a conference paper at Usenix Security 2021
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[101] arXiv:2103.03541 (cross-list from cs.CL) [pdf, other]
Title: Multilingual Byte2Speech Models for Scalable Low-resource Speech Synthesis
Mutian He, Jingzhou Yang, Lei He, Frank K. Soong
Comments: 17 pages
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[102] arXiv:2103.03621 (cross-list from cs.HC) [pdf, other]
Title: Low-latency auditory spatial attention detection based on spectro-spatial features from EEG
Siqi Cai, Pengcheng Sun, Tanja Schultz, Haizhou Li
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[103] arXiv:2103.03954 (cross-list from eess.AS) [pdf, other]
Title: ODAS: Open embeddeD Audition System
François Grondin, Dominic Létourneau, Cédric Godin, Jean-Samuel Lauzon, Jonathan Vincent, Simon Michaud, Samuel Faucher, François Michaud
Comments: This paper was published in Frontiers Robotics and AI
Subjects: Audio and Speech Processing (eess.AS); Robotics (cs.RO); Sound (cs.SD)
[104] arXiv:2103.03970 (cross-list from eess.AS) [pdf, other]
Title: Incorporating Wireless Communication Parameters into the E-Model Algorithm
Demóstenes Z. Rodríguez, Dick Carrillo Melgarejo, Miguel A. Ramírez, Pedro H. J. Nardelli, Sebastian Möller
Comments: 18 pages
Journal-ref: IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 29, 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[105] arXiv:2103.04088 (cross-list from eess.AS) [pdf, other]
Title: Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech
Chung-Ming Chien, Jheng-Hao Lin, Chien-yu Huang, Po-chun Hsu, Hung-yi Lee
Comments: Accepted by ICASSP 2021, in the special session of ICASSP 2021 M2VoC Challenge
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[106] arXiv:2103.04336 (cross-list from eess.AS) [pdf, other]
Title: HTMD-Net: A Hybrid Masking-Denoising Approach to Time-Domain Monaural Singing Voice Separation
Christos Garoufis, Athanasia Zlatintsi, Petros Maragos
Comments: submitted for publication in EUSIPCO 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[107] arXiv:2103.04346 (cross-list from eess.AS) [pdf, other]
Title: An Optimized Signal Processing Pipeline for Syllable Detection and Speech Rate Estimation
Kamini Sabu, Syomantak Chaudhuri, Preeti Rao, Mahesh Patil
Comments: 6 pages, 3 figures, accepted in National Conference on Communications (NCC) 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[108] arXiv:2103.04699 (cross-list from eess.AS) [pdf, other]
Title: CUHK-EE Voice Cloning System for ICASSP 2021 M2VoC Challenge
Daxin Tan, Hingpang Huang, Guangyan Zhang, Tan Lee
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[109] arXiv:2103.04792 (cross-list from eess.AS) [pdf, other]
Title: An Ultra-low Power RNN Classifier for Always-On Voice Wake-Up Detection Robust to Real-World Scenarios
Emmanuel Hardy, Franck Badets
Comments: This is an updated version of the paper submitted at tinyML'21 that was accepted as a poster (this https URL). The posters are video-only and not in the conference proceedings, hence this publication on arXiv
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[110] arXiv:2103.05081 (cross-list from eess.AS) [pdf, other]
Title: A Parallelizable Lattice Rescoring Strategy with Neural Language Models
Ke Li, Daniel Povey, Sanjeev Khudanpur
Comments: To appear at ICASSP 2021. 5 pages, 1 figure
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[111] arXiv:2103.05468 (cross-list from eess.AS) [pdf, other]
Title: CNN-based Spoken Term Detection and Localization without Dynamic Programming
Tzeviya Sylvia Fuchs, Yael Segal, Joseph Keshet
Journal-ref: ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[112] arXiv:2103.06089 (cross-list from cs.LG) [pdf, other]
Title: Variable-rate discrete representation learning
Sander Dieleman, Charlie Nash, Jesse Engel, Karen Simonyan
Comments: 26 pages, 15 figures, samples can be found at this https URL
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[113] arXiv:2103.06125 (cross-list from cs.LG) [pdf, other]
Title: Learning to Generate Music With Sentiment
Lucas N. Ferreira, Jim Whitehead
Comments: International Society for Music Information Retrieval (2019)
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[114] arXiv:2103.06478 (cross-list from eess.AS) [pdf, other]
Title: Deep Multiway Canonical Correlation Analysis for Multi-Subject EEG Normalization
Jaswanth Reddy Katthi, Sriram Ganapathy
Comments: 5 pages, 2 figures, 2 tables, to be published in ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[115] arXiv:2103.06581 (cross-list from eess.AS) [pdf, other]
Title: Forward-Backward Convolutional Recurrent Neural Networks and Tag-Conditioned Convolutional Neural Networks for Weakly Labeled Semi-supervised Sound Event Detection
Janek Ebbers, Reinhold Haeb-Umbach
Comments: Accepted by the DCASE 2020 Workshop; the presented system received the reproducible system award for the DCASE 2020 Challenge Task 4
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[116] arXiv:2103.06695 (cross-list from eess.AS) [pdf, other]
Title: BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation
Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
Comments: IJCNN 2021, 8 pages, 4 figures
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[117] arXiv:2103.07108 (cross-list from cs.HC) [pdf, other]
Title: Measuring Voice UX Quantitatively: A Rapid Review
Katie Seaborn, Jacqueline Urakami
Comments: Accepted at CHI EA '21
Subjects: Human-Computer Interaction (cs.HC); Robotics (cs.RO); Sound (cs.SD)
[118] arXiv:2103.07186 (cross-list from eess.AS) [pdf, other]
Title: Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition
Aleksandr Laptev, Andrei Andrusenko, Ivan Podluzhny, Anton Mitrofanov, Ivan Medennikov, Yuri Matveev
Comments: 16 pages, 7 figures
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[119] arXiv:2103.07390 (cross-list from eess.AS) [pdf, other]
Title: Signal Representations for Synthesizing Audio Textures with Generative Adversarial Networks
Chitralekha Gupta, Purnima Kamath, Lonce Wyse
Comments: Submitted to Sound and Music Computing Conference (SMC) 2021
Journal-ref: Sound and Music Computing 2021
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD); Signal Processing (eess.SP)
[120] arXiv:2103.07554 (cross-list from cs.LG) [pdf, other]
Title: A Distributed Optimisation Framework Combining Natural Gradient with Hessian-Free for Discriminative Sequence Training
Adnan Haider, Chao Zhang, Florian L. Kreyssig, Philip C. Woodland
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[121] arXiv:2103.08207 (cross-list from eess.AS) [pdf, other]
Title: XLST: Cross-lingual Self-training to Learn Multilingual Representation for Low Resource Speech Recognition
Zi-Qiang Zhang, Yan Song, Ming-Hui Wu, Xin Fang, Li-Rong Dai
Comments: 5 pages, 1 figure
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[122] arXiv:2103.08393 (cross-list from eess.AS) [pdf, other]
Title: Wav2vec-C: A Self-supervised Model for Speech Representation Learning
Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas
Comments: To appear in Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[123] arXiv:2103.08468 (cross-list from cs.CV) [pdf, other]
Title: Beyond Image to Depth: Improving Depth Prediction using Echoes
Kranti Kumar Parida, Siddharth Srivastava, Gaurav Sharma
Comments: To appear in CVPR 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[124] arXiv:2103.08709 (cross-list from eess.AS) [pdf, other]
Title: Lightweight and interpretable neural modeling of an audio distortion effect using hyperconditioned differentiable biquads
Shahan Nercessian, Andy Sarroff, Kurt James Werner
Comments: 5 pages, 4 figures. To be published in IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[125] arXiv:2103.08801 (cross-list from eess.AS) [pdf, other]
Title: Flow-based Self-supervised Density Estimation for Anomalous Sound Detection
Kota Dohi, Takashi Endo, Harsh Purohit, Ryo Tanabe, Yohei Kawaguchi
Comments: 5 pages, 1 figure, accepted in ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[126] arXiv:2103.09148 (cross-list from eess.AS) [pdf, other]
Title: DiCOVA Challenge: Dataset, task, and baseline system for COVID-19 diagnosis using acoustics
Ananya Muguli, Lancelot Pinto, Nirmala R., Neeraj Sharma, Prashant Krishnan, Prasanta Kumar Ghosh, Rohit Kumar, Shrirama Bhat, Srikanth Raj Chetupalli, Sriram Ganapathy, Shreyas Ramoji, Viral Nanda
Comments: To appear in Proceedings of Interspeech, 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[127] arXiv:2103.09154 (cross-list from cs.CV) [pdf, other]
Title: Leveraging Recent Advances in Deep Learning for Audio-Visual Emotion Recognition
Liam Schoneveld, Alice Othmani, Hazem Abdelkawy
Comments: 8 pages, 3 figures, Pattern Recognition Letters
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[128] arXiv:2103.09420 (cross-list from eess.AS) [pdf, other]
Title: Improving Zero-shot Voice Style Transfer via Disentangled Representation Learning
Siyang Yuan, Pengyu Cheng, Ruiyi Zhang, Weituo Hao, Zhe Gan, Lawrence Carin
Comments: To appear in ICLR 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[129] arXiv:2103.09474 (cross-list from eess.AS) [pdf, other]
Title: STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech
Keon Lee, Kyumin Park, Daeyoung Kim
Comments: 5 pages, 2 figures, Accepted to Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[130] arXiv:2103.09935 (cross-list from cs.CL) [pdf, other]
Title: Advancing RNN Transducer Technology for Speech Recognition
George Saon, Zoltan Tueske, Daniel Bolanos, Brian Kingsbury
Comments: Accepted at ICASSP 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[131] arXiv:2103.09963 (cross-list from eess.AS) [pdf, other]
Title: TSTNN: Two-stage Transformer based Neural Network for Speech Enhancement in the Time Domain
Kai Wang, Bengbeng He, Wei-Ping Zhu
Comments: 5 pages, 4 figures, accepted by IEEE ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[132] arXiv:2103.10166 (cross-list from q-bio.QM) [pdf, other]
Title: Discriminative Singular Spectrum Classifier with Applications on Bioacoustic Signal Recognition
Bernardo B. Gatto, Juan G. Colonna, Eulanda M. dos Santos, Alessandro L. Koerich, Kazuhiro Fukui
Comments: 15 pages
Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[133] arXiv:2103.10651 (cross-list from cs.CR) [pdf, other]
Title: SoK: A Modularized Approach to Study the Security of Automatic Speech Recognition Systems
Yuxuan Chen, Jiangshan Zhang, Xuejing Yuan, Shengzhi Zhang, Kai Chen, Xiaofeng Wang, Shanqing Guo
Comments: 17 pages
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[134] arXiv:2103.11822 (cross-list from physics.ed-ph) [pdf, other]
Title: Graphic relation between amplitude and sound intensity level
Jorge Pinochet
Comments: 6 pages, 5 figures
Journal-ref: The Physics Teacher, 2021
Subjects: Physics Education (physics.ed-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[135] arXiv:2103.12063 (cross-list from eess.AS) [pdf, other]
Title: QUCoughScope: An Artificially Intelligent Mobile Application to Detect Asymptomatic COVID-19 Patients using Cough and Breathing Sounds
Muhammad E. H. Chowdhury, Nabil Ibtehaz, Tawsifur Rahman, Yosra Magdi Salih Mekki, Yazan Qibalwey, Sakib Mahmud, Maymouna Ezeddin, Susu Zughaier, Sumaya Ali S A Al-Maadeed
Comments: 6 pages, Table 4, Figure 2
Journal-ref: Diagnostics 2022, 12(4), 920
Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Sound (cs.SD)
[136] arXiv:2103.12388 (cross-list from eess.AS) [pdf, other]
Title: Joint framework with deep feature distillation and adaptive focal loss for weakly supervised audio tagging and acoustic event detection
Yunhao Liang, Yanhua Long, Yijie Li, Jiaen Liang, Yuping Wang
Comments: Updated, please refer to this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[137] arXiv:2103.13329 (cross-list from eess.AS) [pdf, other]
Title: Fine-tuning of Pre-trained End-to-end Speech Recognition with Generative Adversarial Networks
Md Akmal Haidar, Mehdi Rezagholizadeh
Comments: Accepted in ICASSP 2021 conference
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[138] arXiv:2103.14129 (cross-list from eess.AS) [pdf, other]
Title: Radically Old Way of Computing Spectra: Applications in End-to-End ASR
Samik Sadhu, Hynek Hermansky
Comments: submitted to INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[139] arXiv:2103.14152 (cross-list from eess.AS) [pdf, other]
Title: Residual Energy-Based Models for End-to-End Speech Recognition
Qiujia Li, Yu Zhang, Bo Li, Liangliang Cao, Philip C. Woodland
Comments: To appear in Proc. Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[140] arXiv:2103.14253 (cross-list from eess.AS) [pdf, other]
Title: Supervised Chorus Detection for Popular Music Using Convolutional Neural Network and Multi-task Learning
Ju-Chiang Wang, Jordan B.L. Smith, Jitong Chen, Xuchen Song, Yuxuan Wang
Comments: This version is a preprint of an accepted paper by ICASSP2021. Please cite the publication in the Proceedings of IEEE International Conference on Acoustics, Speech, & Signal Processing
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[141] arXiv:2103.14297 (cross-list from eess.AS) [pdf, other]
Title: CNN-based Discriminative Training for Domain Compensation in Acoustic Event Detection with Frame-wise Classifier
Tiantian Tang, Xinyuan Zhou, Yanhua Long, Yijie Li, Jiaen Liang
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[142] arXiv:2103.14302 (cross-list from cs.CL) [pdf, other]
Title: Mutually-Constrained Monotonic Multihead Attention for Online ASR
Jaeyun Song, Hajin Shim, Eunho Yang
Comments: Accepted at IEEE ICASSP 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[143] arXiv:2103.14512 (cross-list from cs.CL) [pdf, other]
Title: Continual Speaker Adaptation for Text-to-Speech Synthesis
Hamed Hemati, Damian Borth
Comments: Preprint
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[144] arXiv:2103.14583 (cross-list from cs.CL) [pdf, other]
Title: Leveraging pre-trained representations to improve access to untranscribed speech from endangered languages
Nay San, Martijn Bartelds, Mitchell Browne, Lily Clifford, Fiona Gibson, John Mansfield, David Nash, Jane Simpson, Myfany Turpin, Maria Vollmer, Sasha Wilmoth, Dan Jurafsky
Comments: Accepted at ASRU 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[145] arXiv:2103.14602 (cross-list from eess.AS) [pdf, other]
Title: Data Quality as Predictor of Voice Anti-Spoofing Generalization
Bhusan Chettri, Rosa González Hautamäki, Md Sahidullah, Tomi Kinnunen
Comments: INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[146] arXiv:2103.14776 (cross-list from eess.AS) [pdf, other]
Title: Scalable and Efficient Neural Speech Coding: A Hybrid Design
Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beak, Minje Kim
Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP), 2021 (Accepted for publication)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[147] arXiv:2103.15060 (cross-list from cs.CL) [pdf, other]
Title: PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Ye Jia, Heiga Zen, Jonathan Shen, Yu Zhang, Yonghui Wu
Comments: Accepted to Interspeech 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[148] arXiv:2103.15122 (cross-list from eess.AS) [pdf, other]
Title: Quantifying Bias in Automatic Speech Recognition
Siyuan Feng, Olya Kudina, Bence Mark Halpern, Odette Scharenborg
Comments: Submitted to INTERSPEECH (IS) 2021. This preprint version differs slightly from the version submitted to IS 2021: Figure 1 is not included in IS 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[149] arXiv:2103.15305 (cross-list from eess.AS) [pdf, other]
Title: Scaling sparsemax based channel selection for speech recognition with ad-hoc microphone arrays
Junqi Chen, Xiao-Lei Zhang
Journal-ref: Proc. Interspeech 2021, 291-295
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[150] arXiv:2103.15421 (cross-list from eess.AS) [pdf, other]
Title: Improved Meta-Learning Training for Speaker Verification
Yafeng Chen, Wu Guo, Bin Gu
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[151] arXiv:2103.16193 (cross-list from eess.AS) [pdf, other]
Title: MediaSpeech: Multilanguage ASR Benchmark and Dataset
Rostislav Kolobov, Olga Okhapkina, Olga Omelchishina, Andrey Platunov, Roman Bedyakin, Vyacheslav Moshkin, Dmitry Menshikov, Nikolay Mikhaylovskiy
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[152] arXiv:2103.16269 (cross-list from eess.AS) [pdf, other]
Title: Target Speaker Verification with Selective Auditory Attention for Single and Multi-talker Speech
Chenglin Xu, Wei Rao, Jibin Wu, Haizhou Li
Comments: 13 pages, submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing on 10 Jan. 2021
Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Sound (cs.SD)
[153] arXiv:2103.16456 (cross-list from eess.AS) [pdf, other]
Title: Enhancing Segment-Based Speech Emotion Recognition by Deep Self-Learning
Shuiyang Mao, P. C. Ching, Tan Lee
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[154] arXiv:2103.16674 (cross-list from eess.AS) [pdf, other]
Title: Pre-training for low resource speech-to-intent applications
Pu Wang, Hugo Van hamme
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Sound (cs.SD)
[155] arXiv:2103.16776 (cross-list from eess.AS) [pdf, other]
Title: Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone
Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka
Comments: Submitted to INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[156] arXiv:2103.16827 (cross-list from eess.AS) [pdf, other]
Title: Integer-only Zero-shot Quantization for Efficient Speech Recognition
Sehoon Kim, Amir Gholami, Zhewei Yao, Nicholas Lee, Patrick Wang, Aniruddha Nrusimha, Bohan Zhai, Tianren Gao, Michael W. Mahoney, Kurt Keutzer
Journal-ref: ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[157] arXiv:2103.16849 (cross-list from eess.AS) [pdf, other]
Title: TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation
Helin Wang, Bo Wu, Lianwu Chen, Meng Yu, Jianwei Yu, Yong Xu, Shi-Xiong Zhang, Chao Weng, Dan Su, Dong Yu
Comments: Submitted to Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[158] arXiv:2103.16858 (cross-list from eess.AS) [pdf, other]
Title: SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification
Helin Wang, Yuexian Zou, Wenwu Wang
Comments: Submitted to Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[159] arXiv:2103.17122 (cross-list from eess.AS) [pdf, other]
Title: Adversarial Attacks and Defenses for Speech Recognition Systems
Piotr Żelasko, Sonal Joshi, Yiwen Shao, Jesus Villalba, Jan Trmal, Najim Dehak, Sanjeev Khudanpur
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[160] arXiv:2103.17189 (cross-list from eess.AS) [pdf, other]
Title: Y$^2$-Net FCRN for Acoustic Echo and Noise Suppression
Ernst Seidel, Jan Franzen, Maximilian Strake, Tim Fingscheidt
Comments: 5 pages, 2 figures, accepted for Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)