Audio and Speech Processing

Authors and titles for April 2021

Total of 266 entries (showing 101-150)
[101] arXiv:2104.13069 [pdf, other]
Title: Visualization of Linear Operations in the Spherical Harmonics Domain
Maximilian Kentgens, Peter Jax
Comments: Pre-print/author version of paper presented at International Conference on Immersive and 3D Audio (I3DA), Sept. 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[102] arXiv:2104.13168 [pdf, other]
Title: dEchorate: a Calibrated Room Impulse Response Database for Echo-aware Signal Processing
Diego Di Carlo, Pinchas Tandeitnik, Cédric Foy (UMRAE), Antoine Deleforge, Nancy Bertin, Sharon Gannot
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[103] arXiv:2104.13247 [pdf, other]
Title: IATos: AI-powered pre-screening tool for COVID-19 from cough audio samples
D. Trejo Pizzo, S. Esteban
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[104] arXiv:2104.13347 [pdf, other]
Title: BeamLearning: an end-to-end Deep Learning approach for the angular localization of sound sources using raw multichannel acoustic pressure data
Hadrien Pujol, Éric Bavu, Alexandre Garcia
Comments: The following article has been submitted to the special issue on Machine Learning in Acoustics in JASA. After it is published, it will be found at this http URL
Journal-ref: J. Acoust. Soc. Am. 149 (6), June 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[105] arXiv:2104.13423 [pdf, other]
Title: DASEE A Synthetic Database of Domestic Acoustic Scenes and Events in Dementia Patients Environment
Abigail Copiaco, Christian Ritz, Stefano Fasciani, Nidhal Abdulaziz
Comments: 5 pages, 4 figures, 6 tables
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD); Signal Processing (eess.SP)
[106] arXiv:2104.13553 [pdf, other]
Title: AMSS-Net: Audio Manipulation on User-Specified Sources with Textual Queries
Woosung Choi, Minseok Kim, Marco A. Martínez Ramírez, Jaehwa Chung, Soonyoung Jung
Comments: 10 pages, 8 figures, 3 tables, under reviewing of ACMMM 21
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[107] arXiv:2104.13620 [pdf, other]
Title: IDMT-Traffic: An Open Benchmark Dataset for Acoustic Traffic Monitoring Research
Jakob Abeßer, Saichand Gourishetti, András Kátai, Tobias Clauß, Prachi Sharma, Judith Liebetrau
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[108] arXiv:2104.13970 [pdf, other]
Title: Personalized Keyphrase Detection using Speaker and Environment Information
Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ding Zhao, Yiteng (Arden) Huang, Arun Narayanan, Ian McGraw
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[109] arXiv:2104.14264 [pdf, other]
Title: Hardware-Friendly Synaptic Orders and Timescales in Liquid State Machines for Speech Classification
Vivek Saraswat, Ajinkya Gorad, Anand Naik, Aakash Patil, Udayan Ganguly
Subjects: Audio and Speech Processing (eess.AS); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Neurons and Cognition (q-bio.NC)
[110] arXiv:2104.14791 [pdf, other]
Title: Deformable TDNN with adaptive receptive fields for speech recognition
Keyu An, Yi Zhang, Zhijian Ou
Comments: 5 pages. submitted to Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[111] arXiv:2104.14921 [pdf, other]
Title: Crackle Detection In Lung Sounds Using Transfer Learning And Multi-Input Convolutional Neural Networks
Truc Nguyen, Franz Pernkopf
Comments: Under Review in Proceeding of EMBC 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[112] arXiv:2104.00235 (cross-list from cs.CL) [pdf, other]
Title: Multilingual and code-switching ASR challenges for low resource Indian languages
Anuj Diwan, Rakesh Vaideeswaran, Sanket Shah, Ankita Singh, Srinivasa Raghavan, Shreya Khare, Vinit Unni, Saurabh Vyas, Akash Rajpuria, Chiranjeevi Yarra, Ashish Mittal, Prasanta Kumar Ghosh, Preethi Jyothi, Kalika Bali, Vivek Seshadri, Sunayana Sitaram, Samarth Bharadwaj, Jai Nanavati, Raoul Nanavati, Karthik Sankaranarayanan, Tejaswi Seeram, Basil Abraham
Comments: 6 pages
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[113] arXiv:2104.00239 (cross-list from cs.CV) [pdf, other]
Title: Positive Sample Propagation along the Audio-Visual Event Line
Jinxing Zhou, Liang Zheng, Yiran Zhong, Shijie Hao, Meng Wang
Comments: Accepted to CVPR 2021. Code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[114] arXiv:2104.00315 (cross-list from cs.CV) [pdf, other]
Title: Unsupervised Sound Localization via Iterative Contrastive Learning
Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[115] arXiv:2104.00355 (cross-list from cs.SD) [pdf, other]
Title: Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux
Comments: In Proceedings of Interspeech 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[116] arXiv:2104.00437 (cross-list from cs.SD) [pdf, other]
Title: Enriched Music Representations with Multiple Cross-modal Contrastive Learning
Andres Ferraro, Xavier Favory, Konstantinos Drossos, Yuntae Kim, Dmitry Bogdanov
Comments: Accepted for publication to IEEE Signal Processing Letters
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[117] arXiv:2104.00705 (cross-list from cs.SD) [pdf, other]
Title: Multi-rate attention architecture for fast streamable Text-to-speech spectrum modeling
Qing He, Zhiping Xiu, Thilo Koehler, Jilong Wu
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[118] arXiv:2104.00732 (cross-list from cs.SD) [pdf, other]
Title: Out of a hundred trials, how many errors does your speaker verifier make?
Niko Brümmer, Luciana Ferrer, Albert Swart
Comments: Submitted to Interspeech 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[119] arXiv:2104.00824 (cross-list from cs.CL) [pdf, other]
Title: Tusom2021: A Phonetically Transcribed Speech Dataset from an Endangered Language for Universal Phone Recognition Experiments
David R. Mortensen, Jordan Picone, Xinjian Li, Kathleen Siminyu
Comments: 4 pages, 3 figures
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[120] arXiv:2104.01027 (cross-list from cs.SD) [pdf, other]
Title: Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training
Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee, Ronan Collobert, Gabriel Synnaeve, Michael Auli
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[121] arXiv:2104.01160 (cross-list from cs.SD) [pdf, other]
Title: PhyAug: Physics-Directed Data Augmentation for Deep Sensing Model Transfer in Cyber-Physical Systems
Wenjie Luo, Zhenyu Yan, Qun Song, Rui Tan
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[122] arXiv:2104.01161 (cross-list from cs.SD) [pdf, other]
Title: An Audio-Based Deep Learning Framework For BBC Television Programme Classification
Lam Pham, Chris Baume, Qiuqiang Kong, Tassadaq Hussain, Wenwu Wang, Mark Plumbley
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[123] arXiv:2104.01271 (cross-list from cs.SD) [pdf, other]
Title: PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification
Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee
Comments: Accepted to Interspeech 2021
Journal-ref: Proc. Interspeech 2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[124] arXiv:2104.01304 (cross-list from cs.SD) [pdf, other]
Title: Diarization of Legal Proceedings. Identifying and Transcribing Judicial Speech from Recorded Court Audio
Jeffrey Tumminia, Amanda Kuznecov, Sophia Tsilerides, Ilana Weinstein, Brian McFee, Michael Picheny, Aaron R. Kaufman
Comments: Under review for InterSpeech 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125] arXiv:2104.01378 (cross-list from cs.CL) [pdf, other]
Title: speechocean762: An Open-Source Non-native English Speech Corpus For Pronunciation Assessment
Junbo Zhang, Zhiwen Zhang, Yongqing Wang, Zhiyong Yan, Qiong Song, Yukai Huang, Ke Li, Daniel Povey, Yujun Wang
Comments: Accepted in INTERSPEECH 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[126] arXiv:2104.01393 (cross-list from cs.CL) [pdf, other]
Title: On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR
Tsz Kin Lam, Mayumi Ohta, Shigehiko Schamoni, Stefan Riezler
Comments: Accepted at INTERSPEECH 2021
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[127] arXiv:2104.01444 (cross-list from cs.SD) [pdf, other]
Title: Mixture of orthogonal sequences made from extended time-stretched pulses enables measurement of involuntary voice fundamental frequency response to pitch perturbation
Hideki Kawahara, Toshie Matsui, Kohei Yatabe, Ken-Ichi Sakakibara, Minoru Tsuzaki, Masanori Morise, Toshio Irino
Comments: 5 pages, 9 figures, submitted to Interspeech2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[128] arXiv:2104.01454 (cross-list from cs.CL) [pdf, other]
Title: Few-Shot Keyword Spotting in Any Language
Mark Mazumder, Colby Banbury, Josh Meyer, Pete Warden, Vijay Janapa Reddi
Journal-ref: Proc. Interspeech 2021
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[129] arXiv:2104.01604 (cross-list from cs.CL) [pdf, other]
Title: Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers
Loren Lugosch, Piyush Papreja, Mirco Ravanelli, Abdelwahab Heba, Titouan Parcollet
Comments: Accepted to NeurIPS 2021 - Datasets and Benchmarks Track
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[130] arXiv:2104.01616 (cross-list from cs.CL) [pdf, other]
Title: Towards Lifelong Learning of End-to-end ASR
Heng-Jui Chang, Hung-yi Lee, Lin-shan Lee
Comments: Interspeech 2021. We acknowledge the support of Salesforce Research Deep Learning Grant
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[131] arXiv:2104.01807 (cross-list from cs.SD) [pdf, other]
Title: StarGAN-based Emotional Voice Conversion for Japanese Phrases
Asuka Moritani, Ryo Ozaki, Shoki Sakamoto, Hirokazu Kameoka, Tadahiro Taniguchi
Comments: Submitted to Interspeech 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[132] arXiv:2104.01923 (cross-list from eess.SP) [pdf, other]
Title: Real-time Streaming Wave-U-Net with Temporal Convolutions for Multichannel Speech Enhancement
Vasiliy Kuzmin, Fyodor Kravchenko, Artem Sokolov, Jie Geng
Comments: Draft paper for InterSpeech 2021 processing
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[133] arXiv:2104.01978 (cross-list from cs.SD) [pdf, other]
Title: Acted vs. Improvised: Domain Adaptation for Elicitation Approaches in Audio-Visual Emotion Recognition
Haoqi Li, Yelin Kim, Cheng-Hao Kuo, Shrikanth Narayanan
Comments: paper accepted by INTERSPEECH2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[134] arXiv:2104.02000 (cross-list from cs.CV) [pdf, other]
Title: Can audio-visual integration strengthen robustness under multimodal attacks?
Yapeng Tian, Chenliang Xu
Comments: CVPR 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[135] arXiv:2104.02005 (cross-list from cs.SD) [pdf, other]
Title: Uncertainty-Aware COVID-19 Detection from Imbalanced Sound Data
Tong Xia, Jing Han, Lorena Qendro, Ting Dang, Cecilia Mascolo
Comments: Accepted by INTERSPEECH 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[136] arXiv:2104.02014 (cross-list from cs.CL) [pdf, other]
Title: SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition
Patrick K. O'Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko
Comments: 5 pages, 1 figure. Submitted to INTERSPEECH 2021
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[137] arXiv:2104.02026 (cross-list from cs.CV) [pdf, other]
Title: Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation
Yapeng Tian, Di Hu, Chenliang Xu
Comments: CVPR 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[138] arXiv:2104.02194 (cross-list from cs.CL) [pdf, other]
Title: Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion
Duc Le, Mahaveer Jain, Gil Keren, Suyoun Kim, Yangyang Shi, Jay Mahadeokar, Julian Chan, Yuan Shangguan, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Michael L. Seltzer
Comments: Accepted for presentation at INTERSPEECH 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[139] arXiv:2104.02207 (cross-list from cs.SD) [pdf, other]
Title: Dissecting User-Perceived Latency of On-Device E2E Speech Recognition
Yuan Shangguan, Rohit Prabhavalkar, Hang Su, Jay Mahadeokar, Yangyang Shi, Jiatong Zhou, Chunyang Wu, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer
Comments: Proc. of Interspeech 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[140] arXiv:2104.02232 (cross-list from cs.SD) [pdf, other]
Title: Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios
Jay Mahadeokar, Yangyang Shi, Yuan Shangguan, Chunyang Wu, Alex Xiao, Hang Su, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer
Comments: Submitted to Interspeech 2021 (under review)
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[141] arXiv:2104.02306 (cross-list from cs.SD) [pdf, other]
Title: Binary Neural Network for Speaker Verification
Tinglong Zhu, Xiaoyi Qin, Ming Li
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[142] arXiv:2104.02309 (cross-list from cs.SD) [pdf, other]
Title: MuSLCAT: Multi-Scale Multi-Level Convolutional Attention Transformer for Discriminative Music Modeling on Raw Waveforms
Kai Middlebrook, Shyam Sudhakaran, David Guy Brizan
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[143] arXiv:2104.02387 (cross-list from cs.SD) [pdf, other]
Title: Towards Consistent Hybrid HMM Acoustic Modeling
Tina Raissi, Eugen Beck, Ralf Schlüter, Hermann Ney
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[144] arXiv:2104.02410 (cross-list from cs.SE) [pdf, html, other]
Title: Using Voice and Biofeedback to Predict User Engagement during Product Feedback Interviews
Alessio Ferrari, Thaide Huichapa, Paola Spoletini, Nicole Novielli, Davide Fucci, Daniela Girardi
Comments: This paper contains updated experimental results with respect to the initial version
Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[145] arXiv:2104.02477 (cross-list from cs.SD) [pdf, other]
Title: COVID-19 Detection in Cough, Breath and Speech using Deep Transfer Learning and Bottleneck Features
Madhurananda Pahar, Marisa Klopper, Robin Warren, Thomas Niesler
Journal-ref: Computers in Biology and Medicine, 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[146] arXiv:2104.02535 (cross-list from cs.SD) [pdf, other]
Title: Optimal Transport-based Adaptation in Dysarthric Speech Tasks
Rosanna Turrisi, Leonardo Badino
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[147] arXiv:2104.02558 (cross-list from cs.SD) [pdf, other]
Title: Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model
Apoorv Vyas, Srikanth Madikeri, Hervé Bourlard
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[148] arXiv:2104.02588 (cross-list from cs.CE) [pdf, other]
Title: Principal Component Analysis Applied to Gradient Fields in Band Gap Optimization Problems for Metamaterials
Giorgio Gnecco, Andrea Bacigalupo, Francesca Fantoni, Daniela Selvi
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[149] arXiv:2104.02606 (cross-list from cs.CV) [pdf, other]
Title: Weakly-supervised Audio-visual Sound Source Detection and Separation
Tanzila Rahman, Leonid Sigal
Comments: 4 figures, 6 pages
Journal-ref: IEEE International Conference on Multimedia and Expo (ICME) 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[150] arXiv:2104.02656 (cross-list from cs.CV) [pdf, other]
Title: Collaborative Learning to Generate Audio-Video Jointly
Vinod K Kurmi, Vipul Bajaj, Badri N Patro, K S Venkatesh, Vinay P Namboodiri, Preethi Jyothi
Comments: ICASSP 2021 (Accepted)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)