Audio and Speech Processing

Authors and titles for April 2021

Total of 266 entries : 1-50 51-100 101-150 151-200 201-250 251-266
[151] arXiv:2104.02691 (cross-list from cs.CV) [pdf, other]
Title: Localizing Visual Sounds the Hard Way
Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman
Comments: CVPR2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[152] arXiv:2104.02775 (cross-list from cs.CV) [pdf, other]
Title: Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation
Jiyoung Lee, Soo-Whan Chung, Sunok Kim, Hong-Goo Kang, Kwanghoon Sohn
Comments: CVPR 2021. The first two authors contributed equally to this work. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[153] arXiv:2104.02868 (cross-list from cs.SD) [pdf, other]
Title: Darts-Conformer: Towards Efficient Gradient-Based Neural Architecture Search For End-to-End ASR
Xian Shi, Pan Zhou, Wei Chen, Lei Xie
Comments: Submitted to ASRU 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[154] arXiv:2104.03123 (cross-list from cs.LG) [pdf, other]
Title: Partially-Connected Differentiable Architecture Search for Deepfake and Spoofing Detection
Wanying Ge, Michele Panariello, Jose Patino, Massimiliano Todisco, Nicholas Evans
Comments: Accepted to INTERSPEECH 2021
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[155] arXiv:2104.03204 (cross-list from cs.SD) [pdf, other]
Title: Learning robust speech representation with an articulatory-regularized variational autoencoder
Marc-Antoine Georges, Laurent Girin, Jean-Luc Schwartz, Thomas Hueber
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[156] arXiv:2104.03502 (cross-list from cs.SD) [pdf, other]
Title: Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings
Leonardo Pepino, Pablo Riera, Luciana Ferrer
Comments: 5 pages, 2 figures. Submitted to Interspeech 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[157] arXiv:2104.03521 (cross-list from cs.SD) [pdf, other]
Title: Towards Multi-Scale Style Control for Expressive Speech Synthesis
Xiang Li, Changhe Song, Jingbei Li, Zhiyong Wu, Jia Jia, Helen Meng
Comments: 5 pages, 4 figures, submitted to INTERSPEECH 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[158] arXiv:2104.03538 (cross-list from cs.SD) [pdf, other]
Title: MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement
Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao
Comments: Accepted by Interspeech 2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[159] arXiv:2104.03587 (cross-list from cs.SD) [pdf, other]
Title: WNARS: WFST based Non-autoregressive Streaming End-to-End Speech Recognition
Zhichao Wang, Wenwen Yang, Pan Zhou, Wei Chen
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[160] arXiv:2104.03603 (cross-list from cs.SD) [pdf, other]
Title: AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario
Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen, Yanxin Hu, Lei Xie, Jian Wu, Hui Bu, Xin Xu, Jun Du, Jingdong Chen
Comments: Accepted by Interspeech 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[161] arXiv:2104.03617 (cross-list from cs.SD) [pdf, html, other]
Title: Half-Truth: A Partially Fake Audio Detection Dataset
Jiangyan Yi, Ye Bai, Jianhua Tao, Haoxin Ma, Zhengkun Tian, Chenglong Wang, Tao Wang, Ruibo Fu
Comments: accepted by Interspeech 2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[162] arXiv:2104.03643 (cross-list from cs.CL) [pdf, other]
Title: Contextual Semi-Supervised Learning: An Approach To Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems
Juan Zuluaga-Gomez, Iuliia Nigmatulina, Amrutha Prasad, Petr Motlicek, Karel Veselý, Martin Kocour, Igor Szöke
Comments: Presented at: Interspeech conference 2021 (Brno, Czechia, August 30 - September 3)
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[163] arXiv:2104.03725 (cross-list from cs.LG) [pdf, other]
Title: On tuning consistent annealed sampling for denoising score matching
Joan Serrà, Santiago Pascual, Jordi Pons
Comments: 3 pages and 1 figure
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[164] arXiv:2104.03815 (cross-list from cs.CL) [pdf, other]
Title: Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation
Fengpeng Yue, Yan Deng, Lei He, Tom Ko
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[165] arXiv:2104.03838 (cross-list from cs.SD) [pdf, other]
Title: Speech Denoising Without Clean Training Data: A Noise2Noise Approach
Madhav Mahesh Kashyap, Anuj Tambwekar, Krishnamoorthy Manohara, S Natarajan
Comments: Published in Interspeech 2021 (see this https URL). 5 pages, 2 figures, 1 table
Journal-ref: Proc. Interspeech 2021, 2716-2720
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[166] arXiv:2104.03842 (cross-list from cs.CL) [pdf, other]
Title: RNN Transducer Models For Spoken Language Understanding
Samuel Thomas, Hong-Kwang J. Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory
Comments: To appear in the proceedings of ICASSP 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[167] arXiv:2104.03876 (cross-list from cs.SD) [pdf, other]
Title: SerumRNN: Step by Step Audio VST Effect Programming
Christopher Mitcheltree, Hideki Koike
Comments: Audio samples of the system can be listened to at this http URL
Journal-ref: 10th International Conference on Artificial Intelligence in Music, Sound, Art, and Design (EvoMUSART 2021), Seville, Spain
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[168] arXiv:2104.04111 (cross-list from cs.SD) [pdf, other]
Title: Generalized Spoofing Detection Inspired from Audio Generation Artifacts
Yang Gao, Tyler Vuong, Mahsa Elyasi, Gaurav Bharaj, Rita Singh
Comments: Camera ready version. Accepted by INTERSPEECH 2021
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[169] arXiv:2104.04143 (cross-list from cs.SD) [pdf, other]
Title: Heaps' Law and Vocabulary Richness in the History of Classical Music Harmony
Marc Serra-Peralta, Joan Serrà, Álvaro Corral
Comments: 12 pages
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Physics and Society (physics.soc-ph)
[170] arXiv:2104.04325 (cross-list from cs.SD) [pdf, other]
Title: Joint Online Multichannel Acoustic Echo Cancellation, Speech Dereverberation and Source Separation
Yueyue Na, Ziteng Wang, Zhang Liu, Biao Tian, Qiang Fu
Comments: submitted to INTERSPEECH 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[171] arXiv:2104.04371 (cross-list from cs.MM) [pdf, other]
Title: Speech Quality Assessment in Crowdsourcing: Comparison Category Rating Method
Babak Naderi, Sebastian Möller, Ross Cutler
Comments: Accepted for QoMEX2021
Subjects: Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[172] arXiv:2104.04487 (cross-list from cs.CL) [pdf, other]
Title: Language model fusion for streaming end to end speech recognition
Rodrigo Cabrera, Xiaofeng Liu, Mohammadreza Ghodsi, Zebulun Matteson, Eugene Weinstein, Anjuli Kannan
Comments: 5 pages
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[173] arXiv:2104.04552 (cross-list from cs.CL) [pdf, other]
Title: Lookup-Table Recurrent Language Models for Long Tail Speech Recognition
W. Ronny Huang, Tara N. Sainath, Cal Peyser, Shankar Kumar, David Rybach, Trevor Strohman
Comments: Presented as conference paper at Interspeech 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[174] arXiv:2104.04598 (cross-list from cs.SD) [pdf, other]
Title: Cross-Modal learning for Audio-Visual Video Parsing
Jatin Lamba, Abhishek, Jayaprakash Akula, Rishabh Dabral, Preethi Jyothi, Ganesh Ramakrishnan
Comments: Work accepted at Interspeech 2021
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[175] arXiv:2104.04668 (cross-list from cs.SD) [pdf, other]
Title: Unified Source-Filter GAN: Unified Source-filter Network Based On Factorization of Quasi-Periodic Parallel WaveGAN
Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda
Comments: Submitted to INTERSPEECH 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[176] arXiv:2104.04702 (cross-list from cs.SD) [pdf, other]
Title: Boundary and Context Aware Training for CIF-based Non-Autoregressive End-to-end ASR
Fan Yu, Haoneng Luo, Pengcheng Guo, Yuhao Liang, Zhuoyuan Yao, Lei Xie, Yingying Gao, Leijing Hou, Shilei Zhang
Comments: 5 pages, 4 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[177] arXiv:2104.04805 (cross-list from cs.CL) [pdf, other]
Title: Non-autoregressive Transformer-based End-to-end ASR using BERT
Fu-Hao Yu, Kuan-Yu Chen
Journal-ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1474-1482, 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[178] arXiv:2104.04950 (cross-list from cs.CL) [pdf, other]
Title: Innovative Bert-based Reranking Language Models for Speech Recognition
Shih-Hsuan Chiu, Berlin Chen
Comments: 6 pages, 3 figures, Published in IEEE SLT 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[179] arXiv:2104.05055 (cross-list from cs.CL) [pdf, other]
Title: NeMo Inverse Text Normalization: From Development To Production
Yang Zhang, Evelina Bakhturina, Kyle Gorman, Boris Ginsburg
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[180] arXiv:2104.05418 (cross-list from cs.LG) [pdf, other]
Title: Contrastive Learning of Global-Local Video Representations
Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[181] arXiv:2104.05488 (cross-list from cs.CL) [pdf, other]
Title: CNN Encoding of Acoustic Parameters for Prominence Detection
Kamini Sabu, Mithilesh Vaidya, Preeti Rao
Comments: 5 pages, 2 figures, 6 tables, Submitted to INTERSPEECH 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[182] arXiv:2104.05507 (cross-list from cs.CL) [pdf, other]
Title: BART based semantic correction for Mandarin automatic speech recognition system
Yun Zhao, Xuerui Yang, Jinchao Wang, Yongyu Gao, Chao Yan, Yuanfu Zhou
Comments: submitted to INTERSPEECH2021
Journal-ref: Interspeech 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[183] arXiv:2104.05544 (cross-list from cs.CL) [pdf, other]
Title: Investigating Methods to Improve Language Model Integration for Attention-based Encoder-Decoder ASR Models
Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney
Comments: accepted to Interspeech 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[184] arXiv:2104.05657 (cross-list from cs.SD) [pdf, other]
Title: End-to-End Mandarin Tone Classification with Short Term Context Information
Jiyang Tang, Ming Li
Comments: Accepted by APSIPA ASC 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[185] arXiv:2104.05752 (cross-list from cs.CL) [pdf, other]
Title: Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs
Sujeong Cha, Wangrui Hou, Hyun Jung, My Phung, Michael Picheny, Hong-Kwang Kuo, Samuel Thomas, Edmilson Morais
Comments: Accepted to Interspeech 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[186] arXiv:2104.05784 (cross-list from cs.SD) [pdf, other]
Title: Extremely Low Footprint End-to-End ASR System for Smart Device
Zhifu Gao, Yiwu Yao, Shiliang Zhang, Jun Yang, Ming Lei, Ian McLoughlin
Comments: 5 pages, 2 figures, accepted by INTERSPEECH 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[187] arXiv:2104.05980 (cross-list from cs.CL) [pdf, other]
Title: Experiments of ASR-based mispronunciation detection for children and adult English learners
Nina Hosseini-Kivanani, Roberto Gretter, Marco Matassoni, Giuseppe Daniele Falavigna
Comments: Submitted to INTERSPEECH2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[188] arXiv:2104.06004 (cross-list from cs.SD) [pdf, other]
Title: Detecting Escalation Level from Speech with Transfer Learning and Acoustic-Lexical Information Fusion
Ziang Zhou, Yanze Xu, Ming Li
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[189] arXiv:2104.06074 (cross-list from cs.SD) [pdf, other]
Title: NoiseVC: Towards High Quality Zero-Shot Voice Conversion
Shijun Wang, Damian Borth
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[190] arXiv:2104.06104 (cross-list from cs.CL) [pdf, other]
Title: Equivalence of Segmental and Neural Transducer Modeling: A Proof of Concept
Wei Zhou, Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney
Comments: accepted at Interspeech2021
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[191] arXiv:2104.06162 (cross-list from cs.SD) [pdf, other]
Title: Visually Informed Binaural Audio Generation without Binaural Audios
Xudong Xu, Hang Zhou, Ziwei Liu, Bo Dai, Xiaogang Wang, Dahua Lin
Comments: Accepted by CVPR 2021. Code, models, and demo video are available on our webpage: this https URL
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[192] arXiv:2104.06268 (cross-list from cs.CL) [pdf, other]
Title: Multilingual Transfer Learning for Code-Switched Language and Speech Neural Modeling
Genta Indra Winata
Comments: HKUST PhD Thesis. 120 pages
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[193] arXiv:2104.06457 (cross-list from cs.CL) [pdf, other]
Title: Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation
Hirofumi Inaguma, Tatsuya Kawahara, Shinji Watanabe
Comments: Accepted at NAACL-HLT 2021 (short paper)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[194] arXiv:2104.06517 (cross-list from cs.SD) [pdf, other]
Title: Comparison and Analysis of Deep Audio Embeddings for Music Emotion Recognition
Eunjeong Koh, Shlomo Dubnov
Comments: AAAI Workshop on Affective Content Analysis 2021 Camera Ready Version
Journal-ref: AAAI 2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[195] arXiv:2104.06607 (cross-list from cs.SD) [pdf, other]
Title: Revisiting the Onsets and Frames Model with Additive Attention
Kin Wai Cheuk, Yin-Jyun Luo, Emmanouil Benetos, Dorien Herremans
Comments: Accepted in IJCNN 2021 Special Session S04. this https URL
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[196] arXiv:2104.06666 (cross-list from cs.SD) [pdf, other]
Title: End-to-end Keyword Spotting using Neural Architecture Search and Quantization
David Peter, Wolfgang Roth, Franz Pernkopf
Comments: arXiv admin note: text overlap with arXiv:2012.10138
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[197] arXiv:2104.06793 (cross-list from cs.SD) [pdf, other]
Title: Non-autoregressive sequence-to-sequence voice conversion
Tomoki Hayashi, Wen-Chin Huang, Kazuhiro Kobayashi, Tomoki Toda
Comments: Accepted to ICASSP2021. Demo HP: this https URL
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[198] arXiv:2104.06835 (cross-list from cs.CL) [pdf, other]
Title: Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis
Yixuan Zhou, Changhe Song, Jingbei Li, Zhiyong Wu, Yanyao Bian, Dan Su, Helen Meng
Comments: Accepted by Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[199] arXiv:2104.06865 (cross-list from cs.SD) [pdf, other]
Title: Efficient conformer-based speech recognition with linear attention
Shengqiang Li, Menglong Xu, Xiao-Lei Zhang
Comments: submitted to APSIPA ASC 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[200] arXiv:2104.06900 (cross-list from cs.SD) [pdf, other]
Title: FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion
Hirokazu Kameoka, Kou Tanaka, Takuhiro Kaneko
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)