Audio and Speech Processing

Authors and titles for February 2021

Total of 208 entries : 1-25 ... 101-125 126-150 151-175 176-200 201-208

[176] arXiv:2102.09680 (cross-list from cs.CL)

Title: Fixing Errors of the Google Voice Recognizer through Phonetic Distance Metrics

Diego Campos-Sobrino, Mario Campos-Soberanis, Iván Martínez-Chin, Víctor Uc-Cetina

Comments: 13 pages, 4 figures. This article is a translation of the paper "Corrección de errores del reconocedor de voz de Google usando métricas de distancia fonética" presented in COMIA 2018

Journal-ref: Research in Computing Science 148(1), 2019, pp. 57-70. ISSN 1870-4069

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

[177] arXiv:2102.09737 (cross-list from cs.CV)

Title: One Shot Audio to Animated Video Generation

Neeraj Kumar, Srishti Goel, Ankur Narang, Brejesh Lall, Mujtaba Hasan, Pranshu Agarwal, Dipankar Sarkar

Comments: arXiv admin note: substantial text overlap with arXiv:2012.07842, arXiv:2012.07304

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)

[178] arXiv:2102.09763 (cross-list from cs.SD)

Title: Frequency-Temporal Attention Network for Singing Melody Extraction

Shuai Yu, Xiaoheng Sun, Yi Yu, Wei Li

Comments: This paper has been accepted by ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[179] arXiv:2102.09794 (cross-list from cs.SD)

Title: Hierarchical Recurrent Neural Networks for Conditional Melody Generation with Long-term Structure

Zixun Guo, Makris Dimos, Herremans Dorien

Journal-ref: Proc. of the International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18-22 July 2021 (virtual)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[180] arXiv:2102.09817 (cross-list from cs.SD)

Title: Unit selection synthesis based data augmentation for fixed phrase speaker verification

Houjun Huang, Xu Xiang, Fei Zhao, Shuai Wang, Yanmin Qian

Comments: Accepted to ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[181] arXiv:2102.09828 (cross-list from cs.SD)

Title: AISPEECH-SJTU accent identification system for the Accented English Speech Recognition Challenge

Houjun Huang, Xu Xiang, Yexin Yang, Rao Ma, Yanmin Qian

Comments: Accepted to ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[182] arXiv:2102.09914 (cross-list from cs.CL)

Title: Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input

Brooke Stephenson, Thomas Hueber, Laurent Girin, Laurent Besacier

Comments: 4 pages

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

[183] arXiv:2102.09966 (cross-list from cs.SD)

Title: CatNet: music source separation system with mix-audio augmentation

Xuchen Song, Qiuqiang Kong, Xingjian Du, Yuxuan Wang

Comments: 5 pages

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[184] arXiv:2102.09971 (cross-list from cs.SD)

Title: Speech enhancement with weakly labelled data from AudioSet

Qiuqiang Kong, Haohe Liu, Xingjian Du, Li Chen, Rui Xia, Yuxuan Wang

Comments: 5 pages

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[185] arXiv:2102.09978 (cross-list from cs.SD)

Title: TransMask: A Compact and Fast Speech Separation Model Based on Transformer

Zining Zhang, Bingsheng He, Zhenjie Zhang

Comments: Accepted at ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[186] arXiv:2102.10233 (cross-list from cs.SD)

Title: The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods

Xian Shi, Fan Yu, Yizhou Lu, Yuhao Liang, Qiangze Feng, Daliang Wang, Yanmin Qian, Lei Xie

Comments: Accepted by ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[187] arXiv:2102.10236 (cross-list from cs.SD)

Title: Singer Identification Using Deep Timbre Feature Learning with KNN-Net

Xulong Zhang, Jiale Qian, Yi Yu, Yifu Sun, Wei Li

Comments: Published as a conference paper at ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[188] arXiv:2102.10322 (cross-list from cs.SD)

Title: Learnable MFCCs for Speaker Verification

Xuechen Liu, Md Sahidullah, Tomi Kinnunen

Comments: Accepted to ISCAS 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[189] arXiv:2102.10331 (cross-list from q-bio.NC)

Title: Separating Stimulus-Induced and Background Components of Dynamic Functional Connectivity in Naturalistic fMRI

Chee-Ming Ting, Jeremy I. Skipper, Steven L. Small, Hernando Ombao

Comments: Main paper: 10 pages, 8 figures. Supplemental file: 3 pages

Subjects: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV); Signal Processing (eess.SP); Applications (stat.AP)

[190] arXiv:2102.10515 (cross-list from cs.SD)

Title: Anomaly Detection in Audio with Concept Drift using Adaptive Huffman Coding

Pratibha Kumari, Mukesh Saini

Comments: 22 pages, 8 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[191] arXiv:2102.10905 (cross-list from cs.CL)

Title: Joint Intent Detection And Slot Filling Based on Continual Learning Model

Yanfei Hui, Jianzong Wang, Ning Cheng, Fengying Yu, Tianbo Wu, Jing Xiao

Comments: Accepted to ICASSP 2021

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

[192] arXiv:2102.11058 (cross-list from cs.SD)

Title: Anyone GAN Sing

Shreeviknesh Sankaran, Sukavanan Nanjundan, G. Paavai Anand

Comments: 5 pages, 8 figures

Journal-ref: International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN: 2349-5162, Vol.7, Issue 5, page no. 25-29, May-2020

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[193] arXiv:2102.11114 (cross-list from cs.CL)

Title: Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model

Junwei Liao, Yu Shi, Ming Gong, Linjun Shou, Sefik Eskimez, Liyang Lu, Hong Qu, Michael Zeng

Comments: Accepted in 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[194] arXiv:2102.11420 (cross-list from cs.SD)

Title: Investigating Deep Neural Structures and their Interpretability in the Domain of Voice Conversion

Samuel J. Broughton, Md Asif Jalal, Roger K. Moore

Comments: For demo, see this https URL

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[195] arXiv:2102.11457 (cross-list from cs.SD)

Title: Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning

Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Zeyu Xie, Kai Yu

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[196] arXiv:2102.11474 (cross-list from cs.SD)

Title: Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events

Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[197] arXiv:2102.11488 (cross-list from cs.SD)

Title: Senone-aware Adversarial Multi-task Training for Unsupervised Child to Adult Speech Adaptation

Richeng Duan, Nancy F. Chen

Comments: accepted for presentation at ICASSP-2021

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

[198] arXiv:2102.11531 (cross-list from cs.SD)

Title: Memory-efficient Speech Recognition on Smart Devices

Ganesh Venkatesh, Alagappan Valliappan, Jay Mahadeokar, Yuan Shangguan, Christian Fuegen, Michael L. Seltzer, Vikas Chandra

Journal-ref: ICASSP 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

[199] arXiv:2102.11588 (cross-list from cs.SD)

Title: Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain

Julio Wissing, Benedikt Boenninghoff, Dorothea Kolossa, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Christopher Schymura

Comments: 4 pages, 6 figures, ICASSP 2021

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)

[200] arXiv:2102.11771 (cross-list from cs.SD)

Title: Improving Deep Learning Sound Events Classifiers using Gram Matrix Feature-wise Correlations

Antonio Joia Neto, Andre G C Pacheco, Diogo C Luvizon

Comments: To appear at ICASSP 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
