Audio and Speech Processing

Authors and titles for August 2020

Total of 254 entries : 1-50 51-100 101-150 151-200 201-250 251-254

[101] arXiv:2008.04549 [pdf, other]: Title: Unsupervised Learning For Sequence-to-sequence Text-to-speech For Low-resource Languages

Haitong Zhang, Yue Lin

Comments: Accepted to the conference of INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[102] arXiv:2008.04562 [pdf, other]: Title: Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN

Zongyang Du, Kun Zhou, Berrak Sisman, Haizhou Li

Comments: Accepted to APSIPA ASC 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[103] arXiv:2008.04574 [pdf, other]: Title: Bunched LPCNet : Vocoder for Low-cost Neural Text-To-Speech Systems

Ravichander Vipperla, Sangjun Park, Kihyun Choo, Samin Ishtiaq, Kyoungbo Min, Sourav Bhattacharya, Abhinav Mehrotra, Alberto Gil C. P. Ramos, Nicholas D. Lane

Comments: Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[104] arXiv:2008.04578 [pdf, other]: Title: Why Did the x-Vector System Miss a Target Speaker? Impact of Acoustic Mismatch Upon Target Score on VoxCeleb Data

Rosa González Hautamäki, Tomi Kinnunen

Comments: Accepted to INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Computers and Society (cs.CY); Sound (cs.SD)
[105] arXiv:2008.04590 [pdf, other]: Title: Surgical Mask Detection with Convolutional Neural Networks and Data Augmentations on Spectrograms

Steffen Illium, Robert Müller, Andreas Sedlmeier, Claudia Linnhoff-Popien

Comments: 5 pages, 2 figures, 2 tables

Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[106] arXiv:2008.04617 [pdf, other]: Title: Alzheimer's Dementia Detection from Audio and Text Modalities

Edward L. Campbell (1), Laura Docío-Fernández (1), Javier Jiménez Raboso (2), Carmen García-Mateo (1) ((1) GTM research group, AtlanTTic Research Center, University of Vigo, (2) acceXible)

Comments: 5 pages, 2 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[107] arXiv:2008.04658 [pdf, other]: Title: Transfer Learning for Improving Singing-voice Detection in Polyphonic Instrumental Music

Yuanbo Hou, Frank K. Soong, Jian Luan, Shengchen Li

Comments: Accepted by INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[108] arXiv:2008.04659 [pdf, other]: Title: S-vectors and TESA: Speaker Embeddings and a Speaker Authenticator Based on Transformer Encoder

N J Metilda Sagaya Mary, S Umesh, Sandesh V Katta

Comments: Version 2, Accepted for publication in IEEE TASLP

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[109] arXiv:2008.05011 [pdf, other]: Title: Compact Speaker Embedding: lrx-vector

Munir Georges, Jonathan Huang, Tobias Bocklet

Comments: Accepted to INTERSPEECH 2020

Journal-ref: Proc. Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[110] arXiv:2008.05086 [pdf, other]: Title: Transfer Learning Approaches for Streaming End-to-End Speech Recognition System

Vikas Joshi, Rui Zhao, Rupesh R. Mehta, Kshitiz Kumar, Jinyu Li

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[111] arXiv:2008.05175 [pdf, other]: Title: Mask Detection and Breath Monitoring from Speech: on Data Augmentation, Feature Representation and Modeling

Haiwei Wu, Lin Zhang, Lin Yang, Xuyang Wang, Junjie Wang, Dong Zhang, Ming Li

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[112] arXiv:2008.05216 [pdf, other]: Title: Channel-wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music

Haohe Liu, Lei Xie, Jian Wu, Geng Yang

Comments: Accepted in INTERSPEECH 2020

Journal-ref: Proc. Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[113] arXiv:2008.05259 [pdf, other]: Title: Emotion Profile Refinery for Speech Emotion Classification

Shuiyang Mao, P. C. Ching, Tan Lee

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[114] arXiv:2008.05284 [pdf, other]: Title: Modeling Prosodic Phrasing with Multi-Task Learning in Tacotron-based TTS

Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao, Haizhou Li

Comments: To appear in IEEE Signal Processing Letters (SPL)

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[115] arXiv:2008.05289 [pdf, other]: Title: Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions

Dipjyoti Paul, Yannis Pantazis, Yannis Stylianou

Comments: Accepted in INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[116] arXiv:2008.05514 [pdf, other]: Title: Online Automatic Speech Recognition with Listen, Attend and Spell Model

Roger Hsiao, Dogan Can, Tim Ng, Ruchir Travadi, Arnab Ghoshal

Comments: 5 pages, 4 figures, this version is submitted to IEEE Signal Processing Letters

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[117] arXiv:2008.05650 [pdf, other]: Title: MLNET: An Adaptive Multiple Receptive-field Attention Neural Network for Voice Activity Detection

Zhenpeng Zheng, Jianzong Wang, Ning Cheng, Jian Luo, Jing Xiao

Comments: will be presented in INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[118] arXiv:2008.05656 [pdf, other]: Title: Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit

Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao

Comments: will be presented in INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[119] arXiv:2008.05671 [pdf, other]: Title: Large-scale Transfer Learning for Low-resource Spoken Language Understanding

Xueli Jia, Jianzong Wang, Zhiyong Zhang, Ning Cheng, Jing Xiao

Comments: will be presented in INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[120] arXiv:2008.05695 [pdf, other]: Title: Evolutionary Algorithm Enhanced Neural Architecture Search for Text-Independent Speaker Verification

Xiaoyang Qu, Jianzong Wang, Jing Xiao

Comments: will be presented in INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)
[121] arXiv:2008.05750 [pdf, other]: Title: Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition

Wenyong Huang, Wenchao Hu, Yu Ting Yeung, Xiao Chen

Comments: Accepted by INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[122] arXiv:2008.05773 [pdf, other]: Title: Continuous Speech Separation with Conformer

Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Jinyu Li, Takuya Yoshioka, Chengyi Wang, Shujie Liu, Ming Zhou

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[123] arXiv:2008.05889 [pdf, other]: Title: Automatic Quality Assessment for Audio-Visual Verification Systems. The LOVe submission to NIST SRE Challenge 2019

Grigory Antipov, Nicolas Gengembre, Olivier Le Blouch, Gaël Le Lan

Comments: 5 pages, 1 figure, accepted at INTERSPEECH 2020. Corrected the reference [20]

Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)
[124] arXiv:2008.05983 [pdf, other]: Title: Cross attentive pooling for speaker verification

Seong Min Kye, Yoohwan Kwon, Joon Son Chung

Comments: SLT 2021. Code available at this https URL

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[125] arXiv:2008.06006 [pdf, other]: Title: Textual Echo Cancellation

Shaojin Ding, Ye Jia, Ke Hu, Quan Wang

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Machine Learning (stat.ML)
[126] arXiv:2008.06121 [pdf, other]: Title: LSTM Acoustic Models Learn to Align and Pronounce with Graphemes

Arindrima Datta, Guanlong Zhao, Bhuvana Ramabhadran, Eugene Weinstein

Comments: 5 pages, 4 figures. This work was done between summer 2018 and spring 2019

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[127] arXiv:2008.06146 [pdf, other]: Title: End-to-End Trainable Self-Attentive Shallow Network for Text-Independent Speaker Verification

Hyeonmook Park, Jungbae Park, Sang Wan Lee

Comments: 5 pages, 3 figures, 3 tables

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[128] arXiv:2008.06182 [pdf, other]: Title: Online Speaker Adaptation for WaveNet-based Neural Vocoders

Qiuchen Huang, Yang Ai, Zhenhua Ling

Comments: 6 pages, 2 figures, 4 tables

Subjects: Audio and Speech Processing (eess.AS)
[129] arXiv:2008.06208 [pdf, other]: Title: Adaptable Multi-Domain Language Model for Transformer ASR

Taewoo Lee, Min-Joong Lee, Tae Gyoon Kang, Seokyeoung Jung, Minseok Kwon, Yeona Hong, Jungin Lee, Kyoung-Gu Woo, Ho-Gyeong Kim, Jiseung Jeong, Jihyun Lee, Hosik Lee, Young Sang Choi

Comments: This paper is accepted for presentation at IEEE International Conference on Acoustics, Speech and Signal Processing (IEEE ICASSP), 2021

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[130] arXiv:2008.06273 [pdf, other]: Title: The Impact of Label Noise on a Music Tagger

Katharina Prinz, Arthur Flexer, Gerhard Widmer

Comments: In Proceedings of the 13th International Workshop on Machine Learning and Music, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[131] arXiv:2008.06358 [pdf, other]: Title: Semi-supervised learning using teacher-student models for vocal melody extraction

Sangeun Kum, Jing-Hua Lin, Li Su, Juhan Nam

Comments: 8 pages, 5 figures, accepted for the 21st International Society for Music Information Retrieval Conference (ISMIR 2020)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[132] arXiv:2008.06412 [pdf, other]: Title: Data augmentation and loss normalization for deep noise suppression

Sebastian Braun, Ivan Tashev

Comments: to appear in Proc. 22nd International Conference on Speech and Computer (SPECOM), 2020

Subjects: Audio and Speech Processing (eess.AS)
[133] arXiv:2008.06580 [pdf, other]: Title: Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview

Peter Bell, Joachim Fainberg, Ondrej Klejch, Jinyu Li, Steve Renals, Pawel Swietojanski

Comments: Total of 31 pages, 27 figures. Associated repository: this https URL

Journal-ref: IEEE Open Journal of Signal Processing, vol. 2, pp. 33-66, 2021

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[134] arXiv:2008.06665 [pdf, other]: Title: EigenEmo: Spectral Utterance Representation Using Dynamic Mode Decomposition for Speech Emotion Classification

Shuiyang Mao, P. C. Ching, Tan Lee

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[135] arXiv:2008.06667 [pdf, other]: Title: Advancing Multiple Instance Learning with Attention Modeling for Categorical Speech Emotion Recognition

Shuiyang Mao, P. C. Ching, C.-C. Jay Kuo, Tan Lee

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[136] arXiv:2008.06682 [pdf, other]: Title: Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition

Shamane Siriwardhana, Andrew Reis, Rivindu Weerasekera, Suranga Nanayakkara

Comments: Accepted to INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[137] arXiv:2008.06702 [pdf, other]: Title: Experimental investigations of psychoacoustic characteristics of household vacuum cleaners

Sanjay Kumar, Wong Sze Wing, Teng Mingbang, Heow Pueh Lee

Comments: 16 pages, 7 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[138] arXiv:2008.06764 [pdf, other]: Title: FEARLESS STEPS Challenge (FS-2): Supervised Learning with Massive Naturalistic Apollo Data

Aditya Joglekar, John H.L. Hansen, Meena Chandra Shekar, Abhijeet Sangwan

Comments: Paper Accepted in the Interspeech 2020 Conference

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[139] arXiv:2008.06867 [pdf, other]: Title: Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder

Hyun-Wook Yoon, Sang-Hoon Lee, Hyeong-Rae Noh, Seong-Whan Lee

Comments: Accepted in INTERSPEECH2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[140] arXiv:2008.06892 [pdf, other]: Title: Unsupervised Acoustic Unit Representation Learning for Voice Conversion using WaveNet Auto-encoders

Mingjie Chen, Thomas Hain

Comments: To be presented in Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[141] arXiv:2008.06994 [pdf, other]: Title: ADL-MVDR: All deep learning MVDR beamformer for target speech separation

Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, Lianwu Chen, Dong Yu

Comments: Accepted to ICASSP 2021, 5 pages, 2 figures; Demos are available at this https URL

Subjects: Audio and Speech Processing (eess.AS)
[142] arXiv:2008.07052 [pdf, other]: Title: Exploiting Fully Convolutional Network and Visualization Techniques on Spontaneous Speech for Dementia Detection

Youxiang Zhu, Xiaohui Liang

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[143] arXiv:2008.07085 [pdf, other]: Title: Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection

Soham Deshmukh, Bhiksha Raj, Rita Singh

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[144] arXiv:2008.07118 [pdf, other]: Title: PIANOTREE VAE: Structured Representation Learning for Polyphonic Music

Ziyu Wang, Yiyi Zhang, Yixiao Zhang, Junyan Jiang, Ruihan Yang, Junbo Zhao (Jake), Gus Xia

Journal-ref: In Proceedings of 21st International Conference on Music Information Retrieval (ISMIR), Montreal, Canada (virtual conference), 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[145] arXiv:2008.07191 [pdf, other]: Title: Deep Variational Generative Models for Audio-visual Speech Separation

Viet-Nhat Nguyen, Mostafa Sadeghi, Elisa Ricci, Xavier Alameda-Pineda

Comments: Accepted to the 31st IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Oct. 25-28, 2021, Gold Coast, Queensland, Australia

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[146] arXiv:2008.07231 [pdf, other]: Title: StoRIR: Stochastic Room Impulse Response Generation for Audio Data Augmentation

Piotr Masztalski, Mateusz Matuszewski, Karol Piaskowski, Michał Romaniuk

Comments: Accepted for INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[147] arXiv:2008.07244 [pdf, other]: Title: Efficient Low-Latency Speech Enhancement with Mobile Audio Streaming Networks

Michał Romaniuk, Piotr Masztalski, Karol Piaskowski, Mateusz Matuszewski

Comments: Accepted for INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[148] arXiv:2008.07247 [pdf, other]: Title: Deep Learning Based Open Set Acoustic Scene Classification

Zuzanna Kwiatkowska, Beniamin Kalinowski, Michał Kośmider, Krzysztof Rykaczewski

Comments: This paper was submitted to conference INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[149] arXiv:2008.07281 [pdf, other]: Title: On Mean Absolute Error for Deep Neural Network Based Vector-to-Vector Regression

Jun Qi, Jun Du, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee

Journal-ref: IEEE Signal Processing Letters, 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP); Machine Learning (stat.ML)
[150] arXiv:2008.07520 [pdf, other]: Title: Do face masks introduce bias in speech technologies? The case of automated scoring of speaking proficiency

Anastassia Loukina, Keelan Evanini, Matthew Mulholland, Ian Blood, Klaus Zechner

Journal-ref: Proceedings of Interspeech 2020, 1942-1946

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Sound (cs.SD)
