Audio and Speech Processing

Authors and titles for June 2020

Total of 181 entries (entries 1-50 listed below)

[1] arXiv:2006.00217 [pdf, other]: Title: Exploring Filterbank Learning for Keyword Spotting

Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[2] arXiv:2006.00408 [pdf, other]: Title: Introducing Latent Timbre Synthesis

K. Tatar, D. Bisig, P. Pasquier

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[3] arXiv:2006.00452 [pdf, other]: Title: Crossed-Time Delay Neural Network for Speaker Recognition

Liang Chen, Yanchun Liang, Xiaohu Shi, You Zhou, Chunguo Wu

Comments: MMM 2021 Paper, add GitHub address

Journal-ref: MMM 2021

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[4] arXiv:2006.00518 [pdf, other]: Title: Data-driven Detection and Analysis of the Patterns of Creaky Voice

Thomas Drugman, John Kane, Christer Gobl

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[5] arXiv:2006.00521 [pdf, other]: Title: Maximum Voiced Frequency Estimation: Exploiting Amplitude and Phase Spectra

Thomas Drugman, Yannis Stylianou

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[6] arXiv:2006.00525 [pdf, other]: Title: Residual Excitation Skewness for Automatic Speech Polarity Detection

Thomas Drugman

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[7] arXiv:2006.00687 [pdf, other]: Title: Phase-aware Single-stage Speech Denoising and Dereverberation with U-Net

Hyeong-Seok Choi, Hoon Heo, Jie Hwan Lee, Kyogu Lee

Comments: 5 pages, 3 figures, Submitted to Interspeech2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8] arXiv:2006.00703 [pdf, other]: Title: Streaming Language Identification using Combination of Acoustic Representations and ASR Hypotheses

Chander Chandak, Zeynab Raeesy, Ariya Rastrow, Yuzong Liu, Xiangyang Huang, Siyu Wang, Dong Kwon Joo, Roland Maas

Comments: 5 pages, 2 figures

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[9] arXiv:2006.00751 [pdf, other]: Title: Evaluation of CNN-based Automatic Music Tagging Models

Minz Won, Andres Ferraro, Dmitry Bogdanov, Xavier Serra

Comments: 7 pages, 2 figures, Sound and Music Computing 2020 (SMC 2020)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[10] arXiv:2006.00772 [pdf, other]: Title: Similarity-and-Independence-Aware Beamformer: Method for Target Source Extraction using Magnitude Spectrogram as Reference

Atsuo Hiroe

Comments: Accepted in INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[11] arXiv:2006.00782 [pdf, other]: Title: Learning to Recognize Code-switched Speech Without Forgetting Monolingual Speech Recognition

Sanket Shah, Basil Abraham, Gurunath Reddy M, Sunayana Sitaram, Vikas Joshi

Comments: 5 pages (4 pages + 1 page references), 5 tables, 1 figure, 1 algorithm, 16 references

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[12] arXiv:2006.00848 [pdf, other]: Title: A time-scale modification dataset with subjective quality labels

Timothy Roberts, Kuldip K. Paliwal

Comments: 12 Pages, 13 Figures, Published in The Journal of the Acoustical Society of America (Vol.148, Issue 1), For associated dataset, see this http URL

Journal-ref: J. Acoust. Soc. Am. 148(1). pp. 201-210 (2020)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[13] arXiv:2006.00877 [pdf, other]: Title: High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder

Kazi Nazmul Haque, Rajib Rana, Björn W Schuller

Comments: The paper is submitted to IEEE Access for review

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[14] arXiv:2006.01260 [pdf, other]: Title: Improving EEG based continuous speech recognition using GAN

Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik

Comments: Under Review

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[15] arXiv:2006.01261 [pdf, other]: Title: Understanding effect of speech perception in EEG based speech recognition systems

Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik

Comments: Under Review

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[16] arXiv:2006.01262 [pdf, other]: Title: Predicting Different Acoustic Features from EEG and towards direct synthesis of Audio Waveform from EEG

Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik

Comments: Under Review

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[17] arXiv:2006.01595 [pdf, other]: Title: Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data

Haytham M. Fayek, Anurag Kumar

Comments: 29th International Joint Conference on Artificial Intelligence (IJCAI 2020)

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[18] arXiv:2006.01708 [pdf, other]: Title: Dilated U-net based approach for multichannel speech enhancement from First-Order Ambisonics recordings

Amélie Bosca, Alexandre Guérin, Lauréline Perotin, Srđan Kitić

Comments: Accepted for EUSIPCO 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[19] arXiv:2006.01796 [pdf, other]: Title: Neural Speaker Diarization with Speaker-Wise Chain Rule

Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Yawen Xue, Jing Shi, Kenji Nagamatsu

Comments: Submitted to Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[20] arXiv:2006.01906 [pdf, other]: Title: Detecting Audio Attacks on ASR Systems with Dropout Uncertainty

Tejas Jayashankar, Jonathan Le Roux, Pierre Moulin

Comments: Accepted for publication at Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[21] arXiv:2006.01919 [pdf, other]: Title: A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection

Archontis Politis, Sharath Adavanne, Tuomas Virtanen

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[22] arXiv:2006.02099 [pdf, other]: Title: Time Domain Velocity Vector for Retracing the Multipath Propagation

Jérôme Daniel, Srđan Kitić

Comments: Presented at ICASSP 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[23] arXiv:2006.02547 [pdf, other]: Title: A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning

Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James Glass

Comments: Proceedings of Interspeech, 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[24] arXiv:2006.02616 [pdf, other]: Title: Online End-to-End Neural Diarization with Speaker-Tracing Buffer

Yawen Xue, Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Kenji Nagamatsu

Comments: Accepted to SLT 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[25] arXiv:2006.02786 [pdf, other]: Title: Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR

Thilo von Neumann, Christoph Boeddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach

Comments: 5 pages, INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[26] arXiv:2006.02814 [pdf, other]: Title: CSTNet: Contrastive Speech Translation Network for Self-Supervised Speech Representation Learning

Sameer Khurana, Antoine Laurent, James Glass

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[27] arXiv:2006.02902 [pdf, other]: Title: Constrained Variational Autoencoder for improving EEG based Speech Recognition Systems

Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik

Comments: Under Review. arXiv admin note: substantial text overlap with arXiv:2006.01260

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[28] arXiv:2006.03107 [pdf, other]: Title: Attention and Encoder-Decoder based models for transforming articulatory movements at different speaking rates

Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh

Comments: 5 pages, 4 figures, InterSpeech 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[29] arXiv:2006.03214 [pdf, other]: Title: Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning

Haibin Wu, Andy T. Liu, Hung-yi Lee

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[30] arXiv:2006.03411 [pdf, other]: Title: Contextual RNN-T For Open Domain ASR

Mahaveer Jain, Gil Keren, Jay Mahadeokar, Geoffrey Zweig, Florian Metze, Yatharth Saraf

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[31] arXiv:2006.03429 [pdf, other]: Title: Acoustic Anomaly Detection for Machine Sounds based on Image Transfer Learning

Robert Müller, Fabian Ritz, Steffen Illium, Claudia Linnhoff-Popien

Comments: ICAART 2021, 8 pages, 2 figures, 1 table

Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[32] arXiv:2006.03473 [pdf, other]: Title: AP20-OLR Challenge: Three Tasks and Their Baselines

Zheng Li, Miao Zhao, Qingyang Hong, Lin Li, Zhiyuan Tang, Dong Wang, Liming Song, Cheng Yang

Comments: arXiv admin note: substantial text overlap with arXiv:1907.07626, arXiv:1806.00616, arXiv:1706.09742

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[33] arXiv:2006.04136 [pdf, other]: Title: Analysis and Synthesis of Hypo and Hyperarticulated Speech

Benjamin Picart, Thomas Drugman, Thierry Dutoit

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[34] arXiv:2006.04138 [pdf, other]: Title: Maximum Phase Modeling for Sparse Linear Prediction of Speech

Thomas Drugman

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[35] arXiv:2006.04142 [pdf, other]: Title: Parametric Representation for Singing Voice Synthesis: a Comparative Evaluation

Onur Babacan, Thomas Drugman, Tuomo Raitio, Daniel Erro, Thierry Dutoit

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[36] arXiv:2006.04154 [pdf, other]: Title: VQVC+: One-Shot Voice Conversion by Vector Quantization and U-Net architecture

Da-Yi Wu, Yen-Hao Chen, Hung-Yi Lee

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[37] arXiv:2006.04326 [pdf, other]: Title: Semi-Supervised Contrastive Learning with Generalized Contrastive Loss and Its Application to Speaker Recognition

Nakamasa Inoue, Keita Goto

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[38] arXiv:2006.04372 [pdf, other]: Title: Zero resource speech synthesis using transcripts derived from perceptual acoustic units

Karthik Pandia D S, Hema A Murthy

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[39] arXiv:2006.04469 [pdf, other]: Title: A non-causal FFTNet architecture for speech enhancement

Muhammed PV Shifas, Nagaraj Adiga, Vassilis Tsiaras, Yannis Stylianou

Comments: 5 pages

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[40] arXiv:2006.04558 [pdf, other]: Title: FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

Comments: Accepted by ICLR 2021

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[41] arXiv:2006.04664 [pdf, other]: Title: MultiSpeech: Multi-Speaker Text to Speech with Transformer

Mingjian Chen, Xu Tan, Yi Ren, Jin Xu, Hao Sun, Sheng Zhao, Tao Qin, Tie-Yan Liu

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[42] arXiv:2006.04928 [pdf, other]: Title: Learning to Count Words in Fluent Speech enables Online Speech Recognition

George Sterpu, Christian Saam, Naomi Harte

Comments: Accepted at the 8th IEEE Spoken Language Technology Workshop (SLT 2021)

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[43] arXiv:2006.05129 [pdf, other]: Title: On the Effectiveness of Neural Text Generation based Data Augmentation for Recognition of Morphologically Rich Speech

Balázs Tarján, György Szaszák, Tibor Fegyó, Péter Mihajlik

Comments: 8 pages, 2 figures, accepted for publication at TSD 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[44] arXiv:2006.05174 [pdf, other]: Title: Input-independent Attention Weights Are Expressive Enough: A Study of Attention in Self-supervised Audio Transformers

Tsung-Han Wu, Chun-Chen Hsieh, Yen-Hao Chen, Po-Han Chi, Hung-yi Lee

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[45] arXiv:2006.05233 [pdf, other]: Title: A fully recurrent feature extraction for single channel speech enhancement

Muhammed PV Shifas, Santelli Claudio, Vassilis Tsiaras, Yannis Stylianou

Comments: 5 pages

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[46] arXiv:2006.05257 [pdf, other]: Title: Learning not to Discriminate: Task Agnostic Learning for Improving Monolingual and Code-switched Speech Recognition

Gurunath Reddy Madhumani, Sanket Shah, Basil Abraham, Vikas Joshi, Sunayana Sitaram

Comments: 5 pages (4 pages + 1 reference), 3 tables, 2 figures

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[47] arXiv:2006.05365 [pdf, other]: Title: Vocal markers from sustained phonation in Huntington's Disease

Rachid Riad, Hadrien Titeux, Laurie Lemoine, Justine Montillot, Jennifer Hamet Bagnou, Xuan Nga Cao, Emmanuel Dupoux, Anne-Catherine Bachoud-Lévi

Comments: To appear at INTERSPEECH 2020. 1 page of supplementary material appears only in the arXiv version. Code to replicate this https URL

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[48] arXiv:2006.05474 [pdf, other]: Title: Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation

Changhan Wang, Juan Pino, Jiatao Gu

Comments: Accepted to INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[49] arXiv:2006.05584 [pdf, other]: Title: Exploring Quality and Generalizability in Parameterized Neural Audio Effects

William Mitchell, Scott H. Hawley

Comments: 7 pages, 5 figures

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[50] arXiv:2006.05596 [pdf, other]: Title: Speaker Diarization: Using Recurrent Neural Networks

Vishal Sharma, Zekun Zhang, Zachary Neubert, Curtis Dyreson

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
