Audio and Speech Processing

Authors and titles for March 2020

Total of 117 entries
[26] arXiv:2003.03135 [pdf, other]
Title: Semi-supervised Development of ASR Systems for Multilingual Code-switched Speech in Under-resourced Languages
Astik Biswas, Emre Yılmaz, Febe de Wet, Ewald van der Westhuizen, Thomas Niesler
Comments: Conference
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[27] arXiv:2003.03375 [pdf, other]
Title: Multi-Time-Scale Convolution for Emotion Recognition from Speech Audio Signals
Eric Guizzo, Tillman Weyde, Jack Barnett Leveson
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[28] arXiv:2003.03432 [pdf, other]
Title: Lightweight Speaker Verification for Online Identification of New Speakers with Short Segments
Ivette Velez, Caleb Rascon, Gibran Fuentes-Pineda
Comments: This paper has been accepted for publication in Applied Soft Computing Journal
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[29] arXiv:2003.03927 [pdf, other]
Title: Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning
Rongzhi Gu, Shi-Xiong Zhang, Lianwu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu
Comments: accepted in ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[30] arXiv:2003.03987 [pdf, other]
Title: Tackling real noisy reverberant meetings with all-neural source separation, counting, and diarization system
Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani
Comments: 8 pages, to appear in ICASSP2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[31] arXiv:2003.03998 [pdf, other]
Title: Improving noise robust automatic speech recognition with single-channel time-domain enhancement network
Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani
Comments: 5 pages, to appear in ICASSP2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[32] arXiv:2003.04194 [pdf, other]
Title: Toward Cross-Domain Speech Recognition with End-to-End Models
Thai-Son Nguyen, Sebastian Stüker, Alex Waibel
Comments: Presented in Life-Long Learning for Spoken Language Systems Workshop - ASRU 2019
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[33] arXiv:2003.04241 [pdf, other]
Title: Deep Neural Networks for Automatic Speech Processing: A Survey from Large Corpora to Limited Data
Vincent Roger, Jérôme Farinas, Julien Pinquier
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[34] arXiv:2003.04640 [pdf, other]
Title: Vowels and Prosody Contribution in Neural Network Based Voice Conversion Algorithm with Noisy Training Data
Olaide Agbolade
Comments: 5 pages
Journal-ref: European Journal of Engineering Research and Science, 5(3), pp.229-233 (2020)
Subjects: Audio and Speech Processing (eess.AS)
[35] arXiv:2003.04710 [pdf, other]
Title: Development of Automatic Speech Recognition for Kazakh Language using Transfer Learning
Amirgaliyev E.N., Kuanyshbay D.N., Baimuratov O
Comments: 9 pages, 3 fig., 1 table
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[36] arXiv:2003.04733 [pdf, other]
Title: Speaker Identification using EEG
Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP); Machine Learning (stat.ML)
[37] arXiv:2003.05184 [pdf, other]
Title: Voice conversion using coefficient mapping and neural network
Olaide Ayodeji Agbolade, Samson A. Oyetunji
Comments: 5 pages
Journal-ref: In 2016 International Conference for Students on Applied Engineering (ICSAE) (pp. 479-483) IEEE
Subjects: Audio and Speech Processing (eess.AS)
[38] arXiv:2003.05223 [pdf, other]
Title: Robust Audio Watermarking Using Graph-based Transform and Singular Value Decomposition
Majid Farzaneh, Rahil Mahdian Toroghi
Comments: 5 pages, 5 images, 4 tables, submitted to an IRED conference
Subjects: Audio and Speech Processing (eess.AS)
[39] arXiv:2003.05897 [pdf, other]
Title: Bringing in the outliers: A sparse subspace clustering approach to learn a dictionary of mouse ultrasonic vocalizations
Jiaxi Wang, Karel Mundnich, Allison T. Knoll, Pat Levitt, Shrikanth Narayanan
Comments: 5 pages, 4 figures, conference paper, accepted in ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[40] arXiv:2003.06182 [pdf, other]
Title: A Wide Dataset of Ear Shapes and Pinna-Related Transfer Functions Generated by Random Ear Drawings
Corentin Guezenoc (IETR), Renaud Seguier (IETR)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Classical Physics (physics.class-ph)
[41] arXiv:2003.06183 [pdf, other]
Title: HRTF Individualization: A Survey
Corentin Guezenoc (IETR), Renaud Seguier (IETR)
Comments: Audio Engineering Society Convention 145, Audio Engineering Society, Oct 2018, New York, United States
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[42] arXiv:2003.06226 [pdf, other]
Title: Quantifying Musical Style: Ranking Symbolic Music based on Similarity to a Style
Jeff Ens, Philippe Pasquier
Journal-ref: In Proceedings of the International Symposium on Music Information Retrieval. Vol. 20. 2019, 870-877
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[43] arXiv:2003.06227 [pdf, other]
Title: Unsupervised Style and Content Separation by Minimizing Mutual Information for Speech Synthesis
Ting-Yao Hu, Ashish Shrivastava, Oncel Tuzel, Chandra Dhir
Comments: Accepted at ICASSP 2020 (for presentation in a lecture session)
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Machine Learning (cs.LG); Sound (cs.SD)
[44] arXiv:2003.06656 [pdf, other]
Title: Audio-Visual Spatial Alignment Requirements of Central and Peripheral Object Events
Davide Berghi, Hanne Stenzel, Marco Volino, Adrian Hilton, Philip J.B. Jackson
Comments: Two-pages poster abstract
Journal-ref: IEEE VR 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Image and Video Processing (eess.IV)
[45] arXiv:2003.06686 [pdf, other]
Title: Perception of prosodic variation for speech synthesis using an unsupervised discrete representation of F0
Zack Hodari, Catherine Lai, Simon King
Comments: Published to the 10th ISCA International Conference on Speech Prosody (SP2020)
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[46] arXiv:2003.06779 [pdf, other]
Title: A proto-object based audiovisual saliency map
Sudarshan Ramenahalli
Comments: 50 pages, 12 figures
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[47] arXiv:2003.06894 [pdf, other]
Title: Exploring Gaussian mixture model framework for speaker adaptation of deep neural network acoustic models
Natalia Tomashenko, Yuri Khokhlov, Yannick Esteve
Comments: 36 pages; originally was submitted to CSL in February 2017
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[48] arXiv:2003.07032 [pdf, other]
Title: Multi-modal Multi-channel Target Speech Separation
Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Lianwu Chen, Yuexian Zou, Dong Yu
Comments: accepted in IEEE Journal of Selected Topics in Signal Processing
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Image and Video Processing (eess.IV)
[49] arXiv:2003.07393 [pdf, other]
Title: TensorFlow Audio Models in Essentia
Pablo Alonso-Jiménez, Dmitry Bogdanov, Jordi Pons, Xavier Serra
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[50] arXiv:2003.07482 [pdf, other]
Title: High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model
Jinyu Li, Rui Zhao, Eric Sun, Jeremy H. M. Wong, Amit Das, Zhong Meng, Yifan Gong
Comments: Accepted by ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[51] arXiv:2003.07544 [pdf, other]
Title: Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method
Cunhang Fan, Jianhua Tao, Bin Liu, Jiangyan Yi, Zhengqi Wen, Xuefei Liu
Comments: ACCEPTED by IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)
[52] arXiv:2003.07688 [pdf, other]
Title: End-to-end Recurrent Denoising Autoencoder Embeddings for Speaker Identification
Esther Rituerto-González, Carmen Peláez-Moreno
Comments: Published on Monday 10th of May 2021 in Neural Computing and Applications, Springer
Journal-ref: Online, Neural Comput & Applic (2021), pp. 1-11
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[53] arXiv:2003.07692 [pdf, other]
Title: ASR Error Correction and Domain Adaptation Using Machine Translation
Anirudh Mani, Shruti Palaskar, Nimshi Venkat Meripo, Sandeep Konam, Florian Metze
Comments: Accepted for Oral Presentation at ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[54] arXiv:2003.07704 [pdf, other]
Title: Audio inpainting with generative adversarial network
P. P. Ebner, A. Eltelt
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[55] arXiv:2003.07705 [pdf, other]
Title: Hybrid Autoregressive Transducer (hat)
Ehsan Variani, David Rybach, Cyril Allauzen, Michael Riley
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[56] arXiv:2003.07962 [pdf, other]
Title: Deliberation Model Based Two-Pass End-to-End Speech Recognition
Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prabhavalkar
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[57] arXiv:2003.08954 [pdf, other]
Title: Voice and accompaniment separation in music using self-attention convolutional neural network
Yuzhou Liu (1), Balaji Thoshkahna (2), Ali Milani (3), Trausti Kristjansson (3) ((1) Ohio State University (2) Amazon Music, Bangalore (3) Amazon Lab126, CA)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[58] arXiv:2003.09125 [pdf, other]
Title: Improving Embedding Extraction for Speaker Verification with Ladder Network
Fei Tao, Gokhan Tur
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[59] arXiv:2003.09164 [pdf, other]
Title: Acoustic Scene Classification using Audio Tagging
Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Seung-bin Kim, Ha-Jin Yu
Comments: 5 pages, 2 figures, 6 tables, submitted to Interspeech 2020 as a conference paper
Subjects: Audio and Speech Processing (eess.AS)
[60] arXiv:2003.09180 [pdf, other]
Title: Detecting Mismatch between Text Script and Voice-over Using Utterance Verification Based on Phoneme Recognition Ranking
Yoonjae Jeong, Hoon-Young Cho
Comments: Accepted by ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[61] arXiv:2003.09542 [pdf, other]
Title: Deep Generative Variational Autoencoding for Replay Spoof Detection in Automatic Speaker Verification
Bhusan Chettri, Tomi Kinnunen, Emmanouil Benetos
Comments: Accepted to Computer Speech and Language Special issue on Advances in Automatic Speaker Verification Anti-spoofing, 2020
Subjects: Audio and Speech Processing (eess.AS)
[62] arXiv:2003.09889 [pdf, other]
Title: Audio Impairment Recognition Using a Correlation-Based Feature Representation
Alessandro Ragano, Emmanouil Benetos, Andrew Hines
Comments: This publication has been accepted in 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[63] arXiv:2003.09891 [pdf, other]
Title: Low Latency ASR for Simultaneous Speech Translation
Thai Son Nguyen, Jan Niehues, Eunah Cho, Thanh-Le Ha, Kevin Kilgour, Markus Muller, Matthias Sperber, Sebastian Stueker, Alex Waibel
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[64] arXiv:2003.10022 [pdf, other]
Title: High Performance Sequence-to-Sequence Model for Streaming Speech Recognition
Thai-Son Nguyen, Ngoc-Quan Pham, Sebastian Stueker, Alex Waibel
Comments: To appear in Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[65] arXiv:2003.10183 [pdf, other]
Title: Dialect Identification of Spoken North Sámi Language Varieties Using Prosodic Features
Sofoklis Kakouros, Katri Hiovain, Martti Vainio, Juraj Šimko
Subjects: Audio and Speech Processing (eess.AS)
[66] arXiv:2003.10369 [pdf, other]
Title: Low Latency End-to-End Streaming Speech Recognition with a Scout Network
Chengyi Wang, Yu Wu, Shujie Liu, Jinyu Li, Liang Lu, Guoli Ye, Ming Zhou
Subjects: Audio and Speech Processing (eess.AS)
[67] arXiv:2003.10724 [pdf, other]
Title: Evaluation of Error and Correlation-Based Loss Functions For Multitask Learning Dimensional Speech Emotion Recognition
Bagus Tris Atmaja, Masato Akagi
Comments: 3 figures, 3 tables, submitted to ANV 2020
Subjects: Audio and Speech Processing (eess.AS)
[68] arXiv:2003.11750 [pdf, other]
Title: Non-parallel Voice Conversion System with WaveNet Vocoder and Collapsed Speech Suppression
Yi-Chiao Wu, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Hayashi, Tomoki Toda
Comments: 13 pages, 13 figures, 1 table, accepted to publish in IEEE Access
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[69] arXiv:2003.11882 [pdf, other]
Title: Speech Quality Factors for Traditional and Neural-Based Low Bit Rate Vocoders
Wissam A. Jassim, Jan Skoglund, Michael Chinen, Andrew Hines
Comments: 6 pages, 11 figures, conference
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[70] arXiv:2003.11982 [pdf, other]
Title: In defence of metric learning for speaker recognition
Joon Son Chung, Jaesung Huh, Seongkyu Mun, Minjae Lee, Hee Soo Heo, Soyeon Choe, Chiheon Ham, Sunghwan Jung, Bong-Jin Lee, Icksang Han
Comments: The code can be found at this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[71] arXiv:2003.12108 [pdf, other]
Title: A Review of Multi-Objective Deep Learning Speech Denoising Methods
Arian Azarang, Nasser Kehtarnavaz
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[72] arXiv:2003.12266 [pdf, other]
Title: Dual Attention in Time and Frequency Domain for Voice Activity Detection
Joohyung Lee, Youngmoon Jung, Hoirin Kim
Comments: Accepted to Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS)
[73] arXiv:2003.12326 [pdf, other]
Title: Separating Varying Numbers of Sources with Auxiliary Autoencoding Loss
Yi Luo, Nima Mesgarani
Comments: Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[74] arXiv:2003.12362 [pdf, other]
Title: Can you hear me $\textit{now}$? Sensitive comparisons of human and machine perception
Michael A Lepori, Chaz Firestone
Comments: 24 pages; 4 figures
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[75] arXiv:2003.12366 [pdf, html, other]
Title: Training for Speech Recognition on Coprocessors
Sebastian Baunsgaard, Sebastian B. Wrede, Pınar Tozun
Comments: published at ADMS 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[76] arXiv:2003.12425 [pdf, other]
Title: Mic2Mic: Using Cycle-Consistent Generative Adversarial Networks to Overcome Microphone Variability in Speech Systems
Akhil Mathur, Anton Isopoussu, Fahim Kawsar, Nadia Berthouze, Nicholas D. Lane
Comments: Published at ACM IPSN 2019
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[77] arXiv:2003.13033 [pdf, other]
Title: Mechanical classification of voice quality
Akito Yoshida, Shigeru Shinomoto
Subjects: Audio and Speech Processing (eess.AS)
[78] arXiv:2003.13917 [pdf, other]
Title: Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement
Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Xiaoli Ma, Chin-Hui Lee
Comments: The authors have revised some annotations in Table 4 to improve the clarity. The authors thank reading feedbacks from Jonathan Le Roux. The first draft was finished in August 2019. Accepted to IEEE ICASSP 2020
Journal-ref: 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD)
[79] arXiv:2003.00063 (cross-list from cs.CV) [pdf, other]
Title: Bio-Inspired Modality Fusion for Active Speaker Detection
Gustavo Assunção, Nuno Gonçalves, Paulo Menezes
Journal-ref: Appl. Sci. 2021, 11(8), 3397
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[80] arXiv:2003.00304 (cross-list from cs.CL) [pdf, other]
Title: Voice trigger detection from LVCSR hypothesis lattices using bidirectional lattice recurrent neural networks
Woojay Jeon, Leo Liu, Henry Mason
Comments: Presented at IEEE ICASSP, May 2019
Journal-ref: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 2019, pp. 6356-6360
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[81] arXiv:2003.00342 (cross-list from cs.RO) [pdf, other]
Title: Robust Robotic Pouring using Audition and Haptics
Hongzhuo Liang, Chuangchuang Zhou, Shuang Li, Xiaojian Ma, Norman Hendrich, Timo Gerkmann, Fuchun Sun, Marcus Stoffel, Jianwei Zhang
Comments: accepted by IROS2020
Journal-ref: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[82] arXiv:2003.00351 (cross-list from cs.CV) [pdf, other]
Title: Emotion Recognition System from Speech and Visual Information based on Convolutional Neural Networks
Nicolae-Catalin Ristea, Liviu Cristian Dutu, Anamaria Radoi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[83] arXiv:2003.00414 (cross-list from cs.SD) [pdf, other]
Title: Harmonics Based Representation in Clarinet Tone Quality Evaluation
Yixin Wang, Xiaohong Guan, Youtian Du, Nan Nan
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[84] arXiv:2003.00991 (cross-list from eess.SP) [pdf, other]
Title: Uniform Array with Broadband Beamforming for Arbitrary Beam Patterns
Phan Le Son
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[85] arXiv:2003.01037 (cross-list from cs.SD) [pdf, other]
Title: One or Two Components? The Scattering Transform Answers
Vincent Lostanlen, Alice Cohen-Hadria, Juan Pablo Bello
Comments: 5 pages, 4 figures, in English. Proceedings of the European Signal Processing Conference (EUSIPCO 2020)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[86] arXiv:2003.01309 (cross-list from cs.CL) [pdf, other]
Title: Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection
Qian Chen, Mengzhe Chen, Bo Li, Wen Wang
Comments: 4 pages, 2 figures, accepted by ICASSP 2020
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[87] arXiv:2003.01478 (cross-list from cs.CL) [pdf, other]
Title: Multi-Task Learning with Auxiliary Speaker Identification for Conversational Emotion Recognition
Jingye Li, Meishan Zhang, Donghong Ji, Yijiang Liu
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[88] arXiv:2003.01509 (cross-list from cs.CL) [pdf, other]
Title: Improving Uyghur ASR systems with decoders using morpheme-based language models
Zicheng Qiu, Wei Jiang, Turghunjan Mamut
Comments: 4 figures, 5 tables
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[89] arXiv:2003.01787 (cross-list from cs.LG) [pdf, other]
Title: Untangling in Invariant Speech Recognition
Cory Stephenson, Jenelle Feather, Suchismita Padhy, Oguz Elibol, Hanlin Tang, Josh McDermott, SueYeon Chung
Comments: Advances in Neural Information Processing Systems. 2019
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[90] arXiv:2003.01958 (cross-list from cs.MM) [pdf, other]
Title: ASMD: an automatic framework for compiling multimodal datasets with audio and scores
Federico Simonetta, Stavros Ntalampiras, Federico Avanzini
Comments: Accepted at the Sound and Music Computing Conference 2020
Subjects: Multimedia (cs.MM); Digital Libraries (cs.DL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[91] arXiv:2003.02436 (cross-list from cs.LG) [pdf, other]
Title: Talking-Heads Attention
Noam Shazeer, Zhenzhong Lan, Youlong Cheng, Nan Ding, Le Hou
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[92] arXiv:2003.03160 (cross-list from cs.SD) [pdf, other]
Title: A Neural Network Based Framework for Archetypical Sound Synthesis
Eric Guizzo, Alberto Novello
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[93] arXiv:2003.03287 (cross-list from cs.SD) [pdf, other]
Title: Wavelet-based spatial audio framework
Davide Scaini
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[94] arXiv:2003.04210 (cross-list from cs.CV) [pdf, other]
Title: Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds
Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[95] arXiv:2003.04222 (cross-list from eess.SP) [pdf, other]
Title: Sparse and Cosparse Audio Dequantization Using Convex Optimization
Pavel Záviška, Pavel Rajmic
Journal-ref: 2020 43rd International Conference on Telecommunications and Signal Processing (TSP)
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[96] arXiv:2003.05997 (cross-list from cs.LG) [pdf, other]
Title: Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier
Comments: TACL 2020; pre-MIT Press publication version; v5 has a random attention baseline
Subjects: Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[97] arXiv:2003.07000 (cross-list from cs.CL) [pdf, other]
Title: TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding
Zhiheng Huang, Peng Xu, Davis Liang, Ajay Mishra, Bing Xiang
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[98] arXiv:2003.07758 (cross-list from cs.CV) [pdf, other]
Title: Multi-modal Dense Video Captioning
Vladimir Iashin, Esa Rahtu
Comments: To appear in the proceedings of CVPR Workshops 2020; Code: this https URL Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[99] arXiv:2003.07839 (cross-list from cs.SD) [pdf, other]
Title: High-Resolution Speaker Counting In Reverberant Rooms Using CRNN With Ambisonics Features
Pierre-Amaury Grumiaux, Srdjan Kitic, Laurent Girin, Alexandre Guérin
Comments: 5 pages, 1 figure
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[100] arXiv:2003.07996 (cross-list from cs.SD) [pdf, other]
Title: Cross Lingual Cross Corpus Speech Emotion Recognition
Shivali Goel (1), Homayoon Beigi (1 and 2) ((1) Department of Computer Science, Columbia University, (2) Recognition Technologies, Inc., South Salem, New York, United States)
Comments: 7 pages, 2 figures
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[101] arXiv:2003.08050 (cross-list from cs.SD) [pdf, other]
Title: Multi-Source DOA Estimation through Pattern Recognition of the Modal Coherence of a Reverberant Soundfield
A. Fahim, P. N. Samarasinghe, T. D. Abhayapala
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2019) 605 - 618
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[102] arXiv:2003.08225 (cross-list from cs.SD) [pdf, other]
Title: Detecting Replay Attacks Using Multi-Channel Audio: A Neural Network-Based Method
Yuan Gong, Jian Yang, Christian Poellabauer
Comments: Code of this work is available here: this https URL
Journal-ref: in IEEE Signal Processing Letters, vol. 27, pp. 920-924, 2020
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[103] arXiv:2003.09211 (cross-list from cs.CL) [pdf, other]
Title: Parallel Intent and Slot Prediction using MLB Fusion
Anmol Bhasin, Bharatram Natarajan, Gaurav Mathur, Himanshu Mangla
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[104] arXiv:2003.09284 (cross-list from cs.SD) [pdf, other]
Title: Acoustic Scene Classification with Squeeze-Excitation Residual Networks
Javier Naranjo-Alcazar, Sergi Perez-Castanos, Pedro Zuccarello, Maximo Cobos
Journal-ref: IEEEAccess 2020
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[105] arXiv:2003.09287 (cross-list from cs.SD) [pdf, other]
Title: Exploring Inherent Properties of the Monophonic Melody of Songs
Zehao Wang, Shicheng Zhang, Xiaoou Chen
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[106] arXiv:2003.09632 (cross-list from cs.SD) [pdf, other]
Title: A Quantum Vocal Theory of Sound
Davide Rocchesso, Maria Mannone
Comments: 32 pages, 11 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[107] arXiv:2003.09815 (cross-list from cs.SD) [pdf, other]
Title: A Time-domain Monaural Speech Enhancement with Feedback Learning
Andong Li, Chengshi Zheng, Linjuan Cheng, Renhua Peng, Xiaodong Li
Comments: Accepted by APSIPA 2020
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[108] arXiv:2003.10806 (cross-list from cs.SD) [pdf, other]
Title: Bulbar ALS Detection Based on Analysis of Voice Perturbation and Vibrato
Maxim Vashkevich, Alexander Petrovsky, Yuliya Rushkevich
Comments: Proc. of International Conference Signal Processing Algorithms, Architectures, Arrangements, and Applications (SPA 2019)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[109] arXiv:2003.11117 (cross-list from cs.SD) [pdf, other]
Title: COVID-19 and Computer Audition: An Overview on What Speech & Sound Analysis Could Contribute in the SARS-CoV-2 Corona Crisis
Björn W. Schuller, Dagmar M. Schuller, Kun Qian, Juan Liu, Huaiyuan Zheng, Xiao Li
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[110] arXiv:2003.11562 (cross-list from cs.CL) [pdf, other]
Title: Finnish Language Modeling with Deep Transformer Models
Abhilash Jain, Aku Ruohe, Stig-Arne Grönroos, Mikko Kurimo
Comments: 4 pages
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[111] arXiv:2003.12175 (cross-list from cs.LG) [pdf, other]
Title: Incremental Learning Algorithm for Sound Event Detection
Eunjeong Koh, Fatemeh Saki, Yinyi Guo, Cheng-Yu Hung, Erik Visser
Comments: IEEE ICME 2020 Camera Ready Version
Journal-ref: IEEE ICME 2020
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[112] arXiv:2003.12222 (cross-list from cs.SD) [pdf, other]
Title: Voice activity detection in the wild via weakly supervised sound event detection
Heinrich Dinkel, Yefei Chen, Mengyue Wu, Kai Yu
Comments: Accepted in Interspeech 2020
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[113] arXiv:2003.12687 (cross-list from cs.CL) [pdf, other]
Title: Serialized Output Training for End-to-End Overlapped Speech Recognition
Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka
Comments: Accepted to INTERSPEECH 2020
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[114] arXiv:2003.12799 (cross-list from cs.CL) [pdf, other]
Title: Unsupervised feature learning for speech using correspondence and Siamese networks
Petri-Johan Last, Herman A. Engelbrecht, Herman Kamper
Comments: 5 pages, 3 figures, 2 tables; accepted to the IEEE Signal Processing Letters, (c) 2020 IEEE
Journal-ref: IEEE Signal Processing Letters 27 (2020) 421-425
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[115] arXiv:2003.12973 (cross-list from cs.SD) [pdf, other]
Title: A Recursive Network with Dynamic Attention for Monaural Speech Enhancement
Andong Li, Chengshi Zheng, Cunhang Fan, Renhua Peng, Xiaodong Li
Comments: 5 pages, 3 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[116] arXiv:2003.13217 (cross-list from cs.MM) [pdf, other]
Title: Deep Residual Neural Networks for Image in Speech Steganography
Shivam Agarwal, Siddarth Venkatraman
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[117] arXiv:2003.14021 (cross-list from cs.LG) [pdf, other]
Title: A Comparison of Metric Learning Loss Functions for End-To-End Speaker Verification
Juan M. Coria, Hervé Bredin, Sahar Ghannay, Sophie Rosset
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)