Sound

Authors and titles for April 2020

Total of 118 entries : 1-50 51-100 101-118
[51] arXiv:2004.04972 (cross-list from cs.CL) [pdf, other]
Title: Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data
Soumi Maiti, Erik Marchi, Alistair Conkie
Comments: Accepted to IEEE ICASSP 2020
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[52] arXiv:2004.05274 (cross-list from eess.AS) [pdf, other]
Title: Improved Speech Representations with Multi-Target Autoregressive Predictive Coding
Yu-An Chung, James Glass
Comments: Accepted to ACL 2020
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[53] arXiv:2004.05830 (cross-list from eess.AS) [pdf, other]
Title: From Inference to Generation: End-to-end Fully Self-supervised Generation of Human Face from Speech
Hyeong-Seok Choi, Changdae Park, Kyogu Lee
Comments: 18 pages, 12 figures, Published as a conference paper at International Conference on Learning Representations (ICLR) 2020. (camera-ready version)
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Image and Video Processing (eess.IV)
[54] arXiv:2004.05985 (cross-list from cs.CL) [pdf, other]
Title: Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?
Łukasz Augustyniak, Piotr Szymanski, Mikołaj Morzy, Piotr Zelasko, Adrian Szymczak, Jan Mizgajski, Yishay Carmiel, Najim Dehak
Comments: submitted to INTERSPEECH'20
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[55] arXiv:2004.05989 (cross-list from eess.AS) [pdf, other]
Title: Data augmentation using generative networks to identify dementia
Bahman Mirheidari, Yilin Pan, Daniel Blackburn, Ronan O'Malley, Traci Walker, Annalena Venneri, Markus Reuber, Heidi Christensen
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[56] arXiv:2004.06332 (cross-list from eess.AS) [pdf, other]
Title: Two-stage model and optimal SI-SNR for monaural multi-speaker speech separation in noisy environment
Chao Ma, Dongmei Li, Xupeng Jia
Comments: This paper was rejected by INTERSPEECH 2020. It has been modified extensively and submitted to APSIPA ASC 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[57] arXiv:2004.06338 (cross-list from eess.AS) [pdf, other]
Title: Transformer based Grapheme-to-Phoneme Conversion
Sevinj Yolchuyeva, Géza Németh, Bálint Gyires-Tóth
Comments: INTERSPEECH 2019
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[58] arXiv:2004.06422 (cross-list from eess.AS) [pdf, other]
Title: An explainability study of the constant Q cepstral coefficient spoofing countermeasure for automatic speaker verification
Hemlata Tak, Jose Patino, Andreas Nautsch, Nicholas Evans, Massimiliano Todisco
Comments: Accepted to Speaker Odyssey (The Speaker and Language Recognition Workshop), 2020, 8 pages
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[59] arXiv:2004.06480 (cross-list from eess.AS) [pdf, other]
Title: Semi-supervised acoustic modelling for five-lingual code-switched ASR using automatically-segmented soap opera speech
N. Wilkinson, A. Biswas, E. Yılmaz, F. de Wet, E. van der Westhuizen, T.R. Niesler
Comments: SLTU 2020. arXiv admin note: text overlap with arXiv:2003.03135
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[60] arXiv:2004.06579 (cross-list from eess.AS) [pdf, other]
Title: The Hearpiece database of individual transfer functions of an openly available in-the-ear earpiece for hearing device research
Florian Denk, Birger Kollmeier
Comments: 14 pages, 13 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[61] arXiv:2004.06756 (cross-list from eess.AS) [pdf, other]
Title: Speaker Diarization with Lexical Information
Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis Georgiou, Shrikanth Narayanan
Journal-ref: Interspeech 2019, 391-395
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[62] arXiv:2004.07070 (cross-list from cs.CL) [pdf, other]
Title: Analyzing analytical methods: The case of phonology in neural models of spoken language
Grzegorz Chrupała, Bertrand Higy, Afra Alishahi
Comments: ACL 2020
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[63] arXiv:2004.07301 (cross-list from cs.CV) [pdf, other]
Title: ESResNet: Environmental Sound Classification Based on Visual Domain Models
Andrey Guzhov, Federico Raue, Jörn Hees, Andreas Dengel
Comments: 8 pages, 4 figures; submitted to ICPR 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[64] arXiv:2004.07370 (cross-list from eess.AS) [pdf, other]
Title: F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder
Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson, Gautham J. Mysore
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[65] arXiv:2004.07442 (cross-list from cs.CR) [pdf, other]
Title: Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release
Yaowei Han, Sheng Li, Yang Cao, Qiang Ma, Masatoshi Yoshikawa
Comments: The paper has been accepted by the IEEE International Conference on Multimedia & Expo 2020 (ICME 2020)
Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[66] arXiv:2004.07832 (cross-list from eess.AS) [pdf, other]
Title: Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders
Yang Ai, Zhen-Hua Ling
Comments: Submitted to Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[67] arXiv:2004.07948 (cross-list from eess.AS) [pdf, other]
Title: Sound of Guns: Digital Forensics of Gun Audio Samples meets Artificial Intelligence
Simone Raponi, Isra Ali, Gabriele Oligeri
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[68] arXiv:2004.07992 (cross-list from eess.AS) [pdf, other]
Title: Speech Paralinguistic Approach for Detecting Dementia Using Gated Convolutional Neural Network
Mariana Rodrigues Makiuchi, Tifani Warnita, Nakamasa Inoue, Koichi Shinoda, Michitaka Yoshimura, Momoko Kitazawa, Kei Funaki, Yoko Eguchi, Taishiro Kishimoto
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[69] arXiv:2004.08248 (cross-list from eess.AS) [pdf, other]
Title: Acoustical classification of different speech acts using nonlinear methods
Chirayata Bhattacharyya, Sourya Sengupta, Sayan Nag, Shankha Sanyal, Archi Banerjee, Ranjan Sengupta, Dipak Ghosh
Comments: 6 pages, 2 figures; Proceedings of WESPAC 2018, New Delhi, India, November 11-15, 2018
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Chaotic Dynamics (nlin.CD); Neurons and Cognition (q-bio.NC)
[70] arXiv:2004.08287 (cross-list from eess.AS) [pdf, other]
Title: Deep Neural Network for Respiratory Sound Classification in Wearable Devices Enabled by Patient Specific Model Tuning
Jyotibdha Acharya, Arindam Basu
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[71] arXiv:2004.08326 (cross-list from eess.AS) [pdf, other]
Title: SpEx: Multi-Scale Time Domain Speaker Extraction Network
Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li
Comments: ACCEPTED in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[72] arXiv:2004.09347 (cross-list from eess.AS) [pdf, other]
Title: End-to-End Whisper to Natural Speech Conversion using Modified Transformer Network
Abhishek Niranjan, Mukesh Sharma, Sai Bharath Chandra Gutha, M Ali Basha Shaik
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[73] arXiv:2004.09367 (cross-list from cs.LG) [pdf, other]
Title: ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers
Jung-Woo Ha, Kihyun Nam, Jingu Kang, Sang-Woo Lee, Sohee Yang, Hyunhoon Jung, Eunmi Kim, Hyeji Kim, Soojin Kim, Hyun Ah Kim, Kyoungtae Doh, Chan Kyu Lee, Nako Sung, Sunghun Kim
Comments: 5 pages, 2 figures, 4 tables, The first two authors equally contributed to this work
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Machine Learning (stat.ML)
[74] arXiv:2004.09476 (cross-list from cs.CV) [pdf, other]
Title: Music Gesture for Visual Sound Separation
Chuang Gan, Deng Huang, Hang Zhao, Joshua B. Tenenbaum, Antonio Torralba
Comments: CVPR 2020. Project page: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[75] arXiv:2004.09571 (cross-list from eess.AS) [pdf, other]
Title: Language-agnostic Multilingual Modeling
Arindrima Datta, Bhuvana Ramabhadran, Jesse Emond, Anjuli Kannan, Brian Roark
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Machine Learning (stat.ML)
[76] arXiv:2004.09584 (cross-list from eess.AS) [pdf, other]
Title: ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric
Michael Chinen, Felicia S. C. Lim, Jan Skoglund, Nikita Gureev, Feargus O'Gorman, Andrew Hines
Comments: 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[77] arXiv:2004.09607 (cross-list from eess.AS) [pdf, other]
Title: Data Processing for Optimizing Naturalness of Vietnamese Text-to-speech System
Viet Lam Phung, Phan Huy Kinh, Anh Tuan Dinh, Quoc Bao Nguyen
Comments: 8 pages, 2 figures, submitted to Oriental COCOSDA
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[78] arXiv:2004.10087 (cross-list from cs.CL) [pdf, other]
Title: AGIF: An Adaptive Graph-Interactive Framework for Joint Multiple Intent Detection and Slot Filling
Libo Qin, Xiao Xu, Wanxiang Che, Ting Liu
Comments: Accepted at Findings of EMNLP 2020. Data and code are available at this [URL] (this https URL)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[79] arXiv:2004.10093 (cross-list from cs.CL) [pdf, other]
Title: Curriculum Pre-training for End-to-End Speech Translation
Chengyi Wang, Yu Wu, Shujie Liu, Ming Zhou, Zhenglu Yang
Comments: accepted by ACL2020
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[80] arXiv:2004.10120 (cross-list from eess.AS) [pdf, other]
Title: Vector Quantized Contrastive Predictive Coding for Template-based Music Generation
Gaëtan Hadjeres, Léopold Crestel
Comments: 15 pages, 13 figures
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[81] arXiv:2004.10234 (cross-list from cs.CL) [pdf, other]
Title: ESPnet-ST: All-in-One Speech Translation Toolkit
Hirofumi Inaguma, Shun Kiyono, Kevin Duh, Shigeki Karita, Nelson Enrique Yalta Soplin, Tomoki Hayashi, Shinji Watanabe
Comments: Accepted at ACL 2020 System Demonstration (update Table1, fix typo)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[82] arXiv:2004.10246 (cross-list from eess.AS) [pdf, other]
Title: Music Generation with Temporal Structure Augmentation
Shakeel Raja
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[83] arXiv:2004.10345 (cross-list from cs.MM) [pdf, other]
Title: MIDI-Sheet Music Alignment Using Bootleg Score Synthesis
Thitaree Tanprasert, Teerapat Jenrungrot, Meinard Mueller, T.J. Tsai
Comments: 8 pages, 6 figures, 1 table. Accepted paper at the International Society for Music Information Retrieval Conference (ISMIR) 2019
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[84] arXiv:2004.10347 (cross-list from cs.MM) [pdf, other]
Title: MIDI Passage Retrieval Using Cell Phone Pictures of Sheet Music
Daniel Yang, Thitaree Tanprasert, Teerapat Jenrungrot, Mengyi Shan, TJ Tsai
Comments: 8 pages, 8 figures, 1 table. Accepted paper at the International Society for Music Information Retrieval Conference (ISMIR) 2019
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[85] arXiv:2004.10391 (cross-list from eess.AS) [pdf, other]
Title: Towards Linking the Lakh and IMSLP Datasets
TJ Tsai
Comments: 5 pages, 4 figures, 1 table. Accepted paper at the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2020
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD); Image and Video Processing (eess.IV)
[86] arXiv:2004.10454 (cross-list from cs.CL) [pdf, other]
Title: A Study of Non-autoregressive Model for Sequence Generation
Yi Ren, Jinglin Liu, Xu Tan, Zhou Zhao, Sheng Zhao, Tie-Yan Liu
Comments: Accepted by ACL 2020
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[87] arXiv:2004.10799 (cross-list from eess.AS) [pdf, other]
Title: Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription
Andrei Andrusenko, Aleksandr Laptev, Ivan Medennikov
Comments: Accepted by Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[88] arXiv:2004.10823 (cross-list from eess.AS) [pdf, other]
Title: Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit
Tomoki Koriyama, Hiroshi Saruwatari
Comments: 5 pages. Accepted by ICASSP2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[89] arXiv:2004.11012 (cross-list from eess.AS) [pdf, other]
Title: ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders
Yu Gu, Xiang Yin, Yonghui Rao, Yuan Wan, Benlai Tang, Yang Zhang, Jitong Chen, Yuxuan Wang, Zejun Ma
Comments: Accepted by ISCSLP2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[90] arXiv:2004.11162 (cross-list from eess.AS) [pdf, other]
Title: Flexible framework for audio reconstruction
Ondřej Mokrý, Pavel Rajmic, Pavel Záviška
Journal-ref: 23rd International Conference on Digital Audio Effects (eDAFx2020)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[91] arXiv:2004.11284 (cross-list from eess.AS) [pdf, other]
Title: Unsupervised Speech Decomposition via Triple Information Bottleneck
Kaizhi Qian, Yang Zhang, Shiyu Chang, David Cox, Mark Hasegawa-Johnson
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[92] arXiv:2004.11724 (cross-list from cs.MM) [pdf, other]
Title: Using Cell Phone Pictures of Sheet Music To Retrieve MIDI Passages
TJ Tsai, Daniel Yang, Mengyi Shan, Thitaree Tanprasert, Teerapat Jenrungrot
Comments: 13 pages, 8 figures, 3 tables. Accepted article in IEEE Transactions on Multimedia. arXiv admin note: text overlap with arXiv:2004.10347
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[93] arXiv:2004.11956 (cross-list from eess.AS) [pdf, other]
Title: Binaural Audio Source Remixing with Microphone Array Listening Devices
Ryan M. Corey, Andrew C. Singer
Comments: To appear at ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[94] arXiv:2004.12031 (cross-list from cs.LG) [pdf, other]
Title: On the Role of Visual Cues in Audiovisual Speech Enhancement
Zakaria Aldeneh, Anushree Prasanna Kumar, Barry-John Theobald, Erik Marchi, Sachin Kajarekar, Devang Naik, Ahmed Hussen Abdelaziz
Comments: ICASSP 2021
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[95] arXiv:2004.12046 (cross-list from eess.AS) [pdf, other]
Title: Sound Event Detection Utilizing Graph Laplacian Regularization with Event Co-occurrence
Keisuke Imoto, Seisuke Kyochi
Comments: Accepted to IEICE Transactions on Information and Systems
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[96] arXiv:2004.12071 (cross-list from eess.AS) [pdf, other]
Title: Active Voice Authentication
Zhong Meng, M Umair Bin Altaf, Biing-Hwang (Fred) Juang
Comments: 39 pages, 4 figures
Journal-ref: Digital Signal Processing, Volume 101, June 2020, 102672, ISSN 1051-2004
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[97] arXiv:2004.12261 (cross-list from eess.AS) [pdf, other]
Title: Enabling Fast and Universal Audio Adversarial Attack Using Generative Model
Yi Xie, Zhuohang Li, Cong Shi, Jian Liu, Yingying Chen, Bo Yuan
Comments: Published at AAAI-21
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[98] arXiv:2004.12569 (cross-list from cs.MM) [pdf, other]
Title: DWT-GBT-SVD-based Robust Speech Steganography
Noshin Amiri, Iman Naderi
Comments: 10 pages, 4 Figures
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[99] arXiv:2004.12745 (cross-list from eess.AS) [pdf, other]
Title: Time-Frequency Analysis and Parameterisation of Knee Sounds for Non-invasive Detection of Osteoarthritis
Costas Yiallourides, Patrick A. Naylor
Comments: Submitted to IEEE Transactions on Biomedical Engineering
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[100] arXiv:2004.13007 (cross-list from cs.IR) [pdf, other]
Title: A session-based song recommendation approach involving user characterization along the play power-law distribution
Diego Sánchez-Moreno, Vivian F. López Batista, M. Dolores Muñoz Vicente, Ana B. Gil González, María N. Moreno-García
Comments: Accepted in Complexity (ISSN: 1099-0526)
Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)