Sound

Authors and titles for May 2019

Total of 93 entries


[1] arXiv:1905.00078 [pdf, other]: Title: Deep Learning for Audio Signal Processing

Hendrik Purwins (1), Bo Li (2), Tuomas Virtanen (3), Jan Schlüter (4 and 5), Shuo-yiin Chang (2), Tara Sainath (2) ((1) Aalborg University Copenhagen, (2) Google, (3) Tampere University, (4) Université de Toulon, (5) Austrian Research Institute for Artificial Intelligence)

Comments: 15 pages, 2 pdf figures

Journal-ref: Journal of Selected Topics of Signal Processing 14, No. 8 (2019)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[2] arXiv:1905.00151 [pdf, other]: Title: A Style Transfer Approach to Source Separation

Shrikant Venkataramani, Efthymios Tzinis, Paris Smaragdis

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[3] arXiv:1905.00268 [pdf, other]: Title: Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy

Yin Cao, Qiuqiang Kong, Turab Iqbal, Fengyan An, Wenwu Wang, Mark D. Plumbley

Comments: 6 pages, 2 figures, conference

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[4] arXiv:1905.01209 [pdf, other]: Title: A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders

Manuel Pariente (MULTISPEECH), Antoine Deleforge (MULTISPEECH), Emmanuel Vincent (MULTISPEECH)

Comments: Submitted to INTERSPEECH 2019

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[5] arXiv:1905.01391 [pdf, other]: Title: Deep Tensor Factorization for Spatially-Aware Scene Decomposition

Jonah Casebeer, Michael Colomb, Paris Smaragdis

Comments: 5 pages, 5 figures, accepted to WASPAA 2019

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[6] arXiv:1905.01842 [pdf, other]: Title: Topology of Networks in Generalized Musical Spaces

Marco Buongiorno Nardelli

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[7] arXiv:1905.01898 [pdf, other]: Title: Learning with Learned Loss Function: Speech Enhancement with Quality-Net to Improve Perceptual Evaluation of Speech Quality

Szu-Wei Fu, Chien-Feng Liao, Yu Tsao

Comments: Accepted by IEEE Signal Processing Letters (SPL)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[8] arXiv:1905.01899 [pdf, other]: Title: Investigating kernel shapes and skip connections for deep learning-based harmonic-percussive separation

Carlos Lordelo, Emmanouil Benetos, Simon Dixon, Sven Ahlbäck

Comments: Accepted for publication at WASPAA 2019, 5 pages, 5 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9] arXiv:1905.03278 [pdf, other]: Title: On the representation of speech and music

David N. Levin

Comments: 6 pages, 6 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Methodology (stat.ME)
[10] arXiv:1905.03330 [pdf, other]: Title: Universal Sound Separation

Ilya Kavalerov, Scott Wisdom, Hakan Erdogan, Brian Patton, Kevin Wilson, Jonathan Le Roux, John R. Hershey

Comments: 5 pages, accepted to WASPAA 2019

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[11] arXiv:1905.03500 [pdf, other]: Title: Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech

Tobias Menne, Ilya Sklyar, Ralf Schlüter, Hermann Ney

Journal-ref: Proceedings of INTERSPEECH 2019

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[12] arXiv:1905.03632 [pdf, other]: Title: Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates

Jiri Malek, Zbynek Koldovsky, Marek Bohac

Comments: 10 pages, 8 figures, 4 tables. Modified version of the article accepted for publication in IET Signal Processing journal. Original results unchanged, additional experiments presented, refined discussion and conclusions

Journal-ref: IET Signal Processing, vol. 14, no. 3, pp. 124-133, May 2020

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Systems and Control (eess.SY)
[13] arXiv:1905.03637 [pdf, other]: Title: Sound texture synthesis using convolutional neural networks

Hugo Caracalla, Axel Roebel

Comments: submitted to Digital Audio Conference (DAFx 2019)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14] arXiv:1905.04348 [pdf, other]: Title: Multiclass Language Identification using Deep Learning on Spectral Images of Audio Signals

Shauna Revay, Matthew Teschke

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[15] arXiv:1905.04554 [pdf, other]: Title: Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification

Achintya kr. Sarkar, Zheng-Hua Tan, Hao Tang, Suwon Shon, James Glass

Comments: Copyright (c) 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[16] arXiv:1905.04874 [pdf, other]: Title: MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement

Szu-Wei Fu, Chien-Feng Liao, Yu Tsao, Shou-De Lin

Comments: Accepted by Thirty-sixth International Conference on Machine Learning (ICML) 2019

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[17] arXiv:1905.05375 [pdf, other]: Title: Self-supervised Audio Spatialization with Correspondence Classifier

Yu-Ding Lu, Hsin-Ying Lee, Hung-Yu Tseng, Ming-Hsuan Yang

Comments: ICIP 2019

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[18] arXiv:1905.06118 [pdf, other]: Title: Learning to Groove with Inverse Sequence Transformations

Jon Gillick, Adam Roberts, Jesse Engel, Douglas Eck, David Bamman

Comments: Blog post and links: this https URL

Journal-ref: Proceedings of the 36th International Conference on Machine Learning, PMLR 97:2269-2279, 2019

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[19] arXiv:1905.06286 [pdf, other]: Title: End-to-End Multi-Channel Speech Separation

Rongzhi Gu, Jian Wu, Shi-Xiong Zhang, Lianwu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu

Comments: submitted to interspeech 2019

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[20] arXiv:1905.06717 [pdf, other]: Title: Multi Web Audio Sequencer: Collaborative Music Making

Xavier Favory, Xavier Serra

Comments: 4 pages, 4 figures, short paper of the Web Audio Conference 2018

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[21] arXiv:1905.07497 [pdf, other]: Title: A comprehensive study of speech separation: spectrogram vs waveform separation

Fahimeh Bahmaninezhad, Jian Wu, Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu

Comments: INTERSPEECH 2019

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[22] arXiv:1905.07880 [pdf, other]: Title: Independent Vector Analysis with more Microphones than Sources

Robin Scheibler, Nobutaka Ono

Comments: Accepted to WASPAA 2019, 5 pages, 3 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23] arXiv:1905.08076 [pdf, other]: Title: Dance Hit Song Prediction

Dorien Herremans, David Martens, Kenneth Sörensen

Journal-ref: Journal of New Music Research. 43:302 (2014)

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[24] arXiv:1905.08352 [pdf, other]: Title: Robust sound event detection in bioacoustic sensor networks

Vincent Lostanlen, Justin Salamon, Andrew Farnsworth, Steve Kelling, Juan Pablo Bello

Comments: 32 pages, in English. Submitted to PLOS ONE journal in February 2019; revised August 2019; published October 2019

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[25] arXiv:1905.08546 [pdf, other]: Title: A multi-room reverberant dataset for sound event localization and detection

Sharath Adavanne, Archontis Politis, Tuomas Virtanen

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[26] arXiv:1905.08557 [pdf, other]: Title: Bayesian Pitch Tracking Based on the Harmonic Model

Liming Shi, Jesper Kjaer Nielsen, Jesper Rindom Jensen, Max A. Little, Mads Graesboll Christensen

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[27] arXiv:1905.08601 [pdf, other]: Title: Une ou deux composantes ? La réponse de la diffusion en ondelettes

Vincent Lostanlen

Comments: 4 pages, in French. Submitted to the GRETSI workshop

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[28] arXiv:1905.10091 [pdf, other]: Title: Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection

Liwei Lin, Xiangdong Wang, Hong Liu, Yueliang Qian

Comments: Accepted by TASLP

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[29] arXiv:1905.10604 [pdf, other]: Title: Reconstructing faces from voices

Yandong Wen, Rita Singh, Bhiksha Raj

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[30] arXiv:1905.10751 [pdf, other]: Title: Auditory Separation of a Conversation from Background via Attentional Gating

Shariq Mobin, Bruno Olshausen

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[31] arXiv:1905.11173 [pdf, other]: Title: ET-GAN: Cross-Language Emotion Transfer Based on Cycle-Consistent Generative Adversarial Networks

Xiaoqi Jia, Jianwei Tai, Hang Zhou, Yakai Li, Weijuan Zhang, Haichao Du, Qingjia Huang

Comments: Accepted by ECAI 2020, 8 pages, 4 figures

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[32] arXiv:1905.11689 [pdf, other]: Title: Demonstration of PerformanceNet: A Convolutional Neural Network Model for Score-to-Audio Music Generation

Yu-Hua Chen, Bryan Wang, Yi-Hsuan Yang

Comments: 3 pages, 2 figures, IJCAI Demo 2019 camera-ready version

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33] arXiv:1905.11700 [pdf, other]: Title: Ensemble-based cover song detection

Marc Sarfati, Anthony Hu, Jonathan Donier

Comments: 7 pages, 4 figures, 7 tables

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[34] arXiv:1905.11760 [pdf, other]: Title: Two-level Explanations in Music Emotion Recognition

Verena Haunschmid, Shreyan Chowdhury, Gerhard Widmer

Comments: ML4MD Workshop of the 36th International Conference on Machine Learning

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[35] arXiv:1905.11959 [pdf, other]: Title: Texture Selection for Automatic Music Genre Classification

Juliano H. Foleiss, Tiago F. Tavares

Comments: Submitted to Pattern Recognition (May 2019)

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[36] arXiv:1905.12324 [pdf, other]: Title: A new definition of the distortion matrix for an audio-to-score alignment system

A. J. Muñoz-Montoro, P. Vera-Candeas, D. Suarez-Dou, R. Cortina

Comments: CMMSE 2019

Journal-ref: Computational and Mathematical Methods, Wiley Online Library. 2019

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[37] arXiv:1905.12439 [pdf, other]: Title: Towards robust audio spoofing detection: a detailed comparison of traditional and learned features

Balamurali BT, Kin Wah Edward Lin, Simon Lui, Jer-Ming Chen, Dorien Herremans

Journal-ref: IEEE Access. 2019

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Multimedia (cs.MM); Machine Learning (stat.ML)
[38] arXiv:1905.12629 [pdf, other]: Title: A New Multilabel System for Automatic Music Emotion Recognition

Fabio Paolizzo, Natalia Pichierri, Daniele Casali, Daniele Giardino, Marco Matta, Giovanni Costantini

Comments: 2 tables. Research supported by the EU through the MUSICAL-MOODS project funded by the Marie Sklodowska-Curie Actions Individual Fellowships Global Fellowships (MSCA-IF-GF) of the Horizon 2020 Programme H2020/2014-2020, REA grant agreement n.659434

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[39] arXiv:1905.12804 [pdf, other]: Title: A Music Classification Model based on Metric Learning and Feature Extraction from MP3 Audio Files

Angelo C. Mendes da Silva, Mauricio A. Nunes, Raul Fonseca Neto

Comments: In a review process, I found some errors and made some changes in methodology that improved my results. Once I finish the experiments, I will upload the new version

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[40] arXiv:1905.13448 [pdf, other]: Title: Audio Caption in a Car Setting with a Sentence-Level Loss

Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[41] arXiv:1905.00377 (cross-list from stat.AP) [pdf, other]: Title: Developing a large scale population screening tool for the assessment of Parkinson's disease using telephone-quality voice

Siddharth Arora, Ladan Baghai-Ravary, Athanasios Tsanas

Comments: 43 pages, 5 figures, 6 tables

Subjects: Applications (stat.AP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[42] arXiv:1905.00390 (cross-list from eess.AS) [pdf, other]: Title: Interfacing PDM MEMS microphones with PFM spiking systems: Application for Neuromorphic Auditory Sensors

Angel Jimenez-Fernandez, Daniel Gutierrez-Galan, Antonio Rios-Navarro, Juan Pedro Dominguez-Morales, Gabriel Jimenez-Moreno

Comments: This work has been submitted to the IEEE for possible publication

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[43] arXiv:1905.00590 (cross-list from eess.AS) [pdf, other]: Title: High quality, lightweight and adaptable TTS using LPCNet

Zvi Kons, Slava Shechtman, Alex Sorin, Carmel Rabinovitz, Ron Hoory

Comments: Accepted to Interspeech 2019

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[44] arXiv:1905.00615 (cross-list from eess.AS) [pdf, other]: Title: Investigation of F0 conditioning and Fully Convolutional Networks in Variational Autoencoder based Voice Conversion

Wen-Chin Huang, Yi-Chiao Wu, Chen-Chou Lo, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang

Comments: 5 pages, 6 figures, 3 tables; Accepted to Interspeech 2019

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[45] arXiv:1905.00628 (cross-list from eess.AS) [pdf, other]: Title: Psychoacoustically Motivated Audio Declipping Based on Weighted l1 Minimization

Pavel Záviška, Pavel Rajmic, Jiří Schimmel

Journal-ref: 2019 42nd International Conference on Telecommunications and Signal Processing (TSP)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[46] arXiv:1905.00855 (cross-list from eess.AS) [pdf, other]: Title: Compression of Acoustic Event Detection Models with Low-rank Matrix Factorization and Quantization Training

Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang

Comments: NeurIPS 2018 CDNNRIA workshop

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[47] arXiv:1905.00979 (cross-list from eess.AS) [pdf, other]: Title: City classification from multiple real-world sound scenes

Helen L. Bear, Toni Heittola, Annamaria Mesaros, Emmanouil Benetos, Tuomas Virtanen

Comments: Accepted to WASPAA 2019

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[48] arXiv:1905.01022 (cross-list from eess.AS) [pdf, other]: Title: A Feature Learning Siamese Model for Intelligent Control of the Dynamic Range Compressor

Di Sheng, György Fazekas

Comments: 8 pages, accepted in IJCNN 2019

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[49] arXiv:1905.01152 (cross-list from eess.AS) [pdf, other]: Title: Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text

Murali Karthick Baskar, Shinji Watanabe, Ramon Astudillo, Takaaki Hori, Lukáš Burget, Jan Černocký

Comments: INTERSPEECH 2019

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD)
[50] arXiv:1905.01926 (cross-list from cs.LG) [pdf, other]: Title: Zero-Shot Audio Classification Based on Class Label Embeddings

Huang Xie, Tuomas Virtanen

Comments: 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[51] arXiv:1905.02525 (cross-list from eess.AS) [pdf, other]: Title: Many-to-Many Voice Conversion with Out-of-Dataset Speaker Support

Gokce Keskin, Tyler Lee, Cory Stephenson, Oguz H. Elibol

Comments: Submitted to Interspeech 2019

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[52] arXiv:1905.02545 (cross-list from eess.AS) [pdf, other]: Title: Meeting Transcription Using Virtual Microphone Arrays

Takuya Yoshioka, Zhuo Chen, Dimitrios Dimitriadis, William Hinthorn, Xuedong Huang, Andreas Stolcke, Michael Zeng

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[53] arXiv:1905.02639 (cross-list from eess.AS) [pdf, other]: Title: Transparent pronunciation scoring using articulatorily weighted phoneme edit distance

Reima Karhila, Anna-Riikka Smolander, Sari Ylinen, Mikko Kurimo

Comments: Submitted to Interspeech 2019

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[54] arXiv:1905.03072 (cross-list from cs.CL) [pdf, other]: Title: RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation

Christoph Lüscher, Eugen Beck, Kazuki Irie, Markus Kitza, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney

Comments: Proceedings of INTERSPEECH 2019

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[55] arXiv:1905.03828 (cross-list from cs.LG) [pdf, other]: Title: Universal Adversarial Perturbations for Speech Recognition Systems

Paarth Neekhara, Shehzeen Hussain, Prakhar Pandey, Shlomo Dubnov, Julian McAuley, Farinaz Koushanfar

Comments: Published as a conference paper at INTERSPEECH 2019

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[56] arXiv:1905.03864 (cross-list from eess.AS) [pdf, other]: Title: Adversarially Trained Autoencoders for Parallel-Data-Free Voice Conversion

Orhan Ocal, Oguz H. Elibol, Gokce Keskin, Cory Stephenson, Anil Thomas, Kannan Ramchandran

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[57] arXiv:1905.04192 (cross-list from cs.LG) [pdf, other]: Title: Do Autonomous Agents Benefit from Hearing?

Abraham Woubie, Anssi Kanervisto, Janne Karttunen, Ville Hautamaki

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Machine Learning (stat.ML)
[58] arXiv:1905.04418 (cross-list from eess.SP) [pdf, other]: Title: Machine learning in acoustics: theory and applications

Michael J. Bianco, Peter Gerstoft, James Traer, Emma Ozanich, Marie A. Roch, Sharon Gannot, Charles-Alban Deledalle

Comments: Published with free access in Journal of the Acoustical Society of America, 27 Nov. 2019

Journal-ref: Journal of the Acoustical Society of America, 146(5) pp.3590--3628, 2019

Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Applied Physics (physics.app-ph)
[59] arXiv:1905.04628 (cross-list from eess.AS) [pdf, other]: Title: Improving Opus Low Bit Rate Quality with Neural Speech Synthesis

Jan Skoglund, Jean-Marc Valin

Comments: Proc. Interspeech 2020, 5 pages

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[60] arXiv:1905.04709 (cross-list from cs.MM) [pdf, other]: Title: Deep Vocoder: Low Bit Rate Compression of Speech with Deep Autoencoder

Gang Min, Changqing Zhang, Xiongwei Zhang, Wei Tan

Subjects: Multimedia (cs.MM); Information Theory (cs.IT); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[61] arXiv:1905.05605 (cross-list from cs.CR) [pdf, other]: Title: Encrypted Speech Recognition using Deep Polynomial Networks

Shi-Xiong Zhang, Yifan Gong, Dong Yu

Comments: ICASSP 2019, slides@ this https URL

Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[62] arXiv:1905.05879 (cross-list from eess.AS) [pdf, other]: Title: AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Mark Hasegawa-Johnson

Comments: To Appear in Thirty-sixth International Conference on Machine Learning (ICML 2019)

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[63] arXiv:1905.06148 (cross-list from eess.AS) [pdf, other]: Title: A general-purpose deep learning approach to model time-varying audio effects

Marco A. Martínez Ramírez, Emmanouil Benetos, Joshua D. Reiss

Comments: audio files: this https URL

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[64] arXiv:1905.06533 (cross-list from cs.CL) [pdf, other]: Title: Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech

Emre Yılmaz, Vikramjit Mitra, Ganesh Sivaraman, Horacio Franco

Comments: to appear in Computer Speech & Language - this https URL - arXiv admin note: substantial text overlap with arXiv:1807.10948

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[65] arXiv:1905.06655 (cross-list from cs.CL) [pdf, other]: Title: Effective Sentence Scoring Method using Bidirectional Language Model for Speech Recognition

Joongbo Shin, Yoonhyung Lee, Kyomin Jung

Comments: submitted to INTERSPEECH 2019, 5 pages

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[66] arXiv:1905.06791 (cross-list from eess.AS) [pdf, other]: Title: Almost Unsupervised Text to Speech and Automatic Speech Recognition

Yi Ren, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

Comments: Accepted by ICML2019

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[67] arXiv:1905.06860 (cross-list from eess.AS) [pdf, other]: Title: Speaker-Independent Speech-Driven Visual Speech Synthesis using Domain-Adapted Acoustic Models

Ahmed Hussen Abdelaziz, Barry-John Theobald, Justin Binder, Gabriele Fanelli, Paul Dixon, Nicholas Apostoloff, Thibaut Weise, Sachin Kajareker

Comments: 9 pages, 2 figures, 2 tables

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[68] arXiv:1905.06907 (cross-list from cs.LG) [pdf, other]: Title: Learning discriminative features in sequence training without requiring framewise labelled data

Jun Wang, Dan Su, Jie Chen, Shulin Feng, Dongpeng Ma, Na Li, Dong Yu

Comments: Accepted in ICASSP 2019 lecture session

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[69] arXiv:1905.07082 (cross-list from cs.CR) [pdf, other]: Title: The Audio Auditor: User-Level Membership Inference in Internet of Things Voice Services

Yuantian Miao, Minhui Xue, Chao Chen, Lei Pan, Jun Zhang, Benjamin Zi Hao Zhao, Dali Kaafar, Yang Xiang

Comments: Accepted by PoPETs 2021.1

Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[70] arXiv:1905.07149 (cross-list from eess.AS) [pdf, other]: Title: End-to-end Adaptation with Backpropagation through WFST for On-device Speech Recognition System

Emiru Tsunoo, Yosuke Kashiwagi, Satoshi Asakawa, Toshiyuki Kumakura

Comments: accepted for Interspeech 2019

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[71] arXiv:1905.07195 (cross-list from cs.CL) [pdf, other]: Title: CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network

Vincent Wan, Chun-an Chan, Tom Kenter, Jakub Vit, Rob Clark

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[72] arXiv:1905.07293 (cross-list from cs.LG) [pdf, other]: Title: Weakly-Supervised Temporal Localization via Occurrence Count Learning

Julien Schroeter, Kirill Sidorov, David Marshall

Comments: Accepted at ICML 2019

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[73] arXiv:1905.08459 (cross-list from cs.CL) [pdf, other]: Title: Non-Autoregressive Neural Text-to-Speech

Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao

Comments: Published at ICML 2020. (v3 changed paper title)

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74] arXiv:1905.08486 (cross-list from eess.AS) [pdf, other]: Title: Effective parameter estimation methods for an ExcitNet model in generative text-to-speech systems

Ohsung Kwon, Eunwoo Song, Jae-Min Kim, Hong-Goo Kang

Comments: 5 pages, 3 figures, 3 tables, submitted to Speech Synthesis Workshop 2019

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[75] arXiv:1905.08492 (cross-list from eess.AS) [pdf, other]: Title: DNN-Based Speech Presence Probability Estimation for Multi-Frame Single-Microphone Speech Enhancement

Marvin Tammen, Dörte Fischer, Bernd T. Meyer, Simon Doclo

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[76] arXiv:1905.08632 (cross-list from eess.AS) [pdf, other]: Title: Human Vocal Sentiment Analysis

Andrew Huang, Puwei Bao

Comments: NYU Shanghai CSCS 2019

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[77] arXiv:1905.09263 (cross-list from cs.CL) [pdf, other]: Title: FastSpeech: Fast, Robust and Controllable Text to Speech

Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

Comments: Accepted by NeurIPS2019

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[78] arXiv:1905.09754 (cross-list from eess.AS) [pdf, other]: Title: A Perceptual Weighting Filter Loss for DNN Training in Speech Enhancement

Ziyue Zhao, Samy Elshamy, Tim Fingscheidt

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[79] arXiv:1905.10399 (cross-list from eess.AS) [pdf, other]: Title: Fast computation of loudness using a deep neural network

Josef Schlittenlacher, Richard E. Turner, Brian C. J. Moore

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[80] arXiv:1905.10954 (cross-list from cs.LG) [pdf, other]: Title: Transcribing Content from Structural Images with Spotlight Mechanism

Yu Yin, Zhenya Huang, Enhong Chen, Qi Liu, Fuzheng Zhang, Xing Xie, Guoping Hu

Comments: Accepted by KDD2018 Research Track. In proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'18)

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Machine Learning (stat.ML)
[81] arXiv:1905.11142 (cross-list from cs.LG) [pdf, other]: Title: Audio2Face: Generating Speech/Face Animation from Single Audio with Attention-Based Bidirectional LSTM Networks

Guanzhong Tian, Yi Yuan, Yong Liu

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[82] arXiv:1905.11235 (cross-list from cs.CL) [pdf, other]: Title: CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition

Linhao Dong, Bo Xu

Comments: To appear at ICASSP 2020

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[83] arXiv:1905.11276 (cross-list from eess.AS) [pdf, other]: Title: UWB-NTIS Speaker Diarization System for the DIHARD II 2019 Challenge

Zbyněk Zajíc, Marie Kunešová, Marek Hrúz, Jan Vaněk

Comments: Submitted to Interspeech 2019

Journal-ref: INTERSPEECH 2019

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[84] arXiv:1905.11449 (cross-list from cs.CL) [pdf, other]: Title: VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019

Andros Tjandra, Berrak Sisman, Mingyang Zhang, Sakriani Sakti, Haizhou Li, Satoshi Nakamura

Comments: Submitted to Interspeech 2019

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[85] arXiv:1905.11563 (cross-list from cs.CL) [pdf, other]: Title: Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion

Andy T. Liu, Po-chun Hsu, Hung-yi Lee

Comments: Accepted by Interspeech 2019, Graz, Austria

Journal-ref: Interspeech 2019

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[86] arXiv:1905.11785 (cross-list from eess.AS) [pdf, other]: Title: Automatic Quality Control and Enhancement for Voice-Based Remote Parkinson's Disease Detection

Amir Hossein Poorjam, Mathew Shaji Kavalekalam, Liming Shi, Yordan P. Raykov, Jesper Rindom Jensen, Max A. Little, Mads Græsbøll Christensen

Comments: Preprint, 12 pages, 6 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[87] arXiv:1905.11796 (cross-list from eess.AS) [pdf, other]: Title: Self-supervised audio representation learning for mobile devices

Marco Tagliasacchi, Beat Gfeller, Félix de Chaumont Quitry, Dominik Roblek

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[88] arXiv:1905.11928 (cross-list from eess.AS) [pdf, other]: Title: SignalTrain: Profiling Audio Compressors with Deep Neural Networks

Scott H. Hawley, Benjamin Colburn, Stylianos I. Mimilakis

Comments: 9 pages, 10 figures. v2: typos & references fixed

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[89] arXiv:1905.12230 (cross-list from cs.CL) [pdf, other]: Title: Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR

Naoyuki Kanda, Christoph Boeddeker, Jens Heitkaemper, Yusuke Fujita, Shota Horiguchi, Kenji Nagamatsu, Reinhold Haeb-Umbach

Comments: Accepted to INTERSPEECH 2019

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[90] arXiv:1905.13150 (cross-list from cs.CL) [pdf, other]: Title: Lattice-based lightly-supervised acoustic model training

Joachim Fainberg, Ondřej Klejch, Steve Renals, Peter Bell

Comments: Proc. INTERSPEECH 2019

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[91] arXiv:1905.13399 (cross-list from cs.CR) [pdf, other]: Title: Real-Time Adversarial Attacks

Yuan Gong, Boyang Li, Christian Poellabauer, Yiyu Shi

Comments: To Appear in the Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019). Code: this https URL

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[92] arXiv:1905.13561 (cross-list from eess.AS) [pdf, other]: Title: Speaker Anonymization Using X-vector and Neural Waveform Models

Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen, Massimiliano Todisco, Nicholas Evans, Jean-Francois Bonastre

Comments: Submitted to the 10th ISCA Speech Synthesis Workshop (SSW10)

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[93] arXiv:1905.13567 (cross-list from eess.AS) [pdf, other]: Title: Musical Composition Style Transfer via Disentangled Timbre Representations

Yun-Ning Hung, I-Tung Chiang, Yi-An Chen, Yi-Hsuan Yang

Comments: Accepted by the 28th International Joint Conference on Artificial Intelligence. arXiv admin note: text overlap with arXiv:1811.03271

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
