Sound

Authors and titles for July 2018

Total of 73 entries (showing 26-73)

[26] arXiv:1807.08636 [pdf, other]: Title: Auto-adaptive Resonance Equalization using Dilated Residual Networks

Maarten Grachten, Emmanuel Deruty, Alexandre Tanguy

Journal-ref: Proceedings of the 20th ISMIR Conference, Delft, Netherlands, November 4-8, 2019. Pp. 405-411. https://archives.ismir.net/ismir2019/paper/000048.pdf

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[27] arXiv:1807.08869 [pdf, other]: Title: Joint Time-Frequency Scattering

Joakim Andén, Vincent Lostanlen, Stéphane Mallat

Comments: 14 pages, 10 figures

Journal-ref: IEEE Transactions on Signal Processing, vol. 67, no. 14, pp. 3704-3718, July 15, 2019

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[28] arXiv:1807.08974 [pdf, other]: Title: Deep Extractor Network for Target Speaker Recovery From Single Channel Speech Mixtures

Jun Wang, Jie Chen, Dan Su, Lianwu Chen, Meng Yu, Yanmin Qian, Dong Yu

Comments: Accepted in Interspeech 2018

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[29] arXiv:1807.09208 [pdf, other]: Title: A Hybrid of Deep Audio Feature and i-vector for Artist Recognition

Jiyoung Park, Donghyun Kim, Jongpil Lee, Sangeun Kum, Juhan Nam

Comments: Joint Workshop on Machine Learning for Music, the 34th International Conference on Machine Learning (ICML), 2018

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[30] arXiv:1807.09902 [pdf, other]: Title: General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline

Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra

Comments: Camera ready for DCASE Workshop 2018

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[31] arXiv:1807.10236 [pdf, other]: Title: Modulation-Domain Kalman Filtering for Monaural Blind Speech Denoising and Dereverberation

Nikolaos Dionelis, Mike Brookes

Comments: 13 pages, 13 figures, Submitted to IEEE Transactions on Audio, Speech and Language Processing

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[32] arXiv:1807.10501 [pdf, other]: Title: Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments

Romain Serizel (MULTISPEECH), Nicolas Turpault (MULTISPEECH), Hamid Eghbal-Zadeh, Ankit Parag Shah (LTI)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33] arXiv:1807.11089 [pdf, other]: Title: Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI

Pramit Saha, Praneeth Srungarapu, Sidney Fels

Comments: To appear in the INTERSPEECH 2018 Proceedings

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[34] arXiv:1807.11094 [pdf, other]: Title: Towards End-to-End Acoustic Localization using Deep Learning: from Audio Signal to Source Position Coordinates

Juan Manuel Vera-Diaz, Daniel Pizarro, Javier Macias-Guarasa

Comments: 18 pages, 3 figures, 8 tables

Journal-ref: Sensors 2018, (volume 18(10), 3418)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[35] arXiv:1807.11138 [pdf, other]: Title: Audio segmentation based on melodic style with hand-crafted features and with convolutional neural networks

Amruta Vidwans, Nachiket Deo, Preeti Rao

Comments: This work was done in 2015 at Indian Institute of Technology, Bombay, as a part of the ERC grant agreement 267583 (CompMusic) project

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[36] arXiv:1807.11161 [pdf, other]: Title: Lead Sheet Generation and Arrangement by Conditional Generative Adversarial Network

Hao-Min Liu, Yi-Hsuan Yang

Comments: 7 pages, 7 figures and 4 tables

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[37] arXiv:1807.11298 [pdf, other]: Title: Harmonic-Percussive Source Separation with Deep Neural Networks and Phase Recovery

Konstantinos Drossos, Paul Magron, Stylianos Ioannis Mimilakis, Tuomas Virtanen

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[38] arXiv:1807.00752 (cross-list from eess.AS) [pdf, other]: Title: Waveform to Single Sinusoid Regression to Estimate the F0 Contour from Noisy Speech Using Recurrent Deep Neural Networks

Akihiro Kato, Tomi Kinnunen

Comments: Accepted by peer reviewing for Interspeech 2018

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD); Machine Learning (stat.ML)
[39] arXiv:1807.01106 (cross-list from cs.HC) [pdf, other]: Title: A Study of Material Sonification in Touchscreen Devices

Rodrigo Martín, Michael Weinmann, Matthias B. Hullin

Comments: 9 pages

Journal-ref: Proc. ACM ISS 2018, 305-310

Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[40] arXiv:1807.01126 (cross-list from cs.LG) [pdf, other]: Title: Weakly Supervised Deep Recurrent Neural Networks for Basic Dance Step Generation

Nelson Yalta, Shinji Watanabe, Kazuhiro Nakadai, Tetsuya Ogata

Comments: 8 pages, 7 figures. Proc. IJCNN 2019

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[41] arXiv:1807.01738 (cross-list from eess.AS) [pdf, other]: Title: Investigating the role of L1 in automatic pronunciation evaluation of L2 speech

Ming Tu, Anna Grabek, Julie Liss, Visar Berisha

Comments: To appear in Interspeech 2018

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[42] arXiv:1807.01956 (cross-list from cs.CL) [pdf, other]: Title: Neural Language Codes for Multilingual Acoustic Models

Markus Müller, Sebastian Stüker, Alex Waibel

Comments: 5 pages, 3 figures, accepted at Interspeech 2018

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[43] arXiv:1807.02465 (cross-list from eess.AS) [pdf, other]: Title: Tone Recognition Using Lifters and CTC

Loren Lugosch, Vikrant Singh Tomar

Comments: Accepted to Interspeech 2018

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[44] arXiv:1807.03094 (cross-list from cs.CV) [pdf, other]: Title: Deep Multimodal Clustering for Unsupervised Audiovisual Learning

Di Hu, Feiping Nie, Xuelong Li

Comments: Accepted by CVPR2019

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[45] arXiv:1807.03191 (cross-list from cs.CV) [pdf, other]: Title: Approximate k-space models and Deep Learning for fast photoacoustic reconstruction

Andreas Hauptmann, Ben Cox, Felix Lucka, Nam Huynh, Marta Betcke, Paul Beard, Simon Arridge

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Optimization and Control (math.OC)
[46] arXiv:1807.03396 (cross-list from cs.CL) [pdf, other]: Title: On Training Recurrent Networks with Truncated Backpropagation Through Time in Speech Recognition

Hao Tang, James Glass

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[47] arXiv:1807.04353 (cross-list from eess.AS) [pdf, other]: Title: Efficient keyword spotting using time delay neural networks

Samuel Myer, Vikrant Singh Tomar

Comments: Will appear in Interspeech 2018

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[48] arXiv:1807.04636 (cross-list from eess.AS) [pdf, other]: Title: Optimal Binaural LCMV Beamforming in Complex Acoustic Scenarios: Theoretical and Practical Insights

N. Gößling, D. Marquardt, I. Merks, T. Zhang, S. Doclo

Comments: To appear in Proc. IWAENC 2018

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[49] arXiv:1807.04978 (cross-list from eess.AS) [pdf, other]: Title: Hybrid CTC-Attention based End-to-End Speech Recognition using Subword Units

Zhangyu Xiao, Zhijian Ou, Wei Chu, Hui Lin

Comments: accepted by ISCSLP 2018

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[50] arXiv:1807.05855 (cross-list from cs.CL) [pdf, other]: Title: A Fast-Converged Acoustic Modeling for Korean Speech Recognition: A Preliminary Study on Time Delay Neural Network

Hosung Park, Donghyun Lee, Minkyu Lim, Yoseb Kang, Juneseok Oh, Ji-Hwan Kim

Comments: 6 pages, 2 figures

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[51] arXiv:1807.06391 (cross-list from cs.AI) [pdf, other]: Title: Learning to Listen, Read, and Follow: Score Following as a Reinforcement Learning Game

Matthias Dorfer, Florian Henkel, Gerhard Widmer

Comments: Published in the Proceedings of the 19th International Society for Music Information Retrieval Conference, Paris, France, 2018

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[52] arXiv:1807.06441 (cross-list from eess.AS) [pdf, other]: Title: A Comparison of Adaptation Techniques and Recurrent Neural Network Architectures

Jan Vanek, Josef Michalek, Jan Zelinka, Josef Psutka

Comments: submitted and accepted to SLSP 2018 conference. arXiv admin note: text overlap with arXiv:1806.07186, arXiv:1806.07974

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[53] arXiv:1807.06610 (cross-list from eess.AS) [pdf, other]: Title: Learning Noise-Invariant Representations for Robust Speech Recognition

Davis Liang, Zhiheng Huang, Zachary C. Lipton

Comments: Under Review at IEEE SLT 2018

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[54] arXiv:1807.06663 (cross-list from eess.AS) [pdf, other]: Title: MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation (MCE) Plan, Dataset and Baseline System

Suwon Shon, Najim Dehak, Douglas Reynolds, James Glass

Comments: MCE 2018 Plan

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[55] arXiv:1807.06706 (cross-list from cs.HC) [pdf, other]: Title: Sonification in security operations centres: what do security practitioners think?

Louise M. Axon, Bushra Alahmadi, Jason R. C. Nurse, Michael Goldsmith, Sadie Creese

Journal-ref: Workshop on Usable Security (USEC) at the Network and Distributed System Security (NDSS) Symposium 2018

Subjects: Human-Computer Interaction (cs.HC); Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[56] arXiv:1807.06736 (cross-list from cs.CL) [pdf, other]: Title: Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis

Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai

Comments: 5 pages, 3 figures, 2 tables. Published in IEEE International Conference on Acoustics, Speech and Signal Processing 2018 (ICASSP2018)

Journal-ref: IEEE International Conference on Acoustics, Speech and Signal Processing (2018) 4789-4793

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[57] arXiv:1807.07281 (cross-list from cs.CL) [pdf, other]: Title: ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech

Wei Ping, Kainan Peng, Jitong Chen

Comments: Published at ICLR 2019. (v3: add important details & discussion in Appendix A)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[58] arXiv:1807.07436 (cross-list from eess.AS) [pdf, other]: Title: A Capsule based Approach for Polyphonic Sound Event Detection

Yaming Liu, Jian Tang, Yan Song, Lirong Dai

Comments: 4 pages, 2 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[59] arXiv:1807.08089 (cross-list from cs.CL) [pdf, other]: Title: Phonetic-and-Semantic Embedding of Spoken Words with Applications in Spoken Content Retrieval

Yi-Chen Chen, Sung-Feng Huang, Chia-Hao Shen, Hung-yi Lee, Lin-shan Lee

Comments: Accepted at SLT2018

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[60] arXiv:1807.08280 (cross-list from cs.CL) [pdf, other]: Title: Multi-scale Alignment and Contextual History for Attention Mechanism in Sequence-to-sequence Model

Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[61] arXiv:1807.08312 (cross-list from eess.AS) [pdf, other]: Title: Unified Hypersphere Embedding for Speaker Recognition

Mahdi Hajibabaei, Dengxin Dai

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[62] arXiv:1807.09597 (cross-list from eess.AS) [pdf, other]: Title: Acoustic-to-Word Recognition with Sequence-to-Sequence Models

Shruti Palaskar, Florian Metze

Comments: 9 pages, 3 figures, Under Review at SLT 2018

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[63] arXiv:1807.09840 (cross-list from eess.AS) [pdf, other]: Title: A multi-device dataset for urban acoustic scene classification

Annamaria Mesaros, Toni Heittola, Tuomas Virtanen

Comments: accepted to DCASE 2018 Workshop

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[64] arXiv:1807.10857 (cross-list from eess.AS) [pdf, other]: Title: A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition

Shubham Toshniwal, Anjuli Kannan, Chung-Cheng Chiu, Yonghui Wu, Tara N Sainath, Karen Livescu

Comments: Accepted in SLT 2018

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[65] arXiv:1807.10941 (cross-list from eess.AS) [pdf, other]: Title: Analysing Shortcomings of Statistical Parametric Speech Synthesis

Gustav Eje Henter, Simon King, Thomas Merritt, Gilles Degottex

Comments: 34 pages with 4 figures; draft book chapter

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[66] arXiv:1807.10984 (cross-list from cs.CL) [pdf, other]: Title: Domain Robust Feature Extraction for Rapid Low Resource ASR Development

Siddharth Dalmia, Xinjian Li, Florian Metze, Alan W. Black

Comments: To appear in SLT 2018

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[67] arXiv:1807.11246 (cross-list from eess.AS) [pdf, other]: Title: DCASE 2018 Challenge - Task 5: Monitoring of domestic activities based on multi-channel acoustics

Gert Dekkers, Lode Vuegen, Toon van Waterschoot, Bart Vanrumste, Peter Karsmakers

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[68] arXiv:1807.11284 (cross-list from eess.AS) [pdf, other]: Title: Unsupervised Domain Adaptation by Adversarial Learning for Robust Speech Recognition

Pavel Denisov, Ngoc Thang Vu, Marc Ferras Font

Comments: 5 pages, 2 figures, the 13th ITG conference on Speech Communication

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[69] arXiv:1807.11470 (cross-list from eess.AS) [pdf, other]: Title: Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis

Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Junichi Yamagishi

Comments: 17 pages, 4 figures

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[70] arXiv:1807.11632 (cross-list from eess.AS) [pdf, other]: Title: Scaling and bias codes for modeling speaker-adaptive DNN-based speech synthesis systems

Hieu-Thi Luong, Junichi Yamagishi

Comments: Accepted for 2018 IEEE Workshop on Spoken Language Technology (SLT), Athens, Greece

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Machine Learning (stat.ML)
[71] arXiv:1807.11679 (cross-list from eess.AS) [pdf, other]: Title: Wasserstein GAN and Waveform Loss-based Acoustic Model Training for Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder

Yi Zhao, Shinji Takaki, Hieu-Thi Luong, Junichi Yamagishi, Daisuke Saito, Nobuaki Minematsu

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD); Machine Learning (stat.ML)
[72] arXiv:1807.11722 (cross-list from eess.AS) [pdf, other]: Title: Multi-Speaker DOA Estimation Using Deep Convolutional Networks Trained with Noise Signals

Soumitro Chakrabarty, Emanuël A. P. Habets

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[73] arXiv:1807.11893 (cross-list from eess.AS) [pdf, other]: Title: Manual Post-editing of Automatically Transcribed Speeches from the Icelandic Parliament - Althingi

Judy Y. Fong, Michal Borsky, Inga R. Helgadóttir, Jon Gudnason

Comments: submitted to IEEE SLT 2018, Athens

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
