Sound

Authors and titles for October 2017

Total of 46 entries
[1] arXiv:1710.00082 [pdf, other]
Title: Real-Time Wind Noise Detection and Suppression with Neural-Based Signal Reconstruction for Mult-Channel, Low-Power Devices
Anthony D. Rhodes
Comments: 5 pages, 8 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[2] arXiv:1710.00343 [pdf, other]
Title: Large-scale weakly supervised audio classification using gated convolutional neural network
Yong Xu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley
Comments: submitted to ICASSP2018, summary on the 1st place system in DCASE2017 task4 challenge
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[3] arXiv:1710.01446 [pdf, other]
Title: Improving Compression Based Dissimilarity Measure for Music Score Analysis
Ayaka Takamoto, Mayu Umemura, Mitsuo Yoshida, Kyoji Umemura
Comments: The 2016 International Conference On Advanced Informatics: Concepts, Theory And Application (ICAICTA2016)
Subjects: Sound (cs.SD); Other Computer Science (cs.OH); Audio and Speech Processing (eess.AS)
[4] arXiv:1710.01589 [pdf, other]
Title: Independent Low-Rank Matrix Analysis Based on Parametric Majorization-Equalization Algorithm
Yoshiki Mitsui, Daichi Kitamura, Norihiro Takamune, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo
Comments: Preprint Manuscript of 2017 IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP 2017)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[5] arXiv:1710.02280 [pdf, other]
Title: Generating Nontrivial Melodies for Music as a Service
Yifei Teng, An Zhao, Camille Goudeseune
Comments: ISMIR 2017 Conference
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[6] arXiv:1710.02997 [pdf, other]
Title: A report on sound event detection with different binaural features
Sharath Adavanne, Tuomas Virtanen
Comments: Technical report for the top-performing method in Task 3 (real-life sound event detection) of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 challenge
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[7] arXiv:1710.02998 [pdf, other]
Title: Sound event detection using weakly labeled dataset with stacked convolutional and recurrent neural network
Sharath Adavanne, Tuomas Virtanen
Comments: Accepted in Detection and Classification of Acoustic Scenes and Events (DCASE 2017)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[8] arXiv:1710.04196 [pdf, other]
Title: Pyroomacoustics: A Python package for audio room simulations and array processing algorithms
Robin Scheibler, Eric Bezzam, Ivan Dokmanić
Comments: 5 pages, 5 figures, describes a software package
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9] arXiv:1710.06648 [pdf, other]
Title: Representation Learning of Music Using Artist Labels
Jiyoung Park, Jongpil Lee, Jangyeon Park, Jung-Woo Ha, Juhan Nam
Comments: 19th International Society for Music Information Retrieval Conference (ISMIR), 2018
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[10] arXiv:1710.07654 [pdf, other]
Title: Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arik, Ajay Kannan, Sharan Narang, Jonathan Raiman, John Miller
Comments: Published as a conference paper at ICLR 2018. (v3 changed paper title)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[11] arXiv:1710.07868 [pdf, other]
Title: Deep Triphone Embedding Improves Phoneme Recognition
Mohit Yadav, Vivek Tyagi
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[12] arXiv:1710.08377 [pdf, other]
Title: Listening to the World Improves Speech Command Recognition
Brian McMahan, Delip Rao
Comments: 8 pages
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[13] arXiv:1710.08684 [pdf, other]
Title: Inferring Room Semantics Using Acoustic Monitoring
Muhammad A. Shah, Bhiksha Raj, Khaled A. Harras
Comments: 2017 IEEE International Workshop on Machine Learning for Signal Processing, Sept. 25-28, 2017, Tokyo, Japan
Journal-ref: IEEE International Workshop on Machine Learning for Signal Processing (MLSP) 27 (2017) 1-6
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14] arXiv:1710.08969 [pdf, other]
Title: Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention
Hideyuki Tachibana, Katsuya Uenoyama, Shunsuke Aihara
Comments: 5 pages, 3 figures, IEEE ICASSP 2018
Journal-ref: Proc. ICASSP (2018) 4784-4788
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[15] arXiv:1710.09064 [pdf, other]
Title: End-to-End Optimized Speech Coding with Deep Neural Networks
Srihari Kankanahalli
Comments: Accepted and presented at ICASSP 2018. Samples available here: this http URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[16] arXiv:1710.09091 [pdf, other]
Title: Relative Transfer Function Inverse Regression from Low Dimensional Manifold
Ziteng Wang, Emmanuel Vincent, Yonghong Yan
Comments: 5 pages, in preparation for Signal Processing Letters
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[17] arXiv:1710.10005 [pdf, other]
Title: Separation of Moving Sound Sources Using Multichannel NMF and Acoustic Tracking
Joonas Nikunen, Aleksandr Diment, Tuomas Virtanen
Comments: Preprint of manuscript submitted to IEEE/ACM Transactions on Audio Speech and Language processing (R1)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[18] arXiv:1710.10059 [pdf, other]
Title: Direction of arrival estimation for multiple sound sources using convolutional recurrent neural network
Sharath Adavanne, Archontis Politis, Tuomas Virtanen
Comments: EUSIPCO 2018
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[19] arXiv:1710.10436 [pdf, other]
Title: Investigation of Frame Alignments for GMM-based Digit-prompted Speaker Verification
Yi Liu, Liang He, Weiqiang Zhang, Jia Liu, Michael T. Johnson
Comments: accepted by APSIPA ASC 2018
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20] arXiv:1710.10451 [pdf, other]
Title: Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms
Taejun Kim, Jongpil Lee, Juhan Nam
Comments: Accepted for publication at ICASSP 2018
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[21] arXiv:1710.10779 [pdf, other]
Title: Generative Adversarial Source Separation
Cem Subakan, Paris Smaragdis
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
[22] arXiv:1710.10948 [pdf, other]
Title: Sound Source Localization in a Multipath Environment Using Convolutional Neural Networks
Eric L. Ferguson, Stefan B. Williams, Craig T. Jin
Comments: 5 pages, 5 figures, Final draft of paper submitted to 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 15-20 April 2018 in Calgary, Alberta, Canada. arXiv admin note: text overlap with arXiv:1612.03505
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[23] arXiv:1710.10974 [pdf, other]
Title: Content-based Representations of audio using Siamese neural networks
Pranay Manocha, Rohan Badlani, Anurag Kumar, Ankit Shah, Benjamin Elizalde, Bhiksha Raj
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[24] arXiv:1710.11153 [pdf, other]
Title: Onsets and Frames: Dual-Objective Piano Transcription
Curtis Hawthorne, Erich Elsen, Jialin Song, Adam Roberts, Ian Simon, Colin Raffel, Jesse Engel, Sageev Oore, Douglas Eck
Comments: Examples available at this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[25] arXiv:1710.11385 [pdf, other]
Title: Audio style transfer
Eric Grinstein, Ngoc Duong, Alexey Ozerov, Patrick Pérez
Comments: ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2018, Calgary, Canada. IEEE
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Classical Physics (physics.class-ph)
[26] arXiv:1710.11418 [pdf, other]
Title: Polyphonic Music Generation with Sequence Generative Adversarial Networks
Sang-gil Lee, Uiwon Hwang, Seonwoo Min, Sungroh Yoon
Comments: 8 pages, 3 figures, 3 tables
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[27] arXiv:1710.11428 [pdf, other]
Title: SVSGAN: Singing Voice Separation via Generative Adversarial Network
Zhe-Cheng Fan, Yen-Lin Lai, Jyh-Shing Roger Jang
Comments: 5 pages, 4 figures, 1 table. Demo website: this http URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[28] arXiv:1710.11439 [pdf, other]
Title: Statistical Speech Enhancement Based on Probabilistic Integration of Variational Autoencoder and Non-Negative Matrix Factorization
Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara
Comments: 5 pages, 3 figures; this version corrects Eqs. (9), (19), and (20) of v2 (submitted to ICASSP 2018). Samples available here: this http URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[29] arXiv:1710.11473 [pdf, other]
Title: Multi-Resolution Fully Convolutional Neural Networks for Monaural Audio Source Separation
Emad M. Grais, Hagen Wierstorf, Dominic Ward, Mark D. Plumbley
Comments: arXiv admin note: text overlap with arXiv:1703.08019
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[30] arXiv:1710.11549 [pdf, other]
Title: Melody Generation for Pop Music via Word Representation of Musical Properties
Andrew Shin, Leopold Crestel, Hiroharu Kato, Kuniaki Saito, Katsunori Ohnishi, Masataka Yamaguchi, Masahiro Nakawaki, Yoshitaka Ushiku, Tatsuya Harada
Comments: submitted to ICLR 2018
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[31] arXiv:1710.00113 (cross-list from eess.AS) [pdf, other]
Title: UTD-CRSS Submission for MGB-3 Arabic Dialect Identification: Front-end and Back-end Advancements on Broadcast Speech
Ahmet E. Bulut, Qian Zhang, Chunlei Zhang, Fahimeh Bahmaninezhad, John H. L. Hansen
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[32] arXiv:1710.00116 (cross-list from eess.AS) [pdf, other]
Title: PLDA-Based Diarization of Telephone Conversations
Ahmet E. Bulut, Hakan Demir, Yusuf Ziya Isik, Hakan Erdogan
Journal-ref: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 1 (2015) 4809-4813
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[33] arXiv:1710.01904 (cross-list from eess.AS) [pdf, other]
Title: Head shadow enhancement with low-frequency beamforming improves sound localization and speech perception for simulated bimodal listeners
Benjamin Dieudonné, Tom Francart
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[34] arXiv:1710.02369 (cross-list from eess.AS) [pdf, other]
Title: End-to-end DNN Based Speaker Recognition Inspired by i-vector and PLDA
Johan Rohdin, Anna Silnova, Mireia Diez, Oldrich Plchot, Pavel Matejka, Lukas Burget
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[35] arXiv:1710.02560 (cross-list from eess.AS) [pdf, other]
Title: The DIRHA-English corpus and related tasks for distant-speech recognition in domestic environments
Mirco Ravanelli, Maurizio Omologo
Comments: ASRU 2015
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[36] arXiv:1710.03538 (cross-list from eess.AS) [pdf, other]
Title: Contaminated speech training methods for robust DNN-HMM distant speech recognition
Mirco Ravanelli, Maurizio Omologo
Journal-ref: INTERSPEECH 2015
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[37] arXiv:1710.03975 (cross-list from eess.AS) [pdf, other]
Title: PROSE: Perceptual Risk Optimization for Speech Enhancement
Jishnu Sadasivan, Chandra Sekhar Seelamantula, Nagarjuna Reddy Muraka
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[38] arXiv:1710.04288 (cross-list from eess.AS) [pdf, other]
Title: Audio Concept Classification with Hierarchical Deep Neural Networks
Mirco Ravanelli, Benjamin Elizalde, Karl Ni, Gerald Friedland
Journal-ref: EUSIPCO 2014
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[39] arXiv:1710.09985 (cross-list from eess.AS) [pdf, other]
Title: Acoustic Landmarks Contain More Information About the Phone String than Other Frames for Automatic Speech Recognition with Deep Neural Network Acoustic Model
Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson, Deming Chen
Comments: The article has been submitted to Journal of the Acoustical Society of America
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[40] arXiv:1710.10197 (cross-list from cs.LG) [pdf, other]
Title: Advanced LSTM: A Study about Better Time Dependency Modeling in Emotion Recognition
Fei Tao, Gang Liu
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[41] arXiv:1710.10224 (cross-list from cs.CL) [pdf, other]
Title: BridgeNets: Student-Teacher Transfer Learning Based on Recursive Neural Networks and its Application to Distant Speech Recognition
Jaeyoung Kim, Mostafa El-Khamy, Jungwon Lee
Comments: Accepted to 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[42] arXiv:1710.10432 (cross-list from eess.AS) [pdf, other]
Title: Jointly Tracking and Separating Speech Sources Using Multiple Features and the generalized labeled multi-Bernoulli Framework
Shoufeng Lin
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[43] arXiv:1710.10468 (cross-list from eess.AS) [pdf, other]
Title: Speaker Diarization with LSTM
Quan Wang, Carlton Downey, Li Wan, Philip Andrew Mansfield, Ignacio Lopez Moreno
Comments: Published at ICASSP 2018
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[44] arXiv:1710.10470 (cross-list from eess.AS) [pdf, other]
Title: Attention-Based Models for Text-Dependent Speaker Verification
F A Rezaur Rahman Chowdhury, Quan Wang, Ignacio Lopez Moreno, Li Wan
Comments: Submitted to ICASSP 2018
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[45] arXiv:1710.10774 (cross-list from cs.CL) [pdf, other]
Title: Sequence-to-Sequence ASR Optimization via Reinforcement Learning
Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Comments: Accepted at ICASSP 2018
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[46] arXiv:1710.11317 (cross-list from eess.AS) [pdf, other]
Title: Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors
Kanru Hua
Comments: To be presented at Interspeech 2018
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)