Sound

Authors and titles for July 2019

Total of 125 entries; this page lists entries 1-50.
[1] arXiv:1907.01160 [pdf, other]
Title: WHAM!: Extending Speech Separation to Noisy Environments
Gordon Wichern, Joe Antognini, Michael Flynn, Licheng Richard Zhu, Emmett McQuinn, Dwight Crow, Ethan Manilow, Jonathan Le Roux
Comments: Accepted for publication at Interspeech 2019
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[2] arXiv:1907.01169 [pdf, other]
Title: Can a Robot Hear the Shape and Dimensions of a Room?
Linh Nguyen, Jaime Valls Miro, Xiaojun Qiu
Comments: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019)
Subjects: Sound (cs.SD); Robotics (cs.RO); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[3] arXiv:1907.01195 [pdf, other]
Title: Kite: Automatic speech recognition for unmanned aerial vehicles
Dan Oneata, Horia Cucu
Comments: 5 pages, accepted at Interspeech 2019
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[4] arXiv:1907.01742 [pdf, other]
Title: Supervised Classifiers for Audio Impairments with Noisy Labels
Chandan K A Reddy, Ross Cutler, Johannes Gehrke
Comments: To appear in INTERSPEECH 2019
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[5] arXiv:1907.01813 [pdf, other]
Title: A Case Study of Deep-Learned Activations via Hand-Crafted Audio Features
Olga Slizovskaia, Emilia Gómez, Gloria Haro
Comments: The 2018 Joint Workshop on Machine Learning for Music, The Federated Artificial Intelligence Meeting (FAIM), Joint workshop program of ICML, IJCAI/ECAI, and AAMAS, Stockholm, Sweden, Saturday, July 14th, 2018
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[6] arXiv:1907.01824 [pdf, other]
Title: Cover Detection using Dominant Melody Embeddings
Guillaume Doras, Geoffroy Peeters
Journal-ref: 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands, 2019
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Machine Learning (stat.ML)
[7] arXiv:1907.02230 [pdf, other]
Title: Attention based Convolutional Recurrent Neural Network for Environmental Sound Classification
Zhichao Zhang, Shugong Xu, Tianhao Qiao, Shunqing Zhang, Shan Cao
Comments: Accepted to Chinese Conference on Pattern Recognition and Computer Vision (PRCV) 2019
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[8] arXiv:1907.02265 [pdf, other]
Title: Supervised Symbolic Music Style Translation Using Synthetic Data
Ondřej Cífka, Umut Şimşekli, Gaël Richard
Comments: ISMIR 2019 camera-ready
Journal-ref: Proceedings of the 20th International Society for Music Information Retrieval Conference (2019) 588-595
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[9] arXiv:1907.02526 [pdf, other]
Title: Convolutional Neural Network-based Speech Enhancement for Cochlear Implant Recipients
Nursadul Mamun, Soheil Khorram, John H.L. Hansen
Comments: Interspeech 2019
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[10] arXiv:1907.02637 [pdf, other]
Title: Neural Drum Machine : An Interactive System for Real-time Synthesis of Drum Sounds
Cyran Aouameur, Philippe Esling, Gaëtan Hadjeres
Comments: 8 pages, accepted at the International Conference on Computational Creativity 2019
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[11] arXiv:1907.02698 [pdf, other]
Title: A Bi-directional Transformer for Musical Chord Recognition
Jonggwon Park, Kyoyun Choi, Sungwook Jeon, Dokyun Kim, Jonghun Park
Comments: 20th International Society for Music Information Retrieval Conference (ISMIR), Delft, The Netherlands, 2019
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[12] arXiv:1907.02864 [pdf, other]
Title: Deep Neural Baselines for Computational Paralinguistics
Daniel Elsner, Stefan Langer, Fabian Ritz, Robert Müller, Steffen Illium
Comments: 5 pages, 3 figures; this paper was accepted at INTERSPEECH 2019, Graz, 15-19 September 2019. The DOI will be added after publication of the accepted paper
Journal-ref: Proc. Interspeech 2019, 2388-2392
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[13] arXiv:1907.03572 [pdf, other]
Title: Towards Explainable Music Emotion Recognition: The Route via Mid-level Features
Shreyan Chowdhury, Andreu Vall, Verena Haunschmid, Gerhard Widmer
Comments: International Society for Music Information Retrieval Conference, Delft, The Netherlands, 2019
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Machine Learning (stat.ML)
[14] arXiv:1907.03988 [pdf, other]
Title: Improving Reverberant Speech Training Using Diffuse Acoustic Simulation
Zhenyu Tang, Lianwu Chen, Bo Wu, Dong Yu, Dinesh Manocha
Comments: Accepted to ICASSP 2020, impulse response generation code at this https URL
Journal-ref: 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6969-6973)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[15] arXiv:1907.04292 [pdf, other]
Title: Evolution of the Informational Complexity of Contemporary Western Music
Thomas Parmer, Yong-Yeol Ahn
Comments: 8 pages, 6 figures; added supplementary materials
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[16] arXiv:1907.04352 [pdf, other]
Title: Exploring Conditioning for Generative Music Systems with Human-Interpretable Controls
Nicholas Meade, Nicholas Barreyre, Scott C. Lowe, Sageev Oore
Journal-ref: International Conference on Computational Creativity, 2019
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[17] arXiv:1907.04868 [pdf, other]
Title: LakhNES: Improving multi-instrumental music generation with cross-domain pre-training
Chris Donahue, Huanru Henry Mao, Yiting Ethan Li, Garrison W. Cottrell, Julian McAuley
Comments: Published as a conference paper at ISMIR 2019
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[18] arXiv:1907.04984 [pdf, other]
Title: Multichannel Loss Function for Supervised Speech Source Separation by Mask-based Beamforming
Yoshiki Masuyama, Masahito Togami, Tatsuya Komatsu
Comments: 5 pages, Accepted at INTERSPEECH 2019
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19] arXiv:1907.05208 [pdf, other]
Title: Explicitly Conditioned Melody Generation: A Case Study with Interdependent RNNs
Benjamin Genchel, Ashis Pati, Alexander Lerch
Comments: In Proceedings of the 7th International Workshop on Musical Metacreation (MUME), Charlotte, North Carolina, 2019
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[20] arXiv:1907.05584 [pdf, other]
Title: Toeplitz Inverse Covariance based Robust Speaker Clustering for Naturalistic Audio Streams
Harishchandra Dubey, Abhijeet Sangwan, John Hansen
Comments: 6 pages, 3 figures, 5 equations
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[21] arXiv:1907.05982 [pdf, other]
Title: Learning Complex Basis Functions for Invariant Representations of Audio
Stefan Lattner, Monika Dörfler, Andreas Arzt
Comments: Paper accepted at the 20th International Society for Music Information Retrieval Conference, ISMIR 2019, Delft, The Netherlands, November 4-8; 8 pages, 4 figures, 4 tables
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[22] arXiv:1907.06078 [pdf, other]
Title: Multi-Task Semi-Supervised Adversarial Autoencoding for Speech Emotion Recognition
Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Julien Epps, Björn W. Schuller
Comments: Accepted in IEEE Transactions on Affective Computing
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23] arXiv:1907.06083 [pdf, other]
Title: Unsupervised Adversarial Domain Adaptation for Cross-Lingual Speech Emotion Recognition
Siddique Latif, Junaid Qadir, Muhammad Bilal
Comments: Accepted in Affective Computing & Intelligent Interaction (ACII 2019)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[24] arXiv:1907.06129 [pdf, other]
Title: Towards Robust Voice Pathology Detection
Pavol Harar, Zoltan Galaz, Jesus B. Alonso-Hernandez, Jiri Mekyska, Radim Burget, Zdenek Smekal
Comments: 11 pages, 1 figure, 10 tables. Keywords: Voice pathology detection, deep learning, gradient boosting, anomaly detection
Journal-ref: Neural Computing and Applications (2018): 1-11
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[25] arXiv:1907.06637 [pdf, other]
Title: The Bach Doodle: Approachable music composition with machine learning at scale
Cheng-Zhi Anna Huang, Curtis Hawthorne, Adam Roberts, Monica Dinculescu, James Wexler, Leon Hong, Jacob Howcroft
Comments: Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR 2019
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[26] arXiv:1907.07398 [pdf, other]
Title: HODGEPODGE: Sound event detection based on ensemble of semi-supervised learning methods
Ziqiang Shi, Liu Liu, Huibin Lin, Rujie Liu, Anyan Shi
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[27] arXiv:1907.08506 [pdf, other]
Title: Language Modelling for Sound Event Detection with Teacher Forcing and Scheduled Sampling
Konstantinos Drossos, Shayan Gharib, Paul Magron, Tuomas Virtanen
Comments: Fixed the display of URLs in the footnote; updated the results
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[28] arXiv:1907.08520 [pdf, other]
Title: Data Augmentation for Instrument Classification Robust to Audio Effects
António Ramires, Xavier Serra
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[29] arXiv:1907.08698 [pdf, other]
Title: Leveraging Knowledge Bases And Parallel Annotations For Music Genre Translation
Elena V. Epure, Anis Khlif, Romain Hennequin
Comments: Published in ISMIR 2019
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[30] arXiv:1907.09238 [pdf, other]
Title: Crowdsourcing a Dataset of Audio Captions
Samuel Lipping, Konstantinos Drossos, Tuomas Virtanen
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[31] arXiv:1907.09884 [pdf, other]
Title: Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features
Cunhang Fan, Bin Liu, Jianhua Tao, Jiangyan Yi, Zhengqi Wen
Comments: 5 pages, 1 figure, accepted by INTERSPEECH 2019
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[32] arXiv:1907.09936 [pdf, other]
Title: Log Complex Color for Visual Pattern Recognition of Total Sound
Stephen Wedekind, P. Fraundorf
Comments: 6 pages, 5 figures, 28 references, cf. this http URL
Journal-ref: Audio Engineering Society Convention 141 (2016) paper 9647; Subject of US patent 10,341,795
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33] arXiv:1907.11238 [pdf, other]
Title: Interactive Lungs Auscultation with Reinforcement Learning Agent
Tomasz Grzywalski, Riccardo Belluzzo, Szymon Drgas, Agnieszka Cwalinska, Honorata Hafke-Dys
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[34] arXiv:1907.11956 [pdf, other]
Title: Dilated FCN: Listening Longer to Hear Better
Shuyu Gong, Zhewei Wang, Tao Sun, Yuanhang Zhang, Charles D. Smith, Li Xu, Jundong Liu
Comments: 5 pages; to appear at the WASPAA conference
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[35] arXiv:1907.12279 [pdf, other]
Title: StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo
Comments: Accepted to Interspeech 2019. Project page: this http URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[36] arXiv:1907.13188 [pdf, other]
Title: Marine Mammal Species Classification using Convolutional Neural Networks and a Novel Acoustic Representation
Mark Thomas, Bruce Martin, Katie Kowarski, Briand Gaudet, Stan Matwin
Comments: 16 pages, To appear in ECML-PKDD 2019
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[37] arXiv:1907.00112 (cross-list from cs.CL) [pdf, other]
Title: Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice
Vikramjit Mitra, Sue Booker, Erik Marchi, David Scott Farrar, Ute Dorothea Peitz, Bridget Cheng, Ermine Teves, Anuj Mehta, Devang Naik
Comments: 5 pages, 6 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[38] arXiv:1907.00443 (cross-list from cs.CL) [pdf, other]
Title: Multilingual Bottleneck Features for Query by Example Spoken Term Detection
Dhananjay Ram, Lesly Miculicich, Hervé Bourlard
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[39] arXiv:1907.00457 (cross-list from cs.CL) [pdf, other]
Title: BERTphone: Phonetically-Aware Encoder Representations for Utterance-Level Speaker and Language Recognition
Shaoshi Ling, Julian Salazar, Yuzong Liu, Katrin Kirchhoff
Comments: Odyssey 2020 camera-ready (presented Nov. 2020)
Journal-ref: Proc. the Speaker and Language Recognition Workshop (Odyssey 2020), 9-16
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[40] arXiv:1907.00477 (cross-list from cs.CL) [pdf, other]
Title: Analyzing Utility of Visual Context in Multimodal Speech Recognition Under Noisy Conditions
Tejas Srinivasan, Ramon Sanabria, Florian Metze
Comments: Accepted to How2 Workshop, ICML 2019
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[41] arXiv:1907.00758 (cross-list from cs.CL) [pdf, other]
Title: Synchronising audio and ultrasound by learning cross-modal embeddings
Aciel Eshky, Manuel Sam Ribeiro, Korin Richmond, Steve Renals
Comments: 5 pages, 1 figure, 4 tables; Interspeech 2019 with the following edits: 1) Loss and accuracy upon convergence were accidentally reported from an older model. Now updated with model described throughout the paper. All other results remain unchanged. 2) Max true offset in the training data corrected from 179ms to 1789ms. 3) Detectability "boundary/range" renamed to detectability "thresholds"
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[42] arXiv:1907.00772 (cross-list from eess.AS) [pdf, other]
Title: Analysis by Adversarial Synthesis -- A Novel Approach for Speech Vocoding
Ahmed Mustafa, Arijit Biswas, Christian Bergler, Julia Schottenhamml, Andreas Maier
Comments: Accepted to Interspeech 2019
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[43] arXiv:1907.00797 (cross-list from eess.AS) [pdf, other]
Title: Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation
Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda
Comments: 5 pages, 4 figures, Proc. Interspeech, 2019
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[44] arXiv:1907.00818 (cross-list from eess.AS) [pdf, other]
Title: Ultrasound tongue imaging for diarization and alignment of child speech therapy sessions
Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals
Comments: 5 pages, 3 figures, Accepted for publication at Interspeech 2019
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD); Image and Video Processing (eess.IV)
[45] arXiv:1907.00824 (cross-list from cs.HC) [pdf, other]
Title: Designing Deep Reinforcement Learning for Human Parameter Exploration
Hugo Scurto, Bavo Van Kerrebroeck, Baptiste Caramiaux, Frédéric Bevilacqua
Comments: Author's version of the work. The definitive Version of Record was published in ACM Transactions on Computer-Human Interaction (TOCHI)
Journal-ref: ACM Trans. Comput.-Hum. Interact. 28, 1, Article 1 (January 2021), 35 pages (2021)
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[46] arXiv:1907.00835 (cross-list from cs.CL) [pdf, other]
Title: UltraSuite: A Repository of Ultrasound and Acoustic Data from Child Speech Therapy Sessions
Aciel Eshky, Manuel Sam Ribeiro, Joanne Cleland, Korin Richmond, Zoe Roxburgh, James Scobbie, Alan Wrench
Comments: 5 pages, 1 figure, 3 tables; accepted to Interspeech 2018: 19th Annual Conference of the International Speech Communication Association (ISCA)
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[47] arXiv:1907.00873 (cross-list from eess.AS) [pdf, other]
Title: Compression of Acoustic Event Detection Models With Quantized Distillation
Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang
Comments: Interspeech 2019
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[48] arXiv:1907.00971 (cross-list from cs.LG) [pdf, other]
Title: Universal audio synthesizer control with normalizing flows
Philippe Esling, Naotake Masuda, Adrien Bardet, Romeo Despres, Axel Chemla--Romeu-Santos
Comments: DAFx 2019
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[49] arXiv:1907.01030 (cross-list from eess.AS) [pdf, other]
Title: LSTM Language Models for LVCSR in First-Pass Decoding and Lattice-Rescoring
Eugen Beck, Wei Zhou, Ralf Schlüter, Hermann Ney
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[50] arXiv:1907.01154 (cross-list from cs.MM) [pdf, other]
Title: Adaptive Music Composition for Games
Patrick Hutchings, Jon McCormack
Comments: Preprint. Accepted for publication in IEEE Transactions on Games, 2019
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)