Sound

Authors and titles for July 2019

Total of 125 entries : 1-50 51-100 101-125
[51] arXiv:1907.01164 (cross-list from cs.LG) [pdf, other]
Title: Learning to Traverse Latent Spaces for Musical Score Inpainting
Ashis Pati, Alexander Lerch, Gaëtan Hadjeres
Comments: 20th International Society for Music Information Retrieval Conference (ISMIR), 2019, Delft, The Netherlands; 6 pages, 8 figures
Journal-ref: 20th International Society for Music Information Retrieval Conference (ISMIR), 2019, Delft, The Netherlands
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[52] arXiv:1907.01277 (cross-list from eess.AS) [pdf, other]
Title: Conditioned-U-Net: Introducing a Control Mechanism in the U-Net for Multiple Source Separations
Gabriel Meseguer-Brocal, Geoffroy Peeters
Journal-ref: Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR, Delft, Netherlands, 2019
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[53] arXiv:1907.01367 (cross-list from eess.AS) [pdf, other]
Title: Lipper: Synthesizing Thy Speech using Multi-View Lipreading
Yaman Kumar, Rohit Jain, Khwaja Mohd. Salik, Rajiv Ratn Shah, Yifang Yin, Roger Zimmermann
Comments: Accepted at AAAI 2019
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[54] arXiv:1907.01369 (cross-list from eess.AS) [pdf, other]
Title: Analyzing Verbal and Nonverbal Features for Predicting Group Performance
Uliyana Kubasova, Gabriel Murray, McKenzie Braley
Comments: Accepted to INTERSPEECH 2019 (Graz, Austria)
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[55] arXiv:1907.01372 (cross-list from eess.AS) [pdf, other]
Title: Improving Performance of End-to-End ASR on Numeric Sequences
Cal Peyser, Hao Zhang, Tara N. Sainath, Zelin Wu
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[56] arXiv:1907.01409 (cross-list from eess.AS) [pdf, other]
Title: Comparison of Lattice-Free and Lattice-Based Sequence Discriminative Training Criteria for LVCSR
Wilfried Michel, Ralf Schlüter, Hermann Ney
Comments: Submitted to Interspeech 2019
Journal-ref: Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019, pp. 1601-1605
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[57] arXiv:1907.01413 (cross-list from eess.AS) [pdf, other]
Title: Speaker-independent classification of phonetic segments from raw ultrasound in child speech
Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals
Comments: 5 pages, 4 figures, published in ICASSP2019 (IEEE International Conference on Acoustics, Speech and Signal Processing, 2019)
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Image and Video Processing (eess.IV)
[58] arXiv:1907.01448 (cross-list from eess.AS) [pdf, other]
Title: Sub-band Convolutional Neural Networks for Small-footprint Spoken Term Classification
Chieh-Chi Kao, Ming Sun, Yixin Gao, Shiv Vitaladevuni, Chao Wang
Comments: Accepted by Interspeech 2019
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[59] arXiv:1907.01607 (cross-list from eess.AS) [pdf, other]
Title: MIDI-Sandwich: Multi-model Multi-task Hierarchical Conditional VAE-GAN networks for Symbolic Single-track Music Generation
Xia Liang, Junmin Wu, Yan Yin
Comments: cast KSEM2019 on May 3, 2019 (weak rejected)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)
[60] arXiv:1907.01640 (cross-list from cs.IR) [pdf, other]
Title: SeER: An Explainable Deep Learning MIDI-based Hybrid Song Recommender System
Khalil Damak, Olfa Nasraoui
Comments: 8 pages, 6 figures; added offline validation of explainability method
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[61] arXiv:1907.01803 (cross-list from cs.LG) [pdf, other]
Title: The Receptive Field as a Regularizer in Deep Convolutional Neural Networks for Acoustic Scene Classification
Khaled Koutini, Hamid Eghbal-zadeh, Matthias Dorfer, Gerhard Widmer
Comments: IEEE EUSIPCO 2019
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[62] arXiv:1907.01914 (cross-list from eess.AS) [pdf, other]
Title: Attention model for articulatory features detection
Ievgen Karaulov, Dmytro Tkanov
Comments: Interspeech 2019, 5 pages, 2 figures
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[63] arXiv:1907.01957 (cross-list from eess.AS) [pdf, other]
Title: End-to-End Speech Recognition with High-Frame-Rate Features Extraction
Cong-Thanh Do
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[64] arXiv:1907.02404 (cross-list from eess.SP) [pdf, other]
Title: Blind Audio Source Separation with Minimum-Volume Beta-Divergence NMF
Valentin Leplat, Nicolas Gillis, Man Shun Ang
Comments: 24 pages, 10 figures, 3 tables
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[65] arXiv:1907.02477 (cross-list from cs.LG) [pdf, other]
Title: Adversarial Attacks in Sound Event Classification
Vinod Subramanian, Emmanouil Benetos, Ning Xu, SKoT McDonald, Mark Sandler
Comments: Fixed Freesound data reference to FSDKaggle2018
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[66] arXiv:1907.02663 (cross-list from eess.AS) [pdf, other]
Title: The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion
Weicheng Cai, Haiwei Wu, Danwei Cai, Ming Li
Comments: Accepted for INTERSPEECH 2019
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)
[67] arXiv:1907.02670 (cross-list from cs.LG) [pdf, other]
Title: Zero-shot Learning for Audio-based Music Classification and Tagging
Jeong Choi, Jongpil Lee, Jiyoung Park, Juhan Nam
Comments: 20th International Society for Music Information Retrieval Conference (ISMIR), 2019
Subjects: Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[68] arXiv:1907.02784 (cross-list from eess.AS) [pdf, other]
Title: A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach
Noé Tits
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[69] arXiv:1907.03233 (cross-list from cs.CL) [pdf, other]
Title: NIESR: Nuisance Invariant End-to-end Speech Recognition
I-Hung Hsu, Ayush Jaiswal, Premkumar Natarajan
Comments: To appear in Proceedings of Interspeech 2019
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[70] arXiv:1907.04224 (cross-list from cs.CL) [pdf, other]
Title: Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition
Yonatan Belinkov, Ahmed Ali, James Glass
Comments: Corrected dataset statistics
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[71] arXiv:1907.04258 (cross-list from cs.NE) [pdf, other]
Title: Melody Generation using an Interactive Evolutionary Algorithm
Majid Farzaneh, Rahil Mahdian Toroghi
Comments: 5 pages, 4 images, submitted to MEDPRAI2019 conference
Subjects: Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[72] arXiv:1907.04294 (cross-list from cs.IR) [pdf, other]
Title: An Attention Mechanism for Musical Instrument Recognition
Siddharth Gururani, Mohit Sharma, Alexander Lerch
Comments: To appear in: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Delft, 2019
Subjects: Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[73] arXiv:1907.04355 (cross-list from cs.CL) [pdf, other]
Title: Transfer Learning from Audio-Visual Grounding to Speech Recognition
Wei-Ning Hsu, David Harwath, James Glass
Comments: Accepted to Interspeech 2019. 4 pages, 2 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74] arXiv:1907.04448 (cross-list from cs.CL) [pdf, other]
Title: Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning
Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Zhifeng Chen, RJ Skerry-Ryan, Ye Jia, Andrew Rosenberg, Bhuvana Ramabhadran
Comments: 5 pages, submitted to Interspeech 2019
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[75] arXiv:1907.04462 (cross-list from cs.CL) [pdf, other]
Title: Multi-Speaker End-to-End Speech Synthesis
Jihyun Park, Kexin Zhao, Kainan Peng, Wei Ping
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[76] arXiv:1907.04536 (cross-list from cs.LG) [pdf, other]
Title: Multi-layer Attention Mechanism for Speech Keyword Recognition
Ruisen Luo, Tianran Sun, Chen Wang, Miao Du, Zuodong Tang, Kai Zhou, Xiaofeng Gong, Xiaomei Yang
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[77] arXiv:1907.04655 (cross-list from eess.SP) [pdf, other]
Title: Audio-Based Search and Rescue with a Drone: Highlights from the IEEE Signal Processing Cup 2019 Student Competition
Antoine Deleforge, Diego Di Carlo, Martin Strauss, Romain Serizel, Lucio Marcenaro
Journal-ref: IEEE Signal Processing Magazine, Institute of Electrical and Electronics Engineers, In press
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[78] arXiv:1907.04743 (cross-list from eess.AS) [pdf, other]
Title: Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech
Daniel Korzekwa, Roberto Barra-Chicote, Bozena Kostek, Thomas Drugman, Mateusz Lajszczak
Comments: 5 pages, 5 figures, Accepted for Interspeech 2019
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[79] arXiv:1907.04887 (cross-list from eess.AS) [pdf, other]
Title: Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition
Khoi-Nguyen C. Mac, Xiaodong Cui, Wei Zhang, Michael Picheny
Comments: Interspeech 2019
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[80] arXiv:1907.04916 (cross-list from eess.AS) [pdf, other]
Title: Listen, Attend, Spell and Adapt: Speaker Adapted Sequence-to-Sequence ASR
Felix Weninger, Jesús Andrés-Ferrer, Xinwei Li, Puming Zhan
Comments: To appear in INTERSPEECH 2019
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[81] arXiv:1907.04926 (cross-list from eess.AS) [pdf, other]
Title: Synchronizing Audio-Visual Film Stimuli in Unity (version 5.5.1f1): Game Engines as a Tool for Research
Javier Sanz, Andreas Wulff-Abramsson, Carlos Aguilar-Paredes, Luis Emilio Bruni, Lydia Sanchez
Comments: 13 Pages
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD); Image and Video Processing (eess.IV)
[82] arXiv:1907.04927 (cross-list from eess.AS) [pdf, other]
Title: Speech bandwidth extension with WaveNet
Archit Gupta, Brendan Shillingford, Yannis Assael, Thomas C. Walters
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[83] arXiv:1907.04928 (cross-list from eess.AS) [pdf, other]
Title: Bag-of-Audio-Words based on Autoencoder Codebook for Continuous Emotion Prediction
Mohammed Senoussaoui, Patrick Cardinal, Alessandro Lameiras Koerich
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[84] arXiv:1907.04975 (cross-list from cs.CV) [pdf, other]
Title: My lips are concealed: Audio-visual speech enhancement through obstructions
Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman
Comments: Accepted to Interspeech 2019
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[85] arXiv:1907.05122 (cross-list from eess.AS) [pdf, other]
Title: Polyphonic Sound Event and Sound Activity Detection: A Multi-task approach
Arjun Pankajakshan, Helen L. Bear, Emmanouil Benetos
Comments: Accepted to WASPAA 2019
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[86] arXiv:1907.05337 (cross-list from cs.CL) [pdf, other]
Title: Joint Speech Recognition and Speaker Diarization via Sequence Transduction
Laurent El Shafey, Hagen Soltau, Izhak Shafran
Journal-ref: Proc. Interspeech 2019
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[87] arXiv:1907.05351 (cross-list from eess.SP) [pdf, other]
Title: Optimized Sharing of Coefficients in Parallel Filter Banks
M. Tunç Arslan, Onur Yorulmaz, Erdinç L. Atılgan
Comments: 10 pages, submitted to IEEE Transactions on Signal Processing
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[88] arXiv:1907.05599 (cross-list from eess.AS) [pdf, other]
Title: Effective Incorporation of Speaker Information in Utterance Encoding in Dialog
Tianyu Zhao, Tatsuya Kawahara
Comments: 8+1 pages, 3 figures, and 5 tables. Rejected by SIGDIAL 2019
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[89] arXiv:1907.05698 (cross-list from eess.AS) [pdf, other]
Title: Teach an all-rounder with experts in different domains
Zhao You, Dan Su, Dong Yu
Comments: 5 pages and 2 figures; accepted by 2019 IEEE International Conference on Acoustics, Speech and Signal Processing
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[90] arXiv:1907.05701 (cross-list from eess.AS) [pdf, other]
Title: A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition
Wei Zhang, Xiaodong Cui, Ulrich Finkler, George Saon, Abdullah Kayi, Alper Buyuktosunoglu, Brian Kingsbury, David Kung, Michael Picheny
Journal-ref: INTERSPEECH 2019
Subjects: Audio and Speech Processing (eess.AS); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[91] arXiv:1907.05708 (cross-list from eess.AS) [pdf, other]
Title: Deep auscultation: Predicting respiratory anomalies and diseases via recurrent neural networks
Diego Perna, Andrea Tagarelli
Comments: Paper accepted for publication with Procs. of the 32nd IEEE CBMS International Symposium on Computer-Based Medical Systems (CBMS 2019)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[92] arXiv:1907.05905 (cross-list from eess.AS) [pdf, other]
Title: Voice Pathology Detection Using Deep Learning: a Preliminary Study
Pavol Harar, Jesus B. Alonso-Hernandez, Jiri Mekyska, Zoltan Galaz, Radim Burget, Zdenek Smekal
Comments: 4 pages, 1 figure, 5 tables
Journal-ref: In 2017 international conference and workshop on bioinspired intelligence (IWOBI), pp. 1-4. IEEE, 2017
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[93] arXiv:1907.06111 (cross-list from eess.AS) [pdf, other]
Title: Speaker Recognition with Random Digit Strings Using Uncertainty Normalized HMM-based i-vectors
Nooshin Maghsoodi, Hossein Sameti, Hossein Zeinali, Themos Stafylakis
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[94] arXiv:1907.06112 (cross-list from eess.AS) [pdf, other]
Title: BUT VOiCES 2019 System Description
Hossein Zeinali, Pavel Matějka, Ladislav Mošner, Oldřich Plchot, Anna Silnova, Ondřej Novotný, Ján Profant, Ondřej Glembek, Lukáš Burget
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[95] arXiv:1907.06286 (cross-list from q-bio.NC) [pdf, other]
Title: Autoencoding sensory substitution
Viktor Tóth, Lauri Parkkonen
Subjects: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[96] arXiv:1907.06342 (cross-list from cs.CL) [pdf, other]
Title: Joint Language Identification of Code-Switching Speech using Attention based E2E Network
Sreeram Ganji, Kunal Dhawan, Kumar Priyadarshi, Rohit Sinha
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[97] arXiv:1907.06639 (cross-list from eess.AS) [pdf, other]
Title: Integrating the Data Augmentation Scheme with Various Classifiers for Acoustic Scene Modeling
Hangting Chen, Zuozhen Liu, Zongming Liu, Pengyuan Zhang, Yonghong Yan
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[98] arXiv:1907.06859 (cross-list from eess.AS) [pdf, other]
Title: Towards Adapting NMF Dictionaries Using Total Variability Modeling for Noise-Robust Acoustic Features
Kunal Dhawan, Colin Vaz, Ruchir Travadi, Shrikanth Narayanan
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[99] arXiv:1907.07127 (cross-list from eess.AS) [pdf, other]
Title: Acoustic Scene Classification Using Fusion of Attentive Convolutional Neural Networks for DCASE2019 Challenge
Hossein Zeinali, Lukáš Burget, Jan "Honza" Černocký
Comments: arXiv admin note: text overlap with arXiv:1810.04273
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[100] arXiv:1907.07564 (cross-list from cs.HC) [pdf, other]
Title: Conversational Help for Task Completion and Feature Discovery in Personal Assistants
Madan Gopal Jhawar, Vipindeep Vangala, Nishchay Sharma, Ankur Hayatnagarkar, Mansi Saxena, Swati Valecha
Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)