Sound

Authors and titles for September 2018

Total of 45 entries : 1-25 26-45

[26] arXiv:1809.01728 (cross-list from eess.AS) [pdf, other]: Title: Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition

George Sterpu, Christian Saam, Naomi Harte

Comments: In ICMI'18, October 16-20, 2018, Boulder, CO, USA. Equation (2) corrected on this version

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[27] arXiv:1809.02251 (cross-list from eess.AS) [pdf, other]: Title: Adversarial Feature-Mapping for Speech Enhancement

Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang (Fred) Juang

Comments: 5 pages, 2 figures, Interspeech 2018

Journal-ref: Interspeech 2018

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[28] arXiv:1809.02253 (cross-list from eess.AS) [pdf, other]: Title: Cycle-Consistent Speech Enhancement

Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang (Fred) Juang

Comments: 5 pages, 2 figures. Interspeech 2018. arXiv admin note: text overlap with arXiv:1809.02251

Journal-ref: Interspeech 2018

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[29] arXiv:1809.02906 (cross-list from eess.AS) [pdf, other]: Title: End-to-end Language Identification using NetFV and NetVLAD

Jinkun Chen, Weicheng Cai, Danwei Cai, Zexin Cai, Haibin Zhong, Ming Li

Comments: Accepted for ISCSLP 2018

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD); Signal Processing (eess.SP)
[30] arXiv:1809.03868 (cross-list from eess.AS) [pdf, other]: Title: Dual-label Deep LSTM Dereverberation For Speaker Verification

Hao Zhang, Stephen Zahorian, Xiao Chen, Peter Guzewich, Xiaoyu Liu

Comments: 4 pages, 3 figures, submitted to Interspeech 2018

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[31] arXiv:1809.04115 (cross-list from eess.AS) [pdf, other]: Title: One-Shot Speaker Identification for a Service Robot using a CNN-based Generic Verifier

Ivette Vélez (1), Caleb Rascon (1), Gibrán Fuentes-Pineda (1) ((1) Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas (IIMAS), Universidad Nacional Autónoma de México (UNAM), Mexico.)

Comments: 8 pages, 9 figures, 2 tables. This paper is under review as a Submission for RA-L and ICRA for the IEEE Robotics and Automation Letters (RA-L). A video demonstration of the full system, as well as all relevant downloads (corpora, source code, models, etc.) can be found at: this http URL

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[32] arXiv:1809.04214 (cross-list from cs.CL) [pdf, other]: Title: Automatic, Personalized, and Flexible Playlist Generation using Reinforcement Learning

Shun-Yao Shih, Heng-Yu Chi

Comments: 7 pages, 4 figures, ISMIR 2018

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33] arXiv:1809.04281 (cross-list from cs.LG) [pdf, other]: Title: Music Transformer

Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck

Comments: Improved skewing section and accompanying figures. Previous titles are "An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation" and "Music Transformer"

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[34] arXiv:1809.06798 (cross-list from eess.AS) [pdf, other]: Title: Generative x-vectors for text-independent speaker verification

Longting Xu, Rohan Kumar Das, Emre Yılmaz, Jichen Yang, Haizhou Li

Comments: Accepted for publication at SLT 2018

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[35] arXiv:1809.06800 (cross-list from eess.AS) [pdf, other]: Title: Visual Speech Language Models

Helen L Bear

Comments: Extended abstract based on Decoding Visemes: improving machine lipreading, Bear & Harvey, ICASSP 2016

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[36] arXiv:1809.07276 (cross-list from cs.IR) [pdf, other]: Title: Music Mood Detection Based On Audio And Lyrics With Deep Neural Net

Rémi Delbouys, Romain Hennequin, Francesco Piccoli, Jimena Royo-Letelier, Manuel Moussallam

Comments: Published in ISMIR 2018

Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[37] arXiv:1809.07384 (cross-list from eess.AS) [pdf, other]: Title: New insights on the optimality of parameterized wiener filters for speech enhancement applications

Rafael Attili Chiea, Márcio Holsbach Costa, Guillaume Barrault

Comments: 26 pages

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[38] arXiv:1809.07524 (cross-list from cs.RO) [pdf, other]: Title: Diffraction-Aware Sound Localization for a Non-Line-of-Sight Source

Inkyu An, Doheon Lee, Jung-woo Choi, Dinesh Manocha, Sung-eui Yoon

Comments: Submitted to ICRA 2019. The working video is available at (this https URL)

Subjects: Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[39] arXiv:1809.07824 (cross-list from cs.LG) [pdf, other]: Title: Metric Learning for Phoneme Perception

Yair Lakretz, Gal Chechik, Evan-Gary Cohen, Alessandro Treves, Naama Friedmann

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[40] arXiv:1809.08001 (cross-list from cs.CV) [pdf, other]: Title: Perfect match: Improved cross-modal embeddings for audio-visual synchronisation

Soo-Whan Chung, Joon Son Chung, Hong-Goo Kang

Comments: Preprint. Work in progress

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[41] arXiv:1809.08909 (cross-list from cs.CL) [pdf, other]: Title: Language Identification with Deep Bottleneck Features

Zhanyu Ma, Hong Yu

Comments: Preliminary work report

Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[42] arXiv:1809.09190 (cross-list from eess.AS) [pdf, other]: Title: From Audio to Semantics: Approaches to end-to-end spoken language understanding

Parisa Haghani, Arun Narayanan, Michiel Bacchiani, Galen Chuang, Neeraj Gaur, Pedro Moreno, Rohit Prabhavalkar, Zhongdi Qu, Austin Waters

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[43] arXiv:1809.10288 (cross-list from eess.AS) [pdf, other]: Title: WaveCycleGAN: Synthetic-to-natural speech waveform conversion using cycle-consistent adversarial networks

Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Hirokazu Kameoka

Comments: SLT2018

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[44] arXiv:1809.10460 (cross-list from cs.LG) [pdf, other]: Title: Sample Efficient Adaptive Text-to-Speech

Yutian Chen, Yannis Assael, Brendan Shillingford, David Budden, Scott Reed, Heiga Zen, Quan Wang, Luis C. Cobo, Andrew Trask, Ben Laurie, Caglar Gulcehre, Aäron van den Oord, Oriol Vinyals, Nando de Freitas

Comments: Accepted by ICLR 2019

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[45] arXiv:1809.10875 (cross-list from cs.LG) [pdf, other]: Title: Characterizing Audio Adversarial Examples Using Temporal Dependency

Zhuolin Yang, Bo Li, Pin-Yu Chen, Dawn Song

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
