Sound

Authors and titles for July 2019

Total of 125 entries : 1-25 26-50 51-75 76-100 101-125

[101] arXiv:1907.07626 (cross-list from eess.AS) [pdf, other]: Title: AP19-OLR Challenge: Three Tasks and Their Baselines

Zhiyuan Tang, Dong Wang, Liming Song

Comments: arXiv admin note: substantial text overlap with arXiv:1806.00616, arXiv:1706.09742, arXiv:1609.08445

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[102] arXiv:1907.07769 (cross-list from eess.AS) [pdf, other]: Title: Hierarchical Sequence to Sequence Voice Conversion with Limited Data

Praveen Narayanan, Punarjay Chakravarty, Francois Charette, Gint Puskorius

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[103] arXiv:1907.07951 (cross-list from eess.IV) [pdf, other]: Title: Automatic vocal tract landmark localization from midsagittal MRI data

Mohammad Eslami, Christiane Neuschaefer-Rube, Antoine Serrurier

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[104] arXiv:1907.08293 (cross-list from eess.AS) [pdf, other]: Title: Investigating Target Set Reduction for End-to-End Speech Recognition of Hindi-English Code-Switching Data

Kunal Dhawan, Ganji Sreeram, Kumar Priyadarshi, Rohit Sinha

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[105] arXiv:1907.08294 (cross-list from eess.AS) [pdf, other]: Title: DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis

Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari

Comments: 6 pages, 7 figures, accepted for The 10th ISCA Speech Synthesis Workshop (SSW10)

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[106] arXiv:1907.08338 (cross-list from eess.AS) [pdf, other]: Title: Batch Uniformization for Minimizing Maximum Anomaly Score of DNN-based Anomaly Detection in Sounds

Yuma Koizumi, Shoichiro Saito, Masataka Yamaguchi, Shin Murata, Noboru Harada

Comments: 5 pages, to appear in IEEE WASPAA 2019

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[107] arXiv:1907.08661 (cross-list from cs.HC) [pdf, other]: Title: Sound Search by Text Description or Vocal Imitation?

Yichi Zhang, Yiting Zhang, Zhiyao Duan

Comments: 5 pages, 3 figures

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[108] arXiv:1907.08940 (cross-list from eess.AS) [pdf, other]: Title: Statistical Voice Conversion with Quasi-Periodic WaveNet Vocoder

Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda

Comments: 6 pages, 7 figures, Proc. SSW10, 2019

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[109] arXiv:1907.09006 (cross-list from eess.AS) [pdf, other]: Title: Forward-Backward Decoding for Regularizing End-to-End TTS

Yibin Zheng, Xi Wang, Lei He, Shifeng Pan, Frank K. Soong, Zhengqi Wen, Jianhua Tao

Comments: Accepted by INTERSPEECH 2019. arXiv admin note: text overlap with arXiv:1808.04064, arXiv:1804.05374 by other authors

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[110] arXiv:1907.09250 (cross-list from eess.AS) [pdf, other]: Title: ML Estimation and CRBs for Reverberation, Speech and Noise PSDs in Rank-Deficient Noise-Field

Yaron Laufer, Bracha Laufer-Goldshtein, Sharon Gannot

Comments: Accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[111] arXiv:1907.09636 (cross-list from cs.CL) [pdf, other]: Title: On Modeling ASR Word Confidence

Woojay Jeon, Maxwell Jordan, Mahesh Krishnamoorthy

Comments: Presented at IEEE ICASSP 2020, May 2020

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[112] arXiv:1907.09775 (cross-list from cs.RO) [pdf, other]: Title: Multisensory Learning Framework for Robot Drumming

A. Barsky, C. Zito, H. Mori, T. Ogata, J. L. Wyatt

Comments: Extended abstract

Journal-ref: Workshop on Crossmodal Learning for Intelligent Robotics 2nd Edition. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[113] arXiv:1907.09919 (cross-list from cs.HC) [pdf, other]: Title: Speech, Head, and Eye-based Cues for Continuous Affect Prediction

Jonny O'Dwyer

Comments: Accepted paper (pre-print) for 2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)

Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[114] arXiv:1907.10185 (cross-list from eess.AS) [pdf, other]: Title: Non-Parallel Voice Conversion with Cyclic Variational Autoencoder

Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda

Comments: Accepted to INTERSPEECH 2019

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[115] arXiv:1907.10380 (cross-list from cs.HC) [pdf, other]: Title: NONOTO: A Model-agnostic Web Interface for Interactive Music Composition by Inpainting

Théis Bazin, Gaëtan Hadjeres

Comments: 3 pages, 1 figure. Published as a conference paper at the 10th International Conference on Computational Creativity (ICCC 2019), UNC Charlotte, North Carolina

Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[116] arXiv:1907.10393 (cross-list from eess.AS) [pdf, other]: Title: LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization

Qingjian Lin, Ruiqing Yin, Ming Li, Hervé Bredin, Claude Barras

Comments: Accepted for INTERSPEECH 2019

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[117] arXiv:1907.10420 (cross-list from eess.AS) [pdf, other]: Title: A Deep Neural Network for Short-Segment Speaker Recognition

Amirhossein Hajavi, Ali Etemad

Comments: Accepted in Interspeech 2019

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[118] arXiv:1907.10428 (cross-list from cs.LG) [pdf, other]: Title: EmoBed: Strengthening Monomodal Emotion Recognition via Training with Crossmodal Emotion Embeddings

Jing Han, Zixing Zhang, Zhao Ren, Björn Schuller

Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[119] arXiv:1907.10726 (cross-list from eess.AS) [pdf, other]: Title: Cross-Attention End-to-End ASR for Two-Party Conversations

Suyoun Kim, Siddharth Dalmia, Florian Metze

Comments: Interspeech 2019

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[120] arXiv:1907.11361 (cross-list from eess.AS) [pdf, other]: Title: Correlation Distance Skip Connection Denoising Autoencoder (CDSK-DAE) for Speech Feature Enhancement

Alzahra Badi, Sangwook Park, David K. Han, Hanseok Ko

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[121] arXiv:1907.11640 (cross-list from cs.CL) [pdf, other]: Title: On the Use/Misuse of the Term 'Phoneme'

Roger K. Moore, Lucy Skidmore

Comments: Accepted at INTERSPEECH-2019

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[122] arXiv:1907.12421 (cross-list from eess.AS) [pdf, other]: Title: MIRaGe: Multichannel Database Of Room Impulse Responses Measured On High-Resolution Cube-Shaped Grid In Multiple Acoustic Conditions

Jaroslav Čmejla, Tomáš Kounovský, Sharon Gannot, Zbyněk Koldovský, Pinchas Tandeitnik

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[123] arXiv:1907.12621 (cross-list from eess.AS) [pdf, other]: Title: Fast and Robust 3-D Sound Source Localization with DSVD-PHAT

Francois Grondin, James Glass

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[124] arXiv:1907.13121 (cross-list from eess.AS) [pdf, other]: Title: Multi-Frame Cross-Entropy Training for Convolutional Neural Networks in Speech Recognition

Tom Sercu, Neil Mallinar

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[125] arXiv:1907.13511 (cross-list from cs.CL) [pdf, other]: Title: Personalizing ASR for Dysarthric and Accented Speech with Limited Data

Joel Shor, Dotan Emanuel, Oran Lang, Omry Tuval, Michael Brenner, Julie Cattiau, Fernando Vieira, Maeve McNally, Taylor Charbonneau, Melissa Nollstadt, Avinatan Hassidim, Yossi Matias

Comments: 5 pages

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
