Sound

Authors and titles for May 2020

Total of 222 entries (showing entries 51-100)

[51] arXiv:2005.03295 (cross-list from eess.AS) [pdf, other]: Title: Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data

Seung-won Park, Doo-young Kim, Myun-chul Joe

Comments: To appear in INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[52] arXiv:2005.03329 (cross-list from eess.AS) [pdf, other]: Title: Segment Aggregation for short utterances speaker verification using raw waveforms

Seung-bin Kim, Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu

Comments: 5 pages, accepted by INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[53] arXiv:2005.03418 (cross-list from cs.CL) [pdf, other]: Title: The Perceptimatic English Benchmark for Speech Perception Models

Juliette Millet, Ewan Dunbar

Comments: Accepted to CogSci Conference 2020

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[54] arXiv:2005.03848 (cross-list from cs.CL) [pdf, other]: Title: Distilling Knowledge from Pre-trained Language Models via Text Smoothing

Xing Wu, Yibing Liu, Xiangyang Zhou, Dianhai Yu

Comments: 5 pages, 2 figures

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[55] arXiv:2005.03867 (cross-list from eess.AS) [pdf, other]: Title: Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification using CTC-based Soft VAD and Global Query Attention

Myunghun Jung, Youngmoon Jung, Jahyun Goo, Hoirin Kim

Comments: Accepted to Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[56] arXiv:2005.03889 (cross-list from eess.AS) [pdf, other]: Title: Neural Spatio-Temporal Beamformer for Target Speech Separation

Yong Xu, Meng Yu, Shi-Xiong Zhang, Lianwu Chen, Chao Weng, Jianming Liu, Dong Yu

Comments: Accepted to Interspeech 2020, Demo: this https URL

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[57] arXiv:2005.04132 (cross-list from eess.AS) [pdf, other]: Title: Asteroid: the PyTorch-based audio source separation toolkit for researchers

Manuel Pariente, Samuele Cornell, Joris Cosentino, Sunit Sivasankaran, Efthymios Tzinis, Jens Heitkaemper, Michel Olvera, Fabian-Robert Stöter, Mathieu Hu, Juan M. Martín-Doñas, David Ditter, Ariel Frank, Antoine Deleforge, Emmanuel Vincent

Comments: Submitted to Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[58] arXiv:2005.04288 (cross-list from eess.AS) [pdf, other]: Title: Incremental Learning for End-to-End Automatic Speech Recognition

Li Fu, Xiaoxiao Li, Libo Zi, Zhengchen Zhang, Youzheng Wu, Xiaodong He, Bowen Zhou

Comments: ASRU 2021

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[59] arXiv:2005.04426 (cross-list from eess.SP) [pdf, other]: Title: Temporal-Framing Adaptive Network for Heart Sound Segmentation without Prior Knowledge of State Duration

Xingyao Wang, Chengyu Liu, Yuwen Li, Xianghong Cheng, Jianqing Li, Gari D. Clifford

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[60] arXiv:2005.04587 (cross-list from eess.AS) [pdf, other]: Title: From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint

Zexin Cai, Chuxiong Zhang, Ming Li

Comments: Accepted by INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[61] arXiv:2005.04686 (cross-list from eess.AS) [pdf, other]: Title: SpEx+: A Complete Time Domain Speaker Extraction Network

Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

Comments: accepted in INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[62] arXiv:2005.05313 (cross-list from eess.AS) [pdf, other]: Title: Audio and Contact Microphones for Cough Detection

Thomas Drugman, Jerome Urbain, Nathalie Bauwens, Ricardo Chessini, Anne-Sophie Aubriot, Patrick Lebecque, Thierry Dutoit

Comments: arXiv admin note: substantial text overlap with arXiv:2001.00537

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[63] arXiv:2005.05487 (cross-list from cs.CL) [pdf, other]: Title: Exploring TTS without T Using Biologically/Psychologically Motivated Neural Network Modules (ZeroSpeech 2020)

Takashi Morita, Hiroki Koda

Comments: Accepted in INTERSPEECH 2020

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[64] arXiv:2005.05525 (cross-list from cs.CL) [pdf, other]: Title: DiscreTalk: Text-to-Speech as a Machine Translation Problem

Tomoki Hayashi, Shinji Watanabe

Comments: Submitted to INTERSPEECH 2020. The demo is available on this https URL

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[65] arXiv:2005.05592 (cross-list from cs.CV) [pdf, other]: Title: Discriminative Multi-modality Speech Recognition

Bo Xu, Cheng Lu, Yandong Guo, Jacob Wang

Comments: CVPR2020

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[66] arXiv:2005.06038 (cross-list from cs.LG) [pdf, other]: Title: Generalized Multi-view Shared Subspace Learning using View Bootstrapping

Krishna Somandepalli, Shrikanth Narayanan

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP); Machine Learning (stat.ML)
[67] arXiv:2005.06065 (cross-list from eess.AS) [pdf, other]: Title: Automatic Estimation of Intelligibility Measure for Consonants in Speech

Ali Abavisani, Mark Hasegawa-Johnson

Comments: 5 pages, 1 figure, 7 tables, submitted to the Interspeech 2020 conference

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[68] arXiv:2005.06650 (cross-list from eess.AS) [pdf, other]: Title: Memory Controlled Sequential Self Attention for Sound Recognition

Arjun Pankajakshan, Helen L. Bear, Vinod Subramanian, Emmanouil Benetos

Comments: Accepted to INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[69] arXiv:2005.06720 (cross-list from eess.AS) [pdf, other]: Title: Streaming keyword spotting on mobile devices

Oleg Rybakov, Natasha Kononenko, Niranjan Subrahmanya, Mirko Visontai, Stella Laurenzo

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[70] arXiv:2005.06959 (cross-list from eess.AS) [pdf, other]: Title: Consonant gemination in Italian: the affricate and fricative case

Maria Gabriella Di Benedetto, Luca De Nardis

Comments: Submitted to Speech Communication. arXiv admin note: substantial text overlap with arXiv:2005.06960

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[71] arXiv:2005.06960 (cross-list from eess.AS) [pdf, other]: Title: Consonant gemination in Italian: the nasal and liquid case

Maria-Gabriella Di Benedetto, Luca De Nardis

Comments: Submitted to Speech Communication. arXiv admin note: substantial text overlap with arXiv:2005.06959

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[72] arXiv:2005.06987 (cross-list from cs.SI) [pdf, other]: Title: The universality of skipping behaviours on music streaming platforms

Jonathan Donier

Subjects: Social and Information Networks (cs.SI); Sound (cs.SD); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[73] arXiv:2005.06993 (cross-list from cs.LG) [pdf, other]: Title: deepSELF: An Open Source Deep Self End-to-End Learning Framework

Tomoya Koike, Kun Qian, Björn W. Schuller, Yoshiharu Yamamoto

Comments: 4 pages, 1 figure

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74] arXiv:2005.07006 (cross-list from eess.AS) [pdf, other]: Title: Foreground-Background Ambient Sound Scene Separation

Michel Olvera (MULTISPEECH), Emmanuel Vincent (MULTISPEECH), Romain Serizel (MULTISPEECH), Gilles Gasso (LITIS)

Journal-ref: 28th European Signal Processing Conference (EUSIPCO), Jan 2021, Amsterdam, Netherlands

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[75] arXiv:2005.07029 (cross-list from eess.AS) [pdf, other]: Title: DARTS-ASR: Differentiable Architecture Search for Multilingual Speech Recognition and Adaptation

Yi-Chen Chen, Jui-Yang Hsu, Cheng-Kuang Lee, Hung-yi Lee

Comments: Accepted at INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[76] arXiv:2005.07036 (cross-list from eess.AS) [pdf, other]: Title: Infant Crying Detection in Real-World Environments

Xuewen Yao, Megan Micheletti, Mckensey Johnson, Edison Thomaz, Kaya de Barbaro

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[77] arXiv:2005.07057 (cross-list from eess.AS) [pdf, other]: Title: Vibration Analysis in Bearings for Failure Prevention using CNN

Luis A. Pinedo-Sanchez, Diego A. Mercado-Ravell, Carlos A. Carballo-Monsivais

Comments: This paper is a preprint of a paper submitted to Journal of the Brazilian Society of Mechanical Sciences and Engineering

Journal-ref: J Braz. Soc. Mech. Sci. Eng. 42, 628 (2020)

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[78] arXiv:2005.07143 (cross-list from eess.AS) [pdf, other]: Title: ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification

Brecht Desplanques, Jenthe Thienpondt, Kris Demuynck

Comments: proceedings of INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[79] arXiv:2005.07157 (cross-list from eess.AS) [pdf, other]: Title: You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation

Aleksandr Laptev, Roman Korostik, Aleksey Svischev, Andrei Andrusenko, Ivan Medennikov, Sergey Rybin

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[80] arXiv:2005.07272 (cross-list from eess.AS) [pdf, other]: Title: Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario

Ivan Medennikov, Maxim Korenevsky, Tatiana Prisyach, Yuri Khokhlov, Mariya Korenevskaya, Ivan Sorokin, Tatiana Timofeeva, Anton Mitrofanov, Andrei Andrusenko, Ivan Podluzhny, Aleksandr Laptev, Aleksei Romanenko

Comments: Accepted to Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[81] arXiv:2005.07383 (cross-list from eess.AS) [pdf, other]: Title: On Bottleneck Features for Text-Dependent Speaker Verification Using X-vectors

Achintya Kumar Sarkar, Zheng-Hua Tan

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[82] arXiv:2005.07394 (cross-list from cs.CL) [pdf, other]: Title: Contextualizing ASR Lattice Rescoring with Hybrid Pointer Network Language Model

Da-Rong Liu, Chunxi Liu, Frank Zhang, Gabriel Synnaeve, Yatharth Saraf, Geoffrey Zweig

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[83] arXiv:2005.07412 (cross-list from eess.AS) [pdf, other]: Title: WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU

Po-chun Hsu, Hung-yi Lee

Comments: INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[84] arXiv:2005.07549 (cross-list from eess.AS) [pdf, other]: Title: Siamese Neural Networks for Class Activity Detection

Hang Li, Zhiwei Wang, Jiliang Tang, Wenbiao Ding, Zitao Liu

Comments: The 21st International Conference on Artificial Intelligence in Education (AIED), 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[85] arXiv:2005.07551 (cross-list from eess.AS) [pdf, other]: Title: Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression

Nils L. Westhausen, Bernd T. Meyer

Comments: Accepted by Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[86] arXiv:2005.07578 (cross-list from eess.AS) [pdf, other]: Title: Context-Dependent Acoustic Modeling without Explicit Phone Clustering

Tina Raissi, Eugen Beck, Ralf Schlüter, Hermann Ney

Comments: Proceedings of Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[87] arXiv:2005.07623 (cross-list from eess.AS) [pdf, other]: Title: An Auto Encoder For Audio Dolphin Communication

Daniel Kohlsdorf, Denise Herzing, Thad Starner

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[88] arXiv:2005.07631 (cross-list from eess.AS) [pdf, other]: Title: Nonlinear Residual Echo Suppression Based on Multi-stream Conv-TasNet

Hongsheng Chen, Teng Xiang, Kai Chen, Jing Lu

Comments: 5 pages, 3 figures

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[89] arXiv:2005.07757 (cross-list from eess.AS) [pdf, other]: Title: "I have vxxx bxx connexxxn!": Facing Packet Loss in Deep Speech Emotion Recognition

Mostafa M. Mohamed, Björn W. Schuller

Comments: Submitted to INTERSPEECH 2020. 4 Pages + 1 page for references. 4 Figures and 2 Tables

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[90] arXiv:2005.07777 (cross-list from eess.AS) [pdf, other]: Title: ConcealNet: An End-to-end Neural Network for Packet Loss Concealment in Deep Speech Emotion Recognition

Mostafa M. Mohamed, Björn W. Schuller

Comments: Submission for INTERSPEECH 2020. 4 Pages + 1 references page. 4 Tables, 3 Figures

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[91] arXiv:2005.07788 (cross-list from eess.AS) [pdf, other]: Title: Reliable Local Explanations for Machine Listening

Saumitra Mishra, Emmanouil Benetos, Bob L. Sturm, Simon Dixon

Comments: 8 pages plus references. Accepted at the IJCNN 2020 Special Session on Explainable Computational/Artificial Intelligence. Camera-ready version

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[92] arXiv:2005.07794 (cross-list from eess.AS) [pdf, other]: Title: On Deep Speech Packet Loss Concealment: A Mini-Survey

Mostafa M. Mohamed, Mina A. Nessiem, Björn W. Schuller

Comments: Submission for INTERSPEECH 2020. 4 pages + 1 references page. 3 Figures and 1 Table

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[93] arXiv:2005.07799 (cross-list from eess.AS) [pdf, other]: Title: JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment

Dan Lim, Won Jang, Gyeonghwan O, Heayoung Park, Bongwan Kim, Jaesam Yoon

Comments: Accepted for publication in Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[94] arXiv:2005.07809 (cross-list from eess.AS) [pdf, other]: Title: Feature Fusion Strategies for End-to-End Evaluation of Cognitive Behavior Therapy Sessions

Zhuohao Chen, Nikolaos Flemotomos, Victor Ardulov, Torrey A. Creed, Zac E. Imel, David C. Atkins, Shrikanth Narayanan

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[95] arXiv:2005.07810 (cross-list from eess.AS) [pdf, other]: Title: Unsupervised Cross-Domain Speech-to-Speech Conversion with Time-Frequency Consistency

Mohammad Asif Khan, Fabien Cardinaux, Stefan Uhlich, Marc Ferras, Asja Fischer

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[96] arXiv:2005.07815 (cross-list from eess.AS) [pdf, other]: Title: ConVoice: Real-Time Zero-Shot Voice Style Transfer with Convolutional Network

Yurii Rebryk, Stanislav Beliaev

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[97] arXiv:2005.07817 (cross-list from eess.AS) [pdf, other]: Title: Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification

Yanpei Shi, Qiang Huang, Thomas Hain

Comments: Accepted for presentation at Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[98] arXiv:2005.07818 (cross-list from eess.AS) [pdf, other]: Title: Speaker Re-identification with Speaker Dependent Speech Enhancement

Yanpei Shi, Qiang Huang, Thomas Hain

Comments: Accepted for presentation at Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[99] arXiv:2005.07850 (cross-list from eess.AS) [pdf, other]: Title: Large scale weakly and semi-supervised learning for low-resource video ASR

Kritika Singh, Vimal Manohar, Alex Xiao, Sergey Edunov, Ross Girshick, Vitaliy Liptchinsky, Christian Fuegen, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[100] arXiv:2005.07884 (cross-list from eess.AS) [pdf, other]: Title: Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction

Yi Zhao, Haoyu Li, Cheng-I Lai, Jennifer Williams, Erica Cooper, Junichi Yamagishi

Comments: Submitted to Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
