Audio and Speech Processing

Authors and titles for February 2021

Total of 208 entries; showing entries 101-125

[101] arXiv:2102.01991 (cross-list from cs.SD)

Title: Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram

Shengkui Zhao, Hao Wang, Trung Hieu Nguyen, Bin Ma

Comments: 5 pages, 2 figures, 4 tables, accepted by ICASSP 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[102] arXiv:2102.01993 (cross-list from cs.SD)

Title: Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses

Shengkui Zhao, Trung Hieu Nguyen, Bin Ma

Comments: 5 pages, 4 figures, 2 tables, accepted by ICASSP 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[103] arXiv:2102.02028 (cross-list from cs.SD)

Title: Music source separation conditioned on 3D point clouds

Francesc Lluís, Vasileios Chatziioannou, Alex Hofmann

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)

[104] arXiv:2102.02074 (cross-list from cs.SD)

Title: Data Generation Using Pass-phrase-dependent Deep Auto-encoders for Text-Dependent Speaker Verification

Achintya Kumar Sarkar, Md Sahidullah, Zheng-Hua Tan

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[105] arXiv:2102.02270 (cross-list from cs.CL)

Title: Confusion2vec 2.0: Enriching Ambiguous Spoken Language Representations with Subwords

Prashanth Gurunath Shivakumar, Panayiotis Georgiou, Shrikanth Narayanan

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[106] arXiv:2102.02282 (cross-list from cs.SD)

Title: Downbeat Tracking with Tempo-Invariant Convolutional Neural Networks

Bruno Di Giorgi, Matthias Mauch, Mark Levy

Comments: 7 pages, 5 figures, Proceedings of the 21st International Society for Music Information Retrieval Conference, ISMIR 2020

Journal-ref: Proceedings of the 21st International Society for Music Information Retrieval Conference (2020) 216-222

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

[107] arXiv:2102.02417 (cross-list from cs.SD)

Title: Audio Adversarial Examples: Attacks Using Vocal Masks

Kai Yuan Tay, Lynnette Ng, Wei Han Chua, Lucerne Loke, Danqi Ye, Melissa Chua

Comments: 9 pages, 1 figure, 2 tables. Submitted to COLING 2020

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

[108] arXiv:2102.02640 (cross-list from cs.SD)

Title: Low Bit-Rate Wideband Speech Coding: A Deep Generative Model based Approach

Gang Min, Xiongwei Zhang, Xia Zou, Xiangyang Liu

Comments: 6 pages

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

[109] arXiv:2102.02964 (cross-list from cs.SD)

Title: Diversity-Robust Acoustic Feature Signatures Based on Multiscale Fractal Dimension for Similarity Search of Environmental Sounds

Motohiro Sunouchi, Masaharu Yoshioka

Comments: 15 pages, 14 figures

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)

[110] arXiv:2102.03049 (cross-list from cs.SD)

Title: Benchmarking of eight recurrent neural network variants for breath phase and adventitious sound detection on a self-developed open-access lung sound database-HF_Lung_V1

Fu-Shun Hsu, Shang-Ran Huang, Chien-Wen Huang, Chao-Jung Huang, Yuan-Ren Cheng, Chun-Chieh Chen, Jack Hsiao, Chung-Wei Chen, Li-Chin Chen, Yen-Chun Lai, Bi-Fang Hsu, Nian-Jhen Lin, Wan-Lin Tsai, Yi-Lin Wu, Tzu-Ling Tseng, Ching-Ting Tseng, Yi-Tsun Chen, Feipei Lai

Comments: 48 pages, 8 figures. Accepted by PLoS One

Journal-ref: PLoS ONE, 2021, 16(7): e0254134

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[111] arXiv:2102.03055 (cross-list from cs.SD)

Title: Two-Stage Augmentation and Adaptive CTC Fusion for Improved Robustness of Multi-Stream End-to-End ASR

Ruizhi Li, Gregory Sell, Hynek Hermansky

Comments: Accepted at IEEE SLT 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

[112] arXiv:2102.03170 (cross-list from cs.SD)

Title: White-box Audio VST Effect Programming

Christopher Mitcheltree, Hideki Koike

Comments: The latest version of the system is to appear at EvoMUSART 2021 as a full paper. Audio samples of the latest system can be listened to at this https URL

Journal-ref: 4th Workshop on Machine Learning for Creativity and Design at NeurIPS 2020, Vancouver, Canada

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[113] arXiv:2102.03207 (cross-list from cs.SD)

Title: Real-time Denoising and Dereverberation with Tiny Recurrent U-Net

Hyeong-Seok Choi, Sungjin Park, Jie Hwan Lee, Hoon Heo, Dongsuk Jeon, Kyogu Lee

Comments: 5 pages, 2 figures, 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). arXiv admin note: text overlap with arXiv:2006.00687

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

[114] arXiv:2102.03229 (cross-list from cs.SD)

Title: Multi-Task Self-Supervised Pre-Training for Music Classification

Ho-Hsiang Wu, Chieh-Chi Kao, Qingming Tang, Ming Sun, Brian McFee, Juan Pablo Bello, Chao Wang

Comments: Copyright 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[115] arXiv:2102.03424 (cross-list from cs.CV)

Title: Learning Audio-Visual Correlations from Variational Cross-Modal Generation

Ye Zhu, Yu Wu, Hugo Latapie, Yi Yang, Yan Yan

Comments: Accepted to ICASSP 2021

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)

[116] arXiv:2102.03662 (cross-list from cs.CL)

Title: A bandit approach to curriculum generation for automatic speech recognition

Anastasia Kuznetsova, Anurag Kumar, Francis M. Tyers

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[117] arXiv:2102.03868 (cross-list from cs.SD)

Title: U-vectors: Generating clusterable speaker embedding from unlabeled data

M. F. Mridha, Abu Quwsar Ohi, Muhammad Mostafa Monowar, Md. Abdul Hamid, Md. Rashedul Islam, Yutaka Watanobe

Comments: 18 pages, 7 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[118] arXiv:2102.03957 (cross-list from cs.SD)

Title: Extracting the Auditory Attention in a Dual-Speaker Scenario from EEG using a Joint CNN-LSTM Model

Ivine Kuruvila, Jan Muncke, Eghart Fischer, Ulrich Hoppe

Comments: 18 pages, 6 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[119] arXiv:2102.04040 (cross-list from cs.SD)

Title: LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search

Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen, Tie-Yan Liu

Comments: Accepted to ICASSP 21

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[120] arXiv:2102.04051 (cross-list from cs.HC)

Title: HumanACGAN: conditional generative adversarial network with human-based auxiliary classifier and its evaluation in phoneme perception

Yota Ueda, Kazuki Fujii, Yuki Saito, Shinnosuke Takamichi, Yukino Baba, Hiroshi Saruwatari

Comments: 5 pages, 6 figures, to be published in 2021 IEEE International Conference on Acoustics, Speech and Signal Processing

Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[121] arXiv:2102.04056 (cross-list from cs.SD)

Title: Speaker and Direction Inferred Dual-channel Speech Separation

Chenxing Li, Jiaming Xu, Nima Mesgarani, Bo Xu

Comments: Accepted by ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[122] arXiv:2102.04062 (cross-list from cs.SD)

Title: An Update on a Progressively Expanded Database for Automated Lung Sound Analysis

Fu-Shun Hsu, Shang-Ran Huang, Chien-Wen Huang, Yuan-Ren Cheng, Chun-Chieh Chen, Jack Hsiao, Chung-Wei Chen, Feipei Lai

Comments: Under review, 14 pages, 5 figures, 3 tables

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[123] arXiv:2102.04198 (cross-list from cs.SD)

Title: ICASSP 2021 Deep Noise Suppression Challenge: Decoupling Magnitude and Phase Optimization with a Two-Stage Deep Network

Andong Li, Wenzhe Liu, Xiaoxue Luo, Chengshi Zheng, Xiaodong Li

Comments: 5 pages, 3 figures, accepted by ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[124] arXiv:2102.04254 (cross-list from cs.CE)

Title: A Data-Driven Approach to Violin Making

Sebastian Gonzalez, Davide Salvi, Daniel Baeza, Fabio Antonacci, Augusto Sarti

Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[125] arXiv:2102.04429 (cross-list from cs.SD)

Title: Federated Acoustic Modeling For Automatic Speech Recognition

Xiaodong Cui, Songtao Lu, Brian Kingsbury

Comments: Accepted by ICASSP 2021

Subjects: Sound (cs.SD); Distributed, Parallel, and Cluster Computing (cs.DC); Audio and Speech Processing (eess.AS)
