Sound

Authors and titles for February 2021

Total of 183 entries (this page lists entries 1-50)


[1] arXiv:2102.00151 [pdf, other]: Title: Expressive Neural Voice Cloning

Paarth Neekhara, Shehzeen Hussain, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley

Comments: 12 pages, 2 figures, 2 tables

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[2] arXiv:2102.00201 [pdf, other]: Title: Melon Playlist Dataset: a public dataset for audio-based playlist generation and music tagging

Andres Ferraro, Yuntae Kim, Soohyeon Lee, Biho Kim, Namjun Jo, Semi Lim, Suyon Lim, Jungtaek Jang, Sehwan Kim, Xavier Serra, Dmitry Bogdanov

Comments: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[3] arXiv:2102.00291 [pdf, other]: Title: Speech Recognition by Simply Fine-tuning BERT

Wen-Chin Huang, Chia-Hua Wu, Shang-Bao Luo, Kuan-Yu Chen, Hsin-Min Wang, Tomoki Toda

Comments: Accepted to ICASSP 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[4] arXiv:2102.00313 [pdf, other]: Title: Cortical Features for Defense Against Adversarial Audio Attacks

Ilya Kavalerov, Ruijie Zheng, Wojciech Czaja, Rama Chellappa

Comments: Co-author legal name changed

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[5] arXiv:2102.00382 [pdf, other]: Title: Structure-Aware Audio-to-Score Alignment using Progressively Dilated Convolutional Neural Networks

Ruchit Agrawal, Daniel Wolff, Simon Dixon

Comments: ICASSP 2021 camera-ready version. Copyrights belong to IEEE

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[6] arXiv:2102.00429 [pdf, other]: Title: High Fidelity Speech Regeneration with Application to Speech Enhancement

Adam Polyak, Lior Wolf, Yossi Adi, Ori Kabeli, Yaniv Taigman

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[7] arXiv:2102.00550 [pdf, other]: Title: Boosting the Predictive Accuracy of Singer Identification Using Discrete Wavelet Transform For Feature Extraction

Victoire Djimna Noyum, Younous Perieukeu Mofenjou, Cyrille Feudjio, Alkan Göktug, Ernest Fokoué

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[8] arXiv:2102.00616 [pdf, other]: Title: Neural Network architectures to classify emotions in Indian Classical Music

Uddalok Sarkar, Sayan Nag, Medha Basu, Archi Banerjee, Shankha Sanyal, Ranjan Sengupta, Dipak Ghosh

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[9] arXiv:2102.00851 [pdf, other]: Title: Rich Prosody Diversity Modelling with Phone-level Mixture Density Network

Chenpeng Du, Kai Yu

Comments: Accepted to Interspeech 2021

Subjects: Sound (cs.SD)
[10] arXiv:2102.01133 [pdf, other]: Title: Deep Music Information Dynamics

Shlomo Dubnov

Journal-ref: The 2020 Joint Conference on AI Music Creativity, October 19-23, 2020, Royal Institute of Technology (KTH), Stockholm, Sweden

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[11] arXiv:2102.01243 [pdf, other]: Title: PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation

Yuan Gong, Yu-An Chung, James Glass

Comments: Published in IEEE/ACM Transactions on Audio Speech and Language Processing. Code at this https URL

Journal-ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 3292-3306, 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[12] arXiv:2102.01547 [pdf, other]: Title: WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit

Zhuoyuan Yao, Di Wu, Xiong Wang, Binbin Zhang, Fan Yu, Chao Yang, Zhendong Peng, Xiaoyu Chen, Lei Xie, Xin Lei

Comments: 5 pages, 2 figures, 4 tables

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[13] arXiv:2102.01640 [pdf, other]: Title: SPEAK WITH YOUR HANDS Using Continuous Hand Gestures to control Articulatory Speech Synthesizer

Pramit Saha, Debasish Ray Mohapatra, Sidney Fels

Comments: 2 pages, 1 figure

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[14] arXiv:2102.01692 [pdf, other]: Title: Generacion de voces artificiales infantiles en castellano con acento costarricense [Generation of artificial child voices in Spanish with a Costa Rican accent]

Ana Lilia Alvarez-Blanco, Eugenia Cordoba-Warner, Marvin Coto-Jimenez, Vivian Fallas-Lopez, Maribel Morales Rodriguez

Comments: 12 pages, in Spanish

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[15] arXiv:2102.01754 [pdf, other]: Title: LSSED: a large-scale dataset and benchmark for speech emotion recognition

Weiquan Fan, Xiangmin Xu, Xiaofen Xing, Weidong Chen, Dongyan Huang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[16] arXiv:2102.01760 [pdf, other]: Title: A Speaker Verification Backend with Robust Performance across Conditions

Luciana Ferrer, Mitchell McLaren, Niko Brummer

Journal-ref: Computer Speech and Language, Volume 71, 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[17] arXiv:2102.01813 [pdf, other]: Title: Speech Emotion Recognition with Multiscale Area Attention and Data Augmentation

Mingke Xu, Fan Zhang, Xiaodong Cui, Wei Zhang

Comments: Accepted by ICASSP 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[18] arXiv:2102.01927 [pdf, other]: Title: Impact of Sound Duration and Inactive Frames on Sound Event Detection Performance

Keisuke Imoto, Sakiko Mishima, Yumi Arai, Reishi Kondo

Comments: Accepted to ICASSP 2021. arXiv admin note: text overlap with arXiv:2006.15253

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19] arXiv:2102.01930 [pdf, other]: Title: General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework

Yucheng Zhao, Dacheng Yin, Chong Luo, Zhiyuan Zhao, Chuanxin Tang, Wenjun Zeng, Zheng-Jun Zha

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[20] arXiv:2102.01991 [pdf, other]: Title: Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram

Shengkui Zhao, Hao Wang, Trung Hieu Nguyen, Bin Ma

Comments: 5 pages, 2 figures, 4 tables, accepted by ICASSP 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[21] arXiv:2102.01993 [pdf, html, other]: Title: Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses

Shengkui Zhao, Trung Hieu Nguyen, Bin Ma

Comments: 5 pages, 4 figures, 2 tables, accepted by ICASSP 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[22] arXiv:2102.02028 [pdf, other]: Title: Music source separation conditioned on 3D point clouds

Francesc Lluís, Vasileios Chatziioannou, Alex Hofmann

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[23] arXiv:2102.02063 [pdf, other]: Title: Acoustic Structure Inverse Design and Optimization Using Deep Learning

Xuecong Sun, Han Jia, Yuzhen Yang, Han Zhao, Yafeng Bi, Zhaoyong Sun, Jun Yang

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Applied Physics (physics.app-ph)
[24] arXiv:2102.02074 [pdf, other]: Title: Data Generation Using Pass-phrase-dependent Deep Auto-encoders for Text-Dependent Speaker Verification

Achintya Kumar Sarkar, Md Sahidullah, Zheng-Hua Tan

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[25] arXiv:2102.02282 [pdf, other]: Title: Downbeat Tracking with Tempo-Invariant Convolutional Neural Networks

Bruno Di Giorgi, Matthias Mauch, Mark Levy

Comments: 7 pages, 5 figures, Proceedings of the 21st International Society for Music Information Retrieval Conference, ISMIR 2020

Journal-ref: Proceedings of the 21st International Society for Music Information Retrieval Conference (2020) 216-222

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[26] arXiv:2102.02417 [pdf, other]: Title: Audio Adversarial Examples: Attacks Using Vocal Masks

Kai Yuan Tay, Lynnette Ng, Wei Han Chua, Lucerne Loke, Danqi Ye, Melissa Chua

Comments: 9 pages, 1 figure, 2 tables. Submitted to COLING2020

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[27] arXiv:2102.02640 [pdf, other]: Title: Low Bit-Rate Wideband Speech Coding: A Deep Generative Model based Approach

Gang Min, Xiongwei Zhang, Xia Zou, Xiangyang Liu

Comments: 6 pages

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[28] arXiv:2102.02917 [pdf, other]: Title: Chord Embeddings: Analyzing What They Capture and Their Role for Next Chord Prediction and Artist Attribute Prediction

Allison Lahnala, Gauri Kambhatla, Jiajun Peng, Matthew Whitehead, Gillian Minnehan, Eric Guldan, Jonathan K. Kummerfeld, Anıl Çamcı, Rada Mihalcea

Comments: 16 pages, accepted to EvoMUSART

Journal-ref: Computational Intelligence in Music, Sound, Art and Design, 10th International Conference, EvoMUSART 2021

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[29] arXiv:2102.02964 [pdf, other]: Title: Diversity-Robust Acoustic Feature Signatures Based on Multiscale Fractal Dimension for Similarity Search of Environmental Sounds

Motohiro Sunouchi, Masaharu Yoshioka

Comments: 15 pages, 14 figures

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[30] arXiv:2102.03049 [pdf, other]: Title: Benchmarking of eight recurrent neural network variants for breath phase and adventitious sound detection on a self-developed open-access lung sound database-HF_Lung_V1

Fu-Shun Hsu, Shang-Ran Huang, Chien-Wen Huang, Chao-Jung Huang, Yuan-Ren Cheng, Chun-Chieh Chen, Jack Hsiao, Chung-Wei Chen, Li-Chin Chen, Yen-Chun Lai, Bi-Fang Hsu, Nian-Jhen Lin, Wan-Lin Tsai, Yi-Lin Wu, Tzu-Ling Tseng, Ching-Ting Tseng, Yi-Tsun Chen, Feipei Lai

Comments: 48 pages, 8 figures. Accepted by PLoS One

Journal-ref: PLoS ONE, 2021, 16(7): e0254134

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[31] arXiv:2102.03055 [pdf, other]: Title: Two-Stage Augmentation and Adaptive CTC Fusion for Improved Robustness of Multi-Stream End-to-End ASR

Ruizhi Li, Gregory Sell, Hynek Hermansky

Comments: Accepted at IEEE SLT 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[32] arXiv:2102.03170 [pdf, other]: Title: White-box Audio VST Effect Programming

Christopher Mitcheltree, Hideki Koike

Comments: The latest version of the system is to appear at EvoMUSART 2021 as a full paper. Audio samples of the latest system can be listened to at this https URL

Journal-ref: 4th Workshop on Machine Learning for Creativity and Design at NeurIPS 2020, Vancouver, Canada

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[33] arXiv:2102.03207 [pdf, other]: Title: Real-time Denoising and Dereverberation with Tiny Recurrent U-Net

Hyeong-Seok Choi, Sungjin Park, Jie Hwan Lee, Hoon Heo, Dongsuk Jeon, Kyogu Lee

Comments: 5 pages, 2 figures, 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). arXiv admin note: text overlap with arXiv:2006.00687

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[34] arXiv:2102.03229 [pdf, other]: Title: Multi-Task Self-Supervised Pre-Training for Music Classification

Ho-Hsiang Wu, Chieh-Chi Kao, Qingming Tang, Ming Sun, Brian McFee, Juan Pablo Bello, Chao Wang

Comments: Copyright 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[35] arXiv:2102.03868 [pdf, other]: Title: U-vectors: Generating clusterable speaker embedding from unlabeled data

M. F. Mridha, Abu Quwsar Ohi, Muhammad Mostafa Monowar, Md. Abdul Hamid, Md. Rashedul Islam, Yutaka Watanobe

Comments: 18 pages, 7 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[36] arXiv:2102.03957 [pdf, other]: Title: Extracting the Auditory Attention in a Dual-Speaker Scenario from EEG using a Joint CNN-LSTM Model

Ivine Kuruvila, Jan Muncke, Eghart Fischer, Ulrich Hoppe

Comments: 18 pages, 6 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[37] arXiv:2102.04040 [pdf, other]: Title: LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search

Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen, Tie-Yan Liu

Comments: Accepted to ICASSP 21

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[38] arXiv:2102.04056 [pdf, other]: Title: Speaker and Direction Inferred Dual-channel Speech Separation

Chenxing Li, Jiaming Xu, Nima Mesgarani, Bo Xu

Comments: Accepted by ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[39] arXiv:2102.04062 [pdf, other]: Title: An Update on a Progressively Expanded Database for Automated Lung Sound Analysis

Fu-Shun Hsu, Shang-Ran Huang, Chien-Wen Huang, Yuan-Ren Cheng, Chun-Chieh Chen, Jack Hsiao, Chung-Wei Chen, Feipei Lai

Comments: Under review, 14 pages, 5 figures, 3 tables

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[40] arXiv:2102.04198 [pdf, other]: Title: ICASSP 2021 Deep Noise Suppression Challenge: Decoupling Magnitude and Phase Optimization with a Two-Stage Deep Network

Andong Li, Wenzhe Liu, Xiaoxue Luo, Chengshi Zheng, Xiaodong Li

Comments: 5 pages, 3 figures, accepted by ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[41] arXiv:2102.04429 [pdf, other]: Title: Federated Acoustic Modeling For Automatic Speech Recognition

Xiaodong Cui, Songtao Lu, Brian Kingsbury

Comments: Accepted by ICASSP 2021

Subjects: Sound (cs.SD); Distributed, Parallel, and Cluster Computing (cs.DC); Audio and Speech Processing (eess.AS)
[42] arXiv:2102.04588 [pdf, other]: Title: A comparative study of two-dimensional vocal tract acoustic modeling based on Finite-Difference Time-Domain methods

Debasish Ray Mohapatra, Victor Zappi, Sidney Fels

Comments: 4 pages, 3 figures

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[43] arXiv:2102.04680 [pdf, other]: Title: TräumerAI: Dreaming Music with StyleGAN

Dasaem Jeong, Seungheon Doh, Taegyun Kwon

Comments: presented in NeurIPS Workshop 2020: Machine Learning for Creativity and Design

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[44] arXiv:2102.04880 [pdf, other]: Title: Diagnosis of COVID-19 and Non-COVID-19 Patients by Classifying Only a Single Cough Sound

Masoud Maleki

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Optimization and Control (math.OC)
[45] arXiv:2102.04945 [pdf, other]: Title: On permutation invariant training for speech source separation

Xiaoyu Liu, Jordi Pons

Comments: In proceedings of ICASSP2021

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[46] arXiv:2102.05151 [pdf, other]: Title: Enhancing Audio Augmentation Methods with Consistency Learning

Turab Iqbal, Karim Helwani, Arvindh Krishnaswamy, Wenwu Wang

Comments: Accepted to 46th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[47] arXiv:2102.05225 [pdf, other]: Title: Exploring Automatic COVID-19 Diagnosis via voice and symptoms from Crowdsourced Data

Jing Han, Chloë Brown, Jagmohan Chauhan, Andreas Grammenos, Apinan Hasthanasombat, Dimitris Spathis, Tong Xia, Pietro Cicuta, Cecilia Mascolo

Comments: 5 pages, 3 figures, 2 tables, Accepted for publication at ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[48] arXiv:2102.05288 [pdf, other]: Title: Sound Event Detection Based on Curriculum Learning Considering Learning Difficulty of Events

Noriyuki Tonami, Keisuke Imoto, Yuki Okamoto, Takahiro Fukumori, Yoichi Yamashita

Comments: Accepted to ICASSP 2021

Subjects: Sound (cs.SD)
[49] arXiv:2102.05630 [pdf, other]: Title: Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning

Giuseppe Ruggiero, Enrico Zovato, Luigi Di Caro, Vincent Pollet

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[50] arXiv:2102.05749 [pdf, other]: Title: Self-Supervised VQ-VAE for One-Shot Music Style Transfer

Ondřej Cífka, Alexey Ozerov, Umut Şimşekli, Gaël Richard

Comments: ICASSP 2021. Website: this https URL

Journal-ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (2021) 96-100

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)