Sound (cs.SD)

Authors and titles for October 2021

Total of 322 entries (showing entries 51-150)
[51] arXiv:2110.05087 [pdf, other]
Title: A Multi-Resolution Front-End for End-to-End Speech Anti-Spoofing
Wei Liu, Meng Sun, Xiongwei Zhang, Hugo Van hamme, Thomas Fang Zheng
Comments: submitted to ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[52] arXiv:2110.05580 [pdf, other]
Title: vocadito: A dataset of solo vocals with $f_0$, note, and lyric annotations
Rachel M. Bittner, Katherine Pasalo, Juan José Bosch, Gabriel Meseguer-Brocal, David Rubinstein
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[53] arXiv:2110.05587 [pdf, other]
Title: Evaluation of Latent Space Disentanglement in the Presence of Interdependent Attributes
Karn N. Watcharasupat, Alexander Lerch
Comments: Submitted to the Late-Breaking Demo Session of the 22nd International Society for Music Information Retrieval Conference
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Information Theory (cs.IT); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[54] arXiv:2110.05713 [pdf, other]
Title: Foster Strengths and Circumvent Weaknesses: a Speech Enhancement Framework with Two-branch Collaborative Learning
Wenxin Tai, Jiajia Li, Yixiang Wang, Tian Lan, Qiao Liu
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[55] arXiv:2110.05765 [pdf, other]
Title: Music Sentiment Transfer
Miles Sigel, Michael Zhou, Jiebo Luo
Comments: NSF REU: Computational Methods for Understanding Music, Media, and Minds, University of Rochester
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[56] arXiv:2110.05777 [pdf, other]
Title: Large-scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification
Zhengyang Chen, Sanyuan Chen, Yu Wu, Yao Qian, Chengyi Wang, Shujie Liu, Yanmin Qian, Michael Zeng
Comments: Accepted by ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[57] arXiv:2110.05798 [pdf, other]
Title: Adapting TTS models For New Speakers using Transfer Learning
Paarth Neekhara, Jason Li, Boris Ginsburg
Comments: Submitted to Interspeech 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[58] arXiv:2110.05866 [pdf, other]
Title: MetricGAN-U: Unsupervised speech enhancement/ dereverberation based only on noisy/ reverberated speech
Szu-Wei Fu, Cheng Yu, Kuo-Hsuan Hung, Mirco Ravanelli, Yu Tsao
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[59] arXiv:2110.05966 [pdf, other]
Title: Multi-channel Narrow-band Deep Speech Separation with Full-band Permutation Invariant Training
Changsheng Quan, Xiaofei Li
Comments: accepted by ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[60] arXiv:2110.05975 [pdf, other]
Title: Multi-Channel Far-Field Speaker Verification with Large-Scale Ad-hoc Microphone Arrays
Chengdong Liang, Yijiang Chen, Jiadi Yao, Xiao-Lei Zhang
Comments: 5 pages, 3 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[61] arXiv:2110.06100 [pdf, other]
Title: Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information
Zhongjie Ye, Helin Wang, Dongchao Yang, Yuexian Zou
Comments: 5 pages, 1 figure, accepted by DCASE 2021 workshop
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[62] arXiv:2110.06123 [pdf, other]
Title: COVID-19 Diagnosis from Cough Acoustics using ConvNets and Data Augmentation
Saranga Kingkor Mahanta, Darsh Kaushik, Shubham Jain, Hoang Van Truong, Koushik Guha
Comments: DiCOVA, top 1st, This work has been submitted to the IEEE for possible publication
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[63] arXiv:2110.06280 [pdf, other]
Title: S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations
Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe, Tomoki Toda
Comments: Submitted to ICASSP 2022. Code available at: this https URL
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[64] arXiv:2110.06323 [pdf, other]
Title: An Annihilating Filter-Based DOA Estimation for Uniform Linear Array
Son Phan, Lam Pham
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[65] arXiv:2110.06371 [pdf, other]
Title: Algorithmic Composition by Autonomous Systems with Multiple Time-Scales
Risto Holopainen
Comments: 28 pages, 3 figures. Submitted to Divergence Press
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Adaptation and Self-Organizing Systems (nlin.AO)
[66] arXiv:2110.06467 [pdf, other]
Title: Dual-branch Attention-In-Attention Transformer for single-channel speech enhancement
Guochen Yu, Andong Li, Chengshi Zheng, Yinuo Guo, Yutian Wang, Hui Wang
Comments: Accepted by ICASSP 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[67] arXiv:2110.06494 [pdf, other]
Title: Music Source Separation with Deep Equilibrium Models
Yuichiro Koyama, Naoki Murata, Stefan Uhlich, Giorgio Fabbro, Shusuke Takahashi, Yuki Mitsufuji
Comments: 5 pages, 4 figures, accepted for publication in IEEE ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[68] arXiv:2110.06501 [pdf, other]
Title: Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection
Yuichiro Koyama, Kazuhide Shigemi, Masafumi Takahashi, Kazuki Shimada, Naoya Takahashi, Emiru Tsunoo, Shusuke Takahashi, Yuki Mitsufuji
Comments: 5 pages, 2 figures, accepted for publication in IEEE ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[69] arXiv:2110.06525 [pdf, other]
Title: Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks
Bo-Yu Chen, Wei-Han Hsu, Wei-Hsiang Liao, Marco A. Martínez Ramírez, Yuki Mitsufuji, Yi-Hsuan Yang
Comments: To be published at ICASSP 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[70] arXiv:2110.06534 [pdf, other]
Title: Simple Attention Module based Speaker Verification with Iterative noisy label detection
Xiaoyi Qin, Na Li, Chao Weng, Dan Su, Ming Li
Comments: submitted to ICASSP2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[71] arXiv:2110.06543 [pdf, other]
Title: EIHW-MTG DiCOVA 2021 Challenge System Report
Adria Mallol-Ragolta, Helena Cuesta, Emilia Gómez, Björn W. Schuller
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[72] arXiv:2110.06565 [pdf, other]
Title: Duality Temporal-channel-frequency Attention Enhanced Speaker Representation Learning
Li Zhang, Qing Wang, Lei Xie
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[73] arXiv:2110.06634 [pdf, other]
Title: End-to-end translation of human neural activity to speech with a dual-dual generative adversarial network
Yina Guo, Xiaofei Zhang, Zhenying Gong, Anhong Wang, Wenwu Wang
Comments: 12 pages, 13 figures
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[74] arXiv:2110.06707 [pdf, html, other]
Title: Singer separation for karaoke content generation
Hsuan-Yu Lin, Xuanjun Chen, Jyh-Shing Roger Jang
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[75] arXiv:2110.06999 [pdf, other]
Title: Study of positional encoding approaches for Audio Spectrogram Transformers
Leonardo Pepino, Pablo Riera, Luciana Ferrer
Comments: Submitted to ICASSP 2022. 5 pages, 3 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[76] arXiv:2110.07027 [pdf, other]
Title: Comparison of SVD and factorized TDNN approaches for speech to text
Jeffrey Josanne Michael, Nagendra Kumar Goel, Navneeth K, Jonas Robertson, Shravan Mishra
Comments: 4 pages, 1 figure, 3 tables
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[77] arXiv:2110.07210 [pdf, other]
Title: Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data
Haitong Zhang, Yue Lin
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[78] arXiv:2110.07311 [pdf, other]
Title: SpecSinGAN: Sound Effect Variation Synthesis Using Single-Image GANs
Adrián Barahona-Ríos, Tom Collins
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[79] arXiv:2110.07313 [pdf, other]
Title: Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks
Sangeeta Srivastava, Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika Singh, Yatharth Saraf
Comments: 4 pages. Submitted to ICASSP in Oct 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[80] arXiv:2110.07393 [pdf, other]
Title: M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge
Fan Yu, Shiliang Zhang, Yihui Fu, Lei Xie, Siqi Zheng, Zhihao Du, Weilong Huang, Pengcheng Guo, Zhijie Yan, Bin Ma, Xin Xu, Hui Bu
Comments: Accepted by ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[81] arXiv:2110.07607 [pdf, other]
Title: HumBugDB: A Large-scale Acoustic Mosquito Dataset
Ivan Kiskin, Marianne Sinka, Adam D. Cobb, Waqas Rafique, Lawrence Wang, Davide Zilli, Benjamin Gutteridge, Rinita Dam, Theodoros Marinos, Yunpeng Li, Dickson Msaky, Emmanuel Kaindoa, Gerard Killeen, Eva Herreros-Moya, Kathy J. Willis, Stephen J. Roberts
Comments: Accepted at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks. 10 pages main, 39 pages including appendix. This paper accompanies the dataset found at this https URL with corresponding code at this https URL
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[82] arXiv:2110.08090 [pdf, other]
Title: Using DeepProbLog to perform Complex Event Processing on an Audio Stream
Marc Roig Vilamala, Tianwei Xing, Harrison Taylor, Luis Garcia, Mani Srivastava, Lance Kaplan, Alun Preece, Angelika Kimmig, Federico Cerutti
Comments: 8 pages, 3 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[83] arXiv:2110.08213 [pdf, other]
Title: Towards Identity Preserving Normal to Dysarthric Voice Conversion
Wen-Chin Huang, Bence Mark Halpern, Lester Phillip Violeta, Odette Scharenborg, Tomoki Toda
Comments: Submitted to ICASSP 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
[84] arXiv:2110.08352 [pdf, other]
Title: Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR via Supernet
Haichuan Yang, Yuan Shangguan, Dilin Wang, Meng Li, Pierce Chuang, Xiaohui Zhang, Ganesh Venkatesh, Ozlem Kalinli, Vikas Chandra
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[85] arXiv:2110.08437 [pdf, other]
Title: NN3A: Neural Network supported Acoustic Echo Cancellation, Noise Suppression and Automatic Gain Control for Real-Time Communications
Ziteng Wang, Yueyue Na, Biao Tian, Qiang Fu
Comments: submitted to ICASSP2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[86] arXiv:2110.08439 [pdf, other]
Title: Controllable Multichannel Speech Dereverberation based on Deep Neural Networks
Ziteng Wang, Yueyue Na, Biao Tian, Qiang Fu
Comments: submitted to ICASSP2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[87] arXiv:2110.08634 [pdf, other]
Title: Towards Robust Waveform-Based Acoustic Models
Dino Oglic, Zoran Cvetkovic, Peter Sollich, Steve Renals, Bin Yu
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[88] arXiv:2110.08731 [pdf, other]
Title: Improving End-To-End Modeling for Mispronunciation Detection with Effective Augmentation Mechanisms
Tien-Hong Lo, Yao-Ting Sung, Berlin Chen
Comments: 7 pages, 2 figures, 4 tables, accepted to Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2021)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[89] arXiv:2110.08821 [pdf, other]
Title: Storage and Authentication of Audio Footage for IoAuT Devices Using Distributed Ledger Technology
Srivatsav Chenna, Nils Peters
Comments: 11 pages, 3 Figures, 1 code listing
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[90] arXiv:2110.08895 [pdf, other]
Title: DECAR: Deep Clustering for learning general-purpose Audio Representations
Sreyan Ghosh, Sandesh V Katta, Ashish Seth, S. Umesh
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[91] arXiv:2110.09103 [pdf, other]
Title: LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech
Wen-Chin Huang, Erica Cooper, Junichi Yamagishi, Tomoki Toda
Comments: Submitted to ICASSP 2022. Code available at: this https URL
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[92] arXiv:2110.09116 [pdf, other]
Title: Real Additive Margin Softmax for Speaker Verification
Lantian Li, Ruiqian Nai, Dong Wang
Comments: Submitted to ICASSP 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[93] arXiv:2110.09121 [pdf, other]
Title: KaraTuner: Towards end to end natural pitch correction for singing voice in karaoke
Xiaobin Zhuang, Huiran Yu, Weifeng Zhao, Tao Jiang, Peng Hu
Comments: To be published in Proc. Interspeech 2022, Incheon, South Korea
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[94] arXiv:2110.09127 [pdf, other]
Title: SpecTNT: a Time-Frequency Transformer for Music Audio
Wei-Tsung Lu, Ju-Chiang Wang, Minz Won, Keunwoo Choi, Xuchen Song
Comments: 6 pages
Journal-ref: International Society for Music Information Retrieval (ISMIR) 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[95] arXiv:2110.09223 [pdf, other]
Title: Learning Models for Query by Vocal Percussion: A Comparative Study
Alejandro Delgado, SkoT McDonald, Ning Xu, Charalampos Saitis, Mark Sandler
Comments: Published in proceedings of the International Computer Music Conference (ICMC) 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[96] arXiv:2110.09239 [pdf, other]
Title: EIHW-MTG: Second DiCOVA Challenge System Report
Adria Mallol-Ragolta, Helena Cuesta, Emilia Gómez, Björn W. Schuller
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
[97] arXiv:2110.09441 [pdf, other]
Title: FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection
Zhenyu Zhang, Yewei Gu, Xiaowei Yi, Xianfeng Zhao
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[98] arXiv:2110.09598 [pdf, other]
Title: Adversarial Domain Adaptation with Paired Examples for Acoustic Scene Classification on Different Recording Devices
Stanisław Kacprzak, Konrad Kowalczyk
Comments: Accepted for publication in the Proceedings of the 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021
Journal-ref: 2021 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021, pp. 1030-103
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[99] arXiv:2110.09600 [pdf, other]
Title: Who calls the shots? Rethinking Few-Shot Learning for Audio
Yu Wang, Nicholas J. Bryan, Justin Salamon, Mark Cartwright, Juan Pablo Bello
Comments: WASPAA 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[100] arXiv:2110.09605 [pdf, other]
Title: Neural Synthesis of Footsteps Sound Effects with Generative Adversarial Networks
Marco Comunità, Huy Phan, Joshua D. Reiss
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[101] arXiv:2110.09698 [pdf, other]
Title: Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge
Mutian He, Jingzhou Yang, Lei He, Frank K. Soong
Comments: 5 pages, 3 figures; accepted by Interspeech 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[102] arXiv:2110.09720 [pdf, other]
Title: Rep Works in Speaker Verification
Yufeng Ma, Miao Zhao, Yiwei Ding, Yu Zheng, Min Liu, Minqiang Xu
Comments: submitted to ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[103] arXiv:2110.09780 [pdf, other]
Title: Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation
Fengyu Yang, Jian Luan, Yujun Wang
Comments: accepted by ICASSP2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[104] arXiv:2110.09784 [pdf, other]
Title: SSAST: Self-Supervised Audio Spectrogram Transformer
Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass
Comments: Accepted at AAAI2022. Code at this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[105] arXiv:2110.09814 [pdf, other]
Title: Speech Pattern based Black-box Model Watermarking for Automatic Speech Recognition
Haozhe Chen, Weiming Zhang, Kunlin Liu, Kejiang Chen, Han Fang, Nenghai Yu
Comments: 5 pages, 2 figures. Accepted by 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[106] arXiv:2110.10010 [pdf, other]
Title: Temporal separation of whale vocalizations from background oceanic noise using a power calculation
Jacques van Wyk, Jaco Versfeld, Johan du Preez
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[107] arXiv:2110.10103 [pdf, other]
Title: Continual self-training with bootstrapped remixing for speech enhancement
Efthymios Tzinis, Yossi Adi, Vamsi K. Ithapu, Buye Xu, Anurag Kumar
Comments: To appear in Proc. ICASSP 2022, May 22-27, 2022, Singapore
Journal-ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[108] arXiv:2110.10402 [pdf, other]
Title: An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR
Huaibo Zhao, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi
Comments: Accepted to APSIPA 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[109] arXiv:2110.10491 [pdf, other]
Title: A Study On Data Augmentation In Voice Anti-Spoofing
Ariel Cohen, Inbal Rimon, Eran Aflalo, Haim Permuter
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[110] arXiv:2110.10593 [pdf, other]
Title: Progressive Learning for Stabilizing Label Selection in Speech Separation with Mapping-based Method
Chenyang Gao, Yue Gu, Ivan Marsic
Comments: Submitted to Interspeech 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[111] arXiv:2110.10739 [pdf, other]
Title: Adapting Speech Separation to Real-World Meetings Using Mixture Invariant Training
Aswin Sivaraman, Scott Wisdom, Hakan Erdogan, John R. Hershey
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[112] arXiv:2110.10757 [pdf, other]
Title: TPARN: Triple-path Attentive Recurrent Network for Time-domain Multichannel Speech Enhancement
Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, DeLiang Wang
Comments: Accepted for publication in ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[113] arXiv:2110.10983 [pdf, other]
Title: Optimizing Multi-Taper Features for Deep Speaker Verification
Xuechen Liu, Md Sahidullah, Tomi Kinnunen
Comments: To appear in IEEE Signal Processing Letters
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[114] arXiv:2110.11499 [pdf, other]
Title: Wav2CLIP: Learning Robust Audio Representations From CLIP
Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar, Juan Pablo Bello
Comments: Copyright 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[115] arXiv:2110.11807 [pdf, other]
Title: Signal-Envelope: A C++ library with Python bindings for temporal envelope estimation
Carlos Tarjano, Valdecy Pereira
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[116] arXiv:2110.11844 [pdf, other]
Title: Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network
Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, DeLiang Wang
Comments: Accepted for publication in INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[117] arXiv:2110.12138 [pdf, other]
Title: Optimizing Alignment of Speech and Language Latent Spaces for End-to-End Speech Recognition and Understanding
Wei Wang, Shuo Ren, Yao Qian, Shujie Liu, Yu Shi, Yanmin Qian, Michael Zeng
Comments: submitted to ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[118] arXiv:2110.12539 [pdf, other]
Title: Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech
Marek Strong, Jonas Rohnke, Antonio Bonafonte, Mateusz Łajszczak, Trevor Wood
Comments: 5 pages, 5 figures, accepted at IberSPEECH 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[119] arXiv:2110.12561 [pdf, other]
Title: Lhotse: a speech data representation library for the modern deep learning ecosystem
Piotr Żelasko, Daniel Povey, Jan "Yenda" Trmal, Sanjeev Khudanpur
Comments: Accepted for presentation at NeurIPS 2021 Data-Centric AI (DCAI) Workshop
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[120] arXiv:2110.12612 [pdf, other]
Title: DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021
Yanqing Liu, Zhihang Xu, Gang Wang, Kuan Chen, Bohan Li, Xu Tan, Jinzhu Li, Lei He, Sheng Zhao
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[121] arXiv:2110.12778 [pdf, other]
Title: A Deep Reinforcement Learning Approach for Audio-based Navigation and Audio Source Localization in Multi-speaker Environments
Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis
Comments: arXiv admin note: text overlap with arXiv:2105.04488
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[122] arXiv:2110.12855 [pdf, other]
Title: Actions Speak Louder than Listening: Evaluating Music Style Transfer based on Editing Experience
Wei-Tsung Lu, Meng-Hsuan Wu, Yuh-Ming Chiu, Li Su
Comments: 9 pages, Proceedings of the 29th ACM International Conference on Multimedia
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[123] arXiv:2110.13071 [pdf, other]
Title: Unsupervised Source Separation By Steering Pretrained Music Models
Ethan Manilow, Patrick O'Reilly, Prem Seetharaman, Bryan Pardo
Comments: Submitted to ICASSP 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[124] arXiv:2110.13130 [pdf, other]
Title: Multichannel Speech Enhancement without Beamforming
Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, DeLiang Wang
Comments: Accepted for publication in ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125] arXiv:2110.13323 [pdf, other]
Title: Deep Learning Tools for Audacity: Helping Researchers Expand the Artist's Toolkit
Hugo Flores Garcia, Aldo Aguilar, Ethan Manilow, Dmitry Vedenko, Bryan Pardo
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[126] arXiv:2110.13465 [pdf, other]
Title: CS-Rep: Making Speaker Verification Networks Embracing Re-parameterization
Ruiteng Zhang, Jianguo Wei, Wenhuan Lu, Lin Zhang, Yantao Ji, Junhai Xu, Xugang Lu
Comments: Accepted by ICASSP 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[127] arXiv:2110.13589 [pdf, other]
Title: AQP: An Open Modular Python Platform for Objective Speech and Audio Quality Metrics
Jack Geraghty, Jiazheng Li, Alessandro Ragano, Andrew Hines
Comments: 6 pages, 3 figures, accepted and presented at ACM MMSys22, June, 2022, Athlone, Ireland
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[128] arXiv:2110.14131 [pdf, other]
Title: Temporal Knowledge Distillation for On-device Audio Classification
Kwanghee Choi, Martin Kersner, Jacob Morton, Buru Chang
Comments: ICASSP 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[129] arXiv:2110.14422 [pdf, other]
Title: Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning
Shijun Wang, Dimche Kostadinov, Damian Borth
Comments: Published in: 2022 International Joint Conference on Neural Networks (IJCNN)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[130] arXiv:2110.14425 [pdf, other]
Title: Generalizing AUC Optimization to Multiclass Classification for Audio Segmentation With Limited Training Data
Pablo Gimeno, Victoria Mingote, Alfonso Ortega, Antonio Miguel, Eduardo Lleida
Journal-ref: IEEE Signal Processing Letters, vol. 28, pp. 1135-1139, 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[131] arXiv:2110.14434 [pdf, other]
Title: Nonnegative Tucker Decomposition with Beta-divergence for Music Structure Analysis of Audio Signals
Axel Marmoret, Florian Voorwinden, Valentin Leplat, Jérémy E. Cohen, Frédéric Bimbot
Comments: 4 pages, 2 figures, 1 table, 1 algorithm. To be published in GRETSI2022. The algorithm is available at this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Numerical Analysis (math.NA)
[132] arXiv:2110.14437 [pdf, other]
Title: Exploring single-song autoencoding schemes for audio-based music structure analysis
Axel Marmoret, Jérémy E. Cohen, Frédéric Bimbot
Comments: 4 pages, 4 figures, 2 tables. Rejected from ICASSP 2022, an extended version is available at arXiv:2202.04981
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[133] arXiv:2110.14513 [pdf, other]
Title: Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations
Hyeong-Seok Choi, Juheon Lee, Wansoo Kim, Jie Hwan Lee, Hoon Heo, Kyogu Lee
Comments: Neural Information Processing Systems (NeurIPS) 2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[134] arXiv:2110.15316 [pdf, other]
Title: VRM-Phase I VKW system description of long-short video customizable keyword wakeup challenge
Yougen Yuan, Zhiqiang Lv, Shen Huang, Pengfei Hu
Comments: 6 pages, in Chinese language, 3 tables, NCMMC 2021 conference paper
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[135] arXiv:2110.15430 [pdf, other]
Title: Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction
Heming Wang, Yao Qian, Xiaofei Wang, Yiming Wang, Chengyi Wang, Shujie Liu, Takuya Yoshioka, Jinyu Li, DeLiang Wang
Comments: 5 pages, 1 figure, submitted to ICASSP 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[136] arXiv:2110.15729 [pdf, other]
Title: Decision Attentive Regularization to Improve Simultaneous Speech Translation Systems
Mohd Abbas Zaidi, Beomseok Lee, Sangha Kim, Chanwoo Kim
Comments: 5 pages, 3 figures, 1 table
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[137] arXiv:2110.15792 [pdf, other]
Title: VRAIN-UPV MLLP's system for the Blizzard Challenge 2021
Alejandro Pérez-González-de-Martos, Albert Sanchis, Alfons Juan
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[138] arXiv:2110.00165 (cross-list from eess.AS) [pdf, other]
Title: Large-scale ASR Domain Adaptation using Self- and Semi-supervised Learning
Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He
Comments: ICASSP 2022 accepted, 5 pages, 2 figures, 5 tables
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[139] arXiv:2110.00275 (cross-list from eess.AS) [pdf, other]
Title: SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection
Thi Ngoc Tho Nguyen, Karn N. Watcharasupat, Ngoc Khanh Nguyen, Douglas L. Jones, Woon-Seng Gan
Comments: (c) 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1749-1762, 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[140] arXiv:2110.00508 (cross-list from cs.LG) [pdf, other]
Title: An Ensemble-based Multi-Criteria Decision Making Method for COVID-19 Cough Classification
Nihad Karim Chowdhury, Muhammad Ashad Kabir, Md. Muhtadir Rahman
Comments: 21 pages, 6 figures
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[141] arXiv:2110.00745 (cross-list from eess.AS) [pdf, other]
Title: End-to-End Complex-Valued Multidilated Convolutional Neural Network for Joint Acoustic Echo Cancellation and Noise Suppression
Karn N. Watcharasupat, Thi Ngoc Tho Nguyen, Woon-Seng Gan, Shengkui Zhao, Bin Ma
Comments: To be presented at the 2022 International Conference on Acoustics, Speech, & Signal Processing (ICASSP)
Journal-ref: Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 656-660
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[142] arXiv:2110.00797 (cross-list from eess.AS) [pdf, other]
Title: Significance of Data Augmentation for Improving Cleft Lip and Palate Speech Recognition
Protima Nomo Sudro, Rohan Kumar Das, Rohit Sinha, S. R. Mahadeva Prasanna
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[143] arXiv:2110.01001 (cross-list from cs.MM) [pdf, other]
Title: Multimodal Fusion Based Attentive Networks for Sequential Music Recommendation
Kunal Vaswani, Yudhik Agrawal, Vinoo Alluri
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[144] arXiv:2110.01077 (cross-list from eess.AS) [pdf, other]
Title: Multi-task Voice Activated Framework using Self-supervised Learning
Shehzeen Hussain, Van Nguyen, Shuhua Zhang, Erik Visser
Comments: Accepted at ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[145] arXiv:2110.01164 (cross-list from eess.AS) [pdf, other]
Title: Decoupling Speaker-Independent Emotions for Voice Conversion Via Source-Filter Networks
Zhaojie Luo, Shoufeng Lin, Rui Liu, Jun Baba, Yuichiro Yoshikawa, Ishiguro Hiroshi
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[146] arXiv:2110.01177 (cross-list from eess.AS) [pdf, other]
Title: The Second DiCOVA Challenge: Dataset and performance analysis for COVID-19 diagnosis using acoustics
Neeraj Kumar Sharma, Srikanth Raj Chetupalli, Debarpan Bhattacharya, Debottam Dutta, Pravin Mote, Sriram Ganapathy
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[147] arXiv:2110.01422 (cross-list from eess.AS) [pdf, other]
Title: Individualized sound pressure equalization in hearing devices exploiting an electro-acoustic model
Henning Schepker, Reinhild Rohden, Florian Denk, Birger Kollmeier, Matthias Blau, Simon Doclo
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[148] arXiv:2110.01436 (cross-list from eess.AS) [pdf, other]
Title: WaveBeat: End-to-end beat and downbeat tracking in the time domain
Christian J. Steinmetz, Joshua D. Reiss
Comments: To appear at the 151st AES Convention
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[149] arXiv:2110.01763 (cross-list from eess.AS) [pdf, other]
Title: DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors
Chandan K A Reddy, Vishak Gopal, Ross Cutler
Comments: arXiv admin note: substantial text overlap with arXiv:2010.15258
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[150] arXiv:2110.02077 (cross-list from eess.AS) [pdf, other]
Title: Deep Optimization of Parametric IIR Filters for Audio Equalization
Giovanni Pepe (1 and 2), Leonardo Gabrielli (1), Stefano Squartini (1), Carlo Tripodi (2), Nicolò Strozzi (2) ((1) Università Politecnica delle Marche, (2) ASK Industries S.p.A.)
Comments: submitted to IEEE/ACM TASLP on 12 May 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)