Sound

Authors and titles for October 2021

Total of 322 entries; showing entries 101-150


[101] arXiv:2110.09698 [pdf, other]: Title: Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

Mutian He, Jingzhou Yang, Lei He, Frank K. Soong

Comments: 5 pages, 3 figures; accepted by Interspeech 2022

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[102] arXiv:2110.09720 [pdf, other]: Title: Rep Works in Speaker Verification

Yufeng Ma, Miao Zhao, Yiwei Ding, Yu Zheng, Min Liu, Minqiang Xu

Comments: submitted to ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[103] arXiv:2110.09780 [pdf, other]: Title: Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation

Fengyu Yang, Jian Luan, Yujun Wang

Comments: accepted by ICASSP2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[104] arXiv:2110.09784 [pdf, other]: Title: SSAST: Self-Supervised Audio Spectrogram Transformer

Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass

Comments: Accepted at AAAI2022. Code at this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[105] arXiv:2110.09814 [pdf, other]: Title: Speech Pattern based Black-box Model Watermarking for Automatic Speech Recognition

Haozhe Chen, Weiming Zhang, Kunlin Liu, Kejiang Chen, Han Fang, Nenghai Yu

Comments: 5 pages, 2 figures. Accepted by 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[106] arXiv:2110.10010 [pdf, other]: Title: Temporal separation of whale vocalizations from background oceanic noise using a power calculation

Jacques van Wyk, Jaco Versfeld, Johan du Preez

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[107] arXiv:2110.10103 [pdf, other]: Title: Continual self-training with bootstrapped remixing for speech enhancement

Efthymios Tzinis, Yossi Adi, Vamsi K. Ithapu, Buye Xu, Anurag Kumar

Comments: To appear in Proc. ICASSP 2022, May 22-27, 2022, Singapore

Journal-ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[108] arXiv:2110.10402 [pdf, other]: Title: An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR

Huaibo Zhao, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

Comments: Accepted to APSIPA 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[109] arXiv:2110.10491 [pdf, other]: Title: A Study On Data Augmentation In Voice Anti-Spoofing

Ariel Cohen, Inbal Rimon, Eran Aflalo, Haim Permuter

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[110] arXiv:2110.10593 [pdf, other]: Title: Progressive Learning for Stabilizing Label Selection in Speech Separation with Mapping-based Method

Chenyang Gao, Yue Gu, Ivan Marsic

Comments: Submitted to Interspeech 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[111] arXiv:2110.10739 [pdf, other]: Title: Adapting Speech Separation to Real-World Meetings Using Mixture Invariant Training

Aswin Sivaraman, Scott Wisdom, Hakan Erdogan, John R. Hershey

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[112] arXiv:2110.10757 [pdf, other]: Title: TPARN: Triple-path Attentive Recurrent Network for Time-domain Multichannel Speech Enhancement

Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, DeLiang Wang

Comments: Accepted for publication in ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[113] arXiv:2110.10983 [pdf, other]: Title: Optimizing Multi-Taper Features for Deep Speaker Verification

Xuechen Liu, Md Sahidullah, Tomi Kinnunen

Comments: To appear in IEEE Signal Processing Letters

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[114] arXiv:2110.11499 [pdf, other]: Title: Wav2CLIP: Learning Robust Audio Representations From CLIP

Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar, Juan Pablo Bello

Comments: Copyright 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[115] arXiv:2110.11807 [pdf, other]: Title: Signal-Envelope: A C++ library with Python bindings for temporal envelope estimation

Carlos Tarjano, Valdecy Pereira

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[116] arXiv:2110.11844 [pdf, other]: Title: Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network

Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, DeLiang Wang

Comments: Accepted for publication in INTERSPEECH 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[117] arXiv:2110.12138 [pdf, other]: Title: Optimizing Alignment of Speech and Language Latent Spaces for End-to-End Speech Recognition and Understanding

Wei Wang, Shuo Ren, Yao Qian, Shujie Liu, Yu Shi, Yanmin Qian, Michael Zeng

Comments: submitted to ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[118] arXiv:2110.12539 [pdf, other]: Title: Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech

Marek Strong, Jonas Rohnke, Antonio Bonafonte, Mateusz Łajszczak, Trevor Wood

Comments: 5 pages, 5 figures, accepted at IberSPEECH 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[119] arXiv:2110.12561 [pdf, other]: Title: Lhotse: a speech data representation library for the modern deep learning ecosystem

Piotr Żelasko, Daniel Povey, Jan "Yenda" Trmal, Sanjeev Khudanpur

Comments: Accepted for presentation at NeurIPS 2021 Data-Centric AI (DCAI) Workshop

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[120] arXiv:2110.12612 [pdf, other]: Title: DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021

Yanqing Liu, Zhihang Xu, Gang Wang, Kuan Chen, Bohan Li, Xu Tan, Jinzhu Li, Lei He, Sheng Zhao

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[121] arXiv:2110.12778 [pdf, other]: Title: A Deep Reinforcement Learning Approach for Audio-based Navigation and Audio Source Localization in Multi-speaker Environments

Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis

Comments: arXiv admin note: text overlap with arXiv:2105.04488

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[122] arXiv:2110.12855 [pdf, other]: Title: Actions Speak Louder than Listening: Evaluating Music Style Transfer based on Editing Experience

Wei-Tsung Lu, Meng-Hsuan Wu, Yuh-Ming Chiu, Li Su

Comments: 9 pages, Proceedings of the 29th ACM International Conference on Multimedia

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[123] arXiv:2110.13071 [pdf, other]: Title: Unsupervised Source Separation By Steering Pretrained Music Models

Ethan Manilow, Patrick O'Reilly, Prem Seetharaman, Bryan Pardo

Comments: Submitted to ICASSP 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[124] arXiv:2110.13130 [pdf, other]: Title: Multichannel Speech Enhancement without Beamforming

Asutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, DeLiang Wang

Comments: Accepted for publication in ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125] arXiv:2110.13323 [pdf, other]: Title: Deep Learning Tools for Audacity: Helping Researchers Expand the Artist's Toolkit

Hugo Flores Garcia, Aldo Aguilar, Ethan Manilow, Dmitry Vedenko, Bryan Pardo

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[126] arXiv:2110.13465 [pdf, other]: Title: CS-Rep: Making Speaker Verification Networks Embracing Re-parameterization

Ruiteng Zhang, Jianguo Wei, Wenhuan Lu, Lin Zhang, Yantao Ji, Junhai Xu, Xugang Lu

Comments: Accepted by ICASSP 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[127] arXiv:2110.13589 [pdf, other]: Title: AQP: An Open Modular Python Platform for Objective Speech and Audio Quality Metrics

Jack Geraghty, Jiazheng Li, Alessandro Ragano, Andrew Hines

Comments: 6 pages, 3 figures, accepted and presented at ACM MMSys22, June, 2022, Athlone, Ireland

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[128] arXiv:2110.14131 [pdf, other]: Title: Temporal Knowledge Distillation for On-device Audio Classification

Kwanghee Choi, Martin Kersner, Jacob Morton, Buru Chang

Comments: ICASSP 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[129] arXiv:2110.14422 [pdf, other]: Title: Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning

Shijun Wang, Dimche Kostadinov, Damian Borth

Comments: Published in: 2022 International Joint Conference on Neural Networks (IJCNN)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[130] arXiv:2110.14425 [pdf, other]: Title: Generalizing AUC Optimization to Multiclass Classification for Audio Segmentation With Limited Training Data

Pablo Gimeno, Victoria Mingote, Alfonso Ortega, Antonio Miguel, Eduardo Lleida

Journal-ref: IEEE Signal Processing Letters, vol. 28, pp. 1135-1139, 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[131] arXiv:2110.14434 [pdf, other]: Title: Nonnegative Tucker Decomposition with Beta-divergence for Music Structure Analysis of Audio Signals

Axel Marmoret, Florian Voorwinden, Valentin Leplat, Jérémy E. Cohen, Frédéric Bimbot

Comments: 4 pages, 2 figures, 1 table, 1 algorithm. To be published in GRETSI2022. The algorithm is available at this https URL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Numerical Analysis (math.NA)
[132] arXiv:2110.14437 [pdf, other]: Title: Exploring single-song autoencoding schemes for audio-based music structure analysis

Axel Marmoret, Jérémy E. Cohen, Frédéric Bimbot

Comments: 4 pages, 4 figures, 2 tables. Rejected from ICASSP 2022, an extended version is available at arXiv:2202.04981

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[133] arXiv:2110.14513 [pdf, other]: Title: Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations

Hyeong-Seok Choi, Juheon Lee, Wansoo Kim, Jie Hwan Lee, Hoon Heo, Kyogu Lee

Comments: Neural Information Processing Systems (NeurIPS) 2021

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[134] arXiv:2110.15316 [pdf, other]: Title: VRM-Phase I VKW system description of long-short video customizable keyword wakeup challenge

Yougen Yuan, Zhiqiang Lv, Shen Huang, Pengfei Hu

Comments: 6 pages, in Chinese language, 3 tables, NCMMC 2021 conference paper

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[135] arXiv:2110.15430 [pdf, other]: Title: Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction

Heming Wang, Yao Qian, Xiaofei Wang, Yiming Wang, Chengyi Wang, Shujie Liu, Takuya Yoshioka, Jinyu Li, DeLiang Wang

Comments: 5 pages, 1 figure, submitted to ICASSP 2022

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[136] arXiv:2110.15729 [pdf, other]: Title: Decision Attentive Regularization to Improve Simultaneous Speech Translation Systems

Mohd Abbas Zaidi, Beomseok Lee, Sangha Kim, Chanwoo Kim

Comments: 5 pages, 3 figures, 1 table

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[137] arXiv:2110.15792 [pdf, other]: Title: VRAIN-UPV MLLP's system for the Blizzard Challenge 2021

Alejandro Pérez-González-de-Martos, Albert Sanchis, Alfons Juan

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[138] arXiv:2110.00165 (cross-list from eess.AS) [pdf, other]: Title: Large-scale ASR Domain Adaptation using Self- and Semi-supervised Learning

Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He

Comments: ICASSP 2022 accepted, 5 pages, 2 figures, 5 tables

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[139] arXiv:2110.00275 (cross-list from eess.AS) [pdf, other]: Title: SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection

Thi Ngoc Tho Nguyen, Karn N. Watcharasupat, Ngoc Khanh Nguyen, Douglas L. Jones, Woon-Seng Gan

Comments: (c) 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1749-1762, 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[140] arXiv:2110.00508 (cross-list from cs.LG) [pdf, other]: Title: An Ensemble-based Multi-Criteria Decision Making Method for COVID-19 Cough Classification

Nihad Karim Chowdhury, Muhammad Ashad Kabir, Md. Muhtadir Rahman

Comments: 21 pages, 6 figures

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[141] arXiv:2110.00745 (cross-list from eess.AS) [pdf, other]: Title: End-to-End Complex-Valued Multidilated Convolutional Neural Network for Joint Acoustic Echo Cancellation and Noise Suppression

Karn N. Watcharasupat, Thi Ngoc Tho Nguyen, Woon-Seng Gan, Shengkui Zhao, Bin Ma

Comments: To be presented at the 2022 International Conference on Acoustics, Speech, & Signal Processing (ICASSP)

Journal-ref: Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 656-660

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[142] arXiv:2110.00797 (cross-list from eess.AS) [pdf, other]: Title: Significance of Data Augmentation for Improving Cleft Lip and Palate Speech Recognition

Protima Nomo Sudro, Rohan Kumar Das, Rohit Sinha, S. R. Mahadeva Prasanna

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[143] arXiv:2110.01001 (cross-list from cs.MM) [pdf, other]: Title: Multimodal Fusion Based Attentive Networks for Sequential Music Recommendation

Kunal Vaswani, Yudhik Agrawal, Vinoo Alluri

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[144] arXiv:2110.01077 (cross-list from eess.AS) [pdf, other]: Title: Multi-task Voice Activated Framework using Self-supervised Learning

Shehzeen Hussain, Van Nguyen, Shuhua Zhang, Erik Visser

Comments: Accepted at ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[145] arXiv:2110.01164 (cross-list from eess.AS) [pdf, other]: Title: Decoupling Speaker-Independent Emotions for Voice Conversion Via Source-Filter Networks

Zhaojie Luo, Shoufeng Lin, Rui Liu, Jun Baba, Yuichiro Yoshikawa, Ishiguro Hiroshi

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[146] arXiv:2110.01177 (cross-list from eess.AS) [pdf, other]: Title: The Second DiCOVA Challenge: Dataset and performance analysis for COVID-19 diagnosis using acoustics

Neeraj Kumar Sharma, Srikanth Raj Chetupalli, Debarpan Bhattacharya, Debottam Dutta, Pravin Mote, Sriram Ganapathy

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[147] arXiv:2110.01422 (cross-list from eess.AS) [pdf, other]: Title: Individualized sound pressure equalization in hearing devices exploiting an electro-acoustic model

Henning Schepker, Reinhild Rohden, Florian Denk, Birger Kollmeier, Matthias Blau, Simon Doclo

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[148] arXiv:2110.01436 (cross-list from eess.AS) [pdf, other]: Title: WaveBeat: End-to-end beat and downbeat tracking in the time domain

Christian J. Steinmetz, Joshua D. Reiss

Comments: To appear at the 151st AES Convention

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[149] arXiv:2110.01763 (cross-list from eess.AS) [pdf, other]: Title: DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors

Chandan K A Reddy, Vishak Gopal, Ross Cutler

Comments: arXiv admin note: substantial text overlap with arXiv:2010.15258

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[150] arXiv:2110.02077 (cross-list from eess.AS) [pdf, other]: Title: Deep Optimization of Parametric IIR Filters for Audio Equalization

Giovanni Pepe (1 and 2), Leonardo Gabrielli (1), Stefano Squartini (1), Carlo Tripodi (2), Nicolò Strozzi (2) ((1) Università Politecnica delle Marche, (2) ASK Industries S.p.A.)

Comments: submitted to IEEE/ACM TASLP on 12 May 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
