Audio and Speech Processing

Authors and titles for May 2023

Total of 427 entries

[201] arXiv:2305.05599 (cross-list from cs.SD) [pdf, other]: Title: Inter-SubNet: Speech Enhancement with Subband Interaction

Jun Chen, Wei Rao, Zilin Wang, Jiuxin Lin, Zhiyong Wu, Yannan Wang, Shidong Shang, Helen Meng

Comments: Accepted by ICASSP 2023

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[202] arXiv:2305.05736 (cross-list from cs.SD) [pdf, other]: Title: VSMask: Defending Against Voice Synthesis Attack via Real-Time Predictive Perturbation

Yuanda Wang, Hanqing Guo, Guangjing Wang, Bocheng Chen, Qiben Yan

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[203] arXiv:2305.05780 (cross-list from cs.SD) [pdf, other]: Title: Enhancing Gappy Speech Audio Signals with Generative Adversarial Networks

Deniss Strods, Alan F. Smeaton

Comments: 7 pages, 4 figures, 4 tables. 34th Irish Signals and Systems Conference, 13-14 June 2023

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[204] arXiv:2305.06273 (cross-list from cs.CL) [pdf, other]: Title: Learning Robust Self-attention Features for Speech Emotion Recognition with Label-adaptive Mixup

Lei Kang, Lichao Zhang, Dazhi Jiang

Comments: Accepted to ICASSP 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[205] arXiv:2305.06429 (cross-list from cs.SD) [pdf, other]: Title: Mispronunciation Detection of Basic Quranic Recitation Rules using Deep Learning

Ahmad Al Harere, Khloud Al Jallad

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[206] arXiv:2305.06594 (cross-list from cs.SD) [pdf, html, other]: Title: V2Meow: Meowing to the Visual Beat via Video-to-Music Generation

Kun Su, Judith Yue Li, Qingqing Huang, Dima Kuzmin, Joonseok Lee, Chris Donahue, Fei Sha, Aren Jansen, Yu Wang, Mauro Verzetti, Timo I. Denk

Comments: accepted at AAAI 2024, music samples available at this https URL

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[207] arXiv:2305.06701 (cross-list from cs.SD) [pdf, other]: Title: Extending Audio Masked Autoencoders Toward Audio Restoration

Zhi Zhong, Hao Shi, Masato Hirano, Kazuki Shimada, Kazuya Tateishi, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji

Comments: WASPAA 2023. © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[208] arXiv:2305.06806 (cross-list from cs.SD) [pdf, other]: Title: HappyQuokka System for ICASSP 2023 Auditory EEG Challenge

Zhenyu Piao, Miseul Kim, Hyungchan Yoon, Hong-Goo Kang

Comments: First place in Task 2 of the Auditory EEG Decoding Challenge, part of the ICASSP 2023 Signal Processing Grand Challenge (SPGC)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[209] arXiv:2305.06908 (cross-list from cs.SD) [pdf, other]: Title: CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model

Zhen Ye, Wei Xue, Xu Tan, Jie Chen, Qifeng Liu, Yike Guo

Comments: Accepted to ACM MM 2023

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[210] arXiv:2305.07132 (cross-list from cs.SD) [pdf, other]: Title: Tackling Interpretability in Audio Classification Networks with Non-negative Matrix Factorization

Jayneel Parekh, Sanjeel Parekh, Pavlo Mozharovskyi, Gaël Richard, Florence d'Alché-Buc

Comments: Under submission at IEEE/ACM TASLP. arXiv admin note: text overlap with arXiv:2202.11479

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[211] arXiv:2305.07216 (cross-list from cs.LG) [pdf, html, other]: Title: Versatile audio-visual learning for emotion recognition

Lucas Goncalves, Seong-Gyun Leem, Wei-Cheng Lin, Berrak Sisman, Carlos Busso

Comments: 18 pages, 4 figures, 3 tables (published in IEEE Transactions on Affective Computing)

Subjects: Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[212] arXiv:2305.07223 (cross-list from cs.SD) [pdf, html, other]: Title: Transavs: End-To-End Audio-Visual Segmentation With Transformer

Yuhang Ling, Yuxi Li, Zhenye Gan, Jiangning Zhang, Mingmin Chi, Yabiao Wang

Comments: 4 pages, 3 figures

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[213] arXiv:2305.07243 (cross-list from cs.SD) [pdf, other]: Title: Better speech synthesis through scaling

James Betker

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[214] arXiv:2305.07347 (cross-list from cs.SD) [pdf, other]: Title: Music Rearrangement Using Hierarchical Segmentation

Christos Plachouras, Marius Miron

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[215] arXiv:2305.07389 (cross-list from cs.CL) [pdf, other]: Title: Investigating the Sensitivity of Automatic Speech Recognition Systems to Phonetic Variation in L2 Englishes

Emma O'Neill, Julie Carson-Berndsen

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[216] arXiv:2305.07447 (cross-list from cs.SD) [pdf, other]: Title: Universal Source Separation with Weakly Labelled Data

Qiuqiang Kong, Ke Chen, Haohe Liu, Xingjian Du, Taylor Berg-Kirkpatrick, Shlomo Dubnov, Mark D. Plumbley

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[217] arXiv:2305.07455 (cross-list from cs.CL) [pdf, other]: Title: Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation

Yu-Kuan Fu, Liang-Hsuan Tseng, Jiatong Shi, Chen-An Li, Tsu-Yuan Hsu, Shinji Watanabe, Hung-yi Lee

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[218] arXiv:2305.07489 (cross-list from cs.SD) [pdf, html, other]: Title: Benchmarks and leaderboards for sound demixing tasks

Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[219] arXiv:2305.07499 (cross-list from cs.SD) [pdf, other]: Title: Device-Robust Acoustic Scene Classification via Impulse Response Augmentation

Tobias Morocutti, Florian Schmid, Khaled Koutini, Gerhard Widmer

Comments: In Proceedings of the 31st European Signal Processing Conference, EUSIPCO 2023. Source Code available at: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[220] arXiv:2305.07828 (cross-list from cs.SD) [pdf, other]: Title: Description and Discussion on DCASE 2023 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

Kota Dohi, Keisuke Imoto, Noboru Harada, Daisuke Niizumi, Yuma Koizumi, Tomoya Nishida, Harsh Purohit, Ryo Tanabe, Takashi Endo, Yohei Kawaguchi

Comments: anomaly detection, acoustic condition monitoring, domain shift, first-shot problem, DCASE Challenge, Accepted in DCASE2023 Workshop

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[221] arXiv:2305.07909 (cross-list from cs.SD) [pdf, other]: Title: Higher-Order Frequency Modulation Synthesis

Victor Lazzarini, Joseph Timoney

Comments: 15 pages, 6 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[222] arXiv:2305.07952 (cross-list from cs.SD) [pdf, other]: Title: APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra

Yang Ai, Zhen-Hua Ling

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing. Code is available

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[223] arXiv:2305.07960 (cross-list from cs.SD) [pdf, other]: Title: Sound-to-Vibration Transformation for Sensorless Motor Health Monitoring

Ozer Can Devecioglu, Serkan Kiranyaz, Amer Elhmes, Sadok Sassi, Turker Ince, Onur Avci, Mohammad Hesam Soleimani-Babakamali, Ertugrul Taciroglu, Moncef Gabbouj

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[224] arXiv:2305.08014 (cross-list from cs.CV) [pdf, other]: Title: Surface EMG-Based Inter-Session/Inter-Subject Gesture Recognition by Leveraging Lightweight All-ConvNet and Transfer Learning

Md. Rabiul Islam, Daniel Massicotte, Philippe Y. Massicotte, Wei-Ping Zhu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[225] arXiv:2305.08029 (cross-list from cs.SD) [pdf, html, other]: Title: REMAST: Real-time Emotion-based Music Arrangement with Soft Transition

Zihao Wang, Le Ma, Chen Zhang, Bo Han, Yunfei Xu, Yikai Wang, Xinyi Chen, HaoRong Hong, Wenbo Liu, Xinda Wu, Kejun Zhang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[226] arXiv:2305.08067 (cross-list from cs.CL) [pdf, other]: Title: Improving End-to-End SLU performance with Prosodic Attention and Distillation

Shangeth Rajaa

Comments: Submitted to InterSpeech 2023

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[227] arXiv:2305.08099 (cross-list from cs.SD) [pdf, other]: Title: Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations

Weiwei Lin, Chenhang He, Man-Wai Mak, Youzhi Tu

Comments: accepted by ICML 2023

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[228] arXiv:2305.08292 (cross-list from cs.SD) [pdf, other]: Title: ForkNet: Simultaneous Time and Time-Frequency Domain Modeling for Speech Enhancement

Feng Dang, Qi Hu, Pengyuan Zhang, Yonghong Yan

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[229] arXiv:2305.08541 (cross-list from cs.SD) [pdf, other]: Title: Ripple sparse self-attention for monaural speech enhancement

Qiquan Zhang, Hongxu Zhu, Qi Song, Xinyuan Qian, Zhaoheng Ni, Haizhou Li

Comments: 5 pages, published at ICASSP 2023

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[230] arXiv:2305.08706 (cross-list from cs.CL) [pdf, other]: Title: Understanding and Bridging the Modality Gap for Speech Translation

Qingkai Fang, Yang Feng

Comments: ACL 2023 main conference

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[231] arXiv:2305.08709 (cross-list from cs.CL) [pdf, other]: Title: Back Translation for Speech-to-text Translation Without Transcripts

Qingkai Fang, Yang Feng

Comments: ACL 2023 main conference

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[232] arXiv:2305.09167 (cross-list from cs.SD) [pdf, other]: Title: Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion

Xintao Zhao, Shuai Wang, Yang Chao, Zhiyong Wu, Helen Meng

Comments: Accepted by ICME 2023

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[233] arXiv:2305.09302 (cross-list from cs.CV) [pdf, other]: Title: Pink-Eggs Dataset V1: A Step Toward Invasive Species Management Using Deep Learning Embedded Solutions

Di Xu, Yang Zhao, Xiang Hao, Xin Meng

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[234] arXiv:2305.09463 (cross-list from cs.SD) [pdf, other]: Title: Low-complexity deep learning frameworks for acoustic scene classification using teacher-student scheme and multiple spectrograms

Lam Pham, Dat Ngo, Cam Le, Anahid Jalali, Alexander Schindler

Comments: arXiv admin note: text overlap with arXiv:2206.06057

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[235] arXiv:2305.09489 (cross-list from cs.SD) [pdf, other]: Title: Discrete Diffusion Probabilistic Models for Symbolic Music Generation

Matthias Plasser, Silvan Peter, Gerhard Widmer

Comments: In Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI-23), Macau, China

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[236] arXiv:2305.09559 (cross-list from cs.SD) [pdf, other]: Title: Robust and lightweight audio fingerprint for Automatic Content Recognition

Anoubhav Agarwaal, Prabhat Kanaujia, Sartaki Sinha Roy, Susmita Ghose

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[237] arXiv:2305.09636 (cross-list from cs.SD) [pdf, other]: Title: SoundStorm: Efficient Parallel Audio Generation

Zalán Borsos, Matt Sharifi, Damien Vincent, Eugene Kharitonov, Neil Zeghidour, Marco Tagliasacchi

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[238] arXiv:2305.09652 (cross-list from cs.CL) [pdf, other]: Title: The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation

Mutian He, Philip N. Garner

Comments: 16 pages, 3 figures; accepted by Findings of EMNLP 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[239] arXiv:2305.09690 (cross-list from cs.SD) [pdf, other]: Title: A Whisper transformer for audio captioning trained with synthetic captions and transfer learning

Marek Kadlčík, Adam Hájek, Jürgen Kieslich, Radosław Winiecki

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[240] arXiv:2305.09764 (cross-list from cs.CL) [pdf, other]: Title: Application-Agnostic Language Modeling for On-Device ASR

Markus Nußbaum-Thom, Lyan Verwimp, Youssef Oualil

Comments: accepted for ACL 2023 industry track

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[241] arXiv:2305.10270 (cross-list from cs.CL) [pdf, other]: Title: Boosting Local Spectro-Temporal Features for Speech Analysis

Michael Guerzhoy

Comments: Master's project, University of Toronto, 2010

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[242] arXiv:2305.10321 (cross-list from cs.CL) [pdf, other]: Title: Controllable Speaking Styles Using a Large Language Model

Atli Thor Sigurgeirsson, Simon King

Comments: Submitted to ICASSP 2024

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[243] arXiv:2305.10358 (cross-list from cs.CR) [pdf, other]: Title: NUANCE: Near Ultrasound Attack On Networked Communication Environments

Forrest McKee, David Noever

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[244] arXiv:2305.10615 (cross-list from cs.SD) [pdf, html, other]: Title: ML-SUPERB: Multilingual Speech Universal PERformance Benchmark

Jiatong Shi, Dan Berrebbi, William Chen, Ho-Lam Chung, En-Pei Hu, Wei Ping Huang, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe

Comments: Accepted by Interspeech

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[245] arXiv:2305.10649 (cross-list from cs.SD) [pdf, other]: Title: ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs

Xingchen Song, Di Wu, Binbin Zhang, Zhendong Peng, Bo Dang, Fuping Pan, Zhiyong Wu

Comments: Accepted by Interspeech 2023

Journal-ref: Proc. INTERSPEECH 2023, 1648-1652

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[246] arXiv:2305.10652 (cross-list from cs.SD) [pdf, html, other]: Title: Speech Separation based on Contrastive Learning and Deep Modularization

Peter Ochieng

Comments: arXiv admin note: substantial text overlap with arXiv:2212.00369

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[247] arXiv:2305.10666 (cross-list from cs.CL) [pdf, html, other]: Title: A unified front-end framework for English text-to-speech synthesis

Zelin Ying, Chen Li, Yu Dong, Qiuqiang Kong, Qiao Tian, Yuanyuan Huo, Yuxuan Wang

Comments: Accepted in ICASSP 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[248] arXiv:2305.10680 (cross-list from cs.SD) [pdf, other]: Title: Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System

Xian Shi, Haoneng Luo, Zhifu Gao, Shiliang Zhang, Zhijie Yan

Comments: 5 pages, 4 figures, Interspeech 2023

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[249] arXiv:2305.10686 (cross-list from cs.SD) [pdf, other]: Title: RMSSinger: Realistic-Music-Score based Singing Voice Synthesis

Jinzheng He, Jinglin Liu, Zhenhui Ye, Rongjie Huang, Chenye Cui, Huadai Liu, Zhou Zhao

Comments: Accepted by Findings of ACL 2023

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[250] arXiv:2305.10704 (cross-list from cs.SD) [pdf, other]: Title: Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor

Zhengyang Chen, Bing Han, Shuai Wang, Yanmin Qian

Comments: Accepted by InterSpeech 2023

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[251] arXiv:2305.10734 (cross-list from cs.SD) [pdf, html, other]: Title: Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders

Hao Shi, Kazuki Shimada, Masato Hirano, Takashi Shibuya, Yuichiro Koyama, Zhi Zhong, Shusuke Takahashi, Tatsuya Kawahara, Yuki Mitsufuji

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[252] arXiv:2305.10761 (cross-list from cs.SD) [pdf, html, other]: Title: Noise-Aware Speech Separation with Contrastive Learning

Zizheng Zhang, Chen Chen, Hsin-Hung Chen, Xiang Liu, Yuchen Hu, Eng Siong Chng

Comments: 5 pages, 3 figures, ICASSP 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[253] arXiv:2305.10763 (cross-list from cs.SD) [pdf, other]: Title: CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training

Zhenhui Ye, Rongjie Huang, Yi Ren, Ziyue Jiang, Jinglin Liu, Jinzheng He, Xiang Yin, Zhou Zhao

Comments: Accepted by ACL 2023 (Main Conference)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[254] arXiv:2305.10788 (cross-list from cs.SD) [pdf, html, other]: Title: DQ-Whisper: Joint Distillation and Quantization for Efficient Multilingual Speech Recognition

Hang Shao, Bei Liu, Wei Wang, Xun Gong, Yanmin Qian

Comments: Accepted by SLT 2024

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[255] arXiv:2305.10805 (cross-list from cs.SD) [pdf, other]: Title: Validation of an ECAPA-TDNN system for Forensic Automatic Speaker Recognition under case work conditions

Francesco Sigona, Mirko Grimaldi

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[256] arXiv:2305.10839 (cross-list from cs.CL) [pdf, other]: Title: A Lexical-aware Non-autoregressive Transformer-based ASR Model

Chong-En Lin, Kuan-Yu Chen

Comments: Accepted by Interspeech 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[257] arXiv:2305.10841 (cross-list from cs.SD) [pdf, other]: Title: GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework

Ang Lv, Xu Tan, Peiling Lu, Wei Ye, Shikun Zhang, Jiang Bian, Rui Yan

Comments: 13 pages, 4 figures

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[258] arXiv:2305.10951 (cross-list from cs.CL) [pdf, other]: Title: Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation

Martijn Bartelds, Nay San, Bradley McDonnell, Dan Jurafsky, Martijn Wieling

Comments: Accepted at ACL 2023

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[259] arXiv:2305.11013 (cross-list from cs.SD) [pdf, other]: Title: FunASR: A Fundamental End-to-End Speech Recognition Toolkit

Zhifu Gao, Zerui Li, Jiaming Wang, Haoneng Luo, Xian Shi, Mengzhe Chen, Yabin Li, Lingyun Zuo, Zhihao Du, Zhangyu Xiao, Shiliang Zhang

Comments: 5 pages, 3 figures, accepted by INTERSPEECH 2023

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[260] arXiv:2305.11072 (cross-list from cs.CL) [pdf, other]: Title: Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering

Heng-Jui Chang, Alexander H. Liu, James Glass

Comments: Accepted to Interspeech 2023

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[261] arXiv:2305.11073 (cross-list from cs.CL) [pdf, other]: Title: A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks

Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe

Comments: Accepted at INTERSPEECH 2023. Code: this https URL

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[262] arXiv:2305.11094 (cross-list from cs.HC) [pdf, other]: Title: QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation

Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Haolin Zhuang

Comments: 15 pages, 12 figures, CVPR 2023 Highlight

Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[263] arXiv:2305.11151 (cross-list from cs.SD) [pdf, html, other]: Title: Unsupervised Multi-channel Separation and Adaptation

Cong Han, Kevin Wilson, Scott Wisdom, John R. Hershey

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[264] arXiv:2305.11172 (cross-list from cs.CV) [pdf, other]: Title: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Peng Wang, Shijie Wang, Junyang Lin, Shuai Bai, Xiaohuan Zhou, Jingren Zhou, Xinggang Wang, Chang Zhou

Comments: 30 pages, 9 figures, 18 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[265] arXiv:2305.11229 (cross-list from cs.SD) [pdf, other]: Title: TrustSER: On the Trustworthiness of Fine-tuning Pre-trained Speech Embeddings For Speech Emotion Recognition

Tiantian Feng, Rajat Hebbar, Shrikanth Narayanan

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[266] arXiv:2305.11244 (cross-list from cs.CL) [pdf, other]: Title: A Parameter-Efficient Learning Approach to Arabic Dialect Identification with Pre-Trained General-Purpose Speech Model

Srijith Radhakrishnan, Chao-Han Huck Yang, Sumeer Ahmad Khan, Narsis A. Kiani, David Gomez-Cabrero, Jesper N. Tegner

Comments: Accepted to Interspeech 2023, 5 pages. Code is available at: this https URL under MIT license

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[267] arXiv:2305.11310 (cross-list from cs.HC) [pdf, other]: Title: AMII: Adaptive Multimodal Inter-personal and Intra-personal Model for Adapted Behavior Synthesis

Jieyeon Woo, Mireille Fares, Catherine Pelachaud, Catherine Achard

Comments: 8 pages, 1 figure

Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[268] arXiv:2305.11320 (cross-list from cs.SD) [pdf, other]: Title: Parameter-Efficient Learning for Text-to-Speech Accent Adaptation

Li-Jen Yang, Chao-Han Huck Yang, Jen-Tzung Chien

Comments: Accepted to Interspeech 2023

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[269] arXiv:2305.11360 (cross-list from cs.SD) [pdf, other]: Title: Differentially Private Adapters for Parameter Efficient Acoustic Modeling

Chun-Wei Ho, Chao-Han Huck Yang, Sabato Marco Siniscalchi

Comments: Accepted to Interspeech 2023. Code will be available at: this https URL. The authors would like to express their gratitude to Prof. Chin-Hui Lee from Georgia Tech for providing helpful insights and suggestions

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[270] arXiv:2305.11408 (cross-list from cs.CL) [pdf, other]: Title: AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation

Sara Papi, Marco Turchi, Matteo Negri

Comments: Accepted at Interspeech 2023

Journal-ref: Proceedings of INTERSPEECH 2023

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[271] arXiv:2305.11411 (cross-list from cs.CL) [pdf, other]: Title: DUB: Discrete Unit Back-translation for Speech Translation

Dong Zhang, Rong Ye, Tom Ko, Mingxuan Wang, Yaqian Zhou

Comments: Accepted to Findings of ACL 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[272] arXiv:2305.11413 (cross-list from cs.SD) [pdf, other]: Title: A Preliminary Study on Augmenting Speech Emotion Recognition using a Diffusion Model

Ibrahim Malik, Siddique Latif, Raja Jurdak, Björn Schuller

Comments: Accepted to Interspeech 2023

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[273] arXiv:2305.11438 (cross-list from cs.CL) [pdf, other]: Title: Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring

Kaiqi Fu, Shaojun Gao, Shuju Shi, Xiaohai Tian, Wei Li, Zejun Ma

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[274] arXiv:2305.11582 (cross-list from cs.SD) [pdf, other]: Title: What You Hear Is What You See: Audio Quality Metrics From Image Quality Metrics

Tashi Namgyal, Alexander Hepburn, Raul Santos-Rodriguez, Valero Laparra, Jesus Malo

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[275] arXiv:2305.11605 (cross-list from cs.SD) [pdf, other]: Title: MIDI-Draw: Sketching to Control Melody Generation

Tashi Namgyal, Peter Flach, Raul Santos-Rodriguez

Comments: Late-Breaking / Demo Session Extended Abstract, ISMIR 2022 Conference

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[276] arXiv:2305.11683 (cross-list from cs.SD) [pdf, other]: Title: Sensing of inspiration events from speech: comparison of deep learning and linguistic methods

Aki Härmä, Ulf Grossekathöfer, Okke Ouweltjes, Venkata Srikanth Nallanthighal

Comments: 8 pages

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[277] arXiv:2305.11727 (cross-list from cs.SD) [pdf, other]: Title: Direction Specific Ambisonics Source Separation with End-To-End Deep Learning

Francesc Lluís, Nils Meyer-Kahlen, Vasileios Chatziioannou, Alex Hofmann

Comments: Code and listening examples: this https URL

Journal-ref: Acta Acustica 2023, 7, 29

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[278] arXiv:2305.11846 (cross-list from cs.CV) [pdf, other]: Title: Any-to-Any Generation via Composable Diffusion

Zineng Tang, Ziyi Yang, Chenguang Zhu, Michael Zeng, Mohit Bansal

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[279] arXiv:2305.11926 (cross-list from cs.SD) [pdf, other]: Title: MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting

Neil Shah, Vishal Tambrahalli, Saiteja Kosgi, Niranjan Pedanekar, Vineet Gandhi

Comments: 5 pages, 1 figure

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[280] arXiv:2305.12107 (cross-list from cs.SD) [pdf, html, other]: Title: EE-TTS: Emphatic Expressive TTS with Linguistic Information

Yi Zhong, Chen Zhang, Xule Liu, Chenxi Sun, Weishan Deng, Haifeng Hu, Zhongqian Sun

Comments: Accepted by Interspeech 2023; fixed some typos

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[281] arXiv:2305.12121 (cross-list from cs.SD) [pdf, other]: Title: ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention

Jia Qi Yip, Tuan Truong, Dianwen Ng, Chong Zhang, Yukun Ma, Trung Hieu Nguyen, Chongjia Ni, Shengkui Zhao, Eng Siong Chng, Bin Ma

Comments: Accepted to INTERSPEECH 2023

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[282] arXiv:2305.12200 (cross-list from cs.SD) [pdf, other]: Title: ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios

Yuyue Wang, Huan Xiao, Yihan Wu, Ruihua Song

Comments: 5 pages, 4 tables, 2 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[283] arXiv:2305.12263 (cross-list from cs.CL) [pdf, other]: Title: Self-supervised representations in speech-based depression detection

Wen Wu, Chao Zhang, Philip C. Woodland

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[284] arXiv:2305.12301 (cross-list from cs.CL) [pdf, other]: Title: Sentence Embedder Guided Utterance Encoder (SEGUE) for Spoken Language Understanding

Yi Xuan Tan, Navonil Majumder, Soujanya Poria

Comments: Interspeech 2023

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[285] arXiv:2305.12311 (cross-list from cs.CL) [pdf, other]: Title: i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

Ziyi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Mei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[286] arXiv:2305.12442 (cross-list from cs.SD) [pdf, other]: Title: Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus

Detai Xin, Shinnosuke Takamichi, Ai Morimatsu, Hiroshi Saruwatari

Comments: Accepted by INTERSPEECH 2023

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[287] arXiv:2305.12445 (cross-list from cs.SD) [pdf, other]: Title: JNV Corpus: A Corpus of Japanese Nonverbal Vocalizations with Diverse Phrases and Emotions

Detai Xin, Shinnosuke Takamichi, Hiroshi Saruwatari

Comments: 4 pages, 3 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[288] arXiv:2305.12460 (cross-list from cs.SD) [pdf, other]: Title: Study of GANs for Noisy Speech Simulation from Clean Speech

Leander Melroy Maben, Zixun Guo, Chen Chen, Utkarsh Chudiwal, Chng Eng Siong

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[289] arXiv:2305.12501 (cross-list from cs.CL) [pdf, other]: Title: Exploring How Generative Adversarial Networks Learn Phonological Representations

Jingyi Chen, Micha Elsner

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[290] arXiv:2305.12514 (cross-list from cs.SD) [pdf, other]: Title: Towards robust paralinguistic assessment for real-world mobile health (mHealth) monitoring: an initial study of reverberation effects on speech

Judith Dineley, Ewan Carr, Faith Matcham, Johnny Downs, Richard Dobson, Thomas F Quatieri, Nicholas Cummins

Comments: Accepted for publication at Interspeech 2023

Journal-ref: Proc. INTERSPEECH 2023, 2373-2377

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[291] arXiv:2305.12552 (cross-list from cs.CL) [pdf, other]: Title: Wav2SQL: Direct Generalizable Speech-To-SQL Parsing

Huadai Liu, Rongjie Huang, Jinzheng He, Gang Sun, Ran Shen, Xize Cheng, Zhou Zhao

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[292] arXiv:2305.12579 (cross-list from cs.CL) [pdf, other]: Title: Hystoc: Obtaining word confidences for fusion of end-to-end ASR systems

Karel Beneš, Martin Kocour, Lukáš Burget

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[293] arXiv:2305.12606 (cross-list from cs.CL) [pdf, other]: Title: Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages

Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogerio Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James Glass

Comments: Accepted at Interspeech 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[294] arXiv:2305.12628 (cross-list from cs.CL) [pdf, other]: Title: Duplex Diffusion Models Improve Speech-to-Speech Translation

Xianchao Wu

Comments: 11 pages, 3 figures. Accepted by ACL 2023 findings

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[295] arXiv:2305.12642 (cross-list from cs.SD) [pdf, other]: Title: The HCCL system for VoxCeleb Speaker Recognition Challenge 2022

Zhenduo Zhao, Zhuo Li, Wenchao Wang, Pengyuan Zhang

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[296] arXiv:2305.12701 (cross-list from cs.SD) [pdf, other]: Title: More Perspectives Mean Better: Underwater Target Recognition and Localization with Multimodal Data via Symbiotic Transformer and Multiview Regression

Shipei Liu, Xiaoya Fan, Guowei Wu

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[297] arXiv:2305.12703 (cross-list from cs.SD) [pdf, other]: Title: Progressive Sub-Graph Clustering Algorithm for Semi-Supervised Domain Adaptation Speaker Verification

Zhuo Li, Jingze Lu, Zhenduo Zhao, Wenchao Wang, Pengyuan Zhang

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[298] arXiv:2305.12712 (cross-list from cs.SD) [pdf, other]: Title: LEAN: Light and Efficient Audio Classification Network

Shwetank Choudhary, CR Karthik, Punuru Sri Lakshmi, Sumit Kumar

Comments: Accepted at INDICON 2022

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[299] arXiv:2305.12755 (cross-list from cs.SD) [pdf, other]: Title: GNCformer Enhanced Self-attention for Automatic Speech Recognition

J. Li, Z. Duan, S. Li, X. Yu, G. Yang

Comments: 5 pages, 3 figures

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[300] arXiv:2305.12804 (cross-list from cs.SD) [pdf, other]: Title: The defender's perspective on automatic speaker verification: An overview

Haibin Wu, Jiawen Kang, Lingwei Meng, Helen Meng, Hung-yi Lee

Comments: Accepted to IJCAI 2023 Workshop

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
