Audio and Speech Processing

Authors and titles for August 2023

Total of 236 entries : 1-25 ... 126-150 151-175 176-200 201-225 226-236

[201] arXiv:2308.13007 (cross-list from cs.SD) [pdf, other]: Title: Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations

Wenbin Wang, Yang Song, Sanjay Jha

Comments: 5 pages, 3 figures. Accepted by Interspeech 2023, Oral

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[202] arXiv:2308.13201 (cross-list from cs.SD) [pdf, other]: Title: Deep Active Audio Feature Learning in Resource-Constrained Environments

Md Mohaimenuzzaman, Christoph Bergmeir, Bernd Meyer

Journal-ref: IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 32, 2024

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[203] arXiv:2308.13365 (cross-list from cs.SD) [pdf, html, other]: Title: Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder

Xuyuan Li, Zengqiang Shang, Peiyang Shi, Hua Hua, Ta Li, Pengyuan Zhang

Comments: accepted at Interspeech 2024

Journal-ref: Proceedings of Interspeech 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[204] arXiv:2308.13421 (cross-list from cs.CV) [pdf, other]: Title: Exploiting Diverse Feature for Multimodal Sentiment Analysis

Jia Li, Wei Qian, Kun Li, Qi Li, Dan Guo, Meng Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[205] arXiv:2308.13736 (cross-list from cs.SD) [pdf, other]: Title: A Comprehensive Survey for Evaluation Methodologies of AI-Generated Music

Zeyu Xiong, Weitao Wang, Jing Yu, Yue Lin, Ziyan Wang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[206] arXiv:2308.13941 (cross-list from cs.SD) [pdf, other]: Title: A small vocabulary database of ultrasound image sequences of vocal tract dynamics

Margareth Castillo, Felipe Rubio, Dagoberto Porras, Sonia H. Contreras-Ortiz, Alexander Sepúlveda

Journal-ref: STSIVA-2019, Bucaramanga, Colombia, 2019

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[207] arXiv:2308.14059 (cross-list from cs.SD) [pdf, other]: Title: Multi-Subdomain Adversarial Network for Cross-Subject EEG-based Emotion Recognition

Guang Lin, Jianhai Zhang

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[208] arXiv:2308.14063 (cross-list from cs.SD) [pdf, other]: Title: Anomalous Sound Detection Using Self-Attention-Based Frequency Pattern Analysis of Machine Sounds

Hejing Zhang, Jian Guan, Qiaoxi Zhu, Feiyang Xiao, Youde Liu

Comments: Published in INTERSPEECH 2023

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[209] arXiv:2308.14317 (cross-list from cs.SD) [pdf, other]: Title: Symbolic & Acoustic: Multi-domain Music Emotion Modeling for Instrumental Music

Kexin Zhu, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Comments: Accepted by 19th International Conference on Advanced Data Mining and Applications. (ADMA 2023)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[210] arXiv:2308.14319 (cross-list from cs.SD) [pdf, other]: Title: Voice Conversion with Denoising Diffusion Probabilistic GAN Models

Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Comments: Accepted by 19th International Conference on Advanced Data Mining and Applications. (ADMA 2023)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[211] arXiv:2308.14360 (cross-list from cs.SD) [pdf, html, other]: Title: InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models

Bing Han, Junyu Dai, Weituo Hao, Xinyan He, Dong Guo, Jitong Chen, Yuxuan Wang, Yanmin Qian, Xuchen Song

Comments: Demo samples are available at this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[212] arXiv:2308.14512 (cross-list from eess.SP) [pdf, html, other]: Title: A time-causal and time-recursive analogue of the Gabor transform

Tony Lindeberg

Comments: 31 pages, 7 figures, 7 tables, 1 algorithm

Journal-ref: IEEE Transactions on Information Theory, 71(2): 1450-1480, 2025

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS); Functional Analysis (math.FA)
[213] arXiv:2308.14536 (cross-list from cs.CL) [pdf, html, other]: Title: Spoken Language Intelligence of Large Language Models for Language Learning

Linkai Peng, Baorian Nuchged, Yingming Gao

Comments: 28 pages, 7 figures, Preprint Feb 04, 2025 update: Add Deepseek R1 performance

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[214] arXiv:2308.14568 (cross-list from cs.SD) [pdf, other]: Title: Time-Frequency Transformer: A Novel Time Frequency Joint Learning Method for Speech Emotion Recognition

Yong Wang, Cheng Lu, Yuan Zong, Hailun Lian, Yan Zhao, Sunan Li

Comments: Accepted by International Conference on Neural Information Processing (ICONIP2023)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[215] arXiv:2308.14894 (cross-list from cs.CL) [pdf, other]: Title: Multiscale Contextual Learning for Speech Emotion Recognition in Emergency Call Center Conversations

Théo Deschamps-Berger, Lori Lamel, Laurence Devillers

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[216] arXiv:2308.14905 (cross-list from cs.CL) [pdf, other]: Title: Neural approaches to spoken content embedding

Shane Settle

Comments: PhD thesis

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[217] arXiv:2308.14909 (cross-list from cs.SD) [pdf, other]: Title: Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech

Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang

Comments: INTERSPEECH 2023

Journal-ref: Proc. INTERSPEECH 2023, 4299-4303

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[218] arXiv:2308.14951 (cross-list from cs.CL) [pdf, other]: Title: Robust Open-Set Spoken Language Identification and the CU MultiLang Dataset

Mustafa Eyceoz, Justin Lee, Siddharth Pittie, Homayoon Beigi

Comments: 6 pages, 1 table, 6 figures

Journal-ref: Recognition Technologies, Inc. Technical Report (2023), RTI-20230328-01

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[219] arXiv:2308.14970 (cross-list from cs.SD) [pdf, other]: Title: Audio Deepfake Detection: A Survey

Jiangyan Yi, Chenglong Wang, Jianhua Tao, Xiaohui Zhang, Chu Yuan Zhang, Yan Zhao

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[220] arXiv:2308.15090 (cross-list from cs.CL) [pdf, other]: Title: Killing two birds with one stone: Can an audio captioning system also be used for audio-text retrieval?

Etienne Labbé (IRIT-SAMoVA), Thomas Pellegrini (IRIT-SAMoVA), Julien Pinquier (IRIT-SAMoVA)

Comments: camera-ready version (14/08/23)

Journal-ref: DCASE2023, Sep 2023, Tampere, Finland

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[221] arXiv:2308.15422 (cross-list from cs.SD) [pdf, other]: Title: A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis

Ben Hayes, Jordie Shier, György Fazekas, Andrew McPherson, Charalampos Saitis

Comments: Under review for Frontiers in Signal Processing

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[222] arXiv:2308.15726 (cross-list from cs.SD) [pdf, other]: Title: AGS: An Dataset and Taxonomy for Domestic Scene Sound Event Recognition

Nan Che, Chenrui Liu, Fei Yu

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[223] arXiv:2308.15742 (cross-list from cs.SD) [pdf, other]: Title: ASTER: Automatic Speech Recognition System Accessibility Testing for Stutterers

Yi Liu, Yuekang Li, Gelei Deng, Felix Juefei-Xu, Yao Du, Cen Zhang, Chengwei Liu, Yeting Li, Lei Ma, Yang Liu

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Software Engineering (cs.SE); Audio and Speech Processing (eess.AS)
[224] arXiv:2308.15930 (cross-list from cs.CL) [pdf, other]: Title: LLaSM: Large Language and Speech Model

Yu Shu, Siwei Dong, Guangyao Chen, Wenhao Huang, Ruihua Zhang, Daochen Shi, Qiqi Xiang, Yemin Shi

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[225] arXiv:2308.15990 (cross-list from cs.SD) [pdf, other]: Title: Dual-path Transformer Based Neural Beamformer for Target Speech Extraction

Aoqi Guo, Sichong Qian, Baoxiang Li, Dazhi Gao

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
