Audio and Speech Processing

Authors and titles for August 2023

Total of 236 entries; showing entries 201-225
[201] arXiv:2308.13007 (cross-list from cs.SD) [pdf, other]
Title: Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations
Wenbin Wang, Yang Song, Sanjay Jha
Comments: 5 pages, 3 figures. Accepted by Interspeech 2023, Oral
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[202] arXiv:2308.13201 (cross-list from cs.SD) [pdf, other]
Title: Deep Active Audio Feature Learning in Resource-Constrained Environments
Md Mohaimenuzzaman, Christoph Bergmeir, Bernd Meyer
Journal-ref: IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 32, 2024
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[203] arXiv:2308.13365 (cross-list from cs.SD) [pdf, html, other]
Title: Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder
Xuyuan Li, Zengqiang Shang, Peiyang Shi, Hua Hua, Ta Li, Pengyuan Zhang
Comments: accepted at Interspeech 2024
Journal-ref: Proceedings of Interspeech 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[204] arXiv:2308.13421 (cross-list from cs.CV) [pdf, other]
Title: Exploiting Diverse Feature for Multimodal Sentiment Analysis
Jia Li, Wei Qian, Kun Li, Qi Li, Dan Guo, Meng Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[205] arXiv:2308.13736 (cross-list from cs.SD) [pdf, other]
Title: A Comprehensive Survey for Evaluation Methodologies of AI-Generated Music
Zeyu Xiong, Weitao Wang, Jing Yu, Yue Lin, Ziyan Wang
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[206] arXiv:2308.13941 (cross-list from cs.SD) [pdf, other]
Title: A small vocabulary database of ultrasound image sequences of vocal tract dynamics
Margareth Castillo, Felipe Rubio, Dagoberto Porras, Sonia H. Contreras-Ortiz, Alexander Sepúlveda
Journal-ref: STSIVA-2019, Bucaramanga, Colombia, 2019
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[207] arXiv:2308.14059 (cross-list from cs.SD) [pdf, other]
Title: Multi-Subdomain Adversarial Network for Cross-Subject EEG-based Emotion Recognition
Guang Lin, Jianhai Zhang
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[208] arXiv:2308.14063 (cross-list from cs.SD) [pdf, other]
Title: Anomalous Sound Detection Using Self-Attention-Based Frequency Pattern Analysis of Machine Sounds
Hejing Zhang, Jian Guan, Qiaoxi Zhu, Feiyang Xiao, Youde Liu
Comments: Published in INTERSPEECH 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[209] arXiv:2308.14317 (cross-list from cs.SD) [pdf, other]
Title: Symbolic & Acoustic: Multi-domain Music Emotion Modeling for Instrumental Music
Kexin Zhu, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Comments: Accepted by 19th International Conference on Advanced Data Mining and Applications. (ADMA 2023)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[210] arXiv:2308.14319 (cross-list from cs.SD) [pdf, other]
Title: Voice Conversion with Denoising Diffusion Probabilistic GAN Models
Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Comments: Accepted by 19th International Conference on Advanced Data Mining and Applications. (ADMA 2023)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[211] arXiv:2308.14360 (cross-list from cs.SD) [pdf, html, other]
Title: InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models
Bing Han, Junyu Dai, Weituo Hao, Xinyan He, Dong Guo, Jitong Chen, Yuxuan Wang, Yanmin Qian, Xuchen Song
Comments: Demo samples are available at this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[212] arXiv:2308.14512 (cross-list from eess.SP) [pdf, html, other]
Title: A time-causal and time-recursive analogue of the Gabor transform
Tony Lindeberg
Comments: 31 pages, 7 figures, 7 tables, 1 algorithm
Journal-ref: IEEE Transactions on Information Theory, 71(2): 1450-1480, 2025
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS); Functional Analysis (math.FA)
[213] arXiv:2308.14536 (cross-list from cs.CL) [pdf, html, other]
Title: Spoken Language Intelligence of Large Language Models for Language Learning
Linkai Peng, Baorian Nuchged, Yingming Gao
Comments: 28 pages, 7 figures, Preprint Feb 04, 2025 update: Add Deepseek R1 performance
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[214] arXiv:2308.14568 (cross-list from cs.SD) [pdf, other]
Title: Time-Frequency Transformer: A Novel Time Frequency Joint Learning Method for Speech Emotion Recognition
Yong Wang, Cheng Lu, Yuan Zong, Hailun Lian, Yan Zhao, Sunan Li
Comments: Accepted by International Conference on Neural Information Processing (ICONIP2023)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[215] arXiv:2308.14894 (cross-list from cs.CL) [pdf, other]
Title: Multiscale Contextual Learning for Speech Emotion Recognition in Emergency Call Center Conversations
Théo Deschamps-Berger, Lori Lamel, Laurence Devillers
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[216] arXiv:2308.14905 (cross-list from cs.CL) [pdf, other]
Title: Neural approaches to spoken content embedding
Shane Settle
Comments: PhD thesis
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[217] arXiv:2308.14909 (cross-list from cs.SD) [pdf, other]
Title: Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech
Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang
Comments: INTERSPEECH 2023
Journal-ref: Proc. INTERSPEECH 2023, 4299-4303
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[218] arXiv:2308.14951 (cross-list from cs.CL) [pdf, other]
Title: Robust Open-Set Spoken Language Identification and the CU MultiLang Dataset
Mustafa Eyceoz, Justin Lee, Siddharth Pittie, Homayoon Beigi
Comments: 6 pages, 1 table, 6 figures
Journal-ref: Recognition Technologies, Inc. Technical Report (2023), RTI-20230328-01
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[219] arXiv:2308.14970 (cross-list from cs.SD) [pdf, other]
Title: Audio Deepfake Detection: A Survey
Jiangyan Yi, Chenglong Wang, Jianhua Tao, Xiaohui Zhang, Chu Yuan Zhang, Yan Zhao
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[220] arXiv:2308.15090 (cross-list from cs.CL) [pdf, other]
Title: Killing two birds with one stone: Can an audio captioning system also be used for audio-text retrieval?
Etienne Labbé (IRIT-SAMoVA), Thomas Pellegrini (IRIT-SAMoVA), Julien Pinquier (IRIT-SAMoVA)
Comments: cam ready version (14/08/23)
Journal-ref: DCASE2023, Sep 2023, Tampere, Finland
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[221] arXiv:2308.15422 (cross-list from cs.SD) [pdf, other]
Title: A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis
Ben Hayes, Jordie Shier, György Fazekas, Andrew McPherson, Charalampos Saitis
Comments: Under review for Frontiers in Signal Processing
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[222] arXiv:2308.15726 (cross-list from cs.SD) [pdf, other]
Title: AGS: An Dataset and Taxonomy for Domestic Scene Sound Event Recognition
Nan Che, Chenrui Liu, Fei Yu
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[223] arXiv:2308.15742 (cross-list from cs.SD) [pdf, other]
Title: ASTER: Automatic Speech Recognition System Accessibility Testing for Stutterers
Yi Liu, Yuekang Li, Gelei Deng, Felix Juefei-Xu, Yao Du, Cen Zhang, Chengwei Liu, Yeting Li, Lei Ma, Yang Liu
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Software Engineering (cs.SE); Audio and Speech Processing (eess.AS)
[224] arXiv:2308.15930 (cross-list from cs.CL) [pdf, other]
Title: LLaSM: Large Language and Speech Model
Yu Shu, Siwei Dong, Guangyao Chen, Wenhao Huang, Ruihua Zhang, Daochen Shi, Qiqi Xiang, Yemin Shi
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[225] arXiv:2308.15990 (cross-list from cs.SD) [pdf, other]
Title: Dual-path Transformer Based Neural Beamformer for Target Speech Extraction
Aoqi Guo, Sichong Qian, Baoxiang Li, Dazhi Gao
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)