Audio and Speech Processing

Authors and titles for December 2023

Total of 234 entries (showing 151-200)
[151] arXiv:2312.09582 (cross-list from cs.CL) [pdf, html, other]
Title: Phoneme-aware Encoding for Prefix-tree-based Contextual ASR
Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Hiroaki Ogawa, Siddhant Arora, Shinji Watanabe
Comments: Accepted to ICASSP2024
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[152] arXiv:2312.09583 (cross-list from cs.CL) [pdf, other]
Title: Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition
Tzu-Ting Yang, Hsin-Wei Wang, Berlin Chen
Comments: Accepted to The 28th International Conference on Technologies and Applications of Artificial Intelligence (TAAI), in Chinese language
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[153] arXiv:2312.09603 (cross-list from cs.SD) [pdf, html, other]
Title: Stethoscope-guided Supervised Contrastive Learning for Cross-domain Adaptation on Respiratory Sound Classification
June-Woo Kim, Sangmin Bae, Won-Yang Cho, Byungjo Lee, Ho-Young Jung
Comments: accepted to ICASSP 2024
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[154] arXiv:2312.09651 (cross-list from cs.SD) [pdf, html, other]
Title: What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection
Xiaohui Zhang, Jiangyan Yi, Chenglong Wang, Chuyuan Zhang, Siding Zeng, Jianhua Tao
Comments: Accepted by the main track The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[155] arXiv:2312.09727 (cross-list from cs.CV) [pdf, html, other]
Title: LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data
Hendrik Laux, Emil Mededovic, Ahmed Hallawa, Lukas Martin, Arne Peine, Anke Schmeink
Comments: Accepted for publication at ICASSP 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[156] arXiv:2312.09736 (cross-list from cs.CL) [pdf, html, other]
Title: HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue
Sunjae Yoon, Dahyun Kim, Eunseop Yoon, Hee Suk Yoon, Junyeong Kim, Chang D. Yoo
Comments: EMNLP 2023, 14 pages, 13 figures
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[157] arXiv:2312.09746 (cross-list from cs.SD) [pdf, html, other]
Title: Automatic channel selection and spatial feature integration for multi-channel speech recognition across various array topologies
Bingshen Mu, Pengcheng Guo, Dake Guo, Pan Zhou, Wei Chen, Lei Xie
Comments: Accepted by ICASSP 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[158] arXiv:2312.09842 (cross-list from cs.SD) [pdf, html, other]
Title: On the compression of shallow non-causal ASR models using knowledge distillation and tied-and-reduced decoder for low-latency on-device speech recognition
Nagaraj Adiga, Jinhwan Park, Chintigari Shiva Kumar, Shatrughan Singh, Kyungmin Lee, Chanwoo Kim, Dhananjaya Gowda
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[159] arXiv:2312.09895 (cross-list from cs.CL) [pdf, html, other]
Title: Generative Context-aware Fine-tuning of Self-supervised Speech Models
Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe, Karen Livescu
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[160] arXiv:2312.09911 (cross-list from cs.SD) [pdf, html, other]
Title: Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang, Liumeng Xue, Yicheng Gu, Yuancheng Wang, Jiaqi Li, Haorui He, Chaoren Wang, Songting Liu, Xi Chen, Junan Zhang, Zihao Fang, Haopeng Chen, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, Zhizheng Wu
Comments: Accepted by IEEE SLT 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[161] arXiv:2312.10019 (cross-list from cs.IT) [pdf, html, other]
Title: Understanding Probe Behaviors through Variational Bounds of Mutual Information
Kwanghee Choi, Jee-weon Jung, Shinji Watanabe
Comments: Accepted to ICASSP 2024, implementation available at this https URL
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[162] arXiv:2312.10265 (cross-list from cs.SD) [pdf, html, other]
Title: VoCopilot: Voice-Activated Tracking of Everyday Interactions
Sheen An Goh, Manoj Gulati, Ambuj Varshney
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[163] arXiv:2312.10305 (cross-list from cs.SD) [pdf, html, other]
Title: Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction
Zhaoxi Mu, Xinyu Yang, Sining Sun, Qing Yang
Comments: Accepted by AAAI2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[164] arXiv:2312.10307 (cross-list from cs.SD) [pdf, html, other]
Title: MusER: Musical Element-Based Regularization for Generating Symbolic Music with Emotion
Shulei Ji, Xinyu Yang
Comments: Accepted by AAAI 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[165] arXiv:2312.10381 (cross-list from cs.SD) [pdf, html, other]
Title: SECap: Speech Emotion Captioning with Large Language Model
Yaoxun Xu, Hangting Chen, Jianwei Yu, Qiaochu Huang, Zhiyong Wu, Shixiong Zhang, Guangzhi Li, Yi Luo, Rongzhi Gu
Comments: Accepted by AAAI 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[166] arXiv:2312.10402 (cross-list from cs.SD) [pdf, html, other]
Title: Annotation-free Automatic Music Transcription with Scalable Synthetic Data and Adversarial Domain Confusion
Gakusei Sato, Taketo Akama
Comments: 7 pages, 1 figure, Accepted to 2024 IEEE International Conference on Multimedia and Expo (ICME)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[167] arXiv:2312.10518 (cross-list from cs.SD) [pdf, html, other]
Title: Seq2seq for Automatic Paraphasia Detection in Aphasic Speech
Matthew Perez, Duc Le, Amrit Romana, Elise Jones, Keli Licata, Emily Mower Provost
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[168] arXiv:2312.10605 (cross-list from cs.SD) [pdf, html, other]
Title: Meta-AF Echo Cancellation for Improved Keyword Spotting
Jonah Casebeer, Junkai Wu, Paris Smaragdis
Comments: 5 pages, 2 figures, ICASSP 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[169] arXiv:2312.10742 (cross-list from cs.SD) [pdf, other]
Title: Exploring Sound vs Vibration for Robust Fault Detection on Rotating Machinery
Serkan Kiranyaz, Ozer Can Devecioglu, Amir Alhams, Sadok Sassi, Turker Ince, Onur Avci, Moncef Gabbouj
Comments: 8 pages
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[170] arXiv:2312.10921 (cross-list from cs.CV) [pdf, html, other]
Title: AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis
Dongze Li, Kang Zhao, Wei Wang, Bo Peng, Yingya Zhang, Jing Dong, Tieniu Tan
Comments: Accepted by AAAI 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[171] arXiv:2312.10937 (cross-list from cs.SD) [pdf, html, other]
Title: An Extended Variational Mode Decomposition Algorithm Developed Speech Emotion Recognition Performance
David Hason Rudd, Huan Huo, Guandong Xu
Comments: 12 pages
Journal-ref: Advances in Knowledge Discovery and Data Mining. PAKDD 2023. Lecture Notes in Computer Science, vol 13937. Springer, Cham
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[172] arXiv:2312.10949 (cross-list from cs.SD) [pdf, html, other]
Title: Leveraged Mel spectrograms using Harmonic and Percussive Components in Speech Emotion Recognition
David Hason Rudd, Huan Huo, Guandong Xu
Comments: 12 pages
Journal-ref: Advances in Knowledge Discovery and Data Mining. PAKDD 2022. Lecture Notes in Computer Science, vol 13281. Springer, Cham
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[173] arXiv:2312.10952 (cross-list from cs.CL) [pdf, html, other]
Title: Soft Alignment of Modality Space for End-to-end Speech Translation
Yuhao Zhang, Kaiqi Kou, Bei Li, Chen Xu, Chunliang Zhang, Tong Xiao, Jingbo Zhu
Comments: Accepted to ICASSP2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[174] arXiv:2312.10959 (cross-list from cs.SD) [pdf, html, other]
Title: Speaker Mask Transformer for Multi-talker Overlapped Speech Recognition
Peng Shen, Xugang Lu, Hisashi Kawai
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[175] arXiv:2312.10964 (cross-list from cs.CL) [pdf, html, other]
Title: Generative linguistic representation for spoken language identification
Peng Shen, Xugang Lu, Hisashi Kawai
Comments: Accepted by IEEE ASRU2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[176] arXiv:2312.10979 (cross-list from cs.SD) [pdf, html, other]
Title: 3S-TSE: Efficient Three-Stage Target Speaker Extraction for Real-Time and Low-Resource Applications
Shulin He, Jinjiang Liu, Hao Li, Yang Yang, Fei Chen, Xueliang Zhang
Comments: Accepted to ICASSP 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[177] arXiv:2312.11123 (cross-list from cs.SD) [pdf, html, other]
Title: Improved Long-Form Speech Recognition by Jointly Modeling the Primary and Non-primary Speakers
Guru Prakash Arumugam, Shuo-yiin Chang, Tara N. Sainath, Rohit Prabhavalkar, Quan Wang, Shaan Bijwadia
Comments: 8 pages, ASRU 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[178] arXiv:2312.11234 (cross-list from cs.SD) [pdf, html, other]
Title: Perceptual Musical Features for Interpretable Audio Tagging
Vassilis Lyberatos, Spyridon Kantarelis, Edmund Dervakos, Giorgos Stamou
Comments: Github Repository: this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[179] arXiv:2312.11240 (cross-list from cs.SD) [pdf, html, other]
Title: Evaluation of Barlow Twins and VICReg self-supervised learning for sound patterns of bird and anuran species
Fábio Felix Dias, Moacir Antonelli Ponti, Mílton Cezar Ribeiro, Rosane Minghim
Comments: 10 pages, 2 figures, 3 tables
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[180] arXiv:2312.11509 (cross-list from cs.CL) [pdf, html, other]
Title: Toward a Reinforcement-Learning-Based System for Adjusting Medication to Minimize Speech Disfluency
Pavlos Constas, Vikram Rawal, Matthew Honorio Oliveira, Andreas Constas, Aditya Khan, Kaison Cheung, Najma Sultani, Carrie Chen, Micol Altomare, Michael Akzam, Jiacheng Chen, Vhea He, Lauren Altomare, Heraa Murqi, Asad Khan, Nimit Amikumar Bhanshali, Youssef Rachad, Michael Guerzhoy
Comments: In Proc. Machine Learning for Cognitive and Mental Health Workshop (ML4CMH) at AAAI 2024
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[181] arXiv:2312.11563 (cross-list from cs.SD) [pdf, other]
Title: A review-based study on different Text-to-Speech technologies
Md. Jalal Uddin Chowdhury, Ashab Hussan
Comments: 4 pages
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[182] arXiv:2312.11825 (cross-list from cs.SD) [pdf, html, other]
Title: MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation
Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang, Hao Wang, Trung Hieu Nguyen, Kun Zhou, Jiaqi Yip, Dianwen Ng, Bin Ma
Comments: 5 pages, 3 figures, accepted by ICASSP 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[183] arXiv:2312.11947 (cross-list from cs.CL) [pdf, html, other]
Title: Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling
Rui Liu, Yifan Hu, Yi Ren, Xiang Yin, Haizhou Li
Comments: 9 pages, 4 figures, Accepted by AAAI'2024, Code and audio samples: this https URL
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[184] arXiv:2312.11974 (cross-list from cs.SD) [pdf, html, other]
Title: Ms-senet: Enhancing Speech Emotion Recognition Through Multi-scale Feature Fusion With Squeeze-and-excitation Blocks
Mengbo Li, Yuanzhong Zheng, Dichucheng Li, Yulun Wu, Yaoxuan Wang, Haojun Fei
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[185] arXiv:2312.12153 (cross-list from cs.SD) [pdf, html, other]
Title: Noise robust distillation of self-supervised speech models via correlation metrics
Fabian Ritter-Gutierrez, Kuan-Po Huang, Dianwen Ng, Jeremy H.M. Wong, Hung-yi Lee, Eng Siong Chng, Nancy F. Chen
Comments: 6 pages
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[186] arXiv:2312.12181 (cross-list from cs.SD) [pdf, html, other]
Title: StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis
Xueyuan Chen, Xi Wang, Shaofei Zhang, Lei He, Zhiyong Wu, Xixin Wu, Helen Meng
Comments: Accepted to ICASSP 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[187] arXiv:2312.12269 (cross-list from cs.CL) [pdf, other]
Title: Automated speech audiometry: Can it work using open-source pre-trained Kaldi-NL automatic speech recognition?
Gloria Araiza-Illan, Luke Meyer, Khiet P. Truong, Deniz Baskent
Comments: 25 pages (double spaced), 5 figures, 3 tables, 54 references
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[188] arXiv:2312.12364 (cross-list from cs.CL) [pdf, html, other]
Title: SpokesBiz -- an Open Corpus of Conversational Polish
Piotr Pęzik, Sylwia Karasińska, Anna Cichosz, Łukasz Jałowiecki, Konrad Kaczyński, Małgorzata Krawentek, Karolina Walkusz, Paweł Wilk, Mariusz Kleć, Krzysztof Szklanny, Szymon Marszałkowski
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[189] arXiv:2312.13143 (cross-list from cs.SD) [pdf, html, other]
Title: Underwater Acoustic Signal Recognition Based on Salient Feature
Minghao Chen
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[190] arXiv:2312.13556 (cross-list from cs.SD) [pdf, other]
Title: Multi-Level Knowledge Distillation for Speech Emotion Recognition in Noisy Conditions
Yang Liu, Haoqin Sun, Geng Chen, Qingyue Wang, Zhen Zhao, Xugang Lu, Longbiao Wang
Comments: Accepted by INTERSPEECH 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[191] arXiv:2312.13560 (cross-list from cs.SD) [pdf, html, other]
Title: kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
Jiaming Zhou, Shiwan Zhao, Yaqi Liu, Wenjia Zeng, Yong Chen, Yong Qin
Comments: Accepted by ICASSP 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[192] arXiv:2312.13567 (cross-list from cs.SD) [pdf, html, other]
Title: Fine-grained Disentangled Representation Learning for Multimodal Emotion Recognition
Haoqin Sun, Shiwan Zhao, Xuechen Wang, Wenjia Zeng, Yong Chen, Yong Qin
Comments: Accepted by ICASSP 2024
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[193] arXiv:2312.13585 (cross-list from cs.CL) [pdf, html, other]
Title: Speech Translation with Large Language Models: An Industrial Practice
Zhichao Huang, Rong Ye, Tom Ko, Qianqian Dong, Shanbo Cheng, Mingxuan Wang, Hang Li
Comments: Technical report. 13 pages. Demo: this https URL
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[194] arXiv:2312.13722 (cross-list from cs.SD) [pdf, html, other]
Title: BAE-Net: A Low complexity and high fidelity Bandwidth-Adaptive neural network for speech super-resolution
Guochen Yu, Xiguang Zheng, Nan Li, Runqiang Han, Chengshi Zheng, Chen Zhang, Chao Zhou, Qi Huang, Bing Yu
Comments: Accepted to ICASSP 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[195] arXiv:2312.13873 (cross-list from cs.SD) [pdf, html, other]
Title: Self-Supervised Adaptive AV Fusion Module for Pre-Trained ASR Models
Christopher Simic, Tobias Bocklet
Comments: Accepted at ICASSP 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[196] arXiv:2312.14005 (cross-list from cs.SD) [pdf, html, other]
Title: On the choice of the optimal temporal support for audio classification with Pre-trained embeddings
Aurian Quelennec, Michel Olvera, Geoffroy Peeters, Slim Essid
Comments: Copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[197] arXiv:2312.14020 (cross-list from cs.HC) [pdf, other]
Title: BANSpEmo: A Bangla Emotional Speech Recognition Dataset
Md Gulzar Hussain, Mahmuda Rahman, Babe Sultana, Ye Shiren
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[198] arXiv:2312.14036 (cross-list from cs.SD) [pdf, html, other]
Title: Total variation in popular rap vocals from 2009-2023: extension of the analysis by Georgieva, Ripolles & McFee
Kelvin L Walls, Iran R Roman, Bea Steers, Elena Georgieva
Journal-ref: ISMIR 2023 Hybrid Conference 2023 Nov 5
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[199] arXiv:2312.14069 (cross-list from cs.CL) [pdf, html, other]
Title: EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models
Maureen de Seyssel, Antony D'Avirro, Adina Williams, Emmanuel Dupoux
Comments: Accepted at EMNLP 2024 (Main)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[200] arXiv:2312.14378 (cross-list from cs.LG) [pdf, other]
Title: Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification
Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu
Comments: 5 pages, 1 figure, ICASSP 2024 Workshop on Self-supervision in Audio, Speech and Beyond
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)