Sound

Authors and titles for March 2023

Total of 232 entries (showing 151-200)
[151] arXiv:2303.07624 (cross-list from cs.CL) [pdf, other]
Title: I3D: Transformer architectures with input-dependent dynamic depth for speech recognition
Yifan Peng, Jaesong Lee, Shinji Watanabe
Comments: Accepted at ICASSP 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[152] arXiv:2303.07650 (cross-list from cs.CL) [pdf, other]
Title: Cross-lingual Alzheimer's Disease detection based on paralinguistic and pre-trained features
Xuchu Chen, Yu Pu, Jinpeng Li, Wei-Qiang Zhang
Comments: accepted by ICASSP 2023
Journal-ref: ICASSP (2023)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[153] arXiv:2303.07704 (cross-list from eess.AS) [pdf, other]
Title: TEA-PSE 3.0: Tencent-Ethereal-Audio-Lab Personalized Speech Enhancement System For ICASSP 2023 DNS Challenge
Yukai Ju, Jun Chen, Shimin Zhang, Shulin He, Wei Rao, Weixin Zhu, Yannan Wang, Tao Yu, Shidong Shang
Comments: Accepted by ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[154] arXiv:2303.07739 (cross-list from eess.SP) [pdf, other]
Title: Detecting post-stroke aphasia using EEG-based neural envelope tracking of natural speech
Pieter De Clercq, Jill Kries, Ramtin Mehraram, Jonas Vanthornhout, Tom Francart, Maaike Vandermosten
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[155] arXiv:2303.07816 (cross-list from eess.AS) [pdf, other]
Title: Multi-Channel Masking with Learnable Filterbank for Sound Source Separation
Wang Dai, Archontis Politis, Tuomas Virtanen
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[156] arXiv:2303.07924 (cross-list from cs.LG) [pdf, other]
Title: Improving Accented Speech Recognition with Multi-Domain Training
Lucas Maison, Yannick Estève
Comments: 5 pages, 2 figures. Accepted to ICASSP 2023
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[157] arXiv:2303.08005 (cross-list from eess.AS) [pdf, other]
Title: Native Multi-Band Audio Coding within Hyper-Autoencoded Reconstruction Propagation Networks
Darius Petermann, Inseon Jang, Minje Kim
Comments: Accepted to ICASSP 2023. For resources and examples, see this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[158] arXiv:2303.08019 (cross-list from eess.AS) [pdf, other]
Title: Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection
Jinchao Li, Kaitao Song, Junan Li, Bo Zheng, Dongsheng Li, Xixin Wu, Xunying Liu, Helen Meng
Comments: 5 pages, 3 figures, 3 tables
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[159] arXiv:2303.08027 (cross-list from eess.AS) [pdf, other]
Title: A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition
Jinchao Li, Xixin Wu, Kaitao Song, Dongsheng Li, Xunying Liu, Helen Meng
Comments: 5 pages, 3 figures, 5 tables
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[160] arXiv:2303.08052 (cross-list from eess.AS) [pdf, other]
Title: Localizing Spatial Information in Neural Spatiospectral Filters
Annika Briegleb, Thomas Haubner, Vasileios Belagiannis, Walter Kellermann
Comments: Accepted to the 31st European Signal Processing Conference (EUSIPCO 2023), Helsinki, Finland. 5 pages, 3 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[161] arXiv:2303.08268 (cross-list from cs.RO) [pdf, other]
Title: Chat with the Environment: Interactive Multimodal Perception Using Large Language Models
Xufeng Zhao, Mengdi Li, Cornelius Weber, Muhammad Burhan Hafez, Stefan Wermter
Comments: IROS2023, Detroit. See the project website at this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[162] arXiv:2303.08295 (cross-list from eess.SP) [pdf, other]
Title: A large-scale multimodal dataset of human speech recognition
Yao Ge, Chong Tang, Haobo Li, Zikang Zhang, Wenda Li, Kevin Chetty, Daniele Faccio, Qammer H. Abbasi, Muhammad Imran
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[163] arXiv:2303.08343 (cross-list from eess.AS) [pdf, other]
Title: Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models
Steven M. Hernandez, Ding Zhao, Shaojin Ding, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He, Ian McGraw
Comments: Accepted to IEEE ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[164] arXiv:2303.08372 (cross-list from eess.AS) [pdf, other]
Title: Target Sound Extraction with Variable Cross-modality Clues
Chenda Li, Yao Qian, Zhuo Chen, Dongmei Wang, Takuya Yoshioka, Shujie Liu, Yanmin Qian, Michael Zeng
Comments: Accepted by ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[165] arXiv:2303.08379 (cross-list from eess.AS) [pdf, other]
Title: Implementing Continuous HRTF Measurement in Near-Field
Ee-Leng Tan, Santi Peksi, Woon-Seng Gan
Comments: 5 pages, 9 figures, Submitted to 2023 IEEE International Conference on Acoustics, Speech and Signal Processing
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[166] arXiv:2303.08480 (cross-list from eess.AS) [pdf, other]
Title: Acoustic source localization in the spherical harmonics domain exploiting low-rank approximations
Maximo Cobos, Mirco Pezzoli, Fabio Antonacci, Augusto Sarti
Comments: To appear in ICASSP 2023
Journal-ref: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[167] arXiv:2303.08536 (cross-list from cs.MM) [pdf, other]
Title: Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring
Joanna Hong, Minsu Kim, Jeongsoo Choi, Yong Man Ro
Comments: Accepted at CVPR 2023. Implementation available: this https URL
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[168] arXiv:2303.08636 (cross-list from eess.AS) [pdf, other]
Title: HYBRIDFORMER: improving SqueezeFormer with hybrid attention and NSR mechanism
Yuguang Yang, Yu Pan, Jingjing Yin, Jiangyu Han, Lei Ma, Heng Lu
Comments: Accepted by ICASSP2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[169] arXiv:2303.08670 (cross-list from cs.CV) [pdf, other]
Title: Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video
Minsu Kim, Chae Won Kim, Yong Man Ro
Comments: Accepted in AAAI2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[170] arXiv:2303.08674 (cross-list from eess.AS) [pdf, other]
Title: Speech Signal Improvement Using Causal Generative Diffusion Models
Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Tal Peer, Timo Gerkmann
Comments: Accepted by ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[171] arXiv:2303.08702 (cross-list from eess.AS) [pdf, other]
Title: Beamformer-Guided Target Speaker Extraction
Mohamed Elminshawi, Srikanth Raj Chetupalli, Emanuël A. P. Habets
Comments: Submitted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[172] arXiv:2303.09057 (cross-list from eess.AS) [pdf, other]
Title: TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion
Hyun Joon Park, Seok Woo Yang, Jin Sob Kim, Wooseok Shin, Sung Won Han
Comments: To appear in ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[173] arXiv:2303.09119 (cross-list from cs.CV) [pdf, other]
Title: Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation
Lingting Zhu, Xian Liu, Xuanyu Liu, Rui Qian, Ziwei Liu, Lequan Yu
Comments: Accepted by IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023. 10 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[174] arXiv:2303.09278 (cross-list from eess.AS) [pdf, other]
Title: DistillW2V2: A Small and Streaming Wav2vec 2.0 Based ASR Model
Yanzhe Fu, Yueteng Kang, Songjun Cao, Long Ma
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[175] arXiv:2303.09404 (cross-list from eess.AS) [pdf, other]
Title: Speech Modeling with a Hierarchical Transformer Dynamical VAE
Xiaoyu Lin, Xiaoyu Bie, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[176] arXiv:2303.09438 (cross-list from cs.CL) [pdf, other]
Title: Trustera: A Live Conversation Redaction System
Evandro Gouvêa, Ali Dadgar, Shahab Jalalvand, Rathi Chengalvarayan, Badrinath Jayakumar, Ryan Price, Nicholas Ruiz, Jennifer McGovern, Srinivas Bangalore, Ben Stern
Comments: 5
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[177] arXiv:2303.09455 (cross-list from cs.CL) [pdf, other]
Title: Learning Cross-lingual Visual Speech Representations
Andreas Zinonos, Alexandros Haliassos, Pingchuan Ma, Stavros Petridis, Maja Pantic
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[178] arXiv:2303.09645 (cross-list from cs.RO) [pdf, other]
Title: Development of a Voice Controlled Robotic Arm
Akkas U. Haque, Humayun Kabir, S. C. Banik, M. T. Islam
Subjects: Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[179] arXiv:2303.09966 (cross-list from eess.AS) [pdf, other]
Title: Magnitude-Corrected and Time-Aligned Interpolation of Head-Related Transfer Functions
Johannes M. Arend, Christoph Pörschmann, Stefan Weinzierl, Fabian Brinkmann
Journal-ref: IEEE/ACM Trans. Audio Speech and Lang. Proc., 31, 3783--3799 (2023)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[180] arXiv:2303.10008 (cross-list from eess.AS) [pdf, other]
Title: Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture
Julien Hauret, Thomas Joubaud, Véronique Zimpfer, Éric Bavu
Comments: Accepted in IEEE/ACM Transactions on Audio, Speech and Language Processing on 14/08/2023
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing (2023 - Volume: 31) - pp. 3499 - 3512
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[181] arXiv:2303.10160 (cross-list from eess.AS) [pdf, other]
Title: Visual Information Matters for ASR Error Correction
Vanya Bannihatti Kumar, Shanbo Cheng, Ningxin Peng, Yuchen Zhang
Comments: Accepted at ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[182] arXiv:2303.10335 (cross-list from cs.MM) [pdf, other]
Title: Multimodal Continuous Emotion Recognition: A Technical Report for ABAW5
Su Zhang, Ziyuan Zhao, Cuntai Guan
Comments: 6 pages. 1 figure. arXiv admin note: substantial text overlap with arXiv:2203.13031
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[183] arXiv:2303.10384 (cross-list from eess.AS) [pdf, other]
Title: Powerful and Extensible WFST Framework for RNN-Transducer Losses
Aleksandr Laptev, Vladimir Bataev, Igor Gitman, Boris Ginsburg
Comments: To appear in Proc. ICASSP 2023, June 04-10, 2023, Rhodes island, Greece. 5 pages, 5 figures, 3 tables
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[184] arXiv:2303.10510 (cross-list from cs.CL) [pdf, other]
Title: A Deep Learning System for Domain-specific Speech Recognition
Yanan Jia
Comments: 4th International Conference on Natural Language Processing and Computational Linguistics (NLPCL 2023)
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[185] arXiv:2303.10556 (cross-list from eess.AS) [pdf, other]
Title: The Graph feature fusion technique for speaker recognition based on wav2vec2.0 framework
Zirui Ge, Haiyan Guo, Zhen Yang
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[186] arXiv:2303.10721 (cross-list from cs.HC) [pdf, other]
Title: Right the docs: Characterising voice dataset documentation practices used in machine learning
Kathy Reid, Elizabeth T. Williams
Comments: 16 pages, 3 tables, preprint of a submission to AIES 2023
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[187] arXiv:2303.10727 (cross-list from cs.LG) [pdf, html, other]
Title: ERSAM: Neural Architecture Search For Energy-Efficient and Real-Time Social Ambiance Measurement
Chaojian Li, Wenwan Chen, Jiayi Yuan, Yingyan Celine Lin, Ashutosh Sabharwal
Comments: Accepted by ICASSP'23
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[188] arXiv:2303.10917 (cross-list from eess.AS) [pdf, other]
Title: Knowledge Distillation from Multiple Foundation Models for End-to-End Speech Recognition
Xiaoyu Yang, Qiujia Li, Chao Zhang, Philip C. Woodland
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[189] arXiv:2303.10931 (cross-list from stat.ML) [pdf, other]
Title: Approaching an unknown communication system by latent space exploration and causal inference
Gašper Beguš, Andrej Leban, Shane Gero
Comments: 25 pages, 23 figures; new format and section layout (moved some sections to the appendix), added replication experiments, updated references: to a subsequent experimental validation of the work, as well as to related methodological work
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[190] arXiv:2303.10942 (cross-list from cs.CL) [pdf, other]
Title: On-the-fly Text Retrieval for End-to-End ASR Adaptation
Bolaji Yusuf, Aditya Gourav, Ankur Gandhe, Ivan Bulyko
Comments: Accepted to ICASSP 2023; Appendix added to include ablations that could not fit into the conference 4-page limit
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[191] arXiv:2303.10949 (cross-list from eess.AS) [pdf, other]
Title: Code-Switching Text Generation and Injection in Mandarin-English ASR
Haibin Yu, Yuxuan Hu, Yao Qian, Ma Jin, Linquan Liu, Shujie Liu, Yu Shi, Yanmin Qian, Edward Lin, Michael Zeng
Comments: Accepted by ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[192] arXiv:2303.11089 (cross-list from cs.CV) [pdf, other]
Title: EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation
Ziqiao Peng, Haoyu Wu, Zhenbo Song, Hao Xu, Xiangyu Zhu, Jun He, Hongyan Liu, Zhaoxin Fan
Comments: Accepted by ICCV 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[193] arXiv:2303.11131 (cross-list from cs.CL) [pdf, other]
Title: Cocktail HuBERT: Generalized Self-Supervised Pre-training for Mixture and Single-Source Speech
Maryam Fazel-Zarandi, Wei-Ning Hsu
Comments: ICASSP 2023
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[194] arXiv:2303.11329 (cross-list from cs.CV) [pdf, other]
Title: Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation
Ziyang Chen, Shengyi Qian, Andrew Owens
Comments: ICCV 2023. Project site: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[195] arXiv:2303.11551 (cross-list from cs.CV) [pdf, other]
Title: ModEFormer: Modality-Preserving Embedding for Audio-Video Synchronization using Transformers
Akash Gupta, Rohun Tripathi, Wondong Jang
Comments: Paper accepted at ICASSP 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[196] arXiv:2303.11607 (cross-list from cs.CL) [pdf, html, other]
Title: Transformers in Speech Processing: A Survey
Siddique Latif, Aun Zaidi, Heriberto Cuayahuitl, Fahad Shamshad, Moazzam Shoukat, Muhammad Usama, Junaid Qadir
Comments: Accepted in Computer Science Review 2025
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[197] arXiv:2303.12002 (cross-list from eess.AS) [pdf, html, other]
Title: End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations
Giovanni Morrone, Samuele Cornell, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini
Comments: 16 pages, 7 figures
Journal-ref: Speech Communication 161 (2024) 103081
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[198] arXiv:2303.12187 (cross-list from eess.AS) [pdf, other]
Title: Practice of the conformer enhanced AUDIO-VISUAL HUBERT on Mandarin and English
Xiaoming Ren, Chao Li, Shenjian Wang, Biao Li
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[199] arXiv:2303.12337 (cross-list from cs.MM) [pdf, other]
Title: Music-Driven Group Choreography
Nhat Le, Thang Pham, Tuong Do, Erman Tjiputra, Quang D. Tran, Anh Nguyen
Comments: accepted in CVPR 2023
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[200] arXiv:2303.12659 (cross-list from cs.AI) [pdf, other]
Title: Posthoc Interpretation via Quantization
Francesco Paissan, Cem Subakan, Mirco Ravanelli
Comments: Francesco Paissan and Cem Subakan contributed equally
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)