Sound

Authors and titles for March 2023

Total of 232 entries
[151] arXiv:2303.07624 (cross-list from cs.CL) [pdf, other]
Title: I3D: Transformer architectures with input-dependent dynamic depth for speech recognition
Yifan Peng, Jaesong Lee, Shinji Watanabe
Comments: Accepted at ICASSP 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[152] arXiv:2303.07650 (cross-list from cs.CL) [pdf, other]
Title: Cross-lingual Alzheimer's Disease detection based on paralinguistic and pre-trained features
Xuchu Chen, Yu Pu, Jinpeng Li, Wei-Qiang Zhang
Comments: accepted by ICASSP 2023
Journal-ref: ICASSP (2023)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[153] arXiv:2303.07704 (cross-list from eess.AS) [pdf, other]
Title: TEA-PSE 3.0: Tencent-Ethereal-Audio-Lab Personalized Speech Enhancement System For ICASSP 2023 DNS Challenge
Yukai Ju, Jun Chen, Shimin Zhang, Shulin He, Wei Rao, Weixin Zhu, Yannan Wang, Tao Yu, Shidong Shang
Comments: Accepted by ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[154] arXiv:2303.07739 (cross-list from eess.SP) [pdf, other]
Title: Detecting post-stroke aphasia using EEG-based neural envelope tracking of natural speech
Pieter De Clercq, Jill Kries, Ramtin Mehraram, Jonas Vanthornhout, Tom Francart, Maaike Vandermosten
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[155] arXiv:2303.07816 (cross-list from eess.AS) [pdf, other]
Title: Multi-Channel Masking with Learnable Filterbank for Sound Source Separation
Wang Dai, Archontis Politis, Tuomas Virtanen
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[156] arXiv:2303.07924 (cross-list from cs.LG) [pdf, other]
Title: Improving Accented Speech Recognition with Multi-Domain Training
Lucas Maison, Yannick Estève
Comments: 5 pages, 2 figures. Accepted to ICASSP 2023
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[157] arXiv:2303.08005 (cross-list from eess.AS) [pdf, other]
Title: Native Multi-Band Audio Coding within Hyper-Autoencoded Reconstruction Propagation Networks
Darius Petermann, Inseon Jang, Minje Kim
Comments: Accepted to ICASSP 2023. For resources and examples, see this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[158] arXiv:2303.08019 (cross-list from eess.AS) [pdf, other]
Title: Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection
Jinchao Li, Kaitao Song, Junan Li, Bo Zheng, Dongsheng Li, Xixin Wu, Xunying Liu, Helen Meng
Comments: 5 pages, 3 figures, 3 tables
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[159] arXiv:2303.08027 (cross-list from eess.AS) [pdf, other]
Title: A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition
Jinchao Li, Xixin Wu, Kaitao Song, Dongsheng Li, Xunying Liu, Helen Meng
Comments: 5 pages, 3 figures, 5 tables
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[160] arXiv:2303.08052 (cross-list from eess.AS) [pdf, other]
Title: Localizing Spatial Information in Neural Spatiospectral Filters
Annika Briegleb, Thomas Haubner, Vasileios Belagiannis, Walter Kellermann
Comments: Accepted to the 31st European Signal Processing Conference (EUSIPCO 2023), Helsinki, Finland. 5 pages, 3 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[161] arXiv:2303.08268 (cross-list from cs.RO) [pdf, other]
Title: Chat with the Environment: Interactive Multimodal Perception Using Large Language Models
Xufeng Zhao, Mengdi Li, Cornelius Weber, Muhammad Burhan Hafez, Stefan Wermter
Comments: IROS2023, Detroit. See the project website at this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[162] arXiv:2303.08295 (cross-list from eess.SP) [pdf, other]
Title: A large-scale multimodal dataset of human speech recognition
Yao Ge, Chong Tang, Haobo Li, Zikang Zhang, Wenda Li, Kevin Chetty, Daniele Faccio, Qammer H. Abbasi, Muhammad Imran
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[163] arXiv:2303.08343 (cross-list from eess.AS) [pdf, other]
Title: Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models
Steven M. Hernandez, Ding Zhao, Shaojin Ding, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He, Ian McGraw
Comments: Accepted to IEEE ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[164] arXiv:2303.08372 (cross-list from eess.AS) [pdf, other]
Title: Target Sound Extraction with Variable Cross-modality Clues
Chenda Li, Yao Qian, Zhuo Chen, Dongmei Wang, Takuya Yoshioka, Shujie Liu, Yanmin Qian, Michael Zeng
Comments: Accepted by ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[165] arXiv:2303.08379 (cross-list from eess.AS) [pdf, other]
Title: Implementing Continuous HRTF Measurement in Near-Field
Ee-Leng Tan, Santi Peksi, Woon-Seng Gan
Comments: 5 pages, 9 figures, Submitted to 2023 IEEE International Conference on Acoustics, Speech and Signal Processing
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[166] arXiv:2303.08480 (cross-list from eess.AS) [pdf, other]
Title: Acoustic source localization in the spherical harmonics domain exploiting low-rank approximations
Maximo Cobos, Mirco Pezzoli, Fabio Antonacci, Augusto Sarti
Comments: To appear in ICASSP 2023
Journal-ref: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[167] arXiv:2303.08536 (cross-list from cs.MM) [pdf, other]
Title: Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring
Joanna Hong, Minsu Kim, Jeongsoo Choi, Yong Man Ro
Comments: Accepted at CVPR 2023. Implementation available: this https URL
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[168] arXiv:2303.08636 (cross-list from eess.AS) [pdf, other]
Title: HYBRIDFORMER: improving SqueezeFormer with hybrid attention and NSR mechanism
Yuguang Yang, Yu Pan, Jingjing Yin, Jiangyu Han, Lei Ma, Heng Lu
Comments: Accepted by ICASSP2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[169] arXiv:2303.08670 (cross-list from cs.CV) [pdf, other]
Title: Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video
Minsu Kim, Chae Won Kim, Yong Man Ro
Comments: Accepted in AAAI2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[170] arXiv:2303.08674 (cross-list from eess.AS) [pdf, other]
Title: Speech Signal Improvement Using Causal Generative Diffusion Models
Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Tal Peer, Timo Gerkmann
Comments: Accepted by ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[171] arXiv:2303.08702 (cross-list from eess.AS) [pdf, other]
Title: Beamformer-Guided Target Speaker Extraction
Mohamed Elminshawi, Srikanth Raj Chetupalli, Emanuël A. P. Habets
Comments: Submitted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[172] arXiv:2303.09057 (cross-list from eess.AS) [pdf, other]
Title: TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion
Hyun Joon Park, Seok Woo Yang, Jin Sob Kim, Wooseok Shin, Sung Won Han
Comments: To appear in ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[173] arXiv:2303.09119 (cross-list from cs.CV) [pdf, other]
Title: Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation
Lingting Zhu, Xian Liu, Xuanyu Liu, Rui Qian, Ziwei Liu, Lequan Yu
Comments: Accepted by IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023. 10 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[174] arXiv:2303.09278 (cross-list from eess.AS) [pdf, other]
Title: DistillW2V2: A Small and Streaming Wav2vec 2.0 Based ASR Model
Yanzhe Fu, Yueteng Kang, Songjun Cao, Long Ma
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[175] arXiv:2303.09404 (cross-list from eess.AS) [pdf, other]
Title: Speech Modeling with a Hierarchical Transformer Dynamical VAE
Xiaoyu Lin, Xiaoyu Bie, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[176] arXiv:2303.09438 (cross-list from cs.CL) [pdf, other]
Title: Trustera: A Live Conversation Redaction System
Evandro Gouvêa, Ali Dadgar, Shahab Jalalvand, Rathi Chengalvarayan, Badrinath Jayakumar, Ryan Price, Nicholas Ruiz, Jennifer McGovern, Srinivas Bangalore, Ben Stern
Comments: 5
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[177] arXiv:2303.09455 (cross-list from cs.CL) [pdf, other]
Title: Learning Cross-lingual Visual Speech Representations
Andreas Zinonos, Alexandros Haliassos, Pingchuan Ma, Stavros Petridis, Maja Pantic
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[178] arXiv:2303.09645 (cross-list from cs.RO) [pdf, other]
Title: Development of a Voice Controlled Robotic Arm
Akkas U. Haque, Humayun Kabir, S. C. Banik, M. T. Islam
Subjects: Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[179] arXiv:2303.09966 (cross-list from eess.AS) [pdf, other]
Title: Magnitude-Corrected and Time-Aligned Interpolation of Head-Related Transfer Functions
Johannes M. Arend, Christoph Pörschmann, Stefan Weinzierl, Fabian Brinkmann
Journal-ref: IEEE/ACM Trans. Audio Speech and Lang. Proc., 31, 3783--3799 (2023)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[180] arXiv:2303.10008 (cross-list from eess.AS) [pdf, other]
Title: Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture
Julien Hauret, Thomas Joubaud, Véronique Zimpfer, Éric Bavu
Comments: Accepted in IEEE/ACM Transactions on Audio, Speech and Language Processing on 14/08/2023
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing (2023 - Volume: 31) - pp. 3499 - 3512
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[181] arXiv:2303.10160 (cross-list from eess.AS) [pdf, other]
Title: Visual Information Matters for ASR Error Correction
Vanya Bannihatti Kumar, Shanbo Cheng, Ningxin Peng, Yuchen Zhang
Comments: Accepted at ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[182] arXiv:2303.10335 (cross-list from cs.MM) [pdf, other]
Title: Multimodal Continuous Emotion Recognition: A Technical Report for ABAW5
Su Zhang, Ziyuan Zhao, Cuntai Guan
Comments: 6 pages. 1 figure. arXiv admin note: substantial text overlap with arXiv:2203.13031
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[183] arXiv:2303.10384 (cross-list from eess.AS) [pdf, other]
Title: Powerful and Extensible WFST Framework for RNN-Transducer Losses
Aleksandr Laptev, Vladimir Bataev, Igor Gitman, Boris Ginsburg
Comments: To appear in Proc. ICASSP 2023, June 04-10, 2023, Rhodes island, Greece. 5 pages, 5 figures, 3 tables
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[184] arXiv:2303.10510 (cross-list from cs.CL) [pdf, other]
Title: A Deep Learning System for Domain-specific Speech Recognition
Yanan Jia
Comments: 4th International Conference on Natural Language Processing and Computational Linguistics (NLPCL 2023)
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[185] arXiv:2303.10556 (cross-list from eess.AS) [pdf, other]
Title: The Graph feature fusion technique for speaker recognition based on wav2vec2.0 framework
Zirui Ge, Haiyan Guo, Zhen Yang
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[186] arXiv:2303.10721 (cross-list from cs.HC) [pdf, other]
Title: Right the docs: Characterising voice dataset documentation practices used in machine learning
Kathy Reid, Elizabeth T. Williams
Comments: 16 pages, 3 tables, preprint of a submission to AIES 2023
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[187] arXiv:2303.10727 (cross-list from cs.LG) [pdf, html, other]
Title: ERSAM: Neural Architecture Search For Energy-Efficient and Real-Time Social Ambiance Measurement
Chaojian Li, Wenwan Chen, Jiayi Yuan, Yingyan Celine Lin, Ashutosh Sabharwal
Comments: Accepted by ICASSP'23
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[188] arXiv:2303.10917 (cross-list from eess.AS) [pdf, other]
Title: Knowledge Distillation from Multiple Foundation Models for End-to-End Speech Recognition
Xiaoyu Yang, Qiujia Li, Chao Zhang, Philip C. Woodland
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[189] arXiv:2303.10931 (cross-list from stat.ML) [pdf, other]
Title: Approaching an unknown communication system by latent space exploration and causal inference
Gašper Beguš, Andrej Leban, Shane Gero
Comments: 25 pages, 23 figures; new format and section layout (moved some sections to the appendix), added replication experiments, updated references: to a subsequent experimental validation of the work, as well as to related methodological work
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[190] arXiv:2303.10942 (cross-list from cs.CL) [pdf, other]
Title: On-the-fly Text Retrieval for End-to-End ASR Adaptation
Bolaji Yusuf, Aditya Gourav, Ankur Gandhe, Ivan Bulyko
Comments: Accepted to ICASSP 2023; Appendix added to include ablations that could not fit into the conference 4-page limit
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[191] arXiv:2303.10949 (cross-list from eess.AS) [pdf, other]
Title: Code-Switching Text Generation and Injection in Mandarin-English ASR
Haibin Yu, Yuxuan Hu, Yao Qian, Ma Jin, Linquan Liu, Shujie Liu, Yu Shi, Yanmin Qian, Edward Lin, Michael Zeng
Comments: Accepted by ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[192] arXiv:2303.11089 (cross-list from cs.CV) [pdf, other]
Title: EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation
Ziqiao Peng, Haoyu Wu, Zhenbo Song, Hao Xu, Xiangyu Zhu, Jun He, Hongyan Liu, Zhaoxin Fan
Comments: Accepted by ICCV 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[193] arXiv:2303.11131 (cross-list from cs.CL) [pdf, other]
Title: Cocktail HuBERT: Generalized Self-Supervised Pre-training for Mixture and Single-Source Speech
Maryam Fazel-Zarandi, Wei-Ning Hsu
Comments: ICASSP 2023
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[194] arXiv:2303.11329 (cross-list from cs.CV) [pdf, other]
Title: Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation
Ziyang Chen, Shengyi Qian, Andrew Owens
Comments: ICCV 2023. Project site: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[195] arXiv:2303.11551 (cross-list from cs.CV) [pdf, other]
Title: ModEFormer: Modality-Preserving Embedding for Audio-Video Synchronization using Transformers
Akash Gupta, Rohun Tripathi, Wondong Jang
Comments: Paper accepted at ICASSP 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[196] arXiv:2303.11607 (cross-list from cs.CL) [pdf, html, other]
Title: Transformers in Speech Processing: A Survey
Siddique Latif, Aun Zaidi, Heriberto Cuayahuitl, Fahad Shamshad, Moazzam Shoukat, Muhammad Usama, Junaid Qadir
Comments: Accepted in Computer Science Review 2025
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[197] arXiv:2303.12002 (cross-list from eess.AS) [pdf, html, other]
Title: End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations
Giovanni Morrone, Samuele Cornell, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini
Comments: 16 pages, 7 figures
Journal-ref: Speech Communication 161 (2024) 103081
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[198] arXiv:2303.12187 (cross-list from eess.AS) [pdf, other]
Title: Practice of the conformer enhanced AUDIO-VISUAL HUBERT on Mandarin and English
Xiaoming Ren, Chao Li, Shenjian Wang, Biao Li
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[199] arXiv:2303.12337 (cross-list from cs.MM) [pdf, other]
Title: Music-Driven Group Choreography
Nhat Le, Thang Pham, Tuong Do, Erman Tjiputra, Quang D. Tran, Anh Nguyen
Comments: accepted in CVPR 2023
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[200] arXiv:2303.12659 (cross-list from cs.AI) [pdf, other]
Title: Posthoc Interpretation via Quantization
Francesco Paissan, Cem Subakan, Mirco Ravanelli
Comments: Francesco Paissan and Cem Subakan contributed equally
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[201] arXiv:2303.12908 (cross-list from eess.AS) [pdf, other]
Title: Self-supervised Learning with Speech Modulation Dropout
Samik Sadhu, Hynek Hermansky
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[202] arXiv:2303.12930 (cross-list from cs.CV) [pdf, other]
Title: Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Tiantian Geng, Teng Wang, Jinming Duan, Runmin Cong, Feng Zheng
Comments: Accepted by CVPR2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[203] arXiv:2303.13027 (cross-list from eess.AS) [pdf, other]
Title: Weighted Pressure and Mode Matching for Sound Field Reproduction: Theoretical and Experimental Comparisons
Shoichi Koyama, Keisuke Kimura, Natsuki Ueno
Comments: Accepted to Journal of Audio Engineering Society, Special Issue on Spatial Audio
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[204] arXiv:2303.13243 (cross-list from eess.AS) [pdf, other]
Title: Pyramid Multi-branch Fusion DCNN with Multi-Head Self-Attention for Mandarin Speech Recognition
Kai Liu, Hailiang Xiong, Gangqiang Yang, Zhengfeng Du, Yewen Cao, Danyal Shah
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[205] arXiv:2303.13453 (cross-list from eess.AS) [pdf, other]
Title: Better Together: Dialogue Separation and Voice Activity Detection for Audio Personalization in TV
Matteo Torcoli, Emanuël A. P. Habets
Comments: Paper accepted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), Rhodes, Greece
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[206] arXiv:2303.13471 (cross-list from cs.CV) [pdf, other]
Title: Egocentric Audio-Visual Object Localization
Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu
Comments: Accepted by CVPR 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[207] arXiv:2303.13536 (cross-list from cs.HC) [pdf, other]
Title: Help the Blind See: Assistance for the Visually Impaired through Augmented Acoustic Simulation
Alexander Mehta, Ritik Jalisatgi
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[208] arXiv:2303.13559 (cross-list from cs.CL) [pdf, other]
Title: Enhancing Unsupervised Speech Recognition with Diffusion GANs
Xianchao Wu
Comments: 5 pages, 1 figure, accepted by ICASSP 2023
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[209] arXiv:2303.13932 (cross-list from cs.CL) [pdf, other]
Title: Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG)
Qinglin Zhang, Chong Deng, Jiaqing Liu, Hai Yu, Qian Chen, Wen Wang, Zhijie Yan, Jinglin Liu, Yi Ren, Zhou Zhao
Comments: Paper accepted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), Rhodes, Greece
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[210] arXiv:2303.14044 (cross-list from cs.GR) [pdf, other]
Title: MusicFace: Music-driven Expressive Singing Face Synthesis
Pengfei Liu, Wenjin Deng, Hengda Li, Jintai Wang, Yinglin Zheng, Yiwei Ding, Xiaohu Guo, Ming Zeng
Comments: Accepted to CVMJ
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[211] arXiv:2303.14307 (cross-list from cs.CV) [pdf, other]
Title: Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels
Pingchuan Ma, Alexandros Haliassos, Adriana Fernandez-Lopez, Honglie Chen, Stavros Petridis, Maja Pantic
Comments: Accepted to ICASSP 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[212] arXiv:2303.14885 (cross-list from eess.AS) [pdf, other]
Title: Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis
Karren Yang, Ting-Yao Hu, Jen-Hao Rick Chang, Hema Swetha Koppula, Oncel Tuzel
Comments: ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[213] arXiv:2303.15042 (cross-list from eess.AS) [pdf, other]
Title: Partially Adaptive Multichannel Joint Reduction of Ego-noise and Environmental Noise
Huajian Fang, Niklas Wittmer, Johannes Twiefel, Stefan Wermter, Timo Gerkmann
Comments: Accepted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)
Journal-ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Robotics (cs.RO); Sound (cs.SD)
[214] arXiv:2303.15132 (cross-list from eess.AS) [pdf, other]
Title: Cross-utterance ASR Rescoring with Graph-based Label Propagation
Srinath Tankasala, Long Chen, Andreas Stolcke, Anirudh Raju, Qianli Deng, Chander Chandak, Aparna Khare, Roland Maas, Venkatesh Ravichandran
Comments: To appear in IEEE ICASSP 2023
Journal-ref: Proc. IEEE ICASSP, June 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[215] arXiv:2303.15293 (cross-list from eess.AS) [pdf, other]
Title: A Deliberation-based Joint Acoustic and Text Decoder
Sepand Mavandadi, Tara N. Sainath, Ke Hu, Zelin Wu
Comments: Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[216] arXiv:2303.15705 (cross-list from cs.CL) [pdf, other]
Title: Translate the Beauty in Songs: Jointly Learning to Align Melody and Translate Lyrics
Chengxi Li, Kai Fan, Jiajun Bu, Boxing Chen, Zhongqiang Huang, Zhi Yu
Comments: 13 pages
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[217] arXiv:2303.15944 (cross-list from cs.LG) [pdf, other]
Title: Cluster-Guided Unsupervised Domain Adaptation for Deep Speaker Embedding
Haiquan Mao, Feng Hong, Man-wai Mak
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[218] arXiv:2303.16021 (cross-list from eess.AS) [pdf, other]
Title: Spatial Active Noise Control Method Based On Sound Field Interpolation From Reference Microphone Signals
Kazuyuki Arikawa, Shoichi Koyama, Hiroshi Saruwatari
Comments: Accepted to International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[219] arXiv:2303.16024 (cross-list from cs.CV) [pdf, other]
Title: Egocentric Auditory Attention Localization in Conversations
Fiona Ryan, Hao Jiang, Abhinav Shukla, James M. Rehg, Vamsi Krishna Ithapu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[220] arXiv:2303.16031 (cross-list from cs.CR) [pdf, html, other]
Title: A Universal Identity Backdoor Attack against Speaker Verification based on Siamese Network
Haodong Zhao, Wei Du, Junjie Guo, Gongshen Liu
Comments: The first two authors contributed equally to this work
Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[221] arXiv:2303.16501 (cross-list from cs.CV) [pdf, other]
Title: AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid
Comments: CVPR 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[222] arXiv:2303.16897 (cross-list from cs.CV) [pdf, other]
Title: Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan
Comments: CVPR 2023. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[223] arXiv:2303.17131 (cross-list from eess.AS) [pdf, other]
Title: PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers
Rahul Pandey, Roger Ren, Qi Luo, Jing Liu, Ariya Rastrow, Ankur Gandhe, Denis Filimonov, Grant Strimel, Andreas Stolcke, Ivan Bulyko
Comments: To appear in Proc. IEEE ICASSP
Journal-ref: Proc. IEEE ICASSP, June 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[224] arXiv:2303.17200 (cross-list from cs.CV) [pdf, other]
Title: SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision
Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jáchym Kolář, Stavros Petridis, Maja Pantic, Christian Fuegen
Comments: IEEE/CVF CVPR 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[225] arXiv:2303.17395 (cross-list from eess.AS) [pdf, html, other]
Title: WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang
Comments: Accepted to TASLP
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD)
[226] arXiv:2303.17489 (cross-list from eess.AS) [pdf, other]
Title: Prefix tuning for automated audio captioning
Minkyu Kim, Kim Sung-Bin, Tae-Hyun Oh
Comments: ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)
[227] arXiv:2303.17490 (cross-list from cs.CV) [pdf, other]
Title: Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment
Kim Sung-Bin, Arda Senocak, Hyunwoo Ha, Andrew Owens, Tae-Hyun Oh
Comments: CVPR 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[228] arXiv:2303.17517 (cross-list from cs.CL) [pdf, other]
Title: Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples
Hyeonggon Ryu, Arda Senocak, In So Kweon, Joon Son Chung
Comments: ICASSP 2023
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[229] arXiv:2303.17611 (cross-list from cs.HC) [pdf, other]
Title: Transformer-based Self-supervised Multimodal Representation Learning for Wearable Emotion Recognition
Yujin Wu, Mohamed Daoudi, Ali Amad
Comments: Accepted IEEE Transactions On Affective Computing
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[230] arXiv:2303.17799 (cross-list from cs.CL) [pdf, other]
Title: Dialog act guided contextual adapter for personalized speech recognition
Feng-Ju Chang, Thejaswi Muniyappa, Kanthashree Mysore Sathyendra, Kai Wei, Grant P. Strimel, Ross McGowan
Comments: Accepted at ICASSP 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[231] arXiv:2303.17829 (cross-list from eess.AS) [pdf, other]
Title: Evaluation of Noise Reduction Methods for Sentence Recognition by Sinhala Speaking Listeners
Malitha Gunawardhana, Chathuki Navanjana, Dinithi Fernando, Nipuna Upeksha, Anjula De Silva
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[232] arXiv:2303.18110 (cross-list from cs.CL) [pdf, other]
Title: The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR
Ramon Sanabria, Nikolay Bogoychev, Nina Markl, Andrea Carmantini, Ondrej Klejch, Peter Bell
Comments: Accepted to IEEE ICASSP 2023
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)