Sound

Authors and titles for March 2023

Total of 232 entries; this page lists entries 201-232

[201] arXiv:2303.12908 (cross-list from eess.AS) [pdf, other]: Title: Self-supervised Learning with Speech Modulation Dropout

Samik Sadhu, Hynek Hermansky

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[202] arXiv:2303.12930 (cross-list from cs.CV) [pdf, other]: Title: Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline

Tiantian Geng, Teng Wang, Jinming Duan, Runmin Cong, Feng Zheng

Comments: Accepted by CVPR2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[203] arXiv:2303.13027 (cross-list from eess.AS) [pdf, other]: Title: Weighted Pressure and Mode Matching for Sound Field Reproduction: Theoretical and Experimental Comparisons

Shoichi Koyama, Keisuke Kimura, Natsuki Ueno

Comments: Accepted to Journal of Audio Engineering Society, Special Issue on Spatial Audio

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[204] arXiv:2303.13243 (cross-list from eess.AS) [pdf, other]: Title: Pyramid Multi-branch Fusion DCNN with Multi-Head Self-Attention for Mandarin Speech Recognition

Kai Liu, Hailiang Xiong, Gangqiang Yang, Zhengfeng Du, Yewen Cao, Danyal Shah

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[205] arXiv:2303.13453 (cross-list from eess.AS) [pdf, other]: Title: Better Together: Dialogue Separation and Voice Activity Detection for Audio Personalization in TV

Matteo Torcoli, Emanuël A. P. Habets

Comments: Paper accepted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), Rhodes, Greece

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[206] arXiv:2303.13471 (cross-list from cs.CV) [pdf, other]: Title: Egocentric Audio-Visual Object Localization

Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

Comments: Accepted by CVPR 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[207] arXiv:2303.13536 (cross-list from cs.HC) [pdf, other]: Title: Help the Blind See: Assistance for the Visually Impaired through Augmented Acoustic Simulation

Alexander Mehta, Ritik Jalisatgi

Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[208] arXiv:2303.13559 (cross-list from cs.CL) [pdf, other]: Title: Enhancing Unsupervised Speech Recognition with Diffusion GANs

Xianchao Wu

Comments: 5 pages, 1 figure, accepted by ICASSP 2023

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[209] arXiv:2303.13932 (cross-list from cs.CL) [pdf, other]: Title: Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG)

Qinglin Zhang, Chong Deng, Jiaqing Liu, Hai Yu, Qian Chen, Wen Wang, Zhijie Yan, Jinglin Liu, Yi Ren, Zhou Zhao

Comments: Paper accepted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), Rhodes, Greece

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[210] arXiv:2303.14044 (cross-list from cs.GR) [pdf, other]: Title: MusicFace: Music-driven Expressive Singing Face Synthesis

Pengfei Liu, Wenjin Deng, Hengda Li, Jintai Wang, Yinglin Zheng, Yiwei Ding, Xiaohu Guo, Ming Zeng

Comments: Accepted to CVMJ

Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[211] arXiv:2303.14307 (cross-list from cs.CV) [pdf, other]: Title: Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels

Pingchuan Ma, Alexandros Haliassos, Adriana Fernandez-Lopez, Honglie Chen, Stavros Petridis, Maja Pantic

Comments: Accepted to ICASSP 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[212] arXiv:2303.14885 (cross-list from eess.AS) [pdf, other]: Title: Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis

Karren Yang, Ting-Yao Hu, Jen-Hao Rick Chang, Hema Swetha Koppula, Oncel Tuzel

Comments: ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[213] arXiv:2303.15042 (cross-list from eess.AS) [pdf, other]: Title: Partially Adaptive Multichannel Joint Reduction of Ego-noise and Environmental Noise

Huajian Fang, Niklas Wittmer, Johannes Twiefel, Stefan Wermter, Timo Gerkmann

Comments: Accepted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)

Journal-ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Robotics (cs.RO); Sound (cs.SD)
[214] arXiv:2303.15132 (cross-list from eess.AS) [pdf, other]: Title: Cross-utterance ASR Rescoring with Graph-based Label Propagation

Srinath Tankasala, Long Chen, Andreas Stolcke, Anirudh Raju, Qianli Deng, Chander Chandak, Aparna Khare, Roland Maas, Venkatesh Ravichandran

Comments: To appear in IEEE ICASSP 2023

Journal-ref: Proc. IEEE ICASSP, June 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[215] arXiv:2303.15293 (cross-list from eess.AS) [pdf, other]: Title: A Deliberation-based Joint Acoustic and Text Decoder

Sepand Mavandadi, Tara N. Sainath, Ke Hu, Zelin Wu

Comments: Interspeech 2021

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[216] arXiv:2303.15705 (cross-list from cs.CL) [pdf, other]: Title: Translate the Beauty in Songs: Jointly Learning to Align Melody and Translate Lyrics

Chengxi Li, Kai Fan, Jiajun Bu, Boxing Chen, Zhongqiang Huang, Zhi Yu

Comments: 13 pages

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[217] arXiv:2303.15944 (cross-list from cs.LG) [pdf, other]: Title: Cluster-Guided Unsupervised Domain Adaptation for Deep Speaker Embedding

Haiquan Mao, Feng Hong, Man-wai Mak

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[218] arXiv:2303.16021 (cross-list from eess.AS) [pdf, other]: Title: Spatial Active Noise Control Method Based On Sound Field Interpolation From Reference Microphone Signals

Kazuyuki Arikawa, Shoichi Koyama, Hiroshi Saruwatari

Comments: Accepted to International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[219] arXiv:2303.16024 (cross-list from cs.CV) [pdf, other]: Title: Egocentric Auditory Attention Localization in Conversations

Fiona Ryan, Hao Jiang, Abhinav Shukla, James M. Rehg, Vamsi Krishna Ithapu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[220] arXiv:2303.16031 (cross-list from cs.CR) [pdf, html, other]: Title: A Universal Identity Backdoor Attack against Speaker Verification based on Siamese Network

Haodong Zhao, Wei Du, Junjie Guo, Gongshen Liu

Comments: The first two authors contributed equally to this work

Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[221] arXiv:2303.16501 (cross-list from cs.CV) [pdf, other]: Title: AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR

Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid

Comments: CVPR 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[222] arXiv:2303.16897 (cross-list from cs.CV) [pdf, other]: Title: Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos

Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan

Comments: CVPR 2023. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[223] arXiv:2303.17131 (cross-list from eess.AS) [pdf, other]: Title: PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers

Rahul Pandey, Roger Ren, Qi Luo, Jing Liu, Ariya Rastrow, Ankur Gandhe, Denis Filimonov, Grant Strimel, Andreas Stolcke, Ivan Bulyko

Comments: To appear in Proc. IEEE ICASSP

Journal-ref: Proc. IEEE ICASSP, June 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[224] arXiv:2303.17200 (cross-list from cs.CV) [pdf, other]: Title: SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jáchym Kolář, Stavros Petridis, Maja Pantic, Christian Fuegen

Comments: IEEE/CVF CVPR 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[225] arXiv:2303.17395 (cross-list from eess.AS) [pdf, html, other]: Title: WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research

Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang

Comments: Accepted to TASLP

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD)
[226] arXiv:2303.17489 (cross-list from eess.AS) [pdf, other]: Title: Prefix tuning for automated audio captioning

Minkyu Kim, Kim Sung-Bin, Tae-Hyun Oh

Comments: ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)
[227] arXiv:2303.17490 (cross-list from cs.CV) [pdf, other]: Title: Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment

Kim Sung-Bin, Arda Senocak, Hyunwoo Ha, Andrew Owens, Tae-Hyun Oh

Comments: CVPR 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[228] arXiv:2303.17517 (cross-list from cs.CL) [pdf, other]: Title: Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples

Hyeonggon Ryu, Arda Senocak, In So Kweon, Joon Son Chung

Comments: ICASSP 2023

Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[229] arXiv:2303.17611 (cross-list from cs.HC) [pdf, other]: Title: Transformer-based Self-supervised Multimodal Representation Learning for Wearable Emotion Recognition

Yujin Wu, Mohamed Daoudi, Ali Amad

Comments: Accepted IEEE Transactions On Affective Computing

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[230] arXiv:2303.17799 (cross-list from cs.CL) [pdf, other]: Title: Dialog act guided contextual adapter for personalized speech recognition

Feng-Ju Chang, Thejaswi Muniyappa, Kanthashree Mysore Sathyendra, Kai Wei, Grant P. Strimel, Ross McGowan

Comments: Accepted at ICASSP 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[231] arXiv:2303.17829 (cross-list from eess.AS) [pdf, other]: Title: Evaluation of Noise Reduction Methods for Sentence Recognition by Sinhala Speaking Listeners

Malitha Gunawardhana, Chathuki Navanjana, Dinithi Fernando, Nipuna Upeksha, Anjula De Silva

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[232] arXiv:2303.18110 (cross-list from cs.CL) [pdf, other]: Title: The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR

Ramon Sanabria, Nikolay Bogoychev, Nina Markl, Andrea Carmantini, Ondrej Klejch, Peter Bell

Comments: Accepted to IEEE ICASSP 2023

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
