Sound

Authors and titles for March 2023

Total of 232 entries : 1-50 51-100 101-150 151-200 201-232
[201] arXiv:2303.12908 (cross-list from eess.AS) [pdf, other]
Title: Self-supervised Learning with Speech Modulation Dropout
Samik Sadhu, Hynek Hermansky
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[202] arXiv:2303.12930 (cross-list from cs.CV) [pdf, other]
Title: Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Tiantian Geng, Teng Wang, Jinming Duan, Runmin Cong, Feng Zheng
Comments: Accepted by CVPR2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[203] arXiv:2303.13027 (cross-list from eess.AS) [pdf, other]
Title: Weighted Pressure and Mode Matching for Sound Field Reproduction: Theoretical and Experimental Comparisons
Shoichi Koyama, Keisuke Kimura, Natsuki Ueno
Comments: Accepted to Journal of Audio Engineering Society, Special Issue on Spatial Audio
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[204] arXiv:2303.13243 (cross-list from eess.AS) [pdf, other]
Title: Pyramid Multi-branch Fusion DCNN with Multi-Head Self-Attention for Mandarin Speech Recognition
Kai Liu, Hailiang Xiong, Gangqiang Yang, Zhengfeng Du, Yewen Cao, Danyal Shah
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[205] arXiv:2303.13453 (cross-list from eess.AS) [pdf, other]
Title: Better Together: Dialogue Separation and Voice Activity Detection for Audio Personalization in TV
Matteo Torcoli, Emanuël A. P. Habets
Comments: Paper accepted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), Rhodes, Greece
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[206] arXiv:2303.13471 (cross-list from cs.CV) [pdf, other]
Title: Egocentric Audio-Visual Object Localization
Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu
Comments: Accepted by CVPR 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[207] arXiv:2303.13536 (cross-list from cs.HC) [pdf, other]
Title: Help the Blind See: Assistance for the Visually Impaired through Augmented Acoustic Simulation
Alexander Mehta, Ritik Jalisatgi
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[208] arXiv:2303.13559 (cross-list from cs.CL) [pdf, other]
Title: Enhancing Unsupervised Speech Recognition with Diffusion GANs
Xianchao Wu
Comments: 5 pages, 1 figure, accepted by ICASSP 2023
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[209] arXiv:2303.13932 (cross-list from cs.CL) [pdf, other]
Title: Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG)
Qinglin Zhang, Chong Deng, Jiaqing Liu, Hai Yu, Qian Chen, Wen Wang, Zhijie Yan, Jinglin Liu, Yi Ren, Zhou Zhao
Comments: Paper accepted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), Rhodes, Greece
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[210] arXiv:2303.14044 (cross-list from cs.GR) [pdf, other]
Title: MusicFace: Music-driven Expressive Singing Face Synthesis
Pengfei Liu, Wenjin Deng, Hengda Li, Jintai Wang, Yinglin Zheng, Yiwei Ding, Xiaohu Guo, Ming Zeng
Comments: Accepted to CVMJ
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[211] arXiv:2303.14307 (cross-list from cs.CV) [pdf, other]
Title: Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels
Pingchuan Ma, Alexandros Haliassos, Adriana Fernandez-Lopez, Honglie Chen, Stavros Petridis, Maja Pantic
Comments: Accepted to ICASSP 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[212] arXiv:2303.14885 (cross-list from eess.AS) [pdf, other]
Title: Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis
Karren Yang, Ting-Yao Hu, Jen-Hao Rick Chang, Hema Swetha Koppula, Oncel Tuzel
Comments: ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[213] arXiv:2303.15042 (cross-list from eess.AS) [pdf, other]
Title: Partially Adaptive Multichannel Joint Reduction of Ego-noise and Environmental Noise
Huajian Fang, Niklas Wittmer, Johannes Twiefel, Stefan Wermter, Timo Gerkmann
Comments: Accepted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)
Journal-ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Robotics (cs.RO); Sound (cs.SD)
[214] arXiv:2303.15132 (cross-list from eess.AS) [pdf, other]
Title: Cross-utterance ASR Rescoring with Graph-based Label Propagation
Srinath Tankasala, Long Chen, Andreas Stolcke, Anirudh Raju, Qianli Deng, Chander Chandak, Aparna Khare, Roland Maas, Venkatesh Ravichandran
Comments: To appear in IEEE ICASSP 2023
Journal-ref: Proc. IEEE ICASSP, June 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[215] arXiv:2303.15293 (cross-list from eess.AS) [pdf, other]
Title: A Deliberation-based Joint Acoustic and Text Decoder
Sepand Mavandadi, Tara N. Sainath, Ke Hu, Zelin Wu
Comments: Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[216] arXiv:2303.15705 (cross-list from cs.CL) [pdf, other]
Title: Translate the Beauty in Songs: Jointly Learning to Align Melody and Translate Lyrics
Chengxi Li, Kai Fan, Jiajun Bu, Boxing Chen, Zhongqiang Huang, Zhi Yu
Comments: 13 pages
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[217] arXiv:2303.15944 (cross-list from cs.LG) [pdf, other]
Title: Cluster-Guided Unsupervised Domain Adaptation for Deep Speaker Embedding
Haiquan Mao, Feng Hong, Man-wai Mak
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[218] arXiv:2303.16021 (cross-list from eess.AS) [pdf, other]
Title: Spatial Active Noise Control Method Based On Sound Field Interpolation From Reference Microphone Signals
Kazuyuki Arikawa, Shoichi Koyama, Hiroshi Saruwatari
Comments: Accepted to International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[219] arXiv:2303.16024 (cross-list from cs.CV) [pdf, other]
Title: Egocentric Auditory Attention Localization in Conversations
Fiona Ryan, Hao Jiang, Abhinav Shukla, James M. Rehg, Vamsi Krishna Ithapu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[220] arXiv:2303.16031 (cross-list from cs.CR) [pdf, html, other]
Title: A Universal Identity Backdoor Attack against Speaker Verification based on Siamese Network
Haodong Zhao, Wei Du, Junjie Guo, Gongshen Liu
Comments: The first two authors contributed equally to this work
Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[221] arXiv:2303.16501 (cross-list from cs.CV) [pdf, other]
Title: AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid
Comments: CVPR 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[222] arXiv:2303.16897 (cross-list from cs.CV) [pdf, other]
Title: Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan
Comments: CVPR 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[223] arXiv:2303.17131 (cross-list from eess.AS) [pdf, other]
Title: PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers
Rahul Pandey, Roger Ren, Qi Luo, Jing Liu, Ariya Rastrow, Ankur Gandhe, Denis Filimonov, Grant Strimel, Andreas Stolcke, Ivan Bulyko
Comments: To appear in Proc. IEEE ICASSP
Journal-ref: Proc. IEEE ICASSP, June 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[224] arXiv:2303.17200 (cross-list from cs.CV) [pdf, other]
Title: SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision
Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jáchym Kolář, Stavros Petridis, Maja Pantic, Christian Fuegen
Comments: IEEE/CVF CVPR 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[225] arXiv:2303.17395 (cross-list from eess.AS) [pdf, html, other]
Title: WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang
Comments: Accepted to TASLP
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD)
[226] arXiv:2303.17489 (cross-list from eess.AS) [pdf, other]
Title: Prefix tuning for automated audio captioning
Minkyu Kim, Kim Sung-Bin, Tae-Hyun Oh
Comments: ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)
[227] arXiv:2303.17490 (cross-list from cs.CV) [pdf, other]
Title: Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment
Kim Sung-Bin, Arda Senocak, Hyunwoo Ha, Andrew Owens, Tae-Hyun Oh
Comments: CVPR 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[228] arXiv:2303.17517 (cross-list from cs.CL) [pdf, other]
Title: Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples
Hyeonggon Ryu, Arda Senocak, In So Kweon, Joon Son Chung
Comments: ICASSP 2023
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[229] arXiv:2303.17611 (cross-list from cs.HC) [pdf, other]
Title: Transformer-based Self-supervised Multimodal Representation Learning for Wearable Emotion Recognition
Yujin Wu, Mohamed Daoudi, Ali Amad
Comments: Accepted IEEE Transactions On Affective Computing
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[230] arXiv:2303.17799 (cross-list from cs.CL) [pdf, other]
Title: Dialog act guided contextual adapter for personalized speech recognition
Feng-Ju Chang, Thejaswi Muniyappa, Kanthashree Mysore Sathyendra, Kai Wei, Grant P. Strimel, Ross McGowan
Comments: Accepted at ICASSP 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[231] arXiv:2303.17829 (cross-list from eess.AS) [pdf, other]
Title: Evaluation of Noise Reduction Methods for Sentence Recognition by Sinhala Speaking Listeners
Malitha Gunawardhana, Chathuki Navanjana, Dinithi Fernando, Nipuna Upeksha, Anjula De Silva
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[232] arXiv:2303.18110 (cross-list from cs.CL) [pdf, other]
Title: The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR
Ramon Sanabria, Nikolay Bogoychev, Nina Markl, Andrea Carmantini, Ondrej Klejch, Peter Bell
Comments: Accepted to IEEE ICASSP 2023
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)