Audio and Speech Processing

Authors and titles for March 2024

Total of 213 entries : 1-25 26-50 51-75 76-100 101-125 126-150 151-175 ... 201-213
[76] arXiv:2403.00854 (cross-list from q-bio.NC) [pdf, html, other]
Title: Speaker-Independent Dysarthria Severity Classification using Self-Supervised Transformers and Multi-Task Learning
Lauren Stumpf, Balasundaram Kadirvelu, Sigourney Waibel, A. Aldo Faisal
Comments: 17 pages, 2 tables, 4 main figures, 2 supplemental figures, prepared for journal submission
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[77] arXiv:2403.00977 (cross-list from cs.SD) [pdf, html, other]
Title: Scaling Up Adaptive Filter Optimizers
Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[78] arXiv:2403.01087 (cross-list from cs.MM) [pdf, html, other]
Title: Towards Accurate Lip-to-Speech Synthesis in-the-Wild
Sindhu Hegde, Rudrabha Mukhopadhyay, C.V. Jawahar, Vinay Namboodiri
Comments: 8 pages of content, 1 page of references and 4 figures
Journal-ref: In Proceedings of the 31st ACM International Conference on Multimedia, 2023
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[79] arXiv:2403.01132 (cross-list from cs.LG) [pdf, other]
Title: MPIPN: A Multi Physics-Informed PointNet for solving parametric acoustic-structure systems
Chu Wang, Jinhong Wu, Yanzhi Wang, Zhijian Zha, Qi Zhou
Comments: The number of figures is 16. The number of tables is 5. The number of words is 9717
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[80] arXiv:2403.01255 (cross-list from cs.SD) [pdf, html, other]
Title: Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey
Hamza Kheddar, Mustapha Hemis, Yassine Himeur
Journal-ref: Information Fusion, Elsevier, 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[81] arXiv:2403.01278 (cross-list from cs.SD) [pdf, html, other]
Title: Enhancing Audio Generation Diversity with Visual Information
Zeyu Xie, Baihan Li, Xuenan Xu, Mengyue Wu, Kai Yu
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[82] arXiv:2403.01699 (cross-list from cs.CL) [pdf, other]
Title: Brilla AI: AI Contestant for the National Science and Maths Quiz
George Boateng, Jonathan Abrefah Mensah, Kevin Takyi Yeboah, William Edor, Andrew Kojo Mensah-Onumah, Naafi Dasana Ibrahim, Nana Sam Yeboah
Comments: 14 pages. Accepted for the WideAIED track at the 25th International Conference on AI in Education (AIED 2024)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[83] arXiv:2403.01700 (cross-list from cs.SD) [pdf, html, other]
Title: Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer
Haoxu Wang, Ming Cheng, Qiang Fu, Ming Li
Comments: Accepted by ICASSP 2024
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[84] arXiv:2403.01785 (cross-list from cs.SD) [pdf, html, other]
Title: What do neural networks listen to? Exploring the crucial bands in Speech Enhancement using Sinc-convolution
Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[85] arXiv:2403.01792 (cross-list from cs.SD) [pdf, html, other]
Title: ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning
Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[86] arXiv:2403.01960 (cross-list from cs.SD) [pdf, html, other]
Title: A robust audio deepfake detection system via multi-view feature
Yujie Yang, Haochen Qin, Hang Zhou, Chengcheng Wang, Tianyu Guo, Kai Han, Yunhe Wang
Comments: 5 pages, 2 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[87] arXiv:2403.02002 (cross-list from cs.SD) [pdf, html, other]
Title: Fine-Grained Quantitative Emotion Editing for Speech Generation
Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li
Comments: This is accepted to IEEE APSIPA ASC 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[88] arXiv:2403.02010 (cross-list from cs.SD) [pdf, html, other]
Title: SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR
Zhiyun Fan, Linhao Dong, Jun Zhang, Lu Lu, Zejun Ma
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[89] arXiv:2403.02687 (cross-list from cs.HC) [pdf, other]
Title: Enhanced DareFightingICE Competitions: Sound Design and AI Competitions
Ibrahim Khan, Chollakorn Nimpattanavong, Thai Van Nguyen, Kantinan Plupattanakit, Ruck Thawonmas
Comments: This paper describes a new competition platform using Unity for our competitions at the 2024 IEEE Conference on Games (CoG 2024). It was accepted for presentation at CoG 2024. However, we recently discovered a much more effective way to do this task without using Unity, leading to our decision to withdraw the paper from CoG 2024 and ArXiv
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[90] arXiv:2403.02701 (cross-list from cs.SD) [pdf, html, other]
Title: Fighting Game Adaptive Background Music for Improved Gameplay
Ibrahim Khan, Thai Van Nguyen, Chollakorn Nimpattanavong, Ruck Thawonmas
Comments: This is an updated version of our IEEE CoG 2023 paper. This version has revised the description of the association between the distance between the two players (PD) and the instrument's volume on page 2. arXiv admin note: substantial text overlap with arXiv:2303.15734
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[91] arXiv:2403.02918 (cross-list from cs.RO) [pdf, html, other]
Title: Single-Channel Robot Ego-Speech Filtering during Human-Robot Interaction
Yue Li, Koen V Hindriks, Florian Kunneman
Comments: Accepted by ACM Technological Advances in Human-Robot Interaction. 9 pages
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[92] arXiv:2403.02938 (cross-list from cs.CL) [pdf, html, other]
Title: AIx Speed: Playback Speed Optimization Using Listening Comprehension of Speech Recognition Models
Kazuki Kawamura, Jun Rekimoto
Journal-ref: AHs '23: Proceedings of the Augmented Humans International Conference 2023
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[93] arXiv:2403.03095 (cross-list from cs.CV) [pdf, html, other]
Title: Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization
Yuxin Guo, Shijie Ma, Yuhao Zhao, Hu Su, Wei Zou
Comments: Accepted To ICASSP2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[94] arXiv:2403.03145 (cross-list from cs.CV) [pdf, html, other]
Title: Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization
Yuxin Guo, Shijie Ma, Hu Su, Zhiqing Wang, Yuhao Zhao, Wei Zou, Siyang Sun, Yun Zheng
Comments: Accepted to NeurIPS2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[95] arXiv:2403.03224 (cross-list from physics.soc-ph) [pdf, html, other]
Title: Reinforcement Learning Jazz Improvisation: When Music Meets Game Theory
Vedant Tapiavala, Joshua Piesner, Sourjyamoy Barman, Feng Fu
Comments: 16 pages, 4 figures
Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[96] arXiv:2403.03395 (cross-list from cs.SD) [pdf, other]
Title: Interactive Melody Generation System for Enhancing the Creativity of Musicians
So Hirawata, Noriko Otani
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[97] arXiv:2403.03411 (cross-list from cs.SD) [pdf, html, other]
Title: CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation
Vahid Ahmadi Kalkhorani, DeLiang Wang
Comments: 9 pages
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[98] arXiv:2403.03510 (cross-list from cs.SD) [pdf, html, other]
Title: METAMAT 01: A semi-analytic Solution for Benchmarking Wave Propagation Simulations of homogeneous Absorbers in 1D/3D and 2D
Stefan Schoder, Paul Maurerlehner
Comments: 4
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Classical Physics (physics.class-ph)
[99] arXiv:2403.03522 (cross-list from cs.SD) [pdf, html, other]
Title: Non-verbal information in spontaneous speech -- towards a new framework of analysis
Tirza Biron, Moshe Barboy, Eran Ben-Artzy, Alona Golubchik, Yanir Marmor, Smadar Szekely, Yaron Winter, David Harel
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[100] arXiv:2403.03538 (cross-list from cs.SD) [pdf, html, other]
Title: RADIA -- Radio Advertisement Detection with Intelligent Analytics
Jorge Álvarez, Juan Carlos Armenteros, Camilo Torrón, Miguel Ortega-Martín, Alfonso Ardoiz, Óscar García, Ignacio Arranz, Íñigo Galdeano, Ignacio Garrido, Adrián Alonso, Fernando Bayón, Oleg Vorontsov
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)