Audio and Speech Processing

Authors and titles for March 2024

Total of 213 entries (showing entries 76-100)


[76] arXiv:2403.00854 (cross-list from q-bio.NC) [pdf, html, other]: Title: Speaker-Independent Dysarthria Severity Classification using Self-Supervised Transformers and Multi-Task Learning

Lauren Stumpf, Balasundaram Kadirvelu, Sigourney Waibel, A. Aldo Faisal

Comments: 17 pages, 2 tables, 4 main figures, 2 supplemental figures, prepared for journal submission

Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[77] arXiv:2403.00977 (cross-list from cs.SD) [pdf, html, other]: Title: Scaling Up Adaptive Filter Optimizers

Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[78] arXiv:2403.01087 (cross-list from cs.MM) [pdf, html, other]: Title: Towards Accurate Lip-to-Speech Synthesis in-the-Wild

Sindhu Hegde, Rudrabha Mukhopadhyay, C.V. Jawahar, Vinay Namboodiri

Comments: 8 pages of content, 1 page of references and 4 figures

Journal-ref: In Proceedings of the 31st ACM International Conference on Multimedia, 2023

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[79] arXiv:2403.01132 (cross-list from cs.LG) [pdf, other]: Title: MPIPN: A Multi Physics-Informed PointNet for solving parametric acoustic-structure systems

Chu Wang, Jinhong Wu, Yanzhi Wang, Zhijian Zha, Qi Zhou

Comments: The number of figures is 16. The number of tables is 5. The number of words is 9717

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[80] arXiv:2403.01255 (cross-list from cs.SD) [pdf, html, other]: Title: Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey

Hamza Kheddar, Mustapha Hemis, Yassine Himeur

Journal-ref: Information Fusion, Elsevier, 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[81] arXiv:2403.01278 (cross-list from cs.SD) [pdf, html, other]: Title: Enhancing Audio Generation Diversity with Visual Information

Zeyu Xie, Baihan Li, Xuenan Xu, Mengyue Wu, Kai Yu

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[82] arXiv:2403.01699 (cross-list from cs.CL) [pdf, other]: Title: Brilla AI: AI Contestant for the National Science and Maths Quiz

George Boateng, Jonathan Abrefah Mensah, Kevin Takyi Yeboah, William Edor, Andrew Kojo Mensah-Onumah, Naafi Dasana Ibrahim, Nana Sam Yeboah

Comments: 14 pages. Accepted for the WideAIED track at the 25th International Conference on AI in Education (AIED 2024)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[83] arXiv:2403.01700 (cross-list from cs.SD) [pdf, html, other]: Title: Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer

Haoxu Wang, Ming Cheng, Qiang Fu, Ming Li

Comments: Accepted by ICASSP 2024

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[84] arXiv:2403.01785 (cross-list from cs.SD) [pdf, html, other]: Title: What do neural networks listen to? Exploring the crucial bands in Speech Enhancement using Sinc-convolution

Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[85] arXiv:2403.01792 (cross-list from cs.SD) [pdf, html, other]: Title: ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning

Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[86] arXiv:2403.01960 (cross-list from cs.SD) [pdf, html, other]: Title: A robust audio deepfake detection system via multi-view feature

Yujie Yang, Haochen Qin, Hang Zhou, Chengcheng Wang, Tianyu Guo, Kai Han, Yunhe Wang

Comments: 5 pages, 2 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[87] arXiv:2403.02002 (cross-list from cs.SD) [pdf, html, other]: Title: Fine-Grained Quantitative Emotion Editing for Speech Generation

Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li

Comments: This is accepted to IEEE APSIPA ASC 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[88] arXiv:2403.02010 (cross-list from cs.SD) [pdf, html, other]: Title: SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR

Zhiyun Fan, Linhao Dong, Jun Zhang, Lu Lu, Zejun Ma

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[89] arXiv:2403.02687 (cross-list from cs.HC) [pdf, other]: Title: Enhanced DareFightingICE Competitions: Sound Design and AI Competitions

Ibrahim Khan, Chollakorn Nimpattanavong, Thai Van Nguyen, Kantinan Plupattanakit, Ruck Thawonmas

Comments: This paper describes a new competition platform using Unity for our competitions at the 2024 IEEE Conference on Games (CoG 2024). It was accepted for presentation at CoG 2024. However, we recently discovered a much more effective way to do this task without using Unity, leading to our decision to withdraw the paper from CoG 2024 and ArXiv

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[90] arXiv:2403.02701 (cross-list from cs.SD) [pdf, html, other]: Title: Fighting Game Adaptive Background Music for Improved Gameplay

Ibrahim Khan, Thai Van Nguyen, Chollakorn Nimpattanavong, Ruck Thawonmas

Comments: This is an updated version of our IEEE CoG 2023 paper (this https URL). This version has revised the description of the association between the distance between the two players (PD) and the instrument's volume on page 2. arXiv admin note: substantial text overlap with arXiv:2303.15734

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[91] arXiv:2403.02918 (cross-list from cs.RO) [pdf, html, other]: Title: Single-Channel Robot Ego-Speech Filtering during Human-Robot Interaction

Yue Li, Koen V Hindriks, Florian Kunneman

Comments: Accepted by ACM Technological Advances in Human-Robot Interaction. 9 pages

Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[92] arXiv:2403.02938 (cross-list from cs.CL) [pdf, html, other]: Title: AIx Speed: Playback Speed Optimization Using Listening Comprehension of Speech Recognition Models

Kazuki Kawamura, Jun Rekimoto

Journal-ref: AHs '23: Proceedings of the Augmented Humans International Conference 2023

Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[93] arXiv:2403.03095 (cross-list from cs.CV) [pdf, html, other]: Title: Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization

Yuxin Guo, Shijie Ma, Yuhao Zhao, Hu Su, Wei Zou

Comments: Accepted To ICASSP2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[94] arXiv:2403.03145 (cross-list from cs.CV) [pdf, html, other]: Title: Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization

Yuxin Guo, Shijie Ma, Hu Su, Zhiqing Wang, Yuhao Zhao, Wei Zou, Siyang Sun, Yun Zheng

Comments: Accepted to NeurIPS2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[95] arXiv:2403.03224 (cross-list from physics.soc-ph) [pdf, html, other]: Title: Reinforcement Learning Jazz Improvisation: When Music Meets Game Theory

Vedant Tapiavala, Joshua Piesner, Sourjyamoy Barman, Feng Fu

Comments: 16 pages, 4 figures

Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[96] arXiv:2403.03395 (cross-list from cs.SD) [pdf, other]: Title: Interactive Melody Generation System for Enhancing the Creativity of Musicians

So Hirawata, Noriko Otani

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[97] arXiv:2403.03411 (cross-list from cs.SD) [pdf, html, other]: Title: CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation

Vahid Ahmadi Kalkhorani, DeLiang Wang

Comments: 9 pages

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[98] arXiv:2403.03510 (cross-list from cs.SD) [pdf, html, other]: Title: METAMAT 01: A semi-analytic Solution for Benchmarking Wave Propagation Simulations of homogeneous Absorbers in 1D/3D and 2D

Stefan Schoder, Paul Maurerlehner

Comments: 4

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Classical Physics (physics.class-ph)
[99] arXiv:2403.03522 (cross-list from cs.SD) [pdf, html, other]: Title: Non-verbal information in spontaneous speech -- towards a new framework of analysis

Tirza Biron, Moshe Barboy, Eran Ben-Artzy, Alona Golubchik, Yanir Marmor, Smadar Szekely, Yaron Winter, David Harel

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[100] arXiv:2403.03538 (cross-list from cs.SD) [pdf, html, other]: Title: RADIA -- Radio Advertisement Detection with Intelligent Analytics

Jorge Álvarez, Juan Carlos Armenteros, Camilo Torrón, Miguel Ortega-Martín, Alfonso Ardoiz, Óscar García, Ignacio Arranz, Íñigo Galdeano, Ignacio Garrido, Adrián Alonso, Fernando Bayón, Oleg Vorontsov

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
