Sound

Authors and titles for March 2024

Total of 170 entries (entries 76-100 shown)


[76] arXiv:2403.17327 [pdf, html, other]: Title: Accuracy enhancement method for speech emotion recognition from spectrogram using temporal frequency correlation and positional information learning through knowledge transfer

Jeong-Yoon Kim, Seung-Ho Lee

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[77] arXiv:2403.17376 [pdf, other]: Title: Theoretical Analysis of Quality of Conventional Beamforming for Phased Microphone Arrays

Dheepak Khatri, Kenneth Granlund

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[78] arXiv:2403.17378 [pdf, html, other]: Title: Low-Latency Neural Speech Phase Prediction based on Parallel Estimation Architecture and Anti-Wrapping Losses for Speech Generation Tasks

Yang Ai, Zhen-Hua Ling

Comments: Accepted by IEEE Transactions on Audio, Speech and Language Processing. arXiv admin note: substantial text overlap with arXiv:2211.15974

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[79] arXiv:2403.17379 [pdf, html, other]: Title: Exploring and Applying Audio-Based Sentiment Analysis in Music

Etash Jhanji

Comments: 5 pages, 7 figures, 2 tables. For source code, see this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[80] arXiv:2403.17508 [pdf, html, other]: Title: Correlation of Fréchet Audio Distance With Human Perception of Environmental Audio Is Embedding Dependant

Modan Tailleur, Junwon Lee, Mathieu Lagrange, Keunwoo Choi, Laurie M. Heller, Keisuke Imoto, Yuki Okamoto

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[81] arXiv:2403.17529 [pdf, html, other]: Title: Detection of Deepfake Environmental Audio

Hafsa Ouajdi, Oussama Hadder, Modan Tailleur, Mathieu Lagrange, Laurie M. Heller

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[82] arXiv:2403.17562 [pdf, html, other]: Title: Deep functional multiple index models with an application to SER

Matthieu Saumard, Abir El Haj, Thibault Napoleon

Comments: 5 pages, 1 figure

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Applications (stat.AP)
[83] arXiv:2403.18572 [pdf, html, other]: Title: ACES: Evaluating Automated Audio Captioning Models on the Semantics of Sounds

Gijs Wijngaard, Elia Formisano, Bruno L. Giordano, Michel Dumontier

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[84] arXiv:2403.18821 [pdf, html, other]: Title: Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark

Ziyang Chen, Israel D. Gebru, Christian Richardt, Anurag Kumar, William Laney, Andrew Owens, Alexander Richard

Comments: Accepted to CVPR 2024. Project site: this https URL

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[85] arXiv:2403.19224 [pdf, html, other]: Title: Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition

Siyuan Shen, Yu Gao, Feng Liu, Hanyang Wang, Aimin Zhou

Comments: Accepted by 49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[86] arXiv:2403.19441 [pdf, html, other]: Title: A Novel Stochastic Transformer-based Approach for Post-Traumatic Stress Disorder Detection using Audio Recording of Clinical Interviews

Mamadou Dia, Ghazaleh Khodabandelou, Alice Othmani

Journal-ref: 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (2023) 700-705

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[87] arXiv:2403.19634 [pdf, html, other]: Title: Asymmetric and trial-dependent modeling: the contribution of LIA to SdSV Challenge Task 2

Pierre-Michel Bousquet, Mickael Rouvier

Comments: LIA system description for the Short Duration Speaker Verification (SdSv) challenge 2020 Task 2

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[88] arXiv:2403.19763 [pdf, html, other]: Title: Creating Aesthetic Sonifications on the Web with SIREN

Tristan Peng, Hongchan Choi, Jonathan Berger

Comments: 7 pages, 1 figure, 5 listings, submitted to the Web Audio Conference 2024

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[89] arXiv:2403.20130 [pdf, html, other]: Title: Sound event localization and classification using WASN in Outdoor Environment

Dongzhe Zhang, Jianfeng Chen, Jisheng Bai, Mou Wang

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[90] arXiv:2403.20202 [pdf, other]: Title: Voice Signal Processing for Machine Learning. The Case of Speaker Isolation

Radan Ganchev

Comments: MSc. thesis. for associated source code, see this https URL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[91] arXiv:2403.00212 (cross-list from cs.CL) [pdf, html, other]: Title: Transcription and translation of videos using fine-tuned XLSR Wav2Vec2 on custom dataset and mBART

Aniket Tathe, Anand Kamble, Suyash Kumbharkar, Atharva Bhandare, Anirban C. Mitra

Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[92] arXiv:2403.00274 (cross-list from cs.CV) [pdf, html, other]: Title: CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation

Xi Liu, Ying Guo, Cheng Zhen, Tong Li, Yingying Ao, Pengfei Yan

Comments: Accepted by CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[93] arXiv:2403.00293 (cross-list from eess.AS) [pdf, html, other]: Title: Efficient Adapter Tuning of Pre-trained Speech Models for Automatic Speaker Verification

Mufan Sang, John H.L. Hansen

Comments: Accepted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[94] arXiv:2403.00370 (cross-list from cs.CL) [pdf, html, other]: Title: Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview

Heyang Liu, Yu Wang, Yanfeng Wang

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[95] arXiv:2403.00379 (cross-list from eess.AS) [pdf, html, other]: Title: The Impact of Frequency Bands on Acoustic Anomaly Detection of Machines using Deep Learning Based Model

Tin Nguyen, Lam Pham, Phat Lam, Dat Ngo, Hieu Tang, Alexander Schindler

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[96] arXiv:2403.00854 (cross-list from q-bio.NC) [pdf, html, other]: Title: Speaker-Independent Dysarthria Severity Classification using Self-Supervised Transformers and Multi-Task Learning

Lauren Stumpf, Balasundaram Kadirvelu, Sigourney Waibel, A. Aldo Faisal

Comments: 17 pages, 2 tables, 4 main figures, 2 supplemental figures, prepared for journal submission

Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[97] arXiv:2403.00887 (cross-list from eess.AS) [pdf, html, other]: Title: SEGAA: A Unified Approach to Predicting Age, Gender, and Emotion in Speech

Aron R, Indra Sigicharla, Chirag Periwal, Mohanaprasad K, Nithya Darisini P S, Sourabh Tiwari, Shivani Arora

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[98] arXiv:2403.01087 (cross-list from cs.MM) [pdf, html, other]: Title: Towards Accurate Lip-to-Speech Synthesis in-the-Wild

Sindhu Hegde, Rudrabha Mukhopadhyay, C.V. Jawahar, Vinay Namboodiri

Comments: 8 pages of content, 1 page of references and 4 figures

Journal-ref: In Proceedings of the 31st ACM International Conference on Multimedia, 2023

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[99] arXiv:2403.01132 (cross-list from cs.LG) [pdf, other]: Title: MPIPN: A Multi Physics-Informed PointNet for solving parametric acoustic-structure systems

Chu Wang, Jinhong Wu, Yanzhi Wang, Zhijian Zha, Qi Zhou

Comments: The number of figures is 16. The number of tables is 5. The number of words is 9717

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[100] arXiv:2403.01494 (cross-list from eess.AS) [pdf, html, other]: Title: PAVITS: Exploring Prosody-aware VITS for End-to-End Emotional Voice Conversion

Tianhua Qi, Wenming Zheng, Cheng Lu, Yuan Zong, Hailun Lian

Comments: Accepted to ICASSP2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
