Sound

Authors and titles for December 2022

Total of 137 entries


[126] arXiv:2212.11772 (cross-list from cs.CL)

Title: A Self-Adjusting Fusion Representation Learning Model for Unaligned Text-Audio Sequences

Kaicheng Yang, Ruxuan Zhang, Hua Xu, Kai Gao

Comments: 8 pages

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[127] arXiv:2212.11851 (cross-list from eess.AS)

Title: StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation

Jean-Marie Lemercier, Julius Richter, Simon Welker, Timo Gerkmann

Comments: Published in IEEE/ACM Transactions on Audio, Speech and Language Processing, 2023

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

[128] arXiv:2212.12048 (cross-list from cs.CL)

Title: Pushing the performances of ASR models on English and Spanish accents

Pooja Chitkara, Morgane Riviere, Jade Copet, Frank Zhang, Yatharth Saraf

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[129] arXiv:2212.12497 (cross-list from nlin.PS)

Title: Fractal Patterns in Music

John McDonough, Andrzej Herczyński

Comments: 27 pages, 12 figures

Subjects: Pattern Formation and Solitons (nlin.PS); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[130] arXiv:2212.13015 (cross-list from cs.CL)

Title: Skit-S2I: An Indian Accented Speech to Intent dataset

Shangeth Rajaa, Swaraj Dalmia, Kumarmanas Nethil

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[131] arXiv:2212.13378 (cross-list from cs.CL)

Title: Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation

Tomer Wullach, Shlomo E. Chazan

Comments: Accepted to AAAI 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[132] arXiv:2212.13442 (cross-list from eess.IV)

Title: Audiovisual Database with 360 Video and Higher-Order Ambisonics Audio for Perception, Cognition, Behavior, and QoE Evaluation Research

Thomas Robotham, Ashutosh Singla, Olli S. Rummukainen, Alexander Raake, Emanuël A. P. Habets

Comments: 6 pages, 2 figures, accepted and presented at the 2022 14th International Conference on Quality of Multimedia Experience (QoMEX). Database is publicly accessible at this https URL

Subjects: Image and Video Processing (eess.IV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[133] arXiv:2212.13703 (cross-list from eess.AS)

Title: Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism

Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda

Comments: 5 pages, 4 figures, 2 tables, accepted to ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)

[134] arXiv:2212.13917 (cross-list from cs.HC)

Title: Multimodal Emotion Recognition among Couples from Lab Settings to Daily Life using Smartwatches

George Boateng

Comments: PhD Thesis, 2022 - ETH Zurich

Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)

[135] arXiv:2212.14149 (cross-list from cs.LG)

Title: Macro-block dropout for improved regularization in training end-to-end speech recognition models

Chanwoo Kim, Sathish Indurti, Jinhwan Park, Wonyong Sung

Comments: Accepted for presentation at The 2022 IEEE Spoken Language Technology Workshop (SLT 2022)

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[136] arXiv:2212.14227 (cross-list from eess.AS)

Title: StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models

Yinghao Aaron Li, Cong Han, Nima Mesgarani

Comments: SLT 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

[137] arXiv:2212.14518 (cross-list from eess.AS)

Title: ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech

Zehua Chen, Yihan Wu, Yichong Leng, Jiawei Chen, Haohe Liu, Xu Tan, Yang Cui, Ke Wang, Lei He, Sheng Zhao, Jiang Bian, Danilo Mandic

Comments: 13 pages, 5 figures

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
