Audio and Speech Processing

Authors and titles for April 2022

Total of 320 entries (showing 151-175)

[151] arXiv:2204.01397 (cross-list from cs.CL) Title: A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems

Marcely Zanon Boito, Laurent Besacier, Natalia Tomashenko, Yannick Estève

Comments: Accepted to INTERSPEECH 2022 (Special session Inclusive and Fair Speech Technologies)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[152] arXiv:2204.01564 (cross-list from cs.SD) Title: Introducing ECAPA-TDNN and Wav2Vec2.0 Embeddings to Stuttering Detection

Shakeel Ahmad Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

Comments: Submitted to Interspeech 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[153] arXiv:2204.01670 (cross-list from cs.CL) Title: Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition

Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang

Comments: Submitted for review at Interspeech 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[154] arXiv:2204.01672 (cross-list from cs.SD) Title: Residual-guided Personalized Speech Synthesis based on Face Image

Jianrong Wang, Zixuan Wang, Xiaosheng Hu, Xuewei Li, Qiang Fang, Li Liu

Comments: ICASSP 2022

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[155] arXiv:2204.01726 (cross-list from cs.CV) Title: Lip to Speech Synthesis with Visual Context Attentional GAN

Minsu Kim, Joanna Hong, Yong Man Ro

Comments: Published at NeurIPS 2021

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[156] arXiv:2204.01787 (cross-list from cs.SD) Title: GWA: A Large High-Quality Acoustic Dataset for Audio Processing

Zhenyu Tang, Rohith Aralikatti, Anton Ratnarajah, Dinesh Manocha

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[157] arXiv:2204.01893 (cross-list from cs.CL) Title: Deliberation Model for On-Device Spoken Language Understanding

Duc Le, Akshat Shrivastava, Paden Tomasello, Suyoun Kim, Aleksandr Livshits, Ozlem Kalinli, Michael L. Seltzer

Comments: Accepted for publication at INTERSPEECH 2022

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[158] arXiv:2204.01905 (cross-list from cs.SD) Title: Learning to Adapt to Domain Shifts with Few-shot Samples in Anomalous Sound Detection

Bingqing Chen, Luca Bondi, Samarjit Das

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[159] arXiv:2204.01954 (cross-list from physics.comp-ph) Title: Application of a Spectral Method to Simulate Quasi-Three-Dimensional Underwater Acoustic Fields

Houwang Tu, Yongxian Wang, Wei Liu, Chunmei Yang, Jixing Qin, Shuqing Ma, Xiaodong Wang

Comments: 31 pages, 22 figures. arXiv admin note: text overlap with arXiv:2112.13602

Subjects: Computational Physics (physics.comp-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[160] arXiv:2204.01977 (cross-list from cs.SD) Title: Audio-visual multi-channel speech separation, dereverberation and recognition

Guinan Li, Jianwei Yu, Jiajun Deng, Xunying Liu, Helen Meng

Comments: Accepted by ICASSP 2022

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[161] arXiv:2204.02023 (cross-list from cs.SD) Title: A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition

Ye-Qian Du, Jie Zhang, Qiu-Shi Zhu, Li-Rong Dai, Ming-Hui Wu, Xin Fang, Zhou-Wang Yang

Comments: 5 pages, 3 figures

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[162] arXiv:2204.02040 (cross-list from cs.SD) Title: On the Relevance of Bandwidth Extension for Speaker Verification

Marcos Faundez-Zanuy, Mattias Nilsson, W. Bastiaan Kleijn

Comments: 4 pages published in 7th International Conference on Spoken Language Processing, September 16-20, 2002, Denver, Colorado, USA. arXiv admin note: text overlap with arXiv:2202.13865

Journal-ref: 7th International Conference on Spoken Language Processing (ICSLP2002), September 16-20, 2002

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[163] arXiv:2204.02088 (cross-list from cs.SD) Title: A Mixed supervised Learning Framework for Target Sound Detection

Dongchao Yang, Helin Wang, Yuexian Zou, Wenwu Wang

Comments: submitted to DCASE workshop

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[164] arXiv:2204.02090 (cross-list from cs.CV) Title: VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices

Venkatesh S. Kadandale, Juan F. Montesinos, Gloria Haro

Comments: Paper accepted to Interspeech 2022; Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[165] arXiv:2204.02101 (cross-list from cs.SD) Title: Non-Linear Speech coding with MLP, RBF and Elman based prediction

Marcos Faundez-Zanuy

Comments: 9 pages, published in Mira, J., Álvarez, J.R. (eds) Artificial Neural Nets Problem Solving Methods. IWANN 2003. Lecture Notes in Computer Science, vol 2687. Springer, Berlin, Heidelberg

Journal-ref: International Work-Conference on Artificial Neural Networks IWANN 2003, LNCS 2687 Menorca (Spain)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[166] arXiv:2204.02121 (cross-list from cs.SD) Title: MetaAudio: A Few-Shot Audio Classification Benchmark

Calum Heggan, Sam Budgett, Timothy Hospedales, Mehrdad Yaghoobi

Comments: 9 pages with 1 figure and 2 main results tables. V1 Preprint

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[167] arXiv:2204.02143 (cross-list from cs.SD) Title: RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection

Dongchao Yang, Helin Wang, Zhongjie Ye, Yuexian Zou, Wenwu Wang

Comments: Submitted to INTERSPEECH 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[168] arXiv:2204.02152 (cross-list from cs.SD) Title: UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022

Takaaki Saeki, Detai Xin, Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari

Comments: Accepted to INTERSPEECH 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[169] arXiv:2204.02172 (cross-list from cs.SD) Title: Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech

Hyungchan Yoon, Seyun Um, Changwhan Kim, Hong-Goo Kang

Comments: INTERSPEECH 2023

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[170] arXiv:2204.02269 (cross-list from cs.SD) Title: Repeat after me: Self-supervised learning of acoustic-to-articulatory mapping by vocal imitation

Marc-Antoine Georges, Julien Diard, Laurent Girin, Jean-Luc Schwartz, Thomas Hueber

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[171] arXiv:2204.02279 (cross-list from cs.SD) Title: How Information on Acoustic Scenes and Sound Events Mutually Benefits Event Detection and Scene Classification Tasks

Keisuke Imoto, Yuka Komatsu, Shunsuke Tsubaki, Tatsuya Komatsu

Comments: Submitted to INTERSPEECH 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[172] arXiv:2204.02389 (cross-list from cs.CV) Title: ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

Ruohan Gao, Zilin Si, Yen-Yu Chang, Samuel Clarke, Jeannette Bohg, Li Fei-Fei, Wenzhen Yuan, Jiajun Wu

Comments: In CVPR 2022. Gao, Si, and Chang contributed equally to this work. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[173] arXiv:2204.02400 (cross-list from cs.SD) Title: What can predictive speech coders learn from speaker recognizers?

Marcos Faundez-Zanuy

Comments: 7 pages, published in ITRW on Non-Linear Speech Processing (NOLISP 03), May 20-23, 2003, Le Croisic, France, paper 001. arXiv admin note: text overlap with arXiv:2204.02101

Journal-ref: Non-Linear Speech Processing (NOLISP) 2003

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[174] arXiv:2204.02455 (cross-list from cs.SD) Title: Improving Voice Trigger Detection with Metric Learning

Prateeth Nayak, Takuya Higuchi, Anmol Gupta, Shivesh Ranjan, Stephen Shum, Siddharth Sigtia, Erik Marchi, Varun Lakshminarasimhan, Minsik Cho, Saurabh Adya, Chandra Dhir, Ahmed Tewfik

Comments: Accepted at InterSpeech 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[175] arXiv:2204.02470 (cross-list from cs.CL) Title: Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation

Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel Lopez-Francisco, Jonathan D. Amith, Shinji Watanabe

Comments: 5 pages, 2 figures, submitted to Interspeech 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)