Audio and Speech Processing

Authors and titles for April 2022

Total of 320 entries (showing 126-175)

[126] arXiv:2204.00352 (cross-list from cs.LG) [pdf, other]: Title: On the Efficiency of Integrating Self-supervised Learning and Meta-learning for User-defined Few-shot Keyword Spotting

Wei-Tsung Kao, Yuan-Kuei Wu, Chia-Ping Chen, Zhi-Sheng Chen, Yu-Pao Tsai, Hung-Yi Lee

Comments: Accepted by SLT 2022

Subjects: Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[127] arXiv:2204.00540 (cross-list from cs.SD) [pdf, other]: Title: End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation

Xuankai Chang, Takashi Maekaku, Yuya Fujita, Shinji Watanabe

Comments: Submitted to Interspeech 2022

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[128] arXiv:2204.00558 (cross-list from cs.CL) [pdf, other]: Title: Multi-task RNN-T with Semantic Decoder for Streamable Spoken Language Understanding

Xuandi Fu, Feng-Ju Chang, Martin Radfar, Kai Wei, Jing Liu, Grant P. Strimel, Kanthashree Mysore Sathyendra

Comments: Accepted at ICASSP 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[129] arXiv:2204.00604 (cross-list from cs.CV) [pdf, other]: Title: Quantized GAN for Complex Music Generation from Dance Videos

Ye Zhu, Kyle Olszewski, Yu Wu, Panos Achlioptas, Menglei Chai, Yan Yan, Sergey Tulyakov

Comments: Dataset and code at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[130] arXiv:2204.00628 (cross-list from cs.SD) [pdf, other]: Title: Learning Neural Acoustic Fields

Andrew Luo, Yilun Du, Michael J. Tarr, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan

Comments: NeurIPS 2022. Project page: this https URL

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[131] arXiv:2204.00652 (cross-list from cs.SD) [pdf, other]: Title: End-to-end multi-talker audio-visual ASR using an active speaker attention module

Richard Rose, Olivier Siohan

Comments: 5 pages, 3 figures, 3 tables, 28 citations

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[132] arXiv:2204.00679 (cross-list from cs.CV) [pdf, other]: Title: Learning Audio-Video Modalities from Image Captions

Arsha Nagrani, Paul Hongsuck Seo, Bryan Seybold, Anja Hauth, Santiago Manen, Chen Sun, Cordelia Schmid

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[133] arXiv:2204.00770 (cross-list from cs.SD) [pdf, other]: Title: Speaker adaptation for Wav2vec2 based dysarthric ASR

Murali Karthick Baskar, Tim Herzig, Diana Nguyen, Mireia Diez, Tim Polzehl, Lukáš Burget, Jan "Honza" Černocký

Comments: Submitted to INTERSPEECH 2022

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[134] arXiv:2204.00803 (cross-list from cs.CL) [pdf, other]: Title: End-to-end model for named entity recognition from speech without paired training data

Salima Mdhaffar, Jarod Duret, Titouan Parcollet, Yannick Estève

Comments: Submitted to INTERSPEECH 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[135] arXiv:2204.00819 (cross-list from cs.SD) [pdf, other]: Title: Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition

Guodong Ma, Pengfei Hu, Jian Kang, Shen Huang, Hao Huang

Comments: Accepted by INTERSPEECH 2021

Journal-ref: INTERSPEECH 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[136] arXiv:2204.00821 (cross-list from cs.SD) [pdf, other]: Title: Improving Target Sound Extraction with Timestamp Information

Helin Wang, Dongchao Yang, Chao Weng, Jianwei Yu, Yuexian Zou

Comments: Submitted to Interspeech 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[137] arXiv:2204.00873 (cross-list from cs.SD) [pdf, other]: Title: Acoustic-to-articulatory Inversion based on Speech Decomposition and Auxiliary Feature

Jianrong Wang, Jinyu Liu, Longxuan Zhao, Shanyu Wang, Ruiguo Yu, Li Liu

Journal-ref: ICASSP 2022

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[138] arXiv:2204.00902 (cross-list from cs.SD) [pdf, other]: Title: An objective test tool for pitch extractors' response attributes

Hideki Kawahara, Kohei Yatabe, Ken-Ichi Sakakibara, Tatsuya Kitamura, Hideki Banno, Masanori Morise

Comments: 5 pages, 9 figures, submitted to Interspeech 2022. arXiv admin note: text overlap with arXiv:2111.03629

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[139] arXiv:2204.00907 (cross-list from cs.SD) [pdf, other]: Title: StyleWaveGAN: Style-based synthesis of drum sounds with extensive controls using generative adversarial networks

Antoine Lavault, Axel Roebel, Matthieu Voiry

Comments: Accepted for publication in Sound and Music Computing 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[140] arXiv:2204.00911 (cross-list from cs.SD) [pdf, other]: Title: Measuring pitch extractors' response to frequency-modulated multi-component signals

Hideki Kawahara, Kohei Yatabe, Ken-Ichi Sakakibara, Tatsuya Kitamura, Hideki Banno, Masanori Morise

Comments: 11 pages, 9 figures, The following article has been submitted to/accepted by The Acoustical Society of America. After it is published, it will be found at this http URL

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[141] arXiv:2204.00977 (cross-list from cs.CL) [pdf, other]: Title: Deep Speech Based End-to-End Automated Speech Recognition (ASR) for Indian-English Accents

Priyank Dubey, Bilal Shah

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[142] arXiv:2204.00990 (cross-list from cs.SD) [pdf, other]: Title: Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis

Yixuan Zhou, Changhe Song, Xiang Li, Luwen Zhang, Zhiyong Wu, Yanyao Bian, Dan Su, Helen Meng

Comments: Accepted by Interspeech 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[143] arXiv:2204.01009 (cross-list from cs.SD) [pdf, other]: Title: A Computational Analysis of Pitch Drift in Unaccompanied Solo Singing using DBSCAN Clustering

Sepideh Shafiei, S. Hakam

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[144] arXiv:2204.01115 (cross-list from cs.SD) [pdf, other]: Title: On incorporating social speaker characteristics in synthetic speech

Sai Sirisha Rallabandi, Sebastian Möller

Comments: Submitted to Interspeech 2022

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[145] arXiv:2204.01235 (cross-list from cs.CL) [pdf, other]: Title: An Analysis of Semantically-Aligned Speech-Text Embeddings

Muhammad Huzaifah, Ivan Kukanov

Comments: This is the accepted version of the paper published at IEEE Spoken Language Technology (SLT) Workshop 2022

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[146] arXiv:2204.01265 (cross-list from cs.CV) [pdf, other]: Title: Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video

Minsu Kim, Joanna Hong, Se Jin Park, Yong Man Ro

Comments: Published at ICCV 2021

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[147] arXiv:2204.01294 (cross-list from cs.SD) [pdf, other]: Title: On The Model Size Selection For Speaker Identification

Marcos Faundez-Zanuy

Comments: 5 pages, published in 2001: A Speaker Odyssey, The Speaker Recognition Workshop, pp. 189-194, Crete (Greece)

Journal-ref: 2001 A Speaker Odyssey - The Speaker Recognition Workshop June 18-22, 2001, Crete, Greece

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[148] arXiv:2204.01295 (cross-list from cs.SD) [pdf, other]: Title: Nonlinear Vectorial Prediction with Neural Nets

Marcos Faundez-Zanuy

Comments: 9 pages, published in Proceedings of the 6th International Work-Conference on Artificial and Natural Neural Networks: Bio-inspired Applications of Connectionism, Part II, June 2001, pages 754-761

Journal-ref: Lecture Notes in Computer Science LNCS 2085 Vol. II, pages 754-761. IWANN 2001, Granada (Spain) ISSN 0302-9743

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[149] arXiv:2204.01338 (cross-list from cs.SD) [pdf, other]: Title: An Initialization Scheme for Meeting Separation with Spatial Mixture Models

Christoph Boeddeker, Tobias Cord-Landwehr, Thilo von Neumann, Reinhold Haeb-Umbach

Comments: Submitted to INTERSPEECH 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[150] arXiv:2204.01360 (cross-list from cs.SD) [pdf, other]: Title: Learning the Proximity Operator in Unfolded ADMM for Phase Retrieval

Pierre-Hugo Vial, Paul Magron, Thomas Oberlin, Cédric Févotte

Comments: 10 pages, 5 figures, submitted to IEEE SPL

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[151] arXiv:2204.01397 (cross-list from cs.CL) [pdf, other]: Title: A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems

Marcely Zanon Boito, Laurent Besacier, Natalia Tomashenko, Yannick Estève

Comments: Accepted to INTERSPEECH 2022 (Special session Inclusive and Fair Speech Technologies)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[152] arXiv:2204.01564 (cross-list from cs.SD) [pdf, other]: Title: Introducing ECAPA-TDNN and Wav2Vec2.0 Embeddings to Stuttering Detection

Shakeel Ahmad Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

Comments: Submitted to Interspeech 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[153] arXiv:2204.01670 (cross-list from cs.CL) [pdf, other]: Title: Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition

Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang

Comments: Submitted for review at Interspeech 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[154] arXiv:2204.01672 (cross-list from cs.SD) [pdf, other]: Title: Residual-guided Personalized Speech Synthesis based on Face Image

Jianrong Wang, Zixuan Wang, Xiaosheng Hu, Xuewei Li, Qiang Fang, Li Liu

Comments: ICASSP 2022

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[155] arXiv:2204.01726 (cross-list from cs.CV) [pdf, other]: Title: Lip to Speech Synthesis with Visual Context Attentional GAN

Minsu Kim, Joanna Hong, Yong Man Ro

Comments: Published at NeurIPS 2021

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[156] arXiv:2204.01787 (cross-list from cs.SD) [pdf, other]: Title: GWA: A Large High-Quality Acoustic Dataset for Audio Processing

Zhenyu Tang, Rohith Aralikatti, Anton Ratnarajah, Dinesh Manocha

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[157] arXiv:2204.01893 (cross-list from cs.CL) [pdf, other]: Title: Deliberation Model for On-Device Spoken Language Understanding

Duc Le, Akshat Shrivastava, Paden Tomasello, Suyoun Kim, Aleksandr Livshits, Ozlem Kalinli, Michael L. Seltzer

Comments: Accepted for publication at INTERSPEECH 2022

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[158] arXiv:2204.01905 (cross-list from cs.SD) [pdf, other]: Title: Learning to Adapt to Domain Shifts with Few-shot Samples in Anomalous Sound Detection

Bingqing Chen, Luca Bondi, Samarjit Das

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[159] arXiv:2204.01954 (cross-list from physics.comp-ph) [pdf, other]: Title: Application of a Spectral Method to Simulate Quasi-Three-Dimensional Underwater Acoustic Fields

Houwang Tu, Yongxian Wang, Wei Liu, Chunmei Yang, Jixing Qin, Shuqing Ma, Xiaodong Wang

Comments: 31 pages, 22 figures. arXiv admin note: text overlap with arXiv:2112.13602

Subjects: Computational Physics (physics.comp-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[160] arXiv:2204.01977 (cross-list from cs.SD) [pdf, other]: Title: Audio-visual multi-channel speech separation, dereverberation and recognition

Guinan Li, Jianwei Yu, Jiajun Deng, Xunying Liu, Helen Meng

Comments: Accepted by ICASSP 2022

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[161] arXiv:2204.02023 (cross-list from cs.SD) [pdf, other]: Title: A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition

Ye-Qian Du, Jie Zhang, Qiu-Shi Zhu, Li-Rong Dai, Ming-Hui Wu, Xin Fang, Zhou-Wang Yang

Comments: 5 pages, 3 figures

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[162] arXiv:2204.02040 (cross-list from cs.SD) [pdf, other]: Title: On the Relevance of Bandwidth Extension for Speaker Verification

Marcos Faundez-Zanuy, Mattias Nilsson, W. Bastiaan Kleijn

Comments: 4 pages published in 7th International Conference on Spoken Language Processing, September 16-20, 2002, Denver, Colorado, USA. arXiv admin note: text overlap with arXiv:2202.13865

Journal-ref: 7th International Conference on Spoken Language Processing (ICSLP2002), September 16-20, 2002

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[163] arXiv:2204.02088 (cross-list from cs.SD) [pdf, other]: Title: A Mixed supervised Learning Framework for Target Sound Detection

Dongchao Yang, Helin Wang, Yuexian Zou, Wenwu Wang

Comments: Submitted to DCASE workshop

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[164] arXiv:2204.02090 (cross-list from cs.CV) [pdf, other]: Title: VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices

Venkatesh S. Kadandale, Juan F. Montesinos, Gloria Haro

Comments: Paper accepted to Interspeech 2022; Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[165] arXiv:2204.02101 (cross-list from cs.SD) [pdf, other]: Title: Non-Linear Speech coding with MLP, RBF and Elman based prediction

Marcos Faundez-Zanuy

Comments: 9 pages, published in Mira, J., Álvarez, J.R. (eds) Artificial Neural Nets Problem Solving Methods. IWANN 2003. Lecture Notes in Computer Science, vol 2687. Springer, Berlin, Heidelberg

Journal-ref: International Work-Conference on Artificial Neural Networks IWANN 2003, LNCS 2687 Menorca (Spain)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[166] arXiv:2204.02121 (cross-list from cs.SD) [pdf, other]: Title: MetaAudio: A Few-Shot Audio Classification Benchmark

Calum Heggan, Sam Budgett, Timothy Hospedales, Mehrdad Yaghoobi

Comments: 9 pages with 1 figure and 2 main results tables. V1 Preprint

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[167] arXiv:2204.02143 (cross-list from cs.SD) [pdf, other]: Title: RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection

Dongchao Yang, Helin Wang, Zhongjie Ye, Yuexian Zou, Wenwu Wang

Comments: Submitted to Interspeech 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[168] arXiv:2204.02152 (cross-list from cs.SD) [pdf, other]: Title: UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022

Takaaki Saeki, Detai Xin, Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari

Comments: Accepted to INTERSPEECH 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[169] arXiv:2204.02172 (cross-list from cs.SD) [pdf, other]: Title: Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech

Hyungchan Yoon, Seyun Um, Changwhan Kim, Hong-Goo Kang

Comments: INTERSPEECH 2023

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[170] arXiv:2204.02269 (cross-list from cs.SD) [pdf, other]: Title: Repeat after me: Self-supervised learning of acoustic-to-articulatory mapping by vocal imitation

Marc-Antoine Georges, Julien Diard, Laurent Girin, Jean-Luc Schwartz, Thomas Hueber

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[171] arXiv:2204.02279 (cross-list from cs.SD) [pdf, other]: Title: How Information on Acoustic Scenes and Sound Events Mutually Benefits Event Detection and Scene Classification Tasks

Keisuke Imoto, Yuka Komatsu, Shunsuke Tsubaki, Tatsuya Komatsu

Comments: Submitted to INTERSPEECH 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[172] arXiv:2204.02389 (cross-list from cs.CV) [pdf, other]: Title: ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

Ruohan Gao, Zilin Si, Yen-Yu Chang, Samuel Clarke, Jeannette Bohg, Li Fei-Fei, Wenzhen Yuan, Jiajun Wu

Comments: In CVPR 2022. Gao, Si, and Chang contributed equally to this work. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[173] arXiv:2204.02400 (cross-list from cs.SD) [pdf, other]: Title: What can predictive speech coders learn from speaker recognizers?

Marcos Faundez-Zanuy

Comments: 7 pages, published in ITRW on Non-Linear Speech Processing (NOLISP 03), May 20-23, 2003, Le Croisic, France, paper 001. arXiv admin note: text overlap with arXiv:2204.02101

Journal-ref: Non-Linear Speech Processing (NOLISP) 2003

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[174] arXiv:2204.02455 (cross-list from cs.SD) [pdf, other]: Title: Improving Voice Trigger Detection with Metric Learning

Prateeth Nayak, Takuya Higuchi, Anmol Gupta, Shivesh Ranjan, Stephen Shum, Siddharth Sigtia, Erik Marchi, Varun Lakshminarasimhan, Minsik Cho, Saurabh Adya, Chandra Dhir, Ahmed Tewfik

Comments: Accepted at InterSpeech 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[175] arXiv:2204.02470 (cross-list from cs.CL) [pdf, other]: Title: Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation

Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel Lopez-Francisco, Jonathan D. Amith, Shinji Watanabe

Comments: 5 pages, 2 figures, submitted to Interspeech 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
