Audio and Speech Processing

Authors and titles for April 2022

Total of 320 entries
[126] arXiv:2204.00352 (cross-list from cs.LG) [pdf, other]
Title: On the Efficiency of Integrating Self-supervised Learning and Meta-learning for User-defined Few-shot Keyword Spotting
Wei-Tsung Kao, Yuan-Kuei Wu, Chia-Ping Chen, Zhi-Sheng Chen, Yu-Pao Tsai, Hung-Yi Lee
Comments: Accepted by SLT 2022
Subjects: Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[127] arXiv:2204.00540 (cross-list from cs.SD) [pdf, other]
Title: End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
Xuankai Chang, Takashi Maekaku, Yuya Fujita, Shinji Watanabe
Comments: Submitted to Interspeech 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[128] arXiv:2204.00558 (cross-list from cs.CL) [pdf, other]
Title: Multi-task RNN-T with Semantic Decoder for Streamable Spoken Language Understanding
Xuandi Fu, Feng-Ju Chang, Martin Radfar, Kai Wei, Jing Liu, Grant P. Strimel, Kanthashree Mysore Sathyendra
Comments: Accepted at ICASSP 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[129] arXiv:2204.00604 (cross-list from cs.CV) [pdf, other]
Title: Quantized GAN for Complex Music Generation from Dance Videos
Ye Zhu, Kyle Olszewski, Yu Wu, Panos Achlioptas, Menglei Chai, Yan Yan, Sergey Tulyakov
Comments: Dataset and code at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[130] arXiv:2204.00628 (cross-list from cs.SD) [pdf, other]
Title: Learning Neural Acoustic Fields
Andrew Luo, Yilun Du, Michael J. Tarr, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan
Comments: NeurIPS 2022. Project page: this https URL
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[131] arXiv:2204.00652 (cross-list from cs.SD) [pdf, other]
Title: End-to-end multi-talker audio-visual ASR using an active speaker attention module
Richard Rose, Olivier Siohan
Comments: 5 pages, 3 figures, 3 tables, 28 citations
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[132] arXiv:2204.00679 (cross-list from cs.CV) [pdf, other]
Title: Learning Audio-Video Modalities from Image Captions
Arsha Nagrani, Paul Hongsuck Seo, Bryan Seybold, Anja Hauth, Santiago Manen, Chen Sun, Cordelia Schmid
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[133] arXiv:2204.00770 (cross-list from cs.SD) [pdf, other]
Title: Speaker adaptation for Wav2vec2 based dysarthric ASR
Murali Karthick Baskar, Tim Herzig, Diana Nguyen, Mireia Diez, Tim Polzehl, Lukáš Burget, Jan "Honza" Černocký
Comments: Submitted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[134] arXiv:2204.00803 (cross-list from cs.CL) [pdf, other]
Title: End-to-end model for named entity recognition from speech without paired training data
Salima Mdhaffar, Jarod Duret, Titouan Parcollet, Yannick Estève
Comments: Submitted to INTERSPEECH 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[135] arXiv:2204.00819 (cross-list from cs.SD) [pdf, other]
Title: Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition
Guodong Ma, Pengfei Hu, Jian Kang, Shen Huang, Hao Huang
Comments: Accepted by INTERSPEECH 2021
Journal-ref: INTERSPEECH 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[136] arXiv:2204.00821 (cross-list from cs.SD) [pdf, other]
Title: Improving Target Sound Extraction with Timestamp Information
Helin Wang, Dongchao Yang, Chao Weng, Jianwei Yu, Yuexian Zou
Comments: Submitted to Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[137] arXiv:2204.00873 (cross-list from cs.SD) [pdf, other]
Title: Acoustic-to-articulatory Inversion based on Speech Decomposition and Auxiliary Feature
Jianrong Wang, Jinyu Liu, Longxuan Zhao, Shanyu Wang, Ruiguo Yu, Li Liu
Journal-ref: ICASSP 2022
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[138] arXiv:2204.00902 (cross-list from cs.SD) [pdf, other]
Title: An objective test tool for pitch extractors' response attributes
Hideki Kawahara, Kohei Yatabe, Ken-Ichi Sakakibara, Tatsuya Kitamura, Hideki Banno, Masanori Morise
Comments: 5 pages, 9 figures, submitted to Interspeech 2022. arXiv admin note: text overlap with arXiv:2111.03629
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[139] arXiv:2204.00907 (cross-list from cs.SD) [pdf, other]
Title: StyleWaveGAN: Style-based synthesis of drum sounds with extensive controls using generative adversarial networks
Antoine Lavault, Axel Roebel, Matthieu Voiry
Comments: Accepted for publication in Sound and Music Computing 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[140] arXiv:2204.00911 (cross-list from cs.SD) [pdf, other]
Title: Measuring pitch extractors' response to frequency-modulated multi-component signals
Hideki Kawahara, Kohei Yatabe, Ken-Ichi Sakakibara, Tatsuya Kitamura, Hideki Banno, Masanori Morise
Comments: 11 pages, 9 figures, The following article has been submitted to/accepted by The Acoustical Society of America. After it is published, it will be found at this http URL
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[141] arXiv:2204.00977 (cross-list from cs.CL) [pdf, other]
Title: Deep Speech Based End-to-End Automated Speech Recognition (ASR) for Indian-English Accents
Priyank Dubey, Bilal Shah
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[142] arXiv:2204.00990 (cross-list from cs.SD) [pdf, other]
Title: Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis
Yixuan Zhou, Changhe Song, Xiang Li, Luwen Zhang, Zhiyong Wu, Yanyao Bian, Dan Su, Helen Meng
Comments: Accepted by Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[143] arXiv:2204.01009 (cross-list from cs.SD) [pdf, other]
Title: A Computational Analysis of Pitch Drift in Unaccompanied Solo Singing using DBSCAN Clustering
Sepideh Shafiei, S. Hakam
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[144] arXiv:2204.01115 (cross-list from cs.SD) [pdf, other]
Title: On incorporating social speaker characteristics in synthetic speech
Sai Sirisha Rallabandi, Sebastian Möller
Comments: Submitted to Interspeech 2022
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[145] arXiv:2204.01235 (cross-list from cs.CL) [pdf, other]
Title: An Analysis of Semantically-Aligned Speech-Text Embeddings
Muhammad Huzaifah, Ivan Kukanov
Comments: This is the accepted version of the paper published at IEEE Spoken Language Technology (SLT) Workshop 2022
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[146] arXiv:2204.01265 (cross-list from cs.CV) [pdf, other]
Title: Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video
Minsu Kim, Joanna Hong, Se Jin Park, Yong Man Ro
Comments: Published at ICCV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[147] arXiv:2204.01294 (cross-list from cs.SD) [pdf, other]
Title: On The Model Size Selection For Speaker Identification
Marcos Faundez-Zanuy
Comments: 5 pages, published in Speaker odyssey 2001, The speaker recognition workshop. 189-194 Crete (Greece)
Journal-ref: 2001 A Speaker Odyssey - The Speaker Recognition Workshop June 18-22, 2001, Crete, Greece
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[148] arXiv:2204.01295 (cross-list from cs.SD) [pdf, other]
Title: Nonlinear Vectorial Prediction with Neural Nets
Marcos Faundez-Zanuy
Comments: 9 pages, published in Proceedings of the 6th International Work-Conference on Artificial and Natural Neural Networks: Bio-inspired Applications of Connectionism, Part II, June 2001, pages 754-761
Journal-ref: Lecture Notes in Computer Science LNCS 2085 Vol. II, pages 754-761. IWANN 2001, Granada (Spain) ISSN 0302-9743
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[149] arXiv:2204.01338 (cross-list from cs.SD) [pdf, other]
Title: An Initialization Scheme for Meeting Separation with Spatial Mixture Models
Christoph Boeddeker, Tobias Cord-Landwehr, Thilo von Neumann, Reinhold Haeb-Umbach
Comments: Submitted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[150] arXiv:2204.01360 (cross-list from cs.SD) [pdf, other]
Title: Learning the Proximity Operator in Unfolded ADMM for Phase Retrieval
Pierre-Hugo Vial, Paul Magron, Thomas Oberlin, Cédric Févotte
Comments: 10 pages, 5 figures, submitted to IEEE SPL
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[151] arXiv:2204.01397 (cross-list from cs.CL) [pdf, other]
Title: A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems
Marcely Zanon Boito, Laurent Besacier, Natalia Tomashenko, Yannick Estève
Comments: Accepted to INTERSPEECH 2022 (Special session Inclusive and Fair Speech Technologies)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[152] arXiv:2204.01564 (cross-list from cs.SD) [pdf, other]
Title: Introducing ECAPA-TDNN and Wav2Vec2.0 Embeddings to Stuttering Detection
Shakeel Ahmad Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni
Comments: Submitted to Interspeech 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[153] arXiv:2204.01670 (cross-list from cs.CL) [pdf, other]
Title: Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition
Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang
Comments: Submitted for review at Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[154] arXiv:2204.01672 (cross-list from cs.SD) [pdf, other]
Title: Residual-guided Personalized Speech Synthesis based on Face Image
Jianrong Wang, Zixuan Wang, Xiaosheng Hu, Xuewei Li, Qiang Fang, Li Liu
Comments: ICASSP 2022
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[155] arXiv:2204.01726 (cross-list from cs.CV) [pdf, other]
Title: Lip to Speech Synthesis with Visual Context Attentional GAN
Minsu Kim, Joanna Hong, Yong Man Ro
Comments: Published at NeurIPS 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[156] arXiv:2204.01787 (cross-list from cs.SD) [pdf, other]
Title: GWA: A Large High-Quality Acoustic Dataset for Audio Processing
Zhenyu Tang, Rohith Aralikatti, Anton Ratnarajah, Dinesh Manocha
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[157] arXiv:2204.01893 (cross-list from cs.CL) [pdf, other]
Title: Deliberation Model for On-Device Spoken Language Understanding
Duc Le, Akshat Shrivastava, Paden Tomasello, Suyoun Kim, Aleksandr Livshits, Ozlem Kalinli, Michael L. Seltzer
Comments: Accepted for publication at INTERSPEECH 2022
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[158] arXiv:2204.01905 (cross-list from cs.SD) [pdf, other]
Title: Learning to Adapt to Domain Shifts with Few-shot Samples in Anomalous Sound Detection
Bingqing Chen, Luca Bondi, Samarjit Das
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[159] arXiv:2204.01954 (cross-list from physics.comp-ph) [pdf, other]
Title: Application of a Spectral Method to Simulate Quasi-Three-Dimensional Underwater Acoustic Fields
Houwang Tu, Yongxian Wang, Wei Liu, Chunmei Yang, Jixing Qin, Shuqing Ma, Xiaodong Wang
Comments: 31 pages, 22 figures. arXiv admin note: text overlap with arXiv:2112.13602
Subjects: Computational Physics (physics.comp-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[160] arXiv:2204.01977 (cross-list from cs.SD) [pdf, other]
Title: Audio-visual multi-channel speech separation, dereverberation and recognition
Guinan Li, Jianwei Yu, Jiajun Deng, Xunying Liu, Helen Meng
Comments: Accepted by ICASSP 2022
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[161] arXiv:2204.02023 (cross-list from cs.SD) [pdf, other]
Title: A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition
Ye-Qian Du, Jie Zhang, Qiu-Shi Zhu, Li-Rong Dai, Ming-Hui Wu, Xin Fang, Zhou-Wang Yang
Comments: 5 pages, 3 figures
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[162] arXiv:2204.02040 (cross-list from cs.SD) [pdf, other]
Title: On the Relevance of Bandwidth Extension for Speaker Verification
Marcos Faundez-Zanuy, Mattias Nilsson, W. Bastiaan Kleijn
Comments: 4 pages published in 7th International Conference on Spoken Language Processing, September 16-20, 2002, Denver, Colorado, USA. arXiv admin note: text overlap with arXiv:2202.13865
Journal-ref: 7th International Conference on Spoken Language Processing (ICSLP2002), September 16-20, 2002
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[163] arXiv:2204.02088 (cross-list from cs.SD) [pdf, other]
Title: A Mixed supervised Learning Framework for Target Sound Detection
Dongchao Yang, Helin Wang, Yuexian Zou, Wenwu Wang
Comments: submitted to DCASE workshop
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[164] arXiv:2204.02090 (cross-list from cs.CV) [pdf, other]
Title: VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices
Venkatesh S. Kadandale, Juan F. Montesinos, Gloria Haro
Comments: Paper accepted to Interspeech 2022; Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[165] arXiv:2204.02101 (cross-list from cs.SD) [pdf, other]
Title: Non-Linear Speech coding with MLP, RBF and Elman based prediction
Marcos Faundez-Zanuy
Comments: 9 pages, published in Mira, J., Álvarez, J.R. (eds) Artificial Neural Nets Problem Solving Methods. IWANN 2003. Lecture Notes in Computer Science, vol 2687. Springer, Berlin, Heidelberg
Journal-ref: International Work-Conference on Artificial Neural Networks IWANN 2003, LNCS 2687 Menorca (Spain)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[166] arXiv:2204.02121 (cross-list from cs.SD) [pdf, other]
Title: MetaAudio: A Few-Shot Audio Classification Benchmark
Calum Heggan, Sam Budgett, Timothy Hospedales, Mehrdad Yaghoobi
Comments: 9 pages with 1 figure and 2 main results tables. V1 Preprint
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[167] arXiv:2204.02143 (cross-list from cs.SD) [pdf, other]
Title: RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection
Dongchao Yang, Helin Wang, Zhongjie Ye, Yuexian Zou, Wenwu Wang
Comments: Submitted to Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[168] arXiv:2204.02152 (cross-list from cs.SD) [pdf, other]
Title: UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022
Takaaki Saeki, Detai Xin, Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari
Comments: Accepted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[169] arXiv:2204.02172 (cross-list from cs.SD) [pdf, other]
Title: Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech
Hyungchan Yoon, Seyun Um, Changwhan Kim, Hong-Goo Kang
Comments: INTERSPEECH 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[170] arXiv:2204.02269 (cross-list from cs.SD) [pdf, other]
Title: Repeat after me: Self-supervised learning of acoustic-to-articulatory mapping by vocal imitation
Marc-Antoine Georges, Julien Diard, Laurent Girin, Jean-Luc Schwartz, Thomas Hueber
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[171] arXiv:2204.02279 (cross-list from cs.SD) [pdf, other]
Title: How Information on Acoustic Scenes and Sound Events Mutually Benefits Event Detection and Scene Classification Tasks
Keisuke Imoto, Yuka Komatsu, Shunsuke Tsubaki, Tatsuya Komatsu
Comments: Submitted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[172] arXiv:2204.02389 (cross-list from cs.CV) [pdf, other]
Title: ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
Ruohan Gao, Zilin Si, Yen-Yu Chang, Samuel Clarke, Jeannette Bohg, Li Fei-Fei, Wenzhen Yuan, Jiajun Wu
Comments: In CVPR 2022. Gao, Si, and Chang contributed equally to this work. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[173] arXiv:2204.02400 (cross-list from cs.SD) [pdf, other]
Title: What can predictive speech coders learn from speaker recognizers?
Marcos Faundez-Zanuy
Comments: 7 pages, published in ITRW on Non-Linear Speech Processing (NOLISP 03), May 20-23, 2003, Le Croisic, France, paper 001. arXiv admin note: text overlap with arXiv:2204.02101
Journal-ref: Non-Linear Speech Processing (NOLISP) 2003
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[174] arXiv:2204.02455 (cross-list from cs.SD) [pdf, other]
Title: Improving Voice Trigger Detection with Metric Learning
Prateeth Nayak, Takuya Higuchi, Anmol Gupta, Shivesh Ranjan, Stephen Shum, Siddharth Sigtia, Erik Marchi, Varun Lakshminarasimhan, Minsik Cho, Saurabh Adya, Chandra Dhir, Ahmed Tewfik
Comments: Accepted at InterSpeech 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[175] arXiv:2204.02470 (cross-list from cs.CL) [pdf, other]
Title: Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation
Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel Lopez-Francisco, Jonathan D. Amith, Shinji Watanabe
Comments: 5 pages, 2 figures, submitted to Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[176] arXiv:2204.02485 (cross-list from cs.CV) [pdf, other]
Title: Training-Free Robust Multimodal Learning via Sample-Wise Jacobian Regularization
Zhengqi Gao, Sucheng Ren, Zihui Xue, Siting Li, Hang Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[177] arXiv:2204.02492 (cross-list from cs.CL) [pdf, other]
Title: Towards End-to-end Unsupervised Speech Recognition
Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski
Comments: Preprint
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[178] arXiv:2204.02500 (cross-list from cs.CR) [pdf, other]
Title: User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition in Federated Learning
Tiantian Feng, Raghuveer Peri, Shrikanth Narayanan
Journal-ref: Proc. Interspeech 2022
Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[179] arXiv:2204.02524 (cross-list from cs.SD) [pdf, other]
Title: Simple and Effective Unsupervised Speech Synthesis
Alexander H. Liu, Cheng-I Jeff Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James Glass
Comments: preprint, equal contribution from first two authors
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[180] arXiv:2204.02530 (cross-list from cs.CL) [pdf, other]
Title: Prosodic Alignment for off-screen automatic dubbing
Yogesh Virkar, Marcello Federico, Robert Enyedi, Roberto Barra-Chicote
Comments: 5 pages, 2 figures, 3 tables, Submitted to Interspeech 2022
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[181] arXiv:2204.02609 (cross-list from cs.SD) [pdf, other]
Title: A New Nonlinear speaker parameterization algorithm for speaker identification
Mohamed Chetouani, Marcos Faundez-Zanuy, Bruno Gas, Jean-Luc Zarader
Comments: 5 pages, published in The Speaker and Language Recognition Workshop. ISCA Tutorial and Research Workshop. ISBN 84-7490-722-5, May 31 - June 3, 2004
Journal-ref: The speaker and Language recognition Workshop (Speaker Odyssey), Toledo (Spain), 2004
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[182] arXiv:2204.02743 (cross-list from cs.SD) [pdf, other]
Title: Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis
Shun Lei, Yixuan Zhou, Liyang Chen, Jiankun Hu, Zhiyong Wu, Shiyin Kang, Helen Meng
Comments: Accepted by INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[183] arXiv:2204.02804 (cross-list from cs.SD) [pdf, other]
Title: Federated Self-supervised Speech Representations: Are We There Yet?
Yan Gao, Javier Fernandez-Marques, Titouan Parcollet, Abhinav Mehrotra, Nicholas D. Lane
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[184] arXiv:2204.02810 (cross-list from cs.CV) [pdf, other]
Title: Expression-preserving face frontalization improves visually assisted speech processing
Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda
Comments: arXiv admin note: text overlap with arXiv:2202.00538
Journal-ref: International Journal of Computer Vision 131 (5), 1122-1140, 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[185] arXiv:2204.02814 (cross-list from cs.SD) [pdf, other]
Title: Aggression in Hindi and English Speech: Acoustic Correlates and Automatic Identification
Ritesh Kumar, Atul Kr. Ojha, Bornini Lahiri, Chingrimnng Lungleng
Comments: To appear in the Proceedings of Conference on Sanskrit and Indian Languages: Technology
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[186] arXiv:2204.02874 (cross-list from cs.CV) [pdf, other]
Title: ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin, Jie Lei, Mohit Bansal, Gedas Bertasius
Comments: ECCV 2022 Oral. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[187] arXiv:2204.02967 (cross-list from cs.CL) [pdf, other]
Title: Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation
Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee
Comments: Accepted to be published in the Proceedings of Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[188] arXiv:2204.03040 (cross-list from cs.SD) [pdf, other]
Title: SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis
Georgia Maniati, Alexandra Vioni, Nikolaos Ellinas, Karolos Nikitaras, Konstantinos Klapsas, June Sig Sung, Gunu Jho, Aimilios Chalamandaris, Pirros Tsiakoulis
Comments: Accepted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[189] arXiv:2204.03042 (cross-list from cs.SD) [pdf, other]
Title: FFC-SE: Fast Fourier Convolution for Speech Enhancement
Ivan Shchekotov, Pavel Andreev, Oleg Ivanov, Aibek Alanov, Dmitry Vetrov
Comments: Submitted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[190] arXiv:2204.03063 (cross-list from cs.MM) [pdf, other]
Title: Late multimodal fusion for image and audio music transcription
María Alfaro-Contreras (1), Jose J. Valero-Mas (1), José M. Iñesta (1), Jorge Calvo-Zaragoza (1) ((1) Instituto Universitario de Investigación Informática, University of Alicante, Alicante, Spain)
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[191] arXiv:2204.03083 (cross-list from cs.CV) [pdf, other]
Title: Audio-Visual Person-of-Interest DeepFake Detection
Davide Cozzolino, Alessandro Pianese, Matthias Nießner, Luisa Verdoliva
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[192] arXiv:2204.03178 (cross-list from cs.SD) [pdf, other]
Title: 3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition
Zhao You, Shulin Feng, Dan Su, Dong Yu
Comments: 5 pages, 1 figure. Submitted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[193] arXiv:2204.03240 (cross-list from cs.SD) [pdf, other]
Title: Speech Pre-training with Acoustic Piece
Shuo Ren, Shujie Liu, Yu Wu, Long Zhou, Furu Wei
Comments: 5 pages, 4 figures; submitted to Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[194] arXiv:2204.03249 (cross-list from cs.SD) [pdf, other]
Title: Expressive Singing Synthesis Using Local Style Token and Dual-path Pitch Encoder
Juheon Lee, Hyeong-Seok Choi, Kyogu Lee
Comments: 4 pages, Submitted to Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[195] arXiv:2204.03255 (cross-list from cs.SD) [pdf, other]
Title: Arabic Text-To-Speech (TTS) Data Preparation
Hala Al Masri, Muhy Eddin Za'ter
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[196] arXiv:2204.03307 (cross-list from cs.SD) [pdf, other]
Title: Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music
Xiaoxue Gao, Chitralekha Gupta, Haizhou Li
Comments: 5 pages, 1 figure, accepted by IEEE ICASSP 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[197] arXiv:2204.03315 (cross-list from cs.CL) [pdf, other]
Title: Three-Module Modeling For End-to-End Spoken Language Understanding Using Pre-trained DNN-HMM-Based Acoustic-Phonetic Model
Nick J.C. Wang, Lu Wang, Yandan Sun, Haimei Kang, Dejun Zhang
Comments: Published in INTERSPEECH 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[198] arXiv:2204.03398 (cross-list from cs.SD) [pdf, other]
Title: Linguistic-Acoustic Similarity Based Accent Shift for Accent Recognition
Qijie Shao, Jinghao Yan, Jian Kang, Pengcheng Guo, Xian Shi, Pengfei Hu, Lei Xie
Comments: Accepted by Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[199] arXiv:2204.03409 (cross-list from cs.CL) [pdf, other]
Title: MAESTRO: Matched Speech Text Representations through Modality Matching
Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro Moreno, Ankur Bapna, Heiga Zen
Comments: Accepted by Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[200] arXiv:2204.03421 (cross-list from cs.SD) [pdf, other]
Title: Self-supervised learning for robust voice cloning
Konstantinos Klapsas, Nikolaos Ellinas, Karolos Nikitaras, Georgios Vamvoukakis, Panos Kakoulidis, Konstantinos Markopoulos, Spyros Raptis, June Sig Sung, Gunu Jho, Aimilios Chalamandaris, Pirros Tsiakoulis
Comments: Accepted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[201] arXiv:2204.03561 (cross-list from cs.CV) [pdf, other]
Title: Emotional Speech Recognition with Pre-trained Deep Visual Models
Waleed Ragheb, Mehdi Mirzapour, Ali Delfardi, Hélène Jacquenet, Lawrence Carbon
Journal-ref: Deep Learning for NLP Workshop, Extraction et Gestion des Connaissances (EGC), 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[202] arXiv:2204.03594 (cross-list from cs.SD) [pdf, other]
Title: Heterogeneous Target Speech Separation
Efthymios Tzinis, Gordon Wichern, Aswin Subramanian, Paris Smaragdis, Jonathan Le Roux
Comments: Submitted to Interspeech 2022
Journal-ref: Interspeech 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[203] arXiv:2204.03740 (cross-list from cs.SD) [pdf, other]
Title: Successes and critical failures of neural networks in capturing human-like speech recognition
Federico Adolfi, Jeffrey S. Bowers, David Poeppel
Journal-ref: Neural Networks, 162, 199-211 (2023)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[204] arXiv:2204.03847 (cross-list from cs.SD) [pdf, other]
Title: Enhanced exemplar autoencoder with cycle consistency loss in any-to-one voice conversion
Weida Liang, Lantian Li, Wenqiang Du, Dong Wang
Comments: submitted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[205] arXiv:2204.03852 (cross-list from cs.SD) [pdf, other]
Title: Reliable Visualization for Deep Speaker Recognition
Pengqi Li, Lantian Li, Askar Hamdulla, Dong Wang
Comments: submitted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[206] arXiv:2204.03879 (cross-list from cs.CL) [pdf, other]
Title: A Study of Different Ways to Use The Conformer Model For Spoken Language Understanding
Nick J.C. Wang, Shaojun Wang, Jing Xiao
Comments: Submitted to INTERSPEECH 2022. (5 pages, 1 figure.)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[207] arXiv:2204.03888 (cross-list from cs.CL) [pdf, other]
Title: Transducer-based language embedding for spoken language identification
Peng Shen, Xugang Lu, Hisashi Kawai
Comments: This paper was accepted by Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[208] arXiv:2204.03889 (cross-list from cs.SD) [pdf, other]
Title: Adding Connectionist Temporal Summarization into Conformer to Improve Its Decoder Efficiency For Speech Recognition
Nick J.C. Wang, Zongfeng Quan, Shaojun Wang, Jing Xiao
Comments: Submitted to INTERSPEECH 2022 (5 pages, 2 figures)
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[209] arXiv:2204.03939 (cross-list from cs.CL) [pdf, other]
Title: GigaST: A 10,000-hour Pseudo Speech Translation Corpus
Rong Ye, Chengqi Zhao, Tom Ko, Chutong Meng, Tao Wang, Mingxuan Wang, Jun Cao
Comments: Accepted at Interspeech 2023. GigaST dataset is available at this https URL
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[210] arXiv:2204.03967 (cross-list from cs.SD) [pdf, other]
Title: The Sillwood Technologies System for the VoiceMOS Challenge 2022
Jiameng Gao
Comments: Submitted to Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[211] arXiv:2204.04013 (cross-list from cs.LG) [pdf, other]
Title: Mel-spectrogram features for acoustic vehicle detection and speed estimation
Nikola Bulatovic, Slobodan Djukanovic
Comments: Published in: 2022 26th International Conference on Information Technology (IT)
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[212] arXiv:2204.04166 (cross-list from cs.SD) [pdf, other]
Title: Self-supervised Speaker Diarization
Yehoshua Dissen, Felix Kreuk, Joseph Keshet
Comments: Submitted to Interspeech 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[213] arXiv:2204.04464 (cross-list from cs.SD) [pdf, other]
Title: Multichannel Speech Separation with Narrow-band Conformer
Changsheng Quan, Xiaofei Li
Comments: accepted by INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[214] arXiv:2204.04579 (cross-list from cs.SD) [pdf, other]
Title: Inferring Pitch from Coarse Spectral Features
Danni Ma, Neville Ryant, Mark Liberman
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[215] arXiv:2204.04645 (cross-list from cs.SD) [pdf, other]
Title: Self-Supervised Audio-and-Text Pre-training with Extremely Low-Resource Parallel Data
Yu Kang, Tianqiao Liu, Hang Li, Yang Hao, Wenbiao Ding
Comments: AAAI 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[216] arXiv:2204.04646 (cross-list from cs.SD) [pdf, other]
Title: Deep Embeddings for Robust User-Based Amateur Vocal Percussion Classification
Alejandro Delgado, Emir Demirel, Vinod Subramanian, Charalampos Saitis, Mark Sandler
Comments: Accepted at Sound and Music Computing (SMC) conference 2022
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[217] arXiv:2204.04651 (cross-list from cs.SD) [pdf, other]
Title: Deep Conditional Representation Learning for Drum Sample Retrieval by Vocalisation
Alejandro Delgado, Charalampos Saitis, Emmanouil Benetos, Mark Sandler
Comments: Submitted to Interspeech 2022 (under review)
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[218] arXiv:2204.04756 (cross-list from cs.SD) [pdf, other]
Title: Towards Evaluation of Autonomously Generated Musical Compositions: A Comprehensive Survey
Daniel Kvak
Subjects: Sound (cs.SD); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[219] arXiv:2204.04802 (cross-list from cs.SD) [pdf, other]
Title: On the pragmatism of using binary classifiers over data intensive neural network classifiers for detection of COVID-19 from voice
Ankit Shah, Hira Dhamyal, Yang Gao, Daniel Arancibia, Mario Arancibia, Bhiksha Raj, Rita Singh
Comments: Submitted to ICASSP 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[220] arXiv:2204.04855 (cross-list from cs.SD) [pdf, other]
Title: Fusion of Self-supervised Learned Models for MOS Prediction
Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Sheng Li, Raj Dabre, Raphael Rubino, Yi Zhao
Comments: MOS 2022 shared task system description paper
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[221] arXiv:2204.04965 (cross-list from cs.CL) [pdf, other]
Title: Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding
Sanjana Sankar (GIPSA-CRISSP), Denis Beautemps (GIPSA-CRISSP), Thomas Hueber (GIPSA-CRISSP)
Journal-ref: ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing, May 2022, Singapour, Singapore
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[222] arXiv:2204.05070 (cross-list from cs.SD) [pdf, other]
Title: Fine-grained Noise Control for Multispeaker Speech Synthesis
Karolos Nikitaras, Georgios Vamvoukakis, Nikolaos Ellinas, Konstantinos Klapsas, Konstantinos Markopoulos, Spyros Raptis, June Sig Sung, Gunu Jho, Aimilios Chalamandaris, Pirros Tsiakoulis
Comments: Accepted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[223] arXiv:2204.05076 (cross-list from cs.CL) [pdf, other]
Title: End-to-End Speech Translation for Code Switched Speech
Orion Weller, Matthias Sperber, Telmo Pires, Hendra Setiawan, Christian Gollan, Dominic Telaar, Matthias Paulik
Comments: Accepted to Findings of ACL 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[224] arXiv:2204.05082 (cross-list from cs.SD) [pdf, other]
Title: An approach to improving sound-based vehicle speed estimation
Nikola Bulatovic, Slobodan Djukanovic
Comments: Submitted to: 2022 Zooming Innovation in Consumer Technologies Conference (ZINC)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[225] arXiv:2204.05156 (cross-list from cs.SD) [pdf, other]
Title: How to Listen? Rethinking Visual Sound Localization
Ho-Hsiang Wu, Magdalena Fuentes, Prem Seetharaman, Juan Pablo Bello
Comments: Submitted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)