Audio and Speech Processing

Authors and titles for April 2022

Total of 320 entries; this page lists entries 101-150.
[101] arXiv:2204.11232 [pdf, other]
Title: Improving the Naturalness of Simulated Conversations for End-to-End Neural Diarization
Natsuo Yamashita, Shota Horiguchi, Takeshi Homma
Comments: Accepted to Speaker Odyssey 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[102] arXiv:2204.11286 [pdf, other]
Title: Improved far-field speech recognition using Joint Variational Autoencoder
Shashi Kumar, Shakti P. Rath, Abhishek Pandey
Comments: 5 pages, 2 figures, 3 tables
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[103] arXiv:2204.11501 [pdf, other]
Title: Graph Convolutional Network Based Semi-Supervised Learning on Multi-Speaker Meeting Data
Fuchuan Tong, Siqi Zheng, Min Zhang, Yafeng Chen, Hongbin Suo, Qingyang Hong, Lin Li
Comments: Accepted by ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[104] arXiv:2204.11933 [pdf, other]
Title: Cleanformer: A multichannel array configuration-invariant neural enhancement frontend for ASR in smart speakers
Joseph Caroselli, Arun Narayanan, Nathan Howard, Tom O'Malley
Comments: Accepted to ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[105] arXiv:2204.12076 [pdf, other]
Title: ATST: Audio Representation Learning with Teacher-Student Transformer
Xian Li, Xiaofei Li
Comments: Accepted to INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[106] arXiv:2204.12092 [pdf, other]
Title: Mask scalar prediction for improving robust automatic speech recognition
Arun Narayanan, James Walker, Sankaran Panchapagesan, Nathan Howard, Yuma Koizumi
Comments: Submitted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[107] arXiv:2204.12260 [pdf, other]
Title: Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation
Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
Comments: 22 pages, 8 figures. Under the review process
Journal-ref: HEAR: Holistic Evaluation of Audio Representations (NeurIPS 2021 Competition) PMLR 166 (2022) 1-24
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[108] arXiv:2204.12279 [pdf, other]
Title: Low-dimensional representation of infant and adult vocalization acoustics
Silvia Pagliarini, Sara Schneider, Christopher T. Kello, Anne S. Warlaumont
Comments: Under review at Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[109] arXiv:2204.12308 [pdf, other]
Title: Supervised Attention in Sequence-to-Sequence Models for Speech Recognition
Gene-Ping Yang, Hao Tang
Comments: Accepted at ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[110] arXiv:2204.12649 [pdf, other]
Title: Study on the Fairness of Speaker Verification Systems on Underrepresented Accents in English
Mariel Estevez, Luciana Ferrer
Comments: 5 pages, 2 figures, submitted to INTERSPEECH
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[111] arXiv:2204.12777 [pdf, other]
Title: Ultra Fast Speech Separation Model with Teacher Student Learning
Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu
Comments: Accepted by interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[112] arXiv:2204.13883 [pdf, other]
Title: Autonomous In-Situ Soundscape Augmentation via Joint Selection of Masker and Gain
Karn N. Watcharasupat, Kenneth Ooi, Bhan Lam, Trevor Wong, Zhen-Ting Ong, Woon-Seng Gan
Comments: Accepted to IEEE Signal Processing Letters. (c) 2022 IEEE
Journal-ref: IEEE Signal Processing Letters, Vol. 29, pp. 1749 - 1753, 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[113] arXiv:2204.13890 [pdf, other]
Title: Deployment of an IoT System for Adaptive In-Situ Soundscape Augmentation
Trevor Wong, Karn N. Watcharasupat, Bhan Lam, Kenneth Ooi, Zhen-Ting Ong, Furi Andi Karnapi, Woon-Seng Gan
Comments: To be presented at the 51st International Congress and Exposition on Noise Control Engineering
Journal-ref: INTER-NOISE and NOISE-CON Congress and Conference Proceedings, Feb. 2022, vol. 265, no. 5, pp. 2013-2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Systems and Control (eess.SY)
[114] arXiv:2204.00061 (cross-list from cs.SD) [pdf, other]
Title: Data-augmented cross-lingual synthesis in a teacher-student framework
Marcel de Korte, Jaebok Kim, Aki Kunikoshi, Adaeze Adigwe, Esther Klabbers
Comments: Submitted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[115] arXiv:2204.00088 (cross-list from cs.SD) [pdf, other]
Title: Speech and the n-Back task as a lens into depression. How combining both may allow us to isolate different core symptoms of depression
Salvatore Fara, Stefano Goria, Emilia Molimpakis, Nicholas Cummins
Comments: Submitted to Interspeech 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
[116] arXiv:2204.00094 (cross-list from cs.SD) [pdf, other]
Title: Perceptive, non-linear Speech Processing and Spiking Neural Networks
Jean Rouat, Ramin Pichevar, Stéphane Loiselle
Comments: preprint of the 2005 published paper: Perceptive, Non-linear Speech Processing and Spiking Neural Networks. In: Chollet, G., Esposito, A., Faundez-Zanuy, M., Marinaro, M. (eds) Nonlinear Speech Modeling and Applications. NN 2004. Lecture Notes in Computer Science, vol 3445. Springer, Berlin, Heidelberg
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[117] arXiv:2204.00164 (cross-list from cs.CL) [pdf, other]
Title: Filter-based Discriminative Autoencoders for Children Speech Recognition
Chiang-Lin Tai, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang
Comments: Published in EUSIPCO 2022
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[118] arXiv:2204.00174 (cross-list from cs.CL) [pdf, other]
Title: InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR
Yu Nakagome, Tatsuya Komatsu, Yusuke Fujita, Shuta Ichimura, Yusuke Kida
Comments: Submitted to INTERSPEECH 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[119] arXiv:2204.00175 (cross-list from cs.CL) [pdf, other]
Title: Alternate Intermediate Conditioning with Syllable-level and Character-level Targets for Japanese ASR
Yusuke Fujita, Tatsuya Komatsu, Yusuke Kida
Comments: SLT 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[120] arXiv:2204.00176 (cross-list from cs.CL) [pdf, other]
Title: Better Intermediates Improve CTC Inference
Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji Watanabe, Yusuke Kida
Comments: 5 pages, submitted to INTERSPEECH 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[121] arXiv:2204.00212 (cross-list from cs.CL) [pdf, other]
Title: Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems
Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Nobuyasu Itoh, George Saon
Comments: Accepted to Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[122] arXiv:2204.00291 (cross-list from cs.CL) [pdf, other]
Title: Text-To-Speech Data Augmentation for Low Resource Speech Recognition
Rodolfo Zevallos
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[123] arXiv:2204.00311 (cross-list from cs.SD) [pdf, other]
Title: Speaker verification in mismatch training and testing conditions
Marcos Faundez-Zanuy, Adam Slupinski
Comments: 4 pages, published in 6th International Conference on Spoken Language Processing (ICSLP 2000), Vol. II, pp. 322-325. ICSLP 2000, ISBN 7-80150-144-4/G.18, Beijing (China), October 16-20, 2000. arXiv admin note: substantial text overlap with arXiv:2203.00513
Journal-ref: 6th international conference on spoken language processing (ICSLP 2000), Vol. II, pp.322-325, 2000
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[124] arXiv:2204.00331 (cross-list from cs.SD) [pdf, other]
Title: Using segment-based features of jaw movements to recognize foraging activities in grazing cattle
José O. Chelotti, Sebastián R. Vanrell, Luciano S. Martinez-Rau, Julio R. Galli, Santiago A. Utsumi, Alejandra M. Planisich, Suyai A. Almirón, Diego H. Milone, Leonardo L. Giovanini, H. Leonardo Rufiner
Comments: Preprint submitted to journal
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125] arXiv:2204.00348 (cross-list from cs.CL) [pdf, other]
Title: WavFT: Acoustic model finetuning with labelled and unlabelled data
Utkarsh Chauhan, Vikas Joshi, Rupesh R. Mehta
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[126] arXiv:2204.00352 (cross-list from cs.LG) [pdf, other]
Title: On the Efficiency of Integrating Self-supervised Learning and Meta-learning for User-defined Few-shot Keyword Spotting
Wei-Tsung Kao, Yuan-Kuei Wu, Chia-Ping Chen, Zhi-Sheng Chen, Yu-Pao Tsai, Hung-Yi Lee
Comments: Accepted by SLT 2022
Subjects: Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[127] arXiv:2204.00540 (cross-list from cs.SD) [pdf, other]
Title: End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
Xuankai Chang, Takashi Maekaku, Yuya Fujita, Shinji Watanabe
Comments: Submitted to Interspeech 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[128] arXiv:2204.00558 (cross-list from cs.CL) [pdf, other]
Title: Multi-task RNN-T with Semantic Decoder for Streamable Spoken Language Understanding
Xuandi Fu, Feng-Ju Chang, Martin Radfar, Kai Wei, Jing Liu, Grant P. Strimel, Kanthashree Mysore Sathyendra
Comments: Accepted at ICASSP 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[129] arXiv:2204.00604 (cross-list from cs.CV) [pdf, other]
Title: Quantized GAN for Complex Music Generation from Dance Videos
Ye Zhu, Kyle Olszewski, Yu Wu, Panos Achlioptas, Menglei Chai, Yan Yan, Sergey Tulyakov
Comments: Dataset and code at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[130] arXiv:2204.00628 (cross-list from cs.SD) [pdf, other]
Title: Learning Neural Acoustic Fields
Andrew Luo, Yilun Du, Michael J. Tarr, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan
Comments: NeurIPS 2022. Project page: this https URL
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[131] arXiv:2204.00652 (cross-list from cs.SD) [pdf, other]
Title: End-to-end multi-talker audio-visual ASR using an active speaker attention module
Richard Rose, Olivier Siohan
Comments: 5 pages, 3 figures, 3 tables, 28 citations
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[132] arXiv:2204.00679 (cross-list from cs.CV) [pdf, other]
Title: Learning Audio-Video Modalities from Image Captions
Arsha Nagrani, Paul Hongsuck Seo, Bryan Seybold, Anja Hauth, Santiago Manen, Chen Sun, Cordelia Schmid
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[133] arXiv:2204.00770 (cross-list from cs.SD) [pdf, other]
Title: Speaker adaptation for Wav2vec2 based dysarthric ASR
Murali Karthick Baskar, Tim Herzig, Diana Nguyen, Mireia Diez, Tim Polzehl, Lukáš Burget, Jan "Honza" Černocký
Comments: Submitted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[134] arXiv:2204.00803 (cross-list from cs.CL) [pdf, other]
Title: End-to-end model for named entity recognition from speech without paired training data
Salima Mdhaffar, Jarod Duret, Titouan Parcollet, Yannick Estève
Comments: Submitted to INTERSPEECH 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[135] arXiv:2204.00819 (cross-list from cs.SD) [pdf, other]
Title: Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition
Guodong Ma, Pengfei Hu, Jian Kang, Shen Huang, Hao Huang
Comments: Accepted by INTERSPEECH 2021
Journal-ref: INTERSPEECH 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[136] arXiv:2204.00821 (cross-list from cs.SD) [pdf, other]
Title: Improving Target Sound Extraction with Timestamp Information
Helin Wang, Dongchao Yang, Chao Weng, Jianwei Yu, Yuexian Zou
Comments: Submitted to Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[137] arXiv:2204.00873 (cross-list from cs.SD) [pdf, other]
Title: Acoustic-to-articulatory Inversion based on Speech Decomposition and Auxiliary Feature
Jianrong Wang, Jinyu Liu, Longxuan Zhao, Shanyu Wang, Ruiguo Yu, Li Liu
Journal-ref: ICASSP 2022
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[138] arXiv:2204.00902 (cross-list from cs.SD) [pdf, other]
Title: An objective test tool for pitch extractors' response attributes
Hideki Kawahara, Kohei Yatabe, Ken-Ichi Sakakibara, Tatsuya Kitamura, Hideki Banno, Masanori Morise
Comments: 5 pages, 9 figures, submitted to Interspeech 2022. arXiv admin note: text overlap with arXiv:2111.03629
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[139] arXiv:2204.00907 (cross-list from cs.SD) [pdf, other]
Title: StyleWaveGAN: Style-based synthesis of drum sounds with extensive controls using generative adversarial networks
Antoine Lavault, Axel Roebel, Matthieu Voiry
Comments: Accepted for publication in Sound and Music Computing 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[140] arXiv:2204.00911 (cross-list from cs.SD) [pdf, other]
Title: Measuring pitch extractors' response to frequency-modulated multi-component signals
Hideki Kawahara, Kohei Yatabe, Ken-Ichi Sakakibara, Tatsuya Kitamura, Hideki Banno, Masanori Morise
Comments: 11 pages, 9 figures, The following article has been submitted to/accepted by The Acoustical Society of America. After it is published, it will be found at this http URL
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[141] arXiv:2204.00977 (cross-list from cs.CL) [pdf, other]
Title: Deep Speech Based End-to-End Automated Speech Recognition (ASR) for Indian-English Accents
Priyank Dubey, Bilal Shah
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[142] arXiv:2204.00990 (cross-list from cs.SD) [pdf, other]
Title: Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis
Yixuan Zhou, Changhe Song, Xiang Li, Luwen Zhang, Zhiyong Wu, Yanyao Bian, Dan Su, Helen Meng
Comments: Accepted by Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[143] arXiv:2204.01009 (cross-list from cs.SD) [pdf, other]
Title: A Computational Analysis of Pitch Drift in Unaccompanied Solo Singing using DBSCAN Clustering
Sepideh Shafiei, S. Hakam
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[144] arXiv:2204.01115 (cross-list from cs.SD) [pdf, other]
Title: On incorporating social speaker characteristics in synthetic speech
Sai Sirisha Rallabandi, Sebastian Möller
Comments: Submitted to Interspeech 2022
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[145] arXiv:2204.01235 (cross-list from cs.CL) [pdf, other]
Title: An Analysis of Semantically-Aligned Speech-Text Embeddings
Muhammad Huzaifah, Ivan Kukanov
Comments: This is the accepted version of the paper published at IEEE Spoken Language Technology (SLT) Workshop 2022
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[146] arXiv:2204.01265 (cross-list from cs.CV) [pdf, other]
Title: Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video
Minsu Kim, Joanna Hong, Se Jin Park, Yong Man Ro
Comments: Published at ICCV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[147] arXiv:2204.01294 (cross-list from cs.SD) [pdf, other]
Title: On The Model Size Selection For Speaker Identification
Marcos Faundez-Zanuy
Comments: 5 pages, published in Speaker Odyssey 2001, The Speaker Recognition Workshop, pp. 189-194, Crete (Greece)
Journal-ref: 2001 A Speaker Odyssey - The Speaker Recognition Workshop June 18-22, 2001, Crete, Greece
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[148] arXiv:2204.01295 (cross-list from cs.SD) [pdf, other]
Title: Nonlinear Vectorial Prediction with Neural Nets
Marcos Faundez-Zanuy
Comments: 9 pages, published in Proceedings of the 6th International Work Conference on Artificial and Natural Neural Networks: Bio-inspired Applications of Connectionism, Part II, June 2001, pages 754-761
Journal-ref: Lecture Notes in Computer Science LNCS 2085 Vol. II, pages 754-761. IWANN 2001, Granada (Spain) ISSN 0302-9743
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[149] arXiv:2204.01338 (cross-list from cs.SD) [pdf, other]
Title: An Initialization Scheme for Meeting Separation with Spatial Mixture Models
Christoph Boeddeker, Tobias Cord-Landwehr, Thilo von Neumann, Reinhold Haeb-Umbach
Comments: Submitted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[150] arXiv:2204.01360 (cross-list from cs.SD) [pdf, other]
Title: Learning the Proximity Operator in Unfolded ADMM for Phase Retrieval
Pierre-Hugo Vial, Paul Magron, Thomas Oberlin, Cédric Févotte
Comments: 10 pages, 5 figures, submitted to IEEE SPL
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)