Audio and Speech Processing

Authors and titles for February 2022

Total of 232 entries : 1-50 51-100 101-150 151-200 201-232
[201] arXiv:2202.10137 (cross-list from cs.CL) [pdf, other]
Title: A new data augmentation method for intent classification enhancement and its application on spoken conversation datasets
Zvi Kons, Aharon Satt, Hong-Kwang Kuo, Samuel Thomas, Boaz Carmeli, Ron Hoory, Brian Kingsbury
Comments: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[202] arXiv:2202.10453 (cross-list from cs.CV) [pdf, other]
Title: Predicting emotion from music videos: exploring the relative contribution of visual and auditory information to affective responses
Phoebe Chua (1), Dimos Makris (2), Dorien Herremans (2), Gemma Roig (3), Kat Agres (4) ((1) Department of Information Systems and Analytics, National University of Singapore, (2) Singapore University of Technology and Design, (3) Goethe University Frankfurt, (4) Yong Siew Toh Conservatory of Music, National University of Singapore)
Comments: 16 pages with 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[203] arXiv:2202.10573 (cross-list from eess.IV) [pdf, other]
Title: Deep Iterative Phase Retrieval for Ptychography
Simon Welker, Tal Peer, Henry N. Chapman, Timo Gerkmann
Comments: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Journal-ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[204] arXiv:2202.10594 (cross-list from cs.SD) [pdf, other]
Title: Adversarial Attacks on Speech Recognition Systems for Mission-Critical Applications: A Survey
Ngoc Dung Huynh, Mohamed Reda Bouadjenek, Imran Razzak, Kevin Lee, Chetan Arora, Ali Hassani, Arkady Zaslavsky
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[205] arXiv:2202.10631 (cross-list from cs.HC) [pdf, other]
Title: Hidden bawls, whispers, and yelps: can text be made to sound more than just its words?
Caluã de Lacerda Pataca, Paula Dornhofer Paro Costa
Comments: 10 pages, 7 figures. This work has been submitted to the IEEE for possible publication
Journal-ref: IEEE Trans. Affect. Comput. 14 (2023) 6-16
Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[206] arXiv:2202.10712 (cross-list from cs.SD) [pdf, other]
Title: nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-shot Multi-speaker Text-to-Speech
Botao Zhao, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[207] arXiv:2202.10729 (cross-list from cs.SD) [pdf, other]
Title: Improving Cross-lingual Speech Synthesis with Triplet Training Scheme
Jianhao Ye, Hongbin Zhou, Zhiba Su, Wendi He, Kaimeng Ren, Lin Li, Heng Lu
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[208] arXiv:2202.10910 (cross-list from cs.SD) [pdf, other]
Title: Sound Adversarial Audio-Visual Navigation
Yinfeng Yu, Wenbing Huang, Fuchun Sun, Changan Chen, Yikai Wang, Xiaohong Liu
Comments: This work aims to do an adversarial sound intervention for robust audio-visual navigation
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[209] arXiv:2202.10941 (cross-list from cs.LG) [pdf, other]
Title: Recognizing Concepts and Recognizing Musical Themes. A Quantum Semantic Analysis
Maria Luisa Dalla Chiara, Roberto Giuntini, Eleonora Negri, Giuseppe Sergioli
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Quantum Physics (quant-ph)
[210] arXiv:2202.10976 (cross-list from cs.SD) [pdf, other]
Title: DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning
Qiqi Wang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Comments: Published at ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[211] arXiv:2202.11134 (cross-list from cs.HC) [pdf, other]
Title: ProtoSound: A Personalized and Scalable Sound Recognition System for Deaf and Hard-of-Hearing Users
Dhruv Jain, Khoa Huynh Anh Nguyen, Steven Goodman, Rachel Grossman-Kahn, Hung Ngo, Aditya Kusupati, Ruofei Du, Alex Olwal, Leah Findlater, Jon E. Froehlich
Comments: Published at the ACM CHI Conference on Human Factors in Computing Systems (CHI) 2022
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[212] arXiv:2202.11136 (cross-list from cs.SD) [pdf, other]
Title: FlowSense: Monitoring Airflow in Building Ventilation Systems Using Audio Sensing
Bhawana Chhaglani, Camellia Zakaria, Adam Lechowicz, Prashant Shenoy, Jeremy Gummeson
Comments: 26 pages, 12 figures, Will appear in March issue of the IMWUT 2022 journal
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[213] arXiv:2202.11424 (cross-list from cs.SD) [pdf, other]
Title: Towards Speaker Age Estimation with Label Distribution Learning
Shijing Si, Jianzong Wang, Junqing Peng, Jing Xiao
Comments: Accepted by the 47th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2022)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[214] arXiv:2202.11479 (cross-list from cs.SD) [pdf, other]
Title: Listen to Interpret: Post-hoc Interpretability for Audio Networks with NMF
Jayneel Parekh, Sanjeel Parekh, Pavlo Mozharovskyi, Florence d'Alché-Buc, Gaël Richard
Comments: Accepted at NeurIPS 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[215] arXiv:2202.11823 (cross-list from cs.SD) [pdf, other]
Title: Differentially Private Speaker Anonymization
Ali Shahin Shamsabadi, Brij Mohan Lal Srivastava, Aurélien Bellet, Nathalie Vauquier, Emmanuel Vincent, Mohamed Maouche, Marc Tommasi, Nicolas Papernot
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[216] arXiv:2202.11918 (cross-list from cs.SD) [pdf, other]
Title: Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement
Doyeon Kim, Hyewon Han, Hyeon-Kyeong Shin, Soo-Whan Chung, Hong-Goo Kang
Comments: Accepted by ICASSP 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[217] arXiv:2202.11929 (cross-list from cs.CL) [pdf, other]
Title: Word Segmentation on Discovered Phone Units with Dynamic Programming and Self-Supervised Scoring
Herman Kamper
Comments: 11 pages, 5 figures, 5 tables
Journal-ref: IEEE/ACM Transactions on Audio, Speech and Language Processing 31 (2023) 684-694
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[218] arXiv:2202.12187 (cross-list from cs.NE) [pdf, other]
Title: SonOpt: Sonifying Bi-objective Population-Based Optimization Algorithms
Tasos Asonitis, Richard Allmendinger, Matt Benatan, Ricardo Climent
Subjects: Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[219] arXiv:2202.12243 (cross-list from cs.SD) [pdf, other]
Title: Flat Latent Manifolds for Human-machine Co-creation of Music
Nutan Chen, Djalel Benbouzid, Francesco Ferroni, Mathis Nitschke, Luciano Pinna, Patrick van der Smagt
Comments: 3rd Conference on AI Music Creativity (AIMC 2022)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[220] arXiv:2202.12257 (cross-list from cs.SD) [pdf, other]
Title: A Perceptual Measure for Evaluating the Resynthesis of Automatic Music Transcriptions
Federico Simonetta, Federico Avanzini, Stavros Ntalampiras
Comments: Accepted by Multimedia Tools and Applications (2022); supplementary materials are in the latex sources
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[221] arXiv:2202.12307 (cross-list from cs.LG) [pdf, other]
Title: Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph
Dacheng Yin, Xuanchi Ren, Chong Luo, Yuwang Wang, Zhiwei Xiong, Wenjun Zeng
Comments: Accepted to ICLR 2022. Project page at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[222] arXiv:2202.12576 (cross-list from cs.CL) [pdf, other]
Title: A Survey of Multilingual Models for Automatic Speech Recognition
Hemant Yadav, Sunayana Sitaram
Comments: 9 pages. Submitted to LREC 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[223] arXiv:2202.12719 (cross-list from cs.SD) [pdf, other]
Title: Ask2Mask: Guided Data Selection for Masked Speech Modeling
Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang, Pedro Moreno
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[224] arXiv:2202.12917 (cross-list from cs.CL) [pdf, other]
Title: Learning English with Peppa Pig
Mitja Nikolaus, Afra Alishahi, Grzegorz Chrupała
Comments: Accepted to TACL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[225] arXiv:2202.13084 (cross-list from cs.CV) [pdf, other]
Title: Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma, Stavros Petridis, Maja Pantic
Comments: Published in Nature Machine Intelligence
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[226] arXiv:2202.13097 (cross-list from cs.SD) [pdf, other]
Title: Language-Independent Speaker Anonymization Approach using Self-Supervised Pre-Trained Models
Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[227] arXiv:2202.13155 (cross-list from cs.CL) [pdf, other]
Title: Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models
Samuel Thomas, Brian Kingsbury, George Saon, Hong-Kwang J. Kuo
Comments: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[228] arXiv:2202.13226 (cross-list from cs.SD) [pdf, other]
Title: An acoustic signal cavitation detection framework based on XGBoost with adaptive selection feature engineering
Yu Sha, Johannes Faber, Shuiping Gou, Bo Liu, Wei Li, Stefan Schramm, Horst Stoecker, Thomas Steckenreiter, Domagoj Vnucec, Nadine Wetzstein, Andreas Widl, Kai Zhou
Journal-ref: Measurement 192 (2022), 110897
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[229] arXiv:2202.13245 (cross-list from cs.SD) [pdf, other]
Title: Regional-Local Adversarially Learned One-Class Classifier Anomalous Sound Detection in Global Long-Term Space
Yu Sha, Johannes Faber, Shuiping Gou, Bo Liu, Wei Li, Stefan Schramm, Horst Stoecker, Thomas Steckenreiter, Domagoj Vnucec, Nadine Wetzstein, Andreas Widl, Kai Zhou
Journal-ref: KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[230] arXiv:2202.13255 (cross-list from cs.SD) [pdf, other]
Title: Hierarchical Linear Dynamical System for Representing Notes from Recorded Audio
Leila Kalantari, Jose Principe, Kathryn E. Sieving
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[231] arXiv:2202.13673 (cross-list from cs.MM) [pdf, other]
Title: Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Luís Vilaça, Yi Yu, Paula Viana
Comments: 8 pages, 1 figure
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[232] arXiv:2202.13865 (cross-list from cs.SD) [pdf, other]
Title: On the relevance of bandwidth extension for speaker identification
Marcos Faundez-Zanuy, Mattias Nilsson, W. Bastiaan Kleijn
Comments: 4 pages
Journal-ref: 2002 11th European Signal Processing Conference, 2002, pp. 1-4
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)