Audio and Speech Processing

Authors and titles for February 2022

Total of 232 entries; showing entries 201-232

[201] arXiv:2202.10137 (cross-list from cs.CL) [pdf, other]: Title: A new data augmentation method for intent classification enhancement and its application on spoken conversation datasets

Zvi Kons, Aharon Satt, Hong-Kwang Kuo, Samuel Thomas, Boaz Carmeli, Ron Hoory, Brian Kingsbury

Comments: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[202] arXiv:2202.10453 (cross-list from cs.CV) [pdf, other]: Title: Predicting emotion from music videos: exploring the relative contribution of visual and auditory information to affective responses

Phoebe Chua (1), Dimos Makris (2), Dorien Herremans (2), Gemma Roig (3), Kat Agres (4) ((1) Department of Information Systems and Analytics, National University of Singapore, (2) Singapore University of Technology and Design, (3) Goethe University Frankfurt, (4) Yong Siew Toh Conservatory of Music, National University of Singapore)

Comments: 16 pages with 9 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[203] arXiv:2202.10573 (cross-list from eess.IV) [pdf, other]: Title: Deep Iterative Phase Retrieval for Ptychography

Simon Welker, Tal Peer, Henry N. Chapman, Timo Gerkmann

Comments: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal-ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[204] arXiv:2202.10594 (cross-list from cs.SD) [pdf, other]: Title: Adversarial Attacks on Speech Recognition Systems for Mission-Critical Applications: A Survey

Ngoc Dung Huynh, Mohamed Reda Bouadjenek, Imran Razzak, Kevin Lee, Chetan Arora, Ali Hassani, Arkady Zaslavsky

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[205] arXiv:2202.10631 (cross-list from cs.HC) [pdf, other]: Title: Hidden bawls, whispers, and yelps: can text be made to sound more than just its words?

Caluã de Lacerda Pataca, Paula Dornhofer Paro Costa

Comments: 10 pages, 7 figures. This work has been submitted to the IEEE for possible publication

Journal-ref: IEEE Trans. Affect. Comput. 14 (2023) 6-16

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[206] arXiv:2202.10712 (cross-list from cs.SD) [pdf, other]: Title: nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-shot Multi-speaker Text-to-Speech

Botao Zhao, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[207] arXiv:2202.10729 (cross-list from cs.SD) [pdf, other]: Title: Improving Cross-lingual Speech Synthesis with Triplet Training Scheme

Jianhao Ye, Hongbin Zhou, Zhiba Su, Wendi He, Kaimeng Ren, Lin Li, Heng Lu

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[208] arXiv:2202.10910 (cross-list from cs.SD) [pdf, other]: Title: Sound Adversarial Audio-Visual Navigation

Yinfeng Yu, Wenbing Huang, Fuchun Sun, Changan Chen, Yikai Wang, Xiaohong Liu

Comments: This work aims to do an adversarial sound intervention for robust audio-visual navigation

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[209] arXiv:2202.10941 (cross-list from cs.LG) [pdf, other]: Title: Recognizing Concepts and Recognizing Musical Themes. A Quantum Semantic Analysis

Maria Luisa Dalla Chiara, Roberto Giuntini, Eleonora Negri, Giuseppe Sergioli

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Quantum Physics (quant-ph)
[210] arXiv:2202.10976 (cross-list from cs.SD) [pdf, other]: Title: DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning

Qiqi Wang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Comments: Published at ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[211] arXiv:2202.11134 (cross-list from cs.HC) [pdf, other]: Title: ProtoSound: A Personalized and Scalable Sound Recognition System for Deaf and Hard-of-Hearing Users

Dhruv Jain, Khoa Huynh Anh Nguyen, Steven Goodman, Rachel Grossman-Kahn, Hung Ngo, Aditya Kusupati, Ruofei Du, Alex Olwal, Leah Findlater, Jon E. Froehlich

Comments: Published at the ACM CHI Conference on Human Factors in Computing Systems (CHI) 2022

Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[212] arXiv:2202.11136 (cross-list from cs.SD) [pdf, other]: Title: FlowSense: Monitoring Airflow in Building Ventilation Systems Using Audio Sensing

Bhawana Chhaglani, Camellia Zakaria, Adam Lechowicz, Prashant Shenoy, Jeremy Gummeson

Comments: 26 pages, 12 figures, Will appear in March issue of the IMWUT 2022 journal

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[213] arXiv:2202.11424 (cross-list from cs.SD) [pdf, other]: Title: Towards Speaker Age Estimation with Label Distribution Learning

Shijing Si, Jianzong Wang, Junqing Peng, Jing Xiao

Comments: Accepted by the 47th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2022)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[214] arXiv:2202.11479 (cross-list from cs.SD) [pdf, other]: Title: Listen to Interpret: Post-hoc Interpretability for Audio Networks with NMF

Jayneel Parekh, Sanjeel Parekh, Pavlo Mozharovskyi, Florence d'Alché-Buc, Gaël Richard

Comments: Accepted at NeurIPS 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[215] arXiv:2202.11823 (cross-list from cs.SD) [pdf, other]: Title: Differentially Private Speaker Anonymization

Ali Shahin Shamsabadi, Brij Mohan Lal Srivastava, Aurélien Bellet, Nathalie Vauquier, Emmanuel Vincent, Mohamed Maouche, Marc Tommasi, Nicolas Papernot

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[216] arXiv:2202.11918 (cross-list from cs.SD) [pdf, other]: Title: Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement

Doyeon Kim, Hyewon Han, Hyeon-Kyeong Shin, Soo-Whan Chung, Hong-Goo Kang

Comments: Accepted by ICASSP 2022

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[217] arXiv:2202.11929 (cross-list from cs.CL) [pdf, other]: Title: Word Segmentation on Discovered Phone Units with Dynamic Programming and Self-Supervised Scoring

Herman Kamper

Comments: 11 pages, 5 figures, 5 tables

Journal-ref: IEEE/ACM Transactions on Audio, Speech and Language Processing 31 (2023) 684-694

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[218] arXiv:2202.12187 (cross-list from cs.NE) [pdf, other]: Title: SonOpt: Sonifying Bi-objective Population-Based Optimization Algorithms

Tasos Asonitis, Richard Allmendinger, Matt Benatan, Ricardo Climent

Subjects: Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[219] arXiv:2202.12243 (cross-list from cs.SD) [pdf, other]: Title: Flat Latent Manifolds for Human-machine Co-creation of Music

Nutan Chen, Djalel Benbouzid, Francesco Ferroni, Mathis Nitschke, Luciano Pinna, Patrick van der Smagt

Comments: 3rd Conference on AI Music Creativity (AIMC 2022)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[220] arXiv:2202.12257 (cross-list from cs.SD) [pdf, other]: Title: A Perceptual Measure for Evaluating the Resynthesis of Automatic Music Transcriptions

Federico Simonetta, Federico Avanzini, Stavros Ntalampiras

Comments: Accepted by Multimedia Tools and Applications (2022); supplementary materials are in the latex sources

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[221] arXiv:2202.12307 (cross-list from cs.LG) [pdf, other]: Title: Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph

Dacheng Yin, Xuanchi Ren, Chong Luo, Yuwang Wang, Zhiwei Xiong, Wenjun Zeng

Comments: Accepted to ICLR 2022

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[222] arXiv:2202.12576 (cross-list from cs.CL) [pdf, other]: Title: A Survey of Multilingual Models for Automatic Speech Recognition

Hemant Yadav, Sunayana Sitaram

Comments: 9 pages. Submitted to LREC 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[223] arXiv:2202.12719 (cross-list from cs.SD) [pdf, other]: Title: Ask2Mask: Guided Data Selection for Masked Speech Modeling

Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang, Pedro Moreno

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[224] arXiv:2202.12917 (cross-list from cs.CL) [pdf, other]: Title: Learning English with Peppa Pig

Mitja Nikolaus, Afra Alishahi, Grzegorz Chrupała

Comments: Accepted to TACL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[225] arXiv:2202.13084 (cross-list from cs.CV) [pdf, other]: Title: Visual Speech Recognition for Multiple Languages in the Wild

Pingchuan Ma, Stavros Petridis, Maja Pantic

Comments: Published in Nature Machine Intelligence

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[226] arXiv:2202.13097 (cross-list from cs.SD) [pdf, other]: Title: Language-Independent Speaker Anonymization Approach using Self-Supervised Pre-Trained Models

Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[227] arXiv:2202.13155 (cross-list from cs.CL) [pdf, other]: Title: Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models

Samuel Thomas, Brian Kingsbury, George Saon, Hong-Kwang J. Kuo

Comments: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[228] arXiv:2202.13226 (cross-list from cs.SD) [pdf, other]: Title: An acoustic signal cavitation detection framework based on XGBoost with adaptive selection feature engineering

Yu Sha, Johannes Faber, Shuiping Gou, Bo Liu, Wei Li, Stefan Schramm, Horst Stoecker, Thomas Steckenreiter, Domagoj Vnucec, Nadine Wetzstein, Andreas Widl, Kai Zhou

Journal-ref: Measurement 192 (2022), 110897

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[229] arXiv:2202.13245 (cross-list from cs.SD) [pdf, other]: Title: Regional-Local Adversarially Learned One-Class Classifier Anomalous Sound Detection in Global Long-Term Space

Yu Sha, Johannes Faber, Shuiping Gou, Bo Liu, Wei Li, Stefan Schramm, Horst Stoecker, Thomas Steckenreiter, Domagoj Vnucec, Nadine Wetzstein, Andreas Widl, Kai Zhou

Journal-ref: KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[230] arXiv:2202.13255 (cross-list from cs.SD) [pdf, other]: Title: Hierarchical Linear Dynamical System for Representing Notes from Recorded Audio

Leila Kalantari, Jose Principe, Kathryn E. Sieving

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[231] arXiv:2202.13673 (cross-list from cs.MM) [pdf, other]: Title: Recent Advances and Challenges in Deep Audio-Visual Correlation Learning

Luís Vilaça, Yi Yu, Paula Viana

Comments: 8 pages, 1 figure

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[232] arXiv:2202.13865 (cross-list from cs.SD) [pdf, other]: Title: On the relevance of bandwidth extension for speaker identification

Marcos Faundez-Zanuy, Mattias Nilsson, W. Bastiaan Kleijn

Comments: 4 pages

Journal-ref: 2002 11th European Signal Processing Conference, 2002, pp. 1-4

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
