Audio and Speech Processing

Authors and titles for June 2022

Total of 268 entries; showing entries 126-150
[126] arXiv:2206.02187 (cross-list from cs.CV) [pdf, other]
Title: M2FNet: Multi-modal Fusion Network for Emotion Recognition in Conversation
Vishal Chudasama, Purbayan Kar, Ashish Gudmalwar, Nirmesh Shah, Pankaj Wasnik, Naoyuki Onoe
Comments: Accepted for publication in the 5th Multimodal Learning and Applications (MULA) Workshop at CVPR 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[127] arXiv:2206.02211 (cross-list from cs.SD) [pdf, other]
Title: Variable-rate hierarchical CPC leads to acoustic unit discovery in speech
Santiago Cuervo, Adrian Łańcucki, Ricard Marxer, Paweł Rychlikowski, Jan Chorowski
Comments: Accepted to 36th Conference on Neural Information Processing Systems (NeurIPS 2022)
Journal-ref: Advances in Neural Information Processing Systems, 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[128] arXiv:2206.02246 (cross-list from cs.SD) [pdf, other]
Title: Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models
Alon Levkovitch, Eliya Nachmani, Lior Wolf
Comments: Accepted to Interspeech 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[129] arXiv:2206.02284 (cross-list from cs.SD) [pdf, other]
Title: Tagged-MRI Sequence to Audio Synthesis via Self Residual Attention Guided Heterogeneous Translator
Xiaofeng Liu, Fangxu Xing, Jerry L. Prince, Jiachen Zhuo, Maureen Stone, Georges El Fakhri, Jonghye Woo
Comments: MICCAI 2022 (early accept, Oral Presentation ~3%)
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[130] arXiv:2206.02671 (cross-list from cs.SD) [pdf, other]
Title: Canonical Cortical Graph Neural Networks and its Application for Speech Enhancement in Audio-Visual Hearing Aids
Leandro A. Passos, João Paulo Papa, Amir Hussain, Ahsan Adeel
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[131] arXiv:2206.03065 (cross-list from cs.SD) [pdf, other]
Title: Universal Speech Enhancement with Score-based Diffusion
Joan Serrà, Santiago Pascual, Jordi Pons, R. Oguz Araz, Davide Scaini
Comments: 24 pages, 6 figures; includes appendix; examples in this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[132] arXiv:2206.03112 (cross-list from cs.LG) [pdf, other]
Title: Singapore Soundscape Site Selection Survey (S5): Identification of Characteristic Soundscapes of Singapore via Weighted k-means Clustering
Kenneth Ooi, Bhan Lam, Joo Young Hong, Karn N. Watcharasupat, Zhen-Ting Ong, Woon-Seng Gan
Comments: 23 pages, 8 figures. Submitted to Sustainability
Journal-ref: MDPI Sustainability. 2022; 14(12):7485
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[133] arXiv:2206.03173 (cross-list from cs.CL) [pdf, other]
Title: Speaker-Guided Encoder-Decoder Framework for Emotion Recognition in Conversation
Yinan Bao, Qianwen Ma, Lingwei Wei, Wei Zhou, Songlin Hu
Comments: Accepted by IJCAI-ECAI 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[134] arXiv:2206.03318 (cross-list from cs.CL) [pdf, other]
Title: LegoNN: Building Modular Encoder-Decoder Models
Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe, Florian Metze, Luke Zettlemoyer, Abdelrahman Mohamed
Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[135] arXiv:2206.03351 (cross-list from cs.SD) [pdf, other]
Title: AS2T: Arbitrary Source-To-Target Adversarial Attack on Speaker Recognition Systems
Guangke Chen, Zhe Zhao, Fu Song, Sen Chen, Lingling Fan, Yang Liu
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[136] arXiv:2206.03393 (cross-list from cs.SD) [pdf, other]
Title: Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition
Guangke Chen, Zhe Zhao, Fu Song, Sen Chen, Lingling Fan, Feng Wang, Jiashui Wang
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[137] arXiv:2206.04006 (cross-list from cs.SD) [pdf, other]
Title: Few-Shot Audio-Visual Learning of Environment Acoustics
Sagnik Majumder, Changan Chen, Ziad Al-Halah, Kristen Grauman
Comments: Accepted to NeurIPS 2022
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[138] arXiv:2206.04523 (cross-list from cs.CL) [pdf, other]
Title: Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos
Alexander Waibel, Moritz Behr, Fevziye Irem Eyiokur, Dogucan Yaman, Tuan-Nam Nguyen, Carlos Mullov, Mehmet Arif Demirtas, Alperen Kantarcı, Stefan Constantin, Hazım Kemal Ekenel
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[139] arXiv:2206.04571 (cross-list from cs.CL) [pdf, other]
Title: Revisiting End-to-End Speech-to-Text Translation From Scratch
Biao Zhang, Barry Haddow, Rico Sennrich
Comments: ICML
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[140] arXiv:2206.04658 (cross-list from cs.SD) [pdf, other]
Title: BigVGAN: A Universal Neural Vocoder with Large-Scale Training
Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon
Comments: To appear at ICLR 2023. Listen to audio samples from BigVGAN at: this https URL
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[141] arXiv:2206.04769 (cross-list from cs.SD) [pdf, other]
Title: CLAP: Learning Audio Concepts From Natural Language Supervision
Benjamin Elizalde, Soham Deshmukh, Mahmoud Al Ismail, Huaming Wang
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[142] arXiv:2206.04780 (cross-list from cs.SD) [pdf, other]
Title: Speak Like a Dog: Human to Non-human creature Voice Conversion
Kohei Suzuki, Shoki Sakamoto, Tadahiro Taniguchi, Hirokazu Kameoka
Comments: 5 pages, 4 figures
Journal-ref: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (pp. 1388-1393)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[143] arXiv:2206.04805 (cross-list from cs.SD) [pdf, other]
Title: Motif Mining and Unsupervised Representation Learning for BirdCLEF 2022
Anthony Miyaguchi, Jiangyue Yu, Bryan Cheungvivatpant, Dakota Dudley, Aniketh Swain
Comments: Submitted to CEUR-WS under LifeCLEF for the BirdCLEF 2022 challenge as a working note
Journal-ref: CEUR-WS Vol-3180 (2022) 2159-2167
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[144] arXiv:2206.04922 (cross-list from cs.CL) [pdf, other]
Title: A Novel Chinese Dialect TTS Frontend with Non-Autoregressive Neural Machine Translation
Junhui Zhang, Wudi Bao, Junjie Pan, Xiang Yin, Zejun Ma
Comments: 4 pages, 5 figures
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[145] arXiv:2206.04962 (cross-list from cs.SD) [pdf, other]
Title: Feature Learning and Ensemble Pre-Tasks Based Self-Supervised Speech Denoising and Dereverberation
Yi Li, ShuangLin Li, Yang Sun, Syed Mohsen Naqvi
Comments: arXiv admin note: text overlap with arXiv:2112.11142
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[146] arXiv:2206.04984 (cross-list from cs.SD) [pdf, other]
Title: Zero-Shot Audio Classification using Image Embeddings
Duygu Dogan, Huang Xie, Toni Heittola, Tuomas Virtanen
Comments: Accepted to the European Signal Processing Conference (EUSIPCO) 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[147] arXiv:2206.05018 (cross-list from cs.SD) [pdf, other]
Title: Going Beyond the Cookie Theft Picture Test: Detecting Cognitive Impairments using Acoustic Features
Franziska Braun, Andreas Erzigkeit, Hartmut Lehfeld, Thomas Hillemacher, Korbinian Riedhammer, Sebastian P. Bayerl
Comments: Accepted at the 25th International Conference on Text, Speech and Dialogue (TSD 2022)
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[148] arXiv:2206.05053 (cross-list from cs.HC) [pdf, other]
Title: Coswara: A website application enabling COVID-19 screening by analysing respiratory sound samples and health symptoms
Debarpan Bhattacharya, Debottam Dutta, Neeraj Kumar Sharma, Srikanth Raj Chetupalli, Pravin Mote, Sriram Ganapathy, Chandrakiran C, Sahiti Nori, Suhail K K, Sadhana Gonuguntla, Murali Alagesan
Journal-ref: Interspeech, 2022
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[149] arXiv:2206.05286 (cross-list from cs.SD) [pdf, other]
Title: AHD ConvNet for Speech Emotion Classification
Asfand Ali, Danial Nasir, Mohammad Hassan Jawad
Comments: Wrong authors quoted
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[150] arXiv:2206.05408 (cross-list from cs.SD) [pdf, other]
Title: Multi-instrument Music Synthesis with Spectrogram Diffusion
Curtis Hawthorne, Ian Simon, Adam Roberts, Neil Zeghidour, Josh Gardner, Ethan Manilow, Jesse Engel
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)