Sound

Authors and titles for April 2022

Total of 291 entries; showing entries 151-200
[151] arXiv:2204.00768 (cross-list from eess.AS) [pdf, html, other]
Title: VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature
Chenpeng Du, Yiwei Guo, Xie Chen, Kai Yu
Comments: Accepted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[152] arXiv:2204.00771 (cross-list from eess.AS) [pdf, other]
Title: Fast Real-time Personalized Speech Enhancement: End-to-End Enhancement Network (E3Net) and Knowledge Distillation
Manthan Thakker, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang
Comments: Submitted to Interspeech conference 2022 this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[153] arXiv:2204.00803 (cross-list from cs.CL) [pdf, other]
Title: End-to-end model for named entity recognition from speech without paired training data
Salima Mdhaffar, Jarod Duret, Titouan Parcollet, Yannick Estève
Comments: Submitted to INTERSPEECH 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[154] arXiv:2204.00890 (cross-list from eess.AS) [pdf, other]
Title: From Simulated Mixtures to Simulated Conversations as Training Data for End-to-End Neural Diarization
Federico Landini, Alicia Lozano-Diez, Mireia Diez, Lukáš Burget
Comments: Accepted at Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[155] arXiv:2204.00967 (cross-list from eess.AS) [pdf, other]
Title: Automatic Dialect Density Estimation for African American English
Alexander Johnson, Kevin Everson, Vijay Ravi, Anissa Gladney, Mari Ostendorf, Abeer Alwan
Comments: 5 pages, 2 figures
Journal-ref: Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[156] arXiv:2204.00977 (cross-list from cs.CL) [pdf, other]
Title: Deep Speech Based End-to-End Automated Speech Recognition (ASR) for Indian-English Accents
Priyank Dubey, Bilal Shah
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[157] arXiv:2204.01265 (cross-list from cs.CV) [pdf, other]
Title: Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video
Minsu Kim, Joanna Hong, Se Jin Park, Yong Man Ro
Comments: Published at ICCV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[158] arXiv:2204.01271 (cross-list from eess.AS) [pdf, other]
Title: Into-TTS : Intonation Template Based Prosody Control System
Jihwan Lee, Joun Yeop Lee, Heejin Choi, Seongkyu Mun, Sangjun Park, Jae-Sung Bae, Chanwoo Kim
Comments: Submitted to ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[159] arXiv:2204.01300 (cross-list from eess.AS) [pdf, other]
Title: tPLCnet: Real-time Deep Packet Loss Concealment in the Time Domain Using a Short Temporal Context
Nils L. Westhausen, Bernd T. Meyer
Comments: Submitted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[160] arXiv:2204.01345 (cross-list from eess.AS) [pdf, other]
Title: MOSRA: Joint Mean Opinion Score and Room Acoustics Speech Quality Assessment
Karl El Hajal, Milos Cernak, Pablo Mainar
Comments: Submitted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[161] arXiv:2204.01355 (cross-list from eess.AS) [pdf, other]
Title: Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches
Zifeng Zhao, Dongchao Yang, Rongzhi Gu, Haoran Zhang, Yuexian Zou
Comments: 5 pages, 1 table, 5 figures. Submitted to INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[162] arXiv:2204.01387 (cross-list from eess.AS) [pdf, other]
Title: Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck
Youngsik Eom, Yeonghyeon Lee, Ji Sub Um, Hoirin Kim
Comments: Accepted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD)
[163] arXiv:2204.01397 (cross-list from cs.CL) [pdf, other]
Title: A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems
Marcely Zanon Boito, Laurent Besacier, Natalia Tomashenko, Yannick Estève
Comments: Accepted to INTERSPEECH 2022 (Special session Inclusive and Fair Speech Technologies)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[164] arXiv:2204.01670 (cross-list from cs.CL) [pdf, other]
Title: Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition
Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang
Comments: Submitted for review at Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[165] arXiv:2204.01735 (cross-list from eess.AS) [pdf, other]
Title: Robust Stuttering Detection via Multi-task and Adversarial Learning
Shakeel Ahmad Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni
Comments: Under Review in European Signal Processing Conference 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[166] arXiv:2204.01851 (cross-list from eess.AS) [pdf, other]
Title: Dual Quaternion Ambisonics Array for Six-Degree-of-Freedom Acoustic Representation
Eleonora Grassucci, Gioia Mancini, Christian Brignone, Aurelio Uncini, Danilo Comminiello
Comments: Paper accepted for publication in Elsevier Pattern Recognition Letters
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[167] arXiv:2204.01954 (cross-list from physics.comp-ph) [pdf, other]
Title: Application of a Spectral Method to Simulate Quasi-Three-Dimensional Underwater Acoustic Fields
Houwang Tu, Yongxian Wang, Wei Liu, Chunmei Yang, Jixing Qin, Shuqing Ma, Xiaodong Wang
Comments: 31 pages, 22 figures. arXiv admin note: text overlap with arXiv:2112.13602
Subjects: Computational Physics (physics.comp-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[168] arXiv:2204.02090 (cross-list from cs.CV) [pdf, other]
Title: VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices
Venkatesh S. Kadandale, Juan F. Montesinos, Gloria Haro
Comments: Paper accepted to Interspeech 2022; Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[169] arXiv:2204.02263 (cross-list from eess.AS) [pdf, other]
Title: Multilingual and Multimodal Abuse Detection
Rini Sharon, Heet Shah, Debdoot Mukherjee, Vikram Gupta
Comments: Submitted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[170] arXiv:2204.02385 (cross-list from eess.AS) [pdf, other]
Title: Learning Speech Emotion Representations in the Quaternion Domain
Eric Guizzo, Tillman Weyde, Simone Scardapane, Danilo Comminiello
Comments: Accepted for Publication in IEEE/ACM Transactions on Audio, Speech and Language Processing
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[171] arXiv:2204.02389 (cross-list from cs.CV) [pdf, other]
Title: ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
Ruohan Gao, Zilin Si, Yen-Yu Chang, Samuel Clarke, Jeannette Bohg, Li Fei-Fei, Wenzhen Yuan, Jiajun Wu
Comments: In CVPR 2022. Gao, Si, and Chang contributed equally to this work. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[172] arXiv:2204.02470 (cross-list from cs.CL) [pdf, other]
Title: Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation
Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel Lopez-Francisco, Jonathan D. Amith, Shinji Watanabe
Comments: 5 pages, 2 figures, submitted to Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[173] arXiv:2204.02485 (cross-list from cs.CV) [pdf, other]
Title: Training-Free Robust Multimodal Learning via Sample-Wise Jacobian Regularization
Zhengqi Gao, Sucheng Ren, Zihui Xue, Siting Li, Hang Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[174] arXiv:2204.02492 (cross-list from cs.CL) [pdf, other]
Title: Towards End-to-end Unsupervised Speech Recognition
Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski
Comments: Preprint
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[175] arXiv:2204.02500 (cross-list from cs.CR) [pdf, other]
Title: User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition in Federated Learning
Tiantian Feng, Raghuveer Peri, Shrikanth Narayanan
Journal-ref: Proc. Interspeech 2022
Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[176] arXiv:2204.02530 (cross-list from cs.CL) [pdf, other]
Title: Prosodic Alignment for off-screen automatic dubbing
Yogesh Virkar, Marcello Federico, Robert Enyedi, Roberto Barra-Chicote
Comments: 5 pages, 2 figures, 3 tables, Submitted to Interspeech 2022
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[177] arXiv:2204.02637 (cross-list from eess.AS) [pdf, other]
Title: Global HRTF Interpolation via Learned Affine Transformation of Hyper-conditioned Features
Jin Woo Lee, Sungho Lee, Kyogu Lee
Comments: Submitted to ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[178] arXiv:2204.02694 (cross-list from eess.AS) [pdf, other]
Title: Customizable End-to-end Optimization of Online Neural Network-supported Dereverberation for Hearing Devices
Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann
Comments: ©2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Journal-ref: ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[179] arXiv:2204.02741 (cross-list from eess.AS) [pdf, other]
Title: Neural Network-augmented Kalman Filtering for Robust Online Speech Dereverberation in Noisy Reverberant Environments
Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann
Comments: accepted to INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[180] arXiv:2204.02810 (cross-list from cs.CV) [pdf, other]
Title: Expression-preserving face frontalization improves visually assisted speech processing
Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda
Comments: arXiv admin note: text overlap with arXiv:2202.00538
Journal-ref: International Journal of Computer Vision 131 (5), 1122-1140, 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[181] arXiv:2204.02841 (cross-list from eess.AS) [pdf, other]
Title: Spectral Denoising for Microphone Classification
L. Cuccovillo, A. Giganti, P. Bestagini, P. Aichroth, S. Tubaro
Journal-ref: in ACM International Workshop on Multimedia AI against Disinformation (MAD), Newark, NJ, USA, 2022, pp. 10-17
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[182] arXiv:2204.02874 (cross-list from cs.CV) [pdf, other]
Title: ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin, Jie Lei, Mohit Bansal, Gedas Bertasius
Comments: ECCV 2022 Oral project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[183] arXiv:2204.02967 (cross-list from cs.CL) [pdf, other]
Title: Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation
Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee
Comments: Accepted to be published in the Proceedings of Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[184] arXiv:2204.02978 (cross-list from eess.AS) [pdf, other]
Title: A neural network-supported two-stage algorithm for lightweight dereverberation on hearing devices
Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann
Comments: Accepted for publication in EURASIP Journal on Audio, Speech and Music Processing
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[185] arXiv:2204.03063 (cross-list from cs.MM) [pdf, other]
Title: Late multimodal fusion for image and audio music transcription
María Alfaro-Contreras (1), Jose J. Valero-Mas (1), José M. Iñesta (1), Jorge Calvo-Zaragoza (1) ((1) Instituto Universitario de Investigación Informática, University of Alicante, Alicante, Spain)
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[186] arXiv:2204.03083 (cross-list from cs.CV) [pdf, other]
Title: Audio-Visual Person-of-Interest DeepFake Detection
Davide Cozzolino, Alessandro Pianese, Matthias Nießner, Luisa Verdoliva
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[187] arXiv:2204.03166 (cross-list from eess.AS) [pdf, other]
Title: Musical Information Extraction from the Singing Voice
Preeti Rao
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[188] arXiv:2204.03219 (cross-list from eess.AS) [pdf, other]
Title: DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores
Wei-Cheng Tseng, Wei-Tsung Kao, Hung-yi Lee
Comments: Accepted to Interspeech 2022. Code will be available in the future
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[189] arXiv:2204.03305 (cross-list from eess.AS) [pdf, other]
Title: MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids
Ryandhimas E. Zezario, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao
Comments: Accepted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[190] arXiv:2204.03310 (cross-list from eess.AS) [pdf, other]
Title: MTI-Net: A Multi-Target Speech Intelligibility Prediction Model
Ryandhimas E. Zezario, Szu-wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao
Comments: Accepted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[191] arXiv:2204.03315 (cross-list from cs.CL) [pdf, other]
Title: Three-Module Modeling For End-to-End Spoken Language Understanding Using Pre-trained DNN-HMM-Based Acoustic-Phonetic Model
Nick J.C. Wang, Lu Wang, Yandan Sun, Haimei Kang, Dejun Zhang
Comments: Published in INTERSPEECH 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[192] arXiv:2204.03409 (cross-list from cs.CL) [pdf, other]
Title: MAESTRO: Matched Speech Text Representations through Modality Matching
Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro Moreno, Ankur Bapna, Heiga Zen
Comments: Accepted by Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[193] arXiv:2204.03417 (cross-list from eess.AS) [pdf, other]
Title: Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0
Sebastian P. Bayerl, Dominik Wagner, Elmar Nöth, Korbinian Riedhammer
Comments: Accepted at Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[194] arXiv:2204.03428 (cross-list from eess.AS) [pdf, other]
Title: Detecting Vocal Fatigue with Neural Embeddings
Sebastian P. Bayerl, Dominik Wagner, Ilja Baumann, Korbinian Riedhammer, Tobias Bocklet
Comments: Accepted for Publication in the Journal of Voice
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[195] arXiv:2204.03561 (cross-list from cs.CV) [pdf, other]
Title: Emotional Speech Recognition with Pre-trained Deep Visual Models
Waleed Ragheb, Mehdi Mirzapour, Ali Delfardi, Hélène Jacquenet, Lawrence Carbon
Journal-ref: Deep Learning for NLP Workshop, Extraction et Gestion des Connaissances (EGC), 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[196] arXiv:2204.03793 (cross-list from eess.AS) [pdf, other]
Title: Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition
Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw
Comments: Accepted by INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[197] arXiv:2204.03848 (cross-list from eess.AS) [pdf, other]
Title: AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification
Sonal Joshi, Saurabh Kataria, Jesus Villalba, Najim Dehak
Comments: Submitted to InterSpeech 2022
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[198] arXiv:2204.03851 (cross-list from eess.AS) [pdf, other]
Title: Defense against Adversarial Attacks on Hybrid Speech Recognition using Joint Adversarial Fine-tuning with Denoiser
Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesus Villalba, Sanjeev Khudanpur, Najim Dehak
Comments: Submitted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[199] arXiv:2204.03879 (cross-list from cs.CL) [pdf, other]
Title: A Study of Different Ways to Use The Conformer Model For Spoken Language Understanding
Nick J.C. Wang, Shaojun Wang, Jing Xiao
Comments: Submitted to INTERSPEECH 2022. (5 pages, 1 figure.)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[200] arXiv:2204.03888 (cross-list from cs.CL) [pdf, other]
Title: Transducer-based language embedding for spoken language identification
Peng Shen, Xugang Lu, Hisashi Kawai
Comments: This paper was accepted by Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)