Sound

Authors and titles for April 2022

Total of 291 entries : 1-50 51-100 101-150 151-200 201-250 251-291

[151] arXiv:2204.00768 (cross-list from eess.AS) [pdf, html, other]: Title: VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature

Chenpeng Du, Yiwei Guo, Xie Chen, Kai Yu

Comments: Accepted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[152] arXiv:2204.00771 (cross-list from eess.AS) [pdf, other]: Title: Fast Real-time Personalized Speech Enhancement: End-to-End Enhancement Network (E3Net) and Knowledge Distillation

Manthan Thakker, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang

Comments: Submitted to Interspeech conference 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[153] arXiv:2204.00803 (cross-list from cs.CL) [pdf, other]: Title: End-to-end model for named entity recognition from speech without paired training data

Salima Mdhaffar, Jarod Duret, Titouan Parcollet, Yannick Estève

Comments: Submitted to INTERSPEECH 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[154] arXiv:2204.00890 (cross-list from eess.AS) [pdf, other]: Title: From Simulated Mixtures to Simulated Conversations as Training Data for End-to-End Neural Diarization

Federico Landini, Alicia Lozano-Diez, Mireia Diez, Lukáš Burget

Comments: Accepted at Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[155] arXiv:2204.00967 (cross-list from eess.AS) [pdf, other]: Title: Automatic Dialect Density Estimation for African American English

Alexander Johnson, Kevin Everson, Vijay Ravi, Anissa Gladney, Mari Ostendorf, Abeer Alwan

Comments: 5 pages, 2 figures

Journal-ref: Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[156] arXiv:2204.00977 (cross-list from cs.CL) [pdf, other]: Title: Deep Speech Based End-to-End Automated Speech Recognition (ASR) for Indian-English Accents

Priyank Dubey, Bilal Shah

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[157] arXiv:2204.01265 (cross-list from cs.CV) [pdf, other]: Title: Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video

Minsu Kim, Joanna Hong, Se Jin Park, Yong Man Ro

Comments: Published at ICCV 2021

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[158] arXiv:2204.01271 (cross-list from eess.AS) [pdf, other]: Title: Into-TTS : Intonation Template Based Prosody Control System

Jihwan Lee, Joun Yeop Lee, Heejin Choi, Seongkyu Mun, Sangjun Park, Jae-Sung Bae, Chanwoo Kim

Comments: Submitted to ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[159] arXiv:2204.01300 (cross-list from eess.AS) [pdf, other]: Title: tPLCnet: Real-time Deep Packet Loss Concealment in the Time Domain Using a Short Temporal Context

Nils L. Westhausen, Bernd T. Meyer

Comments: Submitted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[160] arXiv:2204.01345 (cross-list from eess.AS) [pdf, other]: Title: MOSRA: Joint Mean Opinion Score and Room Acoustics Speech Quality Assessment

Karl El Hajal, Milos Cernak, Pablo Mainar

Comments: Submitted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[161] arXiv:2204.01355 (cross-list from eess.AS) [pdf, other]: Title: Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches

Zifeng Zhao, Dongchao Yang, Rongzhi Gu, Haoran Zhang, Yuexian Zou

Comments: 5 pages, 1 table, 5 figures. Submitted to INTERSPEECH 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[162] arXiv:2204.01387 (cross-list from eess.AS) [pdf, other]: Title: Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck

Youngsik Eom, Yeonghyeon Lee, Ji Sub Um, Hoirin Kim

Comments: Accepted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD)
[163] arXiv:2204.01397 (cross-list from cs.CL) [pdf, other]: Title: A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems

Marcely Zanon Boito, Laurent Besacier, Natalia Tomashenko, Yannick Estève

Comments: Accepted to INTERSPEECH 2022 (Special session Inclusive and Fair Speech Technologies)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[164] arXiv:2204.01670 (cross-list from cs.CL) [pdf, other]: Title: Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition

Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang

Comments: Submitted for review at Interspeech 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[165] arXiv:2204.01735 (cross-list from eess.AS) [pdf, other]: Title: Robust Stuttering Detection via Multi-task and Adversarial Learning

Shakeel Ahmad Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

Comments: Under Review in European Signal Processing Conference 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[166] arXiv:2204.01851 (cross-list from eess.AS) [pdf, other]: Title: Dual Quaternion Ambisonics Array for Six-Degree-of-Freedom Acoustic Representation

Eleonora Grassucci, Gioia Mancini, Christian Brignone, Aurelio Uncini, Danilo Comminiello

Comments: Paper accepted for publication in Elsevier Pattern Recognition Letters

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[167] arXiv:2204.01954 (cross-list from physics.comp-ph) [pdf, other]: Title: Application of a Spectral Method to Simulate Quasi-Three-Dimensional Underwater Acoustic Fields

Houwang Tu, Yongxian Wang, Wei Liu, Chunmei Yang, Jixing Qin, Shuqing Ma, Xiaodong Wang

Comments: 31 pages, 22 figures. arXiv admin note: text overlap with arXiv:2112.13602

Subjects: Computational Physics (physics.comp-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[168] arXiv:2204.02090 (cross-list from cs.CV) [pdf, other]: Title: VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices

Venkatesh S. Kadandale, Juan F. Montesinos, Gloria Haro

Comments: Paper accepted to Interspeech 2022; Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[169] arXiv:2204.02263 (cross-list from eess.AS) [pdf, other]: Title: Multilingual and Multimodal Abuse Detection

Rini Sharon, Heet Shah, Debdoot Mukherjee, Vikram Gupta

Comments: Submitted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[170] arXiv:2204.02385 (cross-list from eess.AS) [pdf, other]: Title: Learning Speech Emotion Representations in the Quaternion Domain

Eric Guizzo, Tillman Weyde, Simone Scardapane, Danilo Comminiello

Comments: Accepted for Publication in IEEE/ACM Transactions on Audio, Speech and Language Processing

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[171] arXiv:2204.02389 (cross-list from cs.CV) [pdf, other]: Title: ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

Ruohan Gao, Zilin Si, Yen-Yu Chang, Samuel Clarke, Jeannette Bohg, Li Fei-Fei, Wenzhen Yuan, Jiajun Wu

Comments: In CVPR 2022. Gao, Si, and Chang contributed equally to this work. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[172] arXiv:2204.02470 (cross-list from cs.CL) [pdf, other]: Title: Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation

Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel Lopez-Francisco, Jonathan D. Amith, Shinji Watanabe

Comments: 5 pages, 2 figures, submitted to Interspeech 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[173] arXiv:2204.02485 (cross-list from cs.CV) [pdf, other]: Title: Training-Free Robust Multimodal Learning via Sample-Wise Jacobian Regularization

Zhengqi Gao, Sucheng Ren, Zihui Xue, Siting Li, Hang Zhao

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[174] arXiv:2204.02492 (cross-list from cs.CL) [pdf, other]: Title: Towards End-to-end Unsupervised Speech Recognition

Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski

Comments: Preprint

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[175] arXiv:2204.02500 (cross-list from cs.CR) [pdf, other]: Title: User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition in Federated Learning

Tiantian Feng, Raghuveer Peri, Shrikanth Narayanan

Journal-ref: Proc. Interspeech 2022

Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[176] arXiv:2204.02530 (cross-list from cs.CL) [pdf, other]: Title: Prosodic Alignment for off-screen automatic dubbing

Yogesh Virkar, Marcello Federico, Robert Enyedi, Roberto Barra-Chicote

Comments: 5 pages, 2 figures, 3 tables, Submitted to Interspeech 2022

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[177] arXiv:2204.02637 (cross-list from eess.AS) [pdf, other]: Title: Global HRTF Interpolation via Learned Affine Transformation of Hyper-conditioned Features

Jin Woo Lee, Sungho Lee, Kyogu Lee

Comments: Submitted to ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[178] arXiv:2204.02694 (cross-list from eess.AS) [pdf, other]: Title: Customizable End-to-end Optimization of Online Neural Network-supported Dereverberation for Hearing Devices

Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann

Comments: ©2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal-ref: ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[179] arXiv:2204.02741 (cross-list from eess.AS) [pdf, other]: Title: Neural Network-augmented Kalman Filtering for Robust Online Speech Dereverberation in Noisy Reverberant Environments

Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann

Comments: accepted to INTERSPEECH 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[180] arXiv:2204.02810 (cross-list from cs.CV) [pdf, other]: Title: Expression-preserving face frontalization improves visually assisted speech processing

Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda

Comments: arXiv admin note: text overlap with arXiv:2202.00538

Journal-ref: International Journal of Computer Vision 131 (5), 1122-1140, 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[181] arXiv:2204.02841 (cross-list from eess.AS) [pdf, other]: Title: Spectral Denoising for Microphone Classification

L. Cuccovillo, A. Giganti, P. Bestagini, P. Aichroth, S. Tubaro

Journal-ref: in ACM International Workshop on Multimedia AI against Disinformation (MAD), Newark, NJ, USA, 2022, pp. 10-17

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[182] arXiv:2204.02874 (cross-list from cs.CV) [pdf, other]: Title: ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound

Yan-Bo Lin, Jie Lei, Mohit Bansal, Gedas Bertasius

Comments: ECCV 2022 Oral. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[183] arXiv:2204.02967 (cross-list from cs.CL) [pdf, other]: Title: Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation

Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee

Comments: Accepted to be published in the Proceedings of Interspeech 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[184] arXiv:2204.02978 (cross-list from eess.AS) [pdf, other]: Title: A neural network-supported two-stage algorithm for lightweight dereverberation on hearing devices

Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann

Comments: Accepted for publication in EURASIP Journal on Audio, Speech and Music Processing

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[185] arXiv:2204.03063 (cross-list from cs.MM) [pdf, other]: Title: Late multimodal fusion for image and audio music transcription

María Alfaro-Contreras (1), Jose J. Valero-Mas (1), José M. Iñesta (1), Jorge Calvo-Zaragoza (1) ((1) Instituto Universitario de Investigación Informática, University of Alicante, Alicante, Spain)

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[186] arXiv:2204.03083 (cross-list from cs.CV) [pdf, other]: Title: Audio-Visual Person-of-Interest DeepFake Detection

Davide Cozzolino, Alessandro Pianese, Matthias Nießner, Luisa Verdoliva

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[187] arXiv:2204.03166 (cross-list from eess.AS) [pdf, other]: Title: Musical Information Extraction from the Singing Voice

Preeti Rao

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[188] arXiv:2204.03219 (cross-list from eess.AS) [pdf, other]: Title: DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores

Wei-Cheng Tseng, Wei-Tsung Kao, Hung-yi Lee

Comments: Accepted to Interspeech 2022. Code will be available in the future

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[189] arXiv:2204.03305 (cross-list from eess.AS) [pdf, other]: Title: MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids

Ryandhimas E. Zezario, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

Comments: Accepted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[190] arXiv:2204.03310 (cross-list from eess.AS) [pdf, other]: Title: MTI-Net: A Multi-Target Speech Intelligibility Prediction Model

Ryandhimas E. Zezario, Szu-wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

Comments: Accepted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[191] arXiv:2204.03315 (cross-list from cs.CL) [pdf, other]: Title: Three-Module Modeling For End-to-End Spoken Language Understanding Using Pre-trained DNN-HMM-Based Acoustic-Phonetic Model

Nick J.C. Wang, Lu Wang, Yandan Sun, Haimei Kang, Dejun Zhang

Comments: Published in INTERSPEECH 2021

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[192] arXiv:2204.03409 (cross-list from cs.CL) [pdf, other]: Title: MAESTRO: Matched Speech Text Representations through Modality Matching

Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro Moreno, Ankur Bapna, Heiga Zen

Comments: Accepted by Interspeech 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[193] arXiv:2204.03417 (cross-list from eess.AS) [pdf, other]: Title: Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0

Sebastian P. Bayerl, Dominik Wagner, Elmar Nöth, Korbinian Riedhammer

Comments: Accepted at Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[194] arXiv:2204.03428 (cross-list from eess.AS) [pdf, other]: Title: Detecting Vocal Fatigue with Neural Embeddings

Sebastian P. Bayerl, Dominik Wagner, Ilja Baumann, Korbinian Riedhammer, Tobias Bocklet

Comments: Accepted for Publication in the Journal of Voice

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[195] arXiv:2204.03561 (cross-list from cs.CV) [pdf, other]: Title: Emotional Speech Recognition with Pre-trained Deep Visual Models

Waleed Ragheb, Mehdi Mirzapour, Ali Delfardi, Hélène Jacquenet, Lawrence Carbon

Journal-ref: Deep Learning for NLP Workshop, Extraction et Gestion des Connaissances (EGC), 2022

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[196] arXiv:2204.03793 (cross-list from eess.AS) [pdf, other]: Title: Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition

Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw

Comments: Accepted by INTERSPEECH 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[197] arXiv:2204.03848 (cross-list from eess.AS) [pdf, other]: Title: AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification

Sonal Joshi, Saurabh Kataria, Jesus Villalba, Najim Dehak

Comments: Submitted to InterSpeech 2022

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[198] arXiv:2204.03851 (cross-list from eess.AS) [pdf, other]: Title: Defense against Adversarial Attacks on Hybrid Speech Recognition using Joint Adversarial Fine-tuning with Denoiser

Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesus Villalba, Sanjeev Khudanpur, Najim Dehak

Comments: Submitted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[199] arXiv:2204.03879 (cross-list from cs.CL) [pdf, other]: Title: A Study of Different Ways to Use The Conformer Model For Spoken Language Understanding

Nick J.C. Wang, Shaojun Wang, Jing Xiao

Comments: Submitted to INTERSPEECH 2022. (5 pages, 1 figure.)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[200] arXiv:2204.03888 (cross-list from cs.CL) [pdf, other]: Title: Transducer-based language embedding for spoken language identification

Peng Shen, Xugang Lu, Hisashi Kawai

Comments: This paper was accepted by Interspeech 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
