Audio and Speech Processing

Authors and titles for April 2022

Total of 320 entries : 1-25 26-50 51-75 76-100 101-125 126-150 ... 301-320
[51] arXiv:2204.03793 [pdf, other]
Title: Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition
Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw
Comments: Accepted by INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[52] arXiv:2204.03848 [pdf, other]
Title: AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification
Sonal Joshi, Saurabh Kataria, Jesus Villalba, Najim Dehak
Comments: Submitted to InterSpeech 2022
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[53] arXiv:2204.03851 [pdf, other]
Title: Defense against Adversarial Attacks on Hybrid Speech Recognition using Joint Adversarial Fine-tuning with Denoiser
Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesus Villalba, Sanjeev Khudanpur, Najim Dehak
Comments: Submitted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[54] arXiv:2204.03855 [pdf, other]
Title: Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition
Qianying Liu, Zhuo Gong, Zhengdong Yang, Yuhang Yang, Sheng Li, Chenchen Ding, Nobuaki Minematsu, Hao Huang, Fei Cheng, Chenhui Chu, Sadao Kurohashi
Comments: 7 pages, ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[55] arXiv:2204.03863 [pdf, other]
Title: Automatic Pronunciation Assessment using Self-Supervised Speech Representation Learning
Eesung Kim, Jae-Jin Jeon, Hyeji Seo, Hoon Kim
Comments: Submitted to INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[56] arXiv:2204.03895 [pdf, other]
Title: SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning
Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Yasunori Ohishi, Shoko Araki
Comments: Submitted to IEEE/ACM Trans. Audio, Speech, and Language Processing on Feb. 10th, 2022, and accepted on Oct. 20, 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[57] arXiv:2204.03898 [pdf, other]
Title: Exploring Transformer's potential on automatic piano transcription
Longshen Ou, Ziyi Guo, Emmanouil Benetos, Jiqing Han, Ye Wang
Comments: Accepted by ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[58] arXiv:2204.03965 [pdf, other]
Title: Scoring of Large-Margin Embeddings for Speaker Verification: Cosine or PLDA?
Qiongqiong Wang, Kong Aik Lee, Tianchi Liu
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[59] arXiv:2204.04004 [pdf, other]
Title: Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech
Jae-Sung Bae, Jinhyeok Yang, Tae-Jun Bak, Young-Sun Joo
Comments: Accepted to INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[60] arXiv:2204.04006 [pdf, other]
Title: Analysis and transformations of voice level in singing voice
Frederik Bous, Axel Roebel
Comments: Submitted to ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[61] arXiv:2204.04016 [pdf, other]
Title: Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment
Tobias Weise, Philipp Klumpp, Kubilay Can Demir, Andreas Maier, Elmar Noeth, Bjoern Heismann, Maria Schuster, Seung Hee Yang
Comments: Submitted to and accepted at INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[62] arXiv:2204.04068 [pdf, other]
Title: Declipping of Speech Signals Using Frequency Selective Extrapolation
Markus Jonscher, Jürgen Seiler, André Kaup
Comments: 4 pages, 5 figures, 2 tables, Speech Communication 11. ITG Symposium
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[63] arXiv:2204.04127 [pdf, other]
Title: Karaoker: Alignment-free singing voice synthesis with speech training data
Panos Kakoulidis, Nikolaos Ellinas, Georgios Vamvoukakis, Konstantinos Markopoulos, June Sig Sung, Gunu Jho, Pirros Tsiakoulis, Aimilios Chalamandaris
Comments: Accepted to INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[64] arXiv:2204.04170 [pdf, other]
Title: Automatic Data Augmentation Selection and Parametrization in Contrastive Self-Supervised Speech Representation Learning
Salah Zaiem, Titouan Parcollet, Slim Essid
Comments: Submitted to INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[65] arXiv:2204.04284 [pdf, other]
Title: Auditory-Based Data Augmentation for End-to-End Automatic Speech Recognition
Zehai Tu, Jack Deadman, Ning Ma, Jon Barker
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[66] arXiv:2204.04287 [pdf, other]
Title: Exploiting Hidden Representations from a DNN-based Speech Recogniser for Speech Intelligibility Prediction in Hearing-impaired Listeners
Zehai Tu, Ning Ma, Jon Barker
Comments: Accepted to INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[67] arXiv:2204.04288 [pdf, other]
Title: Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction
Zehai Tu, Ning Ma, Jon Barker
Comments: Accepted to INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[68] arXiv:2204.04333 [pdf, other]
Title: A Study of Using Cepstrogram for Countermeasure Against Replay Attacks
Shih-Kuang Lee, Yu Tsao, Hsin-Min Wang
Comments: Submitted to SLT 2022
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[69] arXiv:2204.04370 [pdf, other]
Title: QuiKo: A Quantum Beat Generation Application
Scott Oshiro
Comments: Pre-publication draft, to appear in the book "Quantum Computer Music", E. R. Miranda (Ed.)
Subjects: Audio and Speech Processing (eess.AS); Emerging Technologies (cs.ET); Human-Computer Interaction (cs.HC); Sound (cs.SD); Quantum Physics (quant-ph)
[70] arXiv:2204.04811 [pdf, other]
Title: Listen only to me! How well can target speech extraction handle false alarms?
Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolikova, Hiroshi Sato, Tomohiro Nakatani
Comments: Accepted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[71] arXiv:2204.05177 [pdf, other]
Title: The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance
Lin Zhang, Xin Wang, Erica Cooper, Nicholas Evans, Junichi Yamagishi
Comments: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing (DOI: https://doi.org/10.1109/TASLP.2022.3233236)
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 813-825, 2023
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[72] arXiv:2204.05419 [pdf, other]
Title: A Wav2vec2-Based Experimental Study on Self-Supervised Learning Methods to Improve Child Speech Recognition
Rishabh Jain, Andrei Barcovschi, Mariam Yiwere, Dan Bigioi, Peter Corcoran, Horia Cucu
Comments: Preprint, Submitted to IEEE Access
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[73] arXiv:2204.05460 [pdf, other]
Title: CorrectSpeech: A Fully Automated System for Speech Correction and Accent Reduction
Daxin Tan, Liqun Deng, Nianzu Zheng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee
Comments: Accepted by ISCSLP 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[74] arXiv:2204.05609 [pdf, other]
Title: Low Latency Time Domain Multichannel Speech and Music Source Separation
Gerald Schuller (Ilmenau University of Technology)
Comments: This paper was published at the Asilomar Conference on Signals, Systems, and Computers in November 2021. A software repository for the paper is: this https URL
Journal-ref: 55th Asilomar Conference on Signals, Systems, and Computers, ACSSC 2021, Pacific Grove, CA, USA, October 31 - November 3, 2021. IEEE 2021, ISBN 978-1-6654-5828-3
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[75] arXiv:2204.05738 [pdf, other]
Title: Text-Driven Separation of Arbitrary Sounds
Kevin Kilgour, Beat Gfeller, Qingqing Huang, Aren Jansen, Scott Wisdom, Marco Tagliasacchi
Comments: Submitted to INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)