Audio and Speech Processing

Authors and titles for April 2022

Total of 320 entries : 1-25 ... 226-250 251-275 276-300 301-320

[301] arXiv:2204.11934 (cross-list from cs.LG)

Title: On-demand compute reduction with stochastic wav2vec 2.0

Apoorv Vyas, Wei-Ning Hsu, Michael Auli, Alexei Baevski

Comments: submitted to Interspeech, 2022

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[302] arXiv:2204.11942 (cross-list from cs.SD)

Title: Meta-AF: Meta-Learning for Adaptive Filters

Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis

Comments: Accepted to ACM/IEEE TASLP. Source code and audio examples: this https URL

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

[303] arXiv:2204.12112 (cross-list from cs.SD)

Title: Reformulating Speaker Diarization as Community Detection With Emphasis On Topological Structure

Siqi Zheng, Hongbin Suo

Comments: ICASSP 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[304] arXiv:2204.12177 (cross-list from cs.SD)

Title: A Comparative Study on Approaches to Acoustic Scene Classification using CNNs

Ishrat Jahan Ananya, Sarah Suad, Shadab Hafiz Choudhury, Mohammad Ashrafuzzaman Khan

Comments: Presented at 2021 Mexican International Conference on Artificial Intelligence. Published in Advances in Computational Intelligence, MICAI 2021, Lecture Notes in Computer Science. 12 pages, 3 figures, 5 tables

Journal-ref: Advances in Computational Intelligence, MICAI 2021, Lecture Notes in Artificial Intelligence vol. 13067, pp. 81-91 (2021)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)

[305] arXiv:2204.12290 (cross-list from cs.SD)

Title: On Machine Learning-Driven Surrogates for Sound Transmission Loss Simulations

Barbara Cunha (LTDS), Abdel-Malek Zine (ICJ), Mohamed Ichchou (ECL), Christophe Droz (COSYS-SII), Stéphane Foulard

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Medical Physics (physics.med-ph)

[306] arXiv:2204.12486 (cross-list from cs.SD)

Title: Measurement uncertainty and unicity of single number quantities describing the spatial decay of speech level in open-plan offices

Lucas Lenne (INRS (Vandoeuvre lès Nancy)), Patrick Chevret, Étienne Parizet

Journal-ref: Applied Acoustics, Elsevier, 2021, 182, pp.108269

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[307] arXiv:2204.12489 (cross-list from cs.CV)

Title: Sound Localization by Self-Supervised Time Delay Estimation

Ziyang Chen, David F. Fouhey, Andrew Owens

Comments: ECCV 2022

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[308] arXiv:2204.12622 (cross-list from cs.SD)

Title: Named Entity Recognition for Audio De-Identification

Guillaume Baril, Patrick Cardinal, Alessandro Lameiras Koerich

Comments: 8 pages

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)

[309] arXiv:2204.12765 (cross-list from cs.CL)

Title: Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Peidong Wang, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei

Comments: Accepted by INTERSPEECH 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[310] arXiv:2204.12768 (cross-list from cs.SD)

Title: Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training

Dading Chong, Helin Wang, Peilin Zhou, Qingcheng Zeng

Comments: Submitted to INTERSPEECH 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[311] arXiv:2204.13094 (cross-list from cs.SD)

Title: Unsupervised Word Segmentation using K Nearest Neighbors

Tzeviya Sylvia Fuchs, Yedid Hoshen, Joseph Keshet

Comments: Submitted to Interspeech 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[312] arXiv:2204.13206 (cross-list from cs.SD)

Title: Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations

Dan Oneata, Horia Cucu

Comments: Accepted at the Multimodal Learning and Applications Workshop (MULA) from CVPR 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)

[313] arXiv:2204.13289 (cross-list from cs.SD)

Title: Music Enhancement via Image Translation and Vocoding

Nikhil Kandpal, Oriol Nieto, Zeyu Jin

Comments: ICASSP 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[314] arXiv:2204.13430 (cross-list from cs.SD)

Title: Pseudo strong labels for large scale weakly supervised audio tagging

Heinrich Dinkel, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang

Comments: Accepted by ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[315] arXiv:2204.13437 (cross-list from cs.SD)

Title: Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss

Efthymios Georgiou, Kosmas Kritsis, Georgios Paraskevopoulos, Athanasios Katsamanis, Vassilis Katsouros, Alexandros Potamianos

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[316] arXiv:2204.13601 (cross-list from cs.SD)

Title: Emotion Recognition In Persian Speech Using Deep Neural Networks

Ali Yazdani, Hossein Simchi, Yasser Shekofteh

Comments: 5 pages, 1 figure, 3 tables

Journal-ref: 11th International Conference on Computer and Knowledge Engineering (ICCKE 2021)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

[317] arXiv:2204.13622 (cross-list from eess.SP)

Title: Fast Cross-Correlation for TDoA Estimation on Small Aperture Microphone Arrays

François Grondin, Marc-Antoine Maheux, Jean-Samuel Lauzon, Jonathan Vincent, François Michaud

Comments: Submitted to IEEE ICASSP 2023

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[318] arXiv:2204.13668 (cross-list from cs.SD)

Title: Unaligned Supervision For Automatic Music Transcription in The Wild

Ben Maman, Amit H. Bermano

Comments: 16 pages, project page available at this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[319] arXiv:2204.14057 (cross-list from cs.SD)

Title: Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast

Boqing Zhu, Kele Xu, Changjian Wang, Zheng Qin, Tao Sun, Huaimin Wang, Yuxing Peng

Comments: 8 pages, 4 figures. Accepted by IJCAI-2022

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

[320] arXiv:2204.14272 (cross-list from cs.CL)

Title: End-to-end Spoken Conversational Question Answering: Task, Dataset and Model

Chenyu You, Nuo Chen, Fenglin Liu, Shen Ge, Xian Wu, Yuexian Zou

Comments: In Findings of NAACL 2022. arXiv admin note: substantial text overlap with arXiv:2010.08923

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)