Audio and Speech Processing

Authors and titles for June 2022

Total of 268 entries : 1-25 ... 151-175 176-200 201-225 226-250 251-268


[226] arXiv:2206.12759 (cross-list from cs.CL) [pdf, other]: Title: Low-resource Accent Classification in Geographically-proximate Settings: A Forensic and Sociophonetics Perspective

Qingcheng Zeng, Dading Chong, Peilin Zhou, Jie Yang

Comments: INTERSPEECH 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[227] arXiv:2206.12772 (cross-list from cs.CV) [pdf, other]: Title: Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation

Jinxiang Liu, Chen Ju, Weidi Xie, Ya Zhang

Comments: Camera-ready Version for ACMMM 2022

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[228] arXiv:2206.12829 (cross-list from cs.SD) [pdf, other]: Title: On Comparison of Encoders for Attention based End to End Speech Recognition in Standalone and Rescoring Mode

Raviraj Joshi, Subodh Kumar

Comments: Accepted at SPCOM 2022

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[229] arXiv:2206.12879 (cross-list from cs.CL) [pdf, other]: Title: Data Augmentation for Dementia Detection in Spoken Language

Anna Hlédiková, Dominika Woszczyk, Alican Akman, Soteris Demetriou, Björn Schuller

Comments: Accepted to INTERSPEECH 2022

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[230] arXiv:2206.12931 (cross-list from cs.CL) [pdf, other]: Title: Annotated Speech Corpus for Low Resource Indian Languages: Awadhi, Bhojpuri, Braj and Magahi

Ritesh Kumar, Siddharth Singh, Shyam Ratan, Mohit Raj, Sonal Sinha, Bornini Lahiri, Vivek Seshadri, Kalika Bali, Atul Kr. Ojha

Comments: Speech for Social Good Workshop, 2022, Interspeech 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[231] arXiv:2206.12955 (cross-list from cs.CL) [pdf, other]: Title: Improving the Training Recipe for a Robust Conformer-based Hybrid Model

Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Ralf Schlüter, Hermann Ney

Comments: Accepted at INTERSPEECH 2022

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[232] arXiv:2206.13021 (cross-list from cs.SD) [pdf, other]: Title: Speak Like a Professional: Increasing Speech Intelligibility by Mimicking Professional Announcer Voice with Voice Conversion

Tuan Vu Ho, Maori Kobayashi, Masato Akagi

Comments: Accepted at INTERSPEECH 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[233] arXiv:2206.13071 (cross-list from cs.SD) [pdf, other]: Title: Uncertainty Calibration for Deep Audio Classifiers

Tong Ye, Shijing Si, Jianzong Wang, Ning Cheng, Jing Xiao

Comments: Accepted by InterSpeech 2022, the first two authors contributed equally

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[234] arXiv:2206.13085 (cross-list from cs.SD) [pdf, other]: Title: Sound Model Factory: An Integrated System Architecture for Generative Audio Modelling

Lonce Wyse, Purnima Kamath, Chitralekha Gupta

Journal-ref: International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar) (pp. 308-322). Springer, Cham. 2022

Subjects: Sound (cs.SD); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[235] arXiv:2206.13101 (cross-list from cs.SD) [pdf, other]: Title: SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning

Zuheng Kang, Junqing Peng, Jianzong Wang, Jing Xiao

Comments: This paper is accepted by Interspeech 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[236] arXiv:2206.13110 (cross-list from cs.SD) [pdf, other]: Title: Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire

Zhiyun Fan, Linhao Dong, Meng Cai, Zejun Ma, Bo Xu

Comments: Signal Processing Letters 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[237] arXiv:2206.13135 (cross-list from cs.CL) [pdf, other]: Title: TALCS: An Open-Source Mandarin-English Code-Switching Corpus and a Speech Recognition Baseline

Chengfei Li, Shuhao Deng, Yaoping Wang, Guangjing Wang, Yaguang Gong, Changbin Chen, Jinfeng Bai

Comments: accepted by INTERSPEECH 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[238] arXiv:2206.13136 (cross-list from cs.SD) [pdf, other]: Title: A two-stage full-band speech enhancement model with effective spectral compression mapping

Zhongshu Hou, Qinwen Hu, Kai Chen, Jing Lu

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[239] arXiv:2206.13390 (cross-list from cs.CV) [pdf, other]: Title: A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!

Chenglizhao Chen, Mengke Song, Wenfeng Song, Li Guo, Muwei Jian

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[240] arXiv:2206.13415 (cross-list from cs.CL) [pdf, other]: Title: Is the Language Familiarity Effect gradual? A computational modelling approach

Maureen de Seyssel, Guillaume Wisniewski, Emmanuel Dupoux

Comments: 8 pages, 2 figures, accepted at CogSci 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[241] arXiv:2206.13476 (cross-list from cs.SD) [pdf, other]: Title: Impact of Acoustic Event Tagging on Scene Classification in a Multi-Task Learning Framework

Rahil Parikh, Harshavardhan Sundar, Ming Sun, Chao Wang, Spyros Matsoukas

Comments: Accepted at ISCA Interspeech 2022

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[242] arXiv:2206.13611 (cross-list from cs.SD) [pdf, other]: Title: ClearBuds: Wireless Binaural Earbuds for Learning-Based Speech Enhancement

Ishan Chatterjee, Maruchi Kim, Vivek Jayaram, Shyamnath Gollakota, Ira Kemelmacher-Shlizerman, Shwetak Patel, Steven M. Seitz

Comments: 12 pages, Published in Mobisys 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[243] arXiv:2206.13689 (cross-list from cs.SD) [pdf, other]: Title: Tiny-Sepformer: A Tiny Time-Domain Transformer Network for Speech Separation

Jian Luo, Jianzong Wang, Ning Cheng, Edward Xiao, Xulong Zhang, Jing Xiao

Comments: Accepted by Interspeech 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[244] arXiv:2206.13691 (cross-list from cs.SD) [pdf, other]: Title: Dummy Prototypical Networks for Few-Shot Open-Set Keyword Spotting

Byeonggeun Kim, Seunghan Yang, Inseop Chung, Simyung Chang

Comments: Proceedings of INTERSPEECH 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[245] arXiv:2206.13700 (cross-list from cs.SD) [pdf, other]: Title: Domain Agnostic Few-shot Learning for Speaker Verification

Seunghan Yang, Debasmit Das, Janghoon Cho, Hyoungwoo Park, Sungrack Yun

Comments: Proceedings of INTERSPEECH 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[246] arXiv:2206.13708 (cross-list from cs.SD) [pdf, other]: Title: Personalized Keyword Spotting through Multi-task Learning

Seunghan Yang, Byeonggeun Kim, Inseop Chung, Simyung Chang

Comments: Proceedings of INTERSPEECH 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[247] arXiv:2206.13758 (cross-list from cs.LG) [pdf, other]: Title: Exploring linguistic feature and model combination for speech recognition based automatic AD detection

Yi Wang, Tianzi Wang, Zi Ye, Lingwei Meng, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng

Comments: Accepted by INTERSPEECH 2022

Subjects: Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[248] arXiv:2206.13817 (cross-list from cs.SD) [pdf, other]: Title: Comparison of Speech Representations for the MOS Prediction System

Aki Kunikoshi, Jaebok Kim, Wonsuk Jun, Kåre Sjölander (ReadSpeaker)

Comments: 5 pages, 4 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[249] arXiv:2206.13909 (cross-list from cs.SD) [pdf, other]: Title: QTI Submission to DCASE 2021: residual normalization for device-imbalanced acoustic scene classification with efficient design

Byeonggeun Kim, Seunghan Yang, Jangho Kim, Simyung Chang

Comments: tech report; won 1st place in DCASE2021 challenge. arXiv admin note: substantial text overlap with arXiv:2111.06531

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[250] arXiv:2206.14009 (cross-list from cs.CV) [pdf, other]: Title: Show Me Your Face, And I'll Tell You How You Speak

Christen Millerdurai, Lotfy Abdel Khaliq, Timon Ulrich

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
