Audio and Speech Processing

Authors and titles for May 2023

Total of 427 entries; this page lists entries 101-150

[101] arXiv:2305.15536 [pdf, other]: Title: RAND: Robustness Aware Norm Decay For Quantized Seq2seq Models

David Qiu, David Rim, Shaojin Ding, Oleg Rybakov, Yanzhang He

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[102] arXiv:2305.15567 [pdf, other]: Title: Generalized domain adaptation framework for parametric back-end in speaker recognition

Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Takafumi Koshinaka

Subjects: Audio and Speech Processing (eess.AS)
[103] arXiv:2305.15749 [pdf, other]: Title: Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration

Rustem Yeshpanov, Saida Mussakhojayeva, Yerbolat Khassanov

Comments: 5 pages, 1 figure, 3 tables, accepted to Interspeech

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[104] arXiv:2305.15816 [pdf, other]: Title: DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion

Ha-Yeong Choi, Sang-Hoon Lee, Seong-Whan Lee

Comments: 23 pages, 10 figures, 17 tables, under review

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD); Signal Processing (eess.SP)
[105] arXiv:2305.15958 [pdf, other]: Title: Improving Scheduled Sampling for Neural Transducer-based ASR

Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura

Comments: Accepted to ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS)
[106] arXiv:2305.15971 [pdf, other]: Title: Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data

Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa, Taichi Asami

Comments: Accepted to Interspeech 2023

Subjects: Audio and Speech Processing (eess.AS)
[107] arXiv:2305.16040 [pdf, other]: Title: The Power of Prosody and Prosody of Power: An Acoustic Analysis of Finnish Parliamentary Speech

Martti Vainio, Antti Suni, Juraj Šimko, Sofoklis Kakouros

Subjects: Audio and Speech Processing (eess.AS)
[108] arXiv:2305.16065 [pdf, other]: Title: ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition

Yuanchao Li, Zeyu Zhao, Ondrej Klejch, Peter Bell, Catherine Lai

Comments: Accepted to INTERSPEECH 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[109] arXiv:2305.16076 [pdf, other]: Title: Transfer Learning for Personality Perception via Speech Emotion Recognition

Yuanchao Li, Peter Bell, Catherine Lai

Comments: Accepted to INTERSPEECH 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[110] arXiv:2305.16085 [pdf, other]: Title: Acoustic-to-Articulatory Speech Inversion Features for Mispronunciation Detection of /r/ in Child Speech Sound Disorders

Nina R Benway, Yashish M Siriwardena, Jonathan L Preston, Elaine Hitchcock, Tara McAllister, Carol Espy-Wilson

Comments: *denotes equal contribution. To appear in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2023

Journal-ref: Proc. INTERSPEECH 2023, 4568-4572

Subjects: Audio and Speech Processing (eess.AS)
[111] arXiv:2305.16111 [pdf, other]: Title: Classifying Rhoticity of /r/ in Speech Sound Disorder using Age-and-Sex Normalized Formants

Nina R Benway, Jonathan L Preston, Asif Salekin, Yi Xiao, Harshit Sharma, Tara McAllister

Comments: To appear in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2023

Journal-ref: Proc. INTERSPEECH 2023, 4563-4567

Subjects: Audio and Speech Processing (eess.AS)
[112] arXiv:2305.16286 [pdf, other]: Title: Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition

Wangyou Zhang, Yanmin Qian

Comments: Accepted by Interspeech; 5 pages, 1 figure, 3 tables

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[113] arXiv:2305.16608 [pdf, other]: Title: AudioDec: An Open-source Streaming High-fidelity Neural Audio Codec

Yi-Chiao Wu, Israel D. Gebru, Dejan Marković, Alexander Richard

Comments: 5 pages, 1 figure, 5 tables. Proc. ICASSP, 2023

Subjects: Audio and Speech Processing (eess.AS)
[114] arXiv:2305.16619 [pdf, other]: Title: 2-bit Conformer quantization for automatic speech recognition

Oleg Rybakov, Phoenix Meadowlark, Shaojin Ding, David Qiu, Jian Li, David Rim, Yanzhang He

Comments: submitted to Interspeech

Subjects: Audio and Speech Processing (eess.AS)
[115] arXiv:2305.16665 [pdf, other]: Title: ABC-KD: Attention-Based-Compression Knowledge Distillation for Deep Learning-Based Noise Suppression

Yixin Wan, Yuan Zhou, Xiulian Peng, Kai-Wei Chang, Yan Lu

Comments: This paper was accepted to Interspeech 2023 Main Conference

Journal-ref: Proceedings of INTERSPEECH 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[116] arXiv:2305.16690 [pdf, html, other]: Title: Learning Representation of Therapist Empathy in Counseling Conversation Using Siamese Hierarchical Attention Network

Dehua Tao, Tan Lee, Harold Chui, Sarah Luk

Comments: Accepted by INTERSPEECH 2024

Subjects: Audio and Speech Processing (eess.AS)
[117] arXiv:2305.16699 [pdf, other]: Title: Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis

Seongyeon Park, Bohyung Kim, Tae-hyun Oh

Comments: Interspeech 2023

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[118] arXiv:2305.16753 [pdf, other]: Title: ElectrodeNet -- A Deep Learning Based Sound Coding Strategy for Cochlear Implants

Enoch Hsin-Ho Huang, Rong Chao, Yu Tsao, Chao-Min Wu

Comments: 12 pages and 7 figures. Preprint version; IEEE Transactions on Cognitive and Developmental Systems (accepted)

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
[119] arXiv:2305.16862 [pdf, other]: Title: Neural modeling of magnetic tape recorders

Otto Mikkonen, Alec Wright, Eloi Moliner, Vesa Välimäki

Comments: Accepted to DAFx 2023. For accompanying web page, see this http URL

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[120] arXiv:2305.17394 [pdf, other]: Title: One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification

Jungwoo Heo, Chan-yeong Lim, Ju-ho Kim, Hyun-seo Shin, Ha-Jin Yu

Comments: ISCA INTERSPEECH 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[121] arXiv:2305.17436 [pdf, other]: Title: Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models

Yusheng Tian, Guangyan Zhang, Tan Lee

Comments: submitted to INTERSPEECH 2023

Journal-ref: INTERSPEECH 2023

Subjects: Audio and Speech Processing (eess.AS)
[122] arXiv:2305.17719 [pdf, other]: Title: Adapting Language-Audio Models as Few-Shot Audio Learners

Jinhua Liang, Xubo Liu, Haohe Liu, Huy Phan, Emmanouil Benetos, Mark D. Plumbley, Wenwu Wang

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[123] arXiv:2305.17724 [pdf, other]: Title: Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS

Sewade Ogun, Vincent Colotte, Emmanuel Vincent

Comments: 5 pages with 3 figures, InterSpeech 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[124] arXiv:2305.17971 [pdf, other]: Title: Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis

Erik Ekstedt, Siyang Wang, Éva Székely, Joakim Gustafson, Gabriel Skantze

Comments: Accepted at INTERSPEECH 2023, 5 pages, 2 figures, 4 tables

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[125] arXiv:2305.18074 [pdf, other]: Title: An Experimental Review of Speaker Diarization methods with application to Two-Speaker Conversational Telephone Speech recordings

Luca Serafini, Samuele Cornell, Giovanni Morrone, Enrico Zovato, Alessio Brutti, Stefano Squartini

Comments: 52 pages, 10 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[126] arXiv:2305.18146 [pdf, other]: Title: A Hierarchical Context-aware Modeling Approach for Multi-aspect and Multi-granular Pronunciation Assessment

Fu-An Chao, Tien-Hong Lo, Tzu-I Wu, Yao-Ting Sung, Berlin Chen

Comments: Accepted to Interspeech 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[127] arXiv:2305.18298 [pdf, other]: Title: Optimization design of a micro-perforated panel absorber with 8.6 octave bands

Xiaoming Wang, Chen Liang, Yulin Mei

Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP); Applied Physics (physics.app-ph); Classical Physics (physics.class-ph)
[128] arXiv:2305.18441 [pdf, other]: Title: DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes

Xilin Jiang, Yinghao Aaron Li, Nima Mesgarani

Comments: INTERSPEECH 2023

Journal-ref: Proc. INTERSPEECH 2023, pp. 2818-2822

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[129] arXiv:2305.18640 [pdf, other]: Title: Transforming the Embeddings: A Lightweight Technique for Speech Emotion Recognition Tasks

Orchid Chetia Phukan, Arun Balaji Buduru, Rajesh Sharma

Comments: Accepted to Interspeech 2023

Subjects: Audio and Speech Processing (eess.AS)
[130] arXiv:2305.18739 [pdf, other]: Title: An empirical study on speech restoration guided by self supervised speech representation

Jaeuk Byun, Youna Ji, Soo Whan Chung, Soyeon Choe, Min Seok Choi

Comments: To be presented at ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS)
[131] arXiv:2305.18747 [pdf, other]: Title: Adapting Multi-Lingual ASR Models for Handling Multiple Talkers

Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng

Comments: Accepted by Interspeech 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[132] arXiv:2305.18753 [pdf, other]: Title: Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning

Jianyuan Sun, Xubo Liu, Xinhao Mei, Volkan Kılıç, Mark D. Plumbley, Wenwu Wang

Comments: INTERSPEECH 2023. arXiv admin note: substantial text overlap with arXiv:2210.05037

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[133] arXiv:2305.18802 [pdf, other]: Title: LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus

Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna

Comments: Accepted to Interspeech 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[134] arXiv:2305.18881 [pdf, other]: Title: MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization

Victoria Y. H. Chua, Hexin Liu, Leibny Paola Garcia Perera, Fei Ting Woon, Jinyi Wong, Xiangyu Zhang, Sanjeev Khudanpur, Andy W. H. Khong, Justin Dauwels, Suzy J. Styles

Comments: Accepted by Interspeech 2023, 5 pages, 2 figures, 3 tables

Subjects: Audio and Speech Processing (eess.AS)
[135] arXiv:2305.18925 [pdf, other]: Title: Investigating model performance in language identification: beyond simple error statistics

Suzy J. Styles, Victoria Y. H. Chua, Fei Ting Woon, Hexin Liu, Leibny Paola Garcia Perera, Sanjeev Khudanpur, Andy W. H. Khong, Justin Dauwels

Comments: Accepted to Interspeech 2023, 5 pages, 5 figures

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[136] arXiv:2305.18975 [pdf, other]: Title: Voice Conversion With Just Nearest Neighbors

Matthew Baas, Benjamin van Niekerk, Herman Kamper

Comments: 5 page, 1 table, 2 figures. Accepted at Interspeech 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[137] arXiv:2305.19011 [pdf, other]: Title: MiniSUPERB: Lightweight Benchmark for Self-supervised Speech Models

Yu-Hsiang Wang, Huang-Yu Chen, Kai-Wei Chang, Winston Hsu, Hung-yi Lee

Comments: Accepted to IEEE ASRU 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG)
[138] arXiv:2305.19051 [pdf, other]: Title: Towards single integrated spoofing-aware speaker verification embeddings

Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung

Comments: Accepted by INTERSPEECH 2023. Code and models are available in this https URL

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[139] arXiv:2305.19090 [pdf, other]: Title: Prospective Validation of Motor-Based Intervention with Automated Mispronunciation Detection of Rhotics in Residual Speech Sound Disorders

Nina R Benway, Jonathan L Preston

Comments: To appear in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2023

Journal-ref: Proc. INTERSPEECH 2023, 4558-4562

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[140] arXiv:2305.19100 [pdf, other]: Title: Predicting Preferred Dialogue-to-Background Loudness Difference in Dialogue-Separated Audio

Luca Resti, Martin Strauss, Matteo Torcoli, Emanuël Habets, Bernd Edler

Comments: Paper accepted at the 15th International Conference on Quality of Multimedia Experience (QoMEX), 4 pages, 2 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[141] arXiv:2305.19184 [pdf, other]: Title: Leveraging Semantic Information for Efficient Self-Supervised Emotion Recognition with Audio-Textual Distilled Models

Danilo de Oliveira, Navin Raj Prabhu, Timo Gerkmann

Comments: Accepted at Interspeech 2023

Journal-ref: Proc. Interspeech 2023

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[142] arXiv:2305.19255 [pdf, other]: Title: A Stutter Seldom Comes Alone -- Cross-Corpus Stuttering Detection as a Multi-label Problem

Sebastian P. Bayerl, Dominik Wagner, Ilja Baumann, Florian Hönig, Tobias Bocklet, Elmar Nöth, Korbinian Riedhammer

Comments: Accepted for presentation at Interspeech 2023. arXiv admin note: substantial text overlap with arXiv:2210.15982

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[143] arXiv:2305.19269 [pdf, other]: Title: Make-A-Voice: Unified Voice Synthesis With Discrete Representation

Rongjie Huang, Chunlei Zhang, Yongqi Wang, Dongchao Yang, Luping Liu, Zhenhui Ye, Ziyue Jiang, Chao Weng, Zhou Zhao, Dong Yu

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[144] arXiv:2305.19396 [pdf, other]: Title: Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages

Phat Do, Matt Coler, Jelske Dijkstra, Esther Klabbers

Comments: Accepted at INTERSPEECH 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[145] arXiv:2305.19493 [pdf, other]: Title: MERLIon CCS Challenge Evaluation Plan

Leibny Paola Garcia Perera, Y.H. Victoria Chua, Hexin Liu, Fei Ting Woon, Andy W.H. Khong, Justin Dauwels, Sanjeev Khudanpur, Suzy J. Styles

Comments: Evaluation plan for Interspeech 2023 special session "MERLIon"

Subjects: Audio and Speech Processing (eess.AS)
[146] arXiv:2305.19539 [pdf, other]: Title: Few-shot Class-incremental Audio Classification Using Dynamically Expanded Classifier with Self-attention Modified Prototypes

Yanxiong Li, Wenchang Cao, Wei Xie, Jialong Li, Emmanouil Benetos

Comments: 13 pages, 8 figures, 12 tables. Accepted for publication in IEEE TMM

Subjects: Audio and Speech Processing (eess.AS)
[147] arXiv:2305.19541 [pdf, other]: Title: Few-Shot Speaker Identification Using Lightweight Prototypical Network with Feature Grouping and Interaction

Yanxiong Li, Hao Chen, Wenchang Cao, Qisheng Huang, Qianhua He

Comments: 12 pages, 4 figures, 12 tables. Accepted for publication in IEEE TMM

Subjects: Audio and Speech Processing (eess.AS)
[148] arXiv:2305.19610 [pdf, other]: Title: FN-SSL: Full-Band and Narrow-Band Fusion for Sound Source Localization

Yabo Wang, Bing Yang, Xiaofei Li

Subjects: Audio and Speech Processing (eess.AS)
[149] arXiv:2305.19972 [pdf, html, other]: Title: VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition

Ziyi Ni, Minglun Han, Feilong Chen, Linghui Meng, Jing Shi, Pin Lv, Bo Xu

Comments: Accepted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[150] arXiv:2305.00011 (cross-list from cs.SD) [pdf, html, other]: Title: Adversarial Representation Learning for Robust Privacy Preservation in Audio

Shayan Gharib, Minh Tran, Diep Luong, Konstantinos Drossos, Tuomas Virtanen

Comments: Published in IEEE Open Journal of Signal Processing

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
