Audio and Speech Processing

Authors and titles for April 2022

Total of 320 entries (showing 101-125)

[101] arXiv:2204.11232 [pdf, other]: Title: Improving the Naturalness of Simulated Conversations for End-to-End Neural Diarization

Natsuo Yamashita, Shota Horiguchi, Takeshi Homma

Comments: Accepted to Speaker Odyssey 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[102] arXiv:2204.11286 [pdf, other]: Title: Improved far-field speech recognition using Joint Variational Autoencoder

Shashi Kumar, Shakti P. Rath, Abhishek Pandey

Comments: 5 pages, 2 figures, 3 tables

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[103] arXiv:2204.11501 [pdf, other]: Title: Graph Convolutional Network Based Semi-Supervised Learning on Multi-Speaker Meeting Data

Fuchuan Tong, Siqi Zheng, Min Zhang, Yafeng Chen, Hongbin Suo, Qingyang Hong, Lin Li

Comments: Accepted by ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[104] arXiv:2204.11933 [pdf, other]: Title: Cleanformer: A multichannel array configuration-invariant neural enhancement frontend for ASR in smart speakers

Joseph Caroselli, Arun Narayanan, Nathan Howard, Tom O'Malley

Comments: Accepted to ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[105] arXiv:2204.12076 [pdf, other]: Title: ATST: Audio Representation Learning with Teacher-Student Transformer

Xian Li, Xiaofei Li

Comments: INTERSPEECH2022(Accepted)

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[106] arXiv:2204.12092 [pdf, other]: Title: Mask scalar prediction for improving robust automatic speech recognition

Arun Narayanan, James Walker, Sankaran Panchapagesan, Nathan Howard, Yuma Koizumi

Comments: Submitted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[107] arXiv:2204.12260 [pdf, other]: Title: Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation

Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

Comments: 22 pages, 8 figures. Under the review process

Journal-ref: HEAR: Holistic Evaluation of Audio Representations (NeurIPS 2021 Competition) PMLR 166 (2022) 1-24

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[108] arXiv:2204.12279 [pdf, other]: Title: Low-dimensional representation of infant and adult vocalization acoustics

Silvia Pagliarini, Sara Schneider, Christopher T. Kello, Anne S. Warlaumont

Comments: Under review at Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[109] arXiv:2204.12308 [pdf, other]: Title: Supervised Attention in Sequence-to-Sequence Models for Speech Recognition

Gene-Ping Yang, Hao Tang

Comments: Accepted at ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[110] arXiv:2204.12649 [pdf, other]: Title: Study on the Fairness of Speaker Verification Systems on Underrepresented Accents in English

Mariel Estevez, Luciana Ferrer

Comments: 5 pages, 2 figures, submitted to INTERSPEECH

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[111] arXiv:2204.12777 [pdf, other]: Title: Ultra Fast Speech Separation Model with Teacher Student Learning

Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu

Comments: Accepted by interspeech 2021

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[112] arXiv:2204.13883 [pdf, other]: Title: Autonomous In-Situ Soundscape Augmentation via Joint Selection of Masker and Gain

Karn N. Watcharasupat, Kenneth Ooi, Bhan Lam, Trevor Wong, Zhen-Ting Ong, Woon-Seng Gan

Comments: Accepted to IEEE Signal Processing Letters. (c) 2022 IEEE

Journal-ref: IEEE Signal Processing Letters, Vol. 29, pp. 1749 - 1753, 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[113] arXiv:2204.13890 [pdf, other]: Title: Deployment of an IoT System for Adaptive In-Situ Soundscape Augmentation

Trevor Wong, Karn N. Watcharasupat, Bhan Lam, Kenneth Ooi, Zhen-Ting Ong, Furi Andi Karnapi, Woon-Seng Gan

Comments: To be presented at the 51st International Congress and Exposition on Noise Control Engineering

Journal-ref: INTER-NOISE and NOISE-CON Congress and Conference Proceedings, Feb. 2022, vol. 265, no. 5, pp. 2013-2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Systems and Control (eess.SY)
[114] arXiv:2204.00061 (cross-list from cs.SD) [pdf, other]: Title: Data-augmented cross-lingual synthesis in a teacher-student framework

Marcel de Korte, Jaebok Kim, Aki Kunikoshi, Adaeze Adigwe, Esther Klabbers

Comments: Submitted to INTERSPEECH 2022

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[115] arXiv:2204.00088 (cross-list from cs.SD) [pdf, other]: Title: Speech and the n-Back task as a lens into depression. How combining both may allow us to isolate different core symptoms of depression

Salvatore Fara, Stefano Goria, Emilia Molimpakis, Nicholas Cummins

Comments: Submitted to Interspeech 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
[116] arXiv:2204.00094 (cross-list from cs.SD) [pdf, other]: Title: Perceptive, non-linear Speech Processing and Spiking Neural Networks

Jean Rouat, Ramin Pichevar, Stéphane Loiselle

Comments: preprint of the 2005 published paper: Perceptive, Non-linear Speech Processing and Spiking Neural Networks. In: Chollet, G., Esposito, A., Faundez-Zanuy, M., Marinaro, M. (eds) Nonlinear Speech Modeling and Applications. NN 2004. Lecture Notes in Computer Science, vol 3445. Springer, Berlin, Heidelberg

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[117] arXiv:2204.00164 (cross-list from cs.CL) [pdf, other]: Title: Filter-based Discriminative Autoencoders for Children Speech Recognition

Chiang-Lin Tai, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

Comments: Published in EUSIPCO 2022

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[118] arXiv:2204.00174 (cross-list from cs.CL) [pdf, other]: Title: InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR

Yu Nakagome, Tatsuya Komatsu, Yusuke Fujita, Shuta Ichimura, Yusuke Kida

Comments: This paper was submitted to INTERSPEECH2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[119] arXiv:2204.00175 (cross-list from cs.CL) [pdf, other]: Title: Alternate Intermediate Conditioning with Syllable-level and Character-level Targets for Japanese ASR

Yusuke Fujita, Tatsuya Komatsu, Yusuke Kida

Comments: SLT 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[120] arXiv:2204.00176 (cross-list from cs.CL) [pdf, other]: Title: Better Intermediates Improve CTC Inference

Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji Watanabe, Yusuke Kida

Comments: 5 pages, submitted INTERSPEECH2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[121] arXiv:2204.00212 (cross-list from cs.CL) [pdf, other]: Title: Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems

Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Nobuyasu Itoh, George Saon

Comments: Accepted to Interspeech 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[122] arXiv:2204.00291 (cross-list from cs.CL) [pdf, other]: Title: Text-To-Speech Data Augmentation for Low Resource Speech Recognition

Rodolfo Zevallos

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[123] arXiv:2204.00311 (cross-list from cs.SD) [pdf, other]: Title: Speaker verification in mismatch training and testing conditions

Marcos Faundez-Zanuy, Adam Slupinski

Comments: 4 pages, published in 6th international conference on spoken language processing (ICSLP 2000), Vol. II, pp.322-325. ICSLP 2000, ISBN 7-80150-144-4/G.18Beijing (China). October 16-20, 2000. arXiv admin note: substantial text overlap with arXiv:2203.00513

Journal-ref: 6th international conference on spoken language processing (ICSLP 2000), Vol. II, pp.322-325, 2000

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[124] arXiv:2204.00331 (cross-list from cs.SD) [pdf, other]: Title: Using segment-based features of jaw movements to recognize foraging activities in grazing cattle

José O. Chelotti, Sebastián R. Vanrell, Luciano S. Martinez-Rau, Julio R. Galli, Santiago A. Utsumi, Alejandra M. Planisich, Suyai A. Almirón, Diego H. Milone, Leonardo L. Giovanini, H. Leonardo Rufiner

Comments: Preprint submitted to journal

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125] arXiv:2204.00348 (cross-list from cs.CL) [pdf, other]: Title: WavFT: Acoustic model finetuning with labelled and unlabelled data

Utkarsh Chauhan, Vikas Joshi, Rupesh R. Mehta

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
