Sound

Authors and titles for April 2022

Total of 291 entries (entries 126-150 shown)

[126] arXiv:2204.13094 [pdf, other]: Title: Unsupervised Word Segmentation using K Nearest Neighbors

Tzeviya Sylvia Fuchs, Yedid Hoshen, Joseph Keshet

Comments: Submitted to interspeech 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[127] arXiv:2204.13206 [pdf, other]: Title: Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations

Dan Oneata, Horia Cucu

Comments: Accepted at the Multimodal Learning and Applications Workshop (MULA) from CVPR 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[128] arXiv:2204.13289 [pdf, other]: Title: Music Enhancement via Image Translation and Vocoding

Nikhil Kandpal, Oriol Nieto, Zeyu Jin

Comments: ICASSP 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[129] arXiv:2204.13430 [pdf, other]: Title: Pseudo strong labels for large scale weakly supervised audio tagging

Heinrich Dinkel, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang

Comments: Accepted by ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[130] arXiv:2204.13437 [pdf, other]: Title: Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss

Efthymios Georgiou, Kosmas Kritsis, Georgios Paraskevopoulos, Athanasios Katsamanis, Vassilis Katsouros, Alexandros Potamianos

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[131] arXiv:2204.13601 [pdf, other]: Title: Emotion Recognition In Persian Speech Using Deep Neural Networks

Ali Yazdani, Hossein Simchi, Yasser Shekofteh

Comments: 5 pages, 1 figure, 3 tables

Journal-ref: 11th International Conference on Computer and Knowledge Engineering (ICCKE 2021)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[132] arXiv:2204.13668 [pdf, other]: Title: Unaligned Supervision For Automatic Music Transcription in The Wild

Ben Maman, Amit H. Bermano

Comments: 16 pages, project page available at this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[133] arXiv:2204.14057 [pdf, other]: Title: Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast

Boqing Zhu, Kele Xu, Changjian Wang, Zheng Qin, Tao Sun, Huaimin Wang, Yuxing Peng

Comments: 8 pages, 4 figures. Accepted by IJCAI-2022

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[134] arXiv:2204.00065 (cross-list from eess.AS) [pdf, other]: Title: Importance of Different Temporal Modulations of Speech: A Tale of Two Perspectives

Samik Sadhu, Hynek Hermansky

Comments: Submitted to ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[135] arXiv:2204.00164 (cross-list from cs.CL) [pdf, other]: Title: Filter-based Discriminative Autoencoders for Children Speech Recognition

Chiang-Lin Tai, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

Comments: Published in EUSIPCO 2022

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[136] arXiv:2204.00170 (cross-list from eess.AS) [pdf, other]: Title: Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis

Fan-Lin Wang, Po-chun Hsu, Da-rong Liu, Hung-yi Lee

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[137] arXiv:2204.00174 (cross-list from cs.CL) [pdf, other]: Title: InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR

Yu Nakagome, Tatsuya Komatsu, Yusuke Fujita, Shuta Ichimura, Yusuke Kida

Comments: This paper was submitted to INTERSPEECH2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[138] arXiv:2204.00175 (cross-list from cs.CL) [pdf, other]: Title: Alternate Intermediate Conditioning with Syllable-level and Character-level Targets for Japanese ASR

Yusuke Fujita, Tatsuya Komatsu, Yusuke Kida

Comments: SLT 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[139] arXiv:2204.00176 (cross-list from cs.CL) [pdf, other]: Title: Better Intermediates Improve CTC Inference

Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji Watanabe, Yusuke Kida

Comments: 5 pages, submitted INTERSPEECH2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[140] arXiv:2204.00212 (cross-list from cs.CL) [pdf, other]: Title: Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems

Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Nobuyasu Itoh, George Saon

Comments: Accepted to Interspeech 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[141] arXiv:2204.00218 (cross-list from eess.AS) [pdf, other]: Title: End-to-End Multi-speaker ASR with Independent Vector Analysis

Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, Yanmin Qian

Comments: Submitted to INTERSPEECH2022. 5 pages, 2 figures, 3 tables

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[142] arXiv:2204.00291 (cross-list from cs.CL) [pdf, other]: Title: Text-To-Speech Data Augmentation for Low Resource Speech Recognition

Rodolfo Zevallos

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[143] arXiv:2204.00348 (cross-list from cs.CL) [pdf, other]: Title: WavFT: Acoustic model finetuning with labelled and unlabelled data

Utkarsh Chauhan, Vikas Joshi, Rupesh R. Mehta

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[144] arXiv:2204.00436 (cross-list from eess.AS) [pdf, other]: Title: AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios

Yihan Wu, Xu Tan, Bohan Li, Lei He, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu

Comments: 5 pages, 2 tables, 2 figure. Submitted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[145] arXiv:2204.00555 (cross-list from eess.AS) [pdf, other]: Title: 1-D CNN based Acoustic Scene Classification via Reducing Layer-wise Dimensionality

Arshdeep Singh

Comments: No comments

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[146] arXiv:2204.00558 (cross-list from cs.CL) [pdf, other]: Title: Multi-task RNN-T with Semantic Decoder for Streamable Spoken Language Understanding

Xuandi Fu, Feng-Ju Chang, Martin Radfar, Kai Wei, Jing Liu, Grant P. Strimel, Kanthashree Mysore Sathyendra

Comments: Accepted at ICASSP 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[147] arXiv:2204.00604 (cross-list from cs.CV) [pdf, other]: Title: Quantized GAN for Complex Music Generation from Dance Videos

Ye Zhu, Kyle Olszewski, Yu Wu, Panos Achlioptas, Menglei Chai, Yan Yan, Sergey Tulyakov

Comments: Dataset and code at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[148] arXiv:2204.00618 (cross-list from eess.AS) [pdf, other]: Title: ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion

Edresson Casanova, Christopher Shulby, Alexander Korolev, Arnaldo Candido Junior, Anderson da Silva Soares, Sandra Aluísio, Moacir Antonelli Ponti

Comments: This paper was accepted at INTERSPEECH 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[149] arXiv:2204.00657 (cross-list from eess.AS) [pdf, other]: Title: Multimodal Clustering with Role Induced Constraints for Speaker Diarization

Nikolaos Flemotomos, Shrikanth Narayanan

Comments: To appear at Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[150] arXiv:2204.00679 (cross-list from cs.CV) [pdf, other]: Title: Learning Audio-Video Modalities from Image Captions

Arsha Nagrani, Paul Hongsuck Seo, Bryan Seybold, Anja Hauth, Santiago Manen, Chen Sun, Cordelia Schmid

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
