Sound

Authors and titles for April 2022

Total of 291 entries : 1-25 ... 176-200 201-225 226-250 251-275 276-291


[251] arXiv:2204.08858 (cross-list from eess.AS) [pdf, other]: Title: An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition

Niko Moritz, Frank Seide, Duc Le, Jay Mahadeokar, Christian Fuegen

Comments: Accepted to SLT 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[252] arXiv:2204.08920 (cross-list from cs.CL) [pdf, other]: Title: Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation

Keqi Deng, Shinji Watanabe, Jiatong Shi, Siddhant Arora

Comments: Submitted to Interspeech 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[253] arXiv:2204.09028 (cross-list from cs.CL) [pdf, other]: Title: On the Locality of Attention in Direct Speech Translation

Belen Alastruey, Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà

Comments: ACL-SRW 2022. Equal contribution between Belen Alastruey and Javier Ferrando

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[254] arXiv:2204.09079 (cross-list from eess.AS) [pdf, other]: Title: Music Source Separation with Generative Flow

Ge Zhu, Jordan Darefsky, Fei Jiang, Anton Selitskiy, Zhiyao Duan

Comments: Accepted by Signal Processing Letters

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[255] arXiv:2204.09227 (cross-list from cs.CL) [pdf, other]: Title: Cross-stitched Multi-modal Encoders

Karan Singla, Daniel Pressel, Ryan Price, Bhargav Srinivas Chinnari, Yeon-Jun Kim, Srinivas Bangalore

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[256] arXiv:2204.09595 (cross-list from cs.CL) [pdf, other]: Title: Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation

Chih-Chiang Chang, Hung-yi Lee

Comments: INTERSPEECH 2022 camera ready

Journal-ref: Proc. Interspeech 2022, 5175-5179

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[257] arXiv:2204.09606 (cross-list from cs.CL) [pdf, other]: Title: Detecting Unintended Memorization in Language-Model-Fused ASR

W. Ronny Huang, Steve Chien, Om Thakkar, Rajiv Mathews

Comments: Interspeech 2022

Subjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[258] arXiv:2204.09919 (cross-list from cs.HC) [pdf, other]: Title: Sonic Interactions in Virtual Environments: the Egocentric Audio Perspective of the Digital Twin

Michele Geronazzo, Stefania Serafin

Comments: 46 pages, 5 figures. Pre-print version of the introduction to the book "Sonic Interactions in Virtual Environments" in press for Springer's Human-Computer Interaction Series, Open Access license. The pre-print editors' copy of the book can be found at this https URL - full book info: this https URL

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[259] arXiv:2204.09934 (cross-list from eess.AS) [pdf, other]: Title: FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis

Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, Zhou Zhao

Comments: Accepted by IJCAI 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[260] arXiv:2204.10020 (cross-list from eess.AS) [pdf, other]: Title: Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation

Ryo Terashima, Ryuichi Yamamoto, Eunwoo Song, Yuma Shirahata, Hyun-Wook Yoon, Jae-Min Kim, Kentaro Tachibana

Comments: Accepted to INTERSPEECH 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[261] arXiv:2204.10228 (cross-list from eess.AS) [pdf, other]: Title: The NIST CTS Speaker Recognition Challenge

Seyed Omid Sadjadi, Craig Greenberg, Elliot Singer, Lisa Mason, Douglas Reynolds

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[262] arXiv:2204.10242 (cross-list from eess.AS) [pdf, other]: Title: The 2021 NIST Speaker Recognition Evaluation

Seyed Omid Sadjadi, Craig Greenberg, Elliot Singer, Lisa Mason, Douglas Reynolds

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Image and Video Processing (eess.IV)
[263] arXiv:2204.10289 (cross-list from physics.bio-ph) [pdf, other]: Title: Wave-like behaviour in (0,1) binary sequences

E. Canessa

Comments: 6 pages, 5 figures

Subjects: Biological Physics (physics.bio-ph); Sound (cs.SD)
[264] arXiv:2204.10461 (cross-list from cs.CL) [pdf, other]: Title: WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment

Lin Yao, Jianfei Song, Ruizhuo Xu, Yingfang Yang, Zijian Chen, Yafeng Deng

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[265] arXiv:2204.10586 (cross-list from cs.CL) [pdf, other]: Title: Efficient Training of Neural Transducer for Speech Recognition

Wei Zhou, Wilfried Michel, Ralf Schlüter, Hermann Ney

Comments: accepted at Interspeech 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[266] arXiv:2204.10593 (cross-list from cs.CL) [pdf, other]: Title: LibriS2S: A German-English Speech-to-Speech Translation Corpus

Pedro Jeuris, Jan Niehues

Comments: Accepted to LREC 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[267] arXiv:2204.11030 (cross-list from eess.AS) [pdf, other]: Title: Improving Self-Supervised Learning-based MOS Prediction Networks

Bálint Gyires-Tóth, Csaba Zainkó

Comments: Submitted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[268] arXiv:2204.11032 (cross-list from eess.AS) [pdf, other]: Title: Heterogeneous Separation Consistency Training for Adaptation of Unsupervised Speech Separation

Jiangyu Han, Yanhua Long

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[269] arXiv:2204.11180 (cross-list from eess.AS) [pdf, other]: Title: Few-Shot Speaker Identification Using Depthwise Separable Convolutional Network with Channel Attention

Yanxiong Li, Wucheng Wang, Hao Chen, Wenchang Cao, Wei Li, Qianhua He

Comments: Accepted by Odyssey 2022 (The Speaker and Language Recognition Workshop 2022, Beijing, China)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[270] arXiv:2204.11232 (cross-list from eess.AS) [pdf, other]: Title: Improving the Naturalness of Simulated Conversations for End-to-End Neural Diarization

Natsuo Yamashita, Shota Horiguchi, Takeshi Homma

Comments: Accepted to Speaker Odyssey 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[271] arXiv:2204.11286 (cross-list from eess.AS) [pdf, other]: Title: Improved far-field speech recognition using Joint Variational Autoencoder

Shashi Kumar, Shakti P. Rath, Abhishek Pandey

Comments: 5 pages, 2 figures, 3 tables

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[272] arXiv:2204.11420 (cross-list from cs.CV) [pdf, other]: Title: Audio-Visual Scene Classification Using A Transfer Learning Based Joint Optimization Strategy

Chengxin Chen, Meng Wang, Pengyuan Zhang

Comments: 5 pages, 2 figures, based on the work that won first place in the challenge of DCASE2021 Task 1B

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[273] arXiv:2204.11501 (cross-list from eess.AS) [pdf, other]: Title: Graph Convolutional Network Based Semi-Supervised Learning on Multi-Speaker Meeting Data

Fuchuan Tong, Siqi Zheng, Min Zhang, Yafeng Chen, Hongbin Suo, Qingyang Hong, Lin Li

Comments: Accepted by ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[274] arXiv:2204.11550 (cross-list from cs.CL) [pdf, other]: Title: Speech Detection For Child-Clinician Conversations In Danish For Low-Resource In-The-Wild Conditions: A Case Study

Sneha Das, Nicole Nadine Lønfeldt, Anne Katrine Pagsberg, Line H. Clemmensen

Comments: 5 pages. Submitted to Interspeech 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[275] arXiv:2204.11775 (cross-list from quant-ph) [pdf, other]: Title: A quantum Fourier transform (QFT) based note detection algorithm

Shlomo Kashani, Maryam Alqasemi, Jacob Hammond

Subjects: Quantum Physics (quant-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS)
