Sound

Authors and titles for April 2022

Total of 291 entries : 1-25 ... 176-200 201-225 226-250 251-275 276-291
[251] arXiv:2204.08858 (cross-list from eess.AS) [pdf, other]
Title: An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition
Niko Moritz, Frank Seide, Duc Le, Jay Mahadeokar, Christian Fuegen
Comments: Accepted to SLT 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[252] arXiv:2204.08920 (cross-list from cs.CL) [pdf, other]
Title: Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation
Keqi Deng, Shinji Watanabe, Jiatong Shi, Siddhant Arora
Comments: Submitted to Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[253] arXiv:2204.09028 (cross-list from cs.CL) [pdf, other]
Title: On the Locality of Attention in Direct Speech Translation
Belen Alastruey, Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà
Comments: ACL-SRW 2022. Equal contribution between Belen Alastruey and Javier Ferrando
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[254] arXiv:2204.09079 (cross-list from eess.AS) [pdf, other]
Title: Music Source Separation with Generative Flow
Ge Zhu, Jordan Darefsky, Fei Jiang, Anton Selitskiy, Zhiyao Duan
Comments: Accepted by Signal Processing Letters
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[255] arXiv:2204.09227 (cross-list from cs.CL) [pdf, other]
Title: Cross-stitched Multi-modal Encoders
Karan Singla, Daniel Pressel, Ryan Price, Bhargav Srinivas Chinnari, Yeon-Jun Kim, Srinivas Bangalore
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[256] arXiv:2204.09595 (cross-list from cs.CL) [pdf, other]
Title: Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation
Chih-Chiang Chang, Hung-yi Lee
Comments: INTERSPEECH 2022 camera ready
Journal-ref: Proc. Interspeech 2022, 5175-5179
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[257] arXiv:2204.09606 (cross-list from cs.CL) [pdf, other]
Title: Detecting Unintended Memorization in Language-Model-Fused ASR
W. Ronny Huang, Steve Chien, Om Thakkar, Rajiv Mathews
Comments: Interspeech 2022
Subjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[258] arXiv:2204.09919 (cross-list from cs.HC) [pdf, other]
Title: Sonic Interactions in Virtual Environments: the Egocentric Audio Perspective of the Digital Twin
Michele Geronazzo, Stefania Serafin
Comments: 46 pages, 5 figures. Pre-print version of the introduction to the book "Sonic Interactions in Virtual Environments" in press for Springer's Human-Computer Interaction Series, Open Access license. The pre-print editors' copy of the book can be found at this https URL - full book info: this https URL
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[259] arXiv:2204.09934 (cross-list from eess.AS) [pdf, other]
Title: FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis
Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, Zhou Zhao
Comments: Accepted by IJCAI 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[260] arXiv:2204.10020 (cross-list from eess.AS) [pdf, other]
Title: Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation
Ryo Terashima, Ryuichi Yamamoto, Eunwoo Song, Yuma Shirahata, Hyun-Wook Yoon, Jae-Min Kim, Kentaro Tachibana
Comments: Accepted to INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[261] arXiv:2204.10228 (cross-list from eess.AS) [pdf, other]
Title: The NIST CTS Speaker Recognition Challenge
Seyed Omid Sadjadi, Craig Greenberg, Elliot Singer, Lisa Mason, Douglas Reynolds
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[262] arXiv:2204.10242 (cross-list from eess.AS) [pdf, other]
Title: The 2021 NIST Speaker Recognition Evaluation
Seyed Omid Sadjadi, Craig Greenberg, Elliot Singer, Lisa Mason, Douglas Reynolds
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Image and Video Processing (eess.IV)
[263] arXiv:2204.10289 (cross-list from physics.bio-ph) [pdf, other]
Title: Wave-like behaviour in (0,1) binary sequences
E. Canessa
Comments: 6 pages, 5 figures
Subjects: Biological Physics (physics.bio-ph); Sound (cs.SD)
[264] arXiv:2204.10461 (cross-list from cs.CL) [pdf, other]
Title: WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment
Lin Yao, Jianfei Song, Ruizhuo Xu, Yingfang Yang, Zijian Chen, Yafeng Deng
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[265] arXiv:2204.10586 (cross-list from cs.CL) [pdf, other]
Title: Efficient Training of Neural Transducer for Speech Recognition
Wei Zhou, Wilfried Michel, Ralf Schlüter, Hermann Ney
Comments: accepted at Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[266] arXiv:2204.10593 (cross-list from cs.CL) [pdf, other]
Title: LibriS2S: A German-English Speech-to-Speech Translation Corpus
Pedro Jeuris, Jan Niehues
Comments: Accepted to LREC 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[267] arXiv:2204.11030 (cross-list from eess.AS) [pdf, other]
Title: Improving Self-Supervised Learning-based MOS Prediction Networks
Bálint Gyires-Tóth, Csaba Zainkó
Comments: Submitted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[268] arXiv:2204.11032 (cross-list from eess.AS) [pdf, other]
Title: Heterogeneous Separation Consistency Training for Adaptation of Unsupervised Speech Separation
Jiangyu Han, Yanhua Long
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[269] arXiv:2204.11180 (cross-list from eess.AS) [pdf, other]
Title: Few-Shot Speaker Identification Using Depthwise Separable Convolutional Network with Channel Attention
Yanxiong Li, Wucheng Wang, Hao Chen, Wenchang Cao, Wei Li, Qianhua He
Comments: Accepted by Odyssey 2022 (The Speaker and Language Recognition Workshop 2022, Beijing, China)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[270] arXiv:2204.11232 (cross-list from eess.AS) [pdf, other]
Title: Improving the Naturalness of Simulated Conversations for End-to-End Neural Diarization
Natsuo Yamashita, Shota Horiguchi, Takeshi Homma
Comments: Accepted to Speaker Odyssey 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[271] arXiv:2204.11286 (cross-list from eess.AS) [pdf, other]
Title: Improved far-field speech recognition using Joint Variational Autoencoder
Shashi Kumar, Shakti P. Rath, Abhishek Pandey
Comments: 5 pages, 2 figures, 3 tables
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[272] arXiv:2204.11420 (cross-list from cs.CV) [pdf, other]
Title: Audio-Visual Scene Classification Using A Transfer Learning Based Joint Optimization Strategy
Chengxin Chen, Meng Wang, Pengyuan Zhang
Comments: 5 pages, 2 figures, based on the work that won first place in the challenge of DCASE2021 Task 1B
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[273] arXiv:2204.11501 (cross-list from eess.AS) [pdf, other]
Title: Graph Convolutional Network Based Semi-Supervised Learning on Multi-Speaker Meeting Data
Fuchuan Tong, Siqi Zheng, Min Zhang, Yafeng Chen, Hongbin Suo, Qingyang Hong, Lin Li
Comments: Accepted by ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[274] arXiv:2204.11550 (cross-list from cs.CL) [pdf, other]
Title: Speech Detection For Child-Clinician Conversations In Danish For Low-Resource In-The-Wild Conditions: A Case Study
Sneha Das, Nicole Nadine Lønfeldt, Anne Katrine Pagsberg, Line. H. Clemmensen
Comments: 5 pages. Submitted to Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[275] arXiv:2204.11775 (cross-list from quant-ph) [pdf, other]
Title: A quantum Fourier transform (QFT) based note detection algorithm
Shlomo Kashani, Maryam Alqasemi, Jacob Hammond
Subjects: Quantum Physics (quant-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS)