Sound

Authors and titles for April 2022

Total of 291 entries : 1-50 101-150 151-200 201-250 251-291

[251] arXiv:2204.08858 (cross-list from eess.AS) [pdf, other]: Title: An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition

Niko Moritz, Frank Seide, Duc Le, Jay Mahadeokar, Christian Fuegen

Comments: Accepted to SLT 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[252] arXiv:2204.08920 (cross-list from cs.CL) [pdf, other]: Title: Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation

Keqi Deng, Shinji Watanabe, Jiatong Shi, Siddhant Arora

Comments: Submitted to Interspeech2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[253] arXiv:2204.09028 (cross-list from cs.CL) [pdf, other]: Title: On the Locality of Attention in Direct Speech Translation

Belen Alastruey, Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà

Comments: ACL-SRW 2022. Equal contribution between Belen Alastruey and Javier Ferrando

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[254] arXiv:2204.09079 (cross-list from eess.AS) [pdf, other]: Title: Music Source Separation with Generative Flow

Ge Zhu, Jordan Darefsky, Fei Jiang, Anton Selitskiy, Zhiyao Duan

Comments: Accepted by Signal Processing Letters

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[255] arXiv:2204.09227 (cross-list from cs.CL) [pdf, other]: Title: Cross-stitched Multi-modal Encoders

Karan Singla, Daniel Pressel, Ryan Price, Bhargav Srinivas Chinnari, Yeon-Jun Kim, Srinivas Bangalore

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[256] arXiv:2204.09595 (cross-list from cs.CL) [pdf, other]: Title: Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation

Chih-Chiang Chang, Hung-yi Lee

Comments: INTERSPEECH 2022 camera ready

Journal-ref: Proc. Interspeech 2022, 5175-5179

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[257] arXiv:2204.09606 (cross-list from cs.CL) [pdf, other]: Title: Detecting Unintended Memorization in Language-Model-Fused ASR

W. Ronny Huang, Steve Chien, Om Thakkar, Rajiv Mathews

Comments: Interspeech 2022

Subjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[258] arXiv:2204.09919 (cross-list from cs.HC) [pdf, other]: Title: Sonic Interactions in Virtual Environments: the Egocentric Audio Perspective of the Digital Twin

Michele Geronazzo, Stefania Serafin

Comments: 46 pages, 5 figures. Pre-print version of the introduction to the book "Sonic Interactions in Virtual Environments" in press for Springer's Human-Computer Interaction Series, Open Access license. The pre-print editors' copy of the book can be found at this https URL - full book info: this https URL

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[259] arXiv:2204.09934 (cross-list from eess.AS) [pdf, other]: Title: FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis

Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, Zhou Zhao

Comments: Accepted by IJCAI 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[260] arXiv:2204.10020 (cross-list from eess.AS) [pdf, other]: Title: Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation

Ryo Terashima, Ryuichi Yamamoto, Eunwoo Song, Yuma Shirahata, Hyun-Wook Yoon, Jae-Min Kim, Kentaro Tachibana

Comments: Accepted to INTERSPEECH 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[261] arXiv:2204.10228 (cross-list from eess.AS) [pdf, other]: Title: The NIST CTS Speaker Recognition Challenge

Seyed Omid Sadjadi, Craig Greenberg, Elliot Singer, Lisa Mason, Douglas Reynolds

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[262] arXiv:2204.10242 (cross-list from eess.AS) [pdf, other]: Title: The 2021 NIST Speaker Recognition Evaluation

Seyed Omid Sadjadi, Craig Greenberg, Elliot Singer, Lisa Mason, Douglas Reynolds

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Image and Video Processing (eess.IV)
[263] arXiv:2204.10289 (cross-list from physics.bio-ph) [pdf, other]: Title: Wave-like behaviour in (0,1) binary sequences

E. Canessa

Comments: 6 pages, 5 figures

Subjects: Biological Physics (physics.bio-ph); Sound (cs.SD)
[264] arXiv:2204.10461 (cross-list from cs.CL) [pdf, other]: Title: WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment

Lin Yao, Jianfei Song, Ruizhuo Xu, Yingfang Yang, Zijian Chen, Yafeng Deng

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[265] arXiv:2204.10586 (cross-list from cs.CL) [pdf, other]: Title: Efficient Training of Neural Transducer for Speech Recognition

Wei Zhou, Wilfried Michel, Ralf Schlüter, Hermann Ney

Comments: accepted at Interspeech 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[266] arXiv:2204.10593 (cross-list from cs.CL) [pdf, other]: Title: LibriS2S: A German-English Speech-to-Speech Translation Corpus

Pedro Jeuris, Jan Niehues

Comments: Accepted to LREC 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[267] arXiv:2204.11030 (cross-list from eess.AS) [pdf, other]: Title: Improving Self-Supervised Learning-based MOS Prediction Networks

Bálint Gyires-Tóth, Csaba Zainkó

Comments: Submitted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[268] arXiv:2204.11032 (cross-list from eess.AS) [pdf, other]: Title: Heterogeneous Separation Consistency Training for Adaptation of Unsupervised Speech Separation

Jiangyu Han, Yanhua Long

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[269] arXiv:2204.11180 (cross-list from eess.AS) [pdf, other]: Title: Few-Shot Speaker Identification Using Depthwise Separable Convolutional Network with Channel Attention

Yanxiong Li, Wucheng Wang, Hao Chen, Wenchang Cao, Wei Li, Qianhua He

Comments: Accepted by Odyssey 2022 (The Speaker and Language Recognition Workshop 2022, Beijing, China)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[270] arXiv:2204.11232 (cross-list from eess.AS) [pdf, other]: Title: Improving the Naturalness of Simulated Conversations for End-to-End Neural Diarization

Natsuo Yamashita, Shota Horiguchi, Takeshi Homma

Comments: Accepted to Speaker Odyssey 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[271] arXiv:2204.11286 (cross-list from eess.AS) [pdf, other]: Title: Improved far-field speech recognition using Joint Variational Autoencoder

Shashi Kumar, Shakti P. Rath, Abhishek Pandey

Comments: 5 pages, 2 figures, 3 tables

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[272] arXiv:2204.11420 (cross-list from cs.CV) [pdf, other]: Title: Audio-Visual Scene Classification Using A Transfer Learning Based Joint Optimization Strategy

Chengxin Chen, Meng Wang, Pengyuan Zhang

Comments: 5 pages, 2 figures, based on the work that won first place in the challenge of DCASE2021 Task 1B

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[273] arXiv:2204.11501 (cross-list from eess.AS) [pdf, other]: Title: Graph Convolutional Network Based Semi-Supervised Learning on Multi-Speaker Meeting Data

Fuchuan Tong, Siqi Zheng, Min Zhang, Yafeng Chen, Hongbin Suo, Qingyang Hong, Lin Li

Comments: Accepted by ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[274] arXiv:2204.11550 (cross-list from cs.CL) [pdf, other]: Title: Speech Detection For Child-Clinician Conversations In Danish For Low-Resource In-The-Wild Conditions: A Case Study

Sneha Das, Nicole Nadine Lønfeldt, Anne Katrine Pagsberg, Line H. Clemmensen

Comments: 5 pages. Submitted to Interspeech 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[275] arXiv:2204.11775 (cross-list from quant-ph) [pdf, other]: Title: A quantum Fourier transform (QFT) based note detection algorithm

Shlomo Kashani, Maryam Alqasemi, Jacob Hammond

Subjects: Quantum Physics (quant-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[276] arXiv:2204.11933 (cross-list from eess.AS) [pdf, other]: Title: Cleanformer: A multichannel array configuration-invariant neural enhancement frontend for ASR in smart speakers

Joseph Caroselli, Arun Narayanan, Nathan Howard, Tom O'Malley

Comments: Accepted to ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[277] arXiv:2204.11934 (cross-list from cs.LG) [pdf, other]: Title: On-demand compute reduction with stochastic wav2vec 2.0

Apoorv Vyas, Wei-Ning Hsu, Michael Auli, Alexei Baevski

Comments: submitted to Interspeech, 2022

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[278] arXiv:2204.12076 (cross-list from eess.AS) [pdf, other]: Title: ATST: Audio Representation Learning with Teacher-Student Transformer

Xian Li, Xiaofei Li

Comments: INTERSPEECH2022(Accepted)

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[279] arXiv:2204.12092 (cross-list from eess.AS) [pdf, other]: Title: Mask scalar prediction for improving robust automatic speech recognition

Arun Narayanan, James Walker, Sankaran Panchapagesan, Nathan Howard, Yuma Koizumi

Comments: Submitted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[280] arXiv:2204.12260 (cross-list from eess.AS) [pdf, other]: Title: Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation

Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

Comments: 22 pages, 8 figures. Under the review process

Journal-ref: HEAR: Holistic Evaluation of Audio Representations (NeurIPS 2021 Competition) PMLR 166 (2022) 1-24

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[281] arXiv:2204.12279 (cross-list from eess.AS) [pdf, other]: Title: Low-dimensional representation of infant and adult vocalization acoustics

Silvia Pagliarini, Sara Schneider, Christopher T. Kello, Anne S. Warlaumont

Comments: Under review at Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[282] arXiv:2204.12308 (cross-list from eess.AS) [pdf, other]: Title: Supervised Attention in Sequence-to-Sequence Models for Speech Recognition

Gene-Ping Yang, Hao Tang

Comments: Accepted at ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[283] arXiv:2204.12489 (cross-list from cs.CV) [pdf, other]: Title: Sound Localization by Self-Supervised Time Delay Estimation

Ziyang Chen, David F. Fouhey, Andrew Owens

Comments: ECCV 2022

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[284] arXiv:2204.12649 (cross-list from eess.AS) [pdf, other]: Title: Study on the Fairness of Speaker Verification Systems on Underrepresented Accents in English

Mariel Estevez, Luciana Ferrer

Comments: 5 pages, 2 figures, submitted to INTERSPEECH

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[285] arXiv:2204.12765 (cross-list from cs.CL) [pdf, other]: Title: Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Peidong Wang, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei

Comments: Accepted by INTERSPEECH 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[286] arXiv:2204.12777 (cross-list from eess.AS) [pdf, other]: Title: Ultra Fast Speech Separation Model with Teacher Student Learning

Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu

Comments: Accepted by Interspeech 2021

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[287] arXiv:2204.12922 (cross-list from cs.LG) [pdf, other]: Title: Using the Projected Belief Network at High Dimensions

Paul M Baggenstoss

Journal-ref: EUSIPCO 2022 submission

Subjects: Machine Learning (cs.LG); Sound (cs.SD)
[288] arXiv:2204.13622 (cross-list from eess.SP) [pdf, other]: Title: Fast Cross-Correlation for TDoA Estimation on Small Aperture Microphone Arrays

François Grondin, Marc-Antoine Maheux, Jean-Samuel Lauzon, Jonathan Vincent, François Michaud

Comments: Submitted to IEEE ICASSP 2023

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[289] arXiv:2204.13883 (cross-list from eess.AS) [pdf, other]: Title: Autonomous In-Situ Soundscape Augmentation via Joint Selection of Masker and Gain

Karn N. Watcharasupat, Kenneth Ooi, Bhan Lam, Trevor Wong, Zhen-Ting Ong, Woon-Seng Gan

Comments: Accepted to IEEE Signal Processing Letters. (c) 2022 IEEE

Journal-ref: IEEE Signal Processing Letters, Vol. 29, pp. 1749 - 1753, 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[290] arXiv:2204.13890 (cross-list from eess.AS) [pdf, other]: Title: Deployment of an IoT System for Adaptive In-Situ Soundscape Augmentation

Trevor Wong, Karn N. Watcharasupat, Bhan Lam, Kenneth Ooi, Zhen-Ting Ong, Furi Andi Karnapi, Woon-Seng Gan

Comments: To be presented at the 51st International Congress and Exposition on Noise Control Engineering

Journal-ref: INTER-NOISE and NOISE-CON Congress and Conference Proceedings, Feb. 2022, vol. 265, no. 5, pp. 2013-2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Systems and Control (eess.SY)
[291] arXiv:2204.14272 (cross-list from cs.CL) [pdf, other]: Title: End-to-end Spoken Conversational Question Answering: Task, Dataset and Model

Chenyu You, Nuo Chen, Fenglin Liu, Shen Ge, Xian Wu, Yuexian Zou

Comments: In Findings of NAACL 2022. arXiv admin note: substantial text overlap with arXiv:2010.08923

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
