Sound

Authors and titles for March 2023

Total of 232 entries : 1-25 76-100 101-125 126-150 151-175 176-200 201-225 226-232

[151] arXiv:2303.07624 (cross-list from cs.CL) [pdf, other]: Title: I3D: Transformer architectures with input-dependent dynamic depth for speech recognition

Yifan Peng, Jaesong Lee, Shinji Watanabe

Comments: Accepted at ICASSP 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[152] arXiv:2303.07650 (cross-list from cs.CL) [pdf, other]: Title: Cross-lingual Alzheimer's Disease detection based on paralinguistic and pre-trained features

Xuchu Chen, Yu Pu, Jinpeng Li, Wei-Qiang Zhang

Comments: accepted by ICASSP 2023

Journal-ref: ICASSP (2023)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[153] arXiv:2303.07704 (cross-list from eess.AS) [pdf, other]: Title: TEA-PSE 3.0: Tencent-Ethereal-Audio-Lab Personalized Speech Enhancement System For ICASSP 2023 DNS Challenge

Yukai Ju, Jun Chen, Shimin Zhang, Shulin He, Wei Rao, Weixin Zhu, Yannan Wang, Tao Yu, Shidong Shang

Comments: Accepted by ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[154] arXiv:2303.07739 (cross-list from eess.SP) [pdf, other]: Title: Detecting post-stroke aphasia using EEG-based neural envelope tracking of natural speech

Pieter De Clercq, Jill Kries, Ramtin Mehraram, Jonas Vanthornhout, Tom Francart, Maaike Vandermosten

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[155] arXiv:2303.07816 (cross-list from eess.AS) [pdf, other]: Title: Multi-Channel Masking with Learnable Filterbank for Sound Source Separation

Wang Dai, Archontis Politis, Tuomas Virtanen

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[156] arXiv:2303.07924 (cross-list from cs.LG) [pdf, other]: Title: Improving Accented Speech Recognition with Multi-Domain Training

Lucas Maison, Yannick Estève

Comments: 5 pages, 2 figures. Accepted to ICASSP 2023

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[157] arXiv:2303.08005 (cross-list from eess.AS) [pdf, other]: Title: Native Multi-Band Audio Coding within Hyper-Autoencoded Reconstruction Propagation Networks

Darius Petermann, Inseon Jang, Minje Kim

Comments: Accepted to ICASSP 2023. For resources and examples, see this https URL

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[158] arXiv:2303.08019 (cross-list from eess.AS) [pdf, other]: Title: Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection

Jinchao Li, Kaitao Song, Junan Li, Bo Zheng, Dongsheng Li, Xixin Wu, Xunying Liu, Helen Meng

Comments: 5 pages, 3 figures, 3 tables

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[159] arXiv:2303.08027 (cross-list from eess.AS) [pdf, other]: Title: A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition

Jinchao Li, Xixin Wu, Kaitao Song, Dongsheng Li, Xunying Liu, Helen Meng

Comments: 5 pages, 3 figures, 5 tables

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[160] arXiv:2303.08052 (cross-list from eess.AS) [pdf, other]: Title: Localizing Spatial Information in Neural Spatiospectral Filters

Annika Briegleb, Thomas Haubner, Vasileios Belagiannis, Walter Kellermann

Comments: Accepted to the 31st European Signal Processing Conference (EUSIPCO 2023), Helsinki, Finland. 5 pages, 3 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[161] arXiv:2303.08268 (cross-list from cs.RO) [pdf, other]: Title: Chat with the Environment: Interactive Multimodal Perception Using Large Language Models

Xufeng Zhao, Mengdi Li, Cornelius Weber, Muhammad Burhan Hafez, Stefan Wermter

Comments: IROS2023, Detroit. See the project website at this https URL

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[162] arXiv:2303.08295 (cross-list from eess.SP) [pdf, other]: Title: A large-scale multimodal dataset of human speech recognition

Yao Ge, Chong Tang, Haobo Li, Zikang Zhang, Wenda Li, Kevin Chetty, Daniele Faccio, Qammer H. Abbasi, Muhammad Imran

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[163] arXiv:2303.08343 (cross-list from eess.AS) [pdf, other]: Title: Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models

Steven M. Hernandez, Ding Zhao, Shaojin Ding, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He, Ian McGraw

Comments: Accepted to IEEE ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[164] arXiv:2303.08372 (cross-list from eess.AS) [pdf, other]: Title: Target Sound Extraction with Variable Cross-modality Clues

Chenda Li, Yao Qian, Zhuo Chen, Dongmei Wang, Takuya Yoshioka, Shujie Liu, Yanmin Qian, Michael Zeng

Comments: Accepted by ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[165] arXiv:2303.08379 (cross-list from eess.AS) [pdf, other]: Title: Implementing Continuous HRTF Measurement in Near-Field

Ee-Leng Tan, Santi Peksi, Woon-Seng Gan

Comments: 5 pages, 9 figures, Submitted to 2023 IEEE International Conference on Acoustics, Speech and Signal Processing

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[166] arXiv:2303.08480 (cross-list from eess.AS) [pdf, other]: Title: Acoustic source localization in the spherical harmonics domain exploiting low-rank approximations

Maximo Cobos, Mirco Pezzoli, Fabio Antonacci, Augusto Sarti

Comments: To appear in ICASSP 2023

Journal-ref: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[167] arXiv:2303.08536 (cross-list from cs.MM) [pdf, other]: Title: Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring

Joanna Hong, Minsu Kim, Jeongsoo Choi, Yong Man Ro

Comments: Accepted at CVPR 2023. Implementation available: this https URL

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[168] arXiv:2303.08636 (cross-list from eess.AS) [pdf, other]: Title: HYBRIDFORMER: improving SqueezeFormer with hybrid attention and NSR mechanism

Yuguang Yang, Yu Pan, Jingjing Yin, Jiangyu Han, Lei Ma, Heng Lu

Comments: Accepted by ICASSP2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[169] arXiv:2303.08670 (cross-list from cs.CV) [pdf, other]: Title: Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video

Minsu Kim, Chae Won Kim, Yong Man Ro

Comments: Accepted in AAAI2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[170] arXiv:2303.08674 (cross-list from eess.AS) [pdf, other]: Title: Speech Signal Improvement Using Causal Generative Diffusion Models

Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Tal Peer, Timo Gerkmann

Comments: Accepted by ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[171] arXiv:2303.08702 (cross-list from eess.AS) [pdf, other]: Title: Beamformer-Guided Target Speaker Extraction

Mohamed Elminshawi, Srikanth Raj Chetupalli, Emanuël A. P. Habets

Comments: Submitted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[172] arXiv:2303.09057 (cross-list from eess.AS) [pdf, other]: Title: TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion

Hyun Joon Park, Seok Woo Yang, Jin Sob Kim, Wooseok Shin, Sung Won Han

Comments: To appear in ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[173] arXiv:2303.09119 (cross-list from cs.CV) [pdf, other]: Title: Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation

Lingting Zhu, Xian Liu, Xuanyu Liu, Rui Qian, Ziwei Liu, Lequan Yu

Comments: Accepted by IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023. 10 pages, 3 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[174] arXiv:2303.09278 (cross-list from eess.AS) [pdf, other]: Title: DistillW2V2: A Small and Streaming Wav2vec 2.0 Based ASR Model

Yanzhe Fu, Yueteng Kang, Songjun Cao, Long Ma

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[175] arXiv:2303.09404 (cross-list from eess.AS) [pdf, other]: Title: Speech Modeling with a Hierarchical Transformer Dynamical VAE

Xiaoyu Lin, Xiaoyu Bie, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
