Sound

Authors and titles for March 2023

Total of 232 entries : 1-25 ... 101-125 126-150 151-175 176-200 201-225 226-232

[176] arXiv:2303.09438 (cross-list from cs.CL) [pdf, other]: Title: Trustera: A Live Conversation Redaction System

Evandro Gouvêa, Ali Dadgar, Shahab Jalalvand, Rathi Chengalvarayan, Badrinath Jayakumar, Ryan Price, Nicholas Ruiz, Jennifer McGovern, Srinivas Bangalore, Ben Stern

Comments: 5

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[177] arXiv:2303.09455 (cross-list from cs.CL) [pdf, other]: Title: Learning Cross-lingual Visual Speech Representations

Andreas Zinonos, Alexandros Haliassos, Pingchuan Ma, Stavros Petridis, Maja Pantic

Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[178] arXiv:2303.09645 (cross-list from cs.RO) [pdf, other]: Title: Development of a Voice Controlled Robotic Arm

Akkas U. Haque, Humayun Kabir, S. C. Banik, M. T. Islam

Subjects: Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[179] arXiv:2303.09966 (cross-list from eess.AS) [pdf, other]: Title: Magnitude-Corrected and Time-Aligned Interpolation of Head-Related Transfer Functions

Johannes M. Arend, Christoph Pörschmann, Stefan Weinzierl, Fabian Brinkmann

Journal-ref: IEEE/ACM Trans. Audio Speech and Lang. Proc., 31, 3783--3799 (2023)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[180] arXiv:2303.10008 (cross-list from eess.AS) [pdf, other]: Title: Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture

Julien Hauret, Thomas Joubaud, Véronique Zimpfer, Éric Bavu

Comments: Accepted in IEEE/ACM Transactions on Audio, Speech and Language Processing on 14/08/2023

Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing (2023 - Volume: 31) - pp. 3499 - 3512

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[181] arXiv:2303.10160 (cross-list from eess.AS) [pdf, other]: Title: Visual Information Matters for ASR Error Correction

Vanya Bannihatti Kumar, Shanbo Cheng, Ningxin Peng, Yuchen Zhang

Comments: Accepted at ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[182] arXiv:2303.10335 (cross-list from cs.MM) [pdf, other]: Title: Multimodal Continuous Emotion Recognition: A Technical Report for ABAW5

Su Zhang, Ziyuan Zhao, Cuntai Guan

Comments: 6 pages. 1 figure. arXiv admin note: substantial text overlap with arXiv:2203.13031

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[183] arXiv:2303.10384 (cross-list from eess.AS) [pdf, other]: Title: Powerful and Extensible WFST Framework for RNN-Transducer Losses

Aleksandr Laptev, Vladimir Bataev, Igor Gitman, Boris Ginsburg

Comments: To appear in Proc. ICASSP 2023, June 04-10, 2023, Rhodes island, Greece. 5 pages, 5 figures, 3 tables

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[184] arXiv:2303.10510 (cross-list from cs.CL) [pdf, other]: Title: A Deep Learning System for Domain-specific Speech Recognition

Yanan Jia

Comments: 4th International Conference on Natural Language Processing and Computational Linguistics (NLPCL 2023)

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[185] arXiv:2303.10556 (cross-list from eess.AS) [pdf, other]: Title: The Graph feature fusion technique for speaker recognition based on wav2vec2.0 framework

Zirui Ge, Haiyan Guo, Zhen Yang

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[186] arXiv:2303.10721 (cross-list from cs.HC) [pdf, other]: Title: Right the docs: Characterising voice dataset documentation practices used in machine learning

Kathy Reid, Elizabeth T. Williams

Comments: 16 pages, 3 tables, preprint of a submission to AIES 2023

Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[187] arXiv:2303.10727 (cross-list from cs.LG) [pdf, html, other]: Title: ERSAM: Neural Architecture Search For Energy-Efficient and Real-Time Social Ambiance Measurement

Chaojian Li, Wenwan Chen, Jiayi Yuan, Yingyan Celine Lin, Ashutosh Sabharwal

Comments: Accepted by ICASSP'23

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[188] arXiv:2303.10917 (cross-list from eess.AS) [pdf, other]: Title: Knowledge Distillation from Multiple Foundation Models for End-to-End Speech Recognition

Xiaoyu Yang, Qiujia Li, Chao Zhang, Philip C. Woodland

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[189] arXiv:2303.10931 (cross-list from stat.ML) [pdf, other]: Title: Approaching an unknown communication system by latent space exploration and causal inference

Gašper Beguš, Andrej Leban, Shane Gero

Comments: 25 pages, 23 figures; new format and section layout (moved some sections to the appendix), added replication experiments, updated references: to a subsequent experimental validation of the work, as well as to related methodological work

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[190] arXiv:2303.10942 (cross-list from cs.CL) [pdf, other]: Title: On-the-fly Text Retrieval for End-to-End ASR Adaptation

Bolaji Yusuf, Aditya Gourav, Ankur Gandhe, Ivan Bulyko

Comments: Accepted to ICASSP 2023; Appendix added to include ablations that could not fit into the conference 4-page limit

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[191] arXiv:2303.10949 (cross-list from eess.AS) [pdf, other]: Title: Code-Switching Text Generation and Injection in Mandarin-English ASR

Haibin Yu, Yuxuan Hu, Yao Qian, Ma Jin, Linquan Liu, Shujie Liu, Yu Shi, Yanmin Qian, Edward Lin, Michael Zeng

Comments: Accepted by ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[192] arXiv:2303.11089 (cross-list from cs.CV) [pdf, other]: Title: EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation

Ziqiao Peng, Haoyu Wu, Zhenbo Song, Hao Xu, Xiangyu Zhu, Jun He, Hongyan Liu, Zhaoxin Fan

Comments: Accepted by ICCV 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[193] arXiv:2303.11131 (cross-list from cs.CL) [pdf, other]: Title: Cocktail HuBERT: Generalized Self-Supervised Pre-training for Mixture and Single-Source Speech

Maryam Fazel-Zarandi, Wei-Ning Hsu

Comments: ICASSP 2023

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[194] arXiv:2303.11329 (cross-list from cs.CV) [pdf, other]: Title: Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation

Ziyang Chen, Shengyi Qian, Andrew Owens

Comments: ICCV 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[195] arXiv:2303.11551 (cross-list from cs.CV) [pdf, other]: Title: ModEFormer: Modality-Preserving Embedding for Audio-Video Synchronization using Transformers

Akash Gupta, Rohun Tripathi, Wondong Jang

Comments: Paper accepted at ICASSP 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[196] arXiv:2303.11607 (cross-list from cs.CL) [pdf, html, other]: Title: Transformers in Speech Processing: A Survey

Siddique Latif, Aun Zaidi, Heriberto Cuayahuitl, Fahad Shamshad, Moazzam Shoukat, Muhammad Usama, Junaid Qadir

Comments: Accepted in Computer Science Review 2025

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[197] arXiv:2303.12002 (cross-list from eess.AS) [pdf, html, other]: Title: End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations

Giovanni Morrone, Samuele Cornell, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini

Comments: 16 pages, 7 figures

Journal-ref: Speech Communication 161 (2024) 103081

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[198] arXiv:2303.12187 (cross-list from eess.AS) [pdf, other]: Title: Practice of the conformer enhanced AUDIO-VISUAL HUBERT on Mandarin and English

Xiaoming Ren, Chao Li, Shenjian Wang, Biao Li

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[199] arXiv:2303.12337 (cross-list from cs.MM) [pdf, other]: Title: Music-Driven Group Choreography

Nhat Le, Thang Pham, Tuong Do, Erman Tjiputra, Quang D. Tran, Anh Nguyen

Comments: accepted in CVPR 2023

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[200] arXiv:2303.12659 (cross-list from cs.AI) [pdf, other]: Title: Posthoc Interpretation via Quantization

Francesco Paissan, Cem Subakan, Mirco Ravanelli

Comments: Francesco Paissan and Cem Subakan contributed equally

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
