
Sound (cs.SD)

Authors and titles for March 2023

Total of 232 entries; this page lists entries 176-200.
[176] arXiv:2303.09438 (cross-list from cs.CL) [pdf, other]
Title: Trustera: A Live Conversation Redaction System
Evandro Gouvêa, Ali Dadgar, Shahab Jalalvand, Rathi Chengalvarayan, Badrinath Jayakumar, Ryan Price, Nicholas Ruiz, Jennifer McGovern, Srinivas Bangalore, Ben Stern
Comments: 5
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[177] arXiv:2303.09455 (cross-list from cs.CL) [pdf, other]
Title: Learning Cross-lingual Visual Speech Representations
Andreas Zinonos, Alexandros Haliassos, Pingchuan Ma, Stavros Petridis, Maja Pantic
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[178] arXiv:2303.09645 (cross-list from cs.RO) [pdf, other]
Title: Development of a Voice Controlled Robotic Arm
Akkas U. Haque, Humayun Kabir, S. C. Banik, M. T. Islam
Subjects: Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[179] arXiv:2303.09966 (cross-list from eess.AS) [pdf, other]
Title: Magnitude-Corrected and Time-Aligned Interpolation of Head-Related Transfer Functions
Johannes M. Arend, Christoph Pörschmann, Stefan Weinzierl, Fabian Brinkmann
Journal-ref: IEEE/ACM Trans. Audio Speech and Lang. Proc., 31, 3783--3799 (2023)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[180] arXiv:2303.10008 (cross-list from eess.AS) [pdf, other]
Title: Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture
Julien Hauret, Thomas Joubaud, Véronique Zimpfer, Éric Bavu
Comments: Accepted in IEEE/ACM Transactions on Audio, Speech and Language Processing on 14/08/2023
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing (2023 - Volume: 31) - pp. 3499 - 3512
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[181] arXiv:2303.10160 (cross-list from eess.AS) [pdf, other]
Title: Visual Information Matters for ASR Error Correction
Vanya Bannihatti Kumar, Shanbo Cheng, Ningxin Peng, Yuchen Zhang
Comments: Accepted at ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[182] arXiv:2303.10335 (cross-list from cs.MM) [pdf, other]
Title: Multimodal Continuous Emotion Recognition: A Technical Report for ABAW5
Su Zhang, Ziyuan Zhao, Cuntai Guan
Comments: 6 pages. 1 figure. arXiv admin note: substantial text overlap with arXiv:2203.13031
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[183] arXiv:2303.10384 (cross-list from eess.AS) [pdf, other]
Title: Powerful and Extensible WFST Framework for RNN-Transducer Losses
Aleksandr Laptev, Vladimir Bataev, Igor Gitman, Boris Ginsburg
Comments: To appear in Proc. ICASSP 2023, June 04-10, 2023, Rhodes Island, Greece. 5 pages, 5 figures, 3 tables
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[184] arXiv:2303.10510 (cross-list from cs.CL) [pdf, other]
Title: A Deep Learning System for Domain-specific Speech Recognition
Yanan Jia
Comments: 4th International Conference on Natural Language Processing and Computational Linguistics (NLPCL 2023)
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[185] arXiv:2303.10556 (cross-list from eess.AS) [pdf, other]
Title: The Graph feature fusion technique for speaker recognition based on wav2vec2.0 framework
Zirui Ge, Haiyan Guo, Zhen Yang
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[186] arXiv:2303.10721 (cross-list from cs.HC) [pdf, other]
Title: Right the docs: Characterising voice dataset documentation practices used in machine learning
Kathy Reid, Elizabeth T. Williams
Comments: 16 pages, 3 tables, preprint of a submission to AIES 2023
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[187] arXiv:2303.10727 (cross-list from cs.LG) [pdf, html, other]
Title: ERSAM: Neural Architecture Search For Energy-Efficient and Real-Time Social Ambiance Measurement
Chaojian Li, Wenwan Chen, Jiayi Yuan, Yingyan Celine Lin, Ashutosh Sabharwal
Comments: Accepted by ICASSP'23
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[188] arXiv:2303.10917 (cross-list from eess.AS) [pdf, other]
Title: Knowledge Distillation from Multiple Foundation Models for End-to-End Speech Recognition
Xiaoyu Yang, Qiujia Li, Chao Zhang, Philip C. Woodland
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[189] arXiv:2303.10931 (cross-list from stat.ML) [pdf, other]
Title: Approaching an unknown communication system by latent space exploration and causal inference
Gašper Beguš, Andrej Leban, Shane Gero
Comments: 25 pages, 23 figures; new format and section layout (moved some sections to the appendix), added replication experiments, updated references: to a subsequent experimental validation of the work, as well as to related methodological work
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[190] arXiv:2303.10942 (cross-list from cs.CL) [pdf, other]
Title: On-the-fly Text Retrieval for End-to-End ASR Adaptation
Bolaji Yusuf, Aditya Gourav, Ankur Gandhe, Ivan Bulyko
Comments: Accepted to ICASSP 2023; Appendix added to include ablations that could not fit into the conference 4-page limit
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[191] arXiv:2303.10949 (cross-list from eess.AS) [pdf, other]
Title: Code-Switching Text Generation and Injection in Mandarin-English ASR
Haibin Yu, Yuxuan Hu, Yao Qian, Ma Jin, Linquan Liu, Shujie Liu, Yu Shi, Yanmin Qian, Edward Lin, Michael Zeng
Comments: Accepted by ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[192] arXiv:2303.11089 (cross-list from cs.CV) [pdf, other]
Title: EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation
Ziqiao Peng, Haoyu Wu, Zhenbo Song, Hao Xu, Xiangyu Zhu, Jun He, Hongyan Liu, Zhaoxin Fan
Comments: Accepted by ICCV 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[193] arXiv:2303.11131 (cross-list from cs.CL) [pdf, other]
Title: Cocktail HuBERT: Generalized Self-Supervised Pre-training for Mixture and Single-Source Speech
Maryam Fazel-Zarandi, Wei-Ning Hsu
Comments: ICASSP 2023
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[194] arXiv:2303.11329 (cross-list from cs.CV) [pdf, other]
Title: Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation
Ziyang Chen, Shengyi Qian, Andrew Owens
Comments: ICCV 2023. Project site: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[195] arXiv:2303.11551 (cross-list from cs.CV) [pdf, other]
Title: ModEFormer: Modality-Preserving Embedding for Audio-Video Synchronization using Transformers
Akash Gupta, Rohun Tripathi, Wondong Jang
Comments: Paper accepted at ICASSP 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[196] arXiv:2303.11607 (cross-list from cs.CL) [pdf, html, other]
Title: Transformers in Speech Processing: A Survey
Siddique Latif, Aun Zaidi, Heriberto Cuayahuitl, Fahad Shamshad, Moazzam Shoukat, Muhammad Usama, Junaid Qadir
Comments: Accepted in Computer Science Review 2025
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[197] arXiv:2303.12002 (cross-list from eess.AS) [pdf, html, other]
Title: End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations
Giovanni Morrone, Samuele Cornell, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini
Comments: 16 pages, 7 figures
Journal-ref: Speech Communication 161 (2024) 103081
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[198] arXiv:2303.12187 (cross-list from eess.AS) [pdf, other]
Title: Practice of the conformer enhanced AUDIO-VISUAL HUBERT on Mandarin and English
Xiaoming Ren, Chao Li, Shenjian Wang, Biao Li
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[199] arXiv:2303.12337 (cross-list from cs.MM) [pdf, other]
Title: Music-Driven Group Choreography
Nhat Le, Thang Pham, Tuong Do, Erman Tjiputra, Quang D. Tran, Anh Nguyen
Comments: accepted in CVPR 2023
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[200] arXiv:2303.12659 (cross-list from cs.AI) [pdf, other]
Title: Posthoc Interpretation via Quantization
Francesco Paissan, Cem Subakan, Mirco Ravanelli
Comments: Francesco Paissan and Cem Subakan contributed equally
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)