Audio and Speech Processing

Authors and titles for May 2023

Total of 427 entries (entries 401-427 shown)

[401] arXiv:2305.18551 (cross-list from astro-ph.IM) [pdf, other]: Title: Multi-Band Acoustic Monitoring of Aerial Signatures

Andrew Mead, Sarah Little, Paul Sail, Michelle Tu, Wesley Andrés Watters, Abigail White, Richard Cloete

Journal-ref: Journal of Astronomical Instrumentation, 12(1), 2340005 (2023)

Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[402] arXiv:2305.18596 (cross-list from cs.SD) [pdf, other]: Title: Building Accurate Low Latency ASR for Streaming Voice Search

Abhinav Goyal, Nikesh Garera

Comments: Accepted at ACL 2023 Industry Track

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[403] arXiv:2305.18602 (cross-list from cs.CL) [pdf, other]: Title: From `Snippet-lects' to Doculects and Dialects: Leveraging Neural Representations of Speech for Placing Audio Signals in a Language Landscape

Séverine Guillaume, Guillaume Wisniewski, Alexis Michaud

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[404] arXiv:2305.18665 (cross-list from cs.SD) [pdf, other]: Title: E-PANNs: Sound Recognition Using Efficient Pre-trained Audio Neural Networks

Arshdeep Singh, Haohe Liu, Mark D. Plumbley

Comments: Accepted in Internoise 2023 conference

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[405] arXiv:2305.18794 (cross-list from cs.SD) [pdf, other]: Title: Understanding temporally weakly supervised training: A case study for keyword spotting

Heinrich Dinkel, Weiji Zhuang, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[406] arXiv:2305.18823 (cross-list from cs.SD) [pdf, other]: Title: Speaker anonymization using orthogonal Householder neural network

Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[407] arXiv:2305.18824 (cross-list from cs.CL) [pdf, other]: Title: Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator

Guangzhi Sun, Chao Zhang, Phil Woodland

Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[408] arXiv:2305.19020 (cross-list from cs.SD) [pdf, other]: Title: Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification

Qing Wang, Jixun Yao, Ziqian Wang, Pengcheng Guo, Lei Xie

Comments: 5 pages

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[409] arXiv:2305.19130 (cross-list from cs.SD) [pdf, other]: Title: Adaptation of Tongue Ultrasound-Based Silent Speech Interfaces Using Spatial Transformer Networks

László Tóth, Amin Honarmandi Shandiz, Gábor Gosztolya, Csapó Tamás Gábor

Comments: 5 pages, 3 figures, 3 tables

Journal-ref: the Proceedings of Interspeech 2023

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[410] arXiv:2305.19228 (cross-list from cs.CL) [pdf, html, other]: Title: Unsupervised Melody-to-Lyric Generation

Yufei Tian, Anjali Narayan-Chen, Shereen Oraby, Alessandra Cervone, Gunnar Sigurdsson, Chenyang Tao, Wenbo Zhao, Yiwen Chen, Tagyoung Chung, Jing Huang, Nanyun Peng

Comments: ACL 2023. arXiv admin note: substantial text overlap with arXiv:2305.07760

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[411] arXiv:2305.19304 (cross-list from cs.SD) [pdf, other]: Title: Audio classification using ML methods

Krishna Kumar

Comments: 3 pages, 8 figures

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[412] arXiv:2305.19458 (cross-list from cs.SD) [pdf, other]: Title: A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition

Shentong Mo, Pedro Morgado

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[413] arXiv:2305.19522 (cross-list from cs.SD) [pdf, other]: Title: PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions

Guanghou Liu, Yongmao Zhang, Yi Lei, Yunlin Chen, Rui Wang, Zhifei Li, Lei Xie

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[414] arXiv:2305.19556 (cross-list from cs.CV) [pdf, html, other]: Title: Exploring Phonetic Context-Aware Lip-Sync For Talking Face Generation

Se Jin Park, Minsu Kim, Jeongsoo Choi, Yong Man Ro

Comments: Accepted at ICASSP 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[415] arXiv:2305.19563 (cross-list from cs.SD) [pdf, other]: Title: Zero-Shot Automatic Pronunciation Assessment

Hongfu Liu, Mingqian Shi, Ye Wang

Comments: Accepted to Interspeech 2023

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[416] arXiv:2305.19567 (cross-list from cs.SD) [pdf, other]: Title: DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer

Yerin Choi, Myoung-Wan Koo

Comments: Accepted in Interspeech 2023

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[417] arXiv:2305.19581 (cross-list from cs.SD) [pdf, other]: Title: SVVAD: Personal Voice Activity Detection for Speaker Verification

Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao

Comments: Accepted by INTERSPEECH 2023

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[418] arXiv:2305.19584 (cross-list from cs.CL) [pdf, other]: Title: The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR

Kaousheik Jayakumar, Vrunda N. Sukhadia, A Arunkumar, S. Umesh

Comments: 5 pages, 5 figures, submitted to INTERSPEECH 2023

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[419] arXiv:2305.19602 (cross-list from cs.SD) [pdf, other]: Title: Learning Music Sequence Representation from Text Supervision

Tianyu Chen, Yuan Xie, Shuai Zhang, Shaohan Huang, Haoyi Zhou, Jianxin Li

Journal-ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022: 4583-4587

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[420] arXiv:2305.19603 (cross-list from cs.SD) [pdf, other]: Title: Intelligible Lip-to-Speech Synthesis with Speech Units

Jeongsoo Choi, Minsu Kim, Yong Man Ro

Comments: Interspeech 2023

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[421] arXiv:2305.19612 (cross-list from cs.SD) [pdf, other]: Title: Underwater-Art: Expanding Information Perspectives With Text Templates For Underwater Acoustic Target Recognition

Yuan Xie, Jiawei Ren, Ji Xu

Journal-ref: The Journal of the Acoustical Society of America, 2022, 152(5): 2641-2651

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[422] arXiv:2305.19709 (cross-list from cs.CL) [pdf, other]: Title: XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

Linh The Nguyen, Thinh Pham, Dat Quoc Nguyen

Comments: In Proceedings of INTERSPEECH 2023 (to appear)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[423] arXiv:2305.19750 (cross-list from cs.CL) [pdf, other]: Title: Text-to-Speech Pipeline for Swiss German -- A comparison

Tobias Bollinger, Jan Deriu, Manfred Vogel

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[424] arXiv:2305.19759 (cross-list from cs.CL) [pdf, other]: Title: Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning

Shuyue Stella Li, Cihan Xiao, Tianjian Li, Bismarck Odoom

Comments: 8 pages, 3 figures, 7 tables

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[425] arXiv:2305.19769 (cross-list from cs.CL) [pdf, other]: Title: Attention-Based Methods For Audio Question Answering

Parthasaarathy Sudarsanam, Tuomas Virtanen

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[426] arXiv:2305.19953 (cross-list from cs.SD) [pdf, other]: Title: Multi-Dataset Co-Training with Sharpness-Aware Optimization for Audio Anti-spoofing

Hye-jin Shim, Jee-weon Jung, Tomi Kinnunen

Comments: Interspeech 2023

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[427] arXiv:2305.20054 (cross-list from cs.SD) [pdf, other]: Title: UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures

Zhong-Qiu Wang, Shinji Watanabe

Comments: in Conference on Neural Information Processing Systems (NeurIPS), 2023

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
