Sound

Authors and titles for March 2023

Total of 232 entries; this page lists entries 76-100.
[76] arXiv:2303.10912 [pdf, other]
Title: Exploring Representation Learning for Small-Footprint Keyword Spotting
Fan Cui, Liyong Guo, Quandong Wang, Peng Gao, Yujun Wang
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[77] arXiv:2303.11020 [pdf, other]
Title: DS-TDNN: Dual-stream Time-delay Neural Network with Global-aware Filter for Speaker Verification
Yangfu Li, Jiapan Gan, Xiaodan Lin
Comments: 13 pages, 4 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[78] arXiv:2303.11510 [pdf, other]
Title: ICASSP 2023 Deep Noise Suppression Challenge
Harishchandra Dubey, Ashkan Aazami, Vishak Gopal, Babak Naderi, Sebastian Braun, Ross Cutler, Alex Ju, Mehdi Zohourian, Min Tang, Hannes Gamper, Mehrsa Golestaneh, Robert Aichner
Comments: 6 pages, 1 figure. arXiv admin note: text overlap with arXiv:2202.13288
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[79] arXiv:2303.11692 [pdf, other]
Title: ByteCover3: Accurate Cover Song Identification on Short Queries
Xingjian Du, Zijie Wang, Xia Liang, Huidong Liang, Bilei Zhu, Zejun Ma
Comments: Accepted by ICASSP 2023
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[80] arXiv:2303.11816 [pdf, other]
Title: Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning
Sung-Feng Huang, Chia-ping Chen, Zhi-Sheng Chen, Yu-Pao Tsai, Hung-yi Lee
Comments: ICASSP 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[81] arXiv:2303.12300 [pdf, other]
Title: Exploring Turkish Speech Recognition via Hybrid CTC/Attention Architecture and Multi-feature Fusion Network
Zeyu Ren, Nurmement Yolwas, Huiru Wang, Wushour Slamu
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[82] arXiv:2303.12692 [pdf, other]
Title: Dual-Quaternions: Theory and Applications in Sound
Benjamin Kenwright
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[83] arXiv:2303.12984 [pdf, other]
Title: LMCodec: A Low Bitrate Speech Codec With Causal Transformer Models
Teerapat Jenrungrot, Michael Chinen, W. Bastiaan Kleijn, Jan Skoglund, Zalán Borsos, Neil Zeghidour, Marco Tagliasacchi
Comments: 5 pages, accepted to ICASSP 2023, project page: this https URL
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[84] arXiv:2303.13072 [pdf, other]
Title: Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition
Haoyu Tang, Zhaoyi Liu, Chang Zeng, Xinfeng Li
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[85] arXiv:2303.13272 [pdf, other]
Title: Frame-Level Multi-Label Playing Technique Detection Using Multi-Scale Network and Self-Attention Mechanism
Dichucheng Li, Mingjin Che, Wenwu Meng, Yulun Wu, Yi Yu, Fan Xia, Wei Li
Comments: Accepted to ICASSP 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[86] arXiv:2303.13336 [pdf, other]
Title: A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI
Chenshuang Zhang, Chaoning Zhang, Sheng Zheng, Mengchun Zhang, Maryam Qamar, Sung-Ho Bae, In So Kweon
Comments: 18 pages
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[87] arXiv:2303.13631 [pdf, html, other]
Title: In-depth analysis of music structure as a text network
Ping-Rui Tsai, Yen-Ting Chou, Nathan-Christopher Wang, Hui-Ling Chen, Hong-Yue Huang, Zih-Jia Luo, Tzay-Ming Hong
Comments: 7 pages, 8 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[88] arXiv:2303.13881 [pdf, other]
Title: Symbolic Music Structure Analysis with Graph Representations and Changepoint Detection Methods
Carlos Hernandez-Olivan, Sonia Rubio Llamas, Jose R. Beltran
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[89] arXiv:2303.13909 [pdf, other]
Title: Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Shogo Seki
Comments: Accepted to ICASSP 2023. Project page: this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[90] arXiv:2303.14593 [pdf, other]
Title: Time-domain Speech Enhancement Assisted by Multi-resolution Frequency Encoder and Decoder
Hao Shi, Masato Mimura, Longbiao Wang, Jianwu Dang, Tatsuya Kawahara
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[91] arXiv:2303.15161 [pdf, other]
Title: Data Augmentation for Environmental Sound Classification Using Diffusion Probabilistic Model with Top-k Selection Discriminator
Yunhao Chen, Yunjie Zhu, Zihui Yan, Jianlu Shen, Zhen Ren, Yifan Huang
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[92] arXiv:2303.15306 [pdf, other]
Title: Pitchclass2vec: Symbolic Music Structure Segmentation with Chord Embeddings
Nicolas Lazzari, Andrea Poltronieri, Valentina Presutti
Journal-ref: Proceedings of the 1st Workshop on Artificial Intelligence and Creativity co-located with the 21st International Conference of the Italian Association for Artificial Intelligence (AIxIA 2022), Udine, Italy, November 28 - December 3, 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[93] arXiv:2303.15734 [pdf, html, other]
Title: Adaptive Background Music for a Fighting Game: A Multi-Instrument Volume Modulation Approach
Ibrahim Khan, Thai Van Nguyen, Chollakorn Nimpattanavong, Ruck Thawonmas
Comments: In the updated version, the description of the association between the distance between the two players (PD) and the instrument's volume on page 3 has been revised
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[94] arXiv:2303.15940 [pdf, other]
Title: TransAudio: Towards the Transferable Adversarial Audio Attack via Learning Contextualized Perturbations
Qi Gege, Yuefeng Chen, Xiaofeng Mao, Yao Zhu, Binyuan Hui, Xiaodan Li, Rong Zhang, Hui Xue
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[95] arXiv:2303.17949 [pdf, other]
Title: Unsupervised Anomaly Detection and Localization of Machine Audio: A GAN-based Approach
Anbai Jiang, Wei-Qiang Zhang, Yufeng Deng, Pingyi Fan, Jia Liu
Comments: Accepted by ICASSP 2023
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[96] arXiv:2303.00069 (cross-list from cs.CL) [pdf, other]
Title: ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus
Ajinkya Kulkarni, Atharva Kulkarni, Sara Abedalmonem Mohammad Shatnawi, Hanan Aldarmaki
Comments: None
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[97] arXiv:2303.00091 (cross-list from eess.AS) [pdf, other]
Title: Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model
Jaeyoung Huh, Sangjoon Park, Jeong Eun Lee, Jong Chul Ye
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Image and Video Processing (eess.IV)
[98] arXiv:2303.00146 (cross-list from cs.HC) [pdf, html, other]
Title: I Know Your Feelings Before You Do: Predicting Future Affective Reactions in Human-Computer Dialogue
Yuanchao Li, Koji Inoue, Leimin Tian, Changzeng Fu, Carlos Ishi, Hiroshi Ishiguro, Tatsuya Kawahara, Catherine Lai
Comments: Accepted to CHI2023 Late-Breaking Work
Subjects: Human-Computer Interaction (cs.HC); Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[99] arXiv:2303.00455 (cross-list from eess.AS) [pdf, other]
Title: First-shot anomaly sound detection for machine condition monitoring: A domain generalization baseline
Noboru Harada, Daisuke Niizumi, Yasunori Ohishi, Daiki Takeuchi, Masahiro Yasuda
Comments: 5 pages, 2 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[100] arXiv:2303.00456 (cross-list from cs.CL) [pdf, other]
Title: N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space
Rao Ma, Mark J. F. Gales, Kate M. Knill, Mengjie Qian
Comments: Proceedings of INTERSPEECH
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)