Sound

Authors and titles for March 2023

Total of 232 entries; showing entries 51-100

[51] arXiv:2303.07643 [pdf, other]: Title: Feature-Rich Audio Model Inversion for Data-Free Knowledge Distillation Towards General Sound Classification

Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Xiaoyang Qu, Jing Xiao

Comments: Accepted by ICASSP 2023. International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[52] arXiv:2303.07667 [pdf, other]: Title: Improving Music Genre Classification from Multi-Modal Properties of Music and Genre Correlations Perspective

Ganghui Ru, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Comments: Accepted by ICASSP 2023

Subjects: Sound (cs.SD)
[53] arXiv:2303.07682 [pdf, other]: Title: QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis

Haobin Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Comments: Accepted by ICASSP 2023

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[54] arXiv:2303.07687 [pdf, other]: Title: Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy

Xulong Zhang, Haobin Tang, Jianzong Wang, Ning Cheng, Jian Luo, Jing Xiao

Comments: Accepted by ICASSP 2023

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[55] arXiv:2303.07711 [pdf, other]: Title: Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis

Chunyu Qiang, Peng Yang, Hao Che, Ying Zhang, Xiaorui Wang, Zhongyuan Wang

Comments: Accepted by ICASSP2023

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[56] arXiv:2303.07794 [pdf, other]: Title: DiffuseRoll: Multi-track multi-category music generation based on diffusion model

Hongfei Wang

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[57] arXiv:2303.07902 [pdf, html, other]: Title: BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data

Xuenan Xu, Zhiling Zhang, Zelin Zhou, Pingyue Zhang, Zeyu Xie, Mengyue Wu, Kenny Q. Zhu

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[58] arXiv:2303.08026 [pdf, other]: Title: A Study on Bias and Fairness In Deep Speaker Recognition

Amirhossein Hajavi, Ali Etemad

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[59] arXiv:2303.08239 [pdf, other]: Title: Facilitating deep acoustic phenotyping: A basic coding scheme of infant vocalisations preluding computational analysis, machine learning and clinical reasoning

Tomas Kulvicius, Sigrun Lang, Claudius AA Widmann, Nina Hansmann, Daniel Holzinger, Luise Poustka, Dajie Zhang, Peter B Marschik

Comments: This paper is in German

Journal-ref: Kindheit und Entwicklung, 2023

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[60] arXiv:2303.08329 [pdf, other]: Title: Cross-speaker Emotion Transfer by Manipulating Speech Style Latents

Suhee Jo, Younggun Lee, Yookyung Shin, Yeongtae Hwang, Taesu Kim

Comments: accepted to ICASSP 2023

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[61] arXiv:2303.08342 [pdf, html, other]: Title: Autonomous Soundscape Augmentation with Multimodal Fusion of Visual and Participant-linked Inputs

Kenneth Ooi, Karn N. Watcharasupat, Bhan Lam, Zhen-Ting Ong, Woon-Seng Gan

Comments: [v1] 5 pages, 2 figures. Submitted to 2023 IEEE International Conference on Acoustics, Speech and Signal Processing. [v2] 5 pages, 2 figures. Fixed incorrect author list in citation #30

Journal-ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun. 2023, pp. 1-5

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[62] arXiv:2303.08362 [pdf, other]: Title: Transfer Learning Based Diagnosis and Analysis of Lung Sound Aberrations

Hafsa Gulzar, Jiyun Li, Arslan Manzoor, Sadaf Rehmat, Usman Amjad, Hadiqa Jalil Khan

Comments: 12 pages, 9 figures

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[63] arXiv:2303.08385 [pdf, other]: Title: Generating symbolic music using diffusion models

Lilac Atassi

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[64] arXiv:2303.08561 [pdf, other]: Title: Enhancing Unsupervised Audio Representation Learning via Adversarial Sample Generation

Yulin Pan, Xiangteng He, Biao Gong, Yuxin Peng, Yiliang Lv

Comments: 8 pages, 4 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[65] arXiv:2303.08607 [pdf, other]: Title: PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor

Yuning Wu, Jiatong Shi, Tao Qian, Dongji Gao, Qin Jin

Comments: Accepted by ICASSP 2023

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[66] arXiv:2303.08610 [pdf, other]: Title: Blind Estimation of Audio Processing Graph

Sungho Lee, Jaehyun Park, Seungryeol Paik, Kyogu Lee

Comments: Accepted to ICASSP 2023

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[67] arXiv:2303.09048 [pdf, other]: Title: Improving Perceptual Quality, Intelligibility, and Acoustics on VoIP Platforms

Joseph Konan, Ojas Bhargave, Shikhar Agnihotri, Hojeong Lee, Ankit Shah, Shuo Han, Yunyang Zeng, Amanda Shu, Haohui Liu, Xuankai Chang, Hamza Khalid, Minseon Gwak, Kawon Lee, Minjeong Kim, Bhiksha Raj

Comments: Under review at European Association for Signal Processing. 5 pages

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[68] arXiv:2303.10316 [pdf, other]: Title: Zero-shot Sound Event Classification Using a Sound Attribute Vector with Global and Local Feature Learning

Yi-Han Lin, Xunquan Chen, Ryoichi Takashima, Tetsuya Takiguchi

Comments: Accepted by ICASSP2023

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[69] arXiv:2303.10351 [pdf, other]: Title: Weight-sharing Supernet for Searching Specialized Acoustic Event Classification Networks Across Device Constraints

Guan-Ting Lin, Qingming Tang, Chieh-Chi Kao, Viktor Rozgic, Chao Wang

Comments: Accepted by ICASSP 2023

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[70] arXiv:2303.10445 [pdf, other]: Title: EarCough: Enabling Continuous Subject Cough Event Detection on Hearables

Xiyuxing Zhang, Yuntao Wang, Jingru Zhang, Yaqing Yang, Shwetak Patel, Yuanchun Shi

Comments: This paper has been accepted by ACM CHI 2023

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[71] arXiv:2303.10446 [pdf, html, other]: Title: Content Adaptive Front End For Audio Classification

Prateek Verma, Chris Chafe

Comments: 5 pages, 4 figures. 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing, Rhodes, Greece; Minor Edits

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[72] arXiv:2303.10539 [pdf, other]: Title: Textless Speech-to-Music Retrieval Using Emotion Similarity

SeungHeon Doh, Minz Won, Keunwoo Choi, Juhan Nam

Comments: To Appear IEEE ICASSP 2023

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[73] arXiv:2303.10667 [pdf, other]: Title: Audio-Text Models Do Not Yet Leverage Natural Language

Ho-Hsiang Wu, Oriol Nieto, Juan Pablo Bello, Justin Salamon

Comments: Copyright 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74] arXiv:2303.10757 [pdf, other]: Title: Multiscale Audio Spectrogram Transformer for Efficient Audio Classification

Wentao Zhu, Mohamed Omar

Comments: ICASSP 2023

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[75] arXiv:2303.10897 [pdf, other]: Title: Relate auditory speech to EEG by shallow-deep attention-based network

Fan Cui, Liyong Guo, Lang He, Jiyao Liu, ErCheng Pei, Yujun Wang, Dongmei Jiang

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[76] arXiv:2303.10912 [pdf, other]: Title: Exploring Representation Learning for Small-Footprint Keyword Spotting

Fan Cui, Liyong Guo, Quandong Wang, Peng Gao, Yujun Wang

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[77] arXiv:2303.11020 [pdf, other]: Title: DS-TDNN: Dual-stream Time-delay Neural Network with Global-aware Filter for Speaker Verification

Yangfu Li, Jiapan Gan, Xiaodan Lin

Comments: 13 pages, 4 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[78] arXiv:2303.11510 [pdf, other]: Title: ICASSP 2023 Deep Noise Suppression Challenge

Harishchandra Dubey, Ashkan Aazami, Vishak Gopal, Babak Naderi, Sebastian Braun, Ross Cutler, Alex Ju, Mehdi Zohourian, Min Tang, Hannes Gamper, Mehrsa Golestaneh, Robert Aichner

Comments: 6 pages, 1 figure. arXiv admin note: text overlap with arXiv:2202.13288

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[79] arXiv:2303.11692 [pdf, other]: Title: ByteCover3: Accurate Cover Song Identification on Short Queries

Xingjian Du, Zijie Wang, Xia Liang, Huidong Liang, Bilei Zhu, Zejun Ma

Comments: Accepted by ICASSP 2023

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[80] arXiv:2303.11816 [pdf, other]: Title: Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning

Sung-Feng Huang, Chia-ping Chen, Zhi-Sheng Chen, Yu-Pao Tsai, Hung-yi Lee

Comments: ICASSP 2023

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[81] arXiv:2303.12300 [pdf, other]: Title: Exploring Turkish Speech Recognition via Hybrid CTC/Attention Architecture and Multi-feature Fusion Network

Zeyu Ren, Nurmement Yolwas, Huiru Wang, Wushour Slamu

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[82] arXiv:2303.12692 [pdf, other]: Title: Dual-Quaternions: Theory and Applications in Sound

Benjamin Kenwright

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[83] arXiv:2303.12984 [pdf, other]: Title: LMCodec: A Low Bitrate Speech Codec With Causal Transformer Models

Teerapat Jenrungrot, Michael Chinen, W. Bastiaan Kleijn, Jan Skoglund, Zalán Borsos, Neil Zeghidour, Marco Tagliasacchi

Comments: 5 pages, accepted to ICASSP 2023, project page: this https URL

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[84] arXiv:2303.13072 [pdf, other]: Title: Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition

Haoyu Tang, Zhaoyi Liu, Chang Zeng, Xinfeng Li

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[85] arXiv:2303.13272 [pdf, other]: Title: Frame-Level Multi-Label Playing Technique Detection Using Multi-Scale Network and Self-Attention Mechanism

Dichucheng Li, Mingjin Che, Wenwu Meng, Yulun Wu, Yi Yu, Fan Xia, Wei Li

Comments: Accepted to ICASSP 2023

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[86] arXiv:2303.13336 [pdf, other]: Title: A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI

Chenshuang Zhang, Chaoning Zhang, Sheng Zheng, Mengchun Zhang, Maryam Qamar, Sung-Ho Bae, In So Kweon

Comments: 18 pages

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[87] arXiv:2303.13631 [pdf, html, other]: Title: In-depth analysis of music structure as a text network

Ping-Rui Tsai, Yen-Ting Chou, Nathan-Christopher Wang, Hui-Ling Chen, Hong-Yue Huang, Zih-Jia Luo, Tzay-Ming Hong

Comments: 7 pages, 8 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[88] arXiv:2303.13881 [pdf, other]: Title: Symbolic Music Structure Analysis with Graph Representations and Changepoint Detection Methods

Carlos Hernandez-Olivan, Sonia Rubio Llamas, Jose R. Beltran

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[89] arXiv:2303.13909 [pdf, other]: Title: Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis

Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Shogo Seki

Comments: Accepted to ICASSP 2023. Project page: this https URL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[90] arXiv:2303.14593 [pdf, other]: Title: Time-domain Speech Enhancement Assisted by Multi-resolution Frequency Encoder and Decoder

Hao Shi, Masato Mimura, Longbiao Wang, Jianwu Dang, Tatsuya Kawahara

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[91] arXiv:2303.15161 [pdf, other]: Title: Data Augmentation for Environmental Sound Classification Using Diffusion Probabilistic Model with Top-k Selection Discriminator

Yunhao Chen, Yunjie Zhu, Zihui Yan, Jianlu Shen, Zhen Ren, Yifan Huang

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[92] arXiv:2303.15306 [pdf, other]: Title: Pitchclass2vec: Symbolic Music Structure Segmentation with Chord Embeddings

Nicolas Lazzari, Andrea Poltronieri, Valentina Presutti

Journal-ref: Proceedings of the 1st Workshop on Artificial Intelligence and Creativity co-located with 21th International Conference of the Italian Association for Artificial Intelligence(AIxIA 2022), Udine, Italy, November 28 - December 3, 2022

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[93] arXiv:2303.15734 [pdf, html, other]: Title: Adaptive Background Music for a Fighting Game: A Multi-Instrument Volume Modulation Approach

Ibrahim Khan, Thai Van Nguyen, Chollakorn Nimpattanavong, Ruck Thawonmas

Comments: In the updated version, the description of the association between the distance between the two players (PD) and the instrument's volume on page 3 has been revised

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[94] arXiv:2303.15940 [pdf, other]: Title: TransAudio: Towards the Transferable Adversarial Audio Attack via Learning Contextualized Perturbations

Qi Gege, Yuefeng Chen, Xiaofeng Mao, Yao Zhu, Binyuan Hui, Xiaodan Li, Rong Zhang, Hui Xue

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[95] arXiv:2303.17949 [pdf, other]: Title: Unsupervised Anomaly Detection and Localization of Machine Audio: A GAN-based Approach

Anbai Jiang, Wei-Qiang Zhang, Yufeng Deng, Pingyi Fan, Jia Liu

Comments: Accepted by ICASSP 2023

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[96] arXiv:2303.00069 (cross-list from cs.CL) [pdf, other]: Title: ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus

Ajinkya Kulkarni, Atharva Kulkarni, Sara Abedalmonem Mohammad Shatnawi, Hanan Aldarmaki

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[97] arXiv:2303.00091 (cross-list from eess.AS) [pdf, other]: Title: Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model

Jaeyoung Huh, Sangjoon Park, Jeong Eun Lee, Jong Chul Ye

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Image and Video Processing (eess.IV)
[98] arXiv:2303.00146 (cross-list from cs.HC) [pdf, html, other]: Title: I Know Your Feelings Before You Do: Predicting Future Affective Reactions in Human-Computer Dialogue

Yuanchao Li, Koji Inoue, Leimin Tian, Changzeng Fu, Carlos Ishi, Hiroshi Ishiguro, Tatsuya Kawahara, Catherine Lai

Comments: Accepted to CHI2023 Late-Breaking Work

Subjects: Human-Computer Interaction (cs.HC); Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[99] arXiv:2303.00455 (cross-list from eess.AS) [pdf, other]: Title: First-shot anomaly sound detection for machine condition monitoring: A domain generalization baseline

Noboru Harada, Daisuke Niizumi, Yasunori Ohishi, Daiki Takeuchi, Masahiro Yasuda

Comments: 5 pages, 2 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[100] arXiv:2303.00456 (cross-list from cs.CL) [pdf, other]: Title: N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space

Rao Ma, Mark J. F. Gales, Kate M. Knill, Mengjie Qian

Comments: Proceedings of INTERSPEECH

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
