Sound

Authors and titles for August 2022

Total of 138 entries (showing 1-50)
[1] arXiv:2208.00792 [pdf, other]
Title: Jazz Contrafact Detection
C. Bunks, T. Weyde
Comments: 8 pages, 6 figures, 4 tables
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[2] arXiv:2208.01141 [pdf, other]
Title: SampleMatch: Drum Sample Retrieval by Musical Context
Stefan Lattner
Comments: 8 pages, 3 figures, 1 table; Accepted at the ISMIR conference, Bengaluru, India, 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[3] arXiv:2208.01214 [pdf, other]
Title: Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features
Jun Xue, Cunhang Fan, Zhao Lv, Jianhua Tao, Jiangyan Yi, Chengshi Zheng, Zhengqi Wen, Minmin Yuan, Shegang Shao
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[4] arXiv:2208.01818 [pdf, other]
Title: VQ-T: RNN Transducers using Vector-Quantized Prediction Network States
Jiatong Shi, George Saon, David Haws, Shinji Watanabe, Brian Kingsbury
Comments: Interspeech 2022 accepted paper
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[5] arXiv:2208.01917 [pdf, other]
Title: Zero-Shot Style Transfer for Gesture Animation driven by Text and Speech using Adversarial Disentanglement of Multimodal Style Encoding
Mireille Fares, Michele Grimaldi, Catherine Pelachaud, Nicolas Obin
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[6] arXiv:2208.01928 [pdf, other]
Title: Self-Supervised Speaker Verification Using Dynamic Loss-Gate and Label Correction
Bing Han, Zhengyang Chen, Yanmin Qian
Comments: Accepted by Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[7] arXiv:2208.01933 [pdf, other]
Title: The SJTU System for Short-duration Speaker Verification Challenge 2021
Bing Han, Zhengyang Chen, Zhikai Zhou, Yanmin Qian
Comments: Published by Interspeech 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[8] arXiv:2208.02086 [pdf, other]
Title: Audio-visual scene classification via contrastive event-object alignment and semantic-based fusion
Yuanbo Hou, Bo Kang, Dick Botteldooren
Comments: IEEE MMSP 2022
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[9] arXiv:2208.02250 [pdf, other]
Title: Adversarial Attacks on ASR Systems: An Overview
Xiao Zhang, Hao Tan, Xuan Huang, Denghui Zhang, Keke Tang, Zhaoquan Gu
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[10] arXiv:2208.02494 [pdf, other]
Title: Tokyo Kion-On: Query-Based Generative Sonification of Atmospheric Data
Stefano Kalonaris
Comments: To appear in: Proceedings of the 27th International Conference on Auditory Display (ICAD 2022)
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[11] arXiv:2208.02765 [pdf, other]
Title: Keyword Spotting System and Evaluation of Pruning and Quantization Methods on Low-power Edge Microcontrollers
Jingyi Wang, Shengchen Li
Comments: Submitted to DCASE2022 Workshop. Code available at: this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Audio and Speech Processing (eess.AS)
[12] arXiv:2208.03084 [pdf, other]
Title: Deep Feature Learning for Medical Acoustics
Alessandro Maria Poirè, Federico Simonetta, Stavros Ntalampiras
Comments: Published at ICANN 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[13] arXiv:2208.03162 [pdf, other]
Title: Robust Acoustic Domain Identification with its Application to Speaker Diarization
A Kishore Kumar, Shefali Waldekar, Md Sahidullah, Goutam Saha
Comments: Accepted for publication in International Journal of Speech Technology (Springer Nature)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14] arXiv:2208.03311 [pdf, other]
Title: A Model You Can Hear: Audio Identification with Playable Prototypes
Romain Loiseau, Baptiste Bouvier, Yann Teytaut, Elliot Vincent, Mathieu Aubry, Loic Landrieu
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[15] arXiv:2208.03326 [pdf, html, other]
Title: Variational Autoencoders for Anomaly Detection in Respiratory Sounds
Michele Cozzatti, Federico Simonetta, Stavros Ntalampiras
Comments: Published at ICANN 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[16] arXiv:2208.03393 [pdf, other]
Title: Chronological Self-Training for Real-Time Speaker Diarization
Dirk Padfield, Daniel J. Liebling
Comments: 5 pages, 5 figures, ICASSP 2021
Journal-ref: Proc. Interspeech (2021) 4613-4617
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[17] arXiv:2208.04035 [pdf, other]
Title: TGAVC: Improving Autoencoder Voice Conversion with Text-Guided and Adversarial Training
Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Zhen Zeng, Edward Xiao, Jing Xiao
Comments: ASRU 6 pages
Journal-ref: 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021, pp. 938-945
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[18] arXiv:2208.04462 [pdf, other]
Title: Denoising Induction Motor Sounds Using an Autoencoder
Thanh Tran, Sebastian Bader, Jan Lundgren
Comments: 9 pages, 10 figures, conference
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[19] arXiv:2208.04756 [pdf, other]
Title: DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation
Da-Yi Wu, Wen-Yi Hsiao, Fu-Rong Yang, Oscar Friedman, Warren Jackson, Scott Bruzenak, Yi-Wen Liu, Yi-Hsuan Yang
Comments: Accepted at ISMIR 2022
Journal-ref: International Society for Music Information Retrieval (ISMIR) 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20] arXiv:2208.04877 [pdf, other]
Title: Pure Data and INScore: Animated notation for new music
Patricio F. Calatayud
Comments: 11 pages, 10 figures
Subjects: Sound (cs.SD); Graphics (cs.GR); Audio and Speech Processing (eess.AS)
[21] arXiv:2208.04974 [pdf, other]
Title: Mathematical Foundations of Complex Tonality
Jeffrey R. Boland, Lane P. Hughston
Comments: 35 pages, to appear in Journal of Mathematics and Music
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Number Theory (math.NT)
[22] arXiv:2208.04994 [pdf, other]
Title: Generative Data Augmentation Guided by Triplet Loss for Speech Emotion Recognition
Shijun Wang, Hamed Hemati, Jón Guðnason, Damian Borth
Comments: Published in INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23] arXiv:2208.05057 [pdf, other]
Title: Subjective Evaluation of Deep Neural Network Based Speech Enhancement Systems in Real-World Conditions
Gaurav Naithani, Kirsi Pietilä, Riitta Niemistö, Erkki Paajanen, Tero Takala, Tuomas Virtanen
Comments: Accepted for publication in IEEE MMSP 2022
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[24] arXiv:2208.05162 [pdf, other]
Title: Controlling Perceived Emotion in Symbolic Music Generation with Monte Carlo Tree Search
Lucas N. Ferreira, Lili Mou, Jim Whitehead, Levi H. S. Lelis
Comments: Accepted for publication at the 18th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE-22)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[25] arXiv:2208.05359 [pdf, other]
Title: Towards Cross-speaker Reading Style Transfer on Audiobook Dataset
Xiang Li, Changhe Song, Xianhao Wei, Zhiyong Wu, Jia Jia, Helen Meng
Comments: 5 pages, 3 figures, accepted to INTERSPEECH 2022, demo page at this https URL
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[26] arXiv:2208.05605 [pdf, other]
Title: Symbolic Music Loop Generation with Neural Discrete Representations
Sangjun Han, Hyeongrae Ihm, Moontae Lee, Woohyung Lim
Comments: Accepted at ISMIR 2022
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[27] arXiv:2208.05697 [pdf, other]
Title: Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation
Ang Lv, Xu Tan, Tao Qin, Tie-Yan Liu, Rui Yan
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[28] arXiv:2208.06127 [pdf, other]
Title: An investigation on selecting audio pre-trained models for audio captioning
Peiran Yan, Shengchen Li
Comments: 5 pages, 7 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[29] arXiv:2208.06169 [pdf, other]
Title: DDX7: Differentiable FM Synthesis of Musical Instrument Sounds
Franco Caspe, Andrew McPherson, Mark Sandler
Comments: Accepted to ISMIR 2022. See online supplement at this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[30] arXiv:2208.06878 [pdf, other]
Title: Models of Music Cognition and Composition
Abhimanyu Sethia, Aayush
Comments: TLDR: literature review of models of music cognition and composition
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[31] arXiv:2208.07091 [pdf, other]
Title: Analysis of impact of emotions on target speech extraction and speech separation
Ján Švec, Kateřina Žmolíková, Martin Kocour, Marc Delcroix, Tsubasa Ochiai, Ladislav Mošner, Jan Černocký
Comments: Accepted to IWAENC 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[32] arXiv:2208.07122 [pdf, other]
Title: Towards Parametric Speech Synthesis Using Gaussian-Markov Model of Spectral Envelope and Wavelet-Based Decomposition of F0
Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Csaba Zainkó, Géza Németh
Comments: accepted at EUSIPCO2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33] arXiv:2208.07277 [pdf, other]
Title: LCSM: A Lightweight Complex Spectral Mapping Framework for Stereophonic Acoustic Echo Cancellation
Chenggang Zhang, Jinjiang Liu, Xueliang Zhang
Comments: Accepted to Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[34] arXiv:2208.07679 [pdf, other]
Title: How Should We Evaluate Synthesized Environmental Sounds
Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Takahiro Fukumori, Yoichi Yamashita
Comments: Submitted to APSIPA ASC 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[35] arXiv:2208.07994 [pdf, other]
Title: Enhancing Audio Perception of Music By AI Picked Room Acoustics
Prateek Verma, Jonathan Berger
Comments: 24th International Congress on Acoustics, Gyeongju, South Korea
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[36] arXiv:2208.08131 [pdf, other]
Title: Domestic sound event detection by shift consistency mean-teacher training and adversarial domain adaptation
Fang-Ching Chen, Kuan-Dar Chen, Yi-Wen Liu
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[37] arXiv:2208.08354 [pdf, other]
Title: Extract fundamental frequency based on CNN combined with PYIN
Ruowei Xing, Shengchen Li
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[38] arXiv:2208.08706 [pdf, other]
Title: Musika! Fast Infinite Waveform Music Generation
Marco Pasini, Jan Schlüter
Comments: Accepted at ISMIR 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[39] arXiv:2208.08960 [pdf, other]
Title: Deploying Enhanced Speech Feature Decreased Audio Complaints at SVT Play VOD Service
Annika Bidner, Julia Lindberg, Olof Lindman, Kinga Skorupska
Comments: 9 pages, study based on a practical implementation at SVT
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[40] arXiv:2208.09096 [pdf, other]
Title: Representation Learning for the Automatic Indexing of Sound Effects Libraries
Alison B. Ma, Alexander Lerch
Comments: Accepted at the 23rd International Society for Music Information Retrieval Conference (ISMIR 2022), 10 pages, 7 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[41] arXiv:2208.09110 [pdf, other]
Title: 3M: An Effective Multi-view, Multi-granularity, and Multi-aspect Modeling Approach to English Pronunciation Assessment
Fu-An Chao, Tien-Hong Lo, Tzu-I Wu, Yao-Ting Sung, Berlin Chen
Comments: Accepted to APSIPA ASC 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[42] arXiv:2208.09201 [pdf, other]
Title: Improving Post-Processing of Audio Event Detectors Using Reinforcement Learning
Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis
Comments: Published on IEEE Access journal, Volume 10, 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[43] arXiv:2208.09618 [pdf, other]
Title: Fully Automated End-to-End Fake Audio Detection
Chenglong Wang, Jiangyan Yi, Jianhua Tao, Haiyang Sun, Xun Chen, Zhengkun Tian, Haoxin Ma, Cunhang Fan, Ruibo Fu
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[44] arXiv:2208.09646 [pdf, other]
Title: An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio
Xinrui Yan, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Haoxin Ma, Tao Wang, Shiming Wang, Ruibo Fu
Comments: Accepted by ACM Multimedia 2022 Workshop: First International Workshop on Deepfake Detection for Audio Multimedia
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[45] arXiv:2208.09830 [pdf, other]
Title: Representation Learning with Graph Neural Networks for Speech Emotion Recognition
Junghun Kim, Jihie Kim
Comments: AAAI 2022 Workshop on Graphs and More Complex Structures for Learning and Reasoning (GCLR)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[46] arXiv:2208.10367 [pdf, other]
Title: Multi-View Attention Transfer for Efficient Speech Enhancement
Wooseok Shin, Hyun Joon Park, Jin Sob Kim, Byung Hoon Lee, Sung Won Han
Comments: Proceedings of Interspeech 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[47] arXiv:2208.10489 [pdf, html, other]
Title: Audio Deepfake Attribution: An Initial Dataset and Investigation
Xinrui Yan, Jiangyan Yi, Jianhua Tao, Jie Chen
Comments: 13 pages, 5 figures. arXiv admin note: text overlap with arXiv:2208.10489v3
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[48] arXiv:2208.10491 [pdf, other]
Title: Improving Speech Emotion Recognition Through Focus and Calibration Attention Mechanisms
Junghun Kim, Yoojin An, Jihie Kim
Comments: Accepted by INTERSPEECH 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[49] arXiv:2208.10497 [pdf, other]
Title: Are disentangled representations all you need to build speaker anonymization systems?
Pierre Champion (MULTISPEECH, LIUM), Denis Jouvet (MULTISPEECH), Anthony Larcher (LIUM)
Journal-ref: INTERSPEECH 2022 - Human and Humanizing Speech Technology, Sep 2022, Incheon, South Korea
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[50] arXiv:2208.10597 [pdf, other]
Title: Concurrent Validity of Automatic Speech and Pause Measures During Passage Reading in ALS
Saeid Alavi Naeini, Leif Simmatis, Yana Yunusova, Babak Taati
Comments: 2022 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)