Audio and Speech Processing

Authors and titles for August 2022

Total of 149 entries
[76] arXiv:2208.05605 (cross-list from cs.SD) [pdf, other]
Title: Symbolic Music Loop Generation with Neural Discrete Representations
Sangjun Han, Hyeongrae Ihm, Moontae Lee, Woohyung Lim
Comments: Accepted at ISMIR 2022
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[77] arXiv:2208.05697 (cross-list from cs.SD) [pdf, other]
Title: Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation
Ang Lv, Xu Tan, Tao Qin, Tie-Yan Liu, Rui Yan
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[78] arXiv:2208.06127 (cross-list from cs.SD) [pdf, other]
Title: An investigation on selecting audio pre-trained models for audio captioning
Peiran Yan, Shengchen Li
Comments: 5 pages, 7 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[79] arXiv:2208.06169 (cross-list from cs.SD) [pdf, other]
Title: DDX7: Differentiable FM Synthesis of Musical Instrument Sounds
Franco Caspe, Andrew McPherson, Mark Sandler
Comments: Accepted to ISMIR 2022. See online supplement at this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[80] arXiv:2208.06878 (cross-list from cs.SD) [pdf, other]
Title: Models of Music Cognition and Composition
Abhimanyu Sethia, Aayush
Comments: TLDR: literature review of models of music cognition and composition
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[81] arXiv:2208.07091 (cross-list from cs.SD) [pdf, other]
Title: Analysis of impact of emotions on target speech extraction and speech separation
Ján Švec, Kateřina Žmolíková, Martin Kocour, Marc Delcroix, Tsubasa Ochiai, Ladislav Mošner, Jan Černocký
Comments: Accepted to IWAENC 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[82] arXiv:2208.07122 (cross-list from cs.SD) [pdf, other]
Title: Towards Parametric Speech Synthesis Using Gaussian-Markov Model of Spectral Envelope and Wavelet-Based Decomposition of F0
Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Csaba Zainkó, Géza Németh
Comments: accepted at EUSIPCO2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[83] arXiv:2208.07277 (cross-list from cs.SD) [pdf, other]
Title: LCSM: A Lightweight Complex Spectral Mapping Framework for Stereophonic Acoustic Echo Cancellation
Chenggang Zhang, Jinjiang Liu, Xueliang Zhang
Comments: Accepted to Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[84] arXiv:2208.07679 (cross-list from cs.SD) [pdf, other]
Title: How Should We Evaluate Synthesized Environmental Sounds
Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Takahiro Fukumori, Yoichi Yamashita
Comments: Submitted to APSIPA ASC 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[85] arXiv:2208.07994 (cross-list from cs.SD) [pdf, other]
Title: Enhancing Audio Perception of Music By AI Picked Room Acoustics
Prateek Verma, Jonathan Berger
Comments: 24th International Congress on Acoustics, Gyeongju, South Korea
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[86] arXiv:2208.08042 (cross-list from cs.CL) [pdf, other]
Title: The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines
Gaofeng Cheng, Yifan Chen, Runyan Yang, Qingxuan Li, Zehui Yang, Lingxuan Ye, Pengyuan Zhang, Qingqing Zhang, Lei Xie, Yanmin Qian, Kong Aik Lee, Yonghong Yan
Comments: arXiv admin note: text overlap with arXiv:2203.16844
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[87] arXiv:2208.08082 (cross-list from eess.SY) [pdf, other]
Title: A Hybrid SFANC-FxNLMS Algorithm for Active Noise Control based on Deep Learning
Zhengding Luo, Dongyuan Shi, Woon-Seng Gan
Journal-ref: IEEE Signal Processing Letters, 2022
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[88] arXiv:2208.08131 (cross-list from cs.SD) [pdf, other]
Title: Domestic sound event detection by shift consistency mean-teacher training and adversarial domain adaptation
Fang-Ching Chen, Kuan-Dar Chen, Yi-Wen Liu
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[89] arXiv:2208.08354 (cross-list from cs.SD) [pdf, other]
Title: Extract fundamental frequency based on CNN combined with PYIN
Ruowei Xing, Shengchen Li
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[90] arXiv:2208.08509 (cross-list from cs.CL) [pdf, other]
Title: Analyzing Robustness of End-to-End Neural Models for Automatic Speech Recognition
Goutham Rajendran, Wei Zou
Comments: 5 pages, 14 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[91] arXiv:2208.08706 (cross-list from cs.SD) [pdf, other]
Title: Musika! Fast Infinite Waveform Music Generation
Marco Pasini, Jan Schlüter
Comments: Accepted at ISMIR 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[92] arXiv:2208.08960 (cross-list from cs.SD) [pdf, other]
Title: Deploying Enhanced Speech Feature Decreased Audio Complaints at SVT Play VOD Service
Annika Bidner, Julia Lindberg, Olof Lindman, Kinga Skorupska
Comments: 9 pages, study based on a practical implementation at SVT
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[93] arXiv:2208.09096 (cross-list from cs.SD) [pdf, other]
Title: Representation Learning for the Automatic Indexing of Sound Effects Libraries
Alison B. Ma, Alexander Lerch
Comments: Accepted at the 23rd International Society for Music Information Retrieval Conference (ISMIR 2022), 10 pages, 7 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[94] arXiv:2208.09110 (cross-list from cs.SD) [pdf, other]
Title: 3M: An Effective Multi-view, Multi-granularity, and Multi-aspect Modeling Approach to English Pronunciation Assessment
Fu-An Chao, Tien-Hong Lo, Tzu-I Wu, Yao-Ting Sung, Berlin Chen
Comments: Accepted to APSIPA ASC 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[95] arXiv:2208.09201 (cross-list from cs.SD) [pdf, other]
Title: Improving Post-Processing of Audio Event Detectors Using Reinforcement Learning
Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis
Comments: Published on IEEE Access journal, Volume 10, 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[96] arXiv:2208.09269 (cross-list from eess.SP) [pdf, other]
Title: Feature Selection Enhancement and Feature Space Visualization for Speech-Based Emotion Recognition
Sofia Kanwal, Sohail Asghar, Hazrat Ali
Comments: Accepted at PeerJ Computer Science
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[97] arXiv:2208.09618 (cross-list from cs.SD) [pdf, other]
Title: Fully Automated End-to-End Fake Audio Detection
Chenglong Wang, Jiangyan Yi, Jianhua Tao, Haiyang Sun, Xun Chen, Zhengkun Tian, Haoxin Ma, Cunhang Fan, Ruibo Fu
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[98] arXiv:2208.09646 (cross-list from cs.SD) [pdf, other]
Title: An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio
Xinrui Yan, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Haoxin Ma, Tao Wang, Shiming Wang, Ruibo Fu
Comments: Accepted by ACM Multimedia 2022 Workshop: First International Workshop on Deepfake Detection for Audio Multimedia
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[99] arXiv:2208.09830 (cross-list from cs.SD) [pdf, other]
Title: Representation Learning with Graph Neural Networks for Speech Emotion Recognition
Junghun Kim, Jihie Kim
Comments: AAAI 2022 Workshop on Graphs and More Complex Structures for Learning and Reasoning (GCLR)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[100] arXiv:2208.10367 (cross-list from cs.SD) [pdf, other]
Title: Multi-View Attention Transfer for Efficient Speech Enhancement
Wooseok Shin, Hyun Joon Park, Jin Sob Kim, Byung Hoon Lee, Sung Won Han
Comments: Proceedings of Interspeech 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[101] arXiv:2208.10441 (cross-list from cs.HC) [pdf, other]
Title: The GENEA Challenge 2022: A large evaluation of data-driven co-speech gesture generation
Youngwoo Yoon, Pieter Wolfert, Taras Kucherenko, Carla Viegas, Teodor Nikolov, Mihail Tsakov, Gustav Eje Henter
Comments: 12 pages, 5 figures; final version for ACM ICMI 2022
Subjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[102] arXiv:2208.10455 (cross-list from cs.RO) [pdf, other]
Title: Examining Audio Communication Mechanisms for Supervising Fleets of Agricultural Robots
Abhi Kamboj, Tianchen Ji, Katie Driggs-Campbell
Comments: Camera ready version for IEEE RO-MAN 2022
Subjects: Robotics (cs.RO); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[103] arXiv:2208.10489 (cross-list from cs.SD) [pdf, html, other]
Title: Audio Deepfake Attribution: An Initial Dataset and Investigation
Xinrui Yan, Jiangyan Yi, Jianhua Tao, Jie Chen
Comments: 13 pages, 5 figures. arXiv admin note: text overlap with arXiv:2208.10489v3
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[104] arXiv:2208.10491 (cross-list from cs.SD) [pdf, other]
Title: Improving Speech Emotion Recognition Through Focus and Calibration Attention Mechanisms
Junghun Kim, Yoojin An, Jihie Kim
Comments: Accepted by INTERSPEECH 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[105] arXiv:2208.10497 (cross-list from cs.SD) [pdf, other]
Title: Are disentangled representations all you need to build speaker anonymization systems?
Pierre Champion (MULTISPEECH, LIUM), Denis Jouvet (MULTISPEECH), Anthony Larcher (LIUM)
Journal-ref: INTERSPEECH 2022 - Human and Humanizing Speech Technology, Sep 2022, Incheon, South Korea
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[106] arXiv:2208.10499 (cross-list from cs.HC) [pdf, other]
Title: DualVoice: Speech Interaction that Discriminates between Normal and Whispered Voice Input
Jun Rekimoto
Comments: to appear as ACM UIST 2022 paper
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[107] arXiv:2208.10597 (cross-list from cs.SD) [pdf, other]
Title: Concurrent Validity of Automatic Speech and Pause Measures During Passage Reading in ALS
Saeid Alavi Naeini, Leif Simmatis, Yana Yunusova, Babak Taati
Comments: 2022 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[108] arXiv:2208.10659 (cross-list from cs.SD) [pdf, other]
Title: Fall Detection from Audios with Audio Transformers
Prabhjot Kaur, Qifan Wang, Weisong Shi
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[109] arXiv:2208.10839 (cross-list from cs.CV) [pdf, other]
Title: In-Air Imaging Sonar Sensor Network with Real-Time Processing Using GPUs
Wouter Jansen, Dennis Laurijssen, Robin Kerstens, Walter Daems, Jan Steckel
Comments: 2019 International Conference on P2P, Parallel, Grid, Cloud and Internet Computing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Networking and Internet Architecture (cs.NI); Audio and Speech Processing (eess.AS)
[110] arXiv:2208.10922 (cross-list from cs.CV) [pdf, html, other]
Title: StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation
Dongchan Min, Minyoung Song, Eunji Ko, Sung Ju Hwang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[111] arXiv:2208.11308 (cross-list from cs.SD) [pdf, other]
Title: Deep model with built-in cross-attention alignment for acoustic echo cancellation
Evgenii Indenbom, Nicolae-Cătălin Ristea, Ando Saabas, Tanel Pärnamaa, Jegor Gužvin
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[112] arXiv:2208.11402 (cross-list from cs.SD) [pdf, other]
Title: Improved Zero-Shot Audio Tagging & Classification with Patchout Spectrogram Transformers
Paul Primus, Gerhard Widmer
Comments: published in EUSIPCO 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[113] arXiv:2208.11460 (cross-list from cs.SD) [pdf, other]
Title: Improving Natural-Language-based Audio Retrieval with Transfer Learning and Audio & Text Augmentations
Paul Primus, Gerhard Widmer
Comments: accepted at DCASE Workshop 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[114] arXiv:2208.11488 (cross-list from q-bio.QM) [pdf, other]
Title: MEG-MASC: a high-quality magneto-encephalography dataset for evaluating natural speech processing
Laura Gwilliams, Graham Flick, Alec Marantz, Liina Pylkkanen, David Poeppel, Jean-Remi King
Comments: 11 pages, 4 figures
Subjects: Quantitative Methods (q-bio.QM); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[115] arXiv:2208.11671 (cross-list from cs.SD) [pdf, other]
Title: Interpreting Song Lyrics with an Audio-Informed Pre-trained Language Model
Yixiao Zhang, Junyan Jiang, Gus Xia, Simon Dixon
Comments: Accepted to ISMIR 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[116] arXiv:2208.11700 (cross-list from q-bio.NC) [pdf, other]
Title: Low-Level Physiological Implications of End-to-End Learning of Speech Recognition
Louise Coppieters de Gibson, Philip N. Garner
Comments: Submitted to INTERSPEECH 2022
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[117] arXiv:2208.11761 (cross-list from cs.CL) [pdf, other]
Title: IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian languages
Tahir Javed, Kaushal Santosh Bhogale, Abhigyan Raman, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[118] arXiv:2208.11868 (cross-list from cs.CV) [pdf, other]
Title: Interpretable Multimodal Emotion Recognition using Hybrid Fusion of Speech and Image Data
Puneet Kumar, Sarthak Malik, Balasubramanian Raman
Comments: arXiv admin note: text overlap with arXiv:2208.11450
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[119] arXiv:2208.11920 (cross-list from cs.SD) [pdf, other]
Title: Digital Audio Tampering Detection Based on ENF Spatio-temporal Features Representation Learning
Chunyan Zeng, Shuai Kong, Zhifeng Wang, Xiangkui Wan, Yunfan Chen
Comments: 19 pages, 6 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[120] arXiv:2208.12086 (cross-list from cs.SD) [pdf, other]
Title: A Study on Broadcast Networks for Music Genre Classification
Ahmed Heakl, Abdelrahman Abdelgawad, Victor Parque
Comments: accepted for oral presentation at the World Congress on Computational Intelligence (WCCI 2022) - International Joint Conference on Neural Networks (IJCNN 2022)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[121] arXiv:2208.12133 (cross-list from cs.HC) [pdf, other]
Title: The ReprGesture entry to the GENEA Challenge 2022
Sicheng Yang, Zhiyong Wu, Minglei Li, Mengchen Zhao, Jiuxin Lin, Liyang Chen, Weihong Bao
Comments: 8 pages, 4 figures, ICMI 2022
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[122] arXiv:2208.12208 (cross-list from cs.SD) [pdf, other]
Title: Contrastive Audio-Language Learning for Music
Ilaria Manco, Emmanouil Benetos, Elio Quinton, György Fazekas
Comments: Accepted to ISMIR 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[123] arXiv:2208.12387 (cross-list from cs.SD) [pdf, other]
Title: Music Separation Enhancement with Generative Modeling
Noah Schaffer, Boaz Cogan, Ethan Manilow, Max Morrison, Prem Seetharaman, Bryan Pardo
Comments: Accepted to ISMIR 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[124] arXiv:2208.12410 (cross-list from cs.SD) [pdf, other]
Title: Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer
Shrutina Agarwal, Sriram Ganapathy, Naoya Takahashi
Comments: accepted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[125] arXiv:2208.12485 (cross-list from cs.SD) [pdf, other]
Title: Concept-Based Techniques for "Musicologist-friendly" Explanations in a Deep Music Classifier
Francesco Foscarin, Katharina Hoedt, Verena Praher, Arthur Flexer, Gerhard Widmer
Comments: In Proceedings of the 23rd International Society for Music Information Retrieval Conference (ISMIR 2022), Bengaluru, India
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[126] arXiv:2208.12666 (cross-list from cs.CL) [pdf, other]
Title: Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages
Kaushal Santosh Bhogale, Abhigyan Raman, Tahir Javed, Sumanth Doddapaneni, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[127] arXiv:2208.12753 (cross-list from cs.SD) [pdf, other]
Title: Spatio-Temporal Representation Learning Enhanced Source Cell-phone Recognition from Speech Recordings
Chunyan Zeng, Shixiong Feng, Zhifeng Wang, Xiangkui Wan, Yunfan Chen, Nan Zhao
Comments: 29 pages, 4 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[128] arXiv:2208.12782 (cross-list from cs.SD) [pdf, other]
Title: Mel Spectrogram Inversion with Stable Pitch
Bruno Di Giorgi, Mark Levy, Richard Sharp
Comments: 7 pages, 5 figures, Proceedings of the 23rd International Society for Music Information Retrieval Conference, ISMIR 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[129] arXiv:2208.12888 (cross-list from cs.CL) [pdf, other]
Title: Investigating data partitioning strategies for crosslinguistic low-resource ASR evaluation
Zoey Liu, Justin Spence, Emily Prud'hommeaux
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[130] arXiv:2208.12991 (cross-list from cs.NE) [pdf, other]
Title: Sub-mW Neuromorphic SNN audio processing applications with Rockpool and Xylo
Hannah Bos, Dylan Muir
Comments: This submission has been removed by arXiv administrators because the submitter did not have the authority to grant a license to the work at the time of submission
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[131] arXiv:2208.13066 (cross-list from cs.SD) [pdf, other]
Title: SA: Sliding attack for synthetic speech detection with resistance to clipping and self-splicing
Deng JiaCheng, Dong Li, Yan Diqun, Wang Rangding, Zeng Jiaming
Comments: Updated description and formula
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[132] arXiv:2208.13183 (cross-list from cs.SD) [pdf, other]
Title: Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks
Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, Tom Kenter, Alexey Petelin, Jonathan Shen, Vincent Wan, Yu Zhang, Yonghui Wu, Rob Clark
Comments: To be published in Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[133] arXiv:2208.13191 (cross-list from cs.SD) [pdf, other]
Title: Towards Disentangled Speech Representations
Cal Peyser, Ronny Huang, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[134] arXiv:2208.13285 (cross-list from cs.SD) [pdf, other]
Title: Computing with Hypervectors for Efficient Speaker Identification
Ping-Chen Huang, Denis Kleyko, Jan M. Rabaey, Bruno A. Olshausen, Pentti Kanerva
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[135] arXiv:2208.13321 (cross-list from cs.CL) [pdf, other]
Title: Turn-Taking Prediction for Natural Conversational Speech
Shuo-yiin Chang, Bo Li, Tara N. Sainath, Chao Zhang, Trevor Strohman, Qiao Liang, Yanzhang He
Comments: 5 pages, Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[136] arXiv:2208.13322 (cross-list from cs.CL) [pdf, other]
Title: Streaming Intended Query Detection using E2E Modeling for Continued Conversation
Shuo-yiin Chang, Guru Prakash, Zelin Wu, Qiao Liang, Tara N. Sainath, Bo Li, Adam Stambler, Shyam Upadhyay, Manaal Faruqui, Trevor Strohman
Comments: 5 pages, Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[137] arXiv:2208.13954 (cross-list from cs.CV) [pdf, other]
Title: Video-based Cross-modal Auxiliary Network for Multimodal Sentiment Analysis
Rongfei Chen, Wenju Zhou, Yang Li, Huiyu Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[138] arXiv:2208.14017 (cross-list from cs.SD) [pdf, other]
Title: Gridless 3D Recovery of Image Sources from Room Impulse Responses
Tom Sprunck (IRMA, TONUS), Yannick Privat (IRMA, TONUS), Cédric Foy (UMRAE), Antoine Deleforge (MULTISPEECH)
Comments: IEEE Signal Processing Letters, 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP); Classical Physics (physics.class-ph)
[139] arXiv:2208.14182 (cross-list from eess.SP) [pdf, other]
Title: A Study on the relationship between the geometrical shapes and the biometrical acoustic characteristics of human ear canal
Riki Kimura, Shunsuke Tanaka, Naoki Wakui, Naoki Kodama, Shohei Yano
Comments: 10 pages, 10 figures
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[140] arXiv:2208.14339 (cross-list from cs.SD) [pdf, other]
Title: HPPNet: Modeling the Harmonic Structure and Pitch Invariance in Piano Transcription
Weixing Wei, Peilin Li, Yi Yu, Wei Li
Comments: Accepted to ISMIR 2022
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[141] arXiv:2208.14345 (cross-list from cs.SD) [pdf, other]
Title: MeloForm: Generating Melody with Musical Form based on Expert Systems and Neural Networks
Peiling Lu, Xu Tan, Botao Yu, Tao Qin, Sheng Zhao, Tie-Yan Liu
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[142] arXiv:2208.14355 (cross-list from cs.SD) [pdf, other]
Title: Towards robust music source separation on loud commercial music
Chang-Bin Jeon, Kyogu Lee
Comments: Accepted to ISMIR 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[143] arXiv:2208.14717 (cross-list from cs.SD) [pdf, other]
Title: A Real-Time Tempo and Meter Tracking System for Rhythmic Improvis
Filippo Carnovalini, Antonio Rodà
Journal-ref: In Audio Mostly (AM'19), September 18-20, 2019, Nottingham, UK. ACM, New York, NY, USA, 8 pages
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[144] arXiv:2208.14734 (cross-list from cs.SD) [pdf, other]
Title: Open Challenges in Musical Metacreation
Filippo Carnovalini
Journal-ref: In EAI International Conference on Smart Objects and Technologies for Social Good (GoodTechs '19), September 25-27, 2019, Valencia, Spain. ACM, New York, NY, USA, 2 pages
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[145] arXiv:2208.14747 (cross-list from cs.SD) [pdf, other]
Title: A New Corpus for Computational Music Research and A Novel Method for Musical Structure Analysis
Filippo Carnovalini, Antonio Rodà, Nicholas Harley, Steven T. Homer, Geraint A. Wiggins
Journal-ref: In Audio Mostly 2021 (AM '21), September 1-3, 2021, virtual/Trento, Italy. ACM, New York, NY, USA, 4 pages
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[146] arXiv:2208.14750 (cross-list from cs.SD) [pdf, other]
Title: Harmonization and Evaluation; Tweaking the Parameters on Human Listeners
Filippo Carnovalini, Alessandro Pelizzo, Antonio Rodà, Sergio Canazza
Comments: Accepted for publication in 9th International Conference on Kansei Engineering and Emotion Research 2022
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[147] arXiv:2208.14812 (cross-list from cs.SD) [pdf, other]
Title: Domain Shift-oriented Machine Anomalous Sound Detection Model Based on Self-Supervised Learning
Jing-ke Yan, Xin Wang, Qin Wang, Qin Qin, Huang-he Li, Peng-fei Ye, Yue-ping He, Jing Zeng
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[148] arXiv:2208.14819 (cross-list from cs.SD) [pdf, other]
Title: Cadence Detection in Symbolic Classical Music using Graph Neural Networks
Emmanouil Karystinaios, Gerhard Widmer
Comments: In proceedings of the International Society for Music Information Retrieval Conference 2022 (ISMIR)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[149] arXiv:2208.14867 (cross-list from cs.SD) [pdf, other]
Title: Sketching the Expression: Flexible Rendering of Expressive Piano Performance with Self-Supervised Learning
Seungyeon Rhyu, Sarah Kim, Kyogu Lee
Comments: 8 pages, 4 figures, the 23rd International Society for Music Information Retrieval Conference, Bengaluru, India, 2022
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)