Sound

Authors and titles for April 2023

Total of 123 entries : 1-50 51-100 101-123

[51] arXiv:2304.12939 [pdf, other]: Title: The ACCompanion: Combining Reactivity, Robustness, and Musical Expressivity in an Automatic Piano Accompanist

Carlos Cancino-Chacón, Silvan Peter, Patricia Hu, Emmanouil Karystinaios, Florian Henkel, Francesco Foscarin, Nimrod Varga, Gerhard Widmer

Comments: In Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI-23), Macao, China. The differences/extensions with the previous version include a technical appendix, added missing links, and minor text updates. 10 pages, 4 figures

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[52] arXiv:2304.12993 [pdf, other]: Title: Room dimensions and absorption inference from room transfer function via machine learning

Yuanxin Xia, Cheol-Ho Jeong

Comments: 15 pages, 10 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[53] arXiv:2304.13085 [pdf, other]: Title: AI-Synthesized Voice Detection Using Neural Vocoder Artifacts

Chengzhe Sun, Shan Jia, Shuwei Hou, Siwei Lyu

Comments: Paper accepted in CVPRW 2023. Codes and data can be found at this https URL. arXiv admin note: substantial text overlap with arXiv:2302.09198

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[54] arXiv:2304.13121 [pdf, html, other]: Title: Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge

Chenpeng Du, Yiwei Guo, Feiyu Shen, Kai Yu

Comments: Accepted to ICASSP 2023 Special Session for Grand Challenges

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[55] arXiv:2304.14019 [pdf, other]: Title: XAI-based Comparison of Input Representations for Audio Event Classification

Annika Frommholz, Fabian Seipel, Sebastian Lapuschkin, Wojciech Samek, Johanna Vielhaben

Comments: 7 pages, 4 figures

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[56] arXiv:2304.14535 [pdf, other]: Title: Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization

Hamza Kheddar, Yassine Himeur, Somaya Al-Maadeed, Abbes Amira, Faycal Bensaali

Journal-ref: Knowledge-Based Systems, Elsevier, 2023

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[57] arXiv:2304.14848 [pdf, other]: Title: Musical Voice Separation as Link Prediction: Modeling a Musical Perception Task as a Multi-Trajectory Tracking Problem

Emmanouil Karystinaios, Francesco Foscarin, Gerhard Widmer

Comments: Accepted at the 32nd International Joint Conference on Artificial Intelligence (IJCAI-23)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[58] arXiv:2304.14882 [pdf, other]: Title: The ACM Multimedia 2023 Computational Paralinguistics Challenge: Emotion Share & Requests

Björn W. Schuller, Anton Batliner, Shahin Amiriparian, Alexander Barnhill, Maurice Gerczuk, Andreas Triantafyllopoulos, Alice Baird, Panagiotis Tzirakis, Chris Gagne, Alan S. Cowen, Nikola Lackovic, Marie-José Caraty, Claude Montacié

Comments: 5 pages, part of the ACM Multimedia 2023 Grand Challenge "The ACM Multimedia 2023 Computational Paralinguistics Challenge (ComParE 2023)". arXiv admin note: text overlap with arXiv:2205.06799

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[59] arXiv:2304.00171 (cross-list from cs.CL) [pdf, other]: Title: Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR

Rami Botros, Anmol Gulati, Tara N. Sainath, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[60] arXiv:2304.00173 (cross-list from cs.CL) [pdf, other]: Title: Lego-Features: Exporting modular encoder features for streaming and deliberation ASR

Rami Botros, Rohit Prabhavalkar, Johan Schalkwyk, Ciprian Chelba, Tara N. Sainath, Françoise Beaufays

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[61] arXiv:2304.00649 (cross-list from cs.CL) [pdf, other]: Title: Multilingual Word Error Rate Estimation: e-WER3

Shammur Absar Chowdhury, Ahmed Ali

Comments: Accepted in ICASSP, Multilingual WER estimation, End-to-End systems, multilingual model, automatic word error rate estimation

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[62] arXiv:2304.00986 (cross-list from cs.AI) [pdf, other]: Title: The Music Note Ontology

Andrea Poltronieri, Aldo Gangemi

Comments: 12 pages, 1 figure, 1 table. Proceedings of the 12th Workshop on Ontology Design and Patterns (WOP 2021), Online, edited by K. Hammar et al., 2021

Subjects: Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[63] arXiv:2304.00988 (cross-list from cs.AI) [pdf, other]: Title: The Music Annotation Pattern

Jacopo de Berardinis, Albert Meroño-Peñuela, Andrea Poltronieri, Valentina Presutti

Comments: 12 pages, 3 figures. Proceedings of the 13th Workshop on Ontology Design and Patterns, edited by V. Svátek et al., WOP, 2022

Subjects: Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[64] arXiv:2304.00993 (cross-list from eess.AS) [pdf, other]: Title: Unsupervised Word Segmentation Using Temporal Gradient Pseudo-Labels

Tzeviya Sylvia Fuchs, Yedid Hoshen

Comments: ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[65] arXiv:2304.01778 (cross-list from eess.AS) [pdf, other]: Title: Independent Vector Extraction Constrained on Manifold of Half-Length Filters

Zbyněk Koldovský, Jaroslav Čmejla, Tülay Adalı, Stephen O'Regan

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[66] arXiv:2304.02181 (cross-list from cs.CL) [pdf, html, other]: Title: On the Impact of Voice Anonymization on Speech Diagnostic Applications: a Case Study on COVID-19 Detection

Yi Zhu, Mohamed Imoussaïne-Aïkous, Carolyn Côté-Lussier, Tiago H. Falk

Comments: Updated version; Published at IEEE Transactions on Information Forensics and Security

Journal-ref: IEEE Transactions on Information Forensics and Security, vol. 19, pp. 5151-5165, 2024

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[67] arXiv:2304.03169 (cross-list from cs.CL) [pdf, other]: Title: Selective Data Augmentation for Robust Speech Translation

Rajul Acharya, Ashish Panda, Sunil Kumar Kopparapu

Comments: Did not realize that the experiments and the analysis based on the experiments were incomplete

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[68] arXiv:2304.03289 (cross-list from cs.HC) [pdf, other]: Title: A2D: Anywhere Anytime Drumming

Harel Yadid, Almog Algranti, Mark Levin, Ayal Taitler

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[69] arXiv:2304.03407 (cross-list from cs.HC) [pdf, other]: Title: Adoption of AI Technology in the Music Mixing Workflow: An Investigation

Soumya Sai Vanka, Maryam Safi, Jean-Baptiste Rolland, George Fazekas

Journal-ref: Paper number 10653, 154th AES Convention 2023

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[70] arXiv:2304.03416 (cross-list from eess.SP) [pdf, other]: Title: To Wake-up or Not to Wake-up: Reducing Keyword False Alarm by Successive Refinement

Yashas Malur Saidutta, Rakshith Sharma Srinivasa, Ching-Hua Lee, Chouchang Yang, Yilin Shen, Hongxia Jin

Comments: Accepted for publication in ICASSP 2023

Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[71] arXiv:2304.03515 (cross-list from eess.AS) [pdf, other]: Title: Margin-Mixup: A Method for Robust Speaker Verification in Multi-Speaker Audio

Jenthe Thienpondt, Nilesh Madhu, Kris Demuynck

Comments: proceedings of ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[72] arXiv:2304.03858 (cross-list from cs.CY) [pdf, other]: Title: Benchmark Dataset Dynamics, Bias and Privacy Challenges in Voice Biometrics Research

Casandra Rusti, Anna Leschanowsky, Carolyn Quinlan, Michaela Pnacek, Lauriane Gorce, Wiebke Hutiri

Comments: 8 pages (10 with References)

Subjects: Computers and Society (cs.CY); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[73] arXiv:2304.03899 (cross-list from cs.CL) [pdf, other]: Title: An Empirical Study and Improvement for Speech Emotion Recognition

Zhen Wu, Yizhe Lu, Xinyu Dai

Comments: Accepted by ICASSP 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74] arXiv:2304.03940 (cross-list from cs.LG) [pdf, other]: Title: Unsupervised Speech Representation Pooling Using Vector Quantization

Jeongkyun Park, Kwanghee Choi, Hyunjun Heo, Hyung-Min Park

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[75] arXiv:2304.04157 (cross-list from eess.AS) [pdf, html, other]: Title: An investigation of phrase break prediction in an End-to-End TTS system

Anandaswarup Vadapalli

Comments: Accepted for publication in SN Computer Science

Journal-ref: SN Computer Science, 6(2):91, 2025

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[76] arXiv:2304.04394 (cross-list from eess.AS) [pdf, other]: Title: Leveraging Neural Representations for Audio Manipulation

Scott H. Hawley, Christian J. Steinmetz

Comments: Accepted as Express Paper for AES Europe 2023, this https URL

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[77] arXiv:2304.04478 (cross-list from cs.CL) [pdf, other]: Title: Oh, Jeez! or Uh-huh? A Listener-aware Backchannel Predictor on ASR Transcriptions

Daniel Ortega, Chia-Yu Li, Ngoc Thang Vu

Comments: Published in ICASSP 2020

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[78] arXiv:2304.04974 (cross-list from eess.AS) [pdf, html, other]: Title: Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR

Yuchen Hu, Chen Chen, Qiushi Zhu, Eng Siong Chng

Comments: 12 pages, 7 figures, IEEE/ACM Transactions on Audio, Speech, and Language Processing

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[79] arXiv:2304.05067 (cross-list from eess.AS) [pdf, other]: Title: Audio Bank: A High-Level Acoustic Signal Representation for Audio Event Recognition

Tushar Sandhan, Sukanya Sonowal, Jin Young Choi

Comments: 6 pages, 9 figures, published in IEEE International Conf ICCAS 2014 (Best paper award)

Subjects: Audio and Speech Processing (eess.AS); Information Retrieval (cs.IR); Sound (cs.SD)
[80] arXiv:2304.05922 (cross-list from eess.AS) [pdf, other]: Title: Filler Word Detection with Hard Category Mining and Inter-Category Focal Loss

Zhiyuan Zhao, Lijun Wu, Chuanxin Tang, Dacheng Yin, Yucheng Zhao, Chong Luo

Comments: accepted by ICASSP23

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[81] arXiv:2304.06183 (cross-list from eess.AS) [pdf, other]: Title: Acoustic absement in detail: Quantifying acoustic differences across time-series representations of speech data

Matthew C. Kelley

Comments: 5 pages, 1 figure, accepted for ICPhS 2023; Julia reference corrected in v2

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[82] arXiv:2304.06315 (cross-list from eess.SP) [pdf, other]: Title: Brain Connectivity Features-based Age Group Classification using Temporal Asynchrony Audio-Visual Integration Task

Prerna Singh, Ayush Tripathi, Lalan Kumar, Tapan Kumar Gandhi

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[83] arXiv:2304.06786 (cross-list from eess.AS) [pdf, other]: Title: The future of hearing aid technology

Volker Hohmann

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[84] arXiv:2304.06795 (cross-list from eess.AS) [pdf, other]: Title: Efficient Sequence Transduction by Jointly Predicting Tokens and Durations

Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[85] arXiv:2304.06910 (cross-list from eess.AS) [pdf, html, other]: Title: HCAM -- Hierarchical Cross Attention Model for Multi-modal Emotion Recognition

Soumya Dutta, Sriram Ganapathy

Comments: 11 pages, 6 figures

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[86] arXiv:2304.07305 (cross-list from eess.AS) [pdf, other]: Title: 1-D Residual Convolutional Neural Network coupled with Data Augmentation and Regularization for the ICPHM 2023 Data Challenge

Matthias Kreuzer, Walter Kellermann

Comments: Accepted at the IEEE Conference on Prognostics and Health Management (ICPHM) 2023

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[87] arXiv:2304.07307 (cross-list from eess.AS) [pdf, other]: Title: Airborne Sound Analysis for the Detection of Bearing Faults in Railway Vehicles with Real-World Data

Matthias Kreuzer, David Schmidt, Simon Wokusch, Walter Kellermann

Comments: Accepted at the ICPHM 2023

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[88] arXiv:2304.07596 (cross-list from cs.RO) [pdf, other]: Title: Acoustic Beamforming for Object-relative Distance Estimation and Control in Unmanned Air Vehicles using Propulsion System Noise

Alisha Sharma, Jason Geder, Joseph Lingevitch, Theodore Martin, Daniel Lofaro, Donald Sofge

Comments: 7 pages, 12 figures

Subjects: Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[89] arXiv:2304.07830 (cross-list from cs.CL) [pdf, other]: Title: The language of sounds unheard: Exploring musical timbre semantics of large language models

Kai Siedenburg, Charalampos Saitis

Comments: 12 pages, 3 figures

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[90] arXiv:2304.07958 (cross-list from cs.CV) [pdf, other]: Title: Recursive Joint Attention for Audio-Visual Fusion in Regression based Emotion Recognition

R Gnana Praveen, Eric Granger, Patrick Cardinal

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[91] arXiv:2304.08120 (cross-list from physics.geo-ph) [pdf, other]: Title: DAS-N2N: Machine learning Distributed Acoustic Sensing (DAS) signal denoising without clean data

Sacha Lapins, Antony Butcher, J.-Michael Kendall, Thomas S. Hudson, Anna L. Stork, Maximilian J. Werner, Jemma Gunning, Alex M. Brisbourne

Comments: Submitted for publication to Geophysical Journal International. For the purpose of open access, the author(s) has applied a Creative Commons Attribution (CC BY) licence to the Author Accepted Manuscript version arising from this submission

Subjects: Geophysics (physics.geo-ph); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[92] arXiv:2304.08249 (cross-list from eess.AS) [pdf, other]: Title: Novel features for the detection of bearing faults in railway vehicles

Matthias Kreuzer, Alexander Schmidt, Walter Kellermann

Comments: Accepted at Inter-Noise 2021

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[93] arXiv:2304.08431 (cross-list from cs.CL) [pdf, other]: Title: Prak: An automatic phonetic alignment tool for Czech

Václav Hanžl, Adléta Hanžlová

Comments: Submitted for ICPhS 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[94] arXiv:2304.08490 (cross-list from cs.CV) [pdf, other]: Title: Conditional Generation of Audio from Video via Foley Analogies

Yuexi Du, Ziyang Chen, Justin Salamon, Bryan Russell, Andrew Owens

Comments: CVPR 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[95] arXiv:2304.08541 (cross-list from eess.AS) [pdf, other]: Title: How Tiny Can Analog Filterbank Features Be Made for Ultra-low-power On-device Keyword Spotting?

Subhajit Ray, Xinghua Sun, Nolan Tremelling, Maria Gordiyenko, Peter Kinget

Comments: Accepted as a full paper by the TinyML Research Symposium 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[96] arXiv:2304.08707 (cross-list from eess.AS) [pdf, other]: Title: Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling

Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe

Comments: in ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[97] arXiv:2304.08811 (cross-list from cs.CR) [pdf, other]: Title: Towards the Transferable Audio Adversarial Attack via Ensemble Methods

Feng Guo, Zheng Sun, Yuxuan Chen, Lei Ju

Comments: Submitted to Cybersecurity journal 2023

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[98] arXiv:2304.09061 (cross-list from cs.IR) [pdf, other]: Title: A Scalable Framework for Automatic Playlist Continuation on Music Streaming Services

Walid Bendada, Guillaume Salha-Galvan, Thomas Bouabça, Tristan Cazenave

Comments: Accepted as a Full Paper at the SIGIR 2023 conference

Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[99] arXiv:2304.09116 (cross-list from eess.AS) [pdf, other]: Title: NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

Kai Shen, Zeqian Ju, Xu Tan, Yanqing Liu, Yichong Leng, Lei He, Tao Qin, Sheng Zhao, Jiang Bian

Comments: A large-scale text-to-speech and singing voice synthesis system with latent diffusion models. Update: NaturalSpeech 2 extension to voice conversion and speech enhancement

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[100] arXiv:2304.09226 (cross-list from eess.AS) [pdf, other]: Title: Coded Speech Quality Measurement by a Non-Intrusive PESQ-DNN

Ziyi Xu, Ziyue Zhao, Tim Fingscheidt

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
