Sound

Authors and titles for December 2023

Total of 215 entries; showing entries 1-50
[1] arXiv:2312.00091 [pdf, other]
Title: Sound Terminology Describing Production and Perception of Sonification
Tim Ziemer
Comments: 16 pages, 0 figures
Journal-ref: AES: Journal of the Audio Engineering Society 72(5), 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[2] arXiv:2312.00476 [pdf, html, other]
Title: Self-Supervised Learning of Spatial Acoustic Representation with Cross-Channel Signal Reconstruction and Multi-Channel Conformer
Bing Yang, Xiaofei Li
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[3] arXiv:2312.00834 [pdf, html, other]
Title: AV-RIR: Audio-Visual Room Impulse Response Estimation
Anton Ratnarajah, Sreyan Ghosh, Sonal Kumar, Purva Chiniya, Dinesh Manocha
Comments: Accepted to CVPR 2024
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV)
[4] arXiv:2312.01062 [pdf, other]
Title: Acoustic Signal Analysis with Deep Neural Network for Detecting Fault Diagnosis in Industrial Machines
Mustafa Yurdakul, Sakir Tasdemir
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[5] arXiv:2312.01092 [pdf, other]
Title: A Semi-Supervised Deep Learning Approach to Dataset Collection for Query-By-Humming Task
Amantur Amatov, Dmitry Lamanov, Maksim Titov, Ivan Vovk, Ilya Makarov, Mikhail Kudinov
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[6] arXiv:2312.01479 [pdf, html, other]
Title: OpenVoice: Versatile Instant Voice Cloning
Zengyi Qin, Wenliang Zhao, Xumin Yu, Xin Sun
Comments: Technical Report
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[7] arXiv:2312.01554 [pdf, html, other]
Title: Building Ears for Robots: Machine Hearing in the Age of Autonomy
Xuan Zhong
Comments: 11 pages, 6 figures. The materials covered in this article were presented and discussed at the Hearing Seminar at Stanford University organized by Malcolm Slaney in October, 2023
Subjects: Sound (cs.SD); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[8] arXiv:2312.01645 [pdf, other]
Title: A text-dependent speaker verification application framework based on Chinese numerical string corpus
Litong Zheng, Feng Hong, Weijie Xu
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9] arXiv:2312.01842 [pdf, other]
Title: Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking
Jihyun Lee, Yejin Jeon, Wonjun Lee, Yunsu Kim, Gary Geunbae Lee
Comments: Accepted in ASRU 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[10] arXiv:2312.02229 [pdf, html, other]
Title: Synthetic Data Generation Techniques for Developing AI-based Speech Assessments for Parkinson's Disease (A Comparative Study)
Mahboobeh Parsapoor
Comments: 6, 5 Tables, 5 Figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[11] arXiv:2312.02773 [pdf, html, other]
Title: Integrating Plug-and-Play Data Priors with Weighted Prediction Error for Speech Dereverberation
Ziye Yang, Wenxing Yang, Kai Xie, Jie Chen
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[12] arXiv:2312.03410 [pdf, other]
Title: Detecting Voice Cloning Attacks via Timbre Watermarking
Chang Liu, Jie Zhang, Tianwei Zhang, Xi Yang, Weiming Zhang, Nenghai Yu
Comments: NDSS 2024
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[13] arXiv:2312.03455 [pdf, html, other]
Title: Data is Overrated: Perceptual Metrics Can Lead Learning in the Absence of Training Data
Tashi Namgyal, Alexander Hepburn, Raul Santos-Rodriguez, Valero Laparra, Jesus Malo
Comments: Machine Learning for Audio Workshop, NeurIPS 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[14] arXiv:2312.03479 [pdf, html, other]
Title: JAMMIN-GPT: Text-based Improvisation using LLMs in Ableton Live
Sven Hollowell, Tashi Namgyal, Paul Marshall
Comments: Conference: 24th International Society for Music Information Retrieval. Late Breaking Demo. 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[15] arXiv:2312.03632 [pdf, html, other]
Title: Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models
Dominik Wagner, Alexander Churchill, Siddharth Sigtia, Panayiotis Georgiou, Matt Mirsamadi, Aarshee Mishra, Erik Marchi
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[16] arXiv:2312.03666 [pdf, html, other]
Title: Towards small and accurate convolutional neural networks for acoustic biodiversity monitoring
Serge Zaugg, Mike van der Schaar, Florence Erbs, Antonio Sanchez, Joan V. Castell, Emiliano Ramallo, Michel André
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[17] arXiv:2312.04846 [pdf, html, other]
Title: Sound Source Localization for a Source inside a Structure using Ac-CycleGAN
Shunsuke Kita, Choong Sik Park, Yoshinobu Kajikawa
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[18] arXiv:2312.04919 [pdf, other]
Title: Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion
Binzhu Sha, Xu Li, Zhiyong Wu, Ying Shan, Helen Meng
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19] arXiv:2312.05415 [pdf, html, other]
Title: An Experimental Study: Assessing the Combined Framework of WavLM and BEST-RQ for Text-to-Speech Synthesis
Via Nielson, Steven Hillis
Comments: 8 pages, 1 figure, 4 tables
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[20] arXiv:2312.05640 [pdf, html, other]
Title: Keyword spotting -- Detecting commands in speech using deep learning
Sumedha Rai, Tong Li, Bella Lyu
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[21] arXiv:2312.05815 [pdf, html, other]
Title: Voice Activity Detection (VAD) in Noisy Environments
Joshua Ball
Comments: 7 pages
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22] arXiv:2312.05994 [pdf, html, other]
Title: mir_ref: A Representation Evaluation Framework for Music Information Retrieval Tasks
Christos Plachouras, Pablo Alonso-Jiménez, Dmitry Bogdanov
Comments: Machine Learning for Audio Workshop, Neural Information Processing Systems (NeurIPS) 2023, New Orleans, LA
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[23] arXiv:2312.06055 [pdf, html, other]
Title: Speaker-Text Retrieval via Contrastive Learning
Xuechen Liu, Xin Wang, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi
Comments: Submitted to IEEE Signal Processing Letters
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[24] arXiv:2312.06118 [pdf, html, other]
Title: ROSE: A Recognition-Oriented Speech Enhancement Framework in Air Traffic Control Using Multi-Objective Learning
Xincheng Yu, Dongyue Guo, Jianwei Zhang, Yi Lin
Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[25] arXiv:2312.06197 [pdf, html, other]
Title: MART: Learning Hierarchical Music Audio Representations with Part-Whole Transformer
Dong Yao, Jieming Zhu, Jiahao Xun, Shengyu Zhang, Zhou Zhao, Liqun Deng, Wenqiao Zhang, Zhenhua Dong, Xin Jiang
Comments: Short paper accepted by WWW 2024. This is revised and condensed based on the previous version titled "Music-PAW: Learning Music Representations via Hierarchical Part-whole Interaction and Contrast". For more experimental details and discussions, please refer to the original long paper at arXiv:2312.06197v1
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[26] arXiv:2312.06253 [pdf, html, other]
Title: Transformer Attractors for Robust and Efficient End-to-End Neural Diarization
Lahiru Samarakoon, Samuel J. Broughton, Marc Härkönen, Ivan Fung
Comments: 8 pages, 1 figure, ASRU2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[27] arXiv:2312.06337 [pdf, html, other]
Title: Deep Imbalanced Learning for Multimodal Emotion Recognition in Conversations
Tao Meng, Yuntao Shou, Wei Ai, Nan Yin, Keqin Li
Comments: 16 pages, 9 figures
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[28] arXiv:2312.06466 [pdf, html, other]
Title: Towards Domain-Specific Cross-Corpus Speech Emotion Recognition Approach
Yan Zhao, Yuan Zong, Hailun Lian, Cheng Lu, Jingang Shi, Wenming Zheng
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[29] arXiv:2312.07059 [pdf, html, other]
Title: LSTM-CNN Network for Audio Signature Analysis in Noisy Environments
Praveen Damacharla, Hamid Rajabalipanah, Mohammad Hosein Fakheri
Comments: 10th Annual Conf. on Computational Science & Computational Intelligence (CSCI'23)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[30] arXiv:2312.07136 [pdf, html, other]
Title: Robust End-to-End Diarization with Domain Adaptive Training and Multi-Task Learning
Ivan Fung, Lahiru Samarakoon, Samuel J. Broughton
Comments: 7 pages, 2 figures, ASRU 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[31] arXiv:2312.08069 [pdf, other]
Title: Improving Spatial Resolution of First-order Ambisonics Using Sparse MDCT Representation
Denis Likhachov, Nick Petrovsky, Elias Azarov
Comments: Associated slides with audio samples: this https URL. Audio samples with visualizations: this https URL
Journal-ref: Proceedings of 16-th International Conference PRIP2023, Minsk, 2023, P. 122-125
Subjects: Sound (cs.SD)
[32] arXiv:2312.08494 [pdf, html, other]
Title: PerMod: Perceptually Grounded Voice Modification with Latent Diffusion Models
Robin Netzorg, Ajil Jalal, Luna McNulty, Gopala Krishna Anumanchipalli
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[33] arXiv:2312.08571 [pdf, html, other]
Title: PhasePerturbation: Speech Data Augmentation via Phase Perturbation for Automatic Speech Recognition
Chengxi Lei, Satwinder Singh, Feng Hou, Xiaoyun Jia, Ruili Wang
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[34] arXiv:2312.08660 [pdf, html, other]
Title: Low-rank constrained multichannel signal denoising considering channel-dependent sensitivity inspired by self-supervised learning for optical fiber sensing
Noriyuki Tonami, Wataru Kohno, Sakiko Mishima, Yumi Arai, Reishi Kondo, Tomoyuki Hino
Comments: Accepted for ICASSP2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[35] arXiv:2312.08676 [pdf, other]
Title: SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
Junjie Li, Yiwei Guo, Xie Chen, Kai Yu
Comments: 5 pages, 2 figures, accepted to ICASSP 2024
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[36] arXiv:2312.08723 [pdf, other]
Title: StemGen: A music generation model that listens
Julian D. Parker, Janne Spijkervet, Katerina Kosta, Furkan Yesiler, Boris Kuznetsov, Ju-Chiang Wang, Matt Avent, Jitong Chen, Duc Le
Comments: Accepted for publication at ICASSP 2024
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[37] arXiv:2312.08732 [pdf, html, other]
Title: TIA: A Teaching Intonation Assessment Dataset in Real Teaching Situations
Shuhua Liu, Chunyu Zhang, Binshuai Li, Niantong Qin, Huanting Cheng, Huayu Zhang
Comments: 4 pages, 3 figures, 4 tables, accepted by 2024 International Conference on Acoustics, Speech, and Signal Processing (ICASSP2024)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[38] arXiv:2312.08850 [pdf, html, other]
Title: Hourglass-AVSR: Down-Up Sampling-based Computational Efficiency Model for Audio-Visual Speech Recognition
Fan Yu, Haoxu Wang, Ziyang Ma, Shiliang Zhang
Comments: Accepted by ICASSP 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[39] arXiv:2312.08931 [pdf, html, other]
Title: N-Gram Unsupervised Compoundation and Feature Injection for Better Symbolic Music Understanding
Jinhao Tian, Zuchao Li, Jiajia Li, Ping Wang
Comments: 8 pages, 2 figures, aaai2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[40] arXiv:2312.08979 [pdf, html, other]
Title: Multi-CMGAN+/+: Leveraging Multi-Objective Speech Quality Metric Prediction for Speech Enhancement
George Close, William Ravenscroft, Thomas Hain, Stefan Goetze
Comments: Accepted @ ICASSP 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[41] arXiv:2312.09040 [pdf, html, other]
Title: STaR: Distilling Speech Temporal Relation for Lightweight Speech Self-Supervised Learning Models
Kangwook Jang, Sungnyun Kim, Hoirin Kim
Comments: ICASSP 2024 Best Student Paper Awarded. Code URL: this https URL
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[42] arXiv:2312.09143 [pdf, html, other]
Title: F1-EV Score: Measuring the Likelihood of Estimating a Good Decision Threshold for Semi-Supervised Anomaly Detection
Kevin Wilkinghoff, Keisuke Imoto
Comments: Accepted for presentation at IEEE ICASSP 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[43] arXiv:2312.09265 [pdf, html, other]
Title: Acoustic models of Brazilian Portuguese Speech based on Neural Transformers
Marcelo Matheus Gauy, Marcelo Finger
Comments: Under review at Journal of Brazilian Computer Society
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[44] arXiv:2312.09269 [pdf, html, other]
Title: Efficient speech detection in environmental audio using acoustic recognition and knowledge distillation
Drew Priebe, Burooj Ghani, Dan Stowell
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[45] arXiv:2312.09369 [pdf, html, other]
Title: Audio-visual fine-tuning of audio-only ASR models
Avner May, Dmitriy Serdyuk, Ankit Parag Shah, Otavio Braga, Olivier Siohan
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[46] arXiv:2312.09580 [pdf, html, other]
Title: A 1.6-mW Sparse Deep Learning Accelerator for Speech Separation
Chih-Chyau Yang, Tian-Sheuan Chang
Journal-ref: in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 31, no. 3, pp. 310-319, March 2023
Subjects: Sound (cs.SD); Hardware Architecture (cs.AR); Audio and Speech Processing (eess.AS)
[47] arXiv:2312.09603 [pdf, html, other]
Title: Stethoscope-guided Supervised Contrastive Learning for Cross-domain Adaptation on Respiratory Sound Classification
June-Woo Kim, Sangmin Bae, Won-Yang Cho, Byungjo Lee, Ho-Young Jung
Comments: accepted to ICASSP 2024
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[48] arXiv:2312.09651 [pdf, html, other]
Title: What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection
Xiaohui Zhang, Jiangyan Yi, Chenglong Wang, Chuyuan Zhang, Siding Zeng, Jianhua Tao
Comments: Accepted by the main track The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[49] arXiv:2312.09746 [pdf, html, other]
Title: Automatic channel selection and spatial feature integration for multi-channel speech recognition across various array topologies
Bingshen Mu, Pengcheng Guo, Dake Guo, Pan Zhou, Wei Chen, Lei Xie
Comments: Accepted by ICASSP 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50] arXiv:2312.09842 [pdf, html, other]
Title: On the compression of shallow non-causal ASR models using knowledge distillation and tied-and-reduced decoder for low-latency on-device speech recognition
Nagaraj Adiga, Jinhwan Park, Chintigari Shiva Kumar, Shatrughan Singh, Kyungmin Lee, Chanwoo Kim, Dhananjaya Gowda
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)