Sound

Authors and titles for October 2024

Total of 304 entries : 1-250 251-304

[251] arXiv:2410.15499 (cross-list from cs.AI) [pdf, html, other]: Title: Improving Voice Quality in Speech Anonymization With Just Perception-Informed Losses

Suhita Ghosh, Tim Thiele, Frederic Lorbeer, Frank Dreyer, Sebastian Stober

Comments: Accepted in NeurIPS 2024 Workshop (Audio Imagination)

Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[252] arXiv:2410.15500 (cross-list from cs.AI) [pdf, html, other]: Title: Anonymising Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example

Suhita Ghosh, Melanie Jouaiti, Arnab Das, Yamini Sinha, Tim Polzehl, Ingo Siegert, Sebastian Stober

Comments: Accepted in Interspeech 2024

Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
[253] arXiv:2410.15609 (cross-list from cs.CL) [pdf, html, other]: Title: Interventional Speech Noise Injection for ASR Generalizable Spoken Language Understanding

Yeonjoon Jung, Jaeseong Lee, Seungtaek Choi, Dohyeon Lee, Minsoo Kim, Seung-won Hwang

Comments: 9 pages, 3 figures

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[254] arXiv:2410.15764 (cross-list from eess.AS) [pdf, html, other]: Title: LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec

Yiwei Guo, Zhihan Li, Chenpeng Du, Hankun Wang, Xie Chen, Kai Yu

Comments: 5 pages, 2 figures, 3 tables. Demo page: this https URL. Accepted to Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[255] arXiv:2410.15929 (cross-list from cs.CL) [pdf, html, other]: Title: Yeah, Un, Oh: Continuous and Real-time Backchannel Prediction with Fine-tuning of Voice Activity Projection

Koji Inoue, Divesh Lala, Gabriel Skantze, Tatsuya Kawahara

Comments: This paper has been accepted for presentation at the main conference of 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025) and represents the author's version of the work

Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[256] arXiv:2410.16059 (cross-list from eess.AS) [pdf, html, other]: Title: Multi-Level Speaker Representation for Target Speaker Extraction

Ke Zhang, Junjie Li, Shuai Wang, Yangjie Wei, Yi Wang, Yannan Wang, Haizhou Li

Comments: 5 pages. Submitted to ICASSP 2025. Implementation will be released at this https URL

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[257] arXiv:2410.16130 (cross-list from eess.AS) [pdf, html, other]: Title: Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning

Chun-Yi Kuan, Hung-yi Lee

Comments: Accepted to ICASSP 2025. Project Website: this https URL

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[258] arXiv:2410.16278 (cross-list from cs.NI) [pdf, html, other]: Title: Edge Computing in Distributed Acoustic Sensing: An Application in Traffic Monitoring

Khanh Truong, Jo Eidsvik, Robin Andre Rørstadbotnen

Comments: 11 pages, 17 figures

Subjects: Networking and Internet Architecture (cs.NI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[259] arXiv:2410.17028 (cross-list from eess.AS) [pdf, html, other]: Title: Can a Machine Distinguish High and Low Amount of Social Creak in Speech?

Anne-Maria Laukkanen, Sudarsana Reddy Kadiri, Shrikanth Narayanan, Paavo Alku

Comments: Accepted in Journal of Voice

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[260] arXiv:2410.17033 (cross-list from eess.AS) [pdf, html, other]: Title: Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification

Wen Huang, Bing Han, Zhengyang Chen, Shuai Wang, Yanmin Qian

Comments: Accepted to ISCSLP 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[261] arXiv:2410.17196 (cross-list from cs.CL) [pdf, html, other]: Title: VoiceBench: Benchmarking LLM-Based Voice Assistants

Yiming Chen, Xianghu Yue, Chen Zhang, Xiaoxue Gao, Robby T. Tan, Haizhou Li

Comments: Work in progress. Data is available at this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[262] arXiv:2410.17574 (cross-list from cs.LG) [pdf, html, other]: Title: Adversarial Domain Adaptation for Metal Cutting Sound Detection: Leveraging Abundant Lab Data for Scarce Industry Data

Mir Imtiaz Mostafiz (1), Eunseob Kim (2), Adrian Shuai Li (1), Elisa Bertino (1), Martin Byung-Guk Jun (2), Ali Shakouri (3) ((1) Department of Computer Science, Purdue University (2) School of Mechanical Engineering, Purdue University, (3) School of Electrical and Computer Engineering, Purdue University)

Comments: 8 pages, 3 figures, 3 tables, First two named Authors have equal contribution (Co-first author)

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[263] arXiv:2410.17790 (cross-list from eess.AS) [pdf, other]: Title: Regularized autoregressive modeling and its application to audio signal declipping

Ondřej Mokrý, Pavel Rajmic

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[264] arXiv:2410.17799 (cross-list from cs.CL) [pdf, html, other]: Title: OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

Qinglin Zhang, Luyao Cheng, Chong Deng, Qian Chen, Wen Wang, Siqi Zheng, Jiaqing Liu, Hai Yu, Chaohong Tan, Zhihao Du, Shiliang Zhang

Comments: Work in progress

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[265] arXiv:2410.17834 (cross-list from eess.AS) [pdf, html, other]: Title: Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech

Danilo de Oliveira, Julius Richter, Jean-Marie Lemercier, Simon Welker, Timo Gerkmann

Comments: Accepted at Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[266] arXiv:2410.18218 (cross-list from cs.AI) [pdf, html, other]: Title: Optimizing the role of human evaluation in LLM-based spoken document summarization systems

Margaret Kroll, Kelsey Kraus

Journal-ref: Proc. Interspeech 2024, 1935-1939 (2024)

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[267] arXiv:2410.18298 (cross-list from cs.LG) [pdf, html, other]: Title: Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches

Kexin Feng, Theodora Chaspari

Comments: accepted at the IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI 2024)

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[268] arXiv:2410.18363 (cross-list from cs.AI) [pdf, html, other]: Title: Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model

Vishakha Lall, Yisi Liu

Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[269] arXiv:2410.18395 (cross-list from cs.LG) [pdf, html, other]: Title: A contrastive-learning approach for auditory attention detection

Seyed Ali Alavi Bajestan, Mark Pitt, Donald S. Williamson

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[270] arXiv:2410.18607 (cross-list from cs.CL) [pdf, html, other]: Title: STTATTS: Unified Speech-To-Text And Text-To-Speech Model

Hawau Olamide Toyin, Hao Li, Hanan Aldarmaki

Comments: 11 pages, 4 Figures, EMNLP 2024 Findings

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[271] arXiv:2410.18850 (cross-list from cs.CL) [pdf, other]: Title: kNN For Whisper And Its Effect On Bias And Speaker Adaptation

Maya K. Nachesa, Vlad Niculae

Comments: Accepted to Findings of NAACL 2025. 7 pages incl. appendix, 2 figures, 6 tables

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[272] arXiv:2410.19134 (cross-list from cs.CL) [pdf, html, other]: Title: AlignCap: Aligning Speech Emotion Captioning to Human Preferences

Ziqi Liang, Haoxiang Shi, Hanhui Chen

Comments: Accepted to EMNLP2024 main conference

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[273] arXiv:2410.19168 (cross-list from eess.AS) [pdf, html, other]: Title: MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

S Sakshi, Utkarsh Tyagi, Sonal Kumar, Ashish Seth, Ramaneswaran Selvakumar, Oriol Nieto, Ramani Duraiswami, Sreyan Ghosh, Dinesh Manocha

Comments: Project Website: this https URL

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[274] arXiv:2410.19199 (cross-list from cs.SI) [pdf, html, other]: Title: Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis

Suparna De, Ionut Bostan, Nishanth Sastry

Journal-ref: 16th International Conference on Advances in Social Networks Analysis and Mining -ASONAM-2024

Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[275] arXiv:2410.19595 (cross-list from eess.AS) [pdf, html, other]: Title: Mask-Weighted Spatial Likelihood Coding for Speaker-Independent Joint Localization and Mask Estimation

Jakob Kienegger, Alina Mannanova, Timo Gerkmann

Comments: ©2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[276] arXiv:2410.19793 (cross-list from eess.SP) [pdf, html, other]: Title: Single-word Auditory Attention Decoding Using Deep Learning Model

Nhan Duc Thanh Nguyen, Huy Phan, Kaare Mikkelsen, Preben Kidmose

Comments: 5 pages, 3 figures

Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[277] arXiv:2410.19935 (cross-list from cs.CL) [pdf, html, other]: Title: Do Discrete Self-Supervised Representations of Speech Capture Tone Distinctions?

Opeyemi Osakuade, Simon King

Comments: Submitted to ICASSP 2025

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[278] arXiv:2410.20334 (cross-list from cs.CL) [pdf, html, other]: Title: Improving Speech-based Emotion Recognition with Contextual Utterance Analysis and LLMs

Enshi Zhang, Christian Poellabauer

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[279] arXiv:2410.20336 (cross-list from cs.CL) [pdf, html, other]: Title: Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation

Maohao Shen, Shun Zhang, Jilong Wu, Zhiping Xiu, Ehab AlBadawy, Yiting Lu, Mike Seltzer, Qing He

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[280] arXiv:2410.20564 (cross-list from cs.HC) [pdf, html, other]: Title: Using Confidence Scores to Improve Eyes-free Detection of Speech Recognition Errors

Sadia Nowrin, Keith Vertanen

Comments: To appear in PErvasive Technologies Related to Assistive Environments (PETRA '25)

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[281] arXiv:2410.20578 (cross-list from eess.AS) [pdf, html, other]: Title: Meta-Learning Approaches for Improving Detection of Unseen Speech Deepfakes

Ivan Kukanov, Janne Laakkonen, Tomi Kinnunen, Ville Hautamäki

Comments: 6 pages, accepted to the IEEE Spoken Language Technology Workshop (SLT) 2024

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[282] arXiv:2410.21276 (cross-list from cs.CL) [pdf, html, other]: Title: GPT-4o System Card

OpenAI: Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Mądry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, Alex Paino, Alex Renzin, Alex Tachard Passos, Alexander Kirillov, Alexi Christakis, Alexis Conneau, Ali Kamali, Allan Jabri, Allison Moyer, Allison Tam, Amadou Crookes, Amin Tootoochian, Amin Tootoonchian, Ananya Kumar, Andrea Vallone, Andrej Karpathy, Andrew Braunstein, Andrew Cann, Andrew Codispoti, Andrew Galu, Andrew Kondrich, Andrew Tulloch, Andrey Mishchenko, Angela Baek, Angela Jiang, Antoine Pelisse, Antonia Woodford, Anuj Gosalia, Arka Dhar, Ashley Pantuliano, Avi Nayak, Avital Oliver, Barret Zoph, Behrooz Ghorbani, Ben Leimberger, Ben Rossen, Ben Sokolowsky, Ben Wang, Benjamin Zweig, Beth Hoover, Blake Samic, Bob McGrew, Bobby Spero, Bogo Giertler, Bowen Cheng, Brad Lightcap, Brandon Walkin, Brendan Quinn, Brian Guarraci, Brian Hsu, Bright Kellogg, Brydon Eastman, Camillo Lugaresi, Carroll Wainwright, Cary Bassin, Cary Hudson, Casey Chu, Chad Nelson, Chak Li, Chan Jun Shern, Channing Conger, Charlotte Barette, Chelsea Voss, Chen Ding, Cheng Lu, Chong Zhang, Chris Beaumont, Chris Hallacy, Chris Koch, Christian Gibson, Christina Kim, Christine Choi, Christine McLeavey, Christopher Hesse, Claudia Fischer, Clemens Winter, Coley Czarnecki, Colin Jarvis, Colin Wei, Constantin Koumouzelis, Dane Sherburn

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[283] arXiv:2410.21640 (cross-list from eess.AS) [pdf, html, other]: Title: A Tutorial on Clinical Speech AI Development: From Data Collection to Model Validation

Si-Ioi Ng, Lingfeng Xu, Ingo Siegert, Nicholas Cummins, Nina R. Benway, Julie Liss, Visar Berisha

Comments: 76 pages, 24 figures

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[284] arXiv:2410.21797 (cross-list from eess.AS) [pdf, html, other]: Title: Representational learning for an anomalous sound detection system with source separation model

Seunghyeon Shin, Seokjin Lee

Comments: DCASE 2024 workshop published

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[285] arXiv:2410.21876 (cross-list from cs.IR) [pdf, html, other]: Title: Application of Audio Fingerprinting Techniques for Real-Time Scalable Speech Retrieval and Speech Clusterization

Kemal Altwlkany, Sead Delalić, Adis Alihodžić, Elmedin Selmanović, Damir Hasić

Subjects: Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[286] arXiv:2410.21951 (cross-list from eess.AS) [pdf, html, other]: Title: Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding

Bohan Li, Hankun Wang, Situo Zhang, Yiwei Guo, Kai Yu

Comments: Accepted by ICASSP 2025

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[287] arXiv:2410.22033 (cross-list from eess.AS) [pdf, html, other]: Title: Timbre Difference Capturing in Anomalous Sound Detection

Tomoya Nishida, Harsh Purohit, Kota Dohi, Takashi Endo, Yohei Kawaguchi

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[288] arXiv:2410.22056 (cross-list from eess.AS) [pdf, html, other]: Title: Retrieval-Augmented Approach for Unsupervised Anomalous Sound Detection and Captioning without Model Training

Ryoya Ogura, Tomoya Nishida, Yohei Kawaguchi

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[289] arXiv:2410.22066 (cross-list from cs.CL) [pdf, html, other]: Title: Sing it, Narrate it: Quality Musical Lyrics Translation

Zhuorui Ye, Jinhan Li, Rongwu Xu

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[290] arXiv:2410.22124 (cross-list from cs.LG) [pdf, html, other]: Title: RankUp: Boosting Semi-Supervised Regression with an Auxiliary Ranking Classifier

Pin-Yen Huang, Szu-Wei Fu, Yu Tsao

Comments: Accepted at NeurIPS 2024 (Poster)

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[291] arXiv:2410.22179 (cross-list from cs.CL) [pdf, html, other]: Title: Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech

Eric Battenberg, RJ Skerry-Ryan, Daisy Stanton, Soroosh Mariooryad, Matt Shannon, Julian Salazar, David Kao

Comments: Accepted to NAACL 2025

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[292] arXiv:2410.22350 (cross-list from cs.MM) [pdf, other]: Title: Quality-Aware End-to-End Audio-Visual Neural Speaker Diarization

Mao-Kui He, Jun Du, Shu-Tong Niu, Qing-Feng Liu, Chin-Hui Lee

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[293] arXiv:2410.22448 (cross-list from eess.AS) [pdf, html, other]: Title: A Closer Look at Neural Codec Resynthesis: Bridging the Gap between Codec and Waveform Generation

Alexander H. Liu, Qirui Wang, Yuan Gong, James Glass

Comments: NeurIPS 2024 Audio Imagination workshop paper; demo page at this https URL

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[294] arXiv:2410.22807 (cross-list from eess.AS) [pdf, html, other]: Title: APCodec+: A Spectrum-Coding-Based High-Fidelity and High-Compression-Rate Neural Audio Codec with Staged Training Paradigm

Hui-Peng Du, Yang Ai, Rui-Chen Zheng, Zhen-Hua Ling

Comments: Accepted by ISCSLP 2025

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[295] arXiv:2410.22903 (cross-list from eess.AS) [pdf, html, other]: Title: Augmenting Polish Automatic Speech Recognition System With Synthetic Data

Łukasz Bondaruk, Jakub Kubiak, Mateusz Czyżnikiewicz

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[296] arXiv:2410.23015 (cross-list from eess.AS) [pdf, html, other]: Title: Audiovisual angle and voice incongruence do not affect audiovisual verbal short-term memory in virtual reality

Cosima A. Ermert, Manuj Yadav, Jonathan Ehret, Chinthusa Mohanathasan, Andrea Bönsch, Torsten W. Kuhlen, Sabine J. Schlittmeier, Janina Fels

Comments: Submitted to PlosOne, 19 pages, 6 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[297] arXiv:2410.23230 (cross-list from cs.CV) [pdf, html, other]: Title: Aligning Audio-Visual Joint Representations with an Agentic Workflow

Shentong Mo, Yibing Song

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[298] arXiv:2410.23293 (cross-list from eess.AS) [pdf, html, other]: Title: DDMD: AI-Powered Digital Drug Music Detector

Mohamed Gharzouli

Comments: 14 pages

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[299] arXiv:2410.23320 (cross-list from eess.AS) [pdf, html, other]: Title: Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis

Théodor Lemerle, Harrison Vanderbyl, Vaibhav Srivastav, Nicolas Obin, Axel Roebel

Comments: Preprint

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[300] arXiv:2410.23325 (cross-list from eess.AS) [pdf, other]: Title: Transfer Learning in Vocal Education: Technical Evaluation of Limited Samples Describing Mezzo-soprano

Zhenyi Hou, Xu Zhao, Kejie Ye, Xinyu Sheng, Shanggerile Jiang, Jiajing Xia, Yitao Zhang, Chenxi Ban, Daijun Luo, Jiaxing Chen, Yan Zou, Yuchao Feng, Guangyu Fan, Xin Yuan

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD)
[301] arXiv:2410.23861 (cross-list from cs.CL) [pdf, html, other]: Title: Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models

Hao Yang, Lizhen Qu, Ehsan Shareghi, Gholamreza Haffari

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[302] arXiv:2410.23987 (cross-list from eess.AS) [pdf, html, other]: Title: Task-Aware Unified Source Separation

Kohei Saijo, Janek Ebbers, François G. Germain, Gordon Wichern, Jonathan Le Roux

Comments: Submitted to ICASSP 2025

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[303] arXiv:2410.24019 (cross-list from cs.CL) [pdf, html, other]: Title: Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody?

Ioannis Tsiamas, Matthias Sperber, Andrew Finch, Sarthak Garg

Comments: WMT 2024

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[304] arXiv:2410.24177 (cross-list from eess.AS) [pdf, html, other]: Title: DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models

Heng-Jui Chang, Hongyu Gong, Changhan Wang, James Glass, Yu-An Chung

Comments: Preprint

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
