Sound

Authors and titles for April 2024

Total of 170 entries : 1-50 51-100 101-150 151-170
[151] arXiv:2404.15321 (cross-list from eess.SP) [pdf, html, other]
Title: Characteristics-Based Design of Generalized-Exponent Bandpass Filters
Samiya A Alkhairy
Comments: 16 pages, 7 figures, 2 tables, 26 equations
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[152] arXiv:2404.15704 (cross-list from cs.LG) [pdf, html, other]
Title: Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning
Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Jing Xiao
Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[153] arXiv:2404.15854 (cross-list from cs.CR) [pdf, html, other]
Title: CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning
Haolin Wu, Jing Chen, Ruiying Du, Cong Wu, Kun He, Xingcan Shang, Hao Ren, Guowen Xu
Comments: Submitted to IEEE TDSC
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[154] arXiv:2404.16104 (cross-list from eess.AS) [pdf, html, other]
Title: Evolution of Voices in French Audiovisual Media Across Genders and Age in a Diachronic Perspective
Albert Rilliard, David Doukhan, Rémi Uro, Simon Devauchelle
Comments: 5 pages, 2 figures. Keywords: Gender, Diachrony, Vocal Tract Resonance, Vocal register, Broadcast speech
Journal-ref: Radek Skarnitzl & Jan Volín (Eds.), Proceedings of the 20th International Congress of Phonetic Sciences (ICPhS), Prague 2023, pp. 753-757. Guarant International. ISBN 978-80-908114-2-3
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[155] arXiv:2404.16216 (cross-list from cs.CV) [pdf, html, other]
Title: ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling
Arjun Somayazulu, Sagnik Majumder, Changan Chen, Kristen Grauman
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[156] arXiv:2404.16305 (cross-list from cs.MM) [pdf, html, other]
Title: Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model
Gehui Chen, Guan'an Wang, Xiaowen Huang, Jitao Sang
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[157] arXiv:2404.16547 (cross-list from eess.AS) [pdf, html, other]
Title: Developing Acoustic Models for Automatic Speech Recognition in Swedish
Giampiero Salvi
Comments: 16 pages, 7 figures
Journal-ref: European Student Journal of Language and Speech, 1999
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[158] arXiv:2404.16743 (cross-list from cs.CL) [pdf, html, other]
Title: Automatic Speech Recognition System-Independent Word Error Rate Estimation
Chanho Park, Mingjie Chen, Thomas Hain
Comments: Accepted to LREC-COLING 2024 (long)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[159] arXiv:2404.16905 (cross-list from cs.CL) [pdf, html, other]
Title: Samsung Research China-Beijing at SemEval-2024 Task 3: A multi-stage framework for Emotion-Cause Pair Extraction in Conversations
Shen Zhang, Haojie Zhang, Jing Zhang, Xudong Zhang, Yimeng Zhuang, Jinting Wu
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[160] arXiv:2404.17107 (cross-list from eess.AS) [pdf, html, other]
Title: Exploring Pre-trained General-purpose Audio Representations for Heart Murmur Detection
Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
Comments: 4 pages, 1 figure, and 4 tables. Accepted by IEEE EMBC 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[161] arXiv:2404.17252 (cross-list from cs.LG) [pdf, html, other]
Title: Comparison of self-supervised in-domain and supervised out-domain transfer learning for bird species recognition
Houtan Ghaffari, Paul Devos
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[162] arXiv:2404.17490 (cross-list from eess.AS) [pdf, html, other]
Title: The CARFAC v2 Cochlear Model in Matlab, NumPy, and JAX
Richard F. Lyon, Rob Schonberger, Malcolm Slaney, Mihajlo Velimirović, Honglin Yu
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[163] arXiv:2404.17552 (cross-list from eess.AS) [pdf, html, other]
Title: A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification
Rémi Uro, David Doukhan, Albert Rilliard, Laëtitia Larcher, Anissa-Claire Adgharouamane, Marie Tahon, Antoine Laurent
Comments: Keywords: semi-automatic processing, corpus creation, diarization, speaker identification, gender-balanced, age-balanced, speaker corpus, diachrony
Journal-ref: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pages 3271-3280, Marseille, 20-25 June 2022. European Language Resources Association (ELRA)
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Digital Libraries (cs.DL); Machine Learning (cs.LG); Sound (cs.SD)
[164] arXiv:2404.17810 (cross-list from eess.AS) [pdf, html, other]
Title: A Comparison of Differential Performance Metrics for the Evaluation of Automatic Speaker Verification Fairness
Oubaida Chouchane, Christoph Busch, Chiara Galdi, Nicholas Evans, Massimiliano Todisco
Comments: 8 pages, 7 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[165] arXiv:2404.17968 (cross-list from cs.CL) [pdf, html, other]
Title: Usefulness of Emotional Prosody in Neural Machine Translation
Charles Brazier, Jean-Luc Rouas
Comments: 5 pages, In Proceedings of the 11th International Conference on Speech Prosody (SP), Leiden, The Netherlands, 2024
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[166] arXiv:2404.18501 (cross-list from eess.AS) [pdf, html, other]
Title: Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
Ruijie Tao, Xinyuan Qian, Yidi Jiang, Junjie Li, Jiadong Wang, Haizhou Li
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[167] arXiv:2404.19375 (cross-list from eess.AS) [pdf, other]
Title: Deep low-latency joint speech transmission and enhancement over a gaussian channel
Mohammad Bokaei, Jesper Jensen, Simon Doclo, Jan Østergaard
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[168] arXiv:2404.19615 (cross-list from cs.CV) [pdf, other]
Title: SemiPL: A Semi-supervised Method for Event Sound Source Localization
Yue Li, Baiqiao Yin, Jinfu Liu, Jiajun Wen, Jiaying Lin, Mengyuan Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[169] arXiv:2404.19622 (cross-list from cs.HC) [pdf, html, other]
Title: Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis
Shivam Mehta, Anna Deichler, Jim O'Regan, Birger Moëll, Jonas Beskow, Gustav Eje Henter, Simon Alexanderson
Comments: 13+1 pages, 2 figures, accepted at the Human Motion Generation workshop (HuMoGen) at CVPR 2024
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[170] arXiv:2404.19723 (cross-list from eess.AS) [pdf, html, other]
Title: Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
Hankun Wang, Chenpeng Du, Yiwei Guo, Shuai Wang, Xie Chen, Kai Yu
Comments: Accepted by IEEE Spoken Language Technology (SLT) Workshop 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)