Audio and Speech Processing

Authors and titles for September 2022

Total of 184 entries; showing entries 151-184
[151] arXiv:2209.11896 (cross-list from eess.IV) [pdf, other]
Title: Unsupervised active speaker detection in media content using cross-modal information
Rahul Sharma, Shrikanth Narayanan
Comments: Under review at IEEE Transactions on Image Processing
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[152] arXiv:2209.11905 (cross-list from cs.SD) [pdf, other]
Title: Speech Enhancement with Perceptually-motivated Optimization and Dual Transformations
Xucheng Wan, Kai Liu, Ziqing Du, Huan Zhou
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[153] arXiv:2209.11906 (cross-list from cs.SD) [pdf, other]
Title: Joint Speech Activity and Overlap Detection with Multi-Exit Architecture
Ziqing Du, Kai Liu, Xucheng Wan, Huan Zhou
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[154] arXiv:2209.12043 (cross-list from cs.SD) [pdf, other]
Title: Unsupervised domain adaptation for speech recognition with unsupervised error correction
Long Mai, Julie Carson-Berndsen
Comments: Interspeech 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[155] arXiv:2209.12045 (cross-list from cs.SD) [pdf, other]
Title: Song Emotion Recognition: a Performance Comparison Between Audio Features and Artificial Neural Networks
Karen Rosero, Arthur Nicholas dos Santos, Pedro Benevenuto Valadares, Bruno Sanches Masiero
Comments: 7 pages
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[156] arXiv:2209.12202 (cross-list from cs.SD) [pdf, other]
Title: Multimodal Exponentially Modified Gaussian Oscillators
Christopher Hahne
Comments: IEEE International Ultrasonic Symposium 2022
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS); Applied Physics (physics.app-ph)
[157] arXiv:2209.12549 (cross-list from cs.SD) [pdf, other]
Title: Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech
Yusuke Nakai, Yuki Saito, Kenta Udagawa, Hiroshi Saruwatari
Comments: 6 pages, 1 figure, Accepted for APSIPA ASC 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[158] arXiv:2209.12573 (cross-list from cs.SD) [pdf, html, other]
Title: Faked Speech Detection with Zero Prior Knowledge
Sahar Al Ajmi, Khizar Hayat, Alaa M. Al Obaidi, Naresh Kumar, Munaf Najmuldeen, Baptiste Magnier
Comments: 14 pages, 4 figures (6 if you count subfigures), 2 tables
Journal-ref: Discover Applied Sciences, vol. 6 (288), May 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[159] arXiv:2209.12602 (cross-list from cs.SD) [pdf, other]
Title: Effects of language mismatch in automatic forensic voice comparison using deep learning embeddings
Dávid Sztahó, Attila Fejes
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[160] arXiv:2209.12650 (cross-list from cs.CL) [pdf, other]
Title: Bangla-Wave: Improving Bangla Automatic Speech Recognition Utilizing N-gram Language Models
Mohammed Rakib, Md. Ismail Hossain, Nabeel Mohammed, Fuad Rahman
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[161] arXiv:2209.12652 (cross-list from cs.CL) [pdf, other]
Title: AI-powered Language Assessment Tools for Dementia
Mahboobeh Parsapoor, Muhammad Raisul Alam, Alex Mihailidis
Comments: 27 Pages, 11 Tables, 16 Figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[162] arXiv:2209.12816 (cross-list from cs.CL) [pdf, other]
Title: Fast-FNet: Accelerating Transformer Encoder Models via Efficient Fourier Layers
Nurullah Sevim, Ege Ozan Özyedek, Furkan Şahinuç, Aykut Koç
Comments: 11 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); General Literature (cs.GL); Audio and Speech Processing (eess.AS)
[163] arXiv:2209.12900 (cross-list from cs.SD) [pdf, other]
Title: The Efficacy of Self-Supervised Speech Models for Audio Representations
Tung-Yu Wu, Chen-An Li, Tzu-Han Lin, Tsu-Yuan Hsu, Hung-Yi Lee
Comments: to appear in Proceedings of Machine Learning Research (PMLR): NeurIPS 2021 Competition Track
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[164] arXiv:2209.12942 (cross-list from cs.CL) [pdf, other]
Title: Cross-lingual Dysarthria Severity Classification for English, Korean, and Tamil
Eun Jung Yeo, Kwanghee Choi, Sunhee Kim, Minhwa Chung
Comments: 9 pages, 4 figures, APSIPA 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[165] arXiv:2209.13385 (cross-list from q-bio.QM) [pdf, other]
Title: Beyond Heart Murmur Detection: Automatic Murmur Grading from Phonocardiogram
Andoni Elola, Elisabete Aramendi, Jorge Oliveira, Francesco Renna, Miguel T. Coimbra, Matthew A. Reyna, Reza Sameni, Gari D. Clifford, Ali Bahrami Rad
Subjects: Quantitative Methods (q-bio.QM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[166] arXiv:2209.13598 (cross-list from cs.SD) [pdf, other]
Title: Computing Melodic Templates in Oral Music Traditions
Sergey Bereg, José-Miguel Díaz-Báñez, Nadine Kroher, Inmaculada Ventura
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[167] arXiv:2209.13914 (cross-list from cs.SD) [pdf, other]
Title: An Efficient Multitask Learning Architecture for Affective Vocal Burst Analysis
Tobias Hallmen, Silvan Mertes, Dominik Schiller, Elisabeth André
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[168] arXiv:2209.13921 (cross-list from cs.HC) [pdf, other]
Title: Entangling Practice with Artistic and Educational Aims: Interviews on Technology-based Movement Sound Interactions
Victor Paredes (STMS), Jules Françoise (IRCAM), Frédéric Bevilacqua
Comments: New Interfaces for Musical Expression (NIME), Jun 2022, Auckland, New Zealand
Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[169] arXiv:2209.14078 (cross-list from cs.SD) [pdf, other]
Title: MeWEHV: Mel and Wave Embeddings for Human Voice Tasks
Andrés Carofilis, Laura Fernández-Robles, Enrique Alegre, Eduardo Fidalgo
Comments: Submitted to IEEE Access
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[170] arXiv:2209.14098 (cross-list from cs.SD) [pdf, other]
Title: Deepfake audio detection by speaker verification
Alessandro Pianese, Davide Cozzolino, Giovanni Poggi, Luisa Verdoliva
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[171] arXiv:2209.14272 (cross-list from cs.LG) [pdf, html, other]
Title: Towards Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results
Lukas Christ, Shahin Amiriparian, Alexander Kathan, Niklas Müller, Andreas König, Björn W. Schuller
Comments: This work has been submitted to the IEEE for possible publication (Major Revision)
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[172] arXiv:2209.14345 (cross-list from cs.SD) [pdf, other]
Title: Audio Barlow Twins: Self-Supervised Audio Representation Learning
Jonah Anton, Harry Coppock, Pancham Shukla, Bjorn W. Schuller
Comments: 15 pages (4 main text, rest references + appendices)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[173] arXiv:2209.14458 (cross-list from cs.SD) [pdf, other]
Title: The Chamber Ensemble Generator: Limitless High-Quality MIR Data via Generative Modeling
Yusong Wu, Josh Gardner, Ethan Manilow, Ian Simon, Curtis Hawthorne, Jesse Engel
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[174] arXiv:2209.14842 (cross-list from cs.SD) [pdf, other]
Title: Classification of Vocal Bursts for ACII 2022 A-VB-Type Competition using Convolutional Neural Networks and Deep Acoustic Embeddings
Muhammad Shehram Shah Syed, Zafi Sherhan Syed, Abbas Syed
Comments: Report for our submission to the ACII 2022 Affective Vocal Bursts (A-VB) Competition
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[175] arXiv:2209.14868 (cross-list from cs.SD) [pdf, other]
Title: ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition
Martin Radfar, Rohit Barnwal, Rupak Vignesh Swaminathan, Feng-Ju Chang, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris
Comments: This paper was presented in Interspeech 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[176] arXiv:2209.15167 (cross-list from cs.SD) [pdf, other]
Title: An empirical study of weakly supervised audio tagging embeddings for general audio representations
Heinrich Dinkel, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang
Comments: Odyssey 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[177] arXiv:2209.15200 (cross-list from cs.SD) [pdf, other]
Title: An efficient encoder-decoder architecture with top-down attention for speech separation
Kai Li, Runxuan Yang, Xiaolin Hu
Comments: Accepted by ICLR 2023; Code & Demos: this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[178] arXiv:2209.15296 (cross-list from cs.SD) [pdf, other]
Title: Wake Word Detection Based on Res2Net
Qiuchen Yu, Ruohua Zhou
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[179] arXiv:2209.15325 (cross-list from cs.SD) [pdf, other]
Title: Symphony: Localizing Multiple Acoustic Sources with a Single Microphone Array
Weiguo Wang, Jinming Li, Yuan He, Yunhao Liu
Subjects: Sound (cs.SD); Networking and Internet Architecture (cs.NI); Audio and Speech Processing (eess.AS)
[180] arXiv:2209.15329 (cross-list from cs.CL) [pdf, other]
Title: SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data
Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu, Shuo Ren, Shujie Liu, Zhuoyuan Yao, Xun Gong, Lirong Dai, Jinyu Li, Furu Wei
Comments: We have corrected the errors in the pre-training data for SpeechLM-P Base models, new results are updated
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[181] arXiv:2209.15334 (cross-list from cs.SD) [pdf, other]
Title: ChordMics: Acoustic Signal Purification with Distributed Microphones
Weiguo Wang, Jinming Li, Meng Jin, Yuan He
Subjects: Sound (cs.SD); Networking and Internet Architecture (cs.NI); Audio and Speech Processing (eess.AS)
[182] arXiv:2209.15352 (cross-list from cs.SD) [pdf, other]
Title: AudioGen: Textually Guided Audio Generation
Felix Kreuk, Gabriel Synnaeve, Adam Polyak, Uriel Singer, Alexandre Défossez, Jade Copet, Devi Parikh, Yaniv Taigman, Yossi Adi
Comments: Accepted to ICLR 2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[183] arXiv:2209.15483 (cross-list from cs.CL) [pdf, other]
Title: Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling
Itai Gat, Felix Kreuk, Tu Anh Nguyen, Ann Lee, Jade Copet, Gabriel Synnaeve, Emmanuel Dupoux, Yossi Adi
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[184] arXiv:2209.15575 (cross-list from cs.SD) [pdf, other]
Title: Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio
Yan Gao, Javier Fernandez-Marques, Titouan Parcollet, Pedro P. B. de Gusmao, Nicholas D. Lane
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)