Audio and Speech Processing

Authors and titles for September 2022

Total of 184 entries; this page lists entries 151-184.

[151] arXiv:2209.11896 (cross-list from eess.IV) [pdf, other]: Title: Unsupervised active speaker detection in media content using cross-modal information

Rahul Sharma, Shrikanth Narayanan

Comments: Under review at IEEE Transactions on Image Processing

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[152] arXiv:2209.11905 (cross-list from cs.SD) [pdf, other]: Title: Speech Enhancement with Perceptually-motivated Optimization and Dual Transformations

Xucheng Wan, Kai Liu, Ziqing Du, Huan Zhou

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[153] arXiv:2209.11906 (cross-list from cs.SD) [pdf, other]: Title: Joint Speech Activity and Overlap Detection with Multi-Exit Architecture

Ziqing Du, Kai Liu, Xucheng Wan, Huan Zhou

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[154] arXiv:2209.12043 (cross-list from cs.SD) [pdf, other]: Title: Unsupervised domain adaptation for speech recognition with unsupervised error correction

Long Mai, Julie Carson-Berndsen

Comments: Interspeech 2022

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[155] arXiv:2209.12045 (cross-list from cs.SD) [pdf, other]: Title: Song Emotion Recognition: a Performance Comparison Between Audio Features and Artificial Neural Networks

Karen Rosero, Arthur Nicholas dos Santos, Pedro Benevenuto Valadares, Bruno Sanches Masiero

Comments: 7 pages

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[156] arXiv:2209.12202 (cross-list from cs.SD) [pdf, other]: Title: Multimodal Exponentially Modified Gaussian Oscillators

Christopher Hahne

Comments: IEEE International Ultrasonic Symposium 2022

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS); Applied Physics (physics.app-ph)
[157] arXiv:2209.12549 (cross-list from cs.SD) [pdf, other]: Title: Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech

Yusuke Nakai, Yuki Saito, Kenta Udagawa, Hiroshi Saruwatari

Comments: 6 pages, 1 figure, Accepted for APSIPA ASC 2022

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[158] arXiv:2209.12573 (cross-list from cs.SD) [pdf, html, other]: Title: Faked Speech Detection with Zero Prior Knowledge

Sahar Al Ajmi, Khizar Hayat, Alaa M. Al Obaidi, Naresh Kumar, Munaf Najmuldeen, Baptiste Magnier

Comments: 14 pages, 4 figures (6 if you count subfigures), 2 tables

Journal-ref: Discover Applied Sciences, vol. 6 (288), May 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[159] arXiv:2209.12602 (cross-list from cs.SD) [pdf, other]: Title: Effects of language mismatch in automatic forensic voice comparison using deep learning embeddings

Dávid Sztahó, Attila Fejes

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[160] arXiv:2209.12650 (cross-list from cs.CL) [pdf, other]: Title: Bangla-Wave: Improving Bangla Automatic Speech Recognition Utilizing N-gram Language Models

Mohammed Rakib, Md. Ismail Hossain, Nabeel Mohammed, Fuad Rahman

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[161] arXiv:2209.12652 (cross-list from cs.CL) [pdf, other]: Title: AI-powered Language Assessment Tools for Dementia

Mahboobeh Parsapoor, Muhammad Raisul Alam, Alex Mihailidis

Comments: 27 Pages, 11 Tables, 16 Figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[162] arXiv:2209.12816 (cross-list from cs.CL) [pdf, other]: Title: Fast-FNet: Accelerating Transformer Encoder Models via Efficient Fourier Layers

Nurullah Sevim, Ege Ozan Özyedek, Furkan Şahinuç, Aykut Koç

Comments: 11 pages

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); General Literature (cs.GL); Audio and Speech Processing (eess.AS)
[163] arXiv:2209.12900 (cross-list from cs.SD) [pdf, other]: Title: The Efficacy of Self-Supervised Speech Models for Audio Representations

Tung-Yu Wu, Chen-An Li, Tzu-Han Lin, Tsu-Yuan Hsu, Hung-Yi Lee

Comments: to appear in Proceedings of Machine Learning Research (PMLR): NeurIPS 2021 Competition Track

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[164] arXiv:2209.12942 (cross-list from cs.CL) [pdf, other]: Title: Cross-lingual Dysarthria Severity Classification for English, Korean, and Tamil

Eun Jung Yeo, Kwanghee Choi, Sunhee Kim, Minhwa Chung

Comments: 9 pages, 4 figures, APSIPA 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[165] arXiv:2209.13385 (cross-list from q-bio.QM) [pdf, other]: Title: Beyond Heart Murmur Detection: Automatic Murmur Grading from Phonocardiogram

Andoni Elola, Elisabete Aramendi, Jorge Oliveira, Francesco Renna, Miguel T. Coimbra, Matthew A. Reyna, Reza Sameni, Gari D. Clifford, Ali Bahrami Rad

Subjects: Quantitative Methods (q-bio.QM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[166] arXiv:2209.13598 (cross-list from cs.SD) [pdf, other]: Title: Computing Melodic Templates in Oral Music Traditions

Sergey Bereg, José-Miguel Díaz-Báñez, Nadine Kroher, Inmaculada Ventura

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[167] arXiv:2209.13914 (cross-list from cs.SD) [pdf, other]: Title: An Efficient Multitask Learning Architecture for Affective Vocal Burst Analysis

Tobias Hallmen, Silvan Mertes, Dominik Schiller, Elisabeth André

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[168] arXiv:2209.13921 (cross-list from cs.HC) [pdf, other]: Title: Entangling Practice with Artistic and Educational Aims: Interviews on Technology-based Movement Sound Interactions

Victor Paredes (STMS), Jules Françoise (IRCAM), Frédéric Bevilacqua

Comments: New Interfaces for Musical Expression (NIME), Jun 2022, Auckland, New Zealand

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[169] arXiv:2209.14078 (cross-list from cs.SD) [pdf, other]: Title: MeWEHV: Mel and Wave Embeddings for Human Voice Tasks

Andrés Carofilis, Laura Fernández-Robles, Enrique Alegre, Eduardo Fidalgo

Comments: Submitted to IEEE Access

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[170] arXiv:2209.14098 (cross-list from cs.SD) [pdf, other]: Title: Deepfake audio detection by speaker verification

Alessandro Pianese, Davide Cozzolino, Giovanni Poggi, Luisa Verdoliva

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[171] arXiv:2209.14272 (cross-list from cs.LG) [pdf, html, other]: Title: Towards Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results

Lukas Christ, Shahin Amiriparian, Alexander Kathan, Niklas Müller, Andreas König, Björn W. Schuller

Comments: This work has been submitted to the IEEE for possible publication (Major Revision)

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[172] arXiv:2209.14345 (cross-list from cs.SD) [pdf, other]: Title: Audio Barlow Twins: Self-Supervised Audio Representation Learning

Jonah Anton, Harry Coppock, Pancham Shukla, Bjorn W. Schuller

Comments: 15 pages (4 main text, rest references + appendices)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[173] arXiv:2209.14458 (cross-list from cs.SD) [pdf, other]: Title: The Chamber Ensemble Generator: Limitless High-Quality MIR Data via Generative Modeling

Yusong Wu, Josh Gardner, Ethan Manilow, Ian Simon, Curtis Hawthorne, Jesse Engel

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[174] arXiv:2209.14842 (cross-list from cs.SD) [pdf, other]: Title: Classification of Vocal Bursts for ACII 2022 A-VB-Type Competition using Convolutional Neural Networks and Deep Acoustic Embeddings

Muhammad Shehram Shah Syed, Zafi Sherhan Syed, Abbas Syed

Comments: Report for our submission to the ACII 2022 Affective Vocal Bursts (A-VB) Competition

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[175] arXiv:2209.14868 (cross-list from cs.SD) [pdf, other]: Title: ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition

Martin Radfar, Rohit Barnwal, Rupak Vignesh Swaminathan, Feng-Ju Chang, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris

Comments: This paper was presented at Interspeech 2022

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[176] arXiv:2209.15167 (cross-list from cs.SD) [pdf, other]: Title: An empirical study of weakly supervised audio tagging embeddings for general audio representations

Heinrich Dinkel, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang

Comments: Odyssey 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[177] arXiv:2209.15200 (cross-list from cs.SD) [pdf, other]: Title: An efficient encoder-decoder architecture with top-down attention for speech separation

Kai Li, Runxuan Yang, Xiaolin Hu

Comments: Accepted by ICLR 2023; Code & Demos: this https URL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[178] arXiv:2209.15296 (cross-list from cs.SD) [pdf, other]: Title: Wake Word Detection Based on Res2Net

Qiuchen Yu, Ruohua Zhou

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[179] arXiv:2209.15325 (cross-list from cs.SD) [pdf, other]: Title: Symphony: Localizing Multiple Acoustic Sources with a Single Microphone Array

Weiguo Wang, Jinming Li, Yuan He, Yunhao Liu

Subjects: Sound (cs.SD); Networking and Internet Architecture (cs.NI); Audio and Speech Processing (eess.AS)
[180] arXiv:2209.15329 (cross-list from cs.CL) [pdf, other]: Title: SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu, Shuo Ren, Shujie Liu, Zhuoyuan Yao, Xun Gong, Lirong Dai, Jinyu Li, Furu Wei

Comments: We have corrected the errors in the pre-training data for the SpeechLM-P Base models; the new results are updated

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[181] arXiv:2209.15334 (cross-list from cs.SD) [pdf, other]: Title: ChordMics: Acoustic Signal Purification with Distributed Microphones

Weiguo Wang, Jinming Li, Meng Jin, Yuan He

Subjects: Sound (cs.SD); Networking and Internet Architecture (cs.NI); Audio and Speech Processing (eess.AS)
[182] arXiv:2209.15352 (cross-list from cs.SD) [pdf, other]: Title: AudioGen: Textually Guided Audio Generation

Felix Kreuk, Gabriel Synnaeve, Adam Polyak, Uriel Singer, Alexandre Défossez, Jade Copet, Devi Parikh, Yaniv Taigman, Yossi Adi

Comments: Accepted to ICLR 2023

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[183] arXiv:2209.15483 (cross-list from cs.CL) [pdf, other]: Title: Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling

Itai Gat, Felix Kreuk, Tu Anh Nguyen, Ann Lee, Jade Copet, Gabriel Synnaeve, Emmanuel Dupoux, Yossi Adi

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[184] arXiv:2209.15575 (cross-list from cs.SD) [pdf, other]: Title: Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio

Yan Gao, Javier Fernandez-Marques, Titouan Parcollet, Pedro P. B. de Gusmao, Nicholas D. Lane

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)