Audio and Speech Processing

Authors and titles for April 2021

Total of 266 entries

[226] arXiv:2104.10507 (cross-list from cs.CL) [pdf, other]: Title: On Sampling-Based Training Criteria for Neural Language Modeling

Yingbo Gao, David Thulke, Alexander Gerstenberger, Khoa Viet Tran, Ralf Schlüter, Hermann Ney

Comments: Accepted at INTERSPEECH 2021

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[227] arXiv:2104.10747 (cross-list from cs.CL) [pdf, other]: Title: Accented Speech Recognition: A Survey

Arthur Hinsvark (1), Natalie Delworth (1), Miguel Del Rio (1), Quinten McNamara (1), Joshua Dong (1), Ryan Westerman (1), Michelle Huang (1), Joseph Palakapilly (1), Jennifer Drexler (1), Ilya Pirkin (1), Nishchal Bhandari (1), Miguel Jette (1) ((1) Rev.com)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[228] arXiv:2104.11051 (cross-list from cs.SD) [pdf, other]: Title: Protecting gender and identity with disentangled speech representations

Dimitrios Stoidis, Andrea Cavallaro

Comments: 5 pages, 2 figures

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[229] arXiv:2104.11116 (cross-list from cs.CV) [pdf, other]: Title: Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation

Hang Zhou, Yasheng Sun, Wayne Wu, Chen Change Loy, Xiaogang Wang, Ziwei Liu

Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. Code and models are available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[230] arXiv:2104.11127 (cross-list from cs.CL) [pdf, other]: Title: Fast Text-Only Domain Adaptation of RNN-Transducer Prediction Network

Janne Pylkkönen (1), Antti Ukkonen (1 and 2), Juho Kilpikoski (1), Samu Tamminen (1), Hannes Heikinheimo (1) ((1) Speechly, (2) Department of Computer Science, University of Helsinki, Finland)

Comments: 5 pages, 2 figures. Accepted to Interspeech 2021

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[231] arXiv:2104.11347 (cross-list from cs.SD) [pdf, other]: Title: Restoring degraded speech via a modified diffusion model

Jianwei Zhang, Suren Jayasuriya, Visar Berisha

Journal-ref: Proc. Interspeech 2021, 221-225, 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[232] arXiv:2104.11348 (cross-list from cs.CL) [pdf, other]: Title: Earnings-21: A Practical Benchmark for ASR in the Wild

Miguel Del Rio, Natalie Delworth, Ryan Westerman, Michelle Huang, Nishchal Bhandari, Joseph Palakapilly, Quinten McNamara, Joshua Dong, Piotr Zelasko, Miguel Jette

Comments: Accepted to INTERSPEECH 2021. June 15 2021: Addressing the comments of reviewers and updating the results of our internal ESPNet model. The results do not change our conclusions. April 28th, 2021: We found and resolved an issue in our experimental evaluation that scored the LibriSpeech model at ~20% worse relative WER than the actual WER. The updated results do not affect our conclusions

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[233] arXiv:2104.11395 (cross-list from cs.SD) [pdf, other]: Title: Infant Vocal Tract Development Analysis and Diagnosis by Cry Signals with CNN Age Classification

Chunyan Ji, Yi Pan

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[234] arXiv:2104.11462 (cross-list from cs.CL) [pdf, other]: Title: LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech

Solene Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Esteve, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

Comments: Will be presented at Interspeech 2021

Journal-ref: Proc. Interspeech 2021

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[235] arXiv:2104.11532 (cross-list from cs.SD) [pdf, other]: Title: 3D Convolutional Neural Networks for Ultrasound-Based Silent Speech Interfaces

László Tóth, Amin Honarmandi Shandiz

Comments: 10 pages, 2 tables, 3 figures

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[236] arXiv:2104.11587 (cross-list from cs.SD) [pdf, other]: Title: ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio

Andrey Guzhov, Federico Raue, Jörn Hees, Andreas Dengel

Comments: submitted IJCNN 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[237] arXiv:2104.11598 (cross-list from cs.SD) [pdf, other]: Title: Reconstructing Speech from Real-Time Articulatory MRI Using Neural Vocoders

Yide Yu, Amin Honarmandi Shandiz, László Tóth

Comments: 6 pages, 4 tables, 3 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[238] arXiv:2104.11601 (cross-list from cs.SD) [pdf, other]: Title: Improving Neural Silent Speech Interface Models by Adversarial Training

Amin Honarmandi Shandiz, László Tóth, Gábor Gosztolya, Alexandra Markó, Tamás Gábor Csapó

Comments: 11 pages, 3 tables, 2 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[239] arXiv:2104.11629 (cross-list from cs.SD) [pdf, other]: Title: DeepSpectrumLite: A Power-Efficient Transfer Learning Framework for Embedded Speech and Audio Processing from Decentralised Data

Shahin Amiriparian (1), Tobias Hübner (1), Maurice Gerczuk (1), Sandra Ottl (1), Björn W. Schuller (1,2) ((1) EIHW -- Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany, (2) GLAM -- Group on Language, Audio, and Music, Imperial College London, UK)

Comments: 5 pages, 1 figure

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[240] arXiv:2104.11673 (cross-list from cs.SD) [pdf, other]: Title: Deep Learning Based Assessment of Synthetic Speech Naturalness

Gabriel Mittag, Sebastian Möller

Comments: Late upload, presented at Interspeech 2020

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[241] arXiv:2104.11710 (cross-list from cs.SD) [pdf, other]: Title: Beyond Voice Activity Detection: Hybrid Audio Segmentation for Direct Speech Translation

Marco Gaido, Matteo Negri, Mauro Cettolo, Marco Turchi

Comments: Accepted to ICNLSP 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[242] arXiv:2104.11880 (cross-list from cs.SD) [pdf, other]: Title: Music Embedding: A Tool for Incorporating Music Theory into Computational Music Applications

SeyyedPooya HekmatiAthar, Mohd Anwar

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[243] arXiv:2104.11946 (cross-list from cs.LG) [pdf, other]: Title: Aligned Contrastive Predictive Coding

Jan Chorowski, Grzegorz Ciesielski, Jarosław Dzikowski, Adrian Łańcucki, Ricard Marxer, Mateusz Opala, Piotr Pusz, Paweł Rychlikowski, Michał Stypułkowski

Comments: Published in Interspeech 2021

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[244] arXiv:2104.11984 (cross-list from cs.SD) [pdf, other]: Title: MusCaps: Generating Captions for Music Audio

Ilaria Manco, Emmanouil Benetos, Elio Quinton, Gyorgy Fazekas

Comments: Accepted to IJCNN 2021 for the Special Session on Representation Learning for Audio, Speech, and Music Processing

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[245] arXiv:2104.12159 (cross-list from cs.SD) [pdf, other]: Title: An Adaptive Learning based Generative Adversarial Network for One-To-One Voice Conversion

Sandipan Dhar, Nanda Dulal Jana, Swagatam Das

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[246] arXiv:2104.12292 (cross-list from cs.SD) [pdf, other]: Title: Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis

Erica Cooper, Xin Wang, Junichi Yamagishi

Comments: In the proceedings of ISCA Speech Synthesis Workshop 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[247] arXiv:2104.12359 (cross-list from cs.SD) [pdf, other]: Title: Complex Neural Spatial Filter: Enhancing Multi-channel Target Speech Separation in Complex Domain

Rongzhi Gu, Shi-Xiong Zhang, Yuexian Zou, Dong Yu

Comments: 5 pages, 3 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[248] arXiv:2104.12432 (cross-list from cs.SD) [pdf, other]: Title: Generation of musical patterns through operads

Samuele Giraudo

Comments: 10 pages

Journal-ref: Journées d'informatique musicale, 2020

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Combinatorics (math.CO)
[249] arXiv:2104.12462 (cross-list from cs.SD) [pdf, other]: Title: Points2Sound: From mono to binaural audio using 3D point cloud scenes

Francesc Lluís, Vasileios Chatziioannou, Alex Hofmann

Comments: Code, data, and listening examples: this https URL

Journal-ref: EURASIP Journal on Audio, Speech, and Music Processing 2022 (1), 1-15

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[250] arXiv:2104.12693 (cross-list from cs.SD) [pdf, other]: Title: Identifying Actions for Sound Event Classification

Benjamin Elizalde, Radu Revutchi, Samarjit Das, Bhiksha Raj, Ian Lane, Laurie M. Heller

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[251] arXiv:2104.12807 (cross-list from cs.SD) [pdf, other]: Title: Multimodal Self-Supervised Learning of General Audio Representations

Luyu Wang, Pauline Luc, Adria Recasens, Jean-Baptiste Alayrac, Aaron van den Oord

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[252] arXiv:2104.12922 (cross-list from cs.SD) [pdf, other]: Title: One Billion Audio Sounds from GPU-enabled Modular Synthesis

Joseph Turian, Jordie Shier, George Tzanetakis, Kirk McNally, Max Henry

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[253] arXiv:2104.13002 (cross-list from cs.SD) [pdf, other]: Title: DPT-FSNet: Dual-path Transformer Based Full-band and Sub-band Fusion Network for Speech Enhancement

Feng Dang, Hangting Chen, Pengyuan Zhang

Comments: Accepted by ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[254] arXiv:2104.13040 (cross-list from cs.SD) [pdf, html, other]: Title: The music box operad: Random generation of musical phrases from patterns

Samuele Giraudo

Comments: 42 pages. Extended version of arXiv:2104.12432

Journal-ref: Journal of Creative Music Systems 8, Issue 1, 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Combinatorics (math.CO); Quantum Algebra (math.QA)
[255] arXiv:2104.13056 (cross-list from cs.SD) [pdf, other]: Title: Generating Lead Sheets with Affect: A Novel Conditional seq2seq Framework

Dimos Makris, Kat R. Agres, Dorien Herremans

Comments: Accepted for the International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18-22 July 2021 (virtual)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[256] arXiv:2104.13225 (cross-list from cs.AI) [pdf, other]: Title: Visually grounded models of spoken language: A survey of datasets, architectures and evaluation techniques

Grzegorz Chrupała

Journal-ref: Journal of Artificial Intelligence Research 73 (2022) 673-707

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[257] arXiv:2104.13266 (cross-list from cs.SD) [pdf, other]: Title: Batebit Controller: Popularizing Digital Musical Instruments Development Process

Filipe Calegario, João Tragtenberg, Giordano Cabral, Geber Ramalho

Comments: 2 pages, 2 figures, 17th Brazilian Symposium on Computer Music

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[258] arXiv:2104.13276 (cross-list from cs.SD) [pdf, other]: Title: MULTIMODAL ANALYSIS: Informed content estimation and audio source separation

Gabriel Meseguer-Brocal

Comments: Ph.D. dissertation. Thesis supervisor: Geoffroy Peeters. Jury: Laurent Girin, Gaël Richard, Rachel Bittner, Elena Cabrio, Bruno Gas, Perfecto Herrera Boyer, Antoine Liutkus

Subjects: Sound (cs.SD); Databases (cs.DB); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[259] arXiv:2104.13332 (cross-list from cs.LG) [pdf, other]: Title: End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks

Rodrigo Mira, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Björn W. Schuller, Maja Pantic

Comments: Published in IEEE Transactions on Cybernetics (April 2022)

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[260] arXiv:2104.14067 (cross-list from cs.SD) [pdf, other]: Title: Improving Fairness in Speaker Recognition

Gianni Fenu, Giacomo Medda, Mirko Marras, Giacomo Meloni

Comments: Accepted at the 2020 European Symposium on Software Engineering (ESSE 2020)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[261] arXiv:2104.14297 (cross-list from cs.SD) [pdf, other]: Title: End-to-End Speech Recognition from Federated Acoustic Models

Yan Gao, Titouan Parcollet, Salah Zaiem, Javier Fernandez-Marques, Pedro P. B. de Gusmao, Daniel J. Beutel, Nicholas D. Lane

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[262] arXiv:2104.14346 (cross-list from cs.CL) [pdf, other]: Title: Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models

Thibault Doutre, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Olivier Siohan, Liangliang Cao

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[263] arXiv:2104.14468 (cross-list from cs.SD) [pdf, other]: Title: Star DGT: a Robust Gabor Transform for Speech Denoising

Vasiliki Kouni, Holger Rauhut, Theoharis Theoharis

Comments: arXiv admin note: text overlap with arXiv:2103.11233

Subjects: Sound (cs.SD); Information Theory (cs.IT); Audio and Speech Processing (eess.AS)
[264] arXiv:2104.14470 (cross-list from cs.CL) [pdf, other]: Title: Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation

Ha Nguyen, Yannick Estève, Laurent Besacier

Comments: Accepted for presentation at Interspeech 2021

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[265] arXiv:2104.14802 (cross-list from cs.MM) [pdf, other]: Title: Dance Generation with Style Embedding: Learning and Transferring Latent Representations of Dance Styles

Xinjian Zhang, Yi Xu, Su Yang, Longwen Gao, Huyang Sun

Comments: Submit to IJCAI-21

Subjects: Multimedia (cs.MM); Graphics (cs.GR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[266] arXiv:2104.14830 (cross-list from cs.CL) [pdf, other]: Title: Scaling End-to-End Models for Large-Scale Multilingual ASR

Bo Li, Ruoming Pang, Tara N. Sainath, Anmol Gulati, Yu Zhang, James Qin, Parisa Haghani, W. Ronny Huang, Min Ma, Junwen Bai

Comments: ASRU 2021

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
