Audio and Speech Processing

Authors and titles for April 2021

Total of 266 entries; entries 251-266 shown.
[251] arXiv:2104.12807 (cross-list from cs.SD) [pdf, other]
Title: Multimodal Self-Supervised Learning of General Audio Representations
Luyu Wang, Pauline Luc, Adria Recasens, Jean-Baptiste Alayrac, Aaron van den Oord
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[252] arXiv:2104.12922 (cross-list from cs.SD) [pdf, other]
Title: One Billion Audio Sounds from GPU-enabled Modular Synthesis
Joseph Turian, Jordie Shier, George Tzanetakis, Kirk McNally, Max Henry
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[253] arXiv:2104.13002 (cross-list from cs.SD) [pdf, other]
Title: DPT-FSNet: Dual-path Transformer Based Full-band and Sub-band Fusion Network for Speech Enhancement
Feng Dang, Hangting Chen, Pengyuan Zhang
Comments: Accepted by ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[254] arXiv:2104.13040 (cross-list from cs.SD) [pdf, html, other]
Title: The music box operad: Random generation of musical phrases from patterns
Samuele Giraudo
Comments: 42 pages. Extended version of arXiv:2104.12432
Journal-ref: Journal of Creative Music Systems 8, Issue 1, 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Combinatorics (math.CO); Quantum Algebra (math.QA)
[255] arXiv:2104.13056 (cross-list from cs.SD) [pdf, other]
Title: Generating Lead Sheets with Affect: A Novel Conditional seq2seq Framework
Dimos Makris, Kat R. Agres, Dorien Herremans
Comments: Accepted for the International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18-22 July 2021 (virtual)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[256] arXiv:2104.13225 (cross-list from cs.AI) [pdf, other]
Title: Visually grounded models of spoken language: A survey of datasets, architectures and evaluation techniques
Grzegorz Chrupała
Journal-ref: Journal of Artificial Intelligence Research 73 (2022) 673-707
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[257] arXiv:2104.13266 (cross-list from cs.SD) [pdf, other]
Title: Batebit Controller: Popularizing Digital Musical Instruments Development Process
Filipe Calegario, João Tragtenberg, Giordano Cabral, Geber Ramalho
Comments: 2 pages, 2 figures, 17th Brazilian Symposium on Computer Music
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[258] arXiv:2104.13276 (cross-list from cs.SD) [pdf, other]
Title: MULTIMODAL ANALYSIS: Informed content estimation and audio source separation
Gabriel Meseguer-Brocal
Comments: Ph.D. dissertation. Thesis supervisor: Geoffroy Peeters. Jury: Laurent Girin, Gaël Richard, Rachel Bittner, Elena Cabrio, Bruno Gas, Perfecto Herrera Boyer, Antoine Liutkus
Subjects: Sound (cs.SD); Databases (cs.DB); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[259] arXiv:2104.13332 (cross-list from cs.LG) [pdf, other]
Title: End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks
Rodrigo Mira, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Björn W. Schuller, Maja Pantic
Comments: Published in IEEE Transactions on Cybernetics (April 2022)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[260] arXiv:2104.14067 (cross-list from cs.SD) [pdf, other]
Title: Improving Fairness in Speaker Recognition
Gianni Fenu, Giacomo Medda, Mirko Marras, Giacomo Meloni
Comments: Accepted at the 2020 European Symposium on Software Engineering (ESSE 2020)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[261] arXiv:2104.14297 (cross-list from cs.SD) [pdf, other]
Title: End-to-End Speech Recognition from Federated Acoustic Models
Yan Gao, Titouan Parcollet, Salah Zaiem, Javier Fernandez-Marques, Pedro P. B. de Gusmao, Daniel J. Beutel, Nicholas D. Lane
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[262] arXiv:2104.14346 (cross-list from cs.CL) [pdf, other]
Title: Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models
Thibault Doutre, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Olivier Siohan, Liangliang Cao
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[263] arXiv:2104.14468 (cross-list from cs.SD) [pdf, other]
Title: Star DGT: a Robust Gabor Transform for Speech Denoising
Vasiliki Kouni, Holger Rauhut, Theoharis Theoharis
Comments: arXiv admin note: text overlap with arXiv:2103.11233
Subjects: Sound (cs.SD); Information Theory (cs.IT); Audio and Speech Processing (eess.AS)
[264] arXiv:2104.14470 (cross-list from cs.CL) [pdf, other]
Title: Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation
Ha Nguyen, Yannick Estève, Laurent Besacier
Comments: Accepted for presentation at Interspeech 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[265] arXiv:2104.14802 (cross-list from cs.MM) [pdf, other]
Title: Dance Generation with Style Embedding: Learning and Transferring Latent Representations of Dance Styles
Xinjian Zhang, Yi Xu, Su Yang, Longwen Gao, Huyang Sun
Comments: Submitted to IJCAI-21
Subjects: Multimedia (cs.MM); Graphics (cs.GR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[266] arXiv:2104.14830 (cross-list from cs.CL) [pdf, other]
Title: Scaling End-to-End Models for Large-Scale Multilingual ASR
Bo Li, Ruoming Pang, Tara N. Sainath, Anmol Gulati, Yu Zhang, James Qin, Parisa Haghani, W. Ronny Huang, Min Ma, Junwen Bai
Comments: ASRU 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)