Sound

Authors and titles for December 2022

Total of 137 entries (this page: 101-137)

[101] arXiv:2212.04684 (cross-list from cs.LG) [pdf, other]: Title: Machine Learning-based Classification of Birds through Birdsong

Yueying Chang, Richard O. Sinnott

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[102] arXiv:2212.04831 (cross-list from eess.AS) [pdf, other]: Title: Uncertainty Estimation in Deep Speech Enhancement Using Complex Gaussian Mixture Models

Huajian Fang, Timo Gerkmann

Comments: ©2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal-ref: ICASSP 2023 - IEEE International Conference on Acoustics, Speech and Signal Processing

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[103] arXiv:2212.04930 (cross-list from eess.AS) [pdf, other]: Title: DDSupport: Language Learning Support System that Displays Differences and Distances from Model Speech

Kazuki Kawamura, Jun Rekimoto

Journal-ref: 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)

Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD)
[104] arXiv:2212.05008 (cross-list from eess.AS) [pdf, other]: Title: Hyperbolic Audio Source Separation

Darius Petermann, Gordon Wichern, Aswin Subramanian, Jonathan Le Roux

Comments: Submitted to ICASSP 2023, Demo page: this https URL

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[105] arXiv:2212.05271 (cross-list from eess.AS) [pdf, other]: Title: GPU-accelerated Guided Source Separation for Meeting Transcription

Desh Raj, Daniel Povey, Sanjeev Khudanpur

Comments: 7 pages, 4 figures. To appear at InterSpeech 2023. Code available at this https URL

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[106] arXiv:2212.05805 (cross-list from cs.CL) [pdf, other]: Title: Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features

Junhui Zhang, Junjie Pan, Xiang Yin, Zejun Ma

Comments: 4 pages, 3 figures

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[107] arXiv:2212.05922 (cross-list from cs.CV) [pdf, html, other]: Title: Audiovisual Masked Autoencoders

Mariana-Iuliana Georgescu, Eduardo Fonseca, Radu Tudor Ionescu, Mario Lucic, Cordelia Schmid, Anurag Arnab

Comments: ICCV 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[108] arXiv:2212.06246 (cross-list from cs.LG) [pdf, other]: Title: Jointly Learning Visual and Auditory Speech Representations from Raw Data

Alexandros Haliassos, Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Maja Pantic

Comments: ICLR 2023. Code: this https URL

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[109] arXiv:2212.07136 (cross-list from eess.SP) [pdf, other]: Title: Event-driven Spectrotemporal Feature Extraction and Classification using a Silicon Cochlea Model

Ying Xu, Samalika Perera, Yeshwanth Bethi, Saeed Afshar, André van Schaik

Comments: 12 pages, 8 figures

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[110] arXiv:2212.07327 (cross-list from eess.AS) [pdf, other]: Title: Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks

Darius Petermann, Gordon Wichern, Aswin Shanmugam Subramanian, Zhong-Qiu Wang, Jonathan Le Roux

Comments: Submitted to IEEE TASLP (In review), 13 pages, 6 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[111] arXiv:2212.07525 (cross-list from cs.LG) [pdf, other]: Title: Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language

Alexei Baevski, Arun Babu, Wei-Ning Hsu, Michael Auli

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[112] arXiv:2212.07570 (cross-list from eess.AS) [pdf, other]: Title: DeFT-AN: Dense Frequency-Time Attentive Network for Multichannel Speech Enhancement

Dongheon Lee, Jung-Woo Choi

Comments: 5 pages, 2 figures, 3 tables. This article has been published by IEEE Signal Processing Letters. This version is the authors' version and may vary from the final publication in details

Journal-ref: IEEE Signal Processing Letters, Vol. 30, pp. 155-159, 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[113] arXiv:2212.07939 (cross-list from cs.CL) [pdf, other]: Title: RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis

Shinhyeok Oh, HyeongRae Noh, Yoonseok Hong, Insoo Oh

Comments: Accepted to AAAI 2023

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[114] arXiv:2212.07983 (cross-list from cs.CV) [pdf, other]: Title: Vision Transformers are Parameter-Efficient Audio-Visual Learners

Yan-Bo Lin, Yi-Lin Sung, Jie Lei, Mohit Bansal, Gedas Bertasius

Comments: CVPR 2023 Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[115] arXiv:2212.08055 (cross-list from cs.CL) [pdf, other]: Title: UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units

Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, Juan Pino

Comments: ACL 2023 (main conference)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[116] arXiv:2212.08071 (cross-list from cs.CV) [pdf, other]: Title: MAViL: Masked Audio-Video Learners

Po-Yao Huang, Vasu Sharma, Hu Xu, Chaitanya Ryali, Haoqi Fan, Yanghao Li, Shang-Wen Li, Gargi Ghosh, Jitendra Malik, Christoph Feichtenhofer

Comments: Technical report

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[117] arXiv:2212.08378 (cross-list from cs.LG) [pdf, other]: Title: Feature Dropout: Revisiting the Role of Augmentations in Contrastive Learning

Alex Tamkin, Margalit Glasgow, Xiluo He, Noah Goodman

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[118] arXiv:2212.08489 (cross-list from cs.CL) [pdf, other]: Title: Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks

Esaú Villatoro-Tello, Srikanth Madikeri, Juan Zuluaga-Gomez, Bidisha Sharma, Seyyed Saeed Sarfjoo, Iuliia Nigmatulina, Petr Motlicek, Alexei V. Ivanov, Aravind Ganapathiraju

Comments: Accepted in ICASSP 2023

Journal-ref: ICASSP 2023

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[119] arXiv:2212.08911 (cross-list from cs.CL) [pdf, other]: Title: AdaTranS: Adapting with Boundary-based Shrinking for End-to-End Speech Translation

Xingshan Zeng, Liangyou Li, Qun Liu

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[120] arXiv:2212.09058 (cross-list from eess.AS) [pdf, other]: Title: BEATs: Audio Pre-Training with Acoustic Tokenizers

Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Daniel Tompkins, Zhuo Chen, Furu Wei

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[121] arXiv:2212.09359 (cross-list from cs.CL) [pdf, other]: Title: WACO: Word-Aligned Contrastive Learning for Speech Translation

Siqi Ouyang, Rong Ye, Lei Li

Comments: ACL 2023 Poster

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[122] arXiv:2212.09553 (cross-list from cs.CL) [pdf, other]: Title: Mu$^{2}$SLAM: Multitask, Multilingual Speech and Language Models

Yong Cheng, Yu Zhang, Melvin Johnson, Wolfgang Macherey, Ankur Bapna

Comments: ICML 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[123] arXiv:2212.09699 (cross-list from cs.CL) [pdf, other]: Title: SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations

Ioannis Tsiamas, José A. R. Fonollosa, Marta R. Costa-jussà

Comments: EMNLP 2023 (Findings)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[124] arXiv:2212.09982 (cross-list from cs.CL) [pdf, other]: Title: Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data

Mozhdeh Gheini, Tatiana Likhomanenko, Matthias Sperber, Hendra Setiawan

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125] arXiv:2212.11377 (cross-list from eess.AS) [pdf, other]: Title: ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement

Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi

Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[126] arXiv:2212.11772 (cross-list from cs.CL) [pdf, other]: Title: A Self-Adjusting Fusion Representation Learning Model for Unaligned Text-Audio Sequences

Kaicheng Yang, Ruxuan Zhang, Hua Xu, Kai Gao

Comments: 8 pages

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[127] arXiv:2212.11851 (cross-list from eess.AS) [pdf, other]: Title: StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation

Jean-Marie Lemercier, Julius Richter, Simon Welker, Timo Gerkmann

Comments: Published in IEEE/ACM Transactions on Audio, Speech and Language Processing, 2023

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[128] arXiv:2212.12048 (cross-list from cs.CL) [pdf, other]: Title: Pushing the performances of ASR models on English and Spanish accents

Pooja Chitkara, Morgane Riviere, Jade Copet, Frank Zhang, Yatharth Saraf

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[129] arXiv:2212.12497 (cross-list from nlin.PS) [pdf, other]: Title: Fractal Patterns in Music

John McDonough, Andrzej Herczyński

Comments: 27 pages, 12 figures

Subjects: Pattern Formation and Solitons (nlin.PS); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[130] arXiv:2212.13015 (cross-list from cs.CL) [pdf, other]: Title: Skit-S2I: An Indian Accented Speech to Intent dataset

Shangeth Rajaa, Swaraj Dalmia, Kumarmanas Nethil

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[131] arXiv:2212.13378 (cross-list from cs.CL) [pdf, other]: Title: Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation

Tomer Wullach, Shlomo E. Chazan

Comments: Accepted to AAAI 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[132] arXiv:2212.13442 (cross-list from eess.IV) [pdf, other]: Title: Audiovisual Database with 360 Video and Higher-Order Ambisonics Audio for Perception, Cognition, Behavior, and QoE Evaluation Research

Thomas Robotham, Ashutosh Singla, Olli S. Rummukainen, Alexander Raake, Emanuël A. P. Habets

Comments: 6 pages, 2 figures, accepted and presented at the 2022 14th International Conference on Quality of Multimedia Experience (QoMEX). Database is publicly accessible at this https URL

Subjects: Image and Video Processing (eess.IV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[133] arXiv:2212.13703 (cross-list from eess.AS) [pdf, other]: Title: Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism

Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda

Comments: 5 pages, 4 figures, 2 tables, accepted to ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[134] arXiv:2212.13917 (cross-list from cs.HC) [pdf, other]: Title: Multimodal Emotion Recognition among Couples from Lab Settings to Daily Life using Smartwatches

George Boateng

Comments: PhD Thesis, 2022 - ETH Zurich

Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[135] arXiv:2212.14149 (cross-list from cs.LG) [pdf, other]: Title: Macro-block dropout for improved regularization in training end-to-end speech recognition models

Chanwoo Kim, Sathish Indurti, Jinhwan Park, Wonyong Sung

Comments: Accepted for presentation at The 2022 IEEE Spoken Language Technology Workshop (SLT 2022)

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[136] arXiv:2212.14227 (cross-list from eess.AS) [pdf, other]: Title: StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models

Yinghao Aaron Li, Cong Han, Nima Mesgarani

Comments: SLT 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[137] arXiv:2212.14518 (cross-list from eess.AS) [pdf, other]: Title: ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech

Zehua Chen, Yihan Wu, Yichong Leng, Jiawei Chen, Haohe Liu, Xu Tan, Yang Cui, Ke Wang, Lei He, Sheng Zhao, Jiang Bian, Danilo Mandic

Comments: 13 pages, 5 figures

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
