Sound

Authors and titles for December 2022

Total of 137 entries : 1-25 26-50 51-75 76-100 101-125 126-137

[101] arXiv:2212.04684 (cross-list from cs.LG)
Title: Machine Learning-based Classification of Birds through Birdsong

Yueying Chang, Richard O. Sinnott

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[102] arXiv:2212.04831 (cross-list from eess.AS)
Title: Uncertainty Estimation in Deep Speech Enhancement Using Complex Gaussian Mixture Models

Huajian Fang, Timo Gerkmann

Comments: ©2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal-ref: ICASSP 2023 - IEEE International Conference on Acoustics, Speech and Signal Processing

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[103] arXiv:2212.04930 (cross-list from eess.AS)
Title: DDSupport: Language Learning Support System that Displays Differences and Distances from Model Speech

Kazuki Kawamura, Jun Rekimoto

Journal-ref: 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)

Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD)
[104] arXiv:2212.05008 (cross-list from eess.AS)
Title: Hyperbolic Audio Source Separation

Darius Petermann, Gordon Wichern, Aswin Subramanian, Jonathan Le Roux

Comments: Submitted to ICASSP 2023. Demo page: this https URL

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[105] arXiv:2212.05271 (cross-list from eess.AS)
Title: GPU-accelerated Guided Source Separation for Meeting Transcription

Desh Raj, Daniel Povey, Sanjeev Khudanpur

Comments: 7 pages, 4 figures. To appear at Interspeech 2023. Code available at this https URL

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[106] arXiv:2212.05805 (cross-list from cs.CL)
Title: Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features

Junhui Zhang, Junjie Pan, Xiang Yin, Zejun Ma

Comments: 4 pages, 3 figures

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[107] arXiv:2212.05922 (cross-list from cs.CV)
Title: Audiovisual Masked Autoencoders

Mariana-Iuliana Georgescu, Eduardo Fonseca, Radu Tudor Ionescu, Mario Lucic, Cordelia Schmid, Anurag Arnab

Comments: ICCV 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[108] arXiv:2212.06246 (cross-list from cs.LG)
Title: Jointly Learning Visual and Auditory Speech Representations from Raw Data

Alexandros Haliassos, Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Maja Pantic

Comments: ICLR 2023. Code: this https URL

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[109] arXiv:2212.07136 (cross-list from eess.SP)
Title: Event-driven Spectrotemporal Feature Extraction and Classification using a Silicon Cochlea Model

Ying Xu, Samalika Perera, Yeshwanth Bethi, Saeed Afshar, André van Schaik

Comments: 12 pages, 8 figures

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[110] arXiv:2212.07327 (cross-list from eess.AS)
Title: Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks

Darius Petermann, Gordon Wichern, Aswin Shanmugam Subramanian, Zhong-Qiu Wang, Jonathan Le Roux

Comments: Submitted to IEEE TASLP (In review), 13 pages, 6 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[111] arXiv:2212.07525 (cross-list from cs.LG)
Title: Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language

Alexei Baevski, Arun Babu, Wei-Ning Hsu, Michael Auli

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[112] arXiv:2212.07570 (cross-list from eess.AS)
Title: DeFT-AN: Dense Frequency-Time Attentive Network for Multichannel Speech Enhancement

Dongheon Lee, Jung-Woo Choi

Comments: 5 pages, 2 figures, 3 tables. This article has been published in IEEE Signal Processing Letters. This version is the authors' version and may differ in detail from the final publication

Journal-ref: IEEE Signal Processing Letters, Vol. 30, pp. 155-159, 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[113] arXiv:2212.07939 (cross-list from cs.CL)
Title: RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis

Shinhyeok Oh, HyeongRae Noh, Yoonseok Hong, Insoo Oh

Comments: Accepted to AAAI 2023

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[114] arXiv:2212.07983 (cross-list from cs.CV)
Title: Vision Transformers are Parameter-Efficient Audio-Visual Learners

Yan-Bo Lin, Yi-Lin Sung, Jie Lei, Mohit Bansal, Gedas Bertasius

Comments: CVPR 2023. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[115] arXiv:2212.08055 (cross-list from cs.CL)
Title: UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units

Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, Juan Pino

Comments: ACL 2023 (main conference)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[116] arXiv:2212.08071 (cross-list from cs.CV)
Title: MAViL: Masked Audio-Video Learners

Po-Yao Huang, Vasu Sharma, Hu Xu, Chaitanya Ryali, Haoqi Fan, Yanghao Li, Shang-Wen Li, Gargi Ghosh, Jitendra Malik, Christoph Feichtenhofer

Comments: Technical report

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[117] arXiv:2212.08378 (cross-list from cs.LG)
Title: Feature Dropout: Revisiting the Role of Augmentations in Contrastive Learning

Alex Tamkin, Margalit Glasgow, Xiluo He, Noah Goodman

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[118] arXiv:2212.08489 (cross-list from cs.CL)
Title: Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks

Esaú Villatoro-Tello, Srikanth Madikeri, Juan Zuluaga-Gomez, Bidisha Sharma, Seyyed Saeed Sarfjoo, Iuliia Nigmatulina, Petr Motlicek, Alexei V. Ivanov, Aravind Ganapathiraju

Comments: Accepted to ICASSP 2023

Journal-ref: ICASSP 2023

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[119] arXiv:2212.08911 (cross-list from cs.CL)
Title: AdaTranS: Adapting with Boundary-based Shrinking for End-to-End Speech Translation

Xingshan Zeng, Liangyou Li, Qun Liu

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[120] arXiv:2212.09058 (cross-list from eess.AS)
Title: BEATs: Audio Pre-Training with Acoustic Tokenizers

Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Daniel Tompkins, Zhuo Chen, Furu Wei

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[121] arXiv:2212.09359 (cross-list from cs.CL)
Title: WACO: Word-Aligned Contrastive Learning for Speech Translation

Siqi Ouyang, Rong Ye, Lei Li

Comments: ACL 2023 Poster

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[122] arXiv:2212.09553 (cross-list from cs.CL)
Title: Mu²SLAM: Multitask, Multilingual Speech and Language Models

Yong Cheng, Yu Zhang, Melvin Johnson, Wolfgang Macherey, Ankur Bapna

Comments: ICML 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[123] arXiv:2212.09699 (cross-list from cs.CL)
Title: SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations

Ioannis Tsiamas, José A. R. Fonollosa, Marta R. Costa-jussà

Comments: EMNLP 2023 (Findings)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[124] arXiv:2212.09982 (cross-list from cs.CL)
Title: Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data

Mozhdeh Gheini, Tatiana Likhomanenko, Matthias Sperber, Hendra Setiawan

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125] arXiv:2212.11377 (cross-list from eess.AS)
Title: ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement

Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi

Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
