Sound

Authors and titles for December 2022

Total of 137 entries (showing entries 101-125)
[101] arXiv:2212.04684 (cross-list from cs.LG) [pdf, other]
Title: Machine Learning-based Classification of Birds through Birdsong
Yueying Chang, Richard O. Sinnott
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[102] arXiv:2212.04831 (cross-list from eess.AS) [pdf, other]
Title: Uncertainty Estimation in Deep Speech Enhancement Using Complex Gaussian Mixture Models
Huajian Fang, Timo Gerkmann
Comments: ©2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Journal-ref: ICASSP 2023 - IEEE International Conference on Acoustics, Speech and Signal Processing
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[103] arXiv:2212.04930 (cross-list from eess.AS) [pdf, other]
Title: DDSupport: Language Learning Support System that Displays Differences and Distances from Model Speech
Kazuki Kawamura, Jun Rekimoto
Journal-ref: 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)
Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD)
[104] arXiv:2212.05008 (cross-list from eess.AS) [pdf, other]
Title: Hyperbolic Audio Source Separation
Darius Petermann, Gordon Wichern, Aswin Subramanian, Jonathan Le Roux
Comments: Submitted to ICASSP 2023, Demo page: this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[105] arXiv:2212.05271 (cross-list from eess.AS) [pdf, other]
Title: GPU-accelerated Guided Source Separation for Meeting Transcription
Desh Raj, Daniel Povey, Sanjeev Khudanpur
Comments: 7 pages, 4 figures. To appear at InterSpeech 2023. Code available at this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[106] arXiv:2212.05805 (cross-list from cs.CL) [pdf, other]
Title: Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features
Junhui Zhang, Junjie Pan, Xiang Yin, Zejun Ma
Comments: 4 pages, 3 figures
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[107] arXiv:2212.05922 (cross-list from cs.CV) [pdf, html, other]
Title: Audiovisual Masked Autoencoders
Mariana-Iuliana Georgescu, Eduardo Fonseca, Radu Tudor Ionescu, Mario Lucic, Cordelia Schmid, Anurag Arnab
Comments: ICCV 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[108] arXiv:2212.06246 (cross-list from cs.LG) [pdf, other]
Title: Jointly Learning Visual and Auditory Speech Representations from Raw Data
Alexandros Haliassos, Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Maja Pantic
Comments: ICLR 2023. Code: this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[109] arXiv:2212.07136 (cross-list from eess.SP) [pdf, other]
Title: Event-driven Spectrotemporal Feature Extraction and Classification using a Silicon Cochlea Model
Ying Xu, Samalika Perera, Yeshwanth Bethi, Saeed Afshar, André van Schaik
Comments: 12 pages, 8 figures
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[110] arXiv:2212.07327 (cross-list from eess.AS) [pdf, other]
Title: Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks
Darius Petermann, Gordon Wichern, Aswin Shanmugam Subramanian, Zhong-Qiu Wang, Jonathan Le Roux
Comments: Submitted to IEEE TASLP (In review), 13 pages, 6 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[111] arXiv:2212.07525 (cross-list from cs.LG) [pdf, other]
Title: Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
Alexei Baevski, Arun Babu, Wei-Ning Hsu, Michael Auli
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[112] arXiv:2212.07570 (cross-list from eess.AS) [pdf, other]
Title: DeFT-AN: Dense Frequency-Time Attentive Network for Multichannel Speech Enhancement
Dongheon Lee, Jung-Woo Choi
Comments: 5 pages, 2 figures, 3 tables. This article has been published by IEEE Signal Processing Letters. This version is the authors' version and may vary from the final publication in details
Journal-ref: IEEE Signal Processing Letters, Vol. 30, pp. 155-159, 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[113] arXiv:2212.07939 (cross-list from cs.CL) [pdf, other]
Title: RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis
Shinhyeok Oh, HyeongRae Noh, Yoonseok Hong, Insoo Oh
Comments: Accepted to AAAI 2023
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[114] arXiv:2212.07983 (cross-list from cs.CV) [pdf, other]
Title: Vision Transformers are Parameter-Efficient Audio-Visual Learners
Yan-Bo Lin, Yi-Lin Sung, Jie Lei, Mohit Bansal, Gedas Bertasius
Comments: CVPR 2023 Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[115] arXiv:2212.08055 (cross-list from cs.CL) [pdf, other]
Title: UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units
Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, Juan Pino
Comments: ACL 2023 (main conference)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[116] arXiv:2212.08071 (cross-list from cs.CV) [pdf, other]
Title: MAViL: Masked Audio-Video Learners
Po-Yao Huang, Vasu Sharma, Hu Xu, Chaitanya Ryali, Haoqi Fan, Yanghao Li, Shang-Wen Li, Gargi Ghosh, Jitendra Malik, Christoph Feichtenhofer
Comments: Technical report
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[117] arXiv:2212.08378 (cross-list from cs.LG) [pdf, other]
Title: Feature Dropout: Revisiting the Role of Augmentations in Contrastive Learning
Alex Tamkin, Margalit Glasgow, Xiluo He, Noah Goodman
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[118] arXiv:2212.08489 (cross-list from cs.CL) [pdf, other]
Title: Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks
Esaú Villatoro-Tello, Srikanth Madikeri, Juan Zuluaga-Gomez, Bidisha Sharma, Seyyed Saeed Sarfjoo, Iuliia Nigmatulina, Petr Motlicek, Alexei V. Ivanov, Aravind Ganapathiraju
Comments: Accepted in ICASSP 2023
Journal-ref: ICASSP 2023
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[119] arXiv:2212.08911 (cross-list from cs.CL) [pdf, other]
Title: AdaTranS: Adapting with Boundary-based Shrinking for End-to-End Speech Translation
Xingshan Zeng, Liangyou Li, Qun Liu
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[120] arXiv:2212.09058 (cross-list from eess.AS) [pdf, other]
Title: BEATs: Audio Pre-Training with Acoustic Tokenizers
Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Daniel Tompkins, Zhuo Chen, Furu Wei
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[121] arXiv:2212.09359 (cross-list from cs.CL) [pdf, other]
Title: WACO: Word-Aligned Contrastive Learning for Speech Translation
Siqi Ouyang, Rong Ye, Lei Li
Comments: ACL 2023 Poster
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[122] arXiv:2212.09553 (cross-list from cs.CL) [pdf, other]
Title: Mu$^{2}$SLAM: Multitask, Multilingual Speech and Language Models
Yong Cheng, Yu Zhang, Melvin Johnson, Wolfgang Macherey, Ankur Bapna
Comments: ICML 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[123] arXiv:2212.09699 (cross-list from cs.CL) [pdf, other]
Title: SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations
Ioannis Tsiamas, José A. R. Fonollosa, Marta R. Costa-jussà
Comments: EMNLP 2023 (Findings)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[124] arXiv:2212.09982 (cross-list from cs.CL) [pdf, other]
Title: Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data
Mozhdeh Gheini, Tatiana Likhomanenko, Matthias Sperber, Hendra Setiawan
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125] arXiv:2212.11377 (cross-list from eess.AS) [pdf, other]
Title: ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement
Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)