Sound

Authors and titles for December 2022

Total of 137 entries : 1-25 26-50 51-75 76-100 101-125 126-137

[51] arXiv:2212.14618 [pdf, other]: Title: Blind Restoration of Real-World Audio by 1D Operational GANs

Turker Ince, Serkan Kiranyaz, Ozer Can Devecioglu, Muhammad Salman Khan, Muhammad Chowdhury, Moncef Gabbouj

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[52] arXiv:2212.00004 (cross-list from cs.HC) [pdf, other]: Title: Advanced Audio Aid for Blind People

Savera Sarwar, Muhammad Turab, Danish Channa, Aisha Chandio, M. Uzair Sohu, Vikram Kumar

Comments: Under revision. Submitted to International Conference On Emerging Technologies In Electronics, Computing And Communication (ICETECC) 2022

Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[53] arXiv:2212.00239 (cross-list from cs.CL) [pdf, other]: Title: Inconsistency Ranking-based Noisy Label Detection for High-quality Data

Ruibin Yuan, Hanzhi Yin, Yi Wang, Yifan He, Yushi Ye, Lei Zhang, Zhizheng Wu

Comments: 5 pages

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[54] arXiv:2212.00500 (cross-list from cs.MM) [pdf, other]: Title: MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition

Xiaohuan Zhou, Jiaming Wang, Zeyu Cui, Shiliang Zhang, Zhijie Yan, Jingren Zhou, Chang Zhou

Comments: Submitted to ICASSP 2023

Subjects: Multimedia (cs.MM); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[55] arXiv:2212.01012 (cross-list from eess.AS) [pdf, other]: Title: Injecting Spatial Information for Monaural Speech Enhancement via Knowledge Distillation

Xinmeng Xu, Weiping Tu, Yuhong Yang

Comments: Submitted to ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[56] arXiv:2212.01040 (cross-list from cs.CV) [pdf, other]: Title: Role of Audio in Audio-Visual Video Summarization

Ibrahim Shoer, Berkay Kopru, Engin Erzin

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[57] arXiv:2212.01083 (cross-list from cs.CV) [pdf, other]: Title: Cross-Modal Mutual Learning for Cued Speech Recognition

Lei Liu, Li Liu

Comments: Accepted to ICASSP 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[58] arXiv:2212.01187 (cross-list from cs.CL) [pdf, other]: Title: Surrogate Gradient Spiking Neural Networks as Encoders for Large Vocabulary Continuous Speech Recognition

Alexandre Bittar, Philip N. Garner

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[59] arXiv:2212.01245 (cross-list from eess.AS) [pdf, other]: Title: Preliminary Study on SSCF-derived Polar Coordinate for ASR

Sotheara Leang (CADT, M-PSI), Eric Castelli (M-PSI), Dominique Vaufreydaz (M-PSI), Sethserey Sam (CADT)

Journal-ref: ACET 2022, Dec 2022, Phnom Penh, Cambodia

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[60] arXiv:2212.01282 (cross-list from eess.AS) [pdf, other]: Title: CHAPTER: Exploiting Convolutional Neural Network Adapters for Self-supervised Speech Models

Zih-Ching Chen, Yu-Shun Sung, Hung-yi Lee

Comments: Submitted to ICASSP 2023. Under review

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[61] arXiv:2212.01306 (cross-list from eess.AS) [pdf, other]: Title: Relative Acoustic Features for Distance Estimation in Smart-Homes

Francesco Nespoli, Daniel Barreda, Patrick A. Naylor

Journal-ref: Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[62] arXiv:2212.01393 (cross-list from eess.AS) [pdf, other]: Title: Continual Learning for On-Device Speech Recognition using Disentangled Conformers

Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed

Comments: 8 pages, 2 figures. Submitted to ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[63] arXiv:2212.01427 (cross-list from eess.AS) [pdf, other]: Title: Investigations on the Influence of Combined Inter-Aural Cue Distortions on Overall Audio Quality

Pablo M. Delgado, Jürgen Herre

Comments: A previous version of this paper (minus errata) was presented at Fortschritte der Akustik - DAGA 2019 (Rostock, Germany)

Journal-ref: Tagungsband - DAGA 2019 - 45. Jahrestagung für Akustik

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[64] arXiv:2212.01451 (cross-list from eess.AS) [pdf, other]: Title: Objective Assessment of Spatial Audio Quality using Directional Loudness Maps

Pablo M. Delgado, Jürgen Herre

Comments: Accepted paper at ICASSP 2019

Journal-ref: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)
[65] arXiv:2212.01467 (cross-list from eess.AS) [pdf, other]: Title: Can we still use PEAQ? A Performance Analysis of the ITU Standard for the Objective Assessment of Perceived Audio Quality

Pablo M. Delgado, Jürgen Herre

Comments: Accepted manuscript for 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX 2020)

Journal-ref: 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), 2020, pp. 1-6

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[66] arXiv:2212.01651 (cross-list from cs.LG) [pdf, other]: Title: A dataset for audio-video based vehicle speed estimation

Slobodan Djukanović, Nikola Bulatović, Ivana Čavor

Comments: 30th Telecommunications Forum TELFOR 2022, Belgrade, Serbia, November 15-16, 2022. 5 pages, 2 figures, 1 table

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[67] arXiv:2212.01661 (cross-list from eess.AS) [pdf, other]: Title: Unsupervised Fine-Tuning Data Selection for ASR Using Self-Supervised Speech Models

Reem Gody, David Harwath

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[68] arXiv:2212.01686 (cross-list from eess.IV) [pdf, other]: Title: A subjective study of the perceptual acceptability of audio-video desynchronization in sports videos

Joshua Peter Ebenezer

Comments: 6 pages; for associated code see this https URL

Subjects: Image and Video Processing (eess.IV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[69] arXiv:2212.01778 (cross-list from eess.AS) [pdf, other]: Title: Improving End-to-end Speech Translation by Leveraging Auxiliary Speech and Text Data

Yuhao Zhang, Chen Xu, Bojie Hu, Chunliang Zhang, Tong Xiao, Jingbo Zhu

Comments: Accepted to AAAI 2023

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[70] arXiv:2212.01892 (cross-list from eess.AS) [pdf, other]: Title: Tragic Talkers: A Shakespearean Sound- and Light-Field Dataset for Audio-Visual Machine Learning Research

Davide Berghi, Marco Volino, Philip J. B. Jackson

Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)
[71] arXiv:2212.01992 (cross-list from cs.CL) [pdf, other]: Title: Fast and accurate factorized neural transducer for text adaption of end-to-end speech recognition models

Rui Zhao, Jian Xue, Partha Parthasarathy, Veljko Miljanic, Jinyu Li

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[72] arXiv:2212.02013 (cross-list from eess.AS) [pdf, other]: Title: Evince the artifacts of Spoof Speech by blending Vocal Tract and Voice Source Features

Tadipatri Uday Kiran Reddy, Sahukari Chaitanya Varun, Kota Pranav Kumar, Sankala Sreekanth, Kodukula Sri Rama Murty

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[73] arXiv:2212.02033 (cross-list from eess.AS) [pdf, html, other]: Title: Towards Generating Diverse Audio Captions via Adversarial Training

Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang

Comments: Accepted to TASLP

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD)
[74] arXiv:2212.02099 (cross-list from eess.AS) [pdf, other]: Title: LMEC: Learnable Multiplicative Absolute Position Embedding Based Conformer for Speech Recognition

Yuguang Yang, Yu Pan, Jingjing Yin, Heng Lu

Comments: NCMMSC2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[75] arXiv:2212.02616 (cross-list from eess.AS) [pdf, other]: Title: Sound emergence as a predictor of short-term annoyance from wind turbine noise

Elise Ruaud, Guillaume Dutilleux

Comments: Accepted for publication in the Journal of the Acoustical Society of America. 17 pages, 8 figures, 3 tables

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP); Classical Physics (physics.class-ph)