Audio and Speech Processing

Authors and titles for December 2023

Total of 234 entries

[1] arXiv:2312.00174

Title: Compression of end-to-end non-autoregressive image-to-speech system for low-resourced devices

Gokul Srinivasagan, Michael Deisher, Munir Georges

Comments: 5 pages, 2 figures, 2 tables, presented at the 15th ITG Conference on Speech Communications, September 2023, Aachen

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[2] arXiv:2312.00231

Title: Learning domain-invariant classifiers for infant cry sounds

Charles C. Onu, Hemanth K. Sheetha, Arsenii Gorin, Doina Precup

Subjects: Audio and Speech Processing (eess.AS)
[3] arXiv:2312.00249

Title: Acoustic Prompt Tuning: Empowering Large Language Models with Audition Capabilities

Jinhua Liang, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos

Comments: Published at IEEE Transactions on Audio, Speech and Language Processing

Journal-ref: IEEE Transactions on Audio, Speech and Language Processing (2025)

Subjects: Audio and Speech Processing (eess.AS)
[4] arXiv:2312.00698

Title: SPIRE-SIES: A Spontaneous Indian English Speech Corpus

Abhayjeet Singh, Charu Shah, Rajashri Varadaraj, Sonakshi Chauhan, Prasanta Kumar Ghosh

Comments: 6 pages, 7 plots, 3 tables, Accepted at O-COCOSDA 2023

Subjects: Audio and Speech Processing (eess.AS)
[5] arXiv:2312.01744

Title: SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement

Martin Strauss, Nicola Pia, Nagashree K. S. Rao, Bernd Edler

Comments: Preprint. Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023

Subjects: Audio and Speech Processing (eess.AS)
[6] arXiv:2312.01808

Title: Head Orientation Estimation with Distributed Microphones Using Speech Radiation Patterns

Kaspar Müller, Bilgesu Çakmak, Paul Didier, Simon Doclo, Jan Østergaard, Tobias Wolff

Comments: 6 pages, submitted to 57th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[7] arXiv:2312.02581

Title: Auralization based on multi-perspective ambisonic room impulse responses

Kaspar Müller, Franz Zotter

Comments: 18 pages, published in Acta Acustica (Open Access), datasets are available via this https URL and this https URL

Journal-ref: Acta Acustica, Volume 4, Number 6, Article Number 25, 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8] arXiv:2312.02683

Title: Diffusion-Based Speech Enhancement in Matched and Mismatched Conditions Using a Heun-Based Sampler

Philippe Gonzalez, Zheng-Hua Tan, Jan Østergaard, Jesper Jensen, Tommy Sonne Alstrøm, Tobias May

Comments: Accepted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[9] arXiv:2312.03034

Title: Distributed Speech Dereverberation Using Weighted Prediction Error

Ziye Yang, Mengfei Zhang, Jie Chen

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[10] arXiv:2312.03129

Title: Leveraging Laryngograph Data for Robust Voicing Detection in Speech

Yixuan Zhang, Heming Wang, DeLiang Wang

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[11] arXiv:2312.03324

Title: Lightweight Speaker Verification Using Transformation Module with Feature Partition and Fusion

Yanxiong Li, Zhongjie Jiang, Qisheng Huang, Wenchang Cao, Jialong Li

Comments: 12 pages, 5 figures, 6 tables; accepted for publication in IEEE-ACM TASLP

Subjects: Audio and Speech Processing (eess.AS)
[12] arXiv:2312.03620

Title: Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification

Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li

Comments: Accepted to IEEE/ACM Transactions on Audio, Speech, and Language Processing. Open Access: this https URL

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[13] arXiv:2312.03668

Title: Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition

Yukiya Hono, Koh Mitsuda, Tianyu Zhao, Kentaro Mitsui, Toshiaki Wakatsuki, Kei Sawada

Comments: 17 pages, 4 figures, 9 tables, accepted for Findings of ACL 2024. The model is available at this https URL

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[14] arXiv:2312.03694

Title: Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers

Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti, Mirco Ravanelli

Comments: Accepted at IEEE International Workshop on Machine Learning for Signal Processing (MLSP) 2024. The code is available at: this https URL

Subjects: Audio and Speech Processing (eess.AS)
[15] arXiv:2312.04131

Title: Joint Training or Not: An Exploration of Pre-trained Speech Models in Audio-Visual Speaker Diarization

Huan Zhao, Li Zhang, Yue Li, Yannan Wang, Hongji Wang, Wei Rao, Qing Wang, Lei Xie

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[16] arXiv:2312.04324

Title: DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors

Federico Landini, Mireia Diez, Themos Stafylakis, Lukáš Burget

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[17] arXiv:2312.04370

Title: Investigating the Design Space of Diffusion Models for Speech Enhancement

Philippe Gonzalez, Zheng-Hua Tan, Jan Østergaard, Jesper Jensen, Tommy Sonne Alstrøm, Tobias May

Comments: Accepted version

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[18] arXiv:2312.05173

Title: Binaural multichannel blind speaker separation with a causal low-latency and low-complexity approach

Nils L. Westhausen, Bernd T. Meyer

Comments: Accepted for publication at IEEE ICASSP 2024 OJSP track

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[19] arXiv:2312.06065

Title: EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings

Sung Hwan Mun, Min Hyun Han, Canyeong Moon, Nam Soo Kim

Comments: Submitted to IEEE Signal Processing Letters

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[20] arXiv:2312.06270

Title: Testing Correctness, Fairness, and Robustness of Speech Emotion Recognition Models

Anna Derington, Hagen Wierstorf, Ali Özkil, Florian Eyben, Felix Burkhardt, Björn W. Schuller

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[21] arXiv:2312.06907

Title: w2v-SELD: A Sound Event Localization and Detection Framework for Self-Supervised Spatial Audio Pre-Training

Orlem Lima dos Santos, Karen Rosero, Roberto de Alencar Lotufo

Comments: 17 pages, 5 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[22] arXiv:2312.07513

Title: NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection

Zexu Pan, Gordon Wichern, Francois G. Germain, Sameer Khurana, Jonathan Le Roux

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[23] arXiv:2312.08089

Title: Audio Deepfake Detection with Self-Supervised WavLM and Multi-Fusion Attentive Classifier

Yinlin Guo, Haofan Huang, Xi Chen, He Zhao, Yuehai Wang

Comments: Accepted to ICASSP 2024. 5 pages, 1 figure

Subjects: Audio and Speech Processing (eess.AS)
[24] arXiv:2312.08132

Title: Ultra Low Complexity Deep Learning Based Noise Suppression

Shrishti Saha Shetu, Soumitro Chakrabarty, Oliver Thiergart, Edwin Mabande

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Signal Processing (eess.SP)
[25] arXiv:2312.08496

Title: Metrological support of acoustic measuring installations mid-frequency devices

A.N. Grekov, N.A. Grekov, E.N. Sychev

Comments: 9 pages, 1 figure

Journal-ref: Environmental control systems, 2023, Issue 2 (40), pp. 117-126

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
