Audio and Speech Processing

Authors and titles for December 2023

Total of 234 entries (showing 1-25)
[1] arXiv:2312.00174 [pdf, other]
Title: Compression of end-to-end non-autoregressive image-to-speech system for low-resourced devices
Gokul Srinivasagan, Michael Deisher, Munir Georges
Comments: 5 pages, 2 figures, 2 tables, presented at the 15th ITG Conference on Speech Communications, September 2023, Aachen
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[2] arXiv:2312.00231 [pdf, other]
Title: Learning domain-invariant classifiers for infant cry sounds
Charles C. Onu, Hemanth K. Sheetha, Arsenii Gorin, Doina Precup
Subjects: Audio and Speech Processing (eess.AS)
[3] arXiv:2312.00249 [pdf, html, other]
Title: Acoustic Prompt Tuning: Empowering Large Language Models with Audition Capabilities
Jinhua Liang, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos
Comments: Published at IEEE Transactions on Audio, Speech and Language Processing
Journal-ref: IEEE Transactions on Audio, Speech and Language Processing (2025)
Subjects: Audio and Speech Processing (eess.AS)
[4] arXiv:2312.00698 [pdf, html, other]
Title: SPIRE-SIES: A Spontaneous Indian English Speech Corpus
Abhayjeet Singh, Charu Shah, Rajashri Varadaraj, Sonakshi Chauhan, Prasanta Kumar Ghosh
Comments: 6 pages, 7 plots, 3 tables, Accepted at O-COCOSDA 2023
Subjects: Audio and Speech Processing (eess.AS)
[5] arXiv:2312.01744 [pdf, html, other]
Title: SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement
Martin Strauss, Nicola Pia, Nagashree K. S. Rao, Bernd Edler
Comments: Preprint. Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023
Subjects: Audio and Speech Processing (eess.AS)
[6] arXiv:2312.01808 [pdf, html, other]
Title: Head Orientation Estimation with Distributed Microphones Using Speech Radiation Patterns
Kaspar Müller, Bilgesu Çakmak, Paul Didier, Simon Doclo, Jan Østergaard, Tobias Wolff
Comments: 6 pages, submitted to 57th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[7] arXiv:2312.02581 [pdf, other]
Title: Auralization based on multi-perspective ambisonic room impulse responses
Kaspar Müller, Franz Zotter
Comments: 18 pages, published in Acta Acustica (Open Access), datasets are available via this https URL and this https URL
Journal-ref: Acta Acustica, Volume 4, Number 6, Article Number 25, 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8] arXiv:2312.02683 [pdf, other]
Title: Diffusion-Based Speech Enhancement in Matched and Mismatched Conditions Using a Heun-Based Sampler
Philippe Gonzalez, Zheng-Hua Tan, Jan Østergaard, Jesper Jensen, Tommy Sonne Alstrøm, Tobias May
Comments: Accepted to ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[9] arXiv:2312.03034 [pdf, html, other]
Title: Distributed Speech Dereverberation Using Weighted Prediction Error
Ziye Yang, Mengfei Zhang, Jie Chen
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[10] arXiv:2312.03129 [pdf, other]
Title: Leveraging Laryngograph Data for Robust Voicing Detection in Speech
Yixuan Zhang, Heming Wang, DeLiang Wang
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[11] arXiv:2312.03324 [pdf, other]
Title: Lightweight Speaker Verification Using Transformation Module with Feature Partition and Fusion
Yanxiong Li, Zhongjie Jiang, Qisheng Huang, Wenchang Cao, Jialong Li
Comments: 12 pages, 5 figures, 6 tables; accepted for publication in IEEE-ACM TASLP
Subjects: Audio and Speech Processing (eess.AS)
[12] arXiv:2312.03620 [pdf, html, other]
Title: Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification
Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li
Comments: Accepted to IEEE/ACM Transactions on Audio, Speech, and Language Processing. Open Access: this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[13] arXiv:2312.03668 [pdf, html, other]
Title: Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition
Yukiya Hono, Koh Mitsuda, Tianyu Zhao, Kentaro Mitsui, Toshiaki Wakatsuki, Kei Sawada
Comments: 17 pages, 4 figures, 9 tables, accepted for Findings of ACL 2024. The model is available at this https URL
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[14] arXiv:2312.03694 [pdf, html, other]
Title: Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers
Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti, Mirco Ravanelli
Comments: Accepted at IEEE International Workshop on Machine Learning for Signal Processing (MLSP) 2024. The code is available at: this https URL
Subjects: Audio and Speech Processing (eess.AS)
[15] arXiv:2312.04131 [pdf, html, other]
Title: Joint Training or Not: An Exploration of Pre-trained Speech Models in Audio-Visual Speaker Diarization
Huan Zhao, Li Zhang, Yue Li, Yannan Wang, Hongji Wang, Wei Rao, Qing Wang, Lei Xie
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[16] arXiv:2312.04324 [pdf, html, other]
Title: DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors
Federico Landini, Mireia Diez, Themos Stafylakis, Lukáš Burget
Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[17] arXiv:2312.04370 [pdf, html, other]
Title: Investigating the Design Space of Diffusion Models for Speech Enhancement
Philippe Gonzalez, Zheng-Hua Tan, Jan Østergaard, Jesper Jensen, Tommy Sonne Alstrøm, Tobias May
Comments: Accepted version
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[18] arXiv:2312.05173 [pdf, html, other]
Title: Binaural multichannel blind speaker separation with a causal low-latency and low-complexity approach
Nils L. Westhausen, Bernd T. Meyer
Comments: Accepted for publication at IEEE ICASSP 2024 OJSP track
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[19] arXiv:2312.06065 [pdf, html, other]
Title: EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings
Sung Hwan Mun, Min Hyun Han, Canyeong Moon, Nam Soo Kim
Comments: Submitted to IEEE Signal Processing Letters
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[20] arXiv:2312.06270 [pdf, html, other]
Title: Testing Correctness, Fairness, and Robustness of Speech Emotion Recognition Models
Anna Derington, Hagen Wierstorf, Ali Özkil, Florian Eyben, Felix Burkhardt, Björn W. Schuller
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[21] arXiv:2312.06907 [pdf, html, other]
Title: w2v-SELD: A Sound Event Localization and Detection Framework for Self-Supervised Spatial Audio Pre-Training
Orlem Lima dos Santos, Karen Rosero, Roberto de Alencar Lotufo
Comments: 17 pages, 5 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[22] arXiv:2312.07513 [pdf, html, other]
Title: NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection
Zexu Pan, Gordon Wichern, Francois G. Germain, Sameer Khurana, Jonathan Le Roux
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[23] arXiv:2312.08089 [pdf, html, other]
Title: Audio Deepfake Detection with Self-Supervised WavLM and Multi-Fusion Attentive Classifier
Yinlin Guo, Haofan Huang, Xi Chen, He Zhao, Yuehai Wang
Comments: Accepted to ICASSP 2024. 5 pages, 1 figure
Subjects: Audio and Speech Processing (eess.AS)
[24] arXiv:2312.08132 [pdf, html, other]
Title: Ultra Low Complexity Deep Learning Based Noise Suppression
Shrishti Saha Shetu, Soumitro Chakrabarty, Oliver Thiergart, Edwin Mabande
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Signal Processing (eess.SP)
[25] arXiv:2312.08496 [pdf, other]
Title: Metrological support of acoustic measuring installations mid-frequency devices
A.N. Grekov, N.A. Grekov, E.N. Sychev
Comments: 9 pages, 1 figure
Journal-ref: Environmental control systems. 2023. Issue. 2 (40). pp. 117-126
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)