Showing 1–13 of 13 results for author: Németh, G

Searching in archive cs.
  1. arXiv:2504.11216  [pdf, ps, other]

    cs.LG cs.AI

    Diversity-Driven Learning: Tackling Spurious Correlations and Data Heterogeneity in Federated Models

    Authors: Gergely D. Németh, Eros Fanì, Yeat Jeng Ng, Barbara Caputo, Miguel Ángel Lozano, Nuria Oliver, Novi Quadrianto

    Abstract: Federated Learning (FL) enables decentralized training of machine learning models on distributed data while preserving privacy. However, in real-world FL settings, client data is often non-identically distributed and imbalanced, resulting in statistical data heterogeneity which impacts the generalization capabilities of the server's model across clients, slows convergence and reduces performance.…

    Submitted 15 April, 2025; originally announced April 2025.

  2. arXiv:2401.07639  [pdf, other]

    cs.LG

    Compute-Efficient Active Learning

    Authors: Gábor Németh, Tamás Matuszka

    Abstract: Active learning, a powerful paradigm in machine learning, aims at reducing labeling costs by selecting the most informative samples from an unlabeled dataset. However, the traditional active learning process often demands extensive computational resources, hindering scalability and efficiency. In this paper, we address this critical issue by presenting a novel method designed to alleviate the comp…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted at NeurIPS 2023 Workshop on Adaptive Experimental Design and Active Learning in the Real World

  3. Privacy and Accuracy Implications of Model Complexity and Integration in Heterogeneous Federated Learning

    Authors: Gergely Dániel Németh, Miguel Ángel Lozano, Novi Quadrianto, Nuria Oliver

    Abstract: Federated Learning (FL) has been proposed as a privacy-preserving solution for distributed machine learning, particularly in heterogeneous FL settings where clients have varying computational capabilities and thus train models with different complexities compared to the server's model. However, FL is not without vulnerabilities: recent studies have shown that it is susceptible to membership infere…

    Submitted 10 March, 2025; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: Code: https://github.com/ellisalicante/ma-fl-mia

    Journal ref: IEEE Access 13 (2025) 40258-40274

  4. arXiv:2211.09445  [pdf, other]

    cs.CV

    aiMotive Dataset: A Multimodal Dataset for Robust Autonomous Driving with Long-Range Perception

    Authors: Tamás Matuszka, Iván Barton, Ádám Butykai, Péter Hajas, Dávid Kiss, Domonkos Kovács, Sándor Kunsági-Máté, Péter Lengyel, Gábor Németh, Levente Pető, Dezső Ribli, Dávid Szeghy, Szabolcs Vajna, Bálint Varga

    Abstract: Autonomous driving is a popular research area within the computer vision research community. Since autonomous vehicles are highly safety-critical, ensuring robustness is essential for real-world deployment. While several public multimodal datasets are accessible, they mainly comprise two sensor modalities (camera, LiDAR) which are not well suited for adverse weather. In addition, they lack far-ran…

    Submitted 22 September, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: The paper was accepted to ICLR 2023 Workshop Scene Representations for Autonomous Driving

  5. arXiv:2210.04607  [pdf, ps, other]

    cs.DC cs.AI cs.CR cs.LG

    A Snapshot of the Frontiers of Client Selection in Federated Learning

    Authors: Gergely Dániel Németh, Miguel Ángel Lozano, Novi Quadrianto, Nuria Oliver

    Abstract: Federated learning (FL) has been proposed as a privacy-preserving approach in distributed machine learning. A federated learning architecture consists of a central server and a number of clients that have access to private, potentially sensitive data. Clients are able to keep their data in their local machines and only share their locally trained model's parameters with a central server that manag…

    Submitted 2 January, 2023; v1 submitted 27 September, 2022; originally announced October 2022.

    Comments: 17 pages, 3 figures, 1 appendix, accepted to TMLR

    Journal ref: Transactions on Machine Learning Research, 2022

  6. arXiv:2208.07122  [pdf]

    cs.SD eess.AS

    Towards Parametric Speech Synthesis Using Gaussian-Markov Model of Spectral Envelope and Wavelet-Based Decomposition of F0

    Authors: Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Csaba Zainkó, Géza Németh

    Abstract: Neural network-based Text-to-Speech has significantly improved the quality of synthesized speech. Prominent methods (e.g., Tacotron2, FastSpeech, FastPitch) usually generate Mel-spectrogram from text and then synthesize speech using vocoder (e.g., WaveNet, WaveGlow, HiFiGAN). Compared with traditional parametric approaches (e.g., STRAIGHT and WORLD), neural vocoder based end-to-end models suffer f…

    Submitted 15 August, 2022; originally announced August 2022.

    Comments: accepted at EUSIPCO2022

  7. arXiv:2107.12051  [pdf, other]

    eess.AS cs.AI cs.SD

    Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging

    Authors: Csaba Zainkó, László Tóth, Amin Honarmandi Shandiz, Gábor Gosztolya, Alexandra Markó, Géza Németh, Tamás Gábor Csapó

    Abstract: For articulatory-to-acoustic mapping, typically only limited parallel training data is available, making it impossible to apply fully end-to-end solutions like Tacotron2. In this paper, we experimented with transfer learning and adaptation of a Tacotron2 text-to-speech model to improve the final synthesis quality of ultrasound-based articulatory-to-acoustic mapping with a limited database. We use…

    Submitted 26 July, 2021; originally announced July 2021.

    Comments: accepted at SSW11. arXiv admin note: text overlap with arXiv:2008.03152

  8. arXiv:2106.10481  [pdf]

    cs.SD cs.AI eess.AS

    Advances in Speech Vocoding for Text-to-Speech with Continuous Parameters

    Authors: Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh

    Abstract: Vocoders received renewed attention as main components in statistical parametric text-to-speech (TTS) synthesis and speech transformation systems. Even though there are vocoding techniques give almost accepted synthesized speech, their high computational complexity and irregular structures are still considered challenging concerns, which yield a variety of voice quality degradation. Therefore, thi…

    Submitted 19 June, 2021; originally announced June 2021.

    Comments: 6 pages, 3 figures, International Conference on Artificial Intelligence and Speech Technology (AIST2020)

  9. arXiv:2106.06863  [pdf]

    cs.SD eess.AS

    Continuous Wavelet Vocoder-based Decomposition of Parametric Speech Waveform Synthesis

    Authors: Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Csaba Zainkó, Géza Németh

    Abstract: To date, various speech technology systems have adopted the vocoder approach, a method for synthesizing speech waveform that shows a major role in the performance of statistical parametric speech synthesis. WaveNet one of the best models that nearly resembles the human voice, has to generate a waveform in a time consuming sequential manner with an extremely complex structure of its neural networks…

    Submitted 12 June, 2021; originally announced June 2021.

    Comments: 5 pages, 4 figures, accepted to the conference of Interspeech 2021

  10. Self-Attention Networks for Intent Detection

    Authors: Sevinj Yolchuyeva, Géza Németh, Bálint Gyires-Tóth

    Abstract: Self-attention networks (SAN) have shown promising performance in various Natural Language Processing (NLP) scenarios, especially in machine translation. One of the main points of SANs is the strength of capturing long-range and multi-scale dependencies from the data. In this paper, we present a novel intent detection system which is based on a self-attention network and a Bi-LSTM. Our approach sh…

    Submitted 28 June, 2020; originally announced June 2020.

    Comments: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

  11. arXiv:2004.06338  [pdf]

    eess.AS cs.CL cs.LG cs.SD

    Transformer based Grapheme-to-Phoneme Conversion

    Authors: Sevinj Yolchuyeva, Géza Németh, Bálint Gyires-Tóth

    Abstract: Attention mechanism is one of the most successful techniques in deep learning based Natural Language Processing (NLP). The transformer network architecture is completely based on attention mechanisms, and it outperforms sequence-to-sequence models in neural machine translation without recurrent and convolutional layers. Grapheme-to-phoneme (G2P) conversion is a task of converting letters (grapheme…

    Submitted 26 June, 2020; v1 submitted 14 April, 2020; originally announced April 2020.

    Comments: INTERSPEECH 2019

  12. arXiv:1906.09885  [pdf, other]

    cs.SD eess.AS eess.IV

    Ultrasound-based Silent Speech Interface Built on a Continuous Vocoder

    Authors: Tamás Gábor Csapó, Mohammed Salah Al-Radhi, Géza Németh, Gábor Gosztolya, Tamás Grósz, László Tóth, Alexandra Markó

    Abstract: Recently it was shown that within the Silent Speech Interface (SSI) field, the prediction of F0 is possible from Ultrasound Tongue Images (UTI) as the articulatory input, using Deep Neural Networks for articulatory-to-acoustic mapping. Moreover, text-to-speech synthesizers were shown to produce higher quality speech when using a continuous pitch estimate, which takes non-zero pitch values even whe…

    Submitted 24 June, 2019; originally announced June 2019.

    Comments: 5 pages, 3 figures, accepted for publication at Interspeech 2019

  13. arXiv:1904.06075  [pdf]

    cs.SD eess.AS

    RNN-based speech synthesis using a continuous sinusoidal model

    Authors: Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh

    Abstract: Recently in statistical parametric speech synthesis, we proposed a continuous sinusoidal model (CSM) using continuous F0 (contF0) in combination with Maximum Voiced Frequency (MVF), which was successfully giving state-of-the-art vocoders performance (e.g. similar to STRAIGHT) in synthesized speech. In this paper, we address the use of sequence-to-sequence modeling with recurrent neural networks (R…

    Submitted 12 April, 2019; originally announced April 2019.

    Comments: 8 pages, 4 figures, Accepted to IJCNN 2019