Showing 1–4 of 4 results for author: Muzsai, L

Searching in archive cs.
  1. arXiv:2506.02048  [pdf, ps, other]

    cs.CR cs.AI

    Improving LLM Agents with Reinforcement Learning on Cryptographic CTF Challenges

    Authors: Lajos Muzsai, David Imolai, András Lukács

    Abstract: Large Language Models (LLMs) still struggle with the structured reasoning and tool-assisted computation needed for problem solving in cybersecurity applications. In this work, we introduce "random-crypto", a cryptographic Capture-the-Flag (CTF) challenge generator framework that we use to fine-tune a tool-augmented Llama-3.1-8B with Guided Reinforcement Prompt Optimisation (GRPO), allowing the age…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: 11 pages, 1 figure

    MSC Class: 68M25

    ACM Class: I.2.1; K.6.5

  2. arXiv:2412.01778  [pdf, other]

    cs.CR cs.AI

    HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing

    Authors: Lajos Muzsai, David Imolai, András Lukács

    Abstract: We introduce HackSynth, a novel Large Language Model (LLM)-based agent capable of autonomous penetration testing. HackSynth's dual-module architecture includes a Planner and a Summarizer, which enable it to generate commands and process feedback iteratively. To benchmark HackSynth, we propose two new Capture The Flag (CTF)-based benchmark sets utilizing the popular platforms PicoCTF and OverTheWir…

    Submitted 2 December, 2024; originally announced December 2024.

    Comments: 16 pages, 9 figures

    MSC Class: 68M25

    ACM Class: I.2.1; K.6.5

  3. arXiv:2410.15490  [pdf, other]

    cs.AI cs.MA

    Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence

    Authors: Norbert Tihanyi, Tamas Bisztray, Richard A. Dubniczky, Rebeka Toth, Bertalan Borsos, Bilel Cherif, Mohamed Amine Ferrag, Lajos Muzsai, Ridhi Jain, Ryan Marinelli, Lucas C. Cordeiro, Merouane Debbah, Vasileios Mavroeidis, Audun Josang

    Abstract: As machine intelligence evolves, the need to test and compare the problem-solving abilities of different AI models grows. However, current benchmarks are often simplistic, allowing models to perform uniformly well and making it difficult to distinguish their capabilities. Additionally, benchmarks typically rely on static question-answer pairs that the models might memorize or guess. To address the…

    Submitted 22 November, 2024; v1 submitted 20 October, 2024; originally announced October 2024.

  4. arXiv:2403.15938  [pdf, other]

    cs.CL cs.AI cs.LG

    LlamBERT: Large-scale low-cost data annotation in NLP

    Authors: Bálint Csanády, Lajos Muzsai, Péter Vedres, Zoltán Nádasdy, András Lukács

    Abstract: Large Language Models (LLMs), such as GPT-4 and Llama 2, show remarkable proficiency in a wide range of natural language processing (NLP) tasks. Despite their effectiveness, the high costs associated with their use pose a challenge. We present LlamBERT, a hybrid approach that leverages LLMs to annotate a small subset of large, unlabeled databases and uses the results for fine-tuning transformer en…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: 11 pages, 1 figure

    ACM Class: I.2.7; F.1.1