Showing 1–15 of 15 results for author: Bisztray, T

Searching in archive cs.
  1. arXiv:2510.10493

    cs.CR cs.LG

    The Hidden DNA of LLM-Generated JavaScript: Structural Patterns Enable High-Accuracy Authorship Attribution

    Authors: Norbert Tihanyi, Bilel Cherif, Richard A. Dubniczky, Mohamed Amine Ferrag, Tamás Bisztray

    Abstract: In this paper, we present the first large-scale study exploring whether JavaScript code generated by Large Language Models (LLMs) can reveal which model produced it, enabling reliable authorship attribution and model fingerprinting. With the rapid rise of AI-generated code, attribution is playing a critical role in detecting vulnerabilities, flagging malicious content, and ensuring accountability.…

    Submitted 12 October, 2025; originally announced October 2025.

  2. arXiv:2506.17323

    cs.LG cs.AI cs.SE

    I Know Which LLM Wrote Your Code Last Summer: LLM generated Code Stylometry for Authorship Attribution

    Authors: Tamas Bisztray, Bilel Cherif, Richard A. Dubniczky, Nils Gruschka, Bertalan Borsos, Mohamed Amine Ferrag, Attila Kovacs, Vasileios Mavroeidis, Norbert Tihanyi

    Abstract: Detecting AI-generated code, deepfakes, and other synthetic content is an emerging research challenge. As code generated by Large Language Models (LLMs) becomes more common, identifying the specific model behind each sample is increasingly important. This paper presents the first systematic study of LLM authorship attribution for C programs. We released CodeT5-Authorship, a novel model that uses o…

    Submitted 18 June, 2025; originally announced June 2025.

  3. arXiv:2505.19973

    cs.CR cs.AI

    DFIR-Metric: A Benchmark Dataset for Evaluating Large Language Models in Digital Forensics and Incident Response

    Authors: Bilel Cherif, Tamas Bisztray, Richard A. Dubniczky, Aaesha Aldahmani, Saeed Alshehhi, Norbert Tihanyi

    Abstract: Digital Forensics and Incident Response (DFIR) involves analyzing digital evidence to support legal investigations. Large Language Models (LLMs) offer new opportunities in DFIR tasks such as log analysis and memory forensics, but their susceptibility to errors and hallucinations raises concerns in high-stakes contexts. Despite growing interest, there is no comprehensive benchmark to evaluate LLMs…

    Submitted 26 May, 2025; originally announced May 2025.

  4. arXiv:2503.21464

    cs.CL cs.AI cs.PF

    Harnessing Chain-of-Thought Metadata for Task Routing and Adversarial Prompt Detection

    Authors: Ryan Marinelli, Josef Pichlmeier, Tamas Bisztray

    Abstract: In this work, we propose a metric called Number of Thoughts (NofT) to determine the difficulty of tasks pre-prompting and support Large Language Models (LLMs) in production contexts. By setting thresholds based on the number of thoughts, this metric can discern the difficulty of prompts and support more effective prompt routing. A 2% decrease in latency is achieved when routing prompts from the Ma…

    Submitted 27 March, 2025; originally announced March 2025.

  5. arXiv:2503.10784

    cs.SE cs.AI

    Vulnerability Detection: From Formal Verification to Large Language Models and Hybrid Approaches: A Comprehensive Overview

    Authors: Norbert Tihanyi, Tamas Bisztray, Mohamed Amine Ferrag, Bilel Cherif, Richard A. Dubniczky, Ridhi Jain, Lucas C. Cordeiro

    Abstract: Software testing and verification are critical for ensuring the reliability and security of modern software systems. Traditionally, formal verification techniques, such as model checking and theorem proving, have provided rigorous frameworks for detecting bugs and vulnerabilities. However, these methods often face scalability challenges when applied to complex, real-world programs. Recently, the a…

    Submitted 13 March, 2025; originally announced March 2025.

  6. arXiv:2503.09433

    cs.CR cs.AI cs.SE

    CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection

    Authors: Richard A. Dubniczky, Krisztofer Zoltán Horvát, Tamás Bisztray, Mohamed Amine Ferrag, Lucas C. Cordeiro, Norbert Tihanyi

    Abstract: Identifying vulnerabilities in source code is crucial, especially in critical software components. Existing methods such as static analysis, dynamic analysis, formal verification, and recently Large Language Models are widely used to detect security flaws. This paper introduces CASTLE (CWE Automated Security Testing and Low-Level Evaluation), a benchmarking framework for evaluating the vulnerabili…

    Submitted 31 March, 2025; v1 submitted 12 March, 2025; originally announced March 2025.

  7. arXiv:2410.15490

    cs.AI cs.MA

    Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence

    Authors: Norbert Tihanyi, Tamas Bisztray, Richard A. Dubniczky, Rebeka Toth, Bertalan Borsos, Bilel Cherif, Mohamed Amine Ferrag, Lajos Muzsai, Ridhi Jain, Ryan Marinelli, Lucas C. Cordeiro, Merouane Debbah, Vasileios Mavroeidis, Audun Josang

    Abstract: As machine intelligence evolves, the need to test and compare the problem-solving abilities of different AI models grows. However, current benchmarks are often simplistic, allowing models to perform uniformly well and making it difficult to distinguish their capabilities. Additionally, benchmarks typically rely on static question-answer pairs that the models might memorize or guess. To address the…

    Submitted 22 November, 2024; v1 submitted 20 October, 2024; originally announced October 2024.

  8. arXiv:2405.12750

    cs.CR cs.AI

    Generative AI in Cybersecurity: A Comprehensive Review of LLM Applications and Vulnerabilities

    Authors: Mohamed Amine Ferrag, Fatima Alwahedi, Ammar Battah, Bilel Cherif, Abdechakour Mechri, Norbert Tihanyi, Tamas Bisztray, Merouane Debbah

    Abstract: This paper provides a comprehensive review of the future of cybersecurity through Generative AI and Large Language Models (LLMs). We explore LLM applications across various domains, including hardware design security, intrusion detection, software engineering, design verification, cyber threat intelligence, malware detection, and phishing detection. We present an overview of LLM evolution and its…

    Submitted 17 January, 2025; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: 52 pages, 8 figures

  9. How secure is AI-generated Code: A Large-Scale Comparison of Large Language Models

    Authors: Norbert Tihanyi, Tamas Bisztray, Mohamed Amine Ferrag, Ridhi Jain, Lucas C. Cordeiro

    Abstract: This study compares state-of-the-art Large Language Models (LLMs) on their tendency to generate vulnerabilities when writing C programs using a neutral zero-shot prompt. Tihanyi et al. introduced the FormAI dataset at PROMISE'23, featuring 112,000 C programs generated by GPT-3.5-turbo, with over 51.24% identified as vulnerable. We extended that research with a large-scale study involving 9 state-o…

    Submitted 11 December, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: Accepted and will shortly be published in Empirical Software Engineering (EMSE). Journal Impact Factor: 3.5 (2023)

  10. arXiv:2404.14459

    cs.SE cs.AI

    LLMs in Web Development: Evaluating LLM-Generated PHP Code Unveiling Vulnerabilities and Limitations

    Authors: Rebeka Tóth, Tamas Bisztray, László Erdodi

    Abstract: This study evaluates the security of web application code generated by Large Language Models, analyzing 2,500 GPT-4 generated PHP websites. These were deployed in Docker containers and tested for vulnerabilities using a hybrid approach of Burp Suite active scanning, static analysis, and manual review. Our investigation focuses on identifying Insecure File Upload, SQL Injection, Stored XSS, and Ref…

    Submitted 21 May, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

  11. arXiv:2402.07688

    cs.AI cs.CR

    CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge

    Authors: Norbert Tihanyi, Mohamed Amine Ferrag, Ridhi Jain, Tamas Bisztray, Merouane Debbah

    Abstract: Large Language Models (LLMs) are increasingly used across various domains, from software development to cyber threat intelligence. Understanding all the different fields of cybersecurity, which includes topics such as cryptography, reverse engineering, and risk assessment, poses a challenge even for human experts. To accurately test the general knowledge of LLMs in cybersecurity, the research comm…

    Submitted 3 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  12. The FormAI Dataset: Generative AI in Software Security Through the Lens of Formal Verification

    Authors: Norbert Tihanyi, Tamas Bisztray, Ridhi Jain, Mohamed Amine Ferrag, Lucas C. Cordeiro, Vasileios Mavroeidis

    Abstract: This paper presents the FormAI dataset, a large collection of 112,000 AI-generated compilable and independent C programs with vulnerability classification. We introduce a dynamic zero-shot prompting technique constructed to spawn diverse programs utilizing Large Language Models (LLMs). The dataset is generated by GPT-3.5-turbo and comprises programs with varying levels of complexity. Some program…

    Submitted 28 March, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: https://github.com/FormAI-Dataset. Please use the published version for citation: https://doi.org/10.1145/3617555.3617874

    Journal ref: PROMISE 2023: Proceedings of the 19th International Conference on Predictive Models and Data Analytics in Software Engineering December 2023 Pages 33 to 43

  13. arXiv:2306.08740

    cs.CR cs.IT

    Privacy-Preserving Password Cracking: How a Third Party Can Crack Our Password Hash Without Learning the Hash Value or the Cleartext

    Authors: Norbert Tihanyi, Tamas Bisztray, Bertalan Borsos, Sebastien Raveau

    Abstract: Using the computational resources of an untrusted third party to crack a password hash can pose a high number of privacy and security risks. The act of revealing the hash digest could in itself negatively impact both the data subject who created the password, and the data controller who stores the hash digest. This paper solves this currently open problem by presenting a Privacy-Preserving Passwor…

    Submitted 14 June, 2023; originally announced June 2023.

  14. Emerging Biometric Modalities and their Use: Loopholes in the Terminology of the GDPR and Resulting Privacy Risks

    Authors: Tamas Bisztray, Nils Gruschka, Thirimachos Bourlai, Lothar Fritsch

    Abstract: Technological advancements allow biometric applications to be more omnipresent than in any other time before. This paper argues that in the current EU data protection regulation, classification applications using biometric data receive less protection compared to biometric recognition. We analyse preconditions in the regulatory language and explore how this has the potential to be the source of un…

    Submitted 23 November, 2022; originally announced November 2022.

    Journal ref: 2021 International Conference of the Biometrics Special Interest Group (BIOSIG), 2021, pp. 1-5.

  15. Privacy Impact Assessment: Comparing methodologies with a focus on practicality

    Authors: Tamas Bisztray, Nils Gruschka

    Abstract: Privacy and data protection have become more and more important in recent years since an increasing number of enterprises and startups are harvesting personal data as a part of their business model. One central requirement of the GDPR is the implementation of a data protection impact assessment for privacy critical systems. However, the law does not dictate or recommend the use of any particular f…

    Submitted 14 October, 2021; originally announced October 2021.

    Journal ref: NordSec 2019. Lecture Notes in Computer Science, vol 11875. Springer, Cham