
Showing 1–25 of 25 results for author: Zizzo, G

  1. arXiv:2504.12397

    cs.LG cs.AI

    Activated LoRA: Fine-tuned LLMs for Intrinsics

    Authors: Kristjan Greenewald, Luis Lastras, Thomas Parnell, Vraj Shah, Lucian Popa, Giulio Zizzo, Chulaka Gunasekara, Ambrish Rawat, David Cox

    Abstract: Low-Rank Adaptation (LoRA) has emerged as a highly efficient framework for finetuning the weights of large foundation models, and has become the go-to method for data-driven customization of LLMs. Despite the promise of highly customized behaviors and capabilities, switching between relevant LoRAs in a multiturn setting is inefficient, as the key-value (KV) cache of the entire turn history must be…

    Submitted 10 June, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

  2. arXiv:2503.06253

    cs.LG

    MAD-MAX: Modular And Diverse Malicious Attack MiXtures for Automated LLM Red Teaming

    Authors: Stefan Schoepf, Muhammad Zaid Hameed, Ambrish Rawat, Kieran Fraser, Giulio Zizzo, Giandomenico Cornacchia, Mark Purcell

    Abstract: With LLM usage rapidly increasing, their vulnerability to jailbreaks that create harmful outputs is a major security risk. As new jailbreaking strategies emerge and models are changed by fine-tuning, continuous testing for security vulnerabilities is necessary. Existing Red Teaming methods fall short in cost efficiency, attack success rate, attack diversity, or extensibility as new attack types e…

    Submitted 18 June, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

    Comments: Data in Generative Models Workshop: The Bad, the Ugly, and the Greats (DIG-BUGS) at ICML 2025

  3. arXiv:2502.15427

    cs.CR cs.LG

    Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs

    Authors: Giulio Zizzo, Giandomenico Cornacchia, Kieran Fraser, Muhammad Zaid Hameed, Ambrish Rawat, Beat Buesser, Mark Purcell, Pin-Yu Chen, Prasanna Sattigeri, Kush Varshney

    Abstract: As large language models (LLMs) become integrated into everyday applications, ensuring their robustness and security is increasingly critical. In particular, LLMs can be manipulated into unsafe behaviour by prompts known as jailbreaks. The variety of jailbreak styles is growing, necessitating the use of external defences known as guardrails. While many jailbreak defences have been proposed, not al…

    Submitted 21 February, 2025; originally announced February 2025.

    Comments: NeurIPS 2024, Safe Generative AI Workshop

  4. arXiv:2412.07724

    cs.CL

    Granite Guardian

    Authors: Inkit Padhi, Manish Nagireddy, Giandomenico Cornacchia, Subhajit Chaudhury, Tejaswini Pedapati, Pierre Dognin, Keerthiram Murugesan, Erik Miehling, Martín Santillán Cooper, Kieran Fraser, Giulio Zizzo, Muhammad Zaid Hameed, Mark Purcell, Michael Desmond, Qian Pan, Zahra Ashktorab, Inge Vejsbjerg, Elizabeth M. Daly, Michael Hind, Werner Geyer, Ambrish Rawat, Kush R. Varshney, Prasanna Sattigeri

    Abstract: We introduce the Granite Guardian models, a suite of safeguards designed to provide risk detection for prompts and responses, enabling safe and responsible use in combination with any large language model (LLM). These models offer comprehensive coverage across multiple risk dimensions, including social bias, profanity, violence, sexual content, unethical behavior, jailbreaking, and hallucination-r…

    Submitted 16 December, 2024; v1 submitted 10 December, 2024; originally announced December 2024.

  5. arXiv:2411.06835

    cs.CL cs.CR

    HarmLevelBench: Evaluating Harm-Level Compliance and the Impact of Quantization on Model Alignment

    Authors: Yannis Belkhiter, Giulio Zizzo, Sergio Maffeis

    Abstract: With the introduction of the transformers architecture, LLMs have revolutionized the NLP field with ever more powerful models. Nevertheless, their development came up with several challenges. The exponential growth in computational power and reasoning capabilities of language models has heightened concerns about their security. As models become more powerful, ensuring their safety has become a cru…

    Submitted 11 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024 Workshop on Safe Generative Artificial Intelligence (SafeGenAI)

  6. Assessing the Impact of Packing on Machine Learning-Based Malware Detection and Classification Systems

    Authors: Daniel Gibert, Nikolaos Totosis, Constantinos Patsakis, Giulio Zizzo, Quan Le

    Abstract: The proliferation of malware, particularly through the use of packing, presents a significant challenge to static analysis and signature-based malware detection techniques. The application of packing to the original executable code renders extracting meaningful features and signatures challenging. To deal with the increasing amount of malware in the wild, researchers and anti-malware companies sta…

    Submitted 31 October, 2024; originally announced October 2024.

  7. arXiv:2410.09078

    cs.CL cs.AI cs.CY cs.SE

    Knowledge-Augmented Reasoning for EUAIA Compliance and Adversarial Robustness of LLMs

    Authors: Tomas Bueno Momcilovic, Dian Balta, Beat Buesser, Giulio Zizzo, Mark Purcell

    Abstract: The EU AI Act (EUAIA) introduces requirements for AI systems which intersect with the processes required to establish adversarial robustness. However, given the ambiguous language of regulation and the dynamicity of adversarial attacks, developers of systems with highly complex models such as LLMs may find their effort to be duplicated without the assurance of having achieved either compliance or…

    Submitted 4 October, 2024; originally announced October 2024.

    Comments: Accepted in the VECOMP 2024 workshop

  8. arXiv:2410.07962

    cs.AI

    Towards Assurance of LLM Adversarial Robustness using Ontology-Driven Argumentation

    Authors: Tomas Bueno Momcilovic, Beat Buesser, Giulio Zizzo, Mark Purcell, Dian Balta

    Abstract: Despite the impressive adaptability of large language models (LLMs), challenges remain in ensuring their security, transparency, and interpretability. Given their susceptibility to adversarial attacks, LLMs need to be defended with an evolving combination of adversarial training and guardrails. However, managing the implicit and heterogeneous knowledge for continuously assuring robustness is diffi…

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: To be published in xAI 2024, late-breaking track

  9. arXiv:2410.05306

    cs.CR cs.AI

    Towards Assuring EU AI Act Compliance and Adversarial Robustness of LLMs

    Authors: Tomas Bueno Momcilovic, Beat Buesser, Giulio Zizzo, Mark Purcell, Dian Balta

    Abstract: Large language models are prone to misuse and vulnerable to security threats, raising significant safety and security concerns. The European Union's Artificial Intelligence Act seeks to enforce AI robustness in certain contexts, but faces implementation challenges due to the lack of standards, complexity of LLMs and emerging security vulnerabilities. Our research introduces a framework using ontol…

    Submitted 4 October, 2024; originally announced October 2024.

    Comments: Accepted in the AI Act Workshop

  10. arXiv:2410.05304

    cs.CR cs.AI cs.SE

    Developing Assurance Cases for Adversarial Robustness and Regulatory Compliance in LLMs

    Authors: Tomas Bueno Momcilovic, Dian Balta, Beat Buesser, Giulio Zizzo, Mark Purcell

    Abstract: This paper presents an approach to developing assurance cases for adversarial robustness and regulatory compliance in large language models (LLMs). Focusing on both natural and code language tasks, we explore the vulnerabilities these models face, including adversarial attacks based on jailbreaking, heuristics, and randomization. We propose a layered framework incorporating guardrails at various s…

    Submitted 4 October, 2024; originally announced October 2024.

    Comments: Accepted to the ASSURE 2024 workshop

  11. arXiv:2409.17699

    cs.CR cs.AI cs.LG

    MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks

    Authors: Giandomenico Cornacchia, Giulio Zizzo, Kieran Fraser, Muhammad Zaid Hameed, Ambrish Rawat, Mark Purcell

    Abstract: The proliferation of Large Language Models (LLMs) in diverse applications underscores the pressing need for robust security measures to thwart potential jailbreak attacks. These attacks exploit vulnerabilities within LLMs, endanger data integrity and user privacy. Guardrails serve as crucial protective mechanisms against such threats, but existing models often fall short in terms of both detection…

    Submitted 4 October, 2024; v1 submitted 26 September, 2024; originally announced September 2024.

  12. arXiv:2409.15398

    cs.CR cs.AI cs.LG

    Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI

    Authors: Ambrish Rawat, Stefan Schoepf, Giulio Zizzo, Giandomenico Cornacchia, Muhammad Zaid Hameed, Kieran Fraser, Erik Miehling, Beat Buesser, Elizabeth M. Daly, Mark Purcell, Prasanna Sattigeri, Pin-Yu Chen, Kush R. Varshney

    Abstract: As generative AI, particularly large language models (LLMs), become increasingly integrated into production applications, new attack surfaces and vulnerabilities emerge and put a focus on adversarial threats in natural language and multi-modal systems. Red-teaming has gained importance in proactively identifying weaknesses in these systems, while blue-teaming works to protect against such adversar…

    Submitted 23 September, 2024; originally announced September 2024.

  13. arXiv:2405.00392

    cs.CR cs.AI

    Certified Adversarial Robustness of Machine Learning-based Malware Detectors via (De)Randomized Smoothing

    Authors: Daniel Gibert, Luca Demetrio, Giulio Zizzo, Quan Le, Jordi Planes, Battista Biggio

    Abstract: Deep learning-based malware detection systems are vulnerable to adversarial EXEmples - carefully-crafted malicious programs that evade detection with minimal perturbation. As such, the community is dedicating effort to develop mechanisms to defend against adversarial EXEmples. However, current randomized smoothing-based defenses are still vulnerable to attacks that inject blocks of adversarial con…

    Submitted 1 May, 2024; originally announced May 2024.

  14. A Robust Defense against Adversarial Attacks on Deep Learning-based Malware Detectors via (De)Randomized Smoothing

    Authors: Daniel Gibert, Giulio Zizzo, Quan Le, Jordi Planes

    Abstract: Deep learning-based malware detectors have been shown to be susceptible to adversarial malware examples, i.e. malware examples that have been deliberately manipulated in order to avoid detection. In light of the vulnerability of deep learning detectors to subtle input file modifications, we propose a practical defense against adversarial malware examples inspired by (de)randomized smoothing. In th…

    Submitted 26 February, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: text overlap with arXiv:2308.08906

  15. arXiv:2401.10405

    cs.LG

    Differentially Private and Adversarially Robust Machine Learning: An Empirical Evaluation

    Authors: Janvi Thakkar, Giulio Zizzo, Sergio Maffeis

    Abstract: Malicious adversaries can attack machine learning models to infer sensitive information or damage the system by launching a series of evasion attacks. Although various work addresses privacy and security concerns, they focus on individual defenses, but in practice, models may undergo simultaneous attacks. This study explores the combination of adversarial training and differentially private traini…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted at PPAI-24: The 5th AAAI Workshop on Privacy-Preserving Artificial Intelligence

  16. arXiv:2401.06524

    cs.LG

    Domain Adaptation for Time series Transformers using One-step fine-tuning

    Authors: Subina Khanal, Seshu Tirupathi, Giulio Zizzo, Ambrish Rawat, Torben Bach Pedersen

    Abstract: The recent breakthrough of Transformers in deep learning has drawn significant attention of the time series community due to their ability to capture long-range dependencies. However, like other deep learning models, Transformers face limitations in time series prediction, including insufficient temporal understanding, generalization challenges, and data shift issues for the domains with limited d…

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: Accepted at the Fourth Workshop of Artificial Intelligence for Time Series Analysis (AI4TS): Theory, Algorithms, and Applications, AAAI 2024, Vancouver, Canada

  17. arXiv:2312.14260

    cs.LG cs.CR

    Elevating Defenses: Bridging Adversarial Training and Watermarking for Model Resilience

    Authors: Janvi Thakkar, Giulio Zizzo, Sergio Maffeis

    Abstract: Machine learning models are being used in an increasing number of critical applications; thus, securing their integrity and ownership is critical. Recent studies observed that adversarial training and watermarking have a conflicting interaction. This work introduces a novel framework to integrate adversarial training with watermarking techniques to fortify against evasion attacks and provide confi…

    Submitted 7 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted at DAI Workshop, AAAI 2024

  18. Towards a Practical Defense against Adversarial Attacks on Deep Learning-based Malware Detectors via Randomized Smoothing

    Authors: Daniel Gibert, Giulio Zizzo, Quan Le

    Abstract: Malware detectors based on deep learning (DL) have been shown to be susceptible to malware examples that have been deliberately manipulated in order to evade detection, a.k.a. adversarial malware examples. More specifically, it has been shown that deep learning detectors are vulnerable to small changes on the input file. Given this vulnerability of deep learning detectors, we propose a practical de…

    Submitted 17 August, 2023; originally announced August 2023.

  19. Query-Free Evasion Attacks Against Machine Learning-Based Malware Detectors with Generative Adversarial Networks

    Authors: Daniel Gibert, Jordi Planes, Quan Le, Giulio Zizzo

    Abstract: Malware detectors based on machine learning (ML) have been shown to be susceptible to adversarial malware examples. However, current methods to generate adversarial malware examples still have their limits. They either rely on detailed model information (gradient-based attacks), or on detailed outputs of the model - such as class probabilities (score-based attacks), neither of which are available…

    Submitted 16 June, 2023; originally announced June 2023.

    Journal ref: 2023 IEEE European Symposium on Security and Privacy Workshops

  20. arXiv:2306.09308

    cs.CL cs.AI cs.CR

    Matching Pairs: Attributing Fine-Tuned Models to their Pre-Trained Large Language Models

    Authors: Myles Foley, Ambrish Rawat, Taesung Lee, Yufang Hou, Gabriele Picco, Giulio Zizzo

    Abstract: The wide applicability and adaptability of generative large language models (LLMs) has enabled their rapid adoption. While the pre-trained models can perform many tasks, such models are often fine-tuned to improve their performance on various downstream applications. However, this leads to issues over violation of model licenses, model theft, and copyright infringement. Moreover, recent advances s…

    Submitted 15 June, 2023; originally announced June 2023.

  21. arXiv:2212.08290

    cs.LG cs.CV

    Robust Learning Protocol for Federated Tumor Segmentation Challenge

    Authors: Ambrish Rawat, Giulio Zizzo, Swanand Kadhe, Jonathan P. Epperlein, Stefano Braghin

    Abstract: In this work, we devise robust and efficient learning protocols for orchestrating a Federated Learning (FL) process for the Federated Tumor Segmentation Challenge (FeTS 2022). Enabling FL for FeTS setup is challenging mainly due to data heterogeneity among collaborators and communication cost of training. To tackle these challenges, we propose Robust Learning Protocol (RoLePRO) which is a combinat…

    Submitted 16 December, 2022; originally announced December 2022.

    Comments: 14 pages, 2 figures, 3 tables

  22. arXiv:2112.10525

    cs.LG cs.CR

    Certified Federated Adversarial Training

    Authors: Giulio Zizzo, Ambrish Rawat, Mathieu Sinn, Sergio Maffeis, Chris Hankin

    Abstract: In federated learning (FL), robust aggregation schemes have been developed to protect against malicious clients. Many robust aggregation schemes rely on certain numbers of benign clients being present in a quorum of workers. This can be hard to guarantee when clients can join at will, or join based on factors such as idle system status, and connected to power and WiFi. We tackle the scenario of se…

    Submitted 20 December, 2021; originally announced December 2021.

    Comments: First presented at the 1st NeurIPS Workshop on New Frontiers in Federated Learning (NFFL 2021)

  23. arXiv:2012.01791

    cs.LG cs.CR

    FAT: Federated Adversarial Training

    Authors: Giulio Zizzo, Ambrish Rawat, Mathieu Sinn, Beat Buesser

    Abstract: Federated learning (FL) is one of the most important paradigms addressing privacy and data governance issues in machine learning (ML). Adversarial training has emerged, so far, as the most promising approach against evasion threats on ML models. In this paper, we take the first known steps towards federated adversarial training (FAT) combining both methods to reduce the threat of evasion during in…

    Submitted 3 December, 2020; originally announced December 2020.

    Comments: NeurIPS 2020 Workshop on Scalability, Privacy, and Security in Federated Learning (SpicyFL)

  24. Adversarial Attacks on Time-Series Intrusion Detection for Industrial Control Systems

    Authors: Giulio Zizzo, Chris Hankin, Sergio Maffeis, Kevin Jones

    Abstract: Neural networks are increasingly used for intrusion detection on industrial control systems (ICS). With neural networks being vulnerable to adversarial examples, attackers who wish to cause damage to an ICS can attempt to hide their attacks from detection by using adversarial example techniques. In this work we address the domain specific challenges of constructing such attacks against autoregress…

    Submitted 3 October, 2021; v1 submitted 8 November, 2019; originally announced November 2019.

    Comments: Accepted at the 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)

  25. arXiv:1910.03916

    cs.LG stat.ML

    Deep Latent Defence

    Authors: Giulio Zizzo, Chris Hankin, Sergio Maffeis, Kevin Jones

    Abstract: Deep learning methods have shown state of the art performance in a range of tasks from computer vision to natural language processing. However, it is well known that such systems are vulnerable to attackers who craft inputs in order to cause misclassification. The level of perturbation an attacker needs to introduce in order to cause such a misclassification can be extremely small, and often imper…

    Submitted 27 September, 2020; v1 submitted 9 October, 2019; originally announced October 2019.