-
Preventing Jailbreak Prompts as Malicious Tools for Cybercriminals: A Cyber Defense Perspective
Authors:
Jean Marie Tshimula,
Xavier Ndona,
D'Jeff K. Nkashama,
Pierre-Martin Tardif,
Froduald Kabanza,
Marc Frappier,
Shengrui Wang
Abstract:
Jailbreak prompts pose a significant threat in AI and cybersecurity, as they are crafted to bypass ethical safeguards in large language models, potentially enabling misuse by cybercriminals. This paper analyzes jailbreak prompts from a cyber defense perspective, exploring techniques like prompt injection and context manipulation that allow harmful content generation, content filter evasion, and sensitive information extraction. We assess the impact of successful jailbreaks, from misinformation and automated social engineering to hazardous content creation, including bioweapons and explosives. To address these threats, we propose strategies involving advanced prompt analysis, dynamic safety protocols, and continuous model fine-tuning to strengthen AI resilience. Additionally, we highlight the need for collaboration among AI researchers, cybersecurity experts, and policymakers to set standards for protecting AI systems. Through case studies, we illustrate these cyber defense approaches, promoting responsible AI practices to maintain system integrity and public trust. Warning: This paper contains content which the reader may find offensive.
Submitted 25 November, 2024;
originally announced November 2024.
-
ASTD Patterns for Integrated Continuous Anomaly Detection In Data Logs
Authors:
Chaymae El Jabri,
Marc Frappier,
Pierre-Martin Tardif
Abstract:
This paper investigates the use of the ASTD language for ensemble anomaly detection in data logs. It uses a sliding window technique for continuous learning in data streams, coupled with updating learning models upon the completion of each window to maintain accurate detection and align with current data trends. It proposes ASTD patterns for combining learning models, especially in the context of unsupervised learning, which is commonly used for data streams. To facilitate this, a new ASTD operator is proposed, the Quantified Flow, which enables the seamless combination of learning models while ensuring that the specification remains concise. Our contribution is a specification pattern, highlighting the capacity of ASTDs to abstract and modularize anomaly detection systems. The ASTD language provides a unique approach to developing data flow anomaly detection systems, grounded in the combination of processes through the graphical representation of the language operators. This simplifies the design task for developers, who can focus primarily on defining the functional operations that constitute the system.
Submitted 14 December, 2024; v1 submitted 10 November, 2024;
originally announced November 2024.
-
Impact of Inaccurate Contamination Ratio on Robust Unsupervised Anomaly Detection
Authors:
Jordan F. Masakuna,
DJeff Kanda Nkashama,
Arian Soltani,
Marc Frappier,
Pierre-Martin Tardif,
Froduald Kabanza
Abstract:
Training data sets intended for unsupervised anomaly detection, typically presumed to be anomaly-free, often contain anomalies (or contamination), a challenge that significantly undermines model performance. Most robust unsupervised anomaly detection models rely on contamination ratio information to tackle contamination. However, in reality, the contamination ratio may be inaccurate. We investigate the impact of inaccurate contamination ratio information in robust unsupervised anomaly detection, and examine whether such models are resilient to misinformed contamination ratios. Our investigation on 6 benchmark data sets reveals that such models are not adversely affected by exposure to misinformation. In fact, they can exhibit improved performance when provided with such inaccurate contamination ratios.
Submitted 14 August, 2024;
originally announced August 2024.
-
Deep Learning for Network Anomaly Detection under Data Contamination: Evaluating Robustness and Mitigating Performance Degradation
Authors:
D'Jeff K. Nkashama,
Jordan Masakuna Félicien,
Arian Soltani,
Jean-Charles Verdier,
Pierre-Martin Tardif,
Marc Frappier,
Froduald Kabanza
Abstract:
Deep learning (DL) has emerged as a crucial tool in network anomaly detection (NAD) for cybersecurity. While DL models for anomaly detection excel at extracting features and learning patterns from data, they are vulnerable to data contamination -- the inadvertent inclusion of attack-related data in training sets presumed benign. This study evaluates the robustness of six unsupervised DL algorithms against data contamination using our proposed evaluation protocol. Results demonstrate significant performance degradation in state-of-the-art anomaly detection algorithms when exposed to contaminated data, highlighting the critical need for self-protection mechanisms in DL-based NAD models. To mitigate this vulnerability, we propose an enhanced auto-encoder with a constrained latent representation, allowing normal data to cluster more densely around a learnable center in the latent space. Our evaluation reveals that this approach exhibits improved resistance to data contamination compared to existing methods, offering a promising direction for more robust NAD systems.
Submitted 12 September, 2024; v1 submitted 11 July, 2024;
originally announced July 2024.
-
Psychological Profiling in Cybersecurity: A Look at LLMs and Psycholinguistic Features
Authors:
Jean Marie Tshimula,
D'Jeff K. Nkashama,
Jean Tshibangu Muabila,
René Manassé Galekwa,
Hugues Kanda,
Maximilien V. Dialufuma,
Mbuyi Mukendi Didier,
Kalonji Kalala,
Serge Mundele,
Patience Kinshie Lenye,
Tighana Wenge Basele,
Aristarque Ilunga,
Christian N. Mayemba,
Nathanaël M. Kasoro,
Selain K. Kasereka,
Hardy Mikese,
Pierre-Martin Tardif,
Marc Frappier,
Froduald Kabanza,
Belkacem Chikhaoui,
Shengrui Wang,
Ali Mulenda Sumbu,
Xavier Ndona,
Raoul Kienge-Kienge Intudi
Abstract:
The increasing sophistication of cyber threats necessitates innovative approaches to cybersecurity. In this paper, we explore the potential of psychological profiling techniques, particularly focusing on the utilization of Large Language Models (LLMs) and psycholinguistic features. We investigate the intersection of psychology and cybersecurity, discussing how LLMs can be employed to analyze textual data for identifying psychological traits of threat actors. We explore the incorporation of psycholinguistic features, such as linguistic patterns and emotional cues, into cybersecurity frameworks. Our research underscores the importance of integrating psychological perspectives into cybersecurity practices to bolster defense mechanisms against evolving threats.
Submitted 9 August, 2024; v1 submitted 26 June, 2024;
originally announced June 2024.
-
Characterizing Financial Market Coverage using Artificial Intelligence
Authors:
Jean Marie Tshimula,
D'Jeff K. Nkashama,
Patrick Owusu,
Marc Frappier,
Pierre-Martin Tardif,
Froduald Kabanza,
Armelle Brun,
Jean-Marc Patenaude,
Shengrui Wang,
Belkacem Chikhaoui
Abstract:
This paper scrutinizes a database of over 4900 YouTube videos to characterize financial market coverage. Financial market coverage generates a large number of videos. Therefore, watching these videos to derive actionable insights could be challenging and complex. In this paper, we leverage Whisper, a speech-to-text model from OpenAI, to generate a text corpus of market coverage videos from Bloomberg and Yahoo Finance. We employ natural language processing to extract insights regarding language use from the market coverage. Moreover, we examine the prominent presence of trending topics and their evolution over time, and the impacts that some individuals and organizations have on the financial market. Our characterization highlights the dynamics of the financial market coverage and provides valuable insights reflecting broad discussions regarding recent financial events and the world economy.
Submitted 7 February, 2023;
originally announced February 2023.
-
Robustness Evaluation of Deep Unsupervised Learning Algorithms for Intrusion Detection Systems
Authors:
D'Jeff Kanda Nkashama,
Arian Soltani,
Jean-Charles Verdier,
Marc Frappier,
Pierre-Martin Tardif,
Froduald Kabanza
Abstract:
Recently, advances in deep learning have been observed in various fields, including computer vision, natural language processing, and cybersecurity. Machine learning (ML) has demonstrated its ability as a potential tool for anomaly detection-based intrusion detection systems to build secure computer networks. Increasingly, ML approaches are more widely adopted than heuristic approaches for cybersecurity because they learn directly from data. Data is critical for the development of ML systems, and therefore becomes a potential target for attackers. Data poisoning, or contamination, is one of the most common techniques used to fool ML models through data. This paper evaluates the robustness of six recent deep learning algorithms for intrusion detection on contaminated data. Our experiments suggest that the state-of-the-art algorithms used in this study are sensitive to data contamination and reveal the importance of self-defense against data perturbation when developing novel models, especially for intrusion detection systems.
Submitted 30 October, 2023; v1 submitted 24 June, 2022;
originally announced July 2022.
-
A Revealing Large-Scale Evaluation of Unsupervised Anomaly Detection Algorithms
Authors:
Maxime Alvarez,
Jean-Charles Verdier,
D'Jeff K. Nkashama,
Marc Frappier,
Pierre-Martin Tardif,
Froduald Kabanza
Abstract:
Anomaly detection has many applications ranging from bank-fraud detection and cyber-threat detection to equipment maintenance and health monitoring. However, choosing a suitable algorithm for a given application remains a challenging design decision, often informed by the literature on anomaly detection algorithms. We extensively reviewed twelve of the most popular unsupervised anomaly detection methods. We observed that, so far, they have been compared using inconsistent protocols (the choice of the class of interest or the positive class, the split of training and test data, and the choice of hyperparameters), leading to ambiguous evaluations. This observation led us to define a coherent evaluation protocol, which we then used to produce an updated and more precise picture of the relative performance of the twelve methods on five widely used tabular datasets. While our evaluation cannot pinpoint a method that outperforms all the others on all datasets, it identifies those that stand out and revises misconceived knowledge about their relative performance.
Submitted 20 April, 2022;
originally announced April 2022.
-
The Generic SysML/KAOS Domain Metamodel
Authors:
Steve Jeffrey Tueno Fotso,
Marc Frappier,
Régine Laleau,
Amel Mammar,
Hector Ruiz Barradas
Abstract:
This paper concerns the generalised/generic version of the SysML/KAOS domain metamodel and the translation and back propagation rules between the new domain models and B System specifications.
Submitted 8 March, 2019; v1 submitted 8 November, 2018;
originally announced November 2018.
-
SysML/KAOS Domain Models and B System Specifications
Authors:
Steve Jeffrey Tueno Fotso,
Marc Frappier,
Amel Mammar,
Régine Laleau
Abstract:
In this paper, we combine the SysML/KAOS requirements engineering method, an extension of SysML with concepts of the KAOS goal model, with the B System formal method. Translation rules from a SysML/KAOS goal model to a B System specification have been defined. They yield a skeleton of the B System specification. To complete it, we have defined a language to express the domain model associated with the goal model. The translation of this domain model gives the structural part of the B System specification. The contribution of this paper is the description of translation rules from SysML/KAOS domain models to B System specifications. We also present the formal verification of these rules and we describe an open source tool that implements the languages and the rules. Finally, we provide a review of the application of the SysML/KAOS method to case studies such as the formal specification of the hybrid ERTMS/ETCS level 3 standard.
Submitted 28 June, 2018; v1 submitted 5 March, 2018;
originally announced March 2018.
-
Formal Representation of SysML/KAOS Domain Model (Complete Version)
Authors:
Steve Tueno,
Régine Laleau,
Amel Mammar,
Marc Frappier
Abstract:
Nowadays, the usefulness of a formal language for ensuring the consistency of requirements is well established. The work presented here is part of the definition of a formally-grounded, model-based requirements engineering method for critical and complex systems. Requirements are captured through the SysML/KAOS method and the targeted formal specification is written using the Event-B method. Firstly, an Event-B skeleton is produced from the goal hierarchy provided by the SysML/KAOS goal model. This skeleton is then completed in a second step by the Event-B specification obtained from system application domain properties that give rise to the system structure. Considering that the domain is represented using ontologies through the SysML/KAOS Domain Model method, is it possible to automatically produce the structural part of system Event-B models? This paper proposes a set of generic rules that translate SysML/KAOS domain ontologies into an Event-B specification. The rules have been expressed, verified and validated through the Rodin tool using the Event-B method. They are illustrated through a case study dealing with a landing gear system. Our proposal makes it possible to automatically obtain, from a representation of the system application domain in the form of ontologies, the structural part of the Event-B specification which will be used to formally validate the consistency of system requirements.
Submitted 20 December, 2017;
originally announced December 2017.
-
The SysML/KAOS Domain Modeling Approach
Authors:
Steve Tueno,
Régine Laleau,
Amel Mammar,
Marc Frappier
Abstract:
A means of building safe critical systems consists of formally modeling the requirements formulated by stakeholders and ensuring their consistency with respect to application domain properties. This paper proposes a metamodel for an ontology modeling formalism based on OWL and PLIB. This modeling formalism is part of a method for modeling the domain of systems whose requirements are captured through SysML/KAOS. The formal semantics of SysML/KAOS goals are represented using Event-B specifications. Goals provide the set of events, while domain models provide the structure of the system state of the Event-B specification. Our proposal is illustrated through a case study dealing with the specification of a Cycab localization component: a software component that uses GPS, Wi-Fi and sensor technologies for the real-time localization of the Cycab vehicle, an autonomous ground transportation system designed to be robust and completely independent.
Submitted 2 October, 2017;
originally announced October 2017.
-
Formal refinement of extended state machines
Authors:
Thomas Fayolle,
Marc Frappier,
Régine Laleau,
Frédéric Gervais
Abstract:
In a traditional formal development process, e.g. using the B method, the informal user requirements are (manually) translated into a global abstract formal specification. This translation is especially difficult to achieve. The Event-B method was developed to incrementally and formally construct such a specification using stepwise refinement. Each increment takes into account new properties and system aspects. In this paper, we propose to couple a graphical notation called Algebraic State-Transition Diagrams (ASTD) with an Event-B specification in order to provide a better understanding of the software behaviour. The dynamic behaviour is captured by the ASTD, which is based on automata and process algebra operators, while the data model is described by means of an Event-B specification. We propose a methodology to incrementally refine such specification couplings, taking into account new refinement relations and consistency conditions between the control specification and the data specification. We compare the specifications obtained using each approach for readability and proof complexity. The advantages and drawbacks of the traditional approach and of our methodology are discussed. The whole process is illustrated by a railway CBTC-like case study. Our approach is supported by tools for translating ASTDs into B and Event-B into B.
Submitted 7 June, 2016;
originally announced June 2016.