Search | arXiv e-print repository

DRMD: Deep Reinforcement Learning for Malware Detection under Concept Drift

Authors: Shae McFadden, Myles Foley, Mario D'Onghia, Chris Hicks, Vasilios Mavroudis, Nicola Paoletti, Fabio Pierazzi

Abstract: Malware detection in real-world settings must deal with evolving threats, limited labeling budgets, and uncertain predictions. Traditional classifiers, without additional mechanisms, struggle to maintain performance under concept drift in malware domains, as their supervised learning formulation cannot optimize when to defer decisions to manual labeling and adaptation. Modern malware detection pipelines combine classifiers with monthly active learning (AL) and rejection mechanisms to mitigate the impact of concept drift. In this work, we develop a novel formulation of malware detection as a one-step Markov Decision Process and train a deep reinforcement learning (DRL) agent, simultaneously optimizing sample classification performance and rejecting high-risk samples for manual labeling. We evaluated the joint detection and drift mitigation policy learned by the DRL-based Malware Detection (DRMD) agent through time-aware evaluations on Android malware datasets subject to realistic drift requiring multi-year performance stability. The policies learned under these conditions achieve higher Area Under Time (AUT) performance than standard classification approaches used in the domain, showing improved resilience to concept drift. Specifically, the DRMD agent achieved average AUT improvements of $5.18\pm5.44$, $14.49\pm12.86$, and $10.06\pm10.81$ in the classification-only, classification-with-rejection, and classification-with-rejection-and-AL settings, respectively.
Our results demonstrate for the first time that DRL can facilitate effective malware detection and improved resiliency to concept drift in the dynamic environment of the Android malware domain.

Submitted 26 August, 2025; originally announced August 2025.

Comments: 10 pages
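The DRMD entry reports results as Area Under Time (AUT). A minimal sketch of how AUT is commonly computed in time-aware malware evaluation, assuming the trapezoidal definition from the TESSERACT framework: given a per-period metric f (for example monthly F1) over N test periods, AUT = (1/(N-1)) * sum over k = 1..N-1 of (f_k + f_{k+1})/2, so a constant score s yields AUT = s. The function name `aut` is hypothetical, and using F1 as the underlying metric is an assumption, not something the abstract states.

```python
from typing import Sequence


def aut(scores: Sequence[float]) -> float:
    """Area Under Time over consecutive test periods (hypothetical helper).

    Applies the trapezoidal rule to the per-period scores and divides by the
    number of intervals, so the result stays on the same scale as the scores.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("AUT needs scores for at least two test periods")
    # Each interval contributes the average of its two endpoint scores.
    area = sum((scores[k] + scores[k + 1]) / 2 for k in range(n - 1))
    return area / (n - 1)


# Illustrative per-period scores only, not results from the paper.
print(aut([0.9, 0.8, 0.7]))
```

Because every test period contributes to the average, a detector whose performance decays over the evaluation window scores lower than one that stays stable, even if both start at the same level.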

arXiv:2412.15991

APIRL: Deep Reinforcement Learning for REST API Fuzzing

Authors: Myles Foley, Sergio Maffeis

Abstract: REST APIs have become key components of web services. However, they often contain logic flaws resulting in server-side errors or security vulnerabilities. HTTP requests are used as test cases to find and mitigate such issues. Existing methods to modify requests, including those using deep learning, suffer from limited performance and precision, relying on undirected search or making limited use of contextual information. In this paper we propose APIRL, a fully automated deep reinforcement learning tool for testing REST APIs. A key novelty of our approach is the use of feedback from a transformer module pre-trained on JSON-structured data, akin to that used in API responses. This allows APIRL to learn the subtleties relating to test outcomes and generalise to unseen API endpoints. We show APIRL can find significantly more bugs than the state of the art in real-world REST APIs while minimising the number of required test cases. We also study how reward functions, and other key design choices, affect learnt policies in a thorough ablation study.

Submitted 20 December, 2024; originally announced December 2024.

Comments: Thirty-ninth Conference on Artificial Intelligence (AAAI 2025)

arXiv:2409.18197

doi 10.1145/3488932.3527286

Autonomous Network Defence using Reinforcement Learning

Authors: Myles Foley, Chris Hicks, Kate Highnam, Vasilios Mavroudis

Abstract: In the network security arms race, the defender is significantly disadvantaged as they need to successfully detect and counter every malicious attack. In contrast, the attacker needs to succeed only once. To level the playing field, we investigate the effectiveness of autonomous agents in a realistic network defence scenario. We first outline the problem, provide background on reinforcement learning and detail our proposed agent design. Using a network environment simulation, with 13 hosts spanning 3 subnets, we train a novel reinforcement learning agent and show that it can reliably defend against continual attacks by two advanced persistent threat (APT) red agents: one with complete knowledge of the network layout and another which must discover resources through exploration but is more general.

Submitted 26 September, 2024; originally announced September 2024.

Journal ref: ASIA CCS '22: Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security

arXiv:2403.15392

doi 10.1016/j.scitotenv.2019.133746

Radon mitigation by soil depressurisation case study: radon concentration and pressure field extension monitoring in a pilot house in Spain

Authors: Marta Fuente, Jamie Goggins, Daniel Rabago, Ismael Fuente, Carlos Sainz, Mark Foley

Abstract: A one-year monitoring study was conducted in a pilot house with high radon levels to investigate the effectiveness and efficiency of both active and passive radon mitigation by soil depressurisation (SD). The study included monitoring of radon concentration, pressure field extension (pfe) under the slab and some atmospheric parameters for different testing phases. Periods in which the house remained closed to foster radon accumulation were alternated with phases of active and passive soil depressurisation under different conditions. The behaviour of the radon concentration in the pilot house was analysed along with the influence of atmospheric variables; significant correlations were found between radon concentration and atmospheric pressure, outdoor temperature and wind. The pfe analysis demonstrated that the pressure drop with distance from the suction point of the SD system is proportional to the depressurisation generated. It was also found that the permeability characterisation of the pilot house agrees with the literature on the characterisation of granular fill materials for radon SD systems across Europe. Radon reductions in excess of 85% were achieved in all testing phases. Finally, the results indicate that a fan power of 23 W is sufficient to ensure radon reductions of over 85% for dwellings with a similar aggregate layer and soil permeability.

Submitted 8 February, 2024; originally announced March 2024.

Comments: 18 pages, 10 figures, 2 tables

Journal ref: Science of the Total Environment, 2019
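The radon entry reports reductions "in excess of 85%". A minimal sketch of the usual percentage-reduction arithmetic, assuming reduction is measured relative to the unmitigated (closed-house) concentration in Bq/m3; the abstract does not say exactly which averages the authors compared, and the function name `radon_reduction_percent` is hypothetical.

```python
def radon_reduction_percent(baseline_bq_m3: float, mitigated_bq_m3: float) -> float:
    """Percentage drop in radon concentration relative to the unmitigated baseline.

    Hypothetical helper: both arguments are assumed to be concentrations in Bq/m3
    averaged over comparable periods (closed house vs. depressurisation active).
    """
    if baseline_bq_m3 <= 0:
        raise ValueError("baseline concentration must be positive")
    return 100.0 * (baseline_bq_m3 - mitigated_bq_m3) / baseline_bq_m3


# Illustrative numbers only: a fall from 2000 to 250 Bq/m3 is an 87.5% reduction.
print(radon_reduction_percent(2000.0, 250.0))
```

Under this definition, a reduction of at least 85% means the mitigated concentration is at most 15% of the baseline.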

arXiv:2312.04940

doi 10.1145/3605764.3623986

Canaries and Whistles: Resilient Drone Communication Networks with (or without) Deep Reinforcement Learning

Authors: Chris Hicks, Vasilios Mavroudis, Myles Foley, Thomas Davies, Kate Highnam, Tim Watson

Abstract: Communication networks able to withstand hostile environments are critically important for disaster relief operations. In this paper, we consider a challenging scenario where drones have been compromised in the supply chain, during their manufacture, and harbour malicious software capable of wide-ranging and infectious disruption. We investigate multi-agent deep reinforcement learning as a tool for learning defensive strategies that maximise communications bandwidth despite continual adversarial interference. Using a public challenge for learning network resilience strategies, we propose a state-of-the-art expert technique and study its superiority over deep reinforcement learning agents. Correspondingly, we identify three specific methods for improving the performance of our learning-based agents: (1) ensuring each observation contains the necessary information, (2) using expert agents to provide a curriculum for learning, and (3) paying close attention to reward. We apply our methods and present a new mixed strategy enabling expert and learning-based agents to work together and improve on all prior results.

Submitted 8 December, 2023; originally announced December 2023.

Comments: Published in AISec '23. This version fixes some terminology to improve readability

Journal ref: In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security. Association for Computing Machinery, 91-101 (2023)

arXiv:2306.09318

Inroads into Autonomous Network Defence using Explained Reinforcement Learning

Authors: Myles Foley, Mia Wang, Zoe M, Chris Hicks, Vasilios Mavroudis

Abstract: Computer network defence is a complicated task that has necessitated a high degree of human involvement. However, with recent advancements in machine learning, fully autonomous network defence is becoming increasingly plausible. This paper introduces an end-to-end methodology for studying attack strategies, designing defence agents and explaining their operation. First, using state diagrams, we visualise adversarial behaviour to gain insight about potential points of intervention and inform the design of our defensive models. We opt to use a set of deep reinforcement learning agents trained on different parts of the task and organised in a shallow hierarchy. Our evaluation shows that the resulting design achieves a substantial performance improvement compared to prior work. Finally, to better investigate the decision-making process of our agents, we complete our analysis with a feature ablation and importance study.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.09308

Matching Pairs: Attributing Fine-Tuned Models to their Pre-Trained Large Language Models

Authors: Myles Foley, Ambrish Rawat, Taesung Lee, Yufang Hou, Gabriele Picco, Giulio Zizzo

Abstract: The wide applicability and adaptability of generative large language models (LLMs) have enabled their rapid adoption. While the pre-trained models can perform many tasks, such models are often fine-tuned to improve their performance on various downstream applications. However, this raises issues of model license violation, model theft, and copyright infringement. Moreover, recent advances show that generative technology is capable of producing harmful content, which exacerbates the problems of accountability within model supply chains. Thus, we need a method to investigate how a model was trained or a piece of text was generated, and what the underlying pre-trained base model was. In this paper we take the first step to address this open problem by tracing back the origin of a given fine-tuned LLM to its corresponding pre-trained base model. We consider different knowledge levels and attribution strategies, and find that we can correctly trace back 8 out of the 10 fine-tuned models with our best method.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2205.11398

Fine-Grained Counting with Crowd-Sourced Supervision

Authors: Justin Kay, Catherine M. Foley, Tom Hart

Abstract: Crowd-sourcing is an increasingly popular tool for image analysis in animal ecology. Computer vision methods that can utilize crowd-sourced annotations can help scale up analysis further. In this work we study the potential to do so on the challenging task of fine-grained counting. As opposed to the standard crowd counting task, fine-grained counting also involves classifying attributes of individuals in dense crowds. We introduce a new dataset from animal ecology to enable this study that contains 1.7M crowd-sourced annotations of 8 fine-grained classes. It is the largest available dataset for fine-grained counting and the first to enable the study of the task with crowd-sourced annotations. We introduce methods for generating aggregate "ground truths" from the collected annotations, as well as a counting method that can utilize the aggregate information. Our method improves results by 8% over a comparable baseline, indicating the potential for algorithms to learn fine-grained counting using crowd-sourced supervision.

Submitted 29 May, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: In Computer Vision for Animal Behavior Tracking and Modeling Workshop at CVPR 2022. 4 pages, 3 figures

arXiv:2104.08636

doi 10.1371/journal.pcbi.1008847

Avoiding the bullies: The resilience of cooperation among unequals

Authors: Michael Foley, Rory Smead, Patrick Forber, Christoph Riedl

Abstract: Can egalitarian norms or conventions survive the presence of dominant individuals who are assured of victory in conflicts? We investigate the interaction of power asymmetry and partner choice in games of conflict over a contested resource. We introduce three models to study the emergence and resilience of cooperation among unequals when interaction is random, when individuals can choose their partners, and when power asymmetries dynamically depend on accumulated payoffs. We find that the ability to avoid bullies with higher competitive ability afforded by partner choice mostly restores cooperative conventions and that the competitive hierarchy never forms. Partner choice counteracts the hyper-dominance of bullies, who are isolated in the network, and eliminates the need for others to coordinate in a coalition. When competitive ability dynamically depends on cumulative payoffs, complex cycles of coupled network-strategy-rank changes emerge. Effective collaborators gain popularity (and thus power), adopt aggressive behavior, get isolated, and ultimately lose power. Neither the network nor behavior converges to a stable equilibrium. Despite the instability of power dynamics, the cooperative convention in the population remains stable overall and long-term inequality is completely eliminated. The interaction between partner choice and dynamic power asymmetry is crucial for these results: without partner choice, bullies cannot be isolated, and without dynamic power asymmetry, bullies do not lose their power even when isolated.
We analytically identify a single critical point that marks a phase transition in all three iterations of our models. This critical point is where the first individual breaks from the convention and cycles start to emerge.

Submitted 17 April, 2021; originally announced April 2021.

Journal ref: PLoS Computational Biology 17(4): e1008847, 2021

arXiv:1904.08180

The intersection of two vertex coloring problems

Authors: Angele M. Foley, Dallas J. Fraser, Chinh T. Hoang, Kevin Holmes, Tom P. LaMantia

Abstract: A hole is an induced cycle with at least four vertices. A hole is even if its number of vertices is even. Given a set L of graphs, a graph G is L-free if G does not contain any graph in L as an induced subgraph. Currently, the following two problems are unresolved: the complexity of coloring even hole-free graphs, and the complexity of coloring {4K1, C4}-free graphs. The intersection of these two problems is the problem of coloring {4K1, C4, C6}-free graphs. In this paper we present partial results on this problem.

Submitted 17 April, 2019; originally announced April 2019.

Comments: 16 pages

MSC Class: 68R05
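The definitions in this entry can be made concrete with a brute-force membership test. The sketch below is not from the paper: it checks whether a small graph is {4K1, C4, C6}-free by examining every 4- and 6-vertex subset, where 4K1 is four pairwise non-adjacent vertices and an induced Ck is a set of k vertices whose induced subgraph is a single cycle. Graphs are assumed to be undirected and given as symmetric adjacency sets; all helper names are hypothetical, and the O(n^6) enumeration is only practical for tiny examples. It tests membership in the graph class, not the coloring problem the paper studies.

```python
from itertools import combinations

# Undirected graph as symmetric adjacency sets: vertex -> set of neighbours.
Graph = dict[int, set[int]]


def induced_edge_count(g: Graph, vs: tuple[int, ...]) -> int:
    """Number of edges of g with both endpoints in vs."""
    return sum(1 for u, v in combinations(vs, 2) if v in g[u])


def induces_cycle(g: Graph, vs: tuple[int, ...]) -> bool:
    """True if the vertices vs induce one cycle through all of them."""
    vset = set(vs)
    # Every vertex must have exactly two neighbours inside the subset ...
    if any(len(g[v] & vset) != 2 for v in vs):
        return False
    # ... and the 2-regular subgraph must be connected (rules out e.g. two triangles).
    seen, stack = {vs[0]}, [vs[0]]
    while stack:
        for w in g[stack.pop()] & vset:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vs)


def has_induced_4k1(g: Graph) -> bool:
    return any(induced_edge_count(g, vs) == 0 for vs in combinations(g, 4))


def has_induced_cycle(g: Graph, k: int) -> bool:
    return any(induces_cycle(g, vs) for vs in combinations(g, k))


def is_4k1_c4_c6_free(g: Graph) -> bool:
    return not (has_induced_4k1(g) or has_induced_cycle(g, 4) or has_induced_cycle(g, 6))


c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(is_4k1_c4_c6_free(c5))  # True: C5 has no four independent vertices and no induced C4 or C6
```

The connectivity check matters for C6: two disjoint triangles also give six vertices of induced degree two, but they do not form a hole.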

arXiv:1802.08298

Conflict and Convention in Dynamic Networks

Authors: Michael Foley, Patrick Forber, Rory Smead, Christoph Riedl

Abstract: An important way to resolve games of conflict (snowdrift, hawk-dove, chicken) involves adopting a convention: a correlated equilibrium that avoids any conflict between aggressive strategies. Dynamic networks allow individuals to resolve conflict via their network connections rather than changing their strategy. Exploring how behavioral strategies coevolve with social networks reveals new dynamics that can help explain the origins and robustness of conventions. Here we model the emergence of conventions as correlated equilibria in dynamic networks. Our results show that networks have the tendency to break the symmetry between the two conventional solutions in a strongly biased way. Rather than the correlated equilibrium associated with ownership norms (play aggressive at home, not away), we usually see the opposite host-guest norm (play aggressive away, not at home) evolve on dynamic networks, a phenomenon common to human interaction. We also show that learning to avoid conflict can produce realistic network structures in a way different from preferential attachment models.

Submitted 22 February, 2018; originally announced February 2018.

MSC Class: 91-XX
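The two conventions named in this entry can be illustrated with the textbook hawk-dove game. The sketch below is a simplification, not the paper's dynamic-network model: it assumes resource value V, fight cost C > V, and that each encounter has one host and one guest, each role occurring with probability 1/2. A role-conditioned strategy is a pair (action at home, action away), so the ownership norm is (H, D) and the host-guest norm is (D, H). All names are hypothetical.

```python
from typing import Literal

Action = Literal["H", "D"]  # hawk (aggressive) or dove
Strategy = tuple[Action, Action]  # (action when at home, action when away)

OWNERSHIP: Strategy = ("H", "D")   # play aggressive at home, not away
HOST_GUEST: Strategy = ("D", "H")  # play aggressive away, not at home


def hawk_dove(me: Action, other: Action, v: float, c: float) -> float:
    """Standard hawk-dove payoff to `me` for resource value v and fight cost c."""
    if me == "H" and other == "H":
        return (v - c) / 2  # costly fight, each wins half the time on average
    if me == "H":
        return v  # hawk takes the resource from a dove
    if other == "H":
        return 0.0  # dove retreats from a hawk
    return v / 2  # two doves share


def expected_payoff(me: Strategy, other: Strategy, v: float, c: float) -> float:
    """Average payoff when `me` is host or guest with probability 1/2 each."""
    as_host = hawk_dove(me[0], other[1], v, c)   # I am home, opponent is away
    as_guest = hawk_dove(me[1], other[0], v, c)  # I am away, opponent is home
    return (as_host + as_guest) / 2


v, c = 2.0, 4.0
for name, s in [("ownership", OWNERSHIP), ("host-guest", HOST_GUEST)]:
    print(name, expected_payoff(s, s, v, c))  # 1.0 for both: V/2 with no H-H fight
```

With these values, an always-hawk mutant earns 0.5 against either convention, and an individual following the opposite convention earns 0.0, both below the 1.0 the resident convention earns. This illustrates why either convention can persist on its own, and why the symmetry between them has to be broken by something else, which the abstract attributes to dynamic networks.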

Showing 1–11 of 11 results for author: Foley, M