-
Vulnerability Disclosure through Adaptive Black-Box Adversarial Attacks on NIDS
Authors:
Sabrine Ennaji,
Elhadj Benkhelifa,
Luigi V. Mancini
Abstract:
Adversarial attacks, wherein slightly perturbed inputs are carefully crafted to mislead intelligent models, have attracted increasing attention. However, a critical gap persists between theoretical advancements and practical application, particularly in structured data like network traffic, where interdependent features complicate effective adversarial manipulations. Moreover, ambiguity in current approaches restricts reproducibility and limits progress in this field. Hence, existing defenses often fail to handle evolving adversarial attacks. This paper proposes a novel approach for black-box adversarial attacks that addresses these limitations. Unlike prior work, which often assumes system access or relies on repeated probing, our method strictly respects black-box constraints, reducing interaction to avoid detection and better reflect real-world scenarios. We present an adaptive feature selection strategy using change-point detection and causality analysis to identify and target features that are sensitive to perturbations. This lightweight design ensures low computational cost and high deployability. Our comprehensive experiments show the attack's effectiveness in evading detection with minimal interaction, enhancing its adaptability and applicability in real-world scenarios. By advancing the understanding of adversarial attacks in network traffic, this work lays a foundation for developing robust defenses.
Submitted 25 June, 2025;
originally announced June 2025.
-
Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games
Authors:
Antonio Ocello,
Daniil Tiapkin,
Lorenzo Mancini,
Mathieu Laurière,
Eric Moulines
Abstract:
We introduce Mean-Field Trust Region Policy Optimization (MF-TRPO), a novel algorithm designed to compute approximate Nash equilibria for ergodic Mean-Field Games (MFG) in finite state-action spaces. Building on the well-established performance of TRPO in the reinforcement learning (RL) setting, we extend its methodology to the MFG framework, leveraging its stability and robustness in policy optimization. Under standard assumptions in the MFG literature, we provide a rigorous analysis of MF-TRPO, establishing theoretical guarantees on its convergence. Our results cover both the exact formulation of the algorithm and its sample-based counterpart, where we derive high-probability guarantees and finite sample complexity. This work advances MFG optimization by bridging RL techniques with mean-field decision-making, offering a theoretically grounded approach to solving complex multi-agent problems.
Submitted 28 May, 2025;
originally announced May 2025.
-
MAYA: Addressing Inconsistencies in Generative Password Guessing through a Unified Benchmark
Authors:
William Corrias,
Fabio De Gaspari,
Dorjan Hitaj,
Luigi V. Mancini
Abstract:
Recent advances in generative models have led to their application in password guessing, with the aim of replicating the complexity, structure, and patterns of human-created passwords. Despite their potential, inconsistencies and inadequate evaluation methodologies in prior research have hindered meaningful comparisons and a comprehensive, unbiased understanding of their capabilities. This paper introduces MAYA, a unified, customizable, plug-and-play benchmarking framework designed to facilitate the systematic characterization and benchmarking of generative password-guessing models in the context of trawling attacks. Using MAYA, we conduct a comprehensive assessment of six state-of-the-art approaches, which we re-implemented and adapted to ensure standardization. Our evaluation spans eight real-world password datasets and covers an exhaustive set of advanced testing scenarios, totaling over 15,000 compute hours. Our findings indicate that these models effectively capture different aspects of human password distribution and exhibit strong generalization capabilities. However, their effectiveness varies significantly with long and complex passwords. Through our evaluation, sequential models consistently outperform other generative architectures and traditional password-guessing tools, demonstrating unique capabilities in generating accurate and complex guesses. Moreover, the diverse password distributions learned by the models enable a multi-model attack that outperforms the best individual model. By releasing MAYA, we aim to foster further research, providing the community with a new tool to consistently and reliably benchmark generative password-guessing models. Our framework is publicly available at https://github.com/williamcorrias/MAYA-Password-Benchmarking.
Submitted 12 June, 2025; v1 submitted 23 April, 2025;
originally announced April 2025.
-
Toward Realistic Adversarial Attacks in IDS: A Novel Feasibility Metric for Transferability
Authors:
Sabrine Ennaji,
Elhadj Benkhelifa,
Luigi Vincenzo Mancini
Abstract:
Transferability-based adversarial attacks exploit the ability of adversarial examples, crafted to deceive a specific source Intrusion Detection System (IDS) model, to also mislead a target IDS model without requiring access to the training data or any internal model parameters. These attacks exploit common vulnerabilities in machine learning models to bypass security measures and compromise systems. Although the transferability concept has been widely studied, its practical feasibility remains limited due to assumptions of high similarity between source and target models. This paper analyzes the core factors that contribute to transferability, including feature alignment, model architectural similarity, and overlap in the data distributions that each IDS examines. We propose a novel metric, the Transferability Feasibility Score (TFS), to assess the feasibility and reliability of such attacks based on these factors. Through experimental evidence, we demonstrate that TFS and actual attack success rates are highly correlated, addressing the gap between theoretical understanding and real-world impact. Our findings provide needed guidance for designing more realistic transferable adversarial attacks, developing robust defenses, and ultimately improving the security of machine learning-based IDS in critical systems.
Submitted 11 April, 2025;
originally announced April 2025.
-
Federated UCBVI: Communication-Efficient Federated Regret Minimization with Heterogeneous Agents
Authors:
Safwan Labbi,
Daniil Tiapkin,
Lorenzo Mancini,
Paul Mangold,
Eric Moulines
Abstract:
In this paper, we present the Federated Upper Confidence Bound Value Iteration algorithm ($\texttt{Fed-UCBVI}$), a novel extension of the $\texttt{UCBVI}$ algorithm (Azar et al., 2017) tailored for the federated learning framework. We prove that the regret of $\texttt{Fed-UCBVI}$ scales as $\tilde{\mathcal{O}}(\sqrt{H^3 |\mathcal{S}| |\mathcal{A}| T / M})$, with a small additional term due to heterogeneity, where $|\mathcal{S}|$ is the number of states, $|\mathcal{A}|$ is the number of actions, $H$ is the episode length, $M$ is the number of agents, and $T$ is the number of episodes. Notably, in the single-agent setting, this upper bound matches the minimax lower bound up to polylogarithmic factors, while in the multi-agent scenario, $\texttt{Fed-UCBVI}$ has linear speed-up. To conduct our analysis, we introduce a new measure of heterogeneity, which may hold independent theoretical interest. Furthermore, we show that, unlike existing federated reinforcement learning approaches, $\texttt{Fed-UCBVI}$'s communication complexity only marginally increases with the number of agents.
Submitted 30 October, 2024;
originally announced October 2024.
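The scaling of the leading regret term quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it evaluates $\sqrt{H^3 |\mathcal{S}| |\mathcal{A}| T / M}$ with constants, polylogarithmic factors, and the heterogeneity term dropped, and the problem sizes are hypothetical values that do not come from the paper.

```python
import math


def leading_regret_term(H, S, A, T, M):
    """Return sqrt(H^3 * S * A * T / M).

    Constants, polylogarithmic factors, and the additional
    heterogeneity term from the abstract are deliberately omitted.
    """
    return math.sqrt(H**3 * S * A * T / M)


# Hypothetical problem sizes, chosen only for illustration.
H, S, A, T = 10, 50, 4, 100_000

single = leading_regret_term(H, S, A, T, M=1)
four = leading_regret_term(H, S, A, T, M=4)
ratio = single / four

print(f"M=1: {single:.1f}  M=4: {four:.1f}  ratio: {ratio:.2f}")
```

Under these assumptions, going from one agent to four halves the leading term, since it depends on $M$ only through $1/\sqrt{M}$.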
-
Joint Channel Selection using FedDRL in V2X
Authors:
Lorenzo Mancini,
Safwan Labbi,
Karim Abed Meraim,
Fouzi Boukhalfa,
Alain Durmus,
Paul Mangold,
Eric Moulines
Abstract:
Vehicle-to-everything (V2X) communication technology is revolutionizing transportation by enabling interactions between vehicles, devices, and infrastructures. This connectivity enhances road safety, transportation efficiency, and driver assistance systems. V2X benefits from Machine Learning, enabling real-time data analysis, better decision-making, and improved traffic predictions, making transportation safer and more efficient. In this paper, we study the problem of joint channel selection, where vehicles with different technologies choose one or more Access Points (APs) to transmit messages in a network. In this problem, vehicles must learn a strategy for channel selection, based on observations that incorporate vehicles' information (position and speed), network and communication data (Signal-to-Interference-plus-Noise Ratio from past communications), and environmental data (road type). We propose an approach based on Federated Deep Reinforcement Learning (FedDRL), which enables each vehicle to benefit from other vehicles' experiences. Specifically, we apply the federated Proximal Policy Optimization (FedPPO) algorithm to this task. We show that this method improves communication reliability while minimizing transmission costs and channel switches. The efficiency of the proposed solution is assessed via realistic simulations, highlighting the potential of FedDRL to advance V2X technology.
Submitted 3 October, 2024;
originally announced October 2024.
-
Adversarial Challenges in Network Intrusion Detection Systems: Research Insights and Future Prospects
Authors:
Sabrine Ennaji,
Fabio De Gaspari,
Dorjan Hitaj,
Alicia Kbidi,
Luigi V. Mancini
Abstract:
Machine learning has brought significant advances in cybersecurity, particularly in the development of Intrusion Detection Systems (IDS). These improvements are mainly attributed to the ability of machine learning algorithms to identify complex relationships between features and effectively generalize to unseen data. Deep neural networks, in particular, contributed to this progress by enabling the analysis of large amounts of training data, significantly enhancing detection performance. However, machine learning models remain vulnerable to adversarial attacks, where carefully crafted input data can mislead the model into making incorrect predictions. While adversarial threats in unstructured data, such as images and text, have been extensively studied, their impact on structured data like network traffic is less explored. This survey aims to address this gap by providing a comprehensive review of machine learning-based Network Intrusion Detection Systems (NIDS) and thoroughly analyzing their susceptibility to adversarial attacks. We critically examine existing research in NIDS, highlighting key trends, strengths, and limitations, while identifying areas that require further exploration. Additionally, we discuss emerging challenges in the field and offer insights for the development of more robust and resilient NIDS. In summary, this paper enhances the understanding of adversarial attacks and defenses in NIDS and guides future research in improving the robustness of machine learning models in cybersecurity applications.
Submitted 22 October, 2024; v1 submitted 27 September, 2024;
originally announced September 2024.
-
Have You Poisoned My Data? Defending Neural Networks against Data Poisoning
Authors:
Fabio De Gaspari,
Dorjan Hitaj,
Luigi V. Mancini
Abstract:
The unprecedented availability of training data fueled the rapid development of powerful neural networks in recent years. However, the need for such large amounts of data leads to potential threats such as poisoning attacks: adversarial manipulations of the training data aimed at compromising the learned model to achieve a given adversarial goal.
This paper investigates defenses against clean-label poisoning attacks and proposes a novel approach to detect and filter poisoned datapoints in the transfer learning setting. We define a new characteristic vector representation of datapoints and show that it effectively captures the intrinsic properties of the data distribution. Through experimental analysis, we demonstrate that effective poisons can be successfully differentiated from clean points in the characteristic vector space. We thoroughly evaluate our proposed approach and compare it to existing state-of-the-art defenses using multiple architectures, datasets, and poison budgets. Our evaluation shows that our proposal outperforms existing approaches in defense rate and final trained model performance across all experimental settings.
Submitted 20 March, 2024;
originally announced March 2024.
-
Do You Trust Your Model? Emerging Malware Threats in the Deep Learning Ecosystem
Authors:
Dorjan Hitaj,
Giulio Pagnotta,
Fabio De Gaspari,
Sediola Ruko,
Briland Hitaj,
Luigi V. Mancini,
Fernando Perez-Cruz
Abstract:
Training high-quality deep learning models is a challenging task due to computational and technical requirements. A growing number of individuals, institutions, and companies increasingly rely on pre-trained, third-party models made available in public repositories. These models are often used directly or integrated in product pipelines with no particular precautions, since they are effectively just data in tensor form and considered safe. In this paper, we raise awareness of a new machine learning supply chain threat targeting neural networks. We introduce MaleficNet 2.0, a novel technique to embed self-extracting, self-executing malware in neural networks. MaleficNet 2.0 uses spread-spectrum channel coding combined with error correction techniques to inject malicious payloads in the parameters of deep neural networks. The MaleficNet 2.0 injection technique is stealthy, does not degrade the performance of the model, and is robust against removal techniques. We design our approach to work both in traditional and distributed learning settings such as Federated Learning, and demonstrate that it is effective even when a reduced number of bits is used for the model parameters. Finally, we implement a proof-of-concept self-extracting neural network malware using MaleficNet 2.0, demonstrating the practicality of the attack against a widely adopted machine learning framework. Our aim with this work is to raise awareness of these new, dangerous attacks both in the research community and industry, and we hope to encourage further research in mitigation techniques against such threats.
Submitted 13 May, 2025; v1 submitted 6 March, 2024;
originally announced March 2024.
-
OliVaR: Improving Olive Variety Recognition using Deep Neural Networks
Authors:
Hristofor Miho,
Giulio Pagnotta,
Dorjan Hitaj,
Fabio De Gaspari,
Luigi V. Mancini,
Georgios Koubouris,
Gianluca Godino,
Mehmet Hakan,
Concepcion Muñoz Diez
Abstract:
The easy and accurate identification of varieties is fundamental in agriculture, especially in the olive sector, where more than 1200 olive varieties are currently known worldwide. Varietal misidentification leads to many potential problems for all the actors in the sector: farmers and nursery workers may establish the wrong variety, leading to its maladaptation in the field; olive oil and table olive producers may label and sell a non-authentic product; consumers may be misled; and breeders may commit errors during targeted crossings between different varieties. To date, the standard for varietal identification and certification consists of two methods: morphological classification and genetic analysis. The morphological classification consists of the visual pairwise comparison of different organs of the olive tree, where the most important organ is considered to be the endocarp. In contrast, different methods for genetic classification exist (RAPDs, SSR, and SNP). Both classification methods present advantages and disadvantages. Visual morphological classification requires highly specialized personnel and is prone to human error. Genetic identification methods are more accurate but incur a high cost and are difficult to implement. This paper introduces OliVaR, a novel approach to olive varietal identification. OliVaR uses a teacher-student deep learning architecture to learn the defining characteristics of the endocarp of each specific olive variety and perform classification. We construct what is, to the best of our knowledge, the largest olive variety dataset to date, comprising image data for 131 varieties from the Mediterranean basin. We thoroughly test OliVaR on this dataset and show that it correctly predicts olive varieties with over 86% accuracy.
Submitted 1 March, 2023;
originally announced March 2023.
-
DOLOS: A Novel Architecture for Moving Target Defense
Authors:
Giulio Pagnotta,
Fabio De Gaspari,
Dorjan Hitaj,
Mauro Andreolini,
Michele Colajanni,
Luigi V. Mancini
Abstract:
Moving Target Defense and Cyber Deception emerged in recent years as two key proactive cyber defense approaches, contrasting with the static nature of the traditional reactive cyber defense. The key insight behind these approaches is to impose an asymmetric disadvantage for the attacker by using deception and randomization techniques to create a dynamic attack surface. Moving Target Defense typically relies on system randomization and diversification, while Cyber Deception is based on decoy nodes and fake systems to deceive attackers. However, current Moving Target Defense techniques are complex to manage and can introduce high overheads, while Cyber Deception nodes are easily recognized and avoided by adversaries. This paper presents DOLOS, a novel architecture that unifies Cyber Deception and Moving Target Defense approaches. DOLOS is motivated by the insight that deceptive techniques are much more powerful when integrated into production systems rather than deployed alongside them. DOLOS combines typical Moving Target Defense techniques, such as randomization, diversity, and redundancy, with cyber deception and seamlessly integrates them into production systems through multiple layers of isolation. We extensively evaluate DOLOS against a wide range of attackers, ranging from automated malware to professional penetration testers, and show that DOLOS is highly effective in slowing down attacks and protecting the integrity of production systems. We also provide valuable insights and considerations for the future development of MTD techniques based on our findings.
Submitted 27 September, 2023; v1 submitted 1 March, 2023;
originally announced March 2023.
-
Minerva: A File-Based Ransomware Detector
Authors:
Dorjan Hitaj,
Giulio Pagnotta,
Fabio De Gaspari,
Lorenzo De Carli,
Luigi V. Mancini
Abstract:
Ransomware attacks have caused billions of dollars in damages in recent years, and are expected to cause billions more in the future. Consequently, significant effort has been devoted to ransomware detection and mitigation. Behavioral-based ransomware detection approaches have garnered considerable attention recently. These behavioral detectors typically rely on process-based behavioral profiles to identify malicious behaviors. However, with an increasing body of literature highlighting the vulnerability of such approaches to evasion attacks, a comprehensive solution to the ransomware problem remains elusive. This paper presents Minerva, a novel, robust approach to ransomware detection. Minerva is engineered to be robust by design against evasion attacks, with architectural and feature selection choices informed by their resilience to adversarial manipulation. We conduct a comprehensive analysis of Minerva across a diverse spectrum of ransomware types, encompassing unseen ransomware as well as variants designed specifically to evade Minerva. Our evaluation showcases the ability of Minerva to accurately identify ransomware, generalize to unseen threats, and withstand evasion attacks. Furthermore, over 99% of detected ransomware are identified within 0.52sec of activity, enabling the adoption of data loss prevention techniques with near-zero overhead.
Submitted 29 March, 2025; v1 submitted 26 January, 2023;
originally announced January 2023.
-
TATTOOED: A Robust Deep Neural Network Watermarking Scheme based on Spread-Spectrum Channel Coding
Authors:
Giulio Pagnotta,
Dorjan Hitaj,
Briland Hitaj,
Fernando Perez-Cruz,
Luigi V. Mancini
Abstract:
Watermarking of deep neural networks (DNNs) has gained significant traction in recent years, with numerous (watermarking) strategies being proposed as mechanisms that can help verify the ownership of a DNN in scenarios where these models are obtained without the permission of the owner. However, a growing body of work has demonstrated that existing watermarking mechanisms are highly susceptible to removal techniques, such as fine-tuning, parameter pruning, or shuffling. In this paper, we build upon extensive prior work on covert (military) communication and propose TATTOOED, a novel DNN watermarking technique that is robust to existing threats. We demonstrate that, using TATTOOED as their watermarking mechanism, DNN owners can successfully obtain the watermark and verify model ownership even in scenarios where 99% of model parameters are altered. Furthermore, we show that TATTOOED is easy to employ in training pipelines and has a negligible impact on model performance.
Submitted 3 June, 2024; v1 submitted 12 February, 2022;
originally announced February 2022.
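To make the spread-spectrum idea named in the title above more concrete, here is a minimal, generic sketch of CDMA-style embedding of a few watermark bits into a weight vector. It is not the TATTOOED implementation: the function names, the Gaussian stand-in for trained weights, the gain `gamma`, and the seeds are all assumptions made for illustration. Each bit is spread over every weight with its own pseudo-random ±1 code, and is read back from the sign of the correlation between that code and the weights.

```python
import random


def make_codes(num_bits, num_weights, seed):
    """One pseudo-random +/-1 spreading code per bit, reproducible from a secret seed."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(num_weights)]
            for _ in range(num_bits)]


def embed(weights, bits, codes, gamma):
    """Add gamma * (+/-1 for the bit) * code to the weights, for every bit."""
    marked = list(weights)
    for bit, code in zip(bits, codes):
        sign = 1 if bit else -1
        for k, chip in enumerate(code):
            marked[k] += gamma * sign * chip
    return marked


def extract(weights, codes):
    """Recover each bit from the sign of the correlation with its code."""
    return [1 if sum(c * w for c, w in zip(code, weights)) > 0 else 0
            for code in codes]


# Hypothetical stand-in for a trained model: 20,000 small Gaussian weights.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.05) for _ in range(20_000)]

bits = [1, 0, 1, 1, 0, 0, 1, 0]
gamma = 0.005  # assumed embedding gain
codes = make_codes(len(bits), len(weights), seed=1234)

marked = embed(weights, bits, codes, gamma)
recovered = extract(marked, codes)
max_change = max(abs(m - w) for m, w in zip(marked, weights))

print("recovered == bits:", recovered == bits)
print("largest per-weight change:", max_change)
```

The design choice worth noting is that the correlation sums the small per-weight contribution over all weights, so the signal grows linearly with the number of weights while the interference from the original weights grows only with its square root. Only a holder of the seed can regenerate the codes and read the bits.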
-
FedComm: Federated Learning as a Medium for Covert Communication
Authors:
Dorjan Hitaj,
Giulio Pagnotta,
Briland Hitaj,
Fernando Perez-Cruz,
Luigi V. Mancini
Abstract:
Proposed as a solution to mitigate the privacy implications related to the adoption of deep learning, Federated Learning (FL) enables large numbers of participants to successfully train deep neural networks without having to reveal the actual private training data. To date, a substantial amount of research has investigated the security and privacy properties of FL, resulting in a plethora of innovative attack and defense strategies. This paper thoroughly investigates the communication capabilities of an FL scheme. In particular, we show that a party involved in the FL learning process can use FL as a covert communication medium to send an arbitrary message. We introduce FedComm, a novel multi-system covert-communication technique that enables robust sharing and transfer of targeted payloads within the FL framework. Our extensive theoretical and empirical evaluations show that FedComm provides a stealthy communication channel, with minimal disruptions to the training process. Our experiments show that FedComm successfully delivers 100% of a payload in the order of kilobits before the FL procedure converges. Our evaluation also shows that FedComm is independent of the application domain and the neural network architecture used by the underlying FL scheme.
Submitted 17 May, 2023; v1 submitted 21 January, 2022;
originally announced January 2022.
-
MalPhase: Fine-Grained Malware Detection Using Network Flow Data
Authors:
Michal Piskozub,
Fabio De Gaspari,
Frederick Barr-Smith,
Luigi V. Mancini,
Ivan Martinovic
Abstract:
Economic incentives encourage malware authors to constantly develop new, increasingly complex malware to steal sensitive data or blackmail individuals and companies into paying large ransoms. In 2017, the worldwide economic impact of cyberattacks was estimated to be between 445 and 600 billion USD, or 0.8% of global GDP. Traditionally, one of the approaches used to defend against malware is network traffic analysis, which relies on network data to detect the presence of potentially malicious software. However, to keep up with increasing network speeds and amount of traffic, network analysis is generally limited to work on aggregated network data, which is traditionally challenging and yields mixed results. In this paper, we present MalPhase, a system that was designed to cope with the limitations of aggregated flows. MalPhase features a multi-phase pipeline for malware detection, type and family classification. The use of an extended set of network flow features and a simultaneous multi-tier architecture facilitates a performance improvement for deep learning models, making them able to detect malicious flows (>98% F1) and categorize them to a respective malware type (>93% F1) and family (>91% F1). Furthermore, the use of robust features and denoising autoencoders allows MalPhase to perform well on samples with varying amounts of benign traffic mixed in. Finally, MalPhase detects unseen malware samples with performance comparable to that of known samples, even when interlaced with benign flows to reflect realistic network environments.
Submitted 1 June, 2021;
originally announced June 2021.
-
PassFlow: Guessing Passwords with Generative Flows
Authors:
Giulio Pagnotta,
Dorjan Hitaj,
Fabio De Gaspari,
Luigi V. Mancini
Abstract:
Recent advances in generative machine learning models rekindled research interest in the area of password guessing. Data-driven password guessing approaches based on GANs, language models and deep latent variable models have shown impressive generalization performance and offer compelling properties for the task of password guessing. In this paper, we propose PassFlow, a flow-based generative model approach to password guessing. Flow-based models allow for precise log-likelihood computation and optimization, which enables exact latent variable inference. Additionally, flow-based models provide meaningful latent space representation, which enables operations such as exploration of specific subspaces of the latent space and interpolation. We demonstrate the applicability of generative flows to the context of password guessing, departing from previous applications of flow-networks which are mainly limited to the continuous space of image generation. We show that PassFlow is able to outperform prior state-of-the-art GAN-based approaches in the password guessing task while using a training set that is orders of magnitude smaller than that of previous art. Furthermore, a qualitative analysis of the generated samples shows that PassFlow can accurately model the distribution of the original passwords, with even non-matched samples closely resembling human-like passwords.
Submitted 14 December, 2021; v1 submitted 13 May, 2021;
originally announced May 2021.
-
Reliable Detection of Compressed and Encrypted Data
Authors:
Fabio De Gaspari,
Dorjan Hitaj,
Giulio Pagnotta,
Lorenzo De Carli,
Luigi V. Mancini
Abstract:
Several cybersecurity domains, such as ransomware detection, forensics and data analysis, require methods to reliably identify encrypted data fragments. Typically, current approaches employ statistics derived from byte-level distribution, such as entropy estimation, to identify encrypted fragments. However, modern content types use compression techniques which alter the data distribution, pushing it closer to the uniform distribution. The result is that current approaches exhibit unreliable encryption detection performance when compressed data appears in the dataset. Furthermore, proposed approaches are typically evaluated over few data types and fragment sizes, making it hard to assess their practical applicability. This paper compares existing statistical tests on a large, standardized dataset and shows that current approaches consistently fail to distinguish encrypted and compressed data on both small and large fragment sizes. We address these shortcomings and design EnCoD, a learning-based classifier which can reliably distinguish compressed and encrypted data. We evaluate EnCoD on a dataset of 16 different file types and fragment sizes ranging from 512B to 8KB. Our results highlight that EnCoD outperforms current approaches by a wide margin, with accuracy ranging from ~82% for 512B fragments up to ~92% for 8KB data fragments. Moreover, EnCoD can pinpoint the exact format of a given data fragment, rather than performing only binary classification like previous approaches.
Submitted 31 March, 2021;
originally announced March 2021.
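The entropy-based baseline mentioned in the abstract above can be illustrated with a minimal sketch. This is not the EnCoD classifier, nor any of the statistical tests the paper evaluates; the function name `byte_entropy`, the synthetic text, and the use of `os.urandom` as a stand-in for ciphertext are assumptions made here for illustration only.

```python
import math
import os
import zlib
from collections import Counter


def byte_entropy(fragment: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not fragment:
        return 0.0
    counts = Counter(fragment)
    total = len(fragment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical inputs: repetitive text, its zlib-compressed form,
    # and random bytes standing in for an encrypted fragment.
    text = b"".join(b"record %d: status ok\n" % i for i in range(20000))
    compressed = zlib.compress(text, 9)[:8192]
    random_like = os.urandom(8192)

    for name, data in [("plain text", text[:8192]),
                       ("compressed", compressed),
                       ("random (ciphertext stand-in)", random_like)]:
        print(f"{name}: {byte_entropy(data):.3f} bits/byte")
```

Running the script prints one entropy estimate per fragment type; comparing the compressed and random fragments gives a feel for why a simple entropy threshold may struggle to separate them, which is the gap the abstract describes.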
-
Evaluating the Robustness of Geometry-Aware Instance-Reweighted Adversarial Training
Authors:
Dorjan Hitaj,
Giulio Pagnotta,
Iacopo Masi,
Luigi V. Mancini
Abstract:
In this technical report, we evaluate the adversarial robustness of a very recent method called "Geometry-aware Instance-reweighted Adversarial Training"[7]. GAIRAT reports state-of-the-art results on defenses to adversarial attacks on the CIFAR-10 dataset. In fact, we find that a network trained with this method, while showing an improvement over regular adversarial training (AT), is biasing the model towards certain samples by re-scaling the loss. Indeed, this leads the model to be susceptible to attacks that scale the logits. The original model shows an accuracy of 59% under AutoAttack - when trained with additional data with pseudo-labels. We provide an analysis that shows the opposite. In particular, we craft a PGD attack multiplying the logits by a positive scalar that decreases the GAIRAT accuracy from 55% to 44%, when trained solely on CIFAR-10. In this report, we rigorously evaluate the model and provide insights into the reasons behind the vulnerability of GAIRAT to this adversarial attack. The code to reproduce our evaluation is made available at https://github.com/giuxhub/GAIRAT-LSA
Submitted 5 March, 2021; v1 submitted 2 March, 2021;
originally announced March 2021.
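As a rough illustration of why multiplying logits by a positive scalar can matter to a gradient-based attack such as PGD, the sketch below shows that scaling leaves the predicted class unchanged while altering the cross-entropy gradient that the attack would follow. It is not the authors' attack or evaluation code; the toy logits, the scale value 0.1, and all function names are assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def grad_wrt_logits(logits, label, scale=1.0):
    """Gradient of cross-entropy(softmax(scale * logits), label)
    with respect to the unscaled logits: scale * (p - onehot)."""
    probs = softmax([scale * z for z in logits])
    return [scale * (p - (1.0 if i == label else 0.0))
            for i, p in enumerate(probs)]


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


if __name__ == "__main__":
    logits = [10.0, 0.0, 0.0]  # hypothetical, very confident prediction
    label = 0
    for scale in (1.0, 0.1):
        scaled = [scale * z for z in logits]
        g = grad_wrt_logits(logits, label, scale)
        print(f"scale={scale}: predicted class={argmax(scaled)}, "
              f"gradient on true-class logit={g[label]:.6f}")
```

In this toy case the prediction is the same at both scales, but the gradient on the true-class logit is much larger in magnitude at the smaller scale, so a sign- or step-based attack receives a more informative signal. Whether this is the mechanism behind the reported accuracy drop is something the report itself, not this sketch, would establish.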
-
Fixed-MAML for Few Shot Classification in Multilingual Speech Emotion Recognition
Authors:
Anugunj Naman,
Chetan Sinha,
Liliana Mancini
Abstract:
In this paper, we analyze the feasibility of applying few-shot learning to the speech emotion recognition (SER) task. Current speech emotion recognition models work exceptionally well but fail when the input is multilingual. Moreover, such models perform adequately only when the training corpus is vast. The availability of a large training corpus is a significant problem for languages that are obscure or not widely spoken. We attempt to solve this challenge of multilingualism and lack of available data by turning this problem into a few-shot learning problem. We suggest relaxing the assumption that all N classes in an N-way K-shot problem be new and define an N+F way problem where N and F are the number of emotion classes and predefined fixed classes, respectively. We propose this modification to the Model-Agnostic Meta-Learning (MAML) algorithm to solve the problem and call this new model F-MAML. This modification performs better than the original MAML and outperforms it on the EmoFilm dataset.
Submitted 30 May, 2022; v1 submitted 5 January, 2021;
originally announced January 2021.
-
Capture the Bot: Using Adversarial Examples to Improve CAPTCHA Robustness to Bot Attacks
Authors:
Dorjan Hitaj,
Briland Hitaj,
Sushil Jajodia,
Luigi V. Mancini
Abstract:
To date, CAPTCHAs have served as the first line of defense preventing unauthorized access by (malicious) bots to web-based services, while at the same time maintaining a trouble-free experience for human visitors. However, recent work in the literature has provided evidence of sophisticated bots that make use of advancements in machine learning (ML) to easily bypass existing CAPTCHA-based defenses. In this work, we take the first step to address this problem. We introduce CAPTURE, a novel CAPTCHA scheme based on adversarial examples. While adversarial examples are typically used to lead an ML model astray, with CAPTURE we attempt to make "good use" of such mechanisms. Our empirical evaluations show that CAPTURE can produce CAPTCHAs that are easy for humans to solve while, at the same time, effectively thwarting ML-based bot solvers.
Submitted 4 November, 2020; v1 submitted 30 October, 2020;
originally announced October 2020.
-
EnCoD: Distinguishing Compressed and Encrypted File Fragments
Authors:
Fabio De Gaspari,
Dorjan Hitaj,
Giulio Pagnotta,
Lorenzo De Carli,
Luigi V. Mancini
Abstract:
Reliable identification of encrypted file fragments is a requirement for several security applications, including ransomware detection, digital forensics, and traffic analysis. A popular approach consists of estimating high entropy as a proxy for randomness. However, many modern content types (e.g. office documents, media files, etc.) are highly compressed for storage and transmission efficiency. Compression algorithms also output high-entropy data, thus reducing the accuracy of entropy-based encryption detectors. Over the years, a variety of approaches have been proposed to distinguish encrypted file fragments from high-entropy compressed fragments. However, these approaches are typically only evaluated over a few, select data types and fragment sizes, which makes a fair assessment of their practical applicability impossible. This paper aims to close this gap by comparing existing statistical tests on a large, standardized dataset. Our results show that current approaches cannot reliably tell apart encryption and compression, even for large fragment sizes. To address this issue, we design EnCoD, a learning-based classifier which can reliably distinguish compressed and encrypted data, starting with fragments as small as 512 bytes. We evaluate EnCoD against current approaches over a large dataset of different data types, showing that it outperforms current state-of-the-art for most considered fragment sizes and data types.
Submitted 15 October, 2020;
originally announced October 2020.
-
The Naked Sun: Malicious Cooperation Between Benign-Looking Processes
Authors:
Fabio De Gaspari,
Dorjan Hitaj,
Giulio Pagnotta,
Lorenzo De Carli,
Luigi V. Mancini
Abstract:
Recent progress in machine learning has generated promising results in behavioral malware detection. Behavioral modeling identifies malicious processes via features derived from their runtime behavior. Behavioral features hold great promise as they are intrinsically related to the functioning of each malware, and are therefore considered difficult to evade. Indeed, while a significant number of results exist on evasion of static malware features, evasion of dynamic features has seen limited work. This paper thoroughly examines the robustness of behavioral malware detectors to evasion, focusing particularly on anti-ransomware evasion. We choose ransomware as its behavior tends to differ significantly from that of benign processes, making it low-hanging fruit for behavioral detection (and a difficult candidate for evasion). Our analysis identifies a set of novel attacks that distribute the overall malware workload across a small set of cooperating processes to avoid the generation of significant behavioral features. Our most effective attack decreases the accuracy of a state-of-the-art classifier from 98.6% to 0% using only 18 cooperating processes. Furthermore, we show our attacks to be effective against commercial ransomware detectors even in a black-box setting.
Submitted 6 November, 2019;
originally announced November 2019.
-
Integrated Optimization of Ascent Trajectory and SRM Design of Multistage Launch Vehicles
Authors:
Lorenzo Federici,
Alessandro Zavoli,
Guido Colasurdo,
Lucandrea Mancini,
Agostino Neri
Abstract:
This paper presents a methodology for the concurrent first-stage preliminary design and ascent trajectory optimization, with application to a Vega-derived Light Launch Vehicle. The reuse of an existing upper stage (Zefiro 40) as first stage requires a redesign of the propellant grain geometry, in order to account for the changed operating conditions. An optimization code based on the parallel running of several Differential Evolution algorithms is used to find the optimal internal pressure law during Z40 operation, together with the optimal thrust direction and other relevant flight parameters of the entire ascent trajectory. The payload injected into a target orbit is maximized while respecting multiple design constraints, which either involve the solid rocket motor alone or depend on the actual flight trajectory. Numerical results for SSO injection are presented.
Submitted 8 October, 2019;
originally announced October 2019.
-
Have You Stolen My Model? Evasion Attacks Against Deep Neural Network Watermarking Techniques
Authors:
Dorjan Hitaj,
Luigi V. Mancini
Abstract:
Deep neural networks have had enormous impact on various domains of computer science, considerably outperforming previous state-of-the-art machine learning techniques. To achieve this performance, neural networks need large quantities of data and huge computational resources, which heavily increases their construction costs. The increased cost of building a good deep neural network model gives rise to a need for protecting this investment from potential copyright infringements. Legitimate owners of a machine learning model want to be able to reliably track and detect a malicious adversary that tries to steal the intellectual property related to the model. Recently, this problem was tackled by introducing in deep neural networks the concept of watermarking, which allows a legitimate owner to embed some secret information (watermark) in a given model. The watermark allows the legitimate owner to detect copyright infringements of his model. This paper focuses on verifying the robustness and reliability of state-of-the-art deep neural network watermarking schemes. We show that a malicious adversary, even in scenarios where the watermark is difficult to remove, can still evade the verification by the legitimate owners, thus avoiding the detection of model theft.
Submitted 3 September, 2018;
originally announced September 2018.
-
A Modality-Adaptive Method for Segmenting Brain Tumors and Organs-at-Risk in Radiation Therapy Planning
Authors:
Mikael Agn,
Per Munck af Rosenschöld,
Oula Puonti,
Michael J. Lundemann,
Laura Mancini,
Anastasia Papadaki,
Steffi Thust,
John Ashburner,
Ian Law,
Koen Van Leemput
Abstract:
In this paper we present a method for simultaneously segmenting brain tumors and an extensive set of organs-at-risk for radiation therapy planning of glioblastomas. The method combines a contrast-adaptive generative model for whole-brain segmentation with a new spatial regularization model of tumor shape using convolutional restricted Boltzmann machines. We demonstrate experimentally that the method is able to adapt to image acquisitions that differ substantially from any available training data, ensuring its applicability across treatment sites; that its tumor segmentation accuracy is comparable to that of the current state of the art; and that it captures most organs-at-risk sufficiently well for radiation therapy planning purposes. The proposed method may be a valuable step towards automating the delineation of brain tumors and organs-at-risk in glioblastoma patients undergoing radiation therapy.
Submitted 15 August, 2018; v1 submitted 18 July, 2018;
originally announced July 2018.
-
RADIS: Remote Attestation of Distributed IoT Services
Authors:
Mauro Conti,
Edlira Dushku,
Luigi V. Mancini
Abstract:
Remote attestation is a security technique through which a remote trusted party (i.e., Verifier) checks the trustworthiness of a potentially untrusted device (i.e., Prover). In Internet of Things (IoT) systems, the existing remote attestation protocols propose various approaches to detect modified software and physical tampering attacks. However, in an interoperable IoT system, in which IoT devices interact autonomously among themselves, an additional problem arises: a compromised IoT service can influence the genuine operation of other invoked services without changing the software of the latter. In this paper, we propose a protocol for Remote Attestation of Distributed IoT Services (RADIS), which verifies the trustworthiness of distributed IoT services. Instead of attesting the complete memory content of all the interoperable IoT devices, RADIS attests only the services involved in performing a certain functionality. RADIS relies on a control-flow attestation technique to detect IoT services that perform an unexpected operation due to their interactions with a malicious remote service. Our experiments show the effectiveness of our protocol in validating the integrity status of a distributed IoT service.
Submitted 18 November, 2020; v1 submitted 26 July, 2018;
originally announced July 2018.
-
Towards an Active, Autonomous and Intelligent Cyber Defense of Military Systems: the NATO AICA Reference Architecture
Authors:
Paul Theron,
Alexander Kott,
Martin Drašar,
Krzysztof Rzadca,
Benoît LeBlanc,
Mauno Pihelgas,
Luigi Mancini,
Agostino Panico
Abstract:
Within the future Global Information Grid, complex massively interconnected systems, isolated defense vehicles, sensors and effectors, and infrastructures and systems demanding extremely low failure rates, to which human security operators cannot have an easy access and cannot deliver fast enough reactions to cyber-attacks, need an active, autonomous and intelligent cyber defense. Multi Agent Systems for Cyber Defense may provide an answer to this requirement. This paper presents the concept and architecture of an Autonomous Intelligent Cyber defense Agent (AICA). First, we describe the rationale of the AICA concept. Secondly, we explain the methodology and purpose that drive the definition of the AICA Reference Architecture (AICARA) by NATO's IST-152 Research and Technology Group. Thirdly, we review some of the main features and challenges of Multi Autonomous Intelligent Cyber defense Agent (MAICA). Fourthly, we depict the initially assumed AICA Reference Architecture. Then we present one of our preliminary research issues, assumptions and ideas. Finally, we present the future lines of research that will help develop and test the AICA / MAICA concept.
Submitted 7 June, 2018;
originally announced June 2018.
-
Autonomous Intelligent Cyber-defense Agent (AICA) Reference Architecture. Release 2.0
Authors:
Alexander Kott,
Paul Théron,
Martin Drašar,
Edlira Dushku,
Benoît LeBlanc,
Paul Losiewicz,
Alessandro Guarino,
Luigi Mancini,
Agostino Panico,
Mauno Pihelgas,
Krzysztof Rzadca,
Fabio De Gaspari
Abstract:
This report - a major revision of its previous release - describes a reference architecture for intelligent software agents performing active, largely autonomous cyber-defense actions on military networks of computing and communicating devices. The report is produced by the North Atlantic Treaty Organization (NATO) Research Task Group (RTG) IST-152 "Intelligent Autonomous Agents for Cyber Defense and Resilience". In a conflict with a technically sophisticated adversary, NATO military tactical networks will operate in a heavily contested battlefield. Enemy software cyber agents - malware - will infiltrate friendly networks and attack friendly command, control, communications, computers, intelligence, surveillance, and reconnaissance and computerized weapon systems. To fight them, NATO needs artificial cyber hunters - intelligent, autonomous, mobile agents specialized in active cyber defense. With this in mind, in 2016, NATO initiated RTG IST-152. Its objective has been to help accelerate the development and transition to practice of such software agents by producing a reference architecture and technical roadmap. This report presents the concept and architecture of an Autonomous Intelligent Cyber-defense Agent (AICA). We describe the rationale of the AICA concept, explain the methodology and purpose that drive the definition of the AICA Reference Architecture, and review some of the main features and challenges of AICAs.
Submitted 22 March, 2023; v1 submitted 28 March, 2018;
originally announced March 2018.
-
OSSINT - Open Source Social Network Intelligence An efficient and effective way to uncover "private" information in OSN profiles
Authors:
Giuseppe Cascavilla,
Filipe Beato,
Andrea Burattin,
Mauro Conti,
Luigi Vincenzo Mancini
Abstract:
Online Social Networks (OSNs), such as Facebook, provide users with tools to share information along with a set of privacy control preferences to regulate the spread of information. Current privacy controls are effective at protecting content data. However, the complexity of tuning them undermines their effectiveness when protecting contextual information (such as the social network structure) that many users believe is kept private.
In this paper, we demonstrate the extent of the problem of information leakage in Facebook. In particular, we show the possibility of inferring, from the network "surrounding" a victim user, some information that the victim set as hidden. We developed a system, named OSSINT (Open Source Social Network INTelligence), on top of our previous tool SocialSpy, that is able to infer hidden information of a victim profile and retrieve private information from public ones. OSSINT retrieves the friendship network of a victim and shows how it is possible to infer additional private information (e.g., user personal preferences and hobbies). Our proposed system OSSINT goes the extra mile regarding network topology information, i.e., it predicts new friendships using the victim's friends-of-friends network (2 hops of distance from the victim profile), and hence can possibly deduce private information of the full Facebook network. OSSINT improved on the previous results of SocialSpy, correctly predicting an average of 11 additional friendships with peaks of 20 new friends. Moreover, for the considered victim profiles, OSSINT demonstrated how it is possible to infer real-life information such as current city, hometown, and university, all supposedly private.
Submitted 21 November, 2016;
originally announced November 2016.
-
Know Your Enemy: Stealth Configuration-Information Gathering in SDN
Authors:
Mauro Conti,
Fabio De Gaspari,
Luigi V. Mancini
Abstract:
Software Defined Networking (SDN) is a network architecture that aims at providing high flexibility through the separation of the network logic from the forwarding functions. The industry has already widely adopted SDN and researchers have thoroughly analyzed its vulnerabilities, proposing solutions to improve its security. However, we believe important security aspects of SDN are still left uninvestigated. In this paper, we raise the concern that an attacker may be able to obtain knowledge about an SDN network. In particular, we introduce a novel attack, named Know Your Enemy (KYE), by means of which an attacker can gather vital information about the configuration of the network. This information ranges from the configuration of security tools, such as attack detection thresholds for network scanning, to general network policies like QoS and network virtualization. Additionally, we show that an attacker can perform a KYE attack in a stealthy fashion, i.e., without the risk of being detected. We underline that the vulnerability exploited by the KYE attack is specific to SDN and is not present in legacy networks. To address the KYE attack, we also propose an active defense countermeasure based on network flow obfuscation, which considerably increases the complexity of a successful attack. Our solution offers provable security guarantees that can be tailored to the needs of the specific network under consideration.
Submitted 16 August, 2016;
originally announced August 2016.
-
No Place to Hide that Bytes won't Reveal: Sniffing Location-Based Encrypted Traffic to Track a User's Position
Authors:
Giuseppe Ateniese,
Briland Hitaj,
Luigi V. Mancini,
Nino V. Verde,
Antonio Villani
Abstract:
News reports of the last few years have indicated that several intelligence agencies are able to monitor large networks or entire portions of the Internet backbone. Such a powerful adversary has only recently been considered by the academic literature. In this paper, we propose a new adversary model for Location Based Services (LBSs). The model takes into account an unauthorized third party, different from the LBS provider itself, that wants to infer the location and monitor the movements of an LBS user. We show that such an adversary can extrapolate the position of a target user by just analyzing the size and the timing of the encrypted traffic exchanged between that user and the LBS provider. We performed a thorough analysis of a widely deployed location-based app that comes pre-installed on many Android devices: GoogleNow. The results are encouraging and highlight the importance of devising more effective countermeasures against powerful adversaries to preserve the privacy of LBS users.
Submitted 4 September, 2015; v1 submitted 28 May, 2015;
originally announced May 2015.
-
Can't you hear me knocking: Identification of user actions on Android apps via traffic analysis
Authors:
Mauro Conti,
Luigi V. Mancini,
Riccardo Spolaor,
Nino V. Verde
Abstract:
As smartphone usage becomes more and more pervasive, people have also started asking to what extent such devices can be maliciously exploited as "tracking devices". The concern is not only related to an adversary taking physical or remote control of the device (e.g., via a malicious app), but also to what a passive adversary (without the above capabilities) can observe from the device communications. Work in this latter direction has aimed, for example, at inferring the apps a user has installed on his device, or identifying the presence of a specific user within a network.
In this paper, we move a step forward: we investigate to what extent it is feasible to identify the specific actions that a user is performing on his mobile device by simply eavesdropping on the device's network traffic. In particular, we aim at identifying actions like browsing someone's profile on a social network, posting a message on a friend's wall, or sending an email. We design a system that achieves this goal starting from encrypted TCP/IP packets: it works through identification of network flows and application of machine learning techniques. We fully implemented this system and ran a thorough set of experiments, which show that it can achieve accuracy and precision higher than 95% for most of the considered actions.
Submitted 29 July, 2014;
originally announced July 2014.
-
No NAT'd User left Behind: Fingerprinting Users behind NAT from NetFlow Records alone
Authors:
Nino Vincenzo Verde,
Giuseppe Ateniese,
Emanuele Gabrielli,
Luigi Vincenzo Mancini,
Angelo Spognardi
Abstract:
It is generally recognized that the traffic generated by an individual connected to a network acts as his biometric signature. Several tools exploit this fact to fingerprint and monitor users. Often, though, these tools assume access to the entire traffic, including IP addresses and payloads. This is not feasible on the grounds that both performance and privacy would be negatively affected. In reality, most ISPs convert user traffic into NetFlow records for a concise representation that does not include, for instance, any payloads. More importantly, large and distributed networks are usually NAT'd, thus a few IP addresses may be associated with thousands of users. We devised a new fingerprinting framework that overcomes these hurdles. Our system is able to analyze a huge amount of network traffic represented as NetFlows, with the intent to track people. It does so by accurately inferring when users are connected to the network and which IP addresses they are using, even though thousands of users are hidden behind NAT. Our prototype implementation was deployed and tested within an existing large metropolitan WiFi network serving about 200,000 users, with an average load of more than 1,000 users simultaneously connected behind only 2 NAT'd IP addresses. Our solution turned out to be very effective, with an accuracy greater than 90%. We also devised new tools and refined existing ones that may be applied to other contexts related to NetFlow analysis.
Submitted 9 February, 2014;
originally announced February 2014.
-
Hacking Smart Machines with Smarter Ones: How to Extract Meaningful Data from Machine Learning Classifiers
Authors:
Giuseppe Ateniese,
Giovanni Felici,
Luigi V. Mancini,
Angelo Spognardi,
Antonio Villani,
Domenico Vitali
Abstract:
Machine Learning (ML) algorithms are used to train computers to perform a variety of complex tasks and improve with experience. Computers learn how to recognize patterns, make unintended decisions, or react to a dynamic environment. Certain trained machines may be more effective than others because they are based on more suitable ML algorithms or because they were trained on superior training sets. Although ML algorithms are known and publicly released, training sets may not be reasonably ascertainable and, indeed, may be guarded as trade secrets. While much research has addressed the privacy of the elements of training sets, in this paper we focus our attention on ML classifiers and on the statistical information that can be unintentionally or maliciously revealed by them. We show that it is possible to infer unexpected but useful information from ML classifiers. In particular, we build a novel meta-classifier and train it to hack other classifiers, obtaining meaningful information about their training sets. This kind of information leakage can be exploited, for example, by a vendor to build more effective classifiers or to simply acquire trade secrets from a competitor's apparatus, potentially violating its intellectual property rights.
Submitted 19 June, 2013;
originally announced June 2013.
-
Mapping the File Systems Genome: rationales, technique, results and applications
Authors:
Roberto Di Pietro,
Luigi V. Mancini,
Antonio Villani,
Domenico Vitali
Abstract:
This paper provides evidence of a feature of Hard-Disk Drives (HDDs) that we call the File System Genome. This feature originates from the areas of the HDD where the operating system places the file blocks during the installation procedure. It appears from our study that the File System Genome is a distinctive and unique feature of each individual HDD. In particular, our extensive set of experiments shows that the installation of the same operating system on two identical hardware configurations generates two different File System Genomes. Further, the application of sound information theory tools, such as min entropy, shows that the differences between two File System Genomes are considerably relevant. The results provided in this paper constitute the scientific basis for a number of applications in various fields of information technology, such as forensic identification and security. Finally, this work also paves the way for the application of the highlighted technique to other classes of mass-storage devices (e.g. SSDs, Flash memories).
Submitted 12 June, 2013;
originally announced June 2013.
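The abstract above names min entropy as one of the information theory tools applied to File System Genomes, without giving details. As a reminder of the standard definition only, the min entropy of a discrete distribution is H_min(X) = -log2(max_x p(x)). The sketch below estimates it from observed samples; the function name `min_entropy` and the sample data are hypothetical and are not taken from the paper, and the paper's actual procedure may differ.

```python
import math
from collections import Counter


def min_entropy(samples):
    """Estimate min entropy (in bits) from a non-empty sequence of observations.

    Uses the empirical frequency of the most common value as an
    estimate of max_x p(x). This is an illustrative estimator, not
    the method used in the paper.
    """
    if len(samples) == 0:
        raise ValueError("samples must be non-empty")
    counts = Counter(samples)
    # Probability of the single most likely outcome.
    p_max = max(counts.values()) / len(samples)
    return -math.log2(p_max)


if __name__ == "__main__":
    # Hypothetical block-placement labels, for illustration only.
    uniform = [0, 1, 2, 3]      # four equally likely values
    skewed = [0, 0, 0, 1]       # one dominant value
    print(min_entropy(uniform))
    print(min_entropy(skewed))
```

Min entropy is governed by the most likely outcome alone, so it is a conservative (worst-case) measure: a distribution with one dominant value scores low even if its other values are spread widely.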
-
Formal Analysis of UMTS Privacy
Authors:
Myrto Arapinis,
Loretta Ilaria Mancini,
Eike Ritter,
Mark Ryan
Abstract:
The ubiquitous presence of mobile communication devices and the continuous development of mobile data applications, which results in a high level of mobile device activity and exchanged data, often transparent to the user, make privacy preservation an important feature of mobile telephony systems. We present a formal analysis of the UMTS Authentication and Key Agreement protocol, using the applied pi-calculus and the ProVerif tool. We formally verify the model with respect to privacy properties. We show a linkability attack which makes it possible for individuals with low-cost equipment to trace UMTS subscribers. The attack exploits information leaked by poorly designed error messages.
Submitted 9 September, 2011;
originally announced September 2011.
-
The Smallville Effect: Social Ties Make Mobile Networks More Secure Against the Node Capture Attack
Authors:
Mauro Conti,
Roberto Di Pietro,
Andrea Gabrielli,
Luigi V. Mancini,
Alessandro Mei
Abstract:
Mobile Ad Hoc networks, due to the unattended nature of the network itself and the dispersed location of nodes, are subject to several unique security issues. One of the most vexing security threats is node capture. A few solutions have already been proposed to address this problem; however, those solutions are either centralized or focused on theoretical mobility models alone. In the former case, the solution does not fit the distributed nature of the network well, while in the latter case, the quality of the solutions obtained for realistic mobility models differs severely from the results obtained for theoretical models. The rationale of this paper is inspired by the observation that re-encounters of mobile nodes do elicit a form of social ties. Leveraging these ties, it is possible to design efficient and distributed algorithms that, with a moderate degree of node cooperation, enforce the emergent property of node capture detection. In particular, in this paper we provide a proof of concept, proposing a set of algorithms that leverage, to different extents, node mobility and node cooperation (that is, identifying social ties) to thwart the node capture attack. We test these algorithms on a realistic mobility scenario. Extensive simulations show the quality of the proposed solutions and, more importantly, the viability of the proposed approach.
Submitted 11 December, 2009;
originally announced December 2009.