-
INTELLECT: Adapting Cyber Threat Detection to Heterogeneous Computing Environments
Authors:
Simone Magnani,
Liubov Nedoshivina,
Roberto Doriguzzi-Corin,
Stefano Braghin,
Domenico Siracusa
Abstract:
The widespread adoption of cloud computing, edge, and IoT has increased the attack surface for cyber threats. This is due to the large-scale deployment of often unsecured, heterogeneous devices with varying hardware and software configurations. The diversity of these devices attracts a wide array of potential attack methods, making it challenging for individual organizations to have comprehensive…
▽ More
The widespread adoption of cloud computing, edge, and IoT has increased the attack surface for cyber threats. This is due to the large-scale deployment of often unsecured, heterogeneous devices with varying hardware and software configurations. The diversity of these devices attracts a wide array of potential attack methods, making it challenging for individual organizations to have comprehensive knowledge of all possible threats. In this context, powerful anomaly detection models can be developed by combining data from different parties using Federated Learning. FL enables the collaborative development of ML-based IDSs without requiring the parties to disclose sensitive training data, such as network traffic or sensor readings. However, deploying the resulting models can be challenging, as they may require more computational resources than those available on target devices with limited capacity or already allocated for other operations. Training device-specific models is not feasible for an organization because a significant portion of the training data is private to other participants in the FL process. To address these challenges, this paper introduces INTELLECT, a novel solution that integrates feature selection, model pruning, and fine-tuning techniques into a cohesive pipeline for the dynamic adaptation of pre-trained ML models and configurations for IDSs. Through empirical evaluation, we analyze the benefits of INTELLECT's approach in tailoring ML models to the specific resource constraints of an organization's devices and measure variations in traffic classification accuracy resulting from feature selection, pruning, and fine-tuning operations. Additionally, we demonstrate the advantages of incorporating knowledge distillation techniques while fine-tuning, enabling the ML model to consistently adapt to local network patterns while preserving historical knowledge.
△ Less
Submitted 21 July, 2024; v1 submitted 17 July, 2024;
originally announced July 2024.
-
Robust Learning Protocol for Federated Tumor Segmentation Challenge
Authors:
Ambrish Rawat,
Giulio Zizzo,
Swanand Kadhe,
Jonathan P. Epperlein,
Stefano Braghin
Abstract:
In this work, we devise robust and efficient learning protocols for orchestrating a Federated Learning (FL) process for the Federated Tumor Segmentation Challenge (FeTS 2022). Enabling FL for FeTS setup is challenging mainly due to data heterogeneity among collaborators and communication cost of training. To tackle these challenges, we propose Robust Learning Protocol (RoLePRO) which is a combinat…
▽ More
In this work, we devise robust and efficient learning protocols for orchestrating a Federated Learning (FL) process for the Federated Tumor Segmentation Challenge (FeTS 2022). Enabling FL for FeTS setup is challenging mainly due to data heterogeneity among collaborators and communication cost of training. To tackle these challenges, we propose Robust Learning Protocol (RoLePRO) which is a combination of server-side adaptive optimisation (e.g., server-side Adam) and judicious parameter (weights) aggregation schemes (e.g., adaptive weighted aggregation). RoLePRO takes a two-phase approach, where the first phase consists of vanilla Federated Averaging, while the second phase consists of a judicious aggregation scheme that uses a sophisticated reweighting, all in the presence of an adaptive optimisation algorithm at the server. We draw insights from extensive experimentation to tune learning rates for the two phases.
△ Less
Submitted 16 December, 2022;
originally announced December 2022.
-
DLPFS: The Data Leakage Prevention FileSystem
Authors:
Stefano Braghin,
Marco Simioni,
Mathieu Sinn
Abstract:
Shared folders are still a common practice for granting third parties access to data files, regardless of the advances in data sharing technologies. Services like Google Drive, Dropbox, Box, and others, provide infrastructures and interfaces to manage file sharing. The human factor is the weakest link and data leaks caused by human error are regrettable common news. This takes place as both mishan…
▽ More
Shared folders are still a common practice for granting third parties access to data files, regardless of the advances in data sharing technologies. Services like Google Drive, Dropbox, Box, and others, provide infrastructures and interfaces to manage file sharing. The human factor is the weakest link and data leaks caused by human error are regrettable common news. This takes place as both mishandled data, for example stored to the wrong directory, or via misconfigured or failing applications dumping data incorrectly. We present Data Leakage Prevention FileSystem (DLPFS), a first attempt to systematically protect against data leakage caused by misconfigured application or human error. This filesystem interface provides a privacy protection layer on top of the POSIX filesystem interface, allowing for seamless integration with existing infrastructures and applications, simply augmenting existing security controls. At the same time, DLPFS allows data administrators to protect files shared within an organisation by preventing unauthorised parties to access potentially sensitive content. DLPFS achieves this by transparently integrating with existing access control mechanisms. We empirically evaluate the impact of DLPFS on system's performances to demonstrate the feasibility of the proposed solution.
△ Less
Submitted 31 August, 2021;
originally announced August 2021.
-
Secure k-Anonymization over Encrypted Databases
Authors:
Manish Kesarwani,
Akshar Kaul,
Stefano Braghin,
Naoise Holohan,
Spiros Antonatos
Abstract:
Data protection algorithms are becoming increasingly important to support modern business needs for facilitating data sharing and data monetization. Anonymization is an important step before data sharing. Several organizations leverage on third parties for storing and managing data. However, third parties are often not trusted to store plaintext personal and sensitive data; data encryption is wide…
▽ More
Data protection algorithms are becoming increasingly important to support modern business needs for facilitating data sharing and data monetization. Anonymization is an important step before data sharing. Several organizations leverage on third parties for storing and managing data. However, third parties are often not trusted to store plaintext personal and sensitive data; data encryption is widely adopted to protect against intentional and unintentional attempts to read personal/sensitive data. Traditional encryption schemes do not support operations over the ciphertexts and thus anonymizing encrypted datasets is not feasible with current approaches. This paper explores the feasibility and depth of implementing a privacy-preserving data publishing workflow over encrypted datasets leveraging on homomorphic encryption. We demonstrate how we can achieve uniqueness discovery, data masking, differential privacy and k-anonymity over encrypted data requiring zero knowledge about the original values. We prove that the security protocols followed by our approach provide strong guarantees against inference attacks. Finally, we experimentally demonstrate the performance of our data publishing workflow components.
△ Less
Submitted 10 August, 2021;
originally announced August 2021.
-
Secure Random Sampling in Differential Privacy
Authors:
Naoise Holohan,
Stefano Braghin
Abstract:
Differential privacy is among the most prominent techniques for preserving privacy of sensitive data, oweing to its robust mathematical guarantees and general applicability to a vast array of computations on data, including statistical analysis and machine learning. Previous work demonstrated that concrete implementations of differential privacy mechanisms are vulnerable to statistical attacks. Th…
▽ More
Differential privacy is among the most prominent techniques for preserving privacy of sensitive data, oweing to its robust mathematical guarantees and general applicability to a vast array of computations on data, including statistical analysis and machine learning. Previous work demonstrated that concrete implementations of differential privacy mechanisms are vulnerable to statistical attacks. This vulnerability is caused by the approximation of real values to floating point numbers. This paper presents a practical solution to the finite-precision floating point vulnerability, where the inverse transform sampling of the Laplace distribution can itself be inverted, thus enabling an attack where the original value can be retrieved with non-negligible advantage. The proposed solution has the advantages of being generalisable to any infinitely divisible probability distribution, and of simple implementation in modern architectures. Finally, the solution has been designed to make side channel attack infeasible, because of inherently exponential, in the size of the domain, brute force attacks.
△ Less
Submitted 24 November, 2021; v1 submitted 21 July, 2021;
originally announced July 2021.
-
Diffprivlib: The IBM Differential Privacy Library
Authors:
Naoise Holohan,
Stefano Braghin,
Pól Mac Aonghusa,
Killian Levacher
Abstract:
Since its conception in 2006, differential privacy has emerged as the de-facto standard in data privacy, owing to its robust mathematical guarantees, generalised applicability and rich body of literature. Over the years, researchers have studied differential privacy and its applicability to an ever-widening field of topics. Mechanisms have been created to optimise the process of achieving differen…
▽ More
Since its conception in 2006, differential privacy has emerged as the de-facto standard in data privacy, owing to its robust mathematical guarantees, generalised applicability and rich body of literature. Over the years, researchers have studied differential privacy and its applicability to an ever-widening field of topics. Mechanisms have been created to optimise the process of achieving differential privacy, for various data types and scenarios. Until this work however, all previous work on differential privacy has been conducted on a ad-hoc basis, without a single, unifying codebase to implement results.
In this work, we present the IBM Differential Privacy Library, a general purpose, open source library for investigating, experimenting and developing differential privacy applications in the Python programming language. The library includes a host of mechanisms, the building blocks of differential privacy, alongside a number of applications to machine learning and other data analytics tasks. Simplicity and accessibility has been prioritised in developing the library, making it suitable to a wide audience of users, from those using the library for their first investigations in data privacy, to the privacy experts looking to contribute their own models and mechanisms for others to use.
△ Less
Submitted 4 July, 2019;
originally announced July 2019.
-
AnonTokens: tracing re-identification attacks through decoy records
Authors:
Spiros Antonatos,
Stefano Braghin,
Naoise Holohan,
Pol MacAonghusa
Abstract:
Privacy is of the utmost concern when it comes to releasing data to third parties. Data owners rely on anonymization approaches to safeguard the released datasets against re-identification attacks. However, even with strict anonymization in place, re-identification attacks are still a possibility and in many cases a reality. Prior art has focused on providing better anonymization algorithms with m…
▽ More
Privacy is of the utmost concern when it comes to releasing data to third parties. Data owners rely on anonymization approaches to safeguard the released datasets against re-identification attacks. However, even with strict anonymization in place, re-identification attacks are still a possibility and in many cases a reality. Prior art has focused on providing better anonymization algorithms with minimal loss of information and how to prevent data disclosure attacks. Our approach tries to tackle the issue of tracing re-identification attacks based on the concept of honeytokens, decoy or "bait" records with the goal to lure malicious users. While the concept of honeytokens has been widely used in the security domain, this is the first approach to apply the concept on the data privacy domain. Records with high re-identification risk, called AnonTokens, are inserted into anonymized datasets. This work demonstrates the feasibility, detectability and usability of AnonTokens and provides promising results for data owners who want to apply our approach to real use cases. We evaluated our concept with real large-scale population datasets. The results show that the introduction of decoy tokens is feasible without significant impact on the released dataset.
△ Less
Submitted 24 June, 2019;
originally announced June 2019.
-
The Bounded Laplace Mechanism in Differential Privacy
Authors:
Naoise Holohan,
Spiros Antonatos,
Stefano Braghin,
Pól Mac Aonghusa
Abstract:
The Laplace mechanism is the workhorse of differential privacy, applied to many instances where numerical data is processed. However, the Laplace mechanism can return semantically impossible values, such as negative counts, due to its infinite support. There are two popular solutions to this: (i) bounding/capping the output values and (ii) bounding the mechanism support. In this paper, we show tha…
▽ More
The Laplace mechanism is the workhorse of differential privacy, applied to many instances where numerical data is processed. However, the Laplace mechanism can return semantically impossible values, such as negative counts, due to its infinite support. There are two popular solutions to this: (i) bounding/capping the output values and (ii) bounding the mechanism support. In this paper, we show that bounding the mechanism support, while using the parameters of the pure Laplace mechanism, does not typically preserve differential privacy. We also present a robust method to compute the optimal mechanism parameters to achieve differential privacy in such a setting.
△ Less
Submitted 30 August, 2018;
originally announced August 2018.
-
($k$,$ε$)-Anonymity: $k$-Anonymity with $ε$-Differential Privacy
Authors:
Naoise Holohan,
Spiros Antonatos,
Stefano Braghin,
Pól Mac Aonghusa
Abstract:
The explosion in volume and variety of data offers enormous potential for research and commercial use. Increased availability of personal data is of particular interest in enabling highly customised services tuned to individual needs. Preserving the privacy of individuals against reidentification attacks in this fast-moving ecosystem poses significant challenges for a one-size fits all approach to…
▽ More
The explosion in volume and variety of data offers enormous potential for research and commercial use. Increased availability of personal data is of particular interest in enabling highly customised services tuned to individual needs. Preserving the privacy of individuals against reidentification attacks in this fast-moving ecosystem poses significant challenges for a one-size fits all approach to anonymisation.
In this paper we present ($k$,$ε$)-anonymisation, an approach that combines the $k$-anonymisation and $ε$-differential privacy models into a single coherent framework, providing privacy guarantees at least as strong as those offered by the individual models. Linking risks of less than 5\% are observed in experimental results, even with modest values of $k$ and $ε$.
Our approach is shown to address well-known limitations of $k$-anonymity and $ε$-differential privacy and is validated in an extensive experimental campaign using openly available datasets.
△ Less
Submitted 4 October, 2017;
originally announced October 2017.
-
Predicting User Engagement in Twitter with Collaborative Ranking
Authors:
Ernesto Diaz-Aviles,
Hoang Thanh Lam,
Fabio Pinelli,
Stefano Braghin,
Yiannis Gkoufas,
Michele Berlingerio,
Francesco Calabrese
Abstract:
Collaborative Filtering (CF) is a core component of popular web-based services such as Amazon, YouTube, Netflix, and Twitter. Most applications use CF to recommend a small set of items to the user. For instance, YouTube presents to a user a list of top-n videos she would likely watch next based on her rating and viewing history. Current methods of CF evaluation have been focused on assessing the q…
▽ More
Collaborative Filtering (CF) is a core component of popular web-based services such as Amazon, YouTube, Netflix, and Twitter. Most applications use CF to recommend a small set of items to the user. For instance, YouTube presents to a user a list of top-n videos she would likely watch next based on her rating and viewing history. Current methods of CF evaluation have been focused on assessing the quality of a predicted rating or the ranking performance for top-n recommended items. However, restricting the recommender system evaluation to these two aspects is rather limiting and neglects other dimensions that could better characterize a well-perceived recommendation. In this paper, instead of optimizing rating or top-n recommendation, we focus on the task of predicting which items generate the highest user engagement. In particular, we use Twitter as our testbed and cast the problem as a Collaborative Ranking task where the rich features extracted from the metadata of the tweets help to complement the transaction information limited to user ids, item ids, ratings and timestamps. We learn a scoring function that directly optimizes the user engagement in terms of nDCG@10 on the predicted ranking. Experiments conducted on an extended version of the MovieTweetings dataset, released as part of the RecSys Challenge 2014, show the effectiveness of our approach.
△ Less
Submitted 26 December, 2014;
originally announced December 2014.
-
Answering queries using pairings
Authors:
Alberto Trombetta,
Giuseppe Persiano,
Stefano Braghin
Abstract:
Outsourcing data in the cloud has become nowadays very common. Since -- generally speaking -- cloud data storage and management providers cannot be fully trusted, mechanisms providing the confidentiality of the stored data are necessary. A possible solution is to encrypt all the data, but -- of course -- this poses serious problems about the effective usefulness of the stored data. In this work, w…
▽ More
Outsourcing data in the cloud has become nowadays very common. Since -- generally speaking -- cloud data storage and management providers cannot be fully trusted, mechanisms providing the confidentiality of the stored data are necessary. A possible solution is to encrypt all the data, but -- of course -- this poses serious problems about the effective usefulness of the stored data. In this work, we propose to apply a well-known attribute-based cryptographic scheme to cope with the problem of querying encrypted data. We have implemented the proposed scheme with a real-world, off-the-shelf RDBMS and we provide several experimental results showing the feasibility of our approach.
△ Less
Submitted 11 March, 2014;
originally announced March 2014.
-
Secure and Policy-Private Resource Sharing in an Online Social Network
Authors:
Stefano Braghin,
Vincenzo Iovino,
Giuseppe Persiano,
Alberto Trombetta
Abstract:
Providing functionalities that allow online social network users to manage in a secure and private way the publication of their information and/or resources is a relevant and far from trivial topic that has been under scrutiny from various research communities. In this work, we provide a framework that allows users to define highly expressive access policies to their resources in a way that the en…
▽ More
Providing functionalities that allow online social network users to manage in a secure and private way the publication of their information and/or resources is a relevant and far from trivial topic that has been under scrutiny from various research communities. In this work, we provide a framework that allows users to define highly expressive access policies to their resources in a way that the enforcement does not require the intervention of a (trusted or not) third party. This is made possible by the deployment of a newly defined cryptographic primitives that provides - among other things - efficient access revocation and access policy privacy. Finally, we provide an implementation of our framework as a Facebook application, proving the feasibility of our approach.
△ Less
Submitted 10 July, 2013;
originally announced July 2013.
-
The Zen of Multidisciplinary Team Recommendation
Authors:
Anwitaman Datta,
Stefano Braghin,
Jackson Tan Teck Yong
Abstract:
In order to accomplish complex tasks, it is often necessary to compose a team consisting of experts with diverse competencies. However, for proper functioning, it is also preferable that a team be socially cohesive. A team recommendation system, which facilitates the search for potential team members can be of great help both for (i) individuals who need to seek out collaborators and (ii) managers…
▽ More
In order to accomplish complex tasks, it is often necessary to compose a team consisting of experts with diverse competencies. However, for proper functioning, it is also preferable that a team be socially cohesive. A team recommendation system, which facilitates the search for potential team members can be of great help both for (i) individuals who need to seek out collaborators and (ii) managers who need to build a team for some specific tasks.
A decision support system which readily helps summarize such metrics, and possibly rank the teams in a personalized manner according to the end users' preferences, can be a great tool to navigate what would otherwise be an information avalanche.
In this work we present a general framework of how to compose such subsystems together to build a composite team recommendation system, and instantiate it for a case study of academic teams.
△ Less
Submitted 4 March, 2013;
originally announced March 2013.
-
PriSM: A Private Social Mesh for Leveraging Social Networking at Workplace
Authors:
Stefano Braghin,
Jackson Tan,
Rajesh Sharma,
Anwitaman Datta
Abstract:
In this work we describe the PriSM framework for decentralized deployment of a federation of autonomous social networks (ASN). The individual ASNs are centrally managed by organizations according to their institutional needs, while cross-ASN interactions are facilitated subject to security and confidentiality requirements specified by administrators and users of the ASNs. Such decentralized deploy…
▽ More
In this work we describe the PriSM framework for decentralized deployment of a federation of autonomous social networks (ASN). The individual ASNs are centrally managed by organizations according to their institutional needs, while cross-ASN interactions are facilitated subject to security and confidentiality requirements specified by administrators and users of the ASNs. Such decentralized deployment, possibly either on private or public clouds, provides control and ownership of information/flow to individual organizations. Lack of such complete control (if third party online social networking services were to be used) has so far been a great barrier in taking full advantage of the novel communication mechanisms at workplace that have however become commonplace for personal usage with the advent of Web 2.0 platforms and online social networks. PriSM provides a practical solution for organizations to harness the advantages of online social networking both in intra/inter-organizational settings without sacrificing autonomy, security and confidentiality needs.
△ Less
Submitted 5 June, 2013; v1 submitted 7 June, 2012;
originally announced June 2012.