-
From 5G to 6G: A Survey on Security, Privacy, and Standardization Pathways
Authors:
Mengmeng Yang,
Youyang Qu,
Thilina Ranbaduge,
Chandra Thapa,
Nazatul Sultan,
Ming Ding,
Hajime Suzuki,
Wei Ni,
Sharif Abuadbba,
David Smith,
Paul Tyler,
Josef Pieprzyk,
Thierry Rakotoarivelo,
Xinlong Guan,
Sirine M'rabet
Abstract:
The vision for 6G aims to enhance network capabilities with faster data rates, near-zero latency, and higher capacity, supporting more connected devices and seamless experiences within an intelligent digital ecosystem where artificial intelligence (AI) plays a crucial role in network management and data analysis. This advancement seeks to enable immersive mixed-reality experiences, holographic com…
▽ More
The vision for 6G aims to enhance network capabilities with faster data rates, near-zero latency, and higher capacity, supporting more connected devices and seamless experiences within an intelligent digital ecosystem where artificial intelligence (AI) plays a crucial role in network management and data analysis. This advancement seeks to enable immersive mixed-reality experiences, holographic communications, and smart city infrastructures. However, the expansion of 6G raises critical security and privacy concerns, such as unauthorized access and data breaches. This is due to the increased integration of IoT devices, edge computing, and AI-driven analytics. This paper provides a comprehensive overview of 6G protocols, focusing on security and privacy, identifying risks, and presenting mitigation strategies. The survey examines current risk assessment frameworks and advocates for tailored 6G solutions. We further discuss industry visions, government projects, and standardization efforts to balance technological innovation with robust security and privacy measures.
△ Less
Submitted 3 October, 2024;
originally announced October 2024.
-
Integrating PETs into Software Applications: A Game-Based Learning Approach
Authors:
Maisha Boteju,
Thilina Ranbaduge,
Dinusha Vatsalan,
Nalin Arachchilage
Abstract:
The absence of data protection measures in software applications leads to data breaches, threatening end-user privacy and causing instabilities in organisations that developed those software. Privacy Enhancing Technologies (PETs) emerge as promising safeguards against data breaches. PETs minimise threats to personal data while enabling software to extract valuable insights from them. However, soft…
▽ More
The absence of data protection measures in software applications leads to data breaches, threatening end-user privacy and causing instabilities in organisations that developed those software. Privacy Enhancing Technologies (PETs) emerge as promising safeguards against data breaches. PETs minimise threats to personal data while enabling software to extract valuable insights from them. However, software developers often lack the adequate knowledge and awareness to develop PETs integrated software. This issue is exacerbated by insufficient PETs related learning approaches customised for software developers. Therefore, we propose "PETs-101", a novel game-based learning framework that motivates developers to integrate PETs into software. By doing so, it aims to improve developers' privacy-preserving software development behaviour rather than simply delivering the learning content on PETs. In future, the proposed framework will be empirically investigated and used as a foundation for developing an educational gaming intervention that trains developers to put PETs into practice.
△ Less
Submitted 1 October, 2024;
originally announced October 2024.
-
Privacy Technologies for Financial Intelligence
Authors:
Yang Li,
Thilina Ranbaduge,
Kee Siong Ng
Abstract:
Financial crimes like terrorism financing and money laundering can have real impacts on society, including the abuse and mismanagement of public funds, increase in societal problems such as drug trafficking and illicit gambling with attendant economic costs, and loss of innocent lives in the case of terrorism activities. Complex financial crimes can be hard to detect primarily because data related…
▽ More
Financial crimes like terrorism financing and money laundering can have real impacts on society, including the abuse and mismanagement of public funds, increase in societal problems such as drug trafficking and illicit gambling with attendant economic costs, and loss of innocent lives in the case of terrorism activities. Complex financial crimes can be hard to detect primarily because data related to different pieces of the overall puzzle is usually distributed across a network of financial institutions, regulators, and law-enforcement agencies and they cannot be easily shared due to privacy constraints. Recent advances in Privacy-Preserving Data Matching and Machine Learning provide an opportunity for regulators and the financial industry to come together to solve the risk-discovery problem with technology. This paper provides a survey of the financial intelligence landscape and where opportunities lie for privacy technologies to improve the state-of-the-art in financial-crime detection.
△ Less
Submitted 19 August, 2024;
originally announced August 2024.
-
Private Knowledge Sharing in Distributed Learning: A Survey
Authors:
Yasas Supeksala,
Dinh C. Nguyen,
Ming Ding,
Thilina Ranbaduge,
Calson Chua,
Jun Zhang,
Jun Li,
H. Vincent Poor
Abstract:
The rise of Artificial Intelligence (AI) has revolutionized numerous industries and transformed the way society operates. Its widespread use has led to the distribution of AI and its underlying data across many intelligent systems. In this light, it is crucial to utilize information in learning processes that are either distributed or owned by different entities. As a result, modern data-driven se…
▽ More
The rise of Artificial Intelligence (AI) has revolutionized numerous industries and transformed the way society operates. Its widespread use has led to the distribution of AI and its underlying data across many intelligent systems. In this light, it is crucial to utilize information in learning processes that are either distributed or owned by different entities. As a result, modern data-driven services have been developed to integrate distributed knowledge entities into their outcomes. In line with this goal, the latest AI models are frequently trained in a decentralized manner. Distributed learning involves multiple entities working together to make collective predictions and decisions. However, this collaboration can also bring about security vulnerabilities and challenges. This paper provides an in-depth survey on private knowledge sharing in distributed learning, examining various knowledge components utilized in leading distributed learning architectures. Our analysis sheds light on the most critical vulnerabilities that may arise when using these components in a distributed setting. We further identify and examine defensive strategies for preserving the privacy of these knowledge components and preventing malicious parties from manipulating or accessing the knowledge information. Finally, we highlight several key limitations of knowledge sharing in distributed learning and explore potential avenues for future research.
△ Less
Submitted 8 February, 2024;
originally announced February 2024.
-
Privacy risk in GeoData: A survey
Authors:
Mahrokh Abdollahi Lorestani,
Thilina Ranbaduge,
Thierry Rakotoarivelo
Abstract:
With the ubiquitous use of location-based services, large-scale individual-level location data has been widely collected through location-awareness devices. The widespread exposure of such location data poses significant privacy risks to users, as it can lead to re-identification, the inference of sensitive information, and even physical threats. In this survey, we analyse different geomasking tec…
▽ More
With the ubiquitous use of location-based services, large-scale individual-level location data has been widely collected through location-awareness devices. The widespread exposure of such location data poses significant privacy risks to users, as it can lead to re-identification, the inference of sensitive information, and even physical threats. In this survey, we analyse different geomasking techniques proposed to protect individuals' privacy in geodata. We propose a taxonomy to characterise these techniques across various dimensions. We then highlight the shortcomings of current techniques and discuss avenues for future research. Our proposed taxonomy serves as a practical resource for data custodians, offering them a means to navigate the extensive array of existing privacy mechanisms and to identify those that align most effectively with their specific requirements.
△ Less
Submitted 12 September, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
SoK: Demystifying Privacy Enhancing Technologies Through the Lens of Software Developers
Authors:
Maisha Boteju,
Thilina Ranbaduge,
Dinusha Vatsalan,
Nalin Asanka Gamagedara Arachchilage
Abstract:
In the absence of data protection measures, software applications lead to privacy breaches, posing threats to end-users and software organisations. Privacy Enhancing Technologies (PETs) are technical measures that protect personal data, thus minimising such privacy breaches. However, for software applications to deliver data protection using PETs, software developers should actively and correctly…
▽ More
In the absence of data protection measures, software applications lead to privacy breaches, posing threats to end-users and software organisations. Privacy Enhancing Technologies (PETs) are technical measures that protect personal data, thus minimising such privacy breaches. However, for software applications to deliver data protection using PETs, software developers should actively and correctly incorporate PETs into the software they develop. Therefore, to uncover ways to encourage and support developers to embed PETs into software, this Systematic Literature Review (SLR) analyses 39 empirical studies on developers' privacy practices. It reports the usage of six PETs in software application scenarios. Then, it discusses challenges developers face when integrating PETs into software, ranging from intrinsic challenges, such as the unawareness of PETs, to extrinsic challenges, such as the increased development cost. Next, the SLR presents the existing solutions to address these challenges, along with the limitations of the solutions. Further, it outlines future research avenues to better understand PETs from a developer perspective and minimise the challenges developers face when incorporating PETs into software.
△ Less
Submitted 30 December, 2023;
originally announced January 2024.
-
Differentially Private Vertical Federated Learning
Authors:
Thilina Ranbaduge,
Ming Ding
Abstract:
A successful machine learning (ML) algorithm often relies on a large amount of high-quality data to train well-performed models. Supervised learning approaches, such as deep learning techniques, generate high-quality ML functions for real-life applications, however with large costs and human efforts to label training data. Recent advancements in federated learning (FL) allow multiple data owners o…
▽ More
A successful machine learning (ML) algorithm often relies on a large amount of high-quality data to train well-performed models. Supervised learning approaches, such as deep learning techniques, generate high-quality ML functions for real-life applications, however with large costs and human efforts to label training data. Recent advancements in federated learning (FL) allow multiple data owners or organisations to collaboratively train a machine learning model without sharing raw data. In this light, vertical FL allows organisations to build a global model when the participating organisations have vertically partitioned data. Further, in the vertical FL setting the participating organisation generally requires fewer resources compared to sharing data directly, enabling lightweight and scalable distributed training solutions. However, privacy protection in vertical FL is challenging due to the communication of intermediate outputs and the gradients of model update. This invites adversary entities to infer other organisations underlying data. Thus, in this paper, we aim to explore how to protect the privacy of individual organisation data in a differential privacy (DP) setting. We run experiments with different real-world datasets and DP budgets. Our experimental results show that a trade-off point needs to be found to achieve a balance between the vertical FL performance and privacy protection in terms of the amount of perturbation noise.
△ Less
Submitted 12 November, 2022;
originally announced November 2022.
-
Privacy-preserving Deep Learning based Record Linkage
Authors:
Thilina Ranbaduge,
Dinusha Vatsalan,
Ming Ding
Abstract:
Deep learning-based linkage of records across different databases is becoming increasingly useful in data integration and mining applications to discover new insights from multiple sources of data. However, due to privacy and confidentiality concerns, organisations often are not willing or allowed to share their sensitive data with any external parties, thus making it challenging to build/train de…
▽ More
Deep learning-based linkage of records across different databases is becoming increasingly useful in data integration and mining applications to discover new insights from multiple sources of data. However, due to privacy and confidentiality concerns, organisations often are not willing or allowed to share their sensitive data with any external parties, thus making it challenging to build/train deep learning models for record linkage across different organizations' databases. To overcome this limitation, we propose the first deep learning-based multi-party privacy-preserving record linkage (PPRL) protocol that can be used to link sensitive databases held by multiple different organisations. In our approach, each database owner first trains a local deep learning model, which is then uploaded to a secure environment and securely aggregated to create a global model. The global model is then used by a linkage unit to distinguish unlabelled record pairs as matches and non-matches. We utilise differential privacy to achieve provable privacy protection against re-identification attacks. We evaluate the linkage quality and scalability of our approach using several large real-world databases, showing that it can achieve high linkage quality while providing sufficient privacy protection against existing attacks.
△ Less
Submitted 3 November, 2022;
originally announced November 2022.
-
Vertical Federated Learning: Challenges, Methodologies and Experiments
Authors:
Kang Wei,
Jun Li,
Chuan Ma,
Ming Ding,
Sha Wei,
Fan Wu,
Guihai Chen,
Thilina Ranbaduge
Abstract:
Recently, federated learning (FL) has emerged as a promising distributed machine learning (ML) technology, owing to the advancing computational and sensing capacities of end-user devices, however with the increasing concerns on users' privacy. As a special architecture in FL, vertical FL (VFL) is capable of constructing a hyper ML model by embracing sub-models from different clients. These sub-mod…
▽ More
Recently, federated learning (FL) has emerged as a promising distributed machine learning (ML) technology, owing to the advancing computational and sensing capacities of end-user devices, however with the increasing concerns on users' privacy. As a special architecture in FL, vertical FL (VFL) is capable of constructing a hyper ML model by embracing sub-models from different clients. These sub-models are trained locally by vertically partitioned data with distinct attributes. Therefore, the design of VFL is fundamentally different from that of conventional FL, raising new and unique research issues. In this paper, we aim to discuss key challenges in VFL with effective solutions, and conduct experiments on real-life datasets to shed light on these issues. Specifically, we first propose a general framework on VFL, and highlight the key differences between VFL and conventional FL. Then, we discuss research challenges rooted in VFL systems under four aspects, i.e., security and privacy risks, expensive computation and communication costs, possible structural damage caused by model splitting, and system heterogeneity. Afterwards, we develop solutions to addressing the aforementioned challenges, and conduct extensive experiments to showcase the effectiveness of our proposed solutions.
△ Less
Submitted 5 August, 2024; v1 submitted 9 February, 2022;
originally announced February 2022.
-
Private Graph Data Release: A Survey
Authors:
Yang Li,
Michael Purcell,
Thierry Rakotoarivelo,
David Smith,
Thilina Ranbaduge,
Kee Siong Ng
Abstract:
The application of graph analytics to various domains has yielded tremendous societal and economical benefits in recent years. However, the increasingly widespread adoption of graph analytics comes with a commensurate increase in the need to protect private information in graph data, especially in light of the many privacy breaches in real-world graph data that was supposed to preserve sensitive i…
▽ More
The application of graph analytics to various domains has yielded tremendous societal and economical benefits in recent years. However, the increasingly widespread adoption of graph analytics comes with a commensurate increase in the need to protect private information in graph data, especially in light of the many privacy breaches in real-world graph data that was supposed to preserve sensitive information. This paper provides a comprehensive survey of private graph data release algorithms that seek to achieve the fine balance between privacy and utility, with a specific focus on provably private mechanisms. Many of these mechanisms are natural extensions of the Differential Privacy framework to graph data, but we also investigate more general privacy formulations like Pufferfish Privacy that address some of the limitations of Differential Privacy. We also provide a wide-ranging survey of the applications of private graph data release mechanisms to social networks, finance, supply chain, and health care. This survey paper and the taxonomy it provides should benefit practitioners and researchers alike in the increasingly important area of private analytics and data release.
△ Less
Submitted 4 June, 2022; v1 submitted 9 July, 2021;
originally announced July 2021.
-
Large Scale Record Linkage in the Presence of Missing Data
Authors:
Thilina Ranbaduge,
Peter Christen,
Rainer Schnell
Abstract:
Record linkage is aimed at the accurate and efficient identification of records that represent the same entity within or across disparate databases. It is a fundamental task in data integration and increasingly required for accurate decision making in application domains ranging from health analytics to national security. Traditional record linkage techniques calculate string similarities between…
▽ More
Record linkage is aimed at the accurate and efficient identification of records that represent the same entity within or across disparate databases. It is a fundamental task in data integration and increasingly required for accurate decision making in application domains ranging from health analytics to national security. Traditional record linkage techniques calculate string similarities between quasi-identifying (QID) values, such as the names and addresses of people. Errors, variations, and missing QID values can however lead to low linkage quality because the similarities between records cannot be calculated accurately. To overcome this challenge, we propose a novel technique that can accurately link records even when QID values contain errors or variations, or are missing. We first generate attribute signatures (concatenated QID values) using an Apriori based selection of suitable QID attributes, and then relational signatures that encapsulate relationship information between records. Combined, these signatures can uniquely identify individual records and facilitate fast and high quality linking of very large databases through accurate similarity calculations between records. We evaluate the linkage quality and scalability of our approach using large real-world databases, showing that it can achieve high linkage quality even when the databases being linked contain substantial amounts of missing values and errors.
△ Less
Submitted 19 April, 2021;
originally announced April 2021.
-
Accurate and Efficient Suffix Tree Based Privacy-Preserving String Matching
Authors:
Sirintra Vaiwsri,
Thilina Ranbaduge,
Peter Christen,
Kee Siong Ng
Abstract:
The task of calculating similarities between strings held by different organizations without revealing these strings is an increasingly important problem in areas such as health informatics, national censuses, genomics, and fraud detection. Most existing privacy-preserving string comparison functions are either based on comparing sets of encoded character q-grams, allow only exact matching of encr…
▽ More
The task of calculating similarities between strings held by different organizations without revealing these strings is an increasingly important problem in areas such as health informatics, national censuses, genomics, and fraud detection. Most existing privacy-preserving string comparison functions are either based on comparing sets of encoded character q-grams, allow only exact matching of encrypted strings, or they are aimed at long genomic sequences that have a small alphabet. The set-based privacy-preserving similarity functions commonly used to compare name and address strings in the context of privacy-preserving record linkage do not take the positions of sub-strings into account. As a result, two very different strings can potentially be considered as an exact match leading to wrongly linked records. Existing set-based techniques also cannot identify the length of the longest common sub-string across two strings. In this paper we propose a novel approach for accurate and efficient privacy-preserving string matching based on suffix trees that are encoded using chained hashing. We incorporate a hashing based encoding technique upon the encoded suffixes to improve privacy against frequency attacks such as those exploiting Benford's law. Our approach allows various operations to be performed without the strings to be compared being revealed: the length of the longest common sub-string, do two strings have the same beginning, middle or end, and the longest common sub-string similarity between two strings. These functions allow a more accurate comparison of, for example, bank account, credit card, or telephone numbers, which cannot be compared appropriately with existing privacy-preserving string matching techniques. Our evaluation on several data sets with different types of strings validates the privacy and accuracy of our proposed approach.
△ Less
Submitted 7 April, 2021;
originally announced April 2021.
-
Temporal graph-based clustering for historical record linkage
Authors:
Charini Nanayakkara,
Peter Christen,
Thilina Ranbaduge
Abstract:
Research in the social sciences is increasingly based on large and complex data collections, where individual data sets from different domains are linked and integrated to allow advanced analytics. A popular type of data used in such a context are historical censuses, as well as birth, death, and marriage certificates. Individually, such data sets however limit the types of studies that can be con…
▽ More
Research in the social sciences is increasingly based on large and complex data collections, where individual data sets from different domains are linked and integrated to allow advanced analytics. A popular type of data used in such a context are historical censuses, as well as birth, death, and marriage certificates. Individually, such data sets however limit the types of studies that can be conducted. Specifically, it is impossible to track individuals, families, or households over time. Once such data sets are linked and family trees spanning several decades are available it is possible to, for example, investigate how education, health, mobility, employment, and social status influence each other and the lives of people over two or even more generations. A major challenge is however the accurate linkage of historical data sets which is due to data quality and commonly also the lack of ground truth data being available. Unsupervised techniques need to be employed, which can be based on similarity graphs generated by comparing individual records. In this paper we present initial results from clustering birth records from Scotland where we aim to identify all births of the same mother and group siblings into clusters. We extend an existing clustering technique for record linkage by incorporating temporal constraints that must hold between births by the same mother, and propose a novel greedy temporal clustering technique. Experimental results show improvements over non-temporary approaches, however further work is needed to obtain links of high quality.
△ Less
Submitted 6 July, 2018;
originally announced July 2018.