-
Robustness Analysis of Deep Learning Models for Population Synthesis
Authors:
Daniel Opoku Mensah,
Godwin Badu-Marfo,
Bilal Farooq
Abstract:
Deep generative models have become useful for synthetic data generation, particularly population synthesis. The models implicitly learn the probability distribution of a dataset and can draw samples from that distribution. Several models have been proposed, but their performance has only been tested on a single cross-sectional sample. Implementing population synthesis on single datasets is a drawback that calls for further study of the models' robustness across multiple datasets. While comparison with real data can increase the trust and interpretability of the models, techniques to evaluate the robustness of deep generative models for population synthesis remain underexplored. In this study, we present bootstrap confidence intervals for deep generative models, an approach that computes efficient confidence intervals for the mean prediction errors in order to evaluate the robustness of the models across multiple datasets. Specifically, we adopt the tabular-based Composite Travel Generative Adversarial Network (CTGAN) and Variational Autoencoder (VAE) to estimate the distribution of the population, generating agents with tabular attributes from several samples collected over time in the same study area. The models are implemented on multiple travel diaries of the Montreal Origin-Destination Survey of 2008, 2013, and 2018, and their predictive performance is compared under varying sample sizes from multiple surveys. Results show that the predictive errors of CTGAN have narrower confidence intervals than those of VAE, indicating its robustness to multiple datasets of varying sample sizes. In addition, the evaluation of model robustness against varying sample size shows only a minimal decrease in model performance as sample size decreases. This study directly supports agent-based modelling by enabling finer synthetic generation of populations in a reliable environment.
Submitted 23 November, 2022;
originally announced November 2022.
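The bootstrap confidence interval in the robustness-analysis abstract above can be illustrated with a minimal percentile-bootstrap sketch. It assumes one error value per evaluation sample (for example, one per survey subsample) is already available. The function name `bootstrap_mean_ci`, the 2,000 resamples and the 95% level are illustrative choices, not the paper's settings.

```python
import random
import statistics


def bootstrap_mean_ci(errors, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of `errors` (illustrative sketch).

    Returns (observed_mean, lower_bound, upper_bound).
    """
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(errors)
    boot_means = []
    for _ in range(n_boot):
        # Resample with replacement, keeping the original sample size.
        resample = [errors[rng.randrange(n)] for _ in range(n)]
        boot_means.append(statistics.fmean(resample))
    boot_means.sort()
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles of the bootstrap means.
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return statistics.fmean(errors), boot_means[lo_idx], boot_means[hi_idx]
```

The abstract treats a narrower interval for one model's errors than for another's as evidence of greater robustness. The error values passed in would come from your own evaluation; none are taken from the paper.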
-
eFedDNN: Ensemble based Federated Deep Neural Networks for Trajectory Mode Inference
Authors:
Daniel Opoku Mensah,
Godwin Badu-Marfo,
Ranwa Al Mallah,
Bilal Farooq
Abstract:
As the most significant data source in smart mobility systems, GPS trajectories can help identify users' travel modes. However, these GPS datasets may contain users' private information (e.g., home location), which discourages many users from sharing their data with a third party. Hence, identifying travel modes while protecting users' privacy is a significant issue. To address this challenge, we use federated learning (FL), a privacy-preserving machine learning technique that collaboratively trains a robust global model by accessing users' locally trained models but not their raw data. Specifically, we designed a novel ensemble-based Federated Deep Neural Network (eFedDNN). The ensemble method combines the outputs of the different models learned by the users via FL and achieves an accuracy that surpasses comparable models reported in the literature. Extensive experiments on a real-world open-access dataset from Montreal demonstrate that the proposed inference model can accurately identify users' travel modes without compromising privacy.
Submitted 11 May, 2022;
originally announced May 2022.
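The eFedDNN abstract says the ensemble "combines the outputs of the different models learned via FL" but does not give the combination rule. The sketch below uses soft voting (averaging class probabilities) as an assumed stand-in, not the paper's method. Each model is any callable returning class probabilities for one input. The name `soft_vote` is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical model type: maps one feature vector to a list of class probabilities.
Model = Callable[[Sequence[float]], List[float]]


def soft_vote(models: Sequence[Model], x: Sequence[float]) -> Tuple[int, List[float]]:
    """Average the class-probability outputs of several models and pick the top class."""
    outputs = [m(x) for m in models]
    n_classes = len(outputs[0])
    # Per-class mean probability across all ensemble members.
    avg = [sum(p[c] for p in outputs) / len(outputs) for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: avg[c])
    return best, avg
```

Soft voting keeps each model's confidence, unlike a plain majority vote. Whether eFedDNN weights members differently is not stated in the abstract.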
-
On the Initial Behavior Monitoring Issues in Federated Learning
Authors:
Ranwa Al Mallah,
Godwin Badu-Marfo,
Bilal Farooq
Abstract:
In Federated Learning (FL), a group of workers participates in building a global model under the coordination of one node, the chief. Regarding the cybersecurity of FL, some attacks aim to inject fabricated local model updates into the system. Some defenses are based on malicious worker detection and behavioral pattern analysis. In this context, without timely and dynamic monitoring methods, the chief cannot detect and remove malicious or unreliable workers from the system. Our work emphasizes the urgency of preparing the federated learning process for monitoring and, eventually, behavioral pattern analysis. We study the information available inside the learning process in the early stages of training, propose a monitoring process, and evaluate the required monitoring period. The aim is to determine when it is appropriate to start the detection algorithm in order to remove malicious or unreliable workers from the system and to optimise the deployment of the defense mechanism. We tested our strategy on a behavioral pattern analysis defense applied to the FL process of different benchmark systems for text and image classification. Our results show that the monitoring process lowers false positives and false negatives and consequently increases system efficiency by enabling the distributed learning system to achieve better performance in the early stages of training.
Submitted 28 November, 2021; v1 submitted 11 September, 2021;
originally announced September 2021.
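The idea of a monitoring period before detection starts can be sketched as a detector that records worker behaviour each round but makes no decisions until a warm-up window has passed. The detector here is a simple stand-in: distance of each update to the coordinate-wise median. It is not the behavioral pattern analysis defense evaluated in the paper. `WarmupMonitor`, `monitor_rounds` and `threshold` are hypothetical names and values.

```python
import statistics


def distance(a, b):
    """Euclidean distance between two equal-length update vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class WarmupMonitor:
    """Hypothetical monitor: observe for `monitor_rounds` rounds, then start flagging."""

    def __init__(self, monitor_rounds=5, threshold=3.0):
        self.monitor_rounds = monitor_rounds
        self.threshold = threshold
        self.history = {}  # worker id -> list of per-round deviations
        self.rounds_seen = 0

    def observe(self, updates):
        """`updates` maps worker id -> update vector; returns the set of flagged ids."""
        dim = len(next(iter(updates.values())))
        # Coordinate-wise median is a robust reference point for "typical" behaviour.
        median = [statistics.median(u[i] for u in updates.values()) for i in range(dim)]
        for wid, u in updates.items():
            self.history.setdefault(wid, []).append(distance(u, median))
        self.rounds_seen += 1
        if self.rounds_seen < self.monitor_rounds:
            return set()  # still inside the monitoring period: no decisions yet
        # Flag workers whose mean deviation is far above the median worker's.
        avg = {w: statistics.fmean(h) for w, h in self.history.items()}
        typical = statistics.median(avg.values())
        return {w for w, d in avg.items() if d > self.threshold * typical}
```

Lengthening `monitor_rounds` gives the detector more history per worker before it acts. This mirrors, only loosely, the abstract's question of when detection should begin.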
-
Cybersecurity Threats in Connected and Automated Vehicles based Federated Learning Systems
Authors:
Ranwa Al Mallah,
Godwin Badu-Marfo,
Bilal Farooq
Abstract:
Federated learning (FL) is a machine learning technique that aims to train an algorithm across decentralized entities that keep their local data private. Wireless mobile networks allow users to communicate with other fixed or mobile users. The road traffic network represents an infrastructure-based configuration of a wireless mobile network in which the Connected and Automated Vehicles (CAVs) are the communicating entities. Applying FL in a wireless mobile network setting gives rise to a new threat in the mobile environment that is very different from those of traditional fixed networks. The threat stems from the intrinsic characteristics of the wireless medium and of vehicular networks, such as high node mobility and rapidly changing topology. Most cyber defense techniques depend on highly reliable and connected networks. This paper explores falsified information attacks, which target the FL process that is ongoing at the roadside unit (RSU). We identified a number of attack strategies conducted by malicious CAVs to disrupt the training of the global model in vehicular networks. We show that the attacks were able to increase the convergence time and decrease the accuracy of the model. We demonstrate that our attacks bypass FL defense strategies in their primary form and highlight the need for novel poisoning-resilient defense mechanisms in the wireless mobile setting of future road networks.
Submitted 3 June, 2021; v1 submitted 25 February, 2021;
originally announced February 2021.
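To show why falsified updates at the RSU are harmful, here is a minimal sketch of FedAvg-style aggregation: a weighted mean of client updates, typically weighted by local sample count. A single manipulated update is enough to drag the global model. The attack shown, flipping and scaling an honest update, is an assumed illustration and not one of the paper's specific attack strategies. `fedavg` and `falsify` are hypothetical names.

```python
from typing import List, Sequence


def fedavg(updates: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of client update vectors (FedAvg-style aggregation)."""
    total = float(sum(weights))
    dim = len(updates[0])
    return [sum(w * u[i] for u, w in zip(updates, weights)) / total for i in range(dim)]


def falsify(update: Sequence[float], scale: float = -10.0) -> List[float]:
    """Hypothetical falsified update: an honest update flipped in sign and scaled."""
    return [scale * v for v in update]
```

With three equally weighted CAVs sending `[1.0, 2.0]`, the aggregate is `[1.0, 2.0]`. If one CAV sends `falsify([1.0, 2.0])` instead, both coordinates of the aggregate become negative, because a plain mean has no protection against outliers.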
-
A Differentially Private Multi-Output Deep Generative Networks Approach For Activity Diary Synthesis
Authors:
Godwin Badu-Marfo,
Bilal Farooq,
Zachary Patterson
Abstract:
In this work, we develop a privacy-by-design generative model for synthesizing the activity diaries of a travel population using state-of-the-art deep learning approaches. The proposed approach extends the literature on population synthesis by contributing novel deep learning methods to the development and application of synthetic travel data, while guaranteeing privacy protection for members of the sample population on which the synthetic populations are based. First, we show a complete de-generalization of activity diaries to simulate the socioeconomic features and longitudinal sequences of geographically and temporally explicit activities. Second, we introduce a differential privacy approach to control the level of resolution disclosing the uniqueness of survey participants. Finally, we experiment using Generative Adversarial Networks (GANs). We evaluate the statistical distributions and pairwise correlations, and measure the level of privacy guaranteed on simulated datasets for varying noise levels. The results show that the model successfully simulates activity diaries composed of multiple outputs, including structured socio-economic features and sequential tour activities, in a differentially private manner.
Submitted 28 December, 2020;
originally announced December 2020.
-
Composite Travel Generative Adversarial Networks for Tabular and Sequential Population Synthesis
Authors:
Godwin Badu-Marfo,
Bilal Farooq,
Zachary Patterson
Abstract:
Agent-based transportation modelling has become the standard for simulating travel behaviour, mobility choices and activity preferences using disaggregate travel demand data for entire populations, data that are not typically readily available. Various methods have been proposed to synthesize population data for this purpose. We present a Composite Travel Generative Adversarial Network (CTGAN), a novel deep generative model that estimates the underlying joint distribution of a population and is capable of reconstructing composite synthetic agents having tabular (e.g. age and sex) as well as sequential mobility data (e.g. trip trajectory and sequence). The CTGAN model is compared with other recently proposed methods such as the Variational Autoencoder (VAE) method, which has shown success in high-dimensional tabular population synthesis. We evaluate the performance of the synthesized outputs based on distribution similarity, multivariate correlations and spatio-temporal metrics. The results show consistent and accurate generation of synthetic populations and their tabular and spatially sequential attributes over varying spatial scales and dimensions.
Submitted 14 April, 2020;
originally announced April 2020.
-
Perturbation Methods for Protection of Sensitive Location Data: Smartphone Travel Survey Case Study
Authors:
Godwin Badu-Marfo,
Bilal Farooq,
Zachary Patterson
Abstract:
Smartphone-based travel data collection has become an important tool for the analysis of transportation systems. Sharing travel survey data has gained popularity in recent years as governments' "Open Data Initiatives" seek to allow the public to use these data and, ideally, to contribute findings and analysis to the public sphere. The public release of such precise information, particularly location data such as place of residence, creates a risk of privacy violation. At the same time, for such data to be useful in transportation applications and travel demand modeling, as much spatial resolution as possible is desirable. This paper evaluates geographic random perturbation methods (i.e. Geo-indistinguishability and the Donut geomask) for protecting the privacy of respondents whose residential locations may be published. We measure the performance of the location privacy methods, the preservation of utility, and the randomness in the distribution of perturbation distances under varying parameters. We find that both methods produce distributions of spatial perturbations that conform closely to common probability distributions and, as a result, that the original locations can be inferred with little information and a high degree of precision. We also find that while Achieved K-estimate anonymity increases linearly with desired anonymity for the Donut geomask, Geo-indistinguishability is highly dependent upon its privacy budget factor (epsilon) and is not very effective at assuring the desired Achieved K-estimate anonymity.
Submitted 12 August, 2019; v1 submitted 21 January, 2019;
originally announced January 2019.
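The two perturbation methods named in the smartphone travel survey abstract can be sketched as follows. The sketch assumes planar coordinates in metres. Geo-indistinguishability uses the planar Laplace mechanism: a uniform random direction plus a radius that follows a Gamma distribution with shape 2 and scale 1/epsilon. This is consistent with the abstract's finding that perturbation distances follow common probability distributions. For the Donut geomask, the distance is drawn uniformly between an inner and outer radius. That is an assumed variant; others sample uniformly over the annulus area. The function names are hypothetical.

```python
import math
import random


def planar_laplace(x, y, epsilon, rng):
    """Geo-indistinguishability via the planar Laplace mechanism.

    `epsilon` is the privacy parameter per metre (assumed units).
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)
    # The radial distance of the planar Laplace is Gamma(shape=2, scale=1/epsilon).
    r = rng.gammavariate(2.0, 1.0 / epsilon)
    return x + r * math.cos(theta), y + r * math.sin(theta)


def donut_geomask(x, y, r_min, r_max, rng):
    """Donut geomask: random direction, distance uniform in [r_min, r_max] (assumed variant)."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    r = rng.uniform(r_min, r_max)
    return x + r * math.cos(theta), y + r * math.sin(theta)
```

Because the planar Laplace radius has mean 2/epsilon, `epsilon = 0.01` per metre displaces points by about 200 m on average. This shows how strongly the outcome depends on the privacy budget.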
-
A Perspective on the Challenges and Opportunities for Privacy-Aware Big Transportation Data
Authors:
Godwin Badu-Marfo,
Bilal Farooq,
Zachary Patterson
Abstract:
In recent years, and especially since the development of the smartphone, enormous amounts of data relevant to transportation have become available. These data hold the potential to redefine how transportation systems are designed, planned and operated. While researchers in both academia and industry are making advances in using these data toward transportation system ends (e.g. information inference from collected data), little attention has been paid to four larger-scale challenges that will need to be overcome if the potential of Big Transportation Data is to be harnessed for transportation decision-making. This paper aims to raise awareness of these large-scale challenges and provides insight into how we believe they are likely to be met.
Submitted 23 November, 2018;
originally announced November 2018.