-
The Closer You Look, The More You Learn: A Grey-box Approach to Protocol State Machine Learning
Authors:
Chris McMahon Stone,
Sam L. Thomas,
Mathy Vanhoef,
James Henderson,
Nicolas Bailluet,
Tom Chothia
Abstract:
In this paper, we propose a new approach to infer state machine models from protocol implementations. Our method, STATEINSPECTOR, learns protocol states by using novel program analyses to combine observations of run-time memory and I/O. It requires no access to source code and only lightweight execution monitoring of the implementation under test. We demonstrate and evaluate STATEINSPECTOR's effectiveness on numerous TLS and WPA/2 implementations. In the process, we show STATEINSPECTOR enables deeper state discovery, increased learning efficiency, and more insightful post-mortem analyses than existing approaches. Further to improved learning, our method led us to discover several concerning deviations from the standards and a high impact vulnerability in a prominent Wi-Fi implementation.
Submitted 7 June, 2021; v1 submitted 4 June, 2021;
originally announced June 2021.
-
Geographical veracity of indicators derived from mobile phone data
Authors:
Maarten Vanhoof,
Thomas Ploetz,
Zbigniew Smoreda
Abstract:
In this contribution we summarize insights on the geographical veracity of using mobile phone data to create (statistical) indicators. We focus on problems that persist with spatial allocation, spatial delineation and spatial aggregation of information obtained from mobile phone data. For each of the cases, we offer insights from our work on a French CDR dataset and propose both short- and long-term solutions. As such, we aim to offer a list of challenges and a roadmap for future work on the topic.
Submitted 26 September, 2018;
originally announced September 2018.
-
Performance and sensitivities of home detection from mobile phone data
Authors:
Maarten Vanhoof,
Clement Lee,
Zbigniew Smoreda
Abstract:
Large-scale location-based traces, such as mobile phone data, have been identified as a promising data source to complement or even enrich official statistics. In many cases, a prerequisite step to deploy the massively gathered data is the detection of home locations for individual users. The problem is that little research exists on the validation (comparison with ground truth datasets) or the uncertainty estimation of home detection methods, neither at the individual user level nor at the nation-wide level. In this paper, we present an extensive empirical analysis of home detection methods when performed on a nation-wide mobile phone dataset from France. We analyze the validity of 9 different Home Detection Algorithms (HDAs), and we assess different sources of uncertainty. Based on 225 different set-ups for the home detection of around 18 million users, we discuss different measures for validation and investigate sensitivity to user choices such as HDA parameter choice and observation period restriction. Our findings show that nation-wide performance of home detection is moderate at best, with correlations to ground truth peaking at only 0.60. Additionally, we show that time and duration of observation have a clear effect on performance, and that the effect of HDA criteria and parameter choice is rather small compared to other uncertainties. Our findings and discussion offer welcome insights to other practitioners who want to apply home detection on similar datasets, or who are in need of an assessment of the challenges and uncertainties related to mobilizing mobile phone data for official statistics.
Submitted 26 September, 2018;
originally announced September 2018.
-
Assessing the quality of home detection from mobile phone data for official statistics
Authors:
Maarten Vanhoof,
Fernando Reis,
Thomas Ploetz,
Zbigniew Smoreda
Abstract:
Mobile phone data are an interesting new data source for official statistics. However, multiple problems and uncertainties need to be solved before these data can inform, support or even become an integral part of statistical production processes. In this paper, we focus on arguably the most important problem hindering the application of mobile phone data in official statistics: detecting home locations. We argue that current efforts to detect home locations suffer from a blind deployment of criteria to define a place of residence and from limited validation possibilities. We support our argument by analysing the performance of five home detection algorithms (HDAs) that have been applied to a large French Call Detail Record (CDR) dataset (~18 million users, 5 months). Our results show that criteria choice in HDAs influences the detection of home locations for up to about 40% of users, that HDAs perform poorly when compared with a validation dataset (the 35°-gap), and that their performance is sensitive to the time period and the duration of observation. Based on our findings and experiences, we offer several recommendations for official statistics. If adopted, our recommendations would help ensure a more reliable use of mobile phone data vis-à-vis official statistics.
Submitted 20 September, 2018;
originally announced September 2018.
-
Detecting home locations from CDR data: introducing spatial uncertainty to the state-of-the-art
Authors:
Maarten Vanhoof,
Fernando Reis,
Zbigniew Smoreda,
Thomas Ploetz
Abstract:
Non-continuous location traces inferred from Call Detail Records (CDR) at population scale are increasingly becoming available for research and show great potential for automated detection of meaningful places. Yet, a majority of Home Detection Algorithms (HDAs) suffer from "blind" deployment of criteria to define homes and from limited possibilities for validation. In this paper, we investigate the performance and capabilities of five popular criteria for home detection based on a very large mobile phone dataset from France (~18 million users, 6 months). Furthermore, we construct a data-driven framework to assess the spatial uncertainty related to the application of HDAs. Our findings appropriate spatial uncertainty in HDA and, by extension, for detection of meaningful places. We show how spatial uncertainties at the individual level can be assessed in the absence of ground truth annotation, how they relate to traditional, high-level validation practices and how they can be used to improve results for, e.g., nation-wide population estimation.
Submitted 20 August, 2018;
originally announced August 2018.
-
An analytical framework to nowcast well-being using mobile phone data
Authors:
Luca Pappalardo,
Maarten Vanhoof,
Lorenzo Gabrielli,
Zbigniew Smoreda,
Dino Pedreschi,
Fosca Giannotti
Abstract:
An intriguing open question is whether measurements made on Big Data recording human activities can yield high-fidelity proxies of socio-economic development and well-being. Can we monitor and predict the socio-economic development of a territory just by observing the behavior of its inhabitants through the lens of Big Data? In this paper, we design a data-driven analytical framework that uses mobility measures and social measures extracted from mobile phone data to estimate indicators for socio-economic development and well-being. We discover that the diversity of mobility, defined in terms of entropy of the individual users' trajectories, exhibits (i) significant correlation with two different socio-economic indicators and (ii) the highest importance in predictive models built to predict the socio-economic indicators. Our analytical framework opens an interesting perspective to study human behavior through the lens of Big Data by means of new statistical indicators that quantify and possibly "nowcast" the well-being and the socio-economic development of a territory.
Submitted 16 March, 2016;
originally announced June 2016.