-
Data Innovation in Demography, Migration and Human Mobility
Authors:
Claudio Bosco,
Sara Grubanov-Boskovic,
Stefano Iacus,
Umberto Minora,
Francesco Sermi,
Spyridon Spyratos
Abstract:
With the consolidation of the culture of evidence-based policymaking, the availability of data has become central to policymakers. Nowadays, innovative data sources offer an opportunity to describe demographic, mobility, and migratory phenomena more accurately by making available large volumes of real-time and spatially detailed data. At the same time, however, data innovation has led to new challenges (ethics, privacy, data governance models, data quality) for citizens, statistical offices, policymakers and the private sector. Focusing on the fields of demography, mobility, and migration studies, the aim of this report is to assess the current state of data innovation in the scientific literature as well as to identify areas in which data innovation has the most concrete potential for policymaking. To this end, this study reviewed more than 300 articles and scientific reports, as well as numerous tools, that employed non-traditional data sources to measure vital population events (mortality, fertility), migration and human mobility, and population change and distribution. The specific findings of our report form the basis of a discussion on a) how innovative data are used compared to traditional data sources; b) domains in which innovative data have the greatest potential to contribute to policymaking; c) the prospects for innovative data to contribute systematically to official statistics and policymaking.
Submitted 5 September, 2022;
originally announced September 2022.
-
Migration patterns, friendship networks, and the diaspora: the potential of Facebook Social Connectedness Index to anticipate displacement patterns induced by Russia invasion of Ukraine in the European Union
Authors:
Umberto Minora,
Martina Belmonte,
Claudio Bosco,
Drew Johnston,
Eugenia Giraudy,
Stefano Iacus,
Francesco Sermi
Abstract:
The conflict in Ukraine is causing large-scale displacement in Europe and worldwide. Based on United Nations High Commissioner for Refugees (UNHCR) estimates, more than 7 million people had fled the country as of 5 September 2022. In this context, it is extremely important to anticipate where these people are moving so that authorities, from the national to the local level, can better manage challenges related to their reception and integration. This work shows how innovative data from social media can provide useful insights on conflict-induced migration flows. In particular, we explore the potential of Facebook's Social Connectedness Index (SCI) for predicting migration flows in the context of the war in Ukraine, building on previous research findings that the presence of a diaspora network is one of the major migration drivers. To do so, we first evaluate the relationship between the Ukrainian diaspora and the number of refugees from Ukraine registered for Temporary Protection or similar national schemes as a proxy of migratory flows into the EU. We find a very strong correlation between the two (Pearson's r=0.94, p<0.0001), which indicates that the diaspora is attracting the people fleeing the war, who tend to join their compatriots, in particular in the countries where Ukrainian immigration was a more recent phenomenon. Second, we compare Facebook's SCI with available official data on diaspora at regional level in Europe. Our results suggest that the index, along with other readily available covariates, is a strong predictor of the Ukrainian diaspora at regional scale. Finally, we discuss the potential of Facebook's SCI to provide timely and spatially detailed information on human diaspora for those countries where this information might be missing or outdated, and to complement official statistics for fast policy response during conflicts.
Submitted 15 December, 2022; v1 submitted 5 September, 2022;
originally announced September 2022.
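The abstract above reports a Pearson correlation between diaspora size and registrations for Temporary Protection. As a minimal sketch of how such a coefficient is computed, the Python below implements the standard Pearson formula. The function name `pearson_r` and the two short lists are hypothetical placeholders chosen for illustration; they are not the paper's data, and the paper's own code is not shown in the listing.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each sample.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical values per country (illustrative only, NOT from the paper):
# diaspora size and registered refugees, in thousands.
diaspora = [10.0, 25.0, 40.0, 80.0, 120.0]
registrations = [15.0, 30.0, 70.0, 110.0, 200.0]

r = pearson_r(diaspora, registrations)
print(f"Pearson's r = {r:.3f}")
```

A value of r close to 1 means the two series rise together almost linearly. The significance test (the p-value quoted in the abstract) is a separate step that this sketch does not cover.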
-
O-Dang! The Ontology of Dangerous Speech Messages
Authors:
Marco A. Stranisci,
Simona Frenda,
Mirko Lai,
Oscar Araque,
Alessandra T. Cignarella,
Valerio Basile,
Viviana Patti,
Cristina Bosco
Abstract:
Within the NLP community, a considerable number of language resources are created, annotated and released every day with the aim of studying specific linguistic phenomena. Although a variety of attempts to organize such resources have been carried out, systematic methods and interoperability between resources are still lacking. Furthermore, the most common practice for storing linguistic information is still the "gold standard", which is in contrast with recent trends in NLP that stress the importance of different subjectivities and points of view when training machine learning and deep learning methods. In this paper we present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG) for the collection of linguistic annotated data. O-Dang! is designed to gather and organize Italian datasets into a structured KG, according to the principles shared within the Linguistic Linked Open Data community. The ontology has also been designed to account for a perspectivist approach, since it provides a model for encoding both gold standard and single-annotator labels in the KG. The paper is structured as follows. In Section 1 the motivations of our work are outlined. Section 2 describes the O-Dang! Ontology, which provides a common semantic model for the integration of datasets in the KG. The Ontology Population stage, with information about corpora, users, and annotations, is presented in Section 3. Finally, in Section 4 an analysis of offensiveness across corpora is provided as a first case study for the resource.
Submitted 13 July, 2022;
originally announced July 2022.
-
Multilingual Irony Detection with Dependency Syntax and Neural Models
Authors:
Alessandra Teresa Cignarella,
Valerio Basile,
Manuela Sanguinetti,
Cristina Bosco,
Paolo Rosso,
Farah Benamara
Abstract:
This paper presents an in-depth investigation of the effectiveness of dependency-based syntactic features on the irony detection task in a multilingual perspective (English, Spanish, French and Italian). It focuses on the contribution from syntactic knowledge, exploiting linguistic resources where syntax is annotated according to the Universal Dependencies scheme. Three distinct experimental settings are provided. In the first, a variety of syntactic dependency-based features combined with classical machine learning classifiers are explored. In the second scenario, two well-known types of word embeddings are trained on parsed data and tested against gold standard datasets. In the third setting, dependency-based syntactic features are combined into the Multilingual BERT architecture. The results suggest that fine-grained dependency-based syntactic information is informative for the detection of irony.
Submitted 11 November, 2020;
originally announced November 2020.
-
Treebanking User-Generated Content: a UD Based Overview of Guidelines, Corpora and Unified Recommendations
Authors:
Manuela Sanguinetti,
Lauren Cassidy,
Cristina Bosco,
Özlem Çetinoğlu,
Alessandra Teresa Cignarella,
Teresa Lynn,
Ines Rehbein,
Josef Ruppenhofer,
Djamé Seddah,
Amir Zeldes
Abstract:
This article presents a discussion on the main linguistic phenomena which cause difficulties in the analysis of user-generated texts found on the web and in social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework of syntactic analysis. Given on the one hand the increasing number of treebanks featuring user-generated content, and its somewhat inconsistent treatment in these resources on the other, the aim of this article is twofold: (1) to provide a condensed, though comprehensive, overview of such treebanks -- based on available literature -- along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines, to promote consistent treatment of the particular phenomena found in these types of texts. The overarching goal of this article is to provide a common framework for researchers interested in developing similar resources in UD, thus promoting cross-linguistic consistency, which is a principle that has always been central to the spirit of UD.
Submitted 3 November, 2020;
originally announced November 2020.
-
Estimating the effects of water-induced shallow landslides on soil erosion
Authors:
Claudio Bosco,
Graham Sander
Abstract:
Rainfall-induced landslides and soil erosion are part of a complex system of multiple interacting processes, and both are capable of significantly affecting sediment budgets. These sediment mass movements also have the potential to significantly impact the health, functionality and services of a broad network of ecosystems. To support the integrated assessment of these processes it is necessary to develop reliable modelling architectures. This paper proposes a semi-quantitative integrated methodology for a robust assessment of soil erosion rates in data-poor regions affected by landslide activity. It combines heuristic, empirical and probabilistic approaches. The proposed methodology is based on the geospatial semantic array programming paradigm and has been implemented at catchment scale using Geographic Information Systems (GIS) spatial analysis tools and GNU Octave. The integrated data-transformation model relies on a modular architecture, where the information flow among modules is constrained by semantic checks. In order to improve computational reproducibility, the geospatial data transformations implemented in ESRI ArcGIS are made available in the free software GRASS GIS. The proposed modelling architecture is flexible enough for future transdisciplinary scenario analyses to be more easily designed. In particular, the architecture might serve as a novel component to simplify future integrated analyses of the potential impact of wildfires, or of vegetation types and distributions, on sediment transport from water-induced landslides and erosion.
Submitted 23 January, 2015;
originally announced January 2015.
-
Towards the reproducibility in soil erosion modeling: a new Pan-European soil erosion map
Authors:
Claudio Bosco,
Daniele de Rigo,
Olivier Dewitte,
Luca Montanarella
Abstract:
Soil erosion by water is a widespread phenomenon throughout Europe and has the potential, through its on-site and off-site effects, to affect water quality, food security and floods. Despite the implementation of numerous and different models for estimating soil erosion by water in Europe, there is still a lack of harmonization of assessment methodologies.
Often, different approaches result in significantly different soil erosion rates. Even when the same model is applied to the same region, the results may differ. This can be due to the way the model is implemented (i.e. with the selection of different algorithms when available) and/or to the use of datasets with different resolution or accuracy. Scientific computation is emerging as one of the central topics of the scientific method; to overcome these problems, it is therefore necessary to develop reproducible computational methods in which code and data are available.
The present study illustrates this approach. Using only publicly available datasets, we applied the Revised Universal Soil Loss Equation (RUSLE) to locate the areas most sensitive to soil erosion by water in Europe.
A significant effort was made to select the best simplified equations to use when a strict application of the RUSLE model is not possible. In particular, the reproducible research paradigm was applied to the computation of the rainfall erosivity factor (R). The calculation of the R factor was implemented using public datasets and the GNU R language. An easily reproducible validation procedure based on measured precipitation time series was applied using the MATLAB language. A further, challenging goal of the research is to design the computational modelling architecture so as to ease as much as possible the future reuse of the model in analysing climate change scenarios.
Submitted 16 February, 2014;
originally announced February 2014.
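The abstract above applies the Revised Universal Soil Loss Equation (RUSLE). In its general form, RUSLE estimates mean annual soil loss as the product of five factors, A = R * K * LS * C * P. The Python below is a minimal sketch of that product only. The function name `rusle_soil_loss` and the sample factor values are hypothetical and are not taken from the paper, whose implementation used GNU R and simplified equations for the individual factors that the abstract does not give.

```python
def rusle_soil_loss(r, k, ls, c, p):
    """Estimate mean annual soil loss A with the general RUSLE product.

    r  : rainfall erosivity factor (MJ mm ha^-1 h^-1 yr^-1)
    k  : soil erodibility factor (t ha h ha^-1 MJ^-1 mm^-1)
    ls : slope length and steepness factor (dimensionless)
    c  : cover-management factor (dimensionless)
    p  : support practice factor (dimensionless)

    Returns A in t ha^-1 yr^-1.
    """
    factors = {"r": r, "k": k, "ls": ls, "c": c, "p": p}
    for name, value in factors.items():
        if value < 0:
            raise ValueError(f"factor {name} must be non-negative, got {value}")
    return r * k * ls * c * p


# Hypothetical factor values for a single grid cell (illustrative only).
soil_loss = rusle_soil_loss(r=700.0, k=0.03, ls=1.5, c=0.2, p=1.0)
print(f"Estimated soil loss: {soil_loss:.2f} t/ha/yr")
```

In a map-based application the same product would be evaluated per grid cell. The effort the abstract describes lies in deriving each factor from public datasets, which this sketch does not attempt.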