-
A Conceptual Model for Data Storytelling Highlights in Business Intelligence Environments
Authors:
Panos Vassiliadis,
Patrick Marcel,
Faten El Outa,
Veronika Peralta,
Dimos Gkitsakis
Abstract:
We introduce a conceptual model for highlights to support data analysis and storytelling in the domain of Business Intelligence, via the automated extraction, representation, and exploitation of highlights revealing key facts that are hidden in the data with which a data analyst works. The model builds on the concepts of Holistic and Elementary Highlights, along with their context, constituents and interrelationships, whose synergy can identify internal properties, patterns and key facts in a dataset being analyzed.
Submitted 1 March, 2024;
originally announced March 2024.
-
A declarative approach to data narration
Authors:
Patrick Marcel,
Veronika Peralta,
Faten El Outa,
Panos Vassiliadis
Abstract:
This vision paper lays the preliminary foundations for Data Narrative Management Systems (DNMS), systems that enable the storage, sharing, and manipulation of data narratives. We motivate the need for such formal foundations and introduce a simple logical framework inspired by the relational model. The core of this framework is a Data Narrative Manipulation Language inspired by the extended relational algebra. We illustrate its use via examples and discuss the main challenges for the implementation of this vision.
Submitted 30 March, 2023;
originally announced March 2023.
-
Cube Interestingness: Novelty, Relevance, Peculiarity and Surprise
Authors:
Dimos Gkitsakis,
Spyridon Kaloudis,
Eirini Mouselli,
Veronika Peralta,
Patrick Marcel,
Panos Vassiliadis
Abstract:
In this paper, we discuss methods to assess the interestingness of a query in an environment of data cubes. We assume a hierarchical multidimensional database, storing data cubes and level hierarchies. We start with a comprehensive review of related work in the fields of studies of human behavior and computer science. We define the interestingness of a query as a vector of scores along different dimensions, like novelty, relevance, surprise and peculiarity, and complement this definition with a taxonomy of the information that can be used to assess each of these dimensions of interestingness. We provide both syntactic (result-independent) checks and extensional (result-dependent) measures and algorithms for assessing the different dimensions of interestingness in a quantitative fashion. We also report our findings on a user study that we conducted, analyzing the significance of each dimension, its evolution over time and the behavior of the study's participants.
Submitted 26 July, 2023; v1 submitted 6 December, 2022;
originally announced December 2022.
-
Use of Context in Data Quality Management: a Systematic Literature Review
Authors:
Flavia Serra,
Veronika Peralta,
Adriana Marotta,
Patrick Marcel
Abstract:
The importance of context in data quality (DQ) was shown many years ago and nowadays is widely accepted. Early approaches and surveys defined DQ as "fitness for use" and showed the influence of context on DQ. This paper presents a Systematic Literature Review (SLR) for investigating how context is taken into account in recent proposals for DQ management. We specifically present the planning and execution of the SLR, the analysis criteria and our results reflecting the relationship between context and DQ in the state of the art and, particularly, how that context is defined and used for DQ management.
Submitted 22 April, 2022;
originally announced April 2022.
-
Methodology for Mining, Discovering and Analyzing Semantic Human Mobility Behaviors
Authors:
Clement Moreau,
Thomas Devogele,
Laurent Etienne,
Veronika Peralta,
Cyril de Runz
Abstract:
Various institutes produce large semantic datasets containing information regarding daily activities and human mobility. The analysis and understanding of such data are crucial for urban planning, socio-psychology, political sciences, and epidemiology. However, none of the typical data mining processes have been customized for the thorough analysis of semantic mobility sequences to translate data into understandable behaviors. Based on an extended literature review, we propose a novel methodological pipeline called SIMBA (Semantic Indicators for Mobility and Behavior Analysis), for mining and analyzing semantic mobility sequences to identify coherent information and human behaviors. A framework for semantic sequence mobility analysis and clustering explicability based on integrating different complementary statistical indicators and visual tools is implemented. To validate this methodology, we used a large set of real daily mobility sequences obtained from a household travel survey. Complementary knowledge is automatically discovered in the proposed method.
Submitted 20 December, 2020; v1 submitted 8 December, 2020;
originally announced December 2020.
-
Detecting coherent explorations in SQL workloads
Authors:
Veronika Peralta,
Patrick Marcel,
Willeme Verdeaux,
Aboubakar Sidikhy Diakhaby
Abstract:
This paper presents a proposal aiming at better understanding a workload of SQL queries and detecting coherent explorations hidden within the workload. In particular, our work investigates SQLShare [11], a database-as-a-service platform targeting scientists and data scientists with minimal database experience, whose workload was made available to the research community. According to the authors of [11], this workload is the only one containing primarily ad-hoc hand-written queries over user-uploaded datasets. We analyzed this workload by extracting features that characterize SQL queries and we show how to use these features to separate sequences of SQL queries into meaningful explorations. We ran several tests over various query workloads to empirically validate our approach.
Submitted 12 July, 2019;
originally announced July 2019.