-
A Conceptual Model for Data Storytelling Highlights in Business Intelligence Environments
Authors:
Panos Vassiliadis,
Patrick Marcel,
Faten El Outa,
Veronika Peralta,
Dimos Gkitsakis
Abstract:
We introduce a conceptual model for highlights to support data analysis and storytelling in the domain of Business Intelligence, via the automated extraction, representation, and exploitation of highlights revealing key facts that are hidden in the data with which a data analyst works. The model builds on the concepts of Holistic and Elementary Highlights, along with their context, constituents and interrelationships, whose synergy can identify internal properties, patterns and key facts in a dataset being analyzed.
Submitted 1 March, 2024;
originally announced March 2024.
-
A declarative approach to data narration
Authors:
Patrick Marcel,
Veronika Peralta,
Faten El Outa,
Panos Vassiliadis
Abstract:
This vision paper lays the preliminary foundations for Data Narrative Management Systems (DNMS), systems that enable the storage, sharing, and manipulation of data narratives. We motivate the need for such formal foundations and introduce a simple logical framework inspired by the relational model. The core of this framework is a Data Narrative Manipulation Language inspired by the extended relational algebra. We illustrate its use via examples and discuss the main challenges for the implementation of this vision.
Submitted 30 March, 2023;
originally announced March 2023.
-
Cube Interestingness: Novelty, Relevance, Peculiarity and Surprise
Authors:
Dimos Gkitsakis,
Spyridon Kaloudis,
Eirini Mouselli,
Veronika Peralta,
Patrick Marcel,
Panos Vassiliadis
Abstract:
In this paper, we discuss methods to assess the interestingness of a query in an environment of data cubes. We assume a hierarchical multidimensional database, storing data cubes and level hierarchies. We start with a comprehensive review of related work in the study of human behavior and in computer science. We define the interestingness of a query as a vector of scores along different dimensions, such as novelty, relevance, surprise and peculiarity, and complement this definition with a taxonomy of the information that can be used to assess each of these dimensions of interestingness. We provide both syntactic (result-independent) checks and extensional (result-dependent) measures and algorithms for assessing the different dimensions of interestingness in a quantitative fashion. We also report our findings from a user study that we conducted, analyzing the significance of each dimension, its evolution over time and the behavior of the study's participants.
Submitted 26 July, 2023; v1 submitted 6 December, 2022;
originally announced December 2022.
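The idea of a syntactic (result-independent) check from the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `CubeQuery` class, its fields, and the novelty formula (share of query parts not seen earlier in the session) are hypothetical and are not taken from the paper. Relevance, peculiarity and surprise would be further entries of the score vector and are not sketched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CubeQuery:
    """Hypothetical cube query: grouping levels, filters and measures."""
    group_by: frozenset
    filters: frozenset
    measures: frozenset

    def parts(self) -> frozenset:
        # Tag each part with its role so that a level and a filter
        # with the same name are treated as different parts.
        return (
            frozenset(("level", g) for g in self.group_by)
            | frozenset(("filter", f) for f in self.filters)
            | frozenset(("measure", m) for m in self.measures)
        )


def syntactic_novelty(query: CubeQuery, history: list) -> float:
    """Assumed novelty score: fraction of the query's parts that never
    appeared in the history. It looks only at query text, not results."""
    parts = query.parts()
    if not parts:
        return 0.0
    seen = frozenset().union(*(q.parts() for q in history))
    return len(parts - seen) / len(parts)


def interestingness(query: CubeQuery, history: list) -> dict:
    """Score vector with a single dimension filled in for this sketch."""
    return {"novelty": syntactic_novelty(query, history)}
```

Under these assumptions, a query that only adds one grouping level to a previously issued query scores low on novelty, while the first query of a session scores 1.0.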
-
Use of Context in Data Quality Management: a Systematic Literature Review
Authors:
Flavia Serra,
Veronika Peralta,
Adriana Marotta,
Patrick Marcel
Abstract:
The importance of context in data quality (DQ) was shown many years ago and is now widely accepted. Early approaches and surveys defined DQ as "fitness for use" and showed the influence of context on DQ. This paper presents a Systematic Literature Review (SLR) investigating how context is taken into account in recent proposals for DQ management. We specifically present the planning and execution of the SLR, the analysis criteria and our results reflecting the relationship between context and DQ in the state of the art and, particularly, how that context is defined and used for DQ management.
Submitted 22 April, 2022;
originally announced April 2022.
-
A Subjective Interestingness measure for Business Intelligence explorations
Authors:
Alexandre Chanson,
Ben Crulis,
Nicolas Labroche,
Patrick Marcel
Abstract:
This paper addresses the problem of defining a subjective interestingness measure for BI exploration. Such a measure involves prior modeling of the belief of the user. The complexity of this problem lies in the impossibility of asking the user about the degree of belief in each element composing their knowledge prior to the writing of a query. To this end, we propose to automatically infer this user belief based on the user's past interactions over a data cube, the cube schema and other users' past activities. We express the belief in the form of a probability distribution over all the query parts potentially accessible to the user, and use a random walk to learn this distribution. This belief is then used to define a first Subjective Interestingness measure over multidimensional queries. Experiments conducted on simulated and real explorations show how this new subjective interestingness measure relates to prototypical and real user behaviors, and that query parts offer a reasonable proxy to infer user belief.
Submitted 16 July, 2019;
originally announced July 2019.
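The abstract above describes the user belief as a probability distribution over query parts learned with a random walk. A minimal sketch of that idea follows. The abstract does not give the graph construction or the scoring formula, so the co-occurrence graph, the PageRank-style uniform restart (`damping`), the function names, and the mean negative log-probability score are all assumptions made for illustration, not the paper's method.

```python
import math
from collections import defaultdict
from itertools import combinations


def learn_belief(past_queries, damping=0.85, iterations=100):
    """Estimate a probability distribution over query parts.

    Assumption: parts that co-occur in a past query are linked, and the
    walker restarts uniformly with probability 1 - damping.
    """
    weights = defaultdict(lambda: defaultdict(float))
    parts = set()
    for query in past_queries:
        parts.update(query)
        for a, b in combinations(sorted(set(query)), 2):
            weights[a][b] += 1.0
            weights[b][a] += 1.0

    parts = sorted(parts)
    n = len(parts)
    if n == 0:
        return {}

    belief = {p: 1.0 / n for p in parts}
    for _ in range(iterations):
        nxt = {p: (1.0 - damping) / n for p in parts}
        for p in parts:
            out = weights[p]
            total = sum(out.values())
            if total == 0.0:
                # Isolated part: spread its probability mass uniformly.
                for q in parts:
                    nxt[q] += damping * belief[p] / n
            else:
                for q, w in out.items():
                    nxt[q] += damping * belief[p] * w / total
        belief = nxt
    return belief


def surprise(query, belief, floor=1e-9):
    """Assumed score: mean negative log-probability of the query's parts.
    Parts absent from the belief get the small probability `floor`."""
    parts = set(query)
    if not parts:
        return 0.0
    return sum(-math.log(belief.get(p, floor)) for p in parts) / len(parts)
```

With past queries such as `[["year", "sales"], ["year", "country"], ["year", "sales"]]`, the part `year` ends up with the largest belief, so a query on `year` alone scores as less surprising than one on `country`.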
-
Detecting coherent explorations in SQL workloads
Authors:
Veronika Peralta,
Patrick Marcel,
Willeme Verdeaux,
Aboubakar Sidikhy Diakhaby
Abstract:
This paper presents a proposal aiming at better understanding a workload of SQL queries and detecting coherent explorations hidden within the workload. In particular, our work investigates SQLShare [11], a database-as-a-service platform targeting scientists and data scientists with minimal database experience, whose workload was made available to the research community. According to the authors of [11], this workload is the only one containing primarily ad-hoc hand-written queries over user-uploaded datasets. We analyzed this workload by extracting features that characterize SQL queries and we show how to use these features to separate sequences of SQL queries into meaningful explorations. We ran several tests over various query workloads to empirically validate our approach.
Submitted 12 July, 2019;
originally announced July 2019.
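Separating a sequence of SQL queries into explorations from per-query features, as described in the abstract above, can be sketched as follows. The abstract does not list the features or the separation rule, so this sketch assumes a deliberately crude feature (the set of non-keyword identifiers found by a regular expression) and a hypothetical rule: start a new exploration when the Jaccard similarity between consecutive queries drops below `threshold`.

```python
import re

# Small, non-exhaustive keyword list used only to filter tokens in this sketch.
SQL_KEYWORDS = {
    "select", "from", "where", "group", "by", "order", "having", "limit",
    "and", "or", "not", "as", "join", "on", "in", "distinct", "asc", "desc",
}


def query_features(sql):
    """Crude feature set: lower-cased identifiers that are not keywords.
    A real system would use a SQL parser; this is only an approximation."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sql.lower())
    return frozenset(t for t in tokens if t not in SQL_KEYWORDS)


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def split_explorations(queries, threshold=0.3):
    """Group consecutive queries, cutting where similarity to the
    previous query falls below the (assumed) threshold."""
    explorations = []
    previous = None
    for sql in queries:
        features = query_features(sql)
        if previous is None or jaccard(previous, features) < threshold:
            explorations.append([])
        explorations[-1].append(sql)
        previous = features
    return explorations
```

Comparing only consecutive queries keeps the sketch short; comparing against the whole current exploration would be an equally plausible choice.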
-
Beyond Roll-Up's and Drill-Down's: An Intentional Analytics Model to Reinvent OLAP (long-version)
Authors:
Panos Vassiliadis,
Patrick Marcel,
Stefano Rizzi
Abstract:
This paper structures a novel vision for OLAP by fundamentally redefining several of the pillars on which OLAP has been based for the last 20 years. We redefine OLAP queries, in order to move to higher degrees of abstraction from roll-up's and drill-down's, and we propose a set of novel intentional OLAP operators, namely, describe, assess, explain, predict, and suggest, which express the user's need for results. We fundamentally redefine what a query answer is, and escape from the constraint that the answer is a set of tuples; on the contrary, we complement the set of tuples with models (typically, but not exclusively, results of data mining algorithms over the involved data) that concisely represent the internal structure or correlations of the data. Due to the diverse nature of the involved models, we propose (for the first time, to the best of our knowledge) a unifying framework for them, which rests on extending each data cell of a cube with information about the models that pertain to it -- practically converting the small parts that build up the models to data that annotate each cell. We exploit this data-to-model mapping to provide highlights of the data, by isolating data and models that maximize the delivery of new information to the user. We introduce a new interestingness measure for assessing the surprise that a new query result brings to the user, with respect to the information contained in the results the user has previously seen. The individual parts of our proposal are integrated in a new data model for OLAP, which we call the Intentional Analytics Model. We complement our contribution with a list of significant open problems for the community to address.
Submitted 8 December, 2020; v1 submitted 19 December, 2018;
originally announced December 2018.