-
To Measure What Isn't There -- Visual Exploration of Missingness Structures Using Quality Metrics
Authors:
Sara Johansson Fernstad,
Sarah Alsufyani,
Silvia Del Din,
Alison Yarnall,
Lynn Rochester
Abstract:
This paper contributes a set of quality metrics for identification and visual analysis of structured missingness in high-dimensional data. Missing values in data are a frequent challenge in most data generating domains and may cause a range of analysis issues. Structural missingness in data may indicate issues in data collection and pre-processing, but may also highlight important data characteristics. While research into statistical methods for dealing with missing data mainly focuses on replacing missing values with plausible estimated values, visualization has great potential to support a more in-depth understanding of missingness structures in data. Nonetheless, while the interest in missing data visualization has increased in the last decade, it is still a relatively overlooked research topic with a comparatively small number of publications, few of which address scalability issues. Efficient visual analysis approaches are needed to enable exploration of missingness structures in large and high-dimensional data, and to support informed decision-making in the context of potential data quality issues. This paper suggests a set of quality metrics for identification of patterns of interest for understanding of structural missingness in data. These quality metrics can be used as guidance in visual analysis, as demonstrated through a use case exploring structural missingness in data from a real-life walking monitoring study. All supplemental materials for this paper are available at https://doi.org/10.25405/data.ncl.c.7741829.
Submitted 29 May, 2025;
originally announced May 2025.
-
Visualization of missing data: a state-of-the-art survey
Authors:
Sarah Alsufyani,
Matthew Forshaw,
Sara Johansson Fernstad
Abstract:
Missing data, that is, data values that are not recorded for a variable, occur in almost all statistical analyses and may have many causes, such as a lack of collection or a lack of documentation. Researchers need to deal with this issue adequately to provide a valid analysis. The visualization of missing values plays an important role in supporting the investigation and understanding of missing data patterns. While some techniques and tools for visualization of missing values are available, it is still a challenge to select the right visualization that will fulfil the user requirements for visualizing missing data. This paper provides an overview and state-of-the-art report (STAR) of research literature focusing on missing values visualization. To the best of our knowledge, this is the first survey paper with a focus on missing data visualization. The goal of this paper is to encourage visualization researchers to increase their involvement with missing data visualization.
Submitted 27 September, 2024;
originally announced October 2024.
-
Impacts of aspect ratio on task accuracy in parallel coordinates
Authors:
Hugh Garner,
Sara Johansson Fernstad
Abstract:
Parallel coordinates plots (PCPs) are a widely used visualization method, particularly for exploratory analysis. Previous studies show that PCPs perform much more poorly for estimating positive correlation than for estimating negative correlation, but it is not clear if this is affected by the aspect ratio (AR) of the axes pairs. In this paper, we present the results from an evaluation of the effect of the aspect ratio of axes in static (non-interactive) PCPs for two tasks: a) linear correlation estimation and b) value tracing. For both tasks we find strong evidence that AR influences accuracy, including ARs greater than 1:1 being much more performant for estimation of positive correlations. We provide a set of recommendations for visualization designers using PCPs for correlation or value-tracing tasks, based on the data characteristics and expected use cases.
Submitted 19 September, 2024;
originally announced September 2024.
-
To Explore What Isn't There -- Glyph-based Visualization for Analysis of Missing Values
Authors:
Sara Johansson Fernstad,
Jimmy Johansson
Abstract:
This paper contributes a novel visualization method, Missingness Glyph, for analysis and exploration of missing values in data. Missing values are a common challenge in most data generating domains and may cause a range of analysis issues. Missingness in data may indicate potential problems in data collection and pre-processing, or highlight important data characteristics. While the development and improvement of statistical methods for dealing with missing data is a research area in its own right, mainly focussing on replacing missing values with estimated values, considerably less focus has been put on visualization of missing values. Nonetheless, visualization and explorative analysis have great potential to support understanding of missingness in data, and to enable gaining of novel insights into patterns of missingness in a way that statistical methods are unable to. The Missingness Glyph supports identification of relevant missingness patterns in data, and is evaluated and compared to two other visualization methods in the context of the missingness patterns. The results are promising and confirm that the Missingness Glyph in several cases performs better than the alternative visualization methods.
Submitted 24 November, 2020;
originally announced November 2020.
-
Visual Entropy and the Visualization of Uncertainty
Authors:
Nicolas S. Holliman,
Arzu Coltekin,
Sara J. Fernstad,
Lucy McLaughlin,
Michael D. Simpson,
Andrew J. Woods
Abstract:
Background: Even though data visualizations (and underlying data) almost always contain uncertainty, it remains complex to communicate and interpret uncertainty representations. Consequently, uncertainty visualizations for non-expert audiences are rare. Objective: Our aim is to rigorously define and evaluate the novel use of visual entropy as a measure of shape that allows us to construct an ordered scale of glyphs for use in representing both uncertainty and value in 2D and 3D environments. Method: We use sample entropy as a numerical measure of visual entropy to construct a set of glyphs using R and Blender which vary in their complexity. Results: An exact binomial analysis of a pairwise comparison of the glyphs shows that a majority of participants (n = 87) ordered each glyph as predicted by the visual entropy score with large effect size (Cohen's g > 0.25). We also evaluate whether the glyphs effectively represent uncertainty using a signal detection method in a search task. Participants (n = 15) were able to find glyphs representing uncertainty with high sensitivity and low error rates. Conclusion: Visual entropy is a successful novel approach to representing ordered data and provides a channel that can allow the uncertainty of a measure to be presented alongside its mean value.
Submitted 30 April, 2022; v1 submitted 30 July, 2019;
originally announced July 2019.