Showing 1–9 of 9 results for author: Tanaka, E

  1. arXiv:2411.10482

    cs.HC stat.AP

    The Noisy Work of Uncertainty Visualisation Research: A Review

    Authors: Harriet Mason, Dianne Cook, Sarah Goodwin, Emi Tanaka, Susan VanderPlas

    Abstract: Uncertainty visualisation is quickly becoming a hot topic in information visualisation. Existing reviews in the field take the definition and purpose of an uncertainty visualisation to be self-evident, which results in a large amount of conflicting information. This conflict largely stems from a conflation between uncertainty visualisations designed for decision making and those designed to preven…

    Submitted 20 November, 2024; v1 submitted 13 November, 2024; originally announced November 2024.

    Comments: 52 pages with 7 figures

    MSC Class: 62P99 (Primary); 68P30; 62A01 (Secondary)

  2. arXiv:2411.01001

    stat.ML cs.CV cs.LG

    Automated Assessment of Residual Plots with Computer Vision Models

    Authors: Weihao Li, Dianne Cook, Emi Tanaka, Susan VanderPlas, Klaus Ackermann

    Abstract: Plotting the residuals is a recommended procedure to diagnose deviations from linear model assumptions, such as non-linearity, heteroscedasticity, and non-normality. The presence of structure in residual plots can be tested using the lineup protocol to do visual inference. There are a variety of conventional residual tests, but the lineup protocol, used as a statistical test, performs better for d…

    Submitted 1 November, 2024; originally announced November 2024.

  3. arXiv:2311.09705

    stat.CO stat.OT

    edibble: An R package to encapsulate elements of experimental designs for better planning, management and workflow

    Authors: Emi Tanaka

    Abstract: I present an R package called edibble that facilitates the design of experiments by encapsulating elements of the experiment in a series of composable functions. This package is an interpretation of "the grammar of experimental designs" by Tanaka (2023) in the R programming language. The main features of the edibble package are demonstrated, illustrating how it can be used to create a wide array o…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 32 pages

  4. A Plot is Worth a Thousand Tests: Assessing Residual Diagnostics with the Lineup Protocol

    Authors: Weihao Li, Dianne Cook, Emi Tanaka, Susan VanderPlas

    Abstract: Regression experts consistently recommend plotting residuals for model diagnosis, despite the availability of many numerical hypothesis test procedures designed to use residuals to assess problems with a model fit. Here we provide evidence for why this is good advice using data from a visual inference experiment. We show how conventional tests are too sensitive, which means that too often the conc…

    Submitted 24 March, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

  5. arXiv:2307.11593

    cs.OH q-bio.QM stat.ME

    Towards a unified language in experimental designs propagated by a software framework

    Authors: Emi Tanaka

    Abstract: Experiments require human decisions in the design process, which in turn are reformulated and summarized as inputs into a system (computational or otherwise) to generate the experimental design. I leverage this system to promote a language of experimental designs by proposing a novel computational framework, called "the grammar of experimental designs", to specify experimental designs based on an…

    Submitted 24 July, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

  6. arXiv:2206.07532

    stat.OT stat.CO

    Current state and prospects of R-packages for the design of experiments

    Authors: Emi Tanaka, Dewi Amaliah

    Abstract: Re-running an experiment is generally costly and, in some cases, impossible due to limited resources; therefore, the design of an experiment plays a critical role in increasing the quality of experimental data. In this paper, we describe the current state of R-packages for the design of experiments through an exploratory data analysis of package downloads, package metadata, and a comparison of cha…

    Submitted 13 December, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: 14 pages, 8 figures, 1 supplementary material

  7. arXiv:2205.06417

    stat.OT

    A Journey from Wild to Textbook Data to Reproducibly Refresh the Wages Data from the National Longitudinal Survey of Youth Database

    Authors: Dewi Amaliah, Dianne Cook, Emi Tanaka, Kate Hyde, Nicholas Tierney

    Abstract: Textbook data is essential for teaching statistics and data science methods because it is clean, allowing the instructor to focus on methodology. Ideally, textbook data sets are refreshed regularly, especially when they are subsets taken from an on-going data collection. It is also important to use contemporary data for teaching, to imbue the sense that the methodology is relevant today. This pa…

    Submitted 12 May, 2022; originally announced May 2022.

  8. Symbolic Formulae for Linear Mixed Models

    Authors: Emi Tanaka, Francis K. C. Hui

    Abstract: A statistical model is a mathematical representation of an often simplified or idealised data-generating process. In this paper, we focus on a particular type of statistical model, called linear mixed models (LMMs), that is widely used in many disciplines, e.g. agriculture, ecology, econometrics, psychology. Mixed models, also commonly known as multi-level, nested, hierarchical or panel data models…

    Submitted 19 November, 2019; originally announced November 2019.

  9. arXiv:1807.07268

    q-bio.QM stat.AP

    Simple robust genomic prediction and outlier detection for a multi-environmental field trial

    Authors: Emi Tanaka

    Abstract: The aim of plant breeding trials is often to identify germplasms that are well adapted to target environments. These germplasms are identified through genomic prediction from the analysis of multi-environmental field trials (MET) using linear mixed models. The occurrence of outliers in MET is common and known to adversely impact accuracy of genomic prediction yet the detection of outliers, and subse…

    Submitted 19 July, 2018; originally announced July 2018.