-
Reproducibility Companion Paper: Describing Subjective Experiment Consistency by $p$-Value P-P Plot
Authors:
Jakub Nawała,
Lucjan Janowski,
Bogdan Ćmiel,
Krzysztof Rusek,
Marc A. Kastner,
Jan Zahálka
Abstract:
In this paper, we reproduce the experimental results of our earlier work, "Describing Subjective Experiment Consistency by $p$-Value P-P Plot", presented at the 28th ACM International Conference on Multimedia. The paper aims to verify the soundness of our prior results and to help others understand our software framework. We present artifacts that help reproduce the tables, figures, and all data derived from raw subjective responses included in our earlier work. Using these artifacts, we show that our results are reproducible. We invite everyone to use our software framework for analyses of subjective responses that go beyond reproducibility efforts.
Submitted 1 September, 2022;
originally announced September 2022.
-
Generalised Score Distribution: Underdispersed Continuation of the Beta-Binomial Distribution
Authors:
Bogdan Ćmiel,
Jakub Nawała,
Lucjan Janowski,
Krzysztof Rusek
Abstract:
A class of discrete probability distributions contains distributions with limited support. A typical example is a variant of the Likert scale, with responses mapped to either the $\{1, 2, \ldots, 5\}$ or the $\{-3, -2, \ldots, 2, 3\}$ set. An interesting subclass of discrete distributions with finite support consists of distributions that have two parameters and no more than one change in probability monotonicity. The main contribution of this paper is a family of distributions fitting this description, which we call the Generalised Score Distribution (GSD) class. The proposed GSD class covers the whole set of possible means and variances for any fixed and finite support. Furthermore, the GSD class can be treated as an underdispersed continuation of a reparametrized beta-binomial distribution. The GSD class parameters are intuitive and can be easily estimated by the method of moments. We also offer a Maximum Likelihood Estimation (MLE) algorithm for the GSD class and provide evidence that the class properly describes response distributions coming from 24 Multimedia Quality Assessment experiments. Finally, we show that the GSD class can be represented as a sum of dichotomous zero-one random variables, which points to an interesting interpretation of the class.
Submitted 22 April, 2022;
originally announced April 2022.
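The abstract states that the GSD parameters can be estimated by the method of moments. The sketch below is a minimal illustration for a $\{1, \ldots, 5\}$ scale, assuming that the mean parameter $\psi$ is the sample mean and that the spread parameter $\rho$ is defined through $\mathrm{Var} = \rho V_{\min}(\psi) + (1-\rho) V_{\max}(\psi)$; that definition and all function names are assumptions for illustration, not taken from the text above. Here $V_{\max}$ is the Bhatia-Davis upper bound and $V_{\min}$ is the smallest variance attainable by an integer-valued score with mean $\psi$.

```python
import math
from statistics import mean, pvariance


def v_min(psi: float) -> float:
    """Smallest variance of an integer-valued score whose mean is psi."""
    return (math.ceil(psi) - psi) * (psi - math.floor(psi))


def v_max(psi: float, low: int = 1, high: int = 5) -> float:
    """Largest variance of a score on [low, high] with mean psi (Bhatia-Davis bound)."""
    return (psi - low) * (high - psi)


def gsd_moment_estimates(scores, low: int = 1, high: int = 5):
    """Hypothetical method-of-moments estimates (psi, rho) for one stimulus.

    Assumes Var = rho * v_min(psi) + (1 - rho) * v_max(psi).
    """
    psi = mean(scores)
    var = pvariance(scores, mu=psi)
    span = v_max(psi, low, high) - v_min(psi)
    if span == 0:
        # psi sits on a scale endpoint, so every answer is identical.
        # Hypothetical convention: report the least spread-out value, rho = 1.
        return psi, 1.0
    rho = (v_max(psi, low, high) - var) / span
    return psi, rho
```

Under this assumed definition, $\rho = 1$ corresponds to the least spread-out answers possible for a given mean, and $\rho = 0$ to answers split between the two scale endpoints.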
-
Generalised Score Distribution: A Two-Parameter Discrete Distribution Accurately Describing Responses from Quality of Experience Subjective Experiments
Authors:
Jakub Nawała,
Lucjan Janowski,
Bogdan Ćmiel,
Krzysztof Rusek,
Pablo Pérez
Abstract:
Subjective responses from Multimedia Quality Assessment (MQA) experiments are conventionally analysed with methods not suitable for the data type these responses represent. Furthermore, obtaining subjective responses is resource intensive. A method allowing reuse of existing responses would thus be beneficial. Applying improper data analysis methods leads to results that are difficult to interpret. This encourages drawing erroneous conclusions. Building upon existing subjective responses is resource friendly and helps develop machine learning (ML) based visual quality predictors. We show that using a discrete model to analyse responses from MQA subjective experiments is feasible. We indicate that our proposed Generalised Score Distribution (GSD) properly describes response distributions observed in typical MQA experiments. We highlight the interpretability of the GSD parameters and indicate that the GSD outperforms the approach based on the sample empirical distribution when it comes to bootstrapping. We provide evidence that the GSD outcompetes the state-of-the-art model both in terms of goodness-of-fit and bootstrapping capabilities. To this end, we analyse more than one million subjective responses from more than 30 subjective experiments. We also make the code implementing the GSD model and related analyses available through our GitHub repository: https://github.com/Qub3k/subjective-exp-consistency-check
Submitted 4 February, 2022;
originally announced February 2022.
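The abstract compares GSD-based bootstrapping with an approach based on the sample empirical distribution. For readers unfamiliar with that baseline, here is a minimal sketch of it only (not the GSD-based procedure): it resamples the observed scores of one stimulus with replacement and reads a percentile interval for the mean score off the sorted resample means. The function names, the default number of resamples, and the choice of a percentile interval are illustrative assumptions.

```python
import random
from statistics import mean


def empirical_bootstrap_means(scores, n_boot: int = 1000, seed: int = 0):
    """Sorted means of n_boot resamples drawn with replacement from `scores`."""
    rng = random.Random(seed)  # seeded for repeatable results
    k = len(scores)
    return sorted(mean(rng.choices(scores, k=k)) for _ in range(n_boot))


def percentile_interval(sorted_stats, level: float = 0.95):
    """Simple percentile interval read off already-sorted bootstrap statistics."""
    m = len(sorted_stats)
    lo_idx = int((1 - level) / 2 * m)
    hi_idx = max(lo_idx, int((1 + level) / 2 * m) - 1)
    return sorted_stats[lo_idx], sorted_stats[hi_idx]
```

Because every resample is drawn from the observed scores, this baseline can never produce a score category that was not observed for the stimulus.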
-
Describing Subjective Experiment Consistency by $p$-Value P-P Plot
Authors:
Jakub Nawała,
Lucjan Janowski,
Bogdan Ćmiel,
Krzysztof Rusek
Abstract:
There are phenomena that cannot be measured without subjective testing. However, subjective testing is a complex issue with many influencing factors. These interplay to yield either precise or incorrect results. Researchers require a tool to classify the results of a subjective experiment as either consistent or inconsistent. This is necessary to decide whether to treat the gathered scores as quality ground truth data. Knowing whether subjective scores can be trusted is key to drawing valid conclusions and building functional tools based on those scores (e.g., algorithms assessing the perceived quality of multimedia materials). We provide a tool that classifies a subjective experiment (and all its results) as either consistent or inconsistent. Additionally, the tool identifies stimuli with irregular score distributions. The approach is based on treating subjective scores as a random variable coming from the discrete Generalized Score Distribution (GSD). The GSD, in combination with a bootstrapped G-test of goodness-of-fit, makes it possible to construct a $p$-value P-P plot that visualizes the experiment's consistency. The tool safeguards researchers from using inconsistent subjective data. In this way, it ensures that the conclusions they draw and the tools they build are more precise and trustworthy. The proposed approach agrees with expectations derived solely from the experiment design descriptions of 21 real-life multimedia quality subjective experiments.
Submitted 28 September, 2020;
originally announced September 2020.
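The abstract combines a bootstrapped G-test of goodness-of-fit with a $p$-value P-P plot. The sketch below illustrates both ingredients under simplifying assumptions: the null probabilities of the score categories are treated as known (for instance, taken from an already fitted model), whereas a parametric bootstrap would normally re-estimate the model on every simulated sample, and the GSD fitting step itself is omitted. All function names are hypothetical.

```python
import math
import random


def g_statistic(counts, probs):
    """G = 2 * sum(O_i * ln(O_i / E_i)) with E_i = n * p_i; empty cells contribute 0."""
    n = sum(counts)
    return 2.0 * sum(o * math.log(o / (n * p)) for o, p in zip(counts, probs) if o > 0)


def mc_pvalue(counts, probs, n_boot: int = 2000, seed: int = 0):
    """Monte Carlo p-value of G, simulating samples from a fixed (known) null pmf."""
    rng = random.Random(seed)
    n = sum(counts)
    g_obs = g_statistic(counts, probs)
    cats = range(len(probs))
    exceed = 0
    for _ in range(n_boot):
        draws = rng.choices(cats, weights=probs, k=n)
        simulated = [draws.count(c) for c in cats]
        if g_statistic(simulated, probs) >= g_obs:
            exceed += 1
    # The +1 terms count the observed sample itself and keep the p-value above 0.
    return (exceed + 1) / (n_boot + 1)


def pp_points(p_values, grid):
    """Empirical CDF of per-stimulus p-values evaluated at each grid point."""
    m = len(p_values)
    return [sum(p <= x for p in p_values) / m for x in grid]
```

If the model fits every stimulus, the per-stimulus $p$-values should be roughly uniform on $[0, 1]$, so plotting each grid value against its `pp_points` value gives points near the diagonal; an excess of small $p$-values pushes the curve above it.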
-
Generalized Score Distribution
Authors:
Lucjan Janowski,
Bogdan Ćmiel,
Krzysztof Rusek,
Jakub Nawała,
Zhi Li
Abstract:
A class of discrete probability distributions contains distributions with limited support, i.e., possible argument values are limited to a set of numbers (typically consecutive). Examples of such data are results from subjective experiments utilizing the Absolute Category Rating (ACR) technique, where possible answers (argument values) are $\{1, 2, \cdots, 5\}$, or from a typical Likert scale, $\{-3, -2, \cdots, 3\}$. An interesting subclass of those distributions consists of distributions with two parameters, one describing the mean value and the other the spread of the answers, and with no more than one change in probability monotonicity. In this paper, we propose a general distribution satisfying these constraints, called the Generalized Score Distribution (GSD). The proposed GSD covers all spreads of the answers, from very small, given by the Bernoulli distribution, to the maximum, given by a beta-binomial distribution. We also show that the GSD correctly describes subjective experiment scores from video quality evaluations with a probability of 99.7%. A Google Colaboratory website with an implementation of GSD estimation, simulation, and visualization is provided.
Submitted 10 September, 2019;
originally announced September 2019.
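The abstract places the GSD between two reference distributions on a 1-to-5 scale: a Bernoulli-type distribution with the smallest possible spread and a beta-binomial distribution for large spreads. The sketch below constructs only these two reference pmfs for a target mean $\psi$ (and, for the beta-binomial, a target variance), using the standard beta-binomial moments $\mathrm{E} = np$ and $\mathrm{Var} = np(1-p)\,\frac{\alpha+\beta+n}{\alpha+\beta+1}$ with $n = 4$ trials shifted up by 1. It does not implement the GSD itself; the function names and the mean-variance parametrization are assumptions for illustration.

```python
import math


def beta_binomial_pmf(psi: float, var: float, low: int = 1, high: int = 5):
    """Beta-binomial pmf on {low, ..., high} with mean psi and variance var.

    Only variances strictly between the binomial variance n*p*(1-p) and the
    maximum n**2*p*(1-p) are reachable by this family.
    """
    n = high - low
    p = (psi - low) / n
    if not 0 < p < 1:
        raise ValueError("mean must lie strictly inside the scale")
    r = var / (n * p * (1 - p))  # overdispersion ratio (s + n) / (s + 1)
    if not 1 < r < n:
        raise ValueError("variance outside the beta-binomial range")
    s = (n - r) / (r - 1)  # s = alpha + beta, solved from the ratio above
    a, b = p * s, (1 - p) * s

    def log_beta(x: float, y: float) -> float:
        return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

    # P(K = k) = C(n, k) * B(k + a, n - k + b) / B(a, b), shifted by `low`.
    return {
        low + k: math.comb(n, k) * math.exp(log_beta(k + a, n - k + b) - log_beta(a, b))
        for k in range(n + 1)
    }


def min_variance_pmf(psi: float):
    """Two-point pmf on floor(psi), ceil(psi): a Bernoulli variable shifted to floor(psi)."""
    lo, hi = math.floor(psi), math.ceil(psi)
    if lo == hi:
        return {lo: 1.0}
    return {lo: hi - psi, hi: psi - lo}
```

For example, a mean of 3 and a variance of 2 give $\alpha = \beta = 1$, which is the discrete uniform distribution on $\{1, \ldots, 5\}$. Variances at or below the binomial variance $(\psi-1)(5-\psi)/4$ are not reachable by the beta-binomial family.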