-
Fine-Grained HDR Image Quality Assessment From Noticeably Distorted to Very High Fidelity
Authors:
Mohsen Jenadeleh,
Jon Sneyers,
Davi Lazzarotto,
Shima Mohammadi,
Dominik Keller,
Atanas Boev,
Rakesh Rao Ramachandra Rao,
António Pinheiro,
Thomas Richter,
Alexander Raake,
Touradj Ebrahimi,
João Ascenso,
Dietmar Saupe
Abstract:
High dynamic range (HDR) and wide color gamut (WCG) technologies significantly improve color reproduction compared to standard dynamic range (SDR) and standard color gamuts, resulting in more accurate, richer, and more immersive images. However, HDR increases data demands, posing challenges for bandwidth efficiency and compression techniques.
Advances in compression and display technologies require more precise image quality assessment, particularly in the high-fidelity range where perceptual differences are subtle.
To address this gap, we introduce AIC-HDR2025, the first such HDR dataset, comprising 100 test images generated from five HDR sources, each compressed using four codecs at five compression levels. It covers the high-fidelity range, from visible distortions to compression levels below the visually lossless threshold.
A subjective study was conducted using the JPEG AIC-3 test methodology, combining plain and boosted triplet comparisons. In total, 34,560 ratings were collected from 151 participants across four fully controlled labs. The results confirm that AIC-3 enables precise HDR quality estimation, with 95% confidence intervals averaging a width of 0.27 at 1 JND. In addition, several recently proposed objective metrics were evaluated based on their correlation with subjective ratings. The dataset is publicly available.
Submitted 14 June, 2025;
originally announced June 2025.
-
Appeal prediction for AI up-scaled Images
Authors:
Steve Göring,
Rasmus Merten,
Alexander Raake
Abstract:
DNN- or AI-based up-scaling algorithms are gaining in popularity due to improvements in machine learning. Various up-scaling models using CNNs, GANs, or mixed approaches have been published. The majority of models are evaluated using PSNR and SSIM or only a few example images. However, a performance evaluation with a wide range of real-world images and subjective evaluation is missing, which we tackle in this paper. To this end, we describe our developed dataset, which uses 136 base images and five different up-scaling methods, namely Real-ESRGAN, BSRGAN, waifu2x, KXNet, and Lanczos. Overall, the dataset consists of 1496 annotated images. The labeling of our dataset focused on image appeal and was performed via crowd-sourcing with our open-source tool AVRate Voyager. We evaluate the appeal of the different methods, and the results indicate that Real-ESRGAN and BSRGAN are the best. Furthermore, we train a DNN to detect which up-scaling method has been used; the trained models have a good overall performance in our evaluation. In addition, we evaluate state-of-the-art image appeal and quality models. None of these models showed a high prediction performance, so we also trained two approaches of our own. The first uses transfer learning and has the best performance, and the second uses signal-based features and a random forest model with good overall performance. We share the data and implementation to allow further research in the context of open science.
Submitted 19 February, 2025;
originally announced February 2025.
-
Satellite Streaming Video QoE Prediction: A Real-World Subjective Database and Network-Level Prediction Models
Authors:
Bowen Chen,
Zaixi Shang,
Jae Won Chung,
David Lerner,
Werner Robitza,
Rakesh Rao Ramachandra Rao,
Alexander Raake,
Alan C. Bovik
Abstract:
Demand for streaming services, including satellite, continues to exhibit unprecedented growth. Internet Service Providers (ISPs) find themselves at the crossroads of technological advancements and rising customer expectations. To stay relevant and competitive, these ISPs must ensure their networks deliver optimal video streaming quality, a key determinant of user satisfaction. Towards this end, it is important to have accurate Quality of Experience (QoE) prediction models in place. However, for these models to perform robustly, they require extensive data sets labeled by subjective opinion scores on videos impaired by diverse playback disruptions. To bridge this data gap, we introduce the LIVE-Viasat Real-World Satellite QoE Database. This database consists of 179 videos recorded from real-world streaming services affected by various authentic distortion patterns. We also conducted a comprehensive subjective study involving 54 participants, who contributed both continuous-time opinion scores and endpoint (retrospective) QoE scores. Our analysis sheds light on various determinants influencing subjective QoE, such as stall events, spatial resolutions, bitrate, and certain network parameters. We demonstrate the usefulness of this unique new resource by evaluating the efficacy of prevalent QoE-prediction models on it. We also created a new model that maps the network parameters to predicted human perception scores, which can be used by ISPs to optimize the video streaming quality of their networks. Our proposed model, which we call SatQA, is able to accurately predict QoE using only network parameters, without any access to pixel data or video-specific metadata. Its performance, measured by Spearman's Rank Order Correlation Coefficient (SROCC), Pearson Linear Correlation Coefficient (PLCC), and Root Mean Squared Error (RMSE), indicates high accuracy and reliability.
Submitted 17 October, 2024;
originally announced October 2024.
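The three evaluation criteria named in the abstract above (SROCC, PLCC, RMSE) can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names and the score lists are hypothetical, and the sketch skips the nonlinear (for example, logistic) mapping that quality-assessment studies often fit before computing PLCC and RMSE.

```python
import math


def pearson(x, y):
    """Pearson Linear Correlation Coefficient (PLCC) of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's Rank Order Correlation Coefficient (SROCC): PLCC of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(predicted, observed):
    """Root Mean Squared Error between predictions and observations."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


if __name__ == "__main__":
    # Hypothetical data: model predictions vs. subjective opinion scores.
    predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
    subjective = [1.5, 1.0, 3.5, 4.0, 4.5]
    print("SROCC:", spearman(predicted, subjective))
    print("PLCC: ", pearson(predicted, subjective))
    print("RMSE: ", rmse(predicted, subjective))
```

SROCC depends only on rank order, so it rewards monotonic agreement, while PLCC and RMSE are sensitive to the scale of the predictions; reporting all three gives complementary views of prediction quality.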
-
Audiovisual Database with 360 Video and Higher-Order Ambisonics Audio for Perception, Cognition, Behavior, and QoE Evaluation Research
Authors:
Thomas Robotham,
Ashutosh Singla,
Olli S. Rummukainen,
Alexander Raake,
Emanuël A. P. Habets
Abstract:
Research into multi-modal perception, human cognition, behavior, and attention can benefit from high-fidelity content that may recreate real-life-like scenes when rendered on head-mounted displays. Moreover, aspects of audiovisual perception, cognitive processes, and behavior may complement questionnaire-based Quality of Experience (QoE) evaluation of interactive virtual environments. Currently, there is a lack of high-quality open-source audiovisual databases that can be used to evaluate such aspects or systems capable of reproducing high-quality content. With this paper, we provide a publicly available audiovisual database consisting of twelve scenes capturing real-life nature and urban environments with a video resolution of 7680x3840 at 60 frames-per-second and with 4th-order Ambisonics audio. These 360 video sequences, with an average duration of 60 seconds, represent real-life settings for systematically evaluating various dimensions of uni-/multi-modal perception, cognition, behavior, and QoE. The paper provides details of the scene requirements, recording approach, and scene descriptions. The database provides high-quality reference material with a balanced focus on auditory and visual sensory information. The database will be continuously updated with additional scenes and further metadata such as human ratings and saliency information.
Submitted 27 December, 2022;
originally announced December 2022.
-
Technological Factors Influencing Videoconferencing and Zoom Fatigue
Authors:
Alexander Raake,
Markus Fiedler,
Katrin Schoenenberg,
Katrien De Moor,
Nicola Döring
Abstract:
The paper presents a conceptual, multidimensional approach to understanding the technological factors that are assumed, or have even been proven, to contribute to what has been coined Zoom Fatigue (ZF) or, more generally, Videoconferencing Fatigue (VCF). With the advent of the Covid-19 pandemic, the usage of VC services has drastically increased, leading to more and more reports about the ZF or VCF phenomenon. The paper is motivated by the fact that some of the media outlets that initially started the debate on what Zoom fatigue is and how it can be avoided, as well as some of the scientific papers addressing the topic, contain assumptions that are rather hypothetical and insufficiently underpinned by scientific evidence. Most of these works acknowledge the lack of evidence and partly suggest directions for future research. This paper intends to deepen the survey of VC-technology-related literature and to provide more existing evidence, where possible, while reviewing some of the already provided support or evidence for certain causal hypotheses. The technological factors dimension and its identified sub-dimensions presented in this paper are embedded within a more holistic four-dimensional conceptual factors model describing the causes of ZF or VCF. The paper describing this overall conceptual model is written by the same group of authors and is currently under revision for an Open Access Journal publication. The present paper expands on the descriptions of the technological factors dimension provided in the overall model paper and provides more detailed analyses and concepts associated with how VC technology may affect users' perception, cognitive load, interaction and communication, possibly leading to stress, exhaustion and fatigue. The paper is currently a living document which will be expanded further with regard to the evidence for or against the impact of certain technological factors.
Submitted 3 February, 2022;
originally announced February 2022.
-
QUALINET White Paper on Definitions of Immersive Media Experience (IMEx)
Authors:
Andrew Perkis,
Christian Timmerer,
Sabina Baraković,
Jasmina Baraković Husić,
Søren Bech,
Sebastian Bosse,
Jean Botev,
Kjell Brunnström,
Luis Cruz,
Katrien De Moor,
Andrea de Polo Saibanti,
Wouter Durnez,
Sebastian Egger-Lampl,
Ulrich Engelke,
Tiago H. Falk,
Jesús Gutiérrez,
Asim Hameed,
Andrew Hines,
Tanja Kojic,
Dragan Kukolj,
Eirini Liotou,
Dragorad Milovanovic,
Sebastian Möller,
Niall Murray,
Babak Naderi
, et al. (19 additional authors not shown)
Abstract:
With the coming of age of virtual/augmented reality and interactive media, numerous definitions, frameworks, and models of immersion have emerged across different fields ranging from computer graphics to literary works. Immersion is oftentimes used interchangeably with presence as both concepts are closely related. However, there are noticeable interdisciplinary differences regarding definitions, scope, and constituents that are required to be addressed so that a coherent understanding of the concepts can be achieved. Such consensus is vital for paving the directionality of the future of immersive media experiences (IMEx) and all related matters. The aim of this white paper is to provide a survey of definitions of immersion and presence which leads to a definition of immersive media experience (IMEx). The Quality of Experience (QoE) for immersive media is described by establishing a relationship between the concepts of QoE and IMEx followed by application areas of immersive media experience. Influencing factors on immersive media experience are elaborated as well as the assessment of immersive media experience. Finally, standardization activities related to IMEx are highlighted and the white paper is concluded with an outlook related to future developments.
Submitted 24 November, 2020; v1 submitted 10 June, 2020;
originally announced July 2020.
-
Semantic-driven Colorization
Authors:
Man M. Ho,
Lu Zhang,
Alexander Raake,
Jinjia Zhou
Abstract:
Recent colorization works implicitly predict the semantic information while learning to colorize black-and-white images. Consequently, the generated colors are prone to overflowing, and semantic faults remain invisible. When humans colorize, our brains first detect and recognize the objects in the photo, then imagine their plausible colors based on many similar objects we have seen in real life, and finally colorize them, as described in the teaser. In this study, we simulate that human-like process by letting our network first learn to understand the photo and then colorize it. Thus, our work can provide plausible colors at a semantic level. In addition, the semantic information of the learned model becomes understandable and can be interacted with. We also show that Instance Normalization is a missing ingredient for colorization, and we re-design the inference flow of U-Net to have two streams of data, providing an appropriate way of normalizing the feature maps from the black-and-white image and its semantic map. As a result, our network can provide plausible colors competitive with typical colorization works for specific objects.
Submitted 14 August, 2021; v1 submitted 13 June, 2020;
originally announced June 2020.
-
Because we care: Privacy Dashboard on Firefox OS
Authors:
Marta Piekarska,
Yun Zhou,
Dominik Strohmeier,
Alexander Raake
Abstract:
In this paper we present the Privacy Dashboard -- a tool designed to inform and empower people using mobile devices by introducing features such as Remote Privacy Protection, Backup, Adjustable Location Accuracy, Permission Control and Secondary-User Mode. We have implemented our solution on FirefoxOS and conducted user studies to verify the usefulness and usability of our tool. The paper starts with a discussion of different aspects of mobile privacy, how users perceive it and how much they are willing to give up for better usability. Then we describe the tool in detail, presenting the incentives that drove us to certain design decisions. During our studies we tried to understand how users interact with the system and what their priorities are. We have verified our hypothesis and the impact of the educational aspects on decisions about privacy settings. We show that by taking a user-centric approach to the development of privacy extensions we can reduce the gap between protection and usability.
Submitted 12 June, 2015;
originally announced June 2015.
-
INSPIRE: Evaluation of a Smart-Home System for Infotainment Management and Device Control
Authors:
Sebastian Moeller,
Jan Krebber,
Alexander Raake,
Paula Smeele,
Martin Rajman,
Mirek Melichar,
Vincenzo Pallotta,
Gianna Tsakou,
Basilis Kladis,
Anestis Vovos,
Jettie Hoonhout,
Dietmar Schuchardt,
Nikos Fakotakis,
Todor Ganchev,
Ilyas Potamitis
Abstract:
This paper gives an overview of the assessment and evaluation methods which have been used to determine the quality of the INSPIRE smart home system. The system allows different home appliances to be controlled via speech, and consists of speech and speaker recognition, speech understanding, dialogue management, and speech output components. The performance of these components is first assessed individually, and then the entire system is evaluated in an interaction experiment with test users. Initial results of the assessment and evaluation are given, in particular with respect to the transmission channel impact on speech and speaker recognition, and the assessment of speech output for different system metaphors.
Submitted 24 October, 2004;
originally announced October 2004.