-
Hues and Cues: Human vs. CLIP
Authors:
Nuria Alabau-Bosque,
Jorge Vila-Tomás,
Paula Daudén-Oliver,
Pablo Hernández-Cámara,
Jose Manuel Jaén-Lorites,
Valero Laparra,
Jesús Malo
Abstract:
Playing games is inherently human, and a lot of games are created to challenge different human characteristics. However, these tasks are often left out when evaluating the human-like nature of artificial models. The objective of this work is proposing a new approach to evaluate artificial models via board games. To this effect, we test the color perception and color naming capabilities of CLIP by…
▽ More
Playing games is inherently human, and a lot of games are created to challenge different human characteristics. However, these tasks are often left out when evaluating the human-like nature of artificial models. The objective of this work is proposing a new approach to evaluate artificial models via board games. To this effect, we test the color perception and color naming capabilities of CLIP by playing the board game Hues & Cues and assess its alignment with humans. Our experiments show that CLIP is generally well aligned with human observers, but our approach brings to light certain cultural biases and inconsistencies when dealing with different abstraction levels that are hard to identify with other testing strategies. Our findings indicate that assessing models with different tasks like board games can make certain deficiencies in the models stand out in ways that are difficult to test with the commonly used benchmarks.
△ Less
Submitted 3 September, 2025; v1 submitted 2 September, 2025;
originally announced September 2025.
-
Assessing invariance to affine transformations in image quality metrics
Authors:
Nuria Alabau-Bosque,
Paula Daudén-Oliver,
Jorge Vila-Tomás,
Valero Laparra,
Jesús Malo
Abstract:
Subjective image quality metrics are usually evaluated according to the correlation with human opinion in databases with distortions that may appear in digital media. However, these oversee affine transformations which may represent better the changes in the images actually happening in natural conditions. Humans can be particularly invariant to these natural transformations, as opposed to the dig…
▽ More
Subjective image quality metrics are usually evaluated according to the correlation with human opinion in databases with distortions that may appear in digital media. However, these oversee affine transformations which may represent better the changes in the images actually happening in natural conditions. Humans can be particularly invariant to these natural transformations, as opposed to the digital ones.
In this work, we propose a methodology to evaluate any image quality metric by assessing their invariance to affine transformations, specifically: rotation, translation, scaling, and changes in spectral illumination. Here, invariance refers to the fact that certain distances should be neglected if their values are below a threshold. This is what we call invisibility threshold of a metric. Our methodology consists of two elements: (1) the determination of a visibility threshold in a subjective representation common to every metric, and (2) a transduction from the distance values of the metric and this common representation. This common representation is based on subjective ratings of readily available image quality databases. We determine the threshold in such common representation (the first element) using accurate psychophysics. Then, the transduction (the second element) can be trivially fitted for any metric: with the provided threshold extension of the method to any metric is straightforward. We test our methodology with some well-established metrics and find that none of them show human-like invisibility thresholds.
This means that tuning the models exclusively to predict the visibility of generic distortions may disregard other properties of human vision as for instance invariances or invisibility thresholds. The data and code are publicly available to test other metrics.
△ Less
Submitted 19 September, 2025; v1 submitted 25 July, 2024;
originally announced July 2024.
-
Image Segmentation via Divisive Normalization: dealing with environmental diversity
Authors:
Pablo Hernández-Cámara,
Jorge Vila-Tomás,
Paula Dauden-Oliver,
Nuria Alabau-Bosque,
Valero Laparra,
Jesús Malo
Abstract:
Autonomous driving is a challenging scenario for image segmentation due to the presence of uncontrolled environmental conditions and the eventually catastrophic consequences of failures. Previous work suggested that a biologically motivated computation, the so-called Divisive Normalization, could be useful to deal with image variability, but its effects have not been systematically studied over di…
▽ More
Autonomous driving is a challenging scenario for image segmentation due to the presence of uncontrolled environmental conditions and the eventually catastrophic consequences of failures. Previous work suggested that a biologically motivated computation, the so-called Divisive Normalization, could be useful to deal with image variability, but its effects have not been systematically studied over different data sources and environmental factors. Here we put segmentation U-nets augmented with Divisive Normalization to work far from training conditions to find where this adaptation is more critical. We categorize the scenes according to their radiance level and dynamic range (day/night), and according to their achromatic/chromatic contrasts. We also consider video game (synthetic) images to broaden the range of environments. We check the performance in the extreme percentiles of such categorization. Then, we push the limits further by artificially modifying the images in perceptually/environmentally relevant dimensions: luminance, contrasts and spectral radiance. Results show that neural networks with Divisive Normalization get better results in all the scenarios and their performance remains more stable with regard to the considered environmental factors and nature of the source. Finally, we explain the improvements in segmentation performance in two ways: (1) by quantifying the invariance of the responses that incorporate Divisive Normalization, and (2) by illustrating the adaptive nonlinearity of the different layers that depends on the local activity.
△ Less
Submitted 25 July, 2024;
originally announced July 2024.