-
ReSi: A Comprehensive Benchmark for Representational Similarity Measures
Authors:
Max Klabunde,
Tassilo Wald,
Tobias Schumacher,
Klaus Maier-Hein,
Markus Strohmaier,
Florian Lemmerich
Abstract:
Measuring the similarity of different representations of neural architectures is a fundamental task and an open research challenge for the machine learning community. This paper presents the first comprehensive benchmark for evaluating representational similarity measures based on well-defined groundings of similarity. The representational similarity (ReSi) benchmark consists of (i) six carefully…
▽ More
Measuring the similarity of different representations of neural architectures is a fundamental task and an open research challenge for the machine learning community. This paper presents the first comprehensive benchmark for evaluating representational similarity measures based on well-defined groundings of similarity. The representational similarity (ReSi) benchmark consists of (i) six carefully designed tests for similarity measures, (ii) 24 similarity measures, (iii) 14 neural network architectures, and (iv) seven datasets, spanning over the graph, language, and vision domains. The benchmark opens up several important avenues of research on representational similarity that enable novel explorations and applications of neural architectures. We demonstrate the utility of the ReSi benchmark by conducting experiments on various neural network architectures, real world datasets and similarity measures. All components of the benchmark are publicly available and thereby facilitate systematic reproduction and production of research results. The benchmark is extensible, future research can build on and further expand it. We believe that the ReSi benchmark can serve as a sound platform catalyzing future research that aims to systematically evaluate existing and explore novel ways of comparing representations of neural architectures.
△ Less
Submitted 11 March, 2025; v1 submitted 1 August, 2024;
originally announced August 2024.
-
Towards Measuring Representational Similarity of Large Language Models
Authors:
Max Klabunde,
Mehdi Ben Amor,
Michael Granitzer,
Florian Lemmerich
Abstract:
Understanding the similarity of the numerous released large language models (LLMs) has many uses, e.g., simplifying model selection, detecting illegal model reuse, and advancing our understanding of what makes LLMs perform well. In this work, we measure the similarity of representations of a set of LLMs with 7B parameters. Our results suggest that some LLMs are substantially different from others.…
▽ More
Understanding the similarity of the numerous released large language models (LLMs) has many uses, e.g., simplifying model selection, detecting illegal model reuse, and advancing our understanding of what makes LLMs perform well. In this work, we measure the similarity of representations of a set of LLMs with 7B parameters. Our results suggest that some LLMs are substantially different from others. We identify challenges of using representational similarity measures that suggest the need of careful study of similarity scores to avoid false conclusions.
△ Less
Submitted 5 December, 2023;
originally announced December 2023.
-
Similarity of Neural Network Models: A Survey of Functional and Representational Measures
Authors:
Max Klabunde,
Tobias Schumacher,
Markus Strohmaier,
Florian Lemmerich
Abstract:
Measuring similarity of neural networks to understand and improve their behavior has become an issue of great importance and research interest. In this survey, we provide a comprehensive overview of two complementary perspectives of measuring neural network similarity: (i) representational similarity, which considers how activations of intermediate layers differ, and (ii) functional similarity, wh…
▽ More
Measuring similarity of neural networks to understand and improve their behavior has become an issue of great importance and research interest. In this survey, we provide a comprehensive overview of two complementary perspectives of measuring neural network similarity: (i) representational similarity, which considers how activations of intermediate layers differ, and (ii) functional similarity, which considers how models differ in their outputs. In addition to providing detailed descriptions of existing measures, we summarize and discuss results on the properties of and relationships between these measures, and point to open research problems. We hope our work lays a foundation for more systematic research on the properties and applicability of similarity measures for neural network models.
△ Less
Submitted 9 April, 2025; v1 submitted 10 May, 2023;
originally announced May 2023.
-
On the Prediction Instability of Graph Neural Networks
Authors:
Max Klabunde,
Florian Lemmerich
Abstract:
Instability of trained models, i.e., the dependence of individual node predictions on random factors, can affect reproducibility, reliability, and trust in machine learning systems. In this paper, we systematically assess the prediction instability of node classification with state-of-the-art Graph Neural Networks (GNNs). With our experiments, we establish that multiple instantiations of popular G…
▽ More
Instability of trained models, i.e., the dependence of individual node predictions on random factors, can affect reproducibility, reliability, and trust in machine learning systems. In this paper, we systematically assess the prediction instability of node classification with state-of-the-art Graph Neural Networks (GNNs). With our experiments, we establish that multiple instantiations of popular GNN models trained on the same data with the same model hyperparameters result in almost identical aggregated performance but display substantial disagreement in the predictions for individual nodes. We find that up to one third of the incorrectly classified nodes differ across algorithm runs. We identify correlations between hyperparameters, node properties, and the size of the training set with the stability of predictions. In general, maximizing model performance implicitly also reduces model instability.
△ Less
Submitted 20 May, 2022;
originally announced May 2022.
-
The Effects of Randomness on the Stability of Node Embeddings
Authors:
Tobias Schumacher,
Hinrikus Wolf,
Martin Ritzert,
Florian Lemmerich,
Jan Bachmann,
Florian Frantzen,
Max Klabunde,
Martin Grohe,
Markus Strohmaier
Abstract:
We systematically evaluate the (in-)stability of state-of-the-art node embedding algorithms due to randomness, i.e., the random variation of their outcomes given identical algorithms and graphs. We apply five node embeddings algorithms---HOPE, LINE, node2vec, SDNE, and GraphSAGE---to synthetic and empirical graphs and assess their stability under randomness with respect to (i) the geometry of embe…
▽ More
We systematically evaluate the (in-)stability of state-of-the-art node embedding algorithms due to randomness, i.e., the random variation of their outcomes given identical algorithms and graphs. We apply five node embeddings algorithms---HOPE, LINE, node2vec, SDNE, and GraphSAGE---to synthetic and empirical graphs and assess their stability under randomness with respect to (i) the geometry of embedding spaces as well as (ii) their performance in downstream tasks. We find significant instabilities in the geometry of embedding spaces independent of the centrality of a node. In the evaluation of downstream tasks, we find that the accuracy of node classification seems to be unaffected by random seeding while the actual classification of nodes can vary significantly. This suggests that instability effects need to be taken into account when working with node embeddings. Our work is relevant for researchers and engineers interested in the effectiveness, reliability, and reproducibility of node embedding approaches.
△ Less
Submitted 20 May, 2020;
originally announced May 2020.