-
Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments
Authors:
Chenyu Zhang,
Daniil Cherniavskii,
Andrii Zadaianchuk,
Antonios Tragoudaras,
Antonios Vozikis,
Thijmen Nijdam,
Derck W. E. Prinzhorn,
Mark Bodracska,
Nicu Sebe,
Efstratios Gavves
Abstract:
Recent advances in image and video generation raise hopes that these models possess world modeling capabilities, the ability to generate realistic, physically plausible videos. This could revolutionize applications in robotics, autonomous driving, and scientific simulation. However, before treating these models as world models, we must ask: Do they adhere to physical conservation laws? To answer t…
▽ More
Recent advances in image and video generation raise hopes that these models possess world modeling capabilities, the ability to generate realistic, physically plausible videos. This could revolutionize applications in robotics, autonomous driving, and scientific simulation. However, before treating these models as world models, we must ask: Do they adhere to physical conservation laws? To answer this, we introduce Morpheus, a benchmark for evaluating video generation models on physical reasoning. It features 80 real-world videos capturing physical phenomena, guided by conservation laws. Since artificial generations lack ground truth, we assess physical plausibility using physics-informed metrics evaluated with respect to infallible conservation laws known per physical setting, leveraging advances in physics-informed neural networks and vision-language foundation models. Our findings reveal that even with advanced prompting and video conditioning, current models struggle to encode physical principles despite generating aesthetically pleasing videos. All data, leaderboard, and code are open-sourced at our project page.
△ Less
Submitted 3 April, 2025;
originally announced April 2025.
-
Reproducing NevIR: Negation in Neural Information Retrieval
Authors:
Coen van den Elsen,
Francien Barkhof,
Thijmen Nijdam,
Simon Lupart,
Mohammad Aliannejadi
Abstract:
Negation is a fundamental aspect of human communication, yet it remains a challenge for Language Models (LMs) in Information Retrieval (IR). Despite the heavy reliance of modern neural IR systems on LMs, little attention has been given to their handling of negation. In this study, we reproduce and extend the findings of NevIR, a benchmark study that revealed most IR models perform at or below the…
▽ More
Negation is a fundamental aspect of human communication, yet it remains a challenge for Language Models (LMs) in Information Retrieval (IR). Despite the heavy reliance of modern neural IR systems on LMs, little attention has been given to their handling of negation. In this study, we reproduce and extend the findings of NevIR, a benchmark study that revealed most IR models perform at or below the level of random ranking when dealing with negation. We replicate NevIR's original experiments and evaluate newly developed state-of-the-art IR models. Our findings show that a recently emerging category-listwise Large Language Model (LLM) re-rankers-outperforms other models but still underperforms human performance. Additionally, we leverage ExcluIR, a benchmark dataset designed for exclusionary queries with extensive negation, to assess the generalisability of negation understanding. Our findings suggest that fine-tuning on one dataset does not reliably improve performance on the other, indicating notable differences in their data distributions. Furthermore, we observe that only cross-encoders and listwise LLM re-rankers achieve reasonable performance across both negation tasks.
△ Less
Submitted 4 May, 2025; v1 submitted 19 February, 2025;
originally announced February 2025.
-
Reproducibility Study Of Learning Fair Graph Representations Via Automated Data Augmentations
Authors:
Thijmen Nijdam,
Juell Sprott,
Taiki Papandreou-Lazos,
Jurgen de Heus
Abstract:
In this study, we undertake a reproducibility analysis of 'Learning Fair Graph Representations Via Automated Data Augmentations' by Ling et al. (2022). We assess the validity of the original claims focused on node classification tasks and explore the performance of the Graphair framework in link prediction tasks. Our investigation reveals that we can partially reproduce one of the original three c…
▽ More
In this study, we undertake a reproducibility analysis of 'Learning Fair Graph Representations Via Automated Data Augmentations' by Ling et al. (2022). We assess the validity of the original claims focused on node classification tasks and explore the performance of the Graphair framework in link prediction tasks. Our investigation reveals that we can partially reproduce one of the original three claims and fully substantiate the other two. Additionally, we broaden the application of Graphair from node classification to link prediction across various datasets. Our findings indicate that, while Graphair demonstrates a comparable fairness-accuracy trade-off to baseline models for mixed dyadic-level fairness, it has a superior trade-off for subgroup dyadic-level fairness. These findings underscore Graphair's potential for wider adoption in graph-based learning. Our code base can be found on GitHub at https://github.com/juellsprott/graphair-reproducibility.
△ Less
Submitted 31 August, 2024;
originally announced September 2024.
-
Conformal time series decomposition with component-wise exchangeability
Authors:
Derck W. E. Prinzhorn,
Thijmen Nijdam,
Putri A. van der Linden,
Alexander Timans
Abstract:
Conformal prediction offers a practical framework for distribution-free uncertainty quantification, providing finite-sample coverage guarantees under relatively mild assumptions on data exchangeability. However, these assumptions cease to hold for time series due to their temporally correlated nature. In this work, we present a novel use of conformal prediction for time series forecasting that inc…
▽ More
Conformal prediction offers a practical framework for distribution-free uncertainty quantification, providing finite-sample coverage guarantees under relatively mild assumptions on data exchangeability. However, these assumptions cease to hold for time series due to their temporally correlated nature. In this work, we present a novel use of conformal prediction for time series forecasting that incorporates time series decomposition. This approach allows us to model different temporal components individually. By applying specific conformal algorithms to each component and then merging the obtained prediction intervals, we customize our methods to account for the different exchangeability regimes underlying each component. Our decomposition-based approach is thoroughly discussed and empirically evaluated on synthetic and real-world data. We find that the method provides promising results on well-structured time series, but can be limited by factors such as the decomposition step for more complex data.
△ Less
Submitted 24 June, 2024;
originally announced June 2024.