-
Evaluating Visual Mathematics in Multimodal LLMs: A Multilingual Benchmark Based on the Kangaroo Tests
Authors:
Arnau Igualde Sáez,
Lamyae Rhomrasi,
Yusef Ahsini,
Ricardo Vinuesa,
Sergio Hoyas,
Jose P. García Sabater,
Marius J. Fullana i Alfonso,
J. Alberto Conejero
Abstract:
Multimodal Large Language Models (MLLMs) promise advanced vision language capabilities, yet their effectiveness in visually presented mathematics remains underexplored. This paper analyzes the development and evaluation of MLLMs for mathematical problem solving, focusing on diagrams, multilingual text, and symbolic notation. We then assess several models, including GPT 4o, Pixtral, Qwen VL, Llama…
▽ More
Multimodal Large Language Models (MLLMs) promise advanced vision language capabilities, yet their effectiveness in visually presented mathematics remains underexplored. This paper analyzes the development and evaluation of MLLMs for mathematical problem solving, focusing on diagrams, multilingual text, and symbolic notation. We then assess several models, including GPT 4o, Pixtral, Qwen VL, Llama 3.2 Vision variants, and Gemini 2.0 Flash in a multilingual Kangaroo style benchmark spanning English, French, Spanish, and Catalan. Our experiments reveal four key findings. First, overall precision remains moderate across geometry, visual algebra, logic, patterns, and combinatorics: no single model excels in every topic. Second, while most models see improved accuracy with questions that do not have images, the gain is often limited; performance for some remains nearly unchanged without visual input, indicating underutilization of diagrammatic information. Third, substantial variation exists across languages and difficulty levels: models frequently handle easier items but struggle with advanced geometry and combinatorial reasoning. Notably, Gemini 2.0 Flash achieves the highest precision on image based tasks, followed by Qwen VL 2.5 72B and GPT 4o, though none approach human level performance. Fourth, a complementary analysis aimed at distinguishing whether models reason or simply recite reveals that Gemini and GPT 4o stand out for their structured reasoning and consistent accuracy. In contrast, Pixtral and Llama exhibit less consistent reasoning, often defaulting to heuristics or randomness when unable to align their outputs with the given answer options.
△ Less
Submitted 9 June, 2025;
originally announced June 2025.
-
Large language models in climate and sustainability policy: limits and opportunities
Authors:
Francesca Larosa,
Sergio Hoyas,
H. Alberto Conejero,
Javier Garcia-Martinez,
Francesco Fuso Nerini,
Ricardo Vinuesa
Abstract:
As multiple crises threaten the sustainability of our societies and pose at risk the planetary boundaries, complex challenges require timely, updated, and usable information. Natural-language processing (NLP) tools enhance and expand data collection and processing and knowledge utilization capabilities to support the definition of an inclusive, sustainable future. In this work, we apply different…
▽ More
As multiple crises threaten the sustainability of our societies and pose at risk the planetary boundaries, complex challenges require timely, updated, and usable information. Natural-language processing (NLP) tools enhance and expand data collection and processing and knowledge utilization capabilities to support the definition of an inclusive, sustainable future. In this work, we apply different NLP techniques, tools and approaches to climate and sustainability documents to derive policy-relevant and actionable measures. We focus on general and domain-specific large language models (LLMs) using a combination of static and prompt-based methods. We find that the use of LLMs is successful at processing, classifying and summarizing heterogeneous text-based data. However, we also encounter challenges related to human intervention across different workflow stages and knowledge utilization for policy processes. Our work presents a critical but empirically grounded application of LLMs to complex policy problems and suggests avenues to further expand Artificial Intelligence-powered computational social sciences.
△ Less
Submitted 4 February, 2025;
originally announced February 2025.
-
Additive-feature-attribution methods: a review on explainable artificial intelligence for fluid dynamics and heat transfer
Authors:
Andrés Cremades,
Sergio Hoyas,
Ricardo Vinuesa
Abstract:
The use of data-driven methods in fluid mechanics has surged dramatically in recent years due to their capacity to adapt to the complex and multi-scale nature of turbulent flows, as well as to detect patterns in large-scale simulations or experimental tests. In order to interpret the relationships generated in the models during the training process, numerical attributions need to be assigned to th…
▽ More
The use of data-driven methods in fluid mechanics has surged dramatically in recent years due to their capacity to adapt to the complex and multi-scale nature of turbulent flows, as well as to detect patterns in large-scale simulations or experimental tests. In order to interpret the relationships generated in the models during the training process, numerical attributions need to be assigned to the input features. One important example are the additive-feature-attribution methods. These explainability methods link the input features with the model prediction, providing an interpretation based on a linear formulation of the models. The SHapley Additive exPlanations (SHAP values) are formulated as the only possible interpretation that offers a unique solution for understanding the model. In this manuscript, the additive-feature-attribution methods are presented, showing four common implementations in the literature: kernel SHAP, tree SHAP, gradient SHAP, and deep SHAP. Then, the main applications of the additive-feature-attribution methods are introduced, dividing them into three main groups: turbulence modeling, fluid-mechanics fundamentals, and applied problems in fluid dynamics and heat transfer. This review shows thatexplainability techniques, and in particular additive-feature-attribution methods, are crucial for implementing interpretable and physics-compliant deep-learning models in the fluid-mechanics field.
△ Less
Submitted 18 September, 2024;
originally announced September 2024.
-
Identifying regions of importance in wall-bounded turbulence through explainable deep learning
Authors:
Andres Cremades,
Sergio Hoyas,
Rahul Deshpande,
Pedro Quintero,
Martin Lellep,
Will Junghoon Lee,
Jason Monty,
Nicholas Hutchins,
Moritz Linkmann,
Ivan Marusic,
Ricardo Vinuesa
Abstract:
Despite its great scientific and technological importance, wall-bounded turbulence is an unresolved problem in classical physics that requires new perspectives to be tackled. One of the key strategies has been to study interactions among the energy-containing coherent structures in the flow. Such interactions are explored in this study for the first time using an explainable deep-learning method.…
▽ More
Despite its great scientific and technological importance, wall-bounded turbulence is an unresolved problem in classical physics that requires new perspectives to be tackled. One of the key strategies has been to study interactions among the energy-containing coherent structures in the flow. Such interactions are explored in this study for the first time using an explainable deep-learning method. The instantaneous velocity field obtained from a turbulent channel flow simulation is used to predict the velocity field in time through a U-net architecture. Based on the predicted flow, we assess the importance of each structure for this prediction using the game-theoretic algorithm of SHapley Additive exPlanations (SHAP). This work provides results in agreement with previous observations in the literature and extends them by revealing that the most important structures in the flow are not necessarily the ones with the highest contribution to the Reynolds shear stress. We also apply the method to an experimental database, where we can identify completely new structures based on their importance score. This framework has the potential to shed light on numerous fundamental phenomena of wall-bounded turbulence, including novel strategies for flow control.
△ Less
Submitted 19 February, 2024; v1 submitted 2 February, 2023;
originally announced February 2023.
-
The Sustainable Development Goals and Aerospace Engineering: A critical note through Artificial Intelligence
Authors:
Alejandro Sánchez-Roncero,
Òscar Garibo-i-Ortsa,
J. Alberto Conejero,
Hamidreza Eivazi,
Fermín Mallor,
Emelie Rosenberg,
Francesco Fuso-Nerini,
Javier García-Martínez,
Ricardo Vinuesa,
Sergio Hoyas
Abstract:
The 2030 Agenda of the United Nations (UN) revolves around the Sustainable Development Goals (SDGs). A critical step towards that objective is identifying whether scientific production aligns with the SDGs' achievement. To assess this, funders and research managers need to manually estimate the impact of their funding agenda on the SDGs, focusing on accuracy, scalability, and objectiveness. With t…
▽ More
The 2030 Agenda of the United Nations (UN) revolves around the Sustainable Development Goals (SDGs). A critical step towards that objective is identifying whether scientific production aligns with the SDGs' achievement. To assess this, funders and research managers need to manually estimate the impact of their funding agenda on the SDGs, focusing on accuracy, scalability, and objectiveness. With this objective in mind, in this work, we develop ASDG, an easy-to-use artificial-intelligence (AI)-based model for automatically identifying the potential impact of scientific papers on the UN SDGs. As a demonstrator of ASDG, we analyze the alignment of recent aerospace publications with the SDGs. The Aerospace data set analyzed in this paper consists of approximately 820,000 papers published in English from 2011 to 2020 and indexed in the Scopus database. The most-contributed SDGs are 7 (on clean energy), 9 (on industry), 11 (on sustainable cities) and 13 (on climate action). The establishment of the SDGs by the UN in the middle of the 2010 decade did not significantly affect the data. However, we find clear discrepancies among countries, likely indicative of different priorities. Also, different trends can be seen in the most and least cited papers, with clear differences in some SDGs. Finally, the number of abstracts the code cannot identify is decreasing with time, possibly showing the scientific community's awareness of SDG.
△ Less
Submitted 4 November, 2022;
originally announced November 2022.
-
Aim in Climate Change and City Pollution
Authors:
Pablo Torres,
Beril Sirmacek,
Sergio Hoyas,
Ricardo Vinuesa
Abstract:
The sustainability of urban environments is an increasingly relevant problem. Air pollution plays a key role in the degradation of the environment as well as the health of the citizens exposed to it. In this chapter we provide a review of the methods available to model air pollution, focusing on the application of machine-learning methods. In fact, machine-learning methods have proved to important…
▽ More
The sustainability of urban environments is an increasingly relevant problem. Air pollution plays a key role in the degradation of the environment as well as the health of the citizens exposed to it. In this chapter we provide a review of the methods available to model air pollution, focusing on the application of machine-learning methods. In fact, machine-learning methods have proved to importantly increase the accuracy of traditional air-pollution approaches while limiting the development cost of the models. Machine-learning tools have opened new approaches to study air pollution, such as flow-dynamics modelling or remote-sensing methodologies.
△ Less
Submitted 30 December, 2021;
originally announced December 2021.
-
Towards extraction of orthogonal and parsimonious non-linear modes from turbulent flows
Authors:
Hamidreza Eivazi,
Soledad Le Clainche,
Sergio Hoyas,
Ricardo Vinuesa
Abstract:
We propose a deep probabilistic-neural-network architecture for learning a minimal and near-orthogonal set of non-linear modes from high-fidelity turbulent-flow-field data useful for flow analysis, reduced-order modeling, and flow control. Our approach is based on $β$-variational autoencoders ($β$-VAEs) and convolutional neural networks (CNNs), which allow us to extract non-linear modes from multi…
▽ More
We propose a deep probabilistic-neural-network architecture for learning a minimal and near-orthogonal set of non-linear modes from high-fidelity turbulent-flow-field data useful for flow analysis, reduced-order modeling, and flow control. Our approach is based on $β$-variational autoencoders ($β$-VAEs) and convolutional neural networks (CNNs), which allow us to extract non-linear modes from multi-scale turbulent flows while encouraging the learning of independent latent variables and penalizing the size of the latent vector. Moreover, we introduce an algorithm for ordering VAE-based modes with respect to their contribution to the reconstruction. We apply this method for non-linear mode decomposition of the turbulent flow through a simplified urban environment, where the flow-field data is obtained based on well-resolved large-eddy simulations (LESs). We demonstrate that by constraining the shape of the latent space, it is possible to motivate the orthogonality and extract a set of parsimonious modes sufficient for high-quality reconstruction. Our results show the excellent performance of the method in the reconstruction against linear-theory-based decompositions. Moreover, we compare our method with available AE-based models. We show the ability of our approach in the extraction of near-orthogonal modes that may lead to interpretability.
△ Less
Submitted 3 September, 2021;
originally announced September 2021.