-
NeurIPS 2025 E2LM Competition : Early Training Evaluation of Language Models
Authors:
Mouadh Yagoubi,
Yasser Dahou,
Billel Mokeddem,
Younes Belkada,
Phuc H. Le-Khac,
Basma El Amel Boussaha,
Reda Alami,
Jingwei Zuo,
Damiano Marsili,
Mugariya Farooq,
Mounia Lalmas,
Georgia Gkioxari,
Patrick Gallinari,
Philip Torr,
Hakim Hacid
Abstract:
Existing benchmarks have proven effective for assessing the performance of fully trained large language models. However, we find striking differences in the early training stages of small models, where benchmarks often fail to provide meaningful or discriminative signals. To explore how these differences arise, this competition tackles the challenge of designing scientific knowledge evaluation tas…
▽ More
Existing benchmarks have proven effective for assessing the performance of fully trained large language models. However, we find striking differences in the early training stages of small models, where benchmarks often fail to provide meaningful or discriminative signals. To explore how these differences arise, this competition tackles the challenge of designing scientific knowledge evaluation tasks specifically tailored for measuring early training progress of language models. Participants are invited to develop novel evaluation methodologies or adapt existing benchmarks to better capture performance differences among language models. To support this effort, we provide three pre-trained small models (0.5B, 1B, and 3B parameters), along with intermediate checkpoints sampled during training up to 200B tokens. All experiments and development work can be run on widely available free cloud-based GPU platforms, making participation accessible to researchers with limited computational resources. Submissions will be evaluated based on three criteria: the quality of the performance signal they produce, the consistency of model rankings at 1 trillion tokens of training, and their relevance to the scientific knowledge domain. By promoting the design of tailored evaluation strategies for early training, this competition aims to attract a broad range of participants from various disciplines, including those who may not be machine learning experts or have access to dedicated GPU resources. Ultimately, this initiative seeks to make foundational LLM research more systematic and benchmark-informed from the earliest phases of model development.
△ Less
Submitted 9 June, 2025;
originally announced June 2025.
-
ENMA: Tokenwise Autoregression for Generative Neural PDE Operators
Authors:
Armand Kassaï Koupaï,
Lise Le Boudec,
Louis Serrano,
Patrick Gallinari
Abstract:
Solving time-dependent parametric partial differential equations (PDEs) remains a fundamental challenge for neural solvers, particularly when generalizing across a wide range of physical parameters and dynamics. When data is uncertain or incomplete-as is often the case-a natural approach is to turn to generative models. We introduce ENMA, a generative neural operator designed to model spatio-tempo…
▽ More
Solving time-dependent parametric partial differential equations (PDEs) remains a fundamental challenge for neural solvers, particularly when generalizing across a wide range of physical parameters and dynamics. When data is uncertain or incomplete-as is often the case-a natural approach is to turn to generative models. We introduce ENMA, a generative neural operator designed to model spatio-temporal dynamics arising from physical phenomena. ENMA predicts future dynamics in a compressed latent space using a generative masked autoregressive transformer trained with flow matching loss, enabling tokenwise generation. Irregularly sampled spatial observations are encoded into uniform latent representations via attention mechanisms and further compressed through a spatio-temporal convolutional encoder. This allows ENMA to perform in-context learning at inference time by conditioning on either past states of the target trajectory or auxiliary context trajectories with similar dynamics. The result is a robust and adaptable framework that generalizes to new PDE regimes and supports one-shot surrogate modeling of time-dependent parametric PDEs.
△ Less
Submitted 6 June, 2025;
originally announced June 2025.
-
UP-dROM : Uncertainty-Aware and Parametrised dynamic Reduced-Order Model, application to unsteady flows
Authors:
Ismaël Zighed,
Nicolas Thome,
Patrick Gallinari,
Taraneh Sayadi
Abstract:
Reduced order models (ROMs) play a critical role in fluid mechanics by providing low-cost predictions, making them an attractive tool for engineering applications. However, for ROMs to be widely applicable, they must not only generalise well across different regimes, but also provide a measure of confidence in their predictions. While recent data-driven approaches have begun to address nonlinear r…
▽ More
Reduced order models (ROMs) play a critical role in fluid mechanics by providing low-cost predictions, making them an attractive tool for engineering applications. However, for ROMs to be widely applicable, they must not only generalise well across different regimes, but also provide a measure of confidence in their predictions. While recent data-driven approaches have begun to address nonlinear reduction techniques to improve predictions in transient environments, challenges remain in terms of robustness and parametrisation. In this work, we present a nonlinear reduction strategy specifically designed for transient flows that incorporates parametrisation and uncertainty quantification. Our reduction strategy features a variational auto-encoder (VAE) that uses variational inference for confidence measurement. We use a latent space transformer that incorporates recent advances in attention mechanisms to predict dynamical systems. Attention's versatility in learning sequences and capturing their dependence on external parameters enhances generalisation across a wide range of dynamics. Prediction, coupled with confidence, enables more informed decision making and addresses the need for more robust models. In addition, this confidence is used to cost-effectively sample the parameter space, improving model performance a priori across the entire parameter space without requiring evaluation data for the entire domain.
△ Less
Submitted 3 May, 2025; v1 submitted 29 March, 2025;
originally announced March 2025.
-
SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation
Authors:
Song Duong,
Florian Le Bronnec,
Alexandre Allauzen,
Vincent Guigue,
Alberto Lumbreras,
Laure Soulier,
Patrick Gallinari
Abstract:
Large Language Models (LLMs), when used for conditional text generation, often produce hallucinations, i.e., information that is unfaithful or not grounded in the input context. This issue arises in typical conditional text generation tasks, such as text summarization and data-to-text generation, where the goal is to produce fluent text based on contextual input. When fine-tuned on specific domain…
▽ More
Large Language Models (LLMs), when used for conditional text generation, often produce hallucinations, i.e., information that is unfaithful or not grounded in the input context. This issue arises in typical conditional text generation tasks, such as text summarization and data-to-text generation, where the goal is to produce fluent text based on contextual input. When fine-tuned on specific domains, LLMs struggle to provide faithful answers to a given context, often adding information or generating errors. One underlying cause of this issue is that LLMs rely on statistical patterns learned from their training data. This reliance can interfere with the model's ability to stay faithful to a provided context, leading to the generation of ungrounded information. We build upon this observation and introduce a novel self-supervised method for generating a training set of unfaithful samples. We then refine the model using a training process that encourages the generation of grounded outputs over unfaithful ones, drawing on preference-based training. Our approach leads to significantly more grounded text generation, outperforming existing self-supervised techniques in faithfulness, as evaluated through automatic metrics, LLM-based assessments, and human evaluations.
△ Less
Submitted 19 February, 2025;
originally announced February 2025.
-
GEPS: Boosting Generalization in Parametric PDE Neural Solvers through Adaptive Conditioning
Authors:
Armand Kassaï Koupaï,
Jorge Mifsut Benet,
Yuan Yin,
Jean-Noël Vittaut,
Patrick Gallinari
Abstract:
Solving parametric partial differential equations (PDEs) presents significant challenges for data-driven methods due to the sensitivity of spatio-temporal dynamics to variations in PDE parameters. Machine learning approaches often struggle to capture this variability. To address this, data-driven approaches learn parametric PDEs by sampling a very large variety of trajectories with varying PDE par…
▽ More
Solving parametric partial differential equations (PDEs) presents significant challenges for data-driven methods due to the sensitivity of spatio-temporal dynamics to variations in PDE parameters. Machine learning approaches often struggle to capture this variability. To address this, data-driven approaches learn parametric PDEs by sampling a very large variety of trajectories with varying PDE parameters. We first show that incorporating conditioning mechanisms for learning parametric PDEs is essential and that among them, $\textit{adaptive conditioning}$, allows stronger generalization. As existing adaptive conditioning methods do not scale well with respect to the number of parameters to adapt in the neural solver, we propose GEPS, a simple adaptation mechanism to boost GEneralization in Pde Solvers via a first-order optimization and low-rank rapid adaptation of a small set of context parameters. We demonstrate the versatility of our approach for both fully data-driven and for physics-aware neural solvers. Validation performed on a whole range of spatio-temporal forecasting problems demonstrates excellent performance for generalizing to unseen conditions including initial conditions, PDE coefficients, forcing terms and solution domain. $\textit{Project page}$: https://geps-project.github.io
△ Less
Submitted 8 November, 2024; v1 submitted 31 October, 2024;
originally announced October 2024.
-
Learning a Neural Solver for Parametric PDE to Enhance Physics-Informed Methods
Authors:
Lise Le Boudec,
Emmanuel de Bezenac,
Louis Serrano,
Ramon Daniel Regueiro-Espino,
Yuan Yin,
Patrick Gallinari
Abstract:
Physics-informed deep learning often faces optimization challenges due to the complexity of solving partial differential equations (PDEs), which involve exploring large solution spaces, require numerous iterations, and can lead to unstable training. These challenges arise particularly from the ill-conditioning of the optimization problem caused by the differential terms in the loss function. To ad…
▽ More
Physics-informed deep learning often faces optimization challenges due to the complexity of solving partial differential equations (PDEs), which involve exploring large solution spaces, require numerous iterations, and can lead to unstable training. These challenges arise particularly from the ill-conditioning of the optimization problem caused by the differential terms in the loss function. To address these issues, we propose learning a solver, i.e., solving PDEs using a physics-informed iterative algorithm trained on data. Our method learns to condition a gradient descent algorithm that automatically adapts to each PDE instance, significantly accelerating and stabilizing the optimization process and enabling faster convergence of physics-aware models. Furthermore, while traditional physics-informed methods solve for a single PDE instance, our approach extends to parametric PDEs. Specifically, we integrate the physical loss gradient with PDE parameters, allowing our method to solve over a distribution of PDE parameters, including coefficients, initial conditions, and boundary conditions. We demonstrate the effectiveness of our approach through empirical experiments on multiple datasets, comparing both training and test-time optimization performance. The code is available at https://github.com/2ailesB/neural-parametric-solver.
△ Less
Submitted 2 June, 2025; v1 submitted 9 October, 2024;
originally announced October 2024.
-
Probing Language Models on Their Knowledge Source
Authors:
Zineddine Tighidet,
Andrea Mogini,
Jiali Mei,
Benjamin Piwowarski,
Patrick Gallinari
Abstract:
Large Language Models (LLMs) often encounter conflicts between their learned, internal (parametric knowledge, PK) and external knowledge provided during inference (contextual knowledge, CK). Understanding how LLMs models prioritize one knowledge source over the other remains a challenge. In this paper, we propose a novel probing framework to explore the mechanisms governing the selection between P…
▽ More
Large Language Models (LLMs) often encounter conflicts between their learned, internal (parametric knowledge, PK) and external knowledge provided during inference (contextual knowledge, CK). Understanding how LLMs models prioritize one knowledge source over the other remains a challenge. In this paper, we propose a novel probing framework to explore the mechanisms governing the selection between PK and CK in LLMs. Using controlled prompts designed to contradict the model's PK, we demonstrate that specific model activations are indicative of the knowledge source employed. We evaluate this framework on various LLMs of different sizes and demonstrate that mid-layer activations, particularly those related to relations in the input, are crucial in predicting knowledge source selection, paving the way for more reliable models capable of handling knowledge conflicts effectively.
△ Less
Submitted 9 November, 2024; v1 submitted 8 October, 2024;
originally announced October 2024.
-
Zebra: In-Context Generative Pretraining for Solving Parametric PDEs
Authors:
Louis Serrano,
Armand Kassaï Koupaï,
Thomas X Wang,
Pierre Erbacher,
Patrick Gallinari
Abstract:
Solving time-dependent parametric partial differential equations (PDEs) is challenging for data-driven methods, as these models must adapt to variations in parameters such as coefficients, forcing terms, and initial conditions. State-of-the-art neural surrogates perform adaptation through gradient-based optimization and meta-learning to implicitly encode the variety of dynamics from observations.…
▽ More
Solving time-dependent parametric partial differential equations (PDEs) is challenging for data-driven methods, as these models must adapt to variations in parameters such as coefficients, forcing terms, and initial conditions. State-of-the-art neural surrogates perform adaptation through gradient-based optimization and meta-learning to implicitly encode the variety of dynamics from observations. This often comes with increased inference complexity. Inspired by the in-context learning capabilities of large language models (LLMs), we introduce Zebra, a novel generative auto-regressive transformer designed to solve parametric PDEs without requiring gradient adaptation at inference. By leveraging in-context information during both pre-training and inference, Zebra dynamically adapts to new tasks by conditioning on input sequences that incorporate context example trajectories. As a generative model, Zebra can be used to generate new trajectories and allows quantifying the uncertainty of the predictions. We evaluate Zebra across a variety of challenging PDE scenarios, demonstrating its adaptability, robustness, and superior performance compared to existing approaches.
△ Less
Submitted 26 June, 2025; v1 submitted 4 October, 2024;
originally announced October 2024.
-
MEXMA: Token-level objectives improve sentence representations
Authors:
João Maria Janeiro,
Benjamin Piwowarski,
Patrick Gallinari,
Loïc Barrault
Abstract:
Current pre-trained cross-lingual sentence encoders approaches use sentence-level objectives only. This can lead to loss of information, especially for tokens, which then degrades the sentence representation. We propose MEXMA, a novel approach that integrates both sentence-level and token-level objectives. The sentence representation in one language is used to predict masked tokens in another lang…
▽ More
Current pre-trained cross-lingual sentence encoders approaches use sentence-level objectives only. This can lead to loss of information, especially for tokens, which then degrades the sentence representation. We propose MEXMA, a novel approach that integrates both sentence-level and token-level objectives. The sentence representation in one language is used to predict masked tokens in another language, with both the sentence representation and all tokens directly updating the encoder. We show that adding token-level objectives greatly improves the sentence representation quality across several tasks. Our approach outperforms current pre-trained cross-lingual sentence encoders on bi-text mining as well as several downstream tasks. We also analyse the information encoded in our tokens, and how the sentence representation is built from them.
△ Less
Submitted 19 September, 2024;
originally announced September 2024.
-
NeurIPS 2024 ML4CFD Competition: Harnessing Machine Learning for Computational Fluid Dynamics in Airfoil Design
Authors:
Mouadh Yagoubi,
David Danan,
Milad Leyli-abadi,
Jean-Patrick Brunet,
Jocelyn Ahmed Mazari,
Florent Bonnet,
maroua gmati,
Asma Farjallah,
Paola Cinnella,
Patrick Gallinari,
Marc Schoenauer
Abstract:
The integration of machine learning (ML) techniques for addressing intricate physics problems is increasingly recognized as a promising avenue for expediting simulations. However, assessing ML-derived physical models poses a significant challenge for their adoption within industrial contexts. This competition is designed to promote the development of innovative ML approaches for tackling physical…
▽ More
The integration of machine learning (ML) techniques for addressing intricate physics problems is increasingly recognized as a promising avenue for expediting simulations. However, assessing ML-derived physical models poses a significant challenge for their adoption within industrial contexts. This competition is designed to promote the development of innovative ML approaches for tackling physical challenges, leveraging our recently introduced unified evaluation framework known as Learning Industrial Physical Simulations (LIPS). Building upon the preliminary edition held from November 2023 to March 2024, this iteration centers on a task fundamental to a well-established physical application: airfoil design simulation, utilizing our proposed AirfRANS dataset. The competition evaluates solutions based on various criteria encompassing ML accuracy, computational efficiency, Out-Of-Distribution performance, and adherence to physical principles. Notably, this competition represents a pioneering effort in exploring ML-driven surrogate methods aimed at optimizing the trade-off between computational efficiency and accuracy in physical simulations. Hosted on the Codabench platform, the competition offers online training and evaluation for all participating solutions.
△ Less
Submitted 30 June, 2024;
originally announced July 2024.
-
AROMA: Preserving Spatial Structure for Latent PDE Modeling with Local Neural Fields
Authors:
Louis Serrano,
Thomas X Wang,
Etienne Le Naour,
Jean-Noël Vittaut,
Patrick Gallinari
Abstract:
We present AROMA (Attentive Reduced Order Model with Attention), a framework designed to enhance the modeling of partial differential equations (PDEs) using local neural fields. Our flexible encoder-decoder architecture can obtain smooth latent representations of spatial physical fields from a variety of data types, including irregular-grid inputs and point clouds. This versatility eliminates the…
▽ More
We present AROMA (Attentive Reduced Order Model with Attention), a framework designed to enhance the modeling of partial differential equations (PDEs) using local neural fields. Our flexible encoder-decoder architecture can obtain smooth latent representations of spatial physical fields from a variety of data types, including irregular-grid inputs and point clouds. This versatility eliminates the need for patching and allows efficient processing of diverse geometries. The sequential nature of our latent representation can be interpreted spatially and permits the use of a conditional transformer for modeling the temporal dynamics of PDEs. By employing a diffusion-based formulation, we achieve greater stability and enable longer rollouts compared to conventional MSE training. AROMA's superior performance in simulating 1D and 2D equations underscores the efficacy of our approach in capturing complex dynamical behaviors.
△ Less
Submitted 21 October, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
ML4PhySim : Machine Learning for Physical Simulations Challenge (The airfoil design)
Authors:
Mouadh Yagoubi,
Milad Leyli-Abadi,
David Danan,
Jean-Patrick Brunet,
Jocelyn Ahmed Mazari,
Florent Bonnet,
Asma Farjallah,
Marc Schoenauer,
Patrick Gallinari
Abstract:
The use of machine learning (ML) techniques to solve complex physical problems has been considered recently as a promising approach. However, the evaluation of such learned physical models remains an important issue for industrial use. The aim of this competition is to encourage the development of new ML techniques to solve physical problems using a unified evaluation framework proposed recently,…
▽ More
The use of machine learning (ML) techniques to solve complex physical problems has been considered recently as a promising approach. However, the evaluation of such learned physical models remains an important issue for industrial use. The aim of this competition is to encourage the development of new ML techniques to solve physical problems using a unified evaluation framework proposed recently, called Learning Industrial Physical Simulations (LIPS). We propose learning a task representing a well-known physical use case: the airfoil design simulation, using a dataset called AirfRANS. The global score calculated for each submitted solution is based on three main categories of criteria covering different aspects, namely: ML-related, Out-Of-Distribution, and physical compliance criteria. To the best of our knowledge, this is the first competition addressing the use of ML-based surrogate approaches to improve the trade-off computational cost/accuracy of physical simulation.The competition is hosted by the Codabench platform with online training and evaluation of all submitted solutions.
△ Less
Submitted 3 March, 2024;
originally announced March 2024.
-
LOCOST: State-Space Models for Long Document Abstractive Summarization
Authors:
Florian Le Bronnec,
Song Duong,
Mathieu Ravaut,
Alexandre Allauzen,
Nancy F. Chen,
Vincent Guigue,
Alberto Lumbreras,
Laure Soulier,
Patrick Gallinari
Abstract:
State-space models are a low-complexity alternative to transformers for encoding long sequences and capturing long-term dependencies. We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs. With a computational complexity of $O(L \log L)$, this architecture can handle significantly longer sequences than state-of-the-a…
▽ More
State-space models are a low-complexity alternative to transformers for encoding long sequences and capturing long-term dependencies. We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs. With a computational complexity of $O(L \log L)$, this architecture can handle significantly longer sequences than state-of-the-art models that are based on sparse attention patterns. We evaluate our model on a series of long document abstractive summarization tasks. The model reaches a performance level that is 93-96% comparable to the top-performing sparse transformers of the same size while saving up to 50% memory during training and up to 87% during inference. Additionally, LOCOST effectively handles input texts exceeding 600K tokens at inference time, setting new state-of-the-art results on full-book summarization and opening new perspectives for long input processing.
△ Less
Submitted 25 March, 2024; v1 submitted 31 January, 2024;
originally announced January 2024.
-
Module-wise Training of Neural Networks via the Minimizing Movement Scheme
Authors:
Skander Karkar,
Ibrahim Ayed,
Emmanuel de Bézenac,
Patrick Gallinari
Abstract:
Greedy layer-wise or module-wise training of neural networks is compelling in constrained and on-device settings where memory is limited, as it circumvents a number of problems of end-to-end back-propagation. However, it suffers from a stagnation problem, whereby early layers overfit and deeper layers stop increasing the test accuracy after a certain depth. We propose to solve this issue by introd…
▽ More
Greedy layer-wise or module-wise training of neural networks is compelling in constrained and on-device settings where memory is limited, as it circumvents a number of problems of end-to-end back-propagation. However, it suffers from a stagnation problem, whereby early layers overfit and deeper layers stop increasing the test accuracy after a certain depth. We propose to solve this issue by introducing a module-wise regularization inspired by the minimizing movement scheme for gradient flows in distribution space. We call the method TRGL for Transport Regularized Greedy Learning and study it theoretically, proving that it leads to greedy modules that are regular and that progressively solve the task. Experimentally, we show improved accuracy of module-wise training of various architectures such as ResNets, Transformers and VGG, when our regularization is added, superior to that of other module-wise training methods and often to end-to-end training, with as much as 60% less memory usage.
△ Less
Submitted 5 October, 2023; v1 submitted 29 September, 2023;
originally announced September 2023.
-
INFINITY: Neural Field Modeling for Reynolds-Averaged Navier-Stokes Equations
Authors:
Louis Serrano,
Leon Migus,
Yuan Yin,
Jocelyn Ahmed Mazari,
Patrick Gallinari
Abstract:
For numerical design, the development of efficient and accurate surrogate models is paramount. They allow us to approximate complex physical phenomena, thereby reducing the computational burden of direct numerical simulations. We propose INFINITY, a deep learning model that utilizes implicit neural representations (INRs) to address this challenge. Our framework encodes geometric information and ph…
▽ More
For numerical design, the development of efficient and accurate surrogate models is paramount. They allow us to approximate complex physical phenomena, thereby reducing the computational burden of direct numerical simulations. We propose INFINITY, a deep learning model that utilizes implicit neural representations (INRs) to address this challenge. Our framework encodes geometric information and physical fields into compact representations and learns a mapping between them to infer the physical fields. We use an airfoil design optimization problem as an example task and we evaluate our approach on the challenging AirfRANS dataset, which closely resembles real-world industrial use-cases. The experimental results demonstrate that our framework achieves state-of-the-art performance by accurately inferring physical fields throughout the volume and surface. Additionally we demonstrate its applicability in contexts such as design exploration and shape optimization: our model can correctly predict drag and lift coefficients while adhering to the equations.
△ Less
Submitted 25 July, 2023;
originally announced July 2023.
-
Operator Learning with Neural Fields: Tackling PDEs on General Geometries
Authors:
Louis Serrano,
Lise Le Boudec,
Armand Kassaï Koupaï,
Thomas X Wang,
Yuan Yin,
Jean-Noël Vittaut,
Patrick Gallinari
Abstract:
Machine learning approaches for solving partial differential equations require learning mappings between function spaces. While convolutional or graph neural networks are constrained to discretized functions, neural operators present a promising milestone toward mapping functions directly. Despite impressive results they still face challenges with respect to the domain geometry and typically rely…
▽ More
Machine learning approaches for solving partial differential equations require learning mappings between function spaces. While convolutional or graph neural networks are constrained to discretized functions, neural operators present a promising milestone toward mapping functions directly. Despite impressive results they still face challenges with respect to the domain geometry and typically rely on some form of discretization. In order to alleviate such limitations, we present CORAL, a new method that leverages coordinate-based networks for solving PDEs on general geometries. CORAL is designed to remove constraints on the input mesh, making it applicable to any spatial sampling and geometry. Its ability extends to diverse problem domains, including PDE solving, spatio-temporal forecasting, and inverse problems like geometric design. CORAL demonstrates robust performance across multiple resolutions and performs well in both convex and non-convex domains, surpassing or performing on par with state-of-the-art models.
△ Less
Submitted 30 November, 2023; v1 submitted 12 June, 2023;
originally announced June 2023.
-
Time Series Continuous Modeling for Imputation and Forecasting with Implicit Neural Representations
Authors:
Etienne Le Naour,
Louis Serrano,
Léon Migus,
Yuan Yin,
Ghislain Agoua,
Nicolas Baskiotis,
Patrick Gallinari,
Vincent Guigue
Abstract:
We introduce a novel modeling approach for time series imputation and forecasting, tailored to address the challenges often encountered in real-world data, such as irregular samples, missing data, or unaligned measurements from multiple sensors. Our method relies on a continuous-time-dependent model of the series' evolution dynamics. It leverages adaptations of conditional, implicit neural represe…
▽ More
We introduce a novel modeling approach for time series imputation and forecasting, tailored to address the challenges often encountered in real-world data, such as irregular samples, missing data, or unaligned measurements from multiple sensors. Our method relies on a continuous-time-dependent model of the series' evolution dynamics. It leverages adaptations of conditional, implicit neural representations for sequential data. A modulation mechanism, driven by a meta-learning algorithm, allows adaptation to unseen samples and extrapolation beyond observed time-windows for long-term predictions. The model provides a highly flexible and unified framework for imputation and forecasting tasks across a wide range of challenging scenarios. It achieves state-of-the-art performance on classical benchmarks and outperforms alternative time-continuous models.
△ Less
Submitted 22 April, 2024; v1 submitted 9 June, 2023;
originally announced June 2023.
-
Adversarial Sample Detection Through Neural Network Transport Dynamics
Authors:
Skander Karkar,
Patrick Gallinari,
Alain Rakotomamonjy
Abstract:
We propose a detector of adversarial samples that is based on the view of neural networks as discrete dynamic systems. The detector tells clean inputs from abnormal ones by comparing the discrete vector fields they follow through the layers. We also show that regularizing this vector field during training makes the network more regular on the data distribution's support, thus making the activation…
▽ More
We propose a detector of adversarial samples that is based on the view of neural networks as discrete dynamic systems. The detector tells clean inputs from abnormal ones by comparing the discrete vector fields they follow through the layers. We also show that regularizing this vector field during training makes the network more regular on the data distribution's support, thus making the activations of clean inputs more distinguishable from those of abnormal ones. Experimentally, we compare our detector favorably to other detectors on seen and unseen attacks, and show that the regularization of the network's dynamics improves the performance of adversarial detectors that use the internal embeddings as inputs, while also improving test accuracy.
△ Less
Submitted 8 June, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Stability of implicit neural networks for long-term forecasting in dynamical systems
Authors:
Leon Migus,
Julien Salomon,
Patrick Gallinari
Abstract:
Forecasting physical signals in long time range is among the most challenging tasks in Partial Differential Equations (PDEs) research. To circumvent limitations of traditional solvers, many different Deep Learning methods have been proposed. They are all based on auto-regressive methods and exhibit stability issues. Drawing inspiration from the stability property of implicit numerical schemes, we…
▽ More
Forecasting physical signals in long time range is among the most challenging tasks in Partial Differential Equations (PDEs) research. To circumvent limitations of traditional solvers, many different Deep Learning methods have been proposed. They are all based on auto-regressive methods and exhibit stability issues. Drawing inspiration from the stability property of implicit numerical schemes, we introduce a stable auto-regressive implicit neural network. We develop a theory based on the stability definition of schemes to ensure the stability in forecasting of this network. It leads us to introduce hard constraints on its weights and propagate the dynamics in the latent space. Our experimental results validate our stability property, and show improved results at long-term forecasting for two transports PDEs.
△ Less
Submitted 8 June, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
Learning from Multiple Sources for Data-to-Text and Text-to-Data
Authors:
Song Duong,
Alberto Lumbreras,
Mike Gartrell,
Patrick Gallinari
Abstract:
Data-to-text (D2T) and text-to-data (T2D) are dual tasks that convert structured data, such as graphs or tables into fluent text, and vice versa. These tasks are usually handled separately and use corpora extracted from a single source. Current systems leverage pre-trained language models fine-tuned on D2T or T2D tasks. This approach has two main limitations: first, a separate system has to be tun…
▽ More
Data-to-text (D2T) and text-to-data (T2D) are dual tasks that convert structured data, such as graphs or tables into fluent text, and vice versa. These tasks are usually handled separately and use corpora extracted from a single source. Current systems leverage pre-trained language models fine-tuned on D2T or T2D tasks. This approach has two main limitations: first, a separate system has to be tuned for each task and source; second, learning is limited by the scarcity of available corpora. This paper considers a more general scenario where data are available from multiple heterogeneous sources. Each source, with its specific data format and semantic domain, provides a non-parallel corpus of text and structured data. We introduce a variational auto-encoder model with disentangled style and content variables that allows us to represent the diversity that stems from multiple sources of text and data. Our model is designed to handle the tasks of D2T and T2D jointly. We evaluate our model on several datasets, and show that by learning from multiple sources, our model closes the performance gap with its supervised single-source counterpart and outperforms it in some cases.
△ Less
Submitted 22 February, 2023;
originally announced February 2023.
-
Exemplars and Counterexemplars Explanations for Image Classifiers, Targeting Skin Lesion Labeling
Authors:
Carlo Metta,
Riccardo Guidotti,
Yuan Yin,
Patrick Gallinari,
Salvatore Rinzivillo
Abstract:
Explainable AI consists in developing mechanisms allowing for an interaction between decision systems and humans by making the decisions of the formers understandable. This is particularly important in sensitive contexts like in the medical domain. We propose a use case study, for skin lesion diagnosis, illustrating how it is possible to provide the practitioner with explanations on the decisions…
▽ More
Explainable AI consists in developing mechanisms allowing for an interaction between decision systems and humans by making the decisions of the formers understandable. This is particularly important in sensitive contexts like in the medical domain. We propose a use case study, for skin lesion diagnosis, illustrating how it is possible to provide the practitioner with explanations on the decisions of a state of the art deep neural network classifier trained to characterize skin lesions from examples. Our framework consists of a trained classifier onto which an explanation module operates. The latter is able to offer the practitioner exemplars and counterexemplars for the classification diagnosis thus allowing the physician to interact with the automatic diagnosis system. The exemplars are generated via an adversarial autoencoder. We illustrate the behavior of the system on representative examples.
△ Less
Submitted 18 January, 2023;
originally announced February 2023.
-
AirfRANS: High Fidelity Computational Fluid Dynamics Dataset for Approximating Reynolds-Averaged Navier-Stokes Solutions
Authors:
Florent Bonnet,
Ahmed Jocelyn Mazari,
Paola Cinnella,
Patrick Gallinari
Abstract:
Surrogate models are necessary to optimize meaningful quantities in physical dynamics as their recursive numerical resolutions are often prohibitively expensive. It is mainly the case for fluid dynamics and the resolution of Navier-Stokes equations. However, despite the fast-growing field of data-driven models for physical systems, reference datasets representing real-world phenomena are lacking.…
▽ More
Surrogate models are necessary to optimize meaningful quantities in physical dynamics as their recursive numerical resolutions are often prohibitively expensive. It is mainly the case for fluid dynamics and the resolution of Navier-Stokes equations. However, despite the fast-growing field of data-driven models for physical systems, reference datasets representing real-world phenomena are lacking. In this work, we develop AirfRANS, a dataset for studying the two-dimensional incompressible steady-state Reynolds-Averaged Navier-Stokes equations over airfoils at a subsonic regime and for different angles of attacks. We also introduce metrics on the stress forces at the surface of geometries and visualization of boundary layers to assess the capabilities of models to accurately predict the meaningful information of the problem. Finally, we propose deep learning baselines on four machine learning tasks to study AirfRANS under different constraints for generalization considerations: big and scarce data regime, Reynolds number, and angle of attack extrapolation.
△ Less
Submitted 1 June, 2023; v1 submitted 14 December, 2022;
originally announced December 2022.
-
HMOE: Hypernetwork-based Mixture of Experts for Domain Generalization
Authors:
Jingang Qu,
Thibault Faney,
Ze Wang,
Patrick Gallinari,
Soleiman Yousef,
Jean-Charles de Hemptinne
Abstract:
Due to domain shifts, machine learning systems typically struggle to generalize well to new domains that differ from those of training data, which is what domain generalization (DG) aims to address. Although a variety of DG methods have been proposed, most of them fall short in interpretability and require domain labels, which are not available in many real-world scenarios. This paper presents a n…
▽ More
Due to domain shifts, machine learning systems typically struggle to generalize well to new domains that differ from those of training data, which is what domain generalization (DG) aims to address. Although a variety of DG methods have been proposed, most of them fall short in interpretability and require domain labels, which are not available in many real-world scenarios. This paper presents a novel DG method, called HMOE: Hypernetwork-based Mixture of Experts (MoE), which does not rely on domain labels and is more interpretable. MoE proves effective in identifying heterogeneous patterns in data. For the DG problem, heterogeneity arises exactly from domain shifts. HMOE employs hypernetworks taking vectors as input to generate the weights of experts, which promotes knowledge sharing among experts and enables the exploration of their similarities in a low-dimensional vector space. We benchmark HMOE against other DG methods under a fair evaluation framework -- DomainBed. Our extensive experiments show that HMOE can effectively separate mixed-domain data into distinct clusters that are surprisingly more consistent with human intuition than original domain labels. Using self-learned domain information, HMOE achieves state-of-the-art results on most datasets and significantly surpasses other DG methods in average accuracy across all datasets.
△ Less
Submitted 14 November, 2023; v1 submitted 15 November, 2022;
originally announced November 2022.
-
Block-wise Training of Residual Networks via the Minimizing Movement Scheme
Authors:
Skander Karkar,
Ibrahim Ayed,
Emmanuel de Bézenac,
Patrick Gallinari
Abstract:
End-to-end backpropagation has a few shortcomings: it requires loading the entire model during training, which can be impossible in constrained settings, and suffers from three locking problems (forward locking, update locking and backward locking), which prohibit training the layers in parallel. Solving layer-wise optimization problems can address these problems and has been used in on-device tra…
▽ More
End-to-end backpropagation has a few shortcomings: it requires loading the entire model during training, which can be impossible in constrained settings, and suffers from three locking problems (forward locking, update locking and backward locking), which prohibit training the layers in parallel. Solving layer-wise optimization problems can address these problems and has been used in on-device training of neural networks. We develop a layer-wise training method, particularly welladapted to ResNets, inspired by the minimizing movement scheme for gradient flows in distribution space. The method amounts to a kinetic energy regularization of each block that makes the blocks optimal transport maps and endows them with regularity. It works by alleviating the stagnation problem observed in layer-wise training, whereby greedily-trained early layers overfit and deeper layers stop increasing test accuracy after a certain depth. We show on classification tasks that the test accuracy of block-wise trained ResNets is improved when using our method, whether the blocks are trained sequentially or in parallel.
△ Less
Submitted 6 June, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Continuous PDE Dynamics Forecasting with Implicit Neural Representations
Authors:
Yuan Yin,
Matthieu Kirchmeyer,
Jean-Yves Franceschi,
Alain Rakotomamonjy,
Patrick Gallinari
Abstract:
Effective data-driven PDE forecasting methods often rely on fixed spatial and / or temporal discretizations. This raises limitations in real-world applications like weather prediction where flexible extrapolation at arbitrary spatiotemporal locations is required. We address this problem by introducing a new data-driven approach, DINo, that models a PDE's flow with continuous-time dynamics of spati…
▽ More
Effective data-driven PDE forecasting methods often rely on fixed spatial and / or temporal discretizations. This raises limitations in real-world applications like weather prediction where flexible extrapolation at arbitrary spatiotemporal locations is required. We address this problem by introducing a new data-driven approach, DINo, that models a PDE's flow with continuous-time dynamics of spatially continuous functions. This is achieved by embedding spatial observations independently of their discretization via Implicit Neural Representations in a small latent space temporally driven by a learned ODE. This separate and flexible treatment of time and space makes DINo the first data-driven model to combine the following advantages. It extrapolates at arbitrary spatial and temporal locations; it can learn from sparse irregular grids or manifolds; at test time, it generalizes to new grids or resolutions. DINo outperforms alternative neural PDE forecasters in a variety of challenging generalization scenarios on representative PDE systems.
△ Less
Submitted 15 February, 2023; v1 submitted 29 September, 2022;
originally announced September 2022.
-
An extensible Benchmarking Graph-Mesh dataset for studying Steady-State Incompressible Navier-Stokes Equations
Authors:
Florent Bonnet,
Jocelyn Ahmed Mazari,
Thibaut Munzer,
Pierre Yser,
Patrick Gallinari
Abstract:
Recent progress in \emph{Geometric Deep Learning} (GDL) has shown its potential to provide powerful data-driven models. This gives momentum to explore new methods for learning physical systems governed by \emph{Partial Differential Equations} (PDEs) from Graph-Mesh data. However, despite the efforts and recent achievements, several research directions remain unexplored and progress is still far fr…
▽ More
Recent progress in \emph{Geometric Deep Learning} (GDL) has shown its potential to provide powerful data-driven models. This gives momentum to explore new methods for learning physical systems governed by \emph{Partial Differential Equations} (PDEs) from Graph-Mesh data. However, despite the efforts and recent achievements, several research directions remain unexplored and progress is still far from satisfying the physical requirements of real-world phenomena. One of the major impediments is the absence of benchmarking datasets and common physics evaluation protocols. In this paper, we propose a 2-D graph-mesh dataset to study the airflow over airfoils at high Reynolds regime (from $10^6$ and beyond). We also introduce metrics on the stress forces over the airfoil in order to evaluate GDL models on important physical quantities. Moreover, we provide extensive GDL baselines.
△ Less
Submitted 29 June, 2022;
originally announced June 2022.
-
Multi-scale Physical Representations for Approximating PDE Solutions with Graph Neural Operators
Authors:
Léon Migus,
Yuan Yin,
Jocelyn Ahmed Mazari,
Patrick Gallinari
Abstract:
Representing physical signals at different scales is among the most challenging problems in engineering. Several multi-scale modeling tools have been developed to describe physical systems governed by \emph{Partial Differential Equations} (PDEs). These tools are at the crossroad of principled physical models and numerical schema. Recently, data-driven models have been introduced to speed-up the ap…
▽ More
Representing physical signals at different scales is among the most challenging problems in engineering. Several multi-scale modeling tools have been developed to describe physical systems governed by \emph{Partial Differential Equations} (PDEs). These tools are at the crossroad of principled physical models and numerical schema. Recently, data-driven models have been introduced to speed-up the approximation of PDE solutions compared to numerical solvers. Among these recent data-driven methods, neural integral operators are a class that learn a mapping between function spaces. These functions are discretized on graphs (meshes) which are appropriate for modeling interactions in physical phenomena. In this work, we study three multi-resolution schema with integral kernel operators that can be approximated with \emph{Message Passing Graph Neural Networks} (MPGNNs). To validate our study, we make extensive MPGNNs experiments with well-chosen metrics considering steady and unsteady PDEs.
△ Less
Submitted 29 June, 2022;
originally announced June 2022.
-
Diverse Weight Averaging for Out-of-Distribution Generalization
Authors:
Alexandre Ramé,
Matthieu Kirchmeyer,
Thibaud Rahier,
Alain Rakotomamonjy,
Patrick Gallinari,
Matthieu Cord
Abstract:
Standard neural networks struggle to generalize under distribution shifts in computer vision. Fortunately, combining multiple networks can consistently improve out-of-distribution generalization. In particular, weight averaging (WA) strategies were shown to perform best on the competitive DomainBed benchmark; they directly average the weights of multiple networks despite their nonlinearities. In t…
▽ More
Standard neural networks struggle to generalize under distribution shifts in computer vision. Fortunately, combining multiple networks can consistently improve out-of-distribution generalization. In particular, weight averaging (WA) strategies were shown to perform best on the competitive DomainBed benchmark; they directly average the weights of multiple networks despite their nonlinearities. In this paper, we propose Diverse Weight Averaging (DiWA), a new WA strategy whose main motivation is to increase the functional diversity across averaged models. To this end, DiWA averages weights obtained from several independent training runs: indeed, models obtained from different runs are more diverse than those collected along a single run thanks to differences in hyperparameters and training procedures. We motivate the need for diversity by a new bias-variance-covariance-locality decomposition of the expected error, exploiting similarities between WA and standard functional ensembling. Moreover, this decomposition highlights that WA succeeds when the variance term dominates, which we show occurs when the marginal distribution changes at test time. Experimentally, DiWA consistently improves the state of the art on DomainBed without inference overhead.
△ Less
Submitted 27 January, 2023; v1 submitted 19 May, 2022;
originally announced May 2022.
-
PTFlash : A deep learning framework for isothermal two-phase equilibrium calculations
Authors:
Jingang Qu,
Thibault Faney,
Jean-Charles de Hemptinne,
Soleiman Yousef,
Patrick Gallinari
Abstract:
Phase equilibrium calculations are an essential part of numerical simulations of multi-component multi-phase flow in porous media, accounting for the largest share of the computational time. In this work, we introduce a GPUenabled, fast, and parallel framework, PTFlash, that vectorizes algorithms required for isothermal two-phase flash calculations using PyTorch, and can facilitate a wide range of…
▽ More
Phase equilibrium calculations are an essential part of numerical simulations of multi-component multi-phase flow in porous media, accounting for the largest share of the computational time. In this work, we introduce a GPUenabled, fast, and parallel framework, PTFlash, that vectorizes algorithms required for isothermal two-phase flash calculations using PyTorch, and can facilitate a wide range of downstream applications. In addition, to further accelerate PTFlash, we design two task-specific neural networks, one for predicting the stability of given mixtures and the other for providing estimates of the distribution coefficients, which are trained offline and help shorten computation time by sidestepping stability analysis and reducing the number of iterations to reach convergence. The evaluation of PTFlash was conducted on three case studies involving hydrocarbons, CO 2 and N 2 , for which the phase equilibrium was tested over a large range of temperature, pressure and composition conditions, using the Soave-Redlich-Kwong (SRK) equation of state. We compare PTFlash with an in-house thermodynamic library, Carnot, written in C++ and performing flash calculations one by one on CPU. Results show speed-ups on large scale calculations up to two order of magnitudes, while maintaining perfect precision with the reference solution provided by Carnot.
△ Less
Submitted 19 May, 2022; v1 submitted 6 May, 2022;
originally announced May 2022.
-
Generalizing to New Physical Systems via Context-Informed Dynamics Model
Authors:
Matthieu Kirchmeyer,
Yuan Yin,
Jérémie Donà,
Nicolas Baskiotis,
Alain Rakotomamonjy,
Patrick Gallinari
Abstract:
Data-driven approaches to modeling physical systems fail to generalize to unseen systems that share the same general dynamics with the learning domain, but correspond to different physical contexts. We propose a new framework for this key problem, context-informed dynamics adaptation (CoDA), which takes into account the distributional shift across systems for fast and efficient adaptation to new d…
▽ More
Data-driven approaches to modeling physical systems fail to generalize to unseen systems that share the same general dynamics with the learning domain, but correspond to different physical contexts. We propose a new framework for this key problem, context-informed dynamics adaptation (CoDA), which takes into account the distributional shift across systems for fast and efficient adaptation to new dynamics. CoDA leverages multiple environments, each associated to a different dynamic, and learns to condition the dynamics model on contextual parameters, specific to each environment. The conditioning is performed via a hypernetwork, learned jointly with a context vector from observed data. The proposed formulation constrains the search hypothesis space to foster fast adaptation and better generalization across environments. We theoretically motivate our approach and show state-of-the-art generalization results on a set of nonlinear dynamics, representative of a variety of application domains. We also show, on these systems, that new system parameters can be inferred from context vectors with minimal supervision. Code is available at https://github.com/yuan-yin/CoDA .
△ Less
Submitted 24 June, 2022; v1 submitted 1 February, 2022;
originally announced February 2022.
-
Explainable Deep Image Classifiers for Skin Lesion Diagnosis
Authors:
Carlo Metta,
Andrea Beretta,
Riccardo Guidotti,
Yuan Yin,
Patrick Gallinari,
Salvatore Rinzivillo,
Fosca Giannotti
Abstract:
A key issue in critical contexts such as medical diagnosis is the interpretability of the deep learning models adopted in decision-making systems. Research in eXplainable Artificial Intelligence (XAI) is trying to solve this issue. However, often XAI approaches are only tested on generalist classifier and do not represent realistic problems such as those of medical diagnosis. In this paper, we ana…
▽ More
A key issue in critical contexts such as medical diagnosis is the interpretability of the deep learning models adopted in decision-making systems. Research in eXplainable Artificial Intelligence (XAI) is trying to solve this issue. However, often XAI approaches are only tested on generalist classifier and do not represent realistic problems such as those of medical diagnosis. In this paper, we analyze a case study on skin lesion images where we customize an existing XAI approach for explaining a deep learning model able to recognize different types of skin lesions. The explanation is formed by synthetic exemplar and counter-exemplar images of skin lesion and offers the practitioner a way to highlight the crucial traits responsible for the classification decision. A survey conducted with domain experts, beginners and unskilled people proof that the usage of explanations increases the trust and confidence in the automatic decision system. Also, an analysis of the latent space adopted by the explainer unveils that some of the most frequent skin lesion classes are distinctly separated. This phenomenon could derive from the intrinsic characteristics of each class and, hopefully, can provide support in the resolution of the most frequent misclassifications by human experts.
△ Less
Submitted 22 November, 2021;
originally announced November 2021.
-
Mapping conditional distributions for domain adaptation under generalized target shift
Authors:
Matthieu Kirchmeyer,
Alain Rakotomamonjy,
Emmanuel de Bezenac,
Patrick Gallinari
Abstract:
We consider the problem of unsupervised domain adaptation (UDA) between a source and a target domain under conditional and label shift a.k.a Generalized Target Shift (GeTarS). Unlike simpler UDA settings, few works have addressed this challenging problem. Recent approaches learn domain-invariant representations, yet they have practical limitations and rely on strong assumptions that may not hold i…
▽ More
We consider the problem of unsupervised domain adaptation (UDA) between a source and a target domain under conditional and label shift a.k.a Generalized Target Shift (GeTarS). Unlike simpler UDA settings, few works have addressed this challenging problem. Recent approaches learn domain-invariant representations, yet they have practical limitations and rely on strong assumptions that may not hold in practice. In this paper, we explore a novel and general approach to align pretrained representations, which circumvents existing drawbacks. Instead of constraining representation invariance, it learns an optimal transport map, implemented as a NN, which maps source representations onto target ones. Our approach is flexible and scalable, it preserves the problem's structure and it has strong theoretical guarantees under mild assumptions. In particular, our solution is unique, matches conditional distributions across domains, recovers target proportions and explicitly controls the target generalization risk. Through an exhaustive comparison on several datasets, we challenge the state-of-the-art in GeTarS.
△ Less
Submitted 18 March, 2022; v1 submitted 26 October, 2021;
originally announced October 2021.
-
Separating Retention from Extraction in the Evaluation of End-to-end Relation Extraction
Authors:
Bruno Taillé,
Vincent Guigue,
Geoffrey Scoutheeten,
Patrick Gallinari
Abstract:
State-of-the-art NLP models can adopt shallow heuristics that limit their generalization capability (McCoy et al., 2019). Such heuristics include lexical overlap with the training set in Named-Entity Recognition (Taillé et al., 2020) and Event or Type heuristics in Relation Extraction (Rosenman et al., 2020). In the more realistic end-to-end RE setting, we can expect yet another heuristic: the mer…
▽ More
State-of-the-art NLP models can adopt shallow heuristics that limit their generalization capability (McCoy et al., 2019). Such heuristics include lexical overlap with the training set in Named-Entity Recognition (Taillé et al., 2020) and Event or Type heuristics in Relation Extraction (Rosenman et al., 2020). In the more realistic end-to-end RE setting, we can expect yet another heuristic: the mere retention of training relation triples. In this paper, we propose several experiments confirming that retention of known facts is a key factor of performance on standard benchmarks. Furthermore, one experiment suggests that a pipeline model able to use intermediate type representations is less prone to over-rely on retention.
△ Less
Submitted 24 September, 2021;
originally announced September 2021.
-
Unsupervised domain adaptation with non-stochastic missing data
Authors:
Matthieu Kirchmeyer,
Patrick Gallinari,
Alain Rakotomamonjy,
Amin Mantrach
Abstract:
We consider unsupervised domain adaptation (UDA) for classification problems in the presence of missing data in the unlabelled target domain. More precisely, motivated by practical applications, we analyze situations where distribution shift exists between domains and where some components are systematically absent on the target domain without available supervision for imputing the missing target…
▽ More
We consider unsupervised domain adaptation (UDA) for classification problems in the presence of missing data in the unlabelled target domain. More precisely, motivated by practical applications, we analyze situations where distribution shift exists between domains and where some components are systematically absent on the target domain without available supervision for imputing the missing target components. We propose a generative approach for imputation. Imputation is performed in a domain-invariant latent space and leverages indirect supervision from a complete source domain. We introduce a single model performing joint adaptation, imputation and classification which, under our assumptions, minimizes an upper bound of its target generalization error and performs well under various representative divergence families (H-divergence, Optimal Transport). Moreover, we compare the target error of our Adaptation-imputation framework and the "ideal" target error of a UDA classifier without missing target components. Our model is further improved with self-training, to bring the learned source and target class posterior distributions closer. We perform experiments on three families of datasets of different modalities: a classical digit classification benchmark, the Amazon product reviews dataset both commonly used in UDA and real-world digital advertising datasets. We show the benefits of jointly performing adaptation, classification and imputation on these datasets.
△ Less
Submitted 16 September, 2021;
originally announced September 2021.
-
Differentiable Feature Selection, a Reparameterization Approach
Authors:
Jérémie Dona,
Patrick Gallinari
Abstract:
We consider the task of feature selection for reconstruction which consists in choosing a small subset of features from which whole data instances can be reconstructed. This is of particular importance in several contexts involving for example costly physical measurements, sensor placement or information compression. To break the intrinsic combinatorial nature of this problem, we formulate the tas…
▽ More
We consider the task of feature selection for reconstruction which consists in choosing a small subset of features from which whole data instances can be reconstructed. This is of particular importance in several contexts involving for example costly physical measurements, sensor placement or information compression. To break the intrinsic combinatorial nature of this problem, we formulate the task as optimizing a binary mask distribution enabling an accurate reconstruction. We then face two main challenges. One concerns differentiability issues due to the binary distribution. The second one corresponds to the elimination of redundant information by selecting variables in a correlated fashion which requires modeling the covariance of the binary distribution. We address both issues by introducing a relaxation of the problem via a novel reparameterization of the logitNormal distribution. We demonstrate that the proposed method provides an effective exploration scheme and leads to efficient feature selection for reconstruction through evaluation on several high dimensional image benchmarks. We show that the method leverages the intrinsic geometry of the data, facilitating reconstruction.
△ Less
Submitted 21 July, 2021;
originally announced July 2021.
-
A Neural Tangent Kernel Perspective of GANs
Authors:
Jean-Yves Franceschi,
Emmanuel de Bézenac,
Ibrahim Ayed,
Mickaël Chen,
Sylvain Lamprier,
Patrick Gallinari
Abstract:
We propose a novel theoretical framework of analysis for Generative Adversarial Networks (GANs). We reveal a fundamental flaw of previous analyses which, by incorrectly modeling GANs' training scheme, are subject to ill-defined discriminator gradients. We overcome this issue which impedes a principled study of GAN training, solving it within our framework by taking into account the discriminator's…
▽ More
We propose a novel theoretical framework of analysis for Generative Adversarial Networks (GANs). We reveal a fundamental flaw of previous analyses which, by incorrectly modeling GANs' training scheme, are subject to ill-defined discriminator gradients. We overcome this issue which impedes a principled study of GAN training, solving it within our framework by taking into account the discriminator's architecture. To this end, we leverage the theory of infinite-width neural networks for the discriminator via its Neural Tangent Kernel. We characterize the trained discriminator for a wide range of losses and establish general differentiability properties of the network. From this, we derive new insights about the convergence of the generated distribution, advancing our understanding of GANs' training dynamics. We empirically corroborate these results via an analysis toolkit based on our framework, unveiling intuitions that are consistent with GAN practice.
△ Less
Submitted 7 November, 2022; v1 submitted 10 June, 2021;
originally announced June 2021.
-
LEADS: Learning Dynamical Systems that Generalize Across Environments
Authors:
Yuan Yin,
Ibrahim Ayed,
Emmanuel de Bézenac,
Nicolas Baskiotis,
Patrick Gallinari
Abstract:
When modeling dynamical systems from real-world data samples, the distribution of data often changes according to the environment in which they are captured, and the dynamics of the system itself vary from one environment to another. Generalizing across environments thus challenges the conventional frameworks. The classical settings suggest either considering data as i.i.d. and learning a single m…
▽ More
When modeling dynamical systems from real-world data samples, the distribution of data often changes according to the environment in which they are captured, and the dynamics of the system itself vary from one environment to another. Generalizing across environments thus challenges the conventional frameworks. The classical settings suggest either considering data as i.i.d. and learning a single model to cover all situations or learning environment-specific models. Both are sub-optimal: the former disregards the discrepancies between environments leading to biased solutions, while the latter does not exploit their potential commonalities and is prone to scarcity problems. We propose LEADS, a novel framework that leverages the commonalities and discrepancies among known environments to improve model generalization. This is achieved with a tailored training formulation aiming at capturing common dynamics within a shared model while additional terms capture environment-specific dynamics. We ground our approach in theory, exhibiting a decrease in sample complexity with our approach and corroborate these results empirically, instantiating it for linear dynamics. Moreover, we concretize this framework for neural networks and evaluate it experimentally on representative families of nonlinear dynamics. We show that this new setting can exploit knowledge extracted from environment-dependent data and improves generalization for both known and novel environments. Code is available at https://github.com/yuan-yin/LEADS.
△ Less
Submitted 14 February, 2022; v1 submitted 8 June, 2021;
originally announced June 2021.
-
Data-QuestEval: A Referenceless Metric for Data-to-Text Semantic Evaluation
Authors:
Clément Rebuffel,
Thomas Scialom,
Laure Soulier,
Benjamin Piwowarski,
Sylvain Lamprier,
Jacopo Staiano,
Geoffrey Scoutheeten,
Patrick Gallinari
Abstract:
QuestEval is a reference-less metric used in text-to-text tasks, that compares the generated summaries directly to the source text, by automatically asking and answering questions. Its adaptation to Data-to-Text tasks is not straightforward, as it requires multimodal Question Generation and Answering systems on the considered tasks, which are seldom available. To this purpose, we propose a method…
▽ More
QuestEval is a reference-less metric used in text-to-text tasks, that compares the generated summaries directly to the source text, by automatically asking and answering questions. Its adaptation to Data-to-Text tasks is not straightforward, as it requires multimodal Question Generation and Answering systems on the considered tasks, which are seldom available. To this purpose, we propose a method to build synthetic multimodal corpora enabling to train multimodal components for a data-QuestEval metric. The resulting metric is reference-less and multimodal; it obtains state-of-the-art correlations with human judgment on the WebNLG and WikiBio benchmarks. We make data-QuestEval's code and models available for reproducibility purpose, as part of the QuestEval project.
△ Less
Submitted 7 September, 2021; v1 submitted 15 April, 2021;
originally announced April 2021.
-
QuestEval: Summarization Asks for Fact-based Evaluation
Authors:
Thomas Scialom,
Paul-Alexis Dray,
Patrick Gallinari,
Sylvain Lamprier,
Benjamin Piwowarski,
Jacopo Staiano,
Alex Wang
Abstract:
Summarization evaluation remains an open research problem: current metrics such as ROUGE are known to be limited and to correlate poorly with human judgments. To alleviate this issue, recent work has proposed evaluation metrics which rely on question answering models to assess whether a summary contains all the relevant information in its source document. Though promising, the proposed approaches…
▽ More
Summarization evaluation remains an open research problem: current metrics such as ROUGE are known to be limited and to correlate poorly with human judgments. To alleviate this issue, recent work has proposed evaluation metrics which rely on question answering models to assess whether a summary contains all the relevant information in its source document. Though promising, the proposed approaches have so far failed to correlate better than ROUGE with human judgments.
In this paper, we extend previous approaches and propose a unified framework, named QuestEval. In contrast to established metrics such as ROUGE or BERTScore, QuestEval does not require any ground-truth reference. Nonetheless, QuestEval substantially improves the correlation with human judgments over four evaluation dimensions (consistency, coherence, fluency, and relevance), as shown in the extensive experiments we report.
△ Less
Submitted 9 April, 2021; v1 submitted 23 March, 2021;
originally announced March 2021.
-
Controlling Hallucinations at Word Level in Data-to-Text Generation
Authors:
Clément Rebuffel,
Marco Roberti,
Laure Soulier,
Geoffrey Scoutheeten,
Rossella Cancelliere,
Patrick Gallinari
Abstract:
Data-to-Text Generation (DTG) is a subfield of Natural Language Generation aiming at transcribing structured data in natural language descriptions. The field has been recently boosted by the use of neural-based generators which exhibit on one side great syntactic skills without the need of hand-crafted pipelines; on the other side, the quality of the generated text reflects the quality of the trai…
▽ More
Data-to-Text Generation (DTG) is a subfield of Natural Language Generation aiming at transcribing structured data in natural language descriptions. The field has been recently boosted by the use of neural-based generators which exhibit on one side great syntactic skills without the need of hand-crafted pipelines; on the other side, the quality of the generated text reflects the quality of the training data, which in realistic settings only offer imperfectly aligned structure-text pairs. Consequently, state-of-art neural models include misleading statements - usually called hallucinations - in their outputs. The control of this phenomenon is today a major challenge for DTG, and is the problem addressed in the paper.
Previous work deal with this issue at the instance level: using an alignment score for each table-reference pair. In contrast, we propose a finer-grained approach, arguing that hallucinations should rather be treated at the word level. Specifically, we propose a Multi-Branch Decoder which is able to leverage word-level labels to learn the relevant parts of each training instance. These labels are obtained following a simple and efficient scoring procedure based on co-occurrence analysis and dependency parsing. Extensive evaluations, via automated metrics and human judgment on the standard WikiBio benchmark, show the accuracy of our alignment labels and the effectiveness of the proposed Multi-Branch Decoder. Our model is able to reduce and control hallucinations, while keeping fluency and coherence in generated texts. Further experiments on a degraded version of ToTTo show that our model could be successfully used on very noisy settings.
△ Less
Submitted 9 July, 2021; v1 submitted 4 February, 2021;
originally announced February 2021.
-
Stochastic sparse adversarial attacks
Authors:
Manon Césaire,
Lucas Schott,
Hatem Hajri,
Sylvain Lamprier,
Patrick Gallinari
Abstract:
This paper introduces stochastic sparse adversarial attacks (SSAA), standing as simple, fast and purely noise-based targeted and untargeted attacks of neural network classifiers (NNC). SSAA offer new examples of sparse (or $L_0$) attacks for which only few methods have been proposed previously. These attacks are devised by exploiting a small-time expansion idea widely used for Markov processes. Ex…
▽ More
This paper introduces stochastic sparse adversarial attacks (SSAA), standing as simple, fast and purely noise-based targeted and untargeted attacks of neural network classifiers (NNC). SSAA offer new examples of sparse (or $L_0$) attacks for which only few methods have been proposed previously. These attacks are devised by exploiting a small-time expansion idea widely used for Markov processes. Experiments on small and large datasets (CIFAR-10 and ImageNet) illustrate several advantages of SSAA in comparison with the-state-of-the-art methods. For instance, in the untargeted case, our method called Voting Folded Gaussian Attack (VFGA) scales efficiently to ImageNet and achieves a significantly lower $L_0$ score than SparseFool (up to $\frac{2}{5}$) while being faster. Moreover, VFGA achieves better $L_0$ scores on ImageNet than Sparse-RS when both attacks are fully successful on a large number of samples.
△ Less
Submitted 19 February, 2022; v1 submitted 24 November, 2020;
originally announced November 2020.
-
Transductive Zero-Shot Learning using Cross-Modal CycleGAN
Authors:
Patrick Bordes,
Eloi Zablocki,
Benjamin Piwowarski,
Patrick Gallinari
Abstract:
In Computer Vision, Zero-Shot Learning (ZSL) aims at classifying unseen classes -- classes for which no matching training image exists. Most of ZSL works learn a cross-modal mapping between images and class labels for seen classes. However, the data distribution of seen and unseen classes might differ, causing a domain shift problem. Following this observation, transductive ZSL (T-ZSL) assumes tha…
▽ More
In Computer Vision, Zero-Shot Learning (ZSL) aims at classifying unseen classes -- classes for which no matching training image exists. Most of ZSL works learn a cross-modal mapping between images and class labels for seen classes. However, the data distribution of seen and unseen classes might differ, causing a domain shift problem. Following this observation, transductive ZSL (T-ZSL) assumes that unseen classes and their associated images are known during training, but not their correspondence. As current T-ZSL approaches do not scale efficiently when the number of seen classes is high, we tackle this problem with a new model for T-ZSL based upon CycleGAN. Our model jointly (i) projects images on their seen class labels with a supervised objective and (ii) aligns unseen class labels and visual exemplars with adversarial and cycle-consistency objectives. We show the efficiency of our Cross-Modal CycleGAN model (CM-GAN) on the ImageNet T-ZSL task where we obtain state-of-the-art results. We further validate CM-GAN on a language grounding task, and on a new task that we propose: zero-shot sentence-to-image matching on MS COCO.
△ Less
Submitted 13 November, 2020;
originally announced November 2020.
-
PARENTing via Model-Agnostic Reinforcement Learning to Correct Pathological Behaviors in Data-to-Text Generation
Authors:
Clément Rebuffel,
Laure Soulier,
Geoffrey Scoutheeten,
Patrick Gallinari
Abstract:
In language generation models conditioned by structured data, the classical training via maximum likelihood almost always leads models to pick up on dataset divergence (i.e., hallucinations or omissions), and to incorporate them erroneously in their own generations at inference. In this work, we build ontop of previous Reinforcement Learning based approaches and show that a model-agnostic framewor…
▽ More
In language generation models conditioned by structured data, the classical training via maximum likelihood almost always leads models to pick up on dataset divergence (i.e., hallucinations or omissions), and to incorporate them erroneously in their own generations at inference. In this work, we build ontop of previous Reinforcement Learning based approaches and show that a model-agnostic framework relying on the recently introduced PARENT metric is efficient at reducing both hallucinations and omissions. Evaluations on the widely used WikiBIO and WebNLG benchmarks demonstrate the effectiveness of this framework compared to state-of-the-art models.
△ Less
Submitted 22 October, 2020; v1 submitted 21 October, 2020;
originally announced October 2020.
-
Augmenting Physical Models with Deep Networks for Complex Dynamics Forecasting
Authors:
Yuan Yin,
Vincent Le Guen,
Jérémie Dona,
Emmanuel de Bézenac,
Ibrahim Ayed,
Nicolas Thome,
Patrick Gallinari
Abstract:
Forecasting complex dynamical phenomena in settings where only partial knowledge of their dynamics is available is a prevalent problem across various scientific fields. While purely data-driven approaches are arguably insufficient in this context, standard physical modeling based approaches tend to be over-simplistic, inducing non-negligible errors. In this work, we introduce the APHYNITY framewor…
▽ More
Forecasting complex dynamical phenomena in settings where only partial knowledge of their dynamics is available is a prevalent problem across various scientific fields. While purely data-driven approaches are arguably insufficient in this context, standard physical modeling based approaches tend to be over-simplistic, inducing non-negligible errors. In this work, we introduce the APHYNITY framework, a principled approach for augmenting incomplete physical dynamics described by differential equations with deep data-driven models. It consists in decomposing the dynamics into two components: a physical component accounting for the dynamics for which we have some prior knowledge, and a data-driven component accounting for errors of the physical model. The learning problem is carefully formulated such that the physical model explains as much of the data as possible, while the data-driven component only describes information that cannot be captured by the physical model, no more, no less. This not only provides the existence and uniqueness for this decomposition, but also ensures interpretability and benefits generalization. Experiments made on three important use cases, each representative of a different family of phenomena, i.e. reaction-diffusion equations, wave equations and the non-linear damped pendulum, show that APHYNITY can efficiently leverage approximate physical models to accurately forecast the evolution of the system and correctly identify relevant physical parameters. Code is available at https://github.com/yuan-yin/APHYNITY .
△ Less
Submitted 10 May, 2022; v1 submitted 9 October, 2020;
originally announced October 2020.
-
Let's Stop Incorrect Comparisons in End-to-end Relation Extraction!
Authors:
Bruno Taillé,
Vincent Guigue,
Geoffrey Scoutheeten,
Patrick Gallinari
Abstract:
Despite efforts to distinguish three different evaluation setups (Bekoulis et al., 2018), numerous end-to-end Relation Extraction (RE) articles present unreliable performance comparison to previous work. In this paper, we first identify several patterns of invalid comparisons in published papers and describe them to avoid their propagation. We then propose a small empirical study to quantify the i…
▽ More
Despite efforts to distinguish three different evaluation setups (Bekoulis et al., 2018), numerous end-to-end Relation Extraction (RE) articles present unreliable performance comparison to previous work. In this paper, we first identify several patterns of invalid comparisons in published papers and describe them to avoid their propagation. We then propose a small empirical study to quantify the impact of the most common mistake and evaluate it leads to overestimating the final RE performance by around 5% on ACE05. We also seize this opportunity to study the unexplored ablations of two recent developments: the use of language model pretraining (specifically BERT) and span-level NER. This meta-analysis emphasizes the need for rigor in the report of both the evaluation setting and the datasets statistics and we call for unifying the evaluation setting in end-to-end RE.
△ Less
Submitted 9 August, 2021; v1 submitted 22 September, 2020;
originally announced September 2020.
-
A Principle of Least Action for the Training of Neural Networks
Authors:
Skander Karkar,
Ibrahim Ayed,
Emmanuel de Bézenac,
Patrick Gallinari
Abstract:
Neural networks have been achieving high generalization performance on many tasks despite being highly over-parameterized. Since classical statistical learning theory struggles to explain this behavior, much effort has recently been focused on uncovering the mechanisms behind it, in the hope of developing a more adequate theoretical framework and having a better control over the trained models. In…
▽ More
Neural networks have been achieving high generalization performance on many tasks despite being highly over-parameterized. Since classical statistical learning theory struggles to explain this behavior, much effort has recently been focused on uncovering the mechanisms behind it, in the hope of developing a more adequate theoretical framework and having a better control over the trained models. In this work, we adopt an alternate perspective, viewing the neural network as a dynamical system displacing input particles over time. We conduct a series of experiments and, by analyzing the network's behavior through its displacements, we show the presence of a low kinetic energy displacement bias in the transport map of the network, and link this bias with generalization performance. From this observation, we reformulate the learning problem as follows: finding neural networks which solve the task while transporting the data as efficiently as possible. This offers a novel formulation of the learning problem which allows us to provide regularity results for the solution network, based on Optimal Transport theory. From a practical viewpoint, this allows us to propose a new learning algorithm, which automatically adapts to the complexity of the given task, and leads to networks with a high generalization ability even in low data regimes.
△ Less
Submitted 15 June, 2021; v1 submitted 17 September, 2020;
originally announced September 2020.
-
PDE-Driven Spatiotemporal Disentanglement
Authors:
Jérémie Donà,
Jean-Yves Franceschi,
Sylvain Lamprier,
Patrick Gallinari
Abstract:
A recent line of work in the machine learning community addresses the problem of predicting high-dimensional spatiotemporal phenomena by leveraging specific tools from the differential equations theory. Following this direction, we propose in this article a novel and general paradigm for this task based on a resolution method for partial differential equations: the separation of variables. This in…
▽ More
A recent line of work in the machine learning community addresses the problem of predicting high-dimensional spatiotemporal phenomena by leveraging specific tools from the differential equations theory. Following this direction, we propose in this article a novel and general paradigm for this task based on a resolution method for partial differential equations: the separation of variables. This inspiration allows us to introduce a dynamical interpretation of spatiotemporal disentanglement. It induces a principled model based on learning disentangled spatial and temporal representations of a phenomenon to accurately predict future observations. We experimentally demonstrate the performance and broad applicability of our method against prior state-of-the-art models on physical and synthetic video datasets.
△ Less
Submitted 23 March, 2021; v1 submitted 4 August, 2020;
originally announced August 2020.
-
What BERT Sees: Cross-Modal Transfer for Visual Question Generation
Authors:
Thomas Scialom,
Patrick Bordes,
Paul-Alexis Dray,
Jacopo Staiano,
Patrick Gallinari
Abstract:
Pre-trained language models have recently contributed to significant advances in NLP tasks. Recently, multi-modal versions of BERT have been developed, using heavy pre-training relying on vast corpora of aligned textual and image data, primarily applied to classification tasks such as VQA. In this paper, we are interested in evaluating the visual capabilities of BERT out-of-the-box, by avoiding pr…
▽ More
Pre-trained language models have recently contributed to significant advances in NLP tasks. Recently, multi-modal versions of BERT have been developed, using heavy pre-training relying on vast corpora of aligned textual and image data, primarily applied to classification tasks such as VQA. In this paper, we are interested in evaluating the visual capabilities of BERT out-of-the-box, by avoiding pre-training made on supplementary data. We choose to study Visual Question Generation, a task of great interest for grounded dialog, that enables to study the impact of each modality (as input can be visual and/or textual). Moreover, the generation aspect of the task requires an adaptation since BERT is primarily designed as an encoder. We introduce BERT-gen, a BERT-based architecture for text generation, able to leverage on either mono- or multi- modal representations. The results reported under different configurations indicate an innate capacity for BERT-gen to adapt to multi-modal data and text generation, even with few data available, avoiding expensive pre-training. The proposed model obtains substantial improvements over the state-of-the-art on two established VQG datasets.
△ Less
Submitted 16 December, 2020; v1 submitted 25 February, 2020;
originally announced February 2020.
-
Stochastic Latent Residual Video Prediction
Authors:
Jean-Yves Franceschi,
Edouard Delasalles,
Mickaël Chen,
Sylvain Lamprier,
Patrick Gallinari
Abstract:
Designing video prediction models that account for the inherent uncertainty of the future is challenging. Most works in the literature are based on stochastic image-autoregressive recurrent networks, which raises several performance and applicability issues. An alternative is to use fully latent temporal models which untie frame synthesis and temporal dynamics. However, no such model for stochasti…
▽ More
Designing video prediction models that account for the inherent uncertainty of the future is challenging. Most works in the literature are based on stochastic image-autoregressive recurrent networks, which raises several performance and applicability issues. An alternative is to use fully latent temporal models which untie frame synthesis and temporal dynamics. However, no such model for stochastic video prediction has been proposed in the literature yet, due to design and training difficulties. In this paper, we overcome these difficulties by introducing a novel stochastic temporal model whose dynamics are governed in a latent space by a residual update rule. This first-order scheme is motivated by discretization schemes of differential equations. It naturally models video dynamics as it allows our simpler, more interpretable, latent model to outperform prior state-of-the-art methods on challenging datasets.
△ Less
Submitted 7 August, 2020; v1 submitted 21 February, 2020;
originally announced February 2020.
-
Incorporating Visual Semantics into Sentence Representations within a Grounded Space
Authors:
Patrick Bordes,
Eloi Zablocki,
Laure Soulier,
Benjamin Piwowarski,
Patrick Gallinari
Abstract:
Language grounding is an active field aiming at enriching textual representations with visual information. Generally, textual and visual elements are embedded in the same representation space, which implicitly assumes a one-to-one correspondence between modalities. This hypothesis does not hold when representing words, and becomes problematic when used to learn sentence representations --- the foc…
▽ More
Language grounding is an active field aiming at enriching textual representations with visual information. Generally, textual and visual elements are embedded in the same representation space, which implicitly assumes a one-to-one correspondence between modalities. This hypothesis does not hold when representing words, and becomes problematic when used to learn sentence representations --- the focus of this paper --- as a visual scene can be described by a wide variety of sentences. To overcome this limitation, we propose to transfer visual information to textual representations by learning an intermediate representation space: the grounded space. We further propose two new complementary objectives ensuring that (1) sentences associated with the same visual content are close in the grounded space and (2) similarities between related elements are preserved across modalities. We show that this model outperforms the previous state-of-the-art on classification and semantic relatedness tasks.
△ Less
Submitted 7 February, 2020;
originally announced February 2020.