-
Growth strategies for arbitrary DAG neural architectures
Authors:
Stella Douka,
Manon Verbockhaven,
Théo Rudkiewicz,
Stéphane Rivaud,
François P. Landes,
Sylvain Chevallier,
Guillaume Charpiat
Abstract:
Deep learning has achieved impressive results at the cost of training huge neural networks. The larger the architecture, however, the higher the computational, financial, and environmental costs during training and inference. We aim to reduce both training and inference times. We focus on Neural Architecture Growth, which can increase the size of a small model when needed, directly during training, using information from backpropagation. We extend existing work and freely grow neural networks in the form of any Directed Acyclic Graph (DAG) by reducing expressivity bottlenecks in the architecture. We explore strategies to reduce excessive computations and steer network growth toward more parameter-efficient architectures.
Submitted 14 February, 2025; v1 submitted 22 January, 2025;
originally announced January 2025.
-
Class Imbalance in Anomaly Detection: Learning from an Exactly Solvable Model
Authors:
F. S. Pezzicoli,
V. Ros,
F. P. Landes,
M. Baity-Jesi
Abstract:
Class imbalance (CI) is a longstanding problem in machine learning, slowing down training and degrading performance. Although empirical remedies exist, it is often unclear which ones work best and when, due to the lack of an overarching theory. We address a common case of imbalance, that of anomaly (or outlier) detection. We provide a theoretical framework to analyze, interpret, and address CI. It is based on an exact solution of the teacher-student perceptron model, obtained through replica theory. Within this framework, one can distinguish several sources of CI: intrinsic, train, and test imbalance. Our analysis reveals that the optimal train imbalance is generally different from 50%, with a non-trivial dependence on the intrinsic imbalance, the abundance of data, and the noise in the learning process. Moreover, there is a crossover from a small-noise training regime, where results are independent of the noise level, to a high-noise regime, where performance quickly degrades with noise. Our results challenge some of the conventional wisdom on CI and offer practical guidelines to address it.
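To make the distinction between intrinsic and train imbalance concrete, here is a minimal Python sketch of how teacher-student data with a controlled imbalance could be generated. It does not reproduce the paper's replica calculation: the Gaussian inputs, the threshold used to set the intrinsic anomaly fraction, and the helper names `teacher_labels` and `resample_train` are illustrative assumptions.

```python
import math
import random


def teacher_labels(n, dim, threshold, rng):
    """Label Gaussian inputs with a random unit-norm teacher perceptron.

    Assumption for illustration: a sample is an anomaly (label 1) when its
    projection on the teacher exceeds `threshold`, so the threshold sets the
    intrinsic imbalance.
    """
    teacher = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(t * t for t in teacher))
    teacher = [t / norm for t in teacher]
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        projection = sum(t * xi for t, xi in zip(teacher, x))
        data.append((x, 1 if projection > threshold else 0))
    return data


def resample_train(data, anomaly_fraction, n_train, rng):
    """Draw a training set whose anomaly fraction (train imbalance) is chosen freely."""
    anomalies = [d for d in data if d[1] == 1]
    normals = [d for d in data if d[1] == 0]
    n_anom = round(anomaly_fraction * n_train)
    n_norm = n_train - n_anom
    if n_anom > len(anomalies) or n_norm > len(normals):
        raise ValueError("not enough samples of one class")
    train = rng.sample(anomalies, n_anom) + rng.sample(normals, n_norm)
    rng.shuffle(train)
    return train


rng = random.Random(0)
data = teacher_labels(20000, 10, threshold=1.5, rng=rng)
# With a unit-norm teacher the projection is standard normal, so the intrinsic
# anomaly fraction should be close to P(Z > 1.5), roughly 0.067.
print(sum(y for _, y in data) / len(data))
train = resample_train(data, anomaly_fraction=0.5, n_train=1000, rng=rng)
print(sum(y for _, y in train) / len(train))  # 0.5 by construction
```

Re-sampling the training set this way changes only the train imbalance; the intrinsic imbalance (set here by `threshold`) and whatever imbalance the test set has are left untouched, which is the separation the abstract relies on.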
Submitted 20 January, 2025;
originally announced January 2025.
-
Assessing the predicting power of GPS data for aftershocks forecasting
Authors:
Vincenzo Maria Schimmenti,
Giuseppe Petrillo,
Alberto Rosso,
Francois P. Landes
Abstract:
We present a machine learning approach to aftershock forecasting, applied to the Japanese earthquake catalogue from 2015 to 2019. Our method takes as sole input the ground surface deformation measured by Global Positioning System (GPS) stations on the day of the mainshock, and processes it with a Convolutional Neural Network (CNN), thus capturing the input's spatial correlations. Despite the moderate amount of data, the performance of this new approach is very promising. The accuracy of the prediction depends heavily on the density of GPS stations: the predictive power is lost when mainshocks occur far from measurement stations, as in offshore regions.
Submitted 17 May, 2023;
originally announced May 2023.
-
Rotation-equivariant Graph Neural Networks for Learning Glassy Liquids Representations
Authors:
Francesco Saverio Pezzicoli,
Guillaume Charpiat,
François P. Landes
Abstract:
The difficult problem of relating the static structure of glassy liquids to their dynamics is a good target for Machine Learning, an approach which excels at finding complex patterns hidden in data. Indeed, this approach is currently a hot topic in the glassy liquids community, where the state of the art consists of Graph Neural Networks (GNNs), which have great expressive power but are heavy models and lack interpretability. Inspired by recent advances in group-equivariant representations in Machine Learning, we build a GNN that learns a robust representation of the glass's static structure by constraining it to preserve roto-translation (SE(3)) equivariance. We show that this constraint significantly improves the predictive power with a comparable or reduced number of parameters but, most importantly, improves the ability to generalize to unseen temperatures. While remaining a deep network, our model has improved interpretability compared to other GNNs, as the action of our basic convolution layer relates directly to well-known rotation-invariant expert features. Through transfer-learning experiments displaying unprecedented performance, we demonstrate that our network learns a robust representation, which allows us to push forward the idea of a learned structural order parameter for glasses.
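The roto-translation constraint can be illustrated with a single hand-built feature instead of a full GNN layer. The Python sketch below is a toy under stated assumptions, not the authors' architecture: it sums Gaussian-weighted unit vectors from a particle to its neighbors, then checks numerically that translating and rotating the whole configuration rotates the feature by the same rotation. The Gaussian radial weight, `sigma`, and all function names are hypothetical.

```python
import math
import random


def rotation_matrix(axis, theta):
    """Rodrigues' formula: rotation by angle `theta` about `axis`."""
    n = math.sqrt(sum(a * a for a in axis))
    k = [a / n for a in axis]
    c, s = math.cos(theta), math.sin(theta)
    cross = [[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]]
    return [[(c if i == j else 0.0) + s * cross[i][j] + (1.0 - c) * k[i] * k[j]
             for j in range(3)] for i in range(3)]


def apply(rot, v):
    """Matrix-vector product for 3x3 lists."""
    return [sum(rot[i][k] * v[k] for k in range(3)) for i in range(3)]


def vector_feature(center, neighbors, sigma=1.0):
    """Sum over neighbors of w(r) times the unit vector from center to neighbor.

    Relative vectors make it translation invariant; the radial weight depends
    only on distance, so the output rotates with the input (an l=1 feature).
    """
    out = [0.0, 0.0, 0.0]
    for p in neighbors:
        d = [p[k] - center[k] for k in range(3)]
        r = math.sqrt(sum(x * x for x in d))
        w = math.exp(-r * r / (2.0 * sigma ** 2))
        for k in range(3):
            out[k] += w * d[k] / r
    return out


rng = random.Random(0)
center = [rng.uniform(-1.0, 1.0) for _ in range(3)]
neighbors = [[rng.uniform(-2.0, 2.0) for _ in range(3)] for _ in range(8)]
rot = rotation_matrix([1.0, 2.0, 3.0], 0.7)
shift = [5.0, -1.0, 2.0]


def move(p):
    """Apply the roto-translation p -> R p + t."""
    return [a + b for a, b in zip(apply(rot, p), shift)]


before = vector_feature(center, neighbors)
after = vector_feature(move(center), [move(p) for p in neighbors])
# Equivariance: f(R x + t) should equal R f(x) up to floating-point round-off.
print(max(abs(a - b) for a, b in zip(after, apply(rot, before))))
```

Contracting such an equivariant vector into a scalar (here, simply its norm) yields a rotation-invariant quantity; this is the general route by which equivariant layers connect to invariant descriptors, although how the paper's layer does so is not reproduced here.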
Submitted 12 April, 2024; v1 submitted 6 November, 2022;
originally announced November 2022.
-
Attractive versus truncated repulsive supercooled liquids: The dynamics is encoded in the pair correlation function
Authors:
François P. Landes,
Giulio Biroli,
Olivier Dauchot,
Andrea J. Liu,
David R. Reichman
Abstract:
We compare glassy dynamics in two liquids that differ in the form of their interaction potentials. Both systems have the same repulsive interactions, but one also has an attractive part in the potential. These two systems exhibit very different dynamics despite having nearly identical pair correlation functions. We demonstrate that a properly weighted integral of the pair correlation function, which amplifies the subtle differences between the two systems, correctly captures their dynamical differences. The weights are obtained from a standard machine learning algorithm.
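The two ingredients named in this abstract can be sketched in Python: a histogram estimate of the pair correlation function g(r) in a cubic periodic box, and a weighted integral of it approximated by a midpoint sum over the bins. The weight function below is a hypothetical placeholder; in the paper the weights are learned by a machine learning algorithm, which this sketch does not reproduce. The uniform random configuration is likewise only a test input, for which g(r) should stay close to 1.

```python
import math
import random


def pair_correlation(positions, box, dr, r_max):
    """Histogram estimate of g(r) for particles in a cubic periodic box.

    Assumes r_max <= box / 2 so the minimum-image convention is valid.
    Returns bin-center radii and the corresponding g(r) values.
    """
    n = len(positions)
    rho = n / box ** 3  # number density
    n_bins = int(r_max / dr)
    counts = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):
            d2 = 0.0
            for k in range(3):
                d = positions[i][k] - positions[j][k]
                d -= box * round(d / box)  # minimum-image convention
                d2 += d * d
            b = int(math.sqrt(d2) / dr)
            if b < n_bins:
                counts[b] += 1
    radii, g = [], []
    for b in range(n_bins):
        r_lo, r_hi = b * dr, (b + 1) * dr
        shell = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)
        # Each pair was counted once, hence the factor 2 (neighbors per particle).
        g.append(2.0 * counts[b] / (n * rho * shell))
        radii.append(0.5 * (r_lo + r_hi))
    return radii, g


def weighted_integral(radii, g, weight, dr):
    """Midpoint-rule approximation of the integral of w(r) g(r) dr over the bins."""
    return dr * sum(weight(r) * gr for r, gr in zip(radii, g))


def hypothetical_weight(r):
    # Placeholder weight; the paper's weights come from a trained ML model instead.
    return math.exp(-(r - 1.5) ** 2)


rng = random.Random(0)
box = 10.0
positions = [[rng.uniform(0.0, box) for _ in range(3)] for _ in range(400)]
radii, g = pair_correlation(positions, box, dr=0.25, r_max=4.0)
print(weighted_integral(radii, g, hypothetical_weight, dr=0.25))
```

Because the integral is linear in g(r), two liquids with nearly identical g(r) can still yield clearly different values if the weight concentrates on the radii where their g(r) differ; choosing that weight is the part the abstract attributes to machine learning.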
Submitted 14 January, 2020; v1 submitted 3 June, 2019;
originally announced June 2019.