-
Revisiting Automatic Data Curation for Vision Foundation Models in Digital Pathology
Authors:
Boqi Chen,
Cédric Vincent-Cuaz,
Lydia A. Schoenpflug,
Manuel Madeira,
Lisa Fournier,
Vaishnavi Subramanian,
Sonali Andani,
Samuel Ruiperez-Campillo,
Julia E. Vogt,
Raphaëlle Luisier,
Dorina Thanou,
Viktor H. Koelzer,
Pascal Frossard,
Gabriele Campanella,
Gunnar Rätsch
Abstract:
Vision foundation models (FMs) are accelerating the development of digital pathology algorithms and transforming biomedical research. These models learn, in a self-supervised manner, to represent histological features in highly heterogeneous tiles extracted from whole-slide images (WSIs) of real-world patient samples. The performance of these FMs is significantly influenced by the size, diversity, and balance of the pre-training data. However, data selection has been primarily guided by expert knowledge at the WSI level, focusing on factors such as disease classification and tissue types, while largely overlooking the granular details available at the tile level. In this paper, we investigate the potential of unsupervised automatic data curation at the tile level, taking into account 350 million tiles. Specifically, we apply hierarchical clustering trees to pre-extracted tile embeddings, allowing us to sample balanced datasets uniformly across the embedding space of the pretrained FM. We further identify that these datasets are subject to a trade-off between size and balance, potentially compromising the quality of representations learned by FMs, and propose tailored batch sampling strategies to mitigate this effect. We demonstrate the effectiveness of our method through improved performance on a diverse range of clinically relevant downstream tasks.
Submitted 24 March, 2025;
originally announced March 2025.
-
Fake it till you predict it: data augmentation strategies to detect initiation and termination of oncology treatment
Authors:
Valentin Pohyer,
Elizabeth Fabre,
Stéphane Oudard,
Laure Fournier,
Bastien Rance
Abstract:
At the hospital, the dispersion of information regarding anti-cancer treatment makes it difficult to extract. We proposed a solution capable of identifying dates, drugs and their temporal relationship within free-text oncology reports with very few manual annotations. We used pattern recognition for dates, dictionaries for drugs and transformer language models for the relationship, combined with a data augmentation strategy. Our models achieved good prediction F1-scores, reaching 0.872. Models with data augmentation outperform those without. By inferring such models, we can now identify and structure thousands of previously unavailable treatment events to better apprehend solutions and patient response.
Submitted 14 October, 2024;
originally announced October 2024.
-
ACCO: Accumulate While You Communicate for Communication-Overlapped Sharded LLM Training
Authors:
Adel Nabli,
Louis Fournier,
Pierre Erbacher,
Louis Serrano,
Eugene Belilovsky,
Edouard Oyallon
Abstract:
Training LLMs relies on distributed implementations using multiple GPUs to compute gradients in parallel with sharded optimizers. However, synchronizing gradients in data parallel setups introduces communication overhead that grows with the number of workers, limiting parallelization efficiency. Local optimization algorithms reduce communications but incur high memory costs as they prevent optimizer state sharding, hindering scalability. To address this, we propose ACcumulate while COmmunicate (ACCO), a memory-efficient optimization algorithm for distributed LLM training. By synchronizing delayed gradients while computing new ones, ACCO reduces GPU idle time and supports heterogeneous hardware. To mitigate the convergence issues caused by delayed updates, we introduce a novel technique ensuring training dynamics align with standard distributed optimization. Compared to ZeRO-1, our approach is significantly faster and scales effectively across heterogeneous hardware.
Submitted 19 May, 2025; v1 submitted 3 June, 2024;
originally announced June 2024.
-
PETRA: Parallel End-to-end Training with Reversible Architectures
Authors:
Stéphane Rivaud,
Louis Fournier,
Thomas Pumir,
Eugene Belilovsky,
Michael Eickenberg,
Edouard Oyallon
Abstract:
Reversible architectures have been shown to be capable of performing on par with their non-reversible counterparts, being applied in deep learning for memory savings and generative modeling. In this work, we show how reversible architectures can solve challenges in parallelizing deep model training. We introduce PETRA, a novel alternative to backpropagation for parallelizing gradient computations. PETRA facilitates effective model parallelism by enabling stages (i.e., a set of layers) to compute independently on different devices, while only needing to communicate activations and gradients between each other. By decoupling the forward and backward passes and keeping a single updated version of the parameters, the need for weight stashing is also removed. We develop a custom autograd-like training framework for PETRA, and we demonstrate its effectiveness on CIFAR-10, ImageNet32, and ImageNet, achieving competitive accuracies comparable to backpropagation using ResNet-18, ResNet-34, and ResNet-50 models.
Submitted 19 May, 2025; v1 submitted 4 June, 2024;
originally announced June 2024.
-
WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average
Authors:
Louis Fournier,
Adel Nabli,
Masih Aminbeidokhti,
Marco Pedersoli,
Eugene Belilovsky,
Edouard Oyallon
Abstract:
The performance of deep neural networks is enhanced by ensemble methods, which average the output of several models. However, this comes at an increased cost at inference. Weight averaging methods aim at balancing the generalization of ensembling and the inference speed of a single model by averaging the parameters of an ensemble of models. Yet, naive averaging results in poor performance as models converge to different loss basins, and aligning the models to improve the performance of the average is challenging. Alternatively, inspired by distributed training, methods like DART and PAPA have been proposed to train several models in parallel such that they will end up in the same basin, resulting in good averaging accuracy. However, these methods either compromise ensembling accuracy or demand significant communication between models during training. In this paper, we introduce WASH, a novel distributed method for training model ensembles for weight averaging that achieves state-of-the-art image classification accuracy. WASH maintains models within the same basin by randomly shuffling a small percentage of weights during training, resulting in diverse models and lower communication costs compared to standard parameter averaging methods.
Submitted 27 May, 2024;
originally announced May 2024.
-
Cyclic Data Parallelism for Efficient Parallelism of Deep Neural Networks
Authors:
Louis Fournier,
Edouard Oyallon
Abstract:
Training large deep learning models requires parallelization techniques to scale. In existing methods such as Data Parallelism or ZeRO-DP, micro-batches of data are processed in parallel, which creates two drawbacks: the total memory required to store the model's activations peaks at the end of the forward pass, and gradients must be simultaneously averaged at the end of the backpropagation step. We propose Cyclic Data Parallelism, a novel paradigm shifting the execution of the micro-batches from simultaneous to sequential, with a uniform delay. At the cost of a slight gradient delay, the total memory taken by activations is constant, and the gradient communications are balanced during the training step. With Model Parallelism, our technique reduces the number of GPUs needed, by sharing GPUs across micro-batches. Within the ZeRO-DP framework, our technique allows communication of the model states with point-to-point operations rather than a collective broadcast operation. We illustrate the strength of our approach on the CIFAR-10 and ImageNet datasets.
Submitted 13 March, 2024;
originally announced March 2024.
-
Can Forward Gradient Match Backpropagation?
Authors:
Louis Fournier,
Stéphane Rivaud,
Eugene Belilovsky,
Michael Eickenberg,
Edouard Oyallon
Abstract:
Forward Gradients - the idea of using directional derivatives in forward differentiation mode - have recently been shown to be utilizable for neural network training while avoiding problems generally associated with backpropagation gradient computation, such as locking and memorization requirements. The cost is the requirement to guess the step direction, which is hard in high dimensions. While current solutions rely on weighted averages over isotropic guess vector distributions, we propose to strongly bias our gradient guesses in directions that are much more promising, such as feedback obtained from small, local auxiliary networks. For a standard computer vision neural network, we conduct a rigorous study systematically covering a variety of combinations of gradient targets and gradient guesses, including those previously presented in the literature. We find that using gradients obtained from a local loss as a candidate direction drastically improves on random noise in Forward Gradient methods.
Submitted 12 June, 2023;
originally announced June 2023.
-
Paraphrases do not explain word analogies
Authors:
Louis Fournier,
Ewan Dunbar
Abstract:
Many types of distributional word embeddings (weakly) encode linguistic regularities as directions (the difference between "jump" and "jumped" will be in a similar direction to that of "walk" and "walked," and so on). Several attempts have been made to explain this fact. We respond to Allen and Hospedales' recent (ICML, 2019) theoretical explanation, which claims that word2vec and GloVe will encode linguistic regularities whenever a specific relation of paraphrase holds between the four words involved in the regularity. We demonstrate that the explanation does not go through: the paraphrase relations needed under this explanation do not hold empirically.
Submitted 23 February, 2021;
originally announced February 2021.
-
Analogies minus analogy test: measuring regularities in word embeddings
Authors:
Louis Fournier,
Emmanuel Dupoux,
Ewan Dunbar
Abstract:
Vector space models of words have long been claimed to capture linguistic regularities as simple vector translations, but problems have been raised with this claim. We decompose and empirically analyze the classic arithmetic word analogy test, to motivate two new metrics that address the issues with the standard test, and which distinguish between class-wise offset concentration (similar directions between pairs of words drawn from different broad classes, such as France--London, China--Ottawa, ...) and pairing consistency (the existence of a regular transformation between correctly-matched pairs such as France:Paris::China:Beijing). We show that, while the standard analogy test is flawed, several popular word embeddings do nevertheless encode linguistic regularities.
Submitted 7 October, 2020;
originally announced October 2020.
-
Geometric entanglement in integer quantum Hall states
Authors:
Benoit Sirois,
Lucie Maude Fournier,
Julien Leduc,
William Witczak-Krempa
Abstract:
We study the quantum entanglement structure of integer quantum Hall states via the reduced density matrix of spatial subregions. In particular, we examine the eigenstates, spectrum and entanglement entropy (EE) of the density matrix for various ground and excited states, with or without mass anisotropy. We focus on an important class of regions that contain sharp corners or cusps, leading to a geometric angle-dependent contribution to the EE. We unravel surprising relations by comparing this corner term at different fillings. We further find that the corner term, when properly normalized, has nearly the same angle dependence as numerous conformal field theories (CFTs) in two spatial dimensions, which hints at a broader structure. In fact, the Hall corner term is found to obey bounds that were previously obtained for CFTs. In addition, the low-lying entanglement spectrum and the corresponding eigenfunctions reveal "excitations" localized near corners. Finally, we present an outlook for fractional quantum Hall states.
Submitted 4 September, 2020;
originally announced September 2020.
-
Possible quantum nematic in a colossal magnetoresistance material
Authors:
Gabrielle Beaudin,
Lucie Maude Fournier,
Michael Nicklas,
Michel Kenzelmann,
Mark Laver,
William Witczak-Krempa,
Andrea D. Bianchi
Abstract:
EuB6 has for a long time captured the attention of the physics community, as it shows a ferromagnetic phase transition leading to an insulator-to-metal transition together with colossal magnetoresistance (CMR). EuB6 has a very low carrier density, which is known to drastically change the interaction between the localized Eu moments and the conduction electrons. One of the early triumphs of the quantum theory in condensed matter was the presence of a Fermi surface, which is intimately linked to the symmetry of the underlying crystal lattice. This symmetry can be probed by angle resolved magnetoresistance (AMRO) measurements. Here, we present AMRO measurements that show that in EuB6 this symmetry is broken, possibly indicating the presence of a quantum nematic phase. We identify the region in the temperature-magnetic field phase diagram where the magnetoresistance shows two-fold oscillations instead of the expected fourfold pattern. Quantum nematic phases are analogous to classical liquid crystals. Like liquid crystals, which break the rotational symmetry of space, their quantum analogs break the point-group symmetry of the crystal due to strong electron-electron interactions, as in quantum Hall states, Sr3Ru2O7, and high temperature superconductors. This is the same region where magnetic polarons were previously observed, suggesting that they drive the nematicity in EuB6. This is also the region of the phase diagram where EuB6 shows CMR. This novel interplay between magnetic and electronic properties could thus be harnessed for spintronic applications.
Submitted 30 September, 2021; v1 submitted 20 August, 2020;
originally announced August 2020.
-
AI-Driven CT-based quantification, staging and short-term outcome prediction of COVID-19 pneumonia
Authors:
Guillaume Chassagnon,
Maria Vakalopoulou,
Enzo Battistella,
Stergios Christodoulidis,
Trieu-Nghi Hoang-Thi,
Severine Dangeard,
Eric Deutsch,
Fabrice Andre,
Enora Guillo,
Nara Halm,
Stefany El Hajj,
Florian Bompard,
Sophie Neveu,
Chahinez Hani,
Ines Saab,
Alienor Campredon,
Hasmik Koulakian,
Souhail Bennani,
Gael Freche,
Aurelien Lombard,
Laure Fournier,
Hippolyte Monnier,
Teodor Grand,
Jules Gregory,
Antoine Khalil
, et al. (6 additional authors not shown)
Abstract:
Chest computed tomography (CT) is widely used for the management of Coronavirus disease 2019 (COVID-19) pneumonia because of its availability and rapidity. The standard of reference for confirming COVID-19 relies on microbiological tests, but these tests might not be available in an emergency setting and their results are not immediately available, contrary to CT. In addition to its role for early diagnosis, CT has a prognostic role by allowing visual evaluation of the extent of COVID-19 lung abnormalities. The objective of this study is to address prediction of short-term outcomes, especially the need for mechanical ventilation. In this multi-centric study, we propose an end-to-end artificial intelligence solution for automatic quantification and prognosis assessment by combining automatic CT delineation of lung disease meeting performance of experts and data-driven identification of biomarkers for its prognosis. AI-driven combination of variables with CT-based biomarkers offers perspectives for optimal patient management given the shortage of intensive care beds and ventilators.
Submitted 20 April, 2020;
originally announced April 2020.
-
Spared cognitive and behavioral functions prior to epilepsy onset in a rat model of subcortical band heterotopia
Authors:
Fanny Sandrine Martineau,
Lauriane Fournier,
Emmanuelle Buhler,
Françoise Watrin,
Francesca Sargolini,
Jean-Bernard Manent,
Bruno Poucet,
Alfonso Represa
Abstract:
Subcortical band heterotopia (SBH), also known as doublecortex syndrome, is a malformation of cortical development resulting from mutations in the doublecortin gene (DCX). It is characterized by a lack of migration of cortical neurons that accumulate in the white matter forming a heterotopic band. Patients with SBH may present mild to moderate intellectual disability as well as epilepsy. The SBH condition can be modeled in rats by in utero knockdown (KD) of Dcx. The affected cells form an SBH reminiscent of that observed in human patients and the animals develop a chronic epileptic condition in adulthood. Here, we investigated if the presence of an SBH is sufficient to induce cognitive impairment in
Submitted 1 March, 2019;
originally announced March 2019.
-
RFC 7800 - Money Over IP
Authors:
Laurent Fournier
Abstract:
This Request For Comment (RFC) is a proposal for a new protocol to use money over the Internet. Features like a distributed architecture, a published cryptographic algorithm, a minimal authority responsibility, and the absence of fees will make this protocol a perfect tool for citizens in today's digital world. An implementation has validated the main principles and we are now entering a testing phase (v0.1). Depending on the results, a release date for the 1.0 revision will be decided to allow anybody to send or to receive money to/from anyone, in any currency, with a regular and personal smart-phone. A distributed hash table (DHT) is used to store all transactions, public keys and certificates redundantly on several nodes. The whole system may replace coins, banknotes and classical checks in the future. We also argue that the Bitcoin technology does not satisfy the requirements for a digital means of payment. This proposal is expected to be reviewed and commented on by the Internet Engineering Task Force.
Submitted 28 August, 2015; v1 submitted 14 August, 2015;
originally announced August 2015.
-
Merchant Sharing Towards a Zero Marginal Cost Economy
Authors:
Laurent Fournier
Abstract:
This paper is the first attempt to formalize a new field of economics: the study of the Intangible Goods available on the Internet. We take advantage of the digital world's specific rules, in particular the zero marginal cost, to propose a unified theory of trading and sharing. A function-based money is created as a worldwide currency, the "cup". We argue that our system discourages speculation while making it easy for governments to collect taxes. The implementation removes today's paywall on the Internet and provides a simple-to-use, open-source, free-of-charge, highly-secure, person-to-person, privacy-respectful, digital payment tool for citizens, using standard smart-phones with strong authentication. The next step will be the propagation of the network application, and we expect many shared benefits for economic development as a whole.
Submitted 7 May, 2014;
originally announced May 2014.
-
Économie des biens immatériels - Economics of Intangible Goods
Authors:
Laurent Fournier
Abstract:
We introduce a new economic system suited for Intangible Goods (IG). We argue that such a system can now be implemented in the real world using advanced techniques in distributed network computing and cryptography. The specification of the so-called \net{} is presented. To limit the number of financial transactions, the system is forced to define its own currency, with many benefits. The new "cup" currency, extended worldwide, is dedicated to IG, available only for person-to-person trading, protected from speculation and adapted for tax recovery with no additional computation. These nice features make the \net{} a new democratic tool, fixing specific issues in IG trading and reviving a whole domain of activity. We emphasize that all proposed documentation, algorithms, and programs in any language related to this proposal shall be open-source, without any possibility of filing any patent of any sort on the system or subsystem. This new trading model should be considered as a pure intellectual construction, like parts of mathematics, and therefore belongs to nobody or everybody, like $1+1=2$. The next step will be to test and validate the security of various implementation details, and to ask for adaptations of legal rules.
The first draft paper is written in French and posted to arXiv.org and hal.archive-ouverte.fr. We expect to provide an English translation before Christmas.
Submitted 26 November, 2012; v1 submitted 15 October, 2012;
originally announced October 2012.
-
A lattice model for the kinetics of rupture of fluid bilayer membranes
Authors:
Luc Fournier,
Bela Joos
Abstract:
We have constructed a model for the kinetics of rupture of membranes under tension, applying physical principles relevant to lipid bilayers held together by hydrophobic interactions. The membrane is characterized by the bulk compressibility (for expansion), the thickness of the hydrophobic part of the bilayer, the hydrophobicity and a parameter characterizing the tail rigidity of the lipids. The model is a lattice model which incorporates strain relaxation, and considers the nucleation of pores at constant area, constant temperature, and constant particle number. The particle number is conserved by allowing multiple occupancy of the sites. An equilibrium "phase diagram" is constructed as a function of temperature and strain with the total pore surface and distribution as the order parameters. A first order rupture line is found with increasing tension, and a continuous increase in proto-pore concentration with rising temperature till instability. The model explains current results on saturated and unsaturated PC lipid bilayers and thicker artificial bilayers made of diblock copolymers. Pore size distributions are presented for various values of area expansion and temperature, and the fractal dimension of the pore edge is evaluated.
Submitted 18 February, 2003; v1 submitted 17 April, 2002;
originally announced April 2002.