-
On Monotonicity in AI Alignment
Authors:
Gilles Bareilles,
Julien Fageot,
Lê-Nguyên Hoang,
Peva Blanchard,
Wassim Bouaziz,
Sébastien Rouault,
El-Mahdi El-Mhamdi
Abstract:
Comparison-based preference learning has become central to the alignment of AI models with human preferences. However, these methods may behave counterintuitively. After empirically observing that, when accounting for a preference for response $y$ over $z$, the model may actually decrease the probability (and reward) of generating $y$ (an observation also made by others), this paper investigates the root causes of (non) monotonicity, for a general comparison-based preference learning framework that subsumes Direct Preference Optimization (DPO), Generalized Preference Optimization (GPO) and Generalized Bradley-Terry (GBT). Under mild assumptions, we prove that such methods still satisfy what we call local pairwise monotonicity. We also provide a bouquet of formalizations of monotonicity, and identify sufficient conditions for their guarantee, thereby providing a toolbox to evaluate how prone learning models are to monotonicity violations. These results clarify the limitations of current methods and provide guidance for developing more trustworthy preference learning algorithms.
Submitted 10 June, 2025;
originally announced June 2025.
-
Generalizing while preserving monotonicity in comparison-based preference learning models
Authors:
Julien Fageot,
Peva Blanchard,
Gilles Bareilles,
Lê-Nguyên Hoang
Abstract:
If you tell a learning model that you prefer an alternative $a$ over another alternative $b$, then you probably expect the model to be monotone, that is, the valuation of $a$ increases, and that of $b$ decreases. Yet, perhaps surprisingly, many widely deployed comparison-based preference learning models, including large language models, fail to have this guarantee. Until now, the only comparison-based preference learning algorithms that were proved to be monotone are the Generalized Bradley-Terry models. Yet, these models are unable to generalize to uncompared data. In this paper, we advance the understanding of the set of models with generalization ability that are monotone. Namely, we propose a new class of Linear Generalized Bradley-Terry models with Diffusion Priors, and identify sufficient conditions on alternatives' embeddings that guarantee monotonicity. Our experiments show that this monotonicity is far from being a general guarantee, and that our new class of generalizing models improves accuracy, especially when the dataset is limited.
Submitted 10 June, 2025;
originally announced June 2025.
-
Solidago: A Modular Collaborative Scoring Pipeline
Authors:
Lê Nguyên Hoang,
Romain Beylerian,
Bérangère Colbois,
Julien Fageot,
Louis Faucon,
Aidan Jungo,
Alain Le Noac'h,
Adrien Matissart,
Oscar Villemaud
Abstract:
This paper presents Solidago, an end-to-end modular pipeline to allow any community of users to collaboratively score any number of entities. Solidago proposes a six-module decomposition. First, it uses pretrust and peer-to-peer vouches to assign trust scores to users. Second, based on participation, trust scores are turned into voting rights per user per entity. Third, for each user, a preference model is learned from the user's evaluation data. Fourth, users' models are put on a similar scale. Fifth, these models are securely aggregated. Sixth, models are post-processed to yield human-readable global scores. We also propose default implementations of the six modules, including a novel trust propagation algorithm, and adaptations of state-of-the-art scaling and aggregation solutions. Our pipeline has been successfully deployed on the open-source platform tournesol.app. We thereby lay an appealing foundation for the collaborative, effective, scalable, fair, interpretable and secure scoring of any set of entities.
Submitted 25 September, 2024; v1 submitted 30 October, 2022;
originally announced November 2022.
-
Entropic Compressibility of Lévy Processes
Authors:
Julien Fageot,
Alireza Fallah,
Thibaut Horel
Abstract:
In contrast to their seemingly simple and shared structure of independence and stationarity, Lévy processes exhibit a wide variety of behaviors, from the self-similar Wiener process to piecewise-constant compound Poisson processes. Inspired by the recent paper of Ghourchian, Amini, and Gohari (2018), we characterize their compressibility by studying the entropy of their double discretization (both in time and amplitude) in the regime of vanishing discretization steps. For a Lévy process with absolutely continuous marginals, this reduces to understanding the asymptotics of the differential entropy of its marginals at small times, for which we obtain a new local central limit theorem. We generalize known results for stable processes to the non-stable case, with a special focus on Lévy processes that are locally self-similar, and conceptualize a new compressibility hierarchy of Lévy processes, captured by their Blumenthal-Getoor index.
Submitted 15 May, 2022; v1 submitted 22 September, 2020;
originally announced September 2020.
-
3D Solid Spherical Bispectrum CNNs for Biomedical Texture Analysis
Authors:
Valentin Oreiller,
Vincent Andrearczyk,
Julien Fageot,
John O. Prior,
Adrien Depeursinge
Abstract:
Locally Rotation Invariant (LRI) operators have shown great potential in biomedical texture analysis where patterns appear at random positions and orientations. LRI operators can be obtained by computing the responses to the discrete rotation of local descriptors, such as Local Binary Patterns (LBP) or the Scale Invariant Feature Transform (SIFT). Other strategies achieve this invariance using Laplacian of Gaussian or steerable wavelets for instance, preventing the introduction of sampling errors during the discretization of the rotations. In this work, we obtain LRI operators via the local projection of the image on the spherical harmonics basis, followed by the computation of the bispectrum, which shares and extends the invariance properties of the spectrum. We investigate the benefits of using the bispectrum over the spectrum in the design of an LRI layer embedded in a shallow Convolutional Neural Network (CNN) for 3D image analysis. The performance of each design is evaluated on two datasets and compared against a standard 3D CNN. The first dataset is made of 3D volumes composed of synthetically generated rotated patterns, while the second contains malignant and benign pulmonary nodules in Computed Tomography (CT) images. The results indicate that bispectrum CNNs allow for a significantly better characterization of 3D textures than both the spectral and standard CNNs. In addition, they can efficiently learn with fewer training examples and trainable parameters when compared to a standard convolutional layer.
Submitted 2 June, 2020; v1 submitted 28 April, 2020;
originally announced April 2020.
-
Wavelet Compressibility of Compound Poisson Processes
Authors:
Shayan Aziznejad,
Julien Fageot
Abstract:
In this paper, we precisely quantify the wavelet compressibility of compound Poisson processes. To that end, we expand the given random process over the Haar wavelet basis and we analyse its asymptotic approximation properties. By only considering the nonzero wavelet coefficients up to a given scale, what we call the greedy approximation, we exploit the extreme sparsity of the wavelet expansion that derives from the piecewise-constant nature of compound Poisson processes. More precisely, we provide lower and upper bounds for the mean squared error of greedy approximation of compound Poisson processes. We are then able to deduce that the greedy approximation error has a sub-exponential and super-polynomial asymptotic behavior. Finally, we provide numerical experiments to highlight the remarkable ability of wavelet-based dictionaries in achieving highly compressible approximations of compound Poisson processes.
Submitted 17 December, 2021; v1 submitted 25 March, 2020;
originally announced March 2020.
-
Local Rotation Invariance in 3D CNNs
Authors:
Vincent Andrearczyk,
Julien Fageot,
Valentin Oreiller,
Xavier Montet,
Adrien Depeursinge
Abstract:
Locally Rotation Invariant (LRI) image analysis was shown to be fundamental in many applications and in particular in medical imaging where local structures of tissues occur at arbitrary rotations. LRI constituted the cornerstone of several breakthroughs in texture analysis, including Local Binary Patterns (LBP), Maximum Response 8 (MR8) and steerable filterbanks. Whereas globally rotation invariant Convolutional Neural Networks (CNN) were recently proposed, LRI was very little investigated in the context of deep learning. LRI designs allow learning filters accounting for all orientations, which enables a drastic reduction of trainable parameters and training data when compared to standard 3D CNNs. In this paper, we propose and compare several methods to obtain LRI CNNs with directional sensitivity. Two methods use orientation channels (responses to rotated kernels), either by explicitly rotating the kernels or using steerable filters. These orientation channels constitute a locally rotation equivariant representation of the data. Local pooling across orientations yields LRI image analysis. Steerable filters are used to achieve a fine and efficient sampling of 3D rotations as well as a reduction of trainable parameters and operations, thanks to a parametric representation involving solid Spherical Harmonics (SH), which are products of SH with associated learned radial profiles. Finally, we investigate a third strategy to obtain LRI based on rotational invariants calculated from responses to a learned set of solid SHs. The proposed methods are evaluated and compared to standard CNNs on 3D datasets including synthetic textured volumes composed of rotated patterns, and pulmonary nodule classification in CT. The results show the importance of LRI image analysis while resulting in a drastic reduction of trainable parameters, outperforming standard 3D CNNs trained with data augmentation.
Submitted 19 March, 2020;
originally announced March 2020.
-
Continuous-Domain Solutions of Linear Inverse Problems with Tikhonov vs. Generalized TV Regularization
Authors:
Harshit Gupta,
Julien Fageot,
Michael Unser
Abstract:
We consider linear inverse problems that are formulated in the continuous domain. The object of recovery is a function that is assumed to minimize a convex objective functional. The solutions are constrained by imposing a continuous-domain regularization. We derive the parametric form of the solution (representer theorems) for Tikhonov (quadratic) and generalized total-variation (gTV) regularizations. We show that, in both cases, the solutions are splines that are intimately related to the regularization operator. In the Tikhonov case, the solution is smooth and constrained to live in a fixed subspace that depends on the measurement operator. By contrast, the gTV regularization results in a sparse solution composed of only a few dictionary elements that are upper-bounded by the number of measurements and independent of the measurement operator. Our findings for the gTV regularization resonate with the minimization of the $l_1$ norm, which is its discrete counterpart and also produces sparse solutions. Finally, we find the experimental solutions for some measurement models in one dimension. We discuss the special case when the gTV regularization results in multiple solutions and devise an algorithm to find an extreme point of the solution set which is guaranteed to be sparse.
Submitted 5 February, 2018;
originally announced February 2018.
-
Angular Accuracy of Steerable Feature Detectors
Authors:
Zsuzsanna Püspöki,
Arash Amini,
Julien Fageot,
John Paul Ward,
Michael Unser
Abstract:
The detection of landmarks or patterns is of interest for extracting features in biological images. Hence, algorithms for finding these keypoints have been extensively investigated in the literature, and their localization and detection properties are well known. In this paper, we study the complementary topic of local orientation estimation, which has not received similar attention. Simply stated, the problem that we address is the following: estimate the angle of rotation of a pattern with steerable filters centered at the same location, where the image is corrupted by colored isotropic Gaussian noise. For this problem, we use a statistical framework based on the Cramér-Rao lower bound (CRLB) that sets a fundamental limit on the accuracy of the corresponding class of estimators. We propose a scheme to measure the performance of estimators based on steerable filters (as a lower bound), while considering the connection to maximum likelihood estimation. Beyond the general results, we analyze the asymptotic behaviour of the lower bound in terms of the order of steerability and propose an optimal subset of components that minimizes the bound. We define a mechanism for selecting optimal subspaces of the span of the detectors. These are characterized by the most relevant angular frequencies. Finally, we project our template to a basis of steerable functions and experimentally show that the prediction accuracy achieves the predicted CRLB. As an extension, we also consider steerable wavelet detectors.
Submitted 10 October, 2017;
originally announced October 2017.
-
Gaussian and Sparse Processes Are Limits of Generalized Poisson Processes
Authors:
Julien Fageot,
Virginie Uhlmann,
Michael Unser
Abstract:
The theory of sparse stochastic processes offers a broad class of statistical models to study signals. In this framework, signals are represented as realizations of random processes that are solutions of linear stochastic differential equations driven by white Lévy noises. Among these processes, generalized Poisson processes based on compound-Poisson noises admit an interpretation as random L-splines with random knots and weights. We demonstrate that every generalized Lévy process, from Gaussian to sparse, can be understood as the limit in law of a sequence of generalized Poisson processes. This enables a new conceptual understanding of sparse processes and suggests simple algorithms for the numerical generation of such objects.
Submitted 16 February, 2017;
originally announced February 2017.