Search | arXiv e-print repository

Collapse-Proof Non-Contrastive Self-Supervised Learning

Authors: Emanuele Sansone, Tim Lebailly, Tinne Tuytelaars

Abstract: We present a principled and simplified design of the projector and loss function for non-contrastive self-supervised learning based on hyperdimensional computing. We theoretically demonstrate that this design introduces an inductive bias that encourages representations to be simultaneously decorrelated and clustered, without explicitly enforcing these properties. This bias provably enhances genera… ▽ More We present a principled and simplified design of the projector and loss function for non-contrastive self-supervised learning based on hyperdimensional computing. We theoretically demonstrate that this design introduces an inductive bias that encourages representations to be simultaneously decorrelated and clustered, without explicitly enforcing these properties. This bias provably enhances generalization and suffices to avoid known training failure modes, such as representation, dimensional, cluster, and intracluster collapses. We validate our theoretical findings on image datasets, including SVHN, CIFAR-10, CIFAR-100, and ImageNet-100. Our approach effectively combines the strengths of feature decorrelation and cluster-based self-supervised learning methods, overcoming training failure modes while achieving strong generalization in clustering and linear classification tasks. △ Less

Submitted 6 July, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

Comments: ICML 2025

arXiv:2408.08133 [pdf, other]

EXPLAIN, AGREE, LEARN: Scaling Learning for Neural Probabilistic Logic

Authors: Victor Verreet, Lennert De Smet, Luc De Raedt, Emanuele Sansone

Abstract: Neural probabilistic logic systems follow the neuro-symbolic (NeSy) paradigm by combining the perceptive and learning capabilities of neural networks with the robustness of probabilistic logic. Learning corresponds to likelihood optimization of the neural networks. However, to obtain the likelihood exactly, expensive probabilistic logic inference is required. To scale learning to more complex syst… ▽ More Neural probabilistic logic systems follow the neuro-symbolic (NeSy) paradigm by combining the perceptive and learning capabilities of neural networks with the robustness of probabilistic logic. Learning corresponds to likelihood optimization of the neural networks. However, to obtain the likelihood exactly, expensive probabilistic logic inference is required. To scale learning to more complex systems, we therefore propose to instead optimize a sampling based objective. We prove that the objective has a bounded error with respect to the likelihood, which vanishes when increasing the sample count. Furthermore, the error vanishes faster by exploiting a new concept of sample diversity. We then develop the EXPLAIN, AGREE, LEARN (EXAL) method that uses this objective. EXPLAIN samples explanations for the data. AGREE reweighs each explanation in concordance with the neural component. LEARN uses the reweighed explanations as a signal for learning. In contrast to previous NeSy methods, EXAL can scale to larger problem sizes while retaining theoretical guarantees on the error. Experimentally, our theoretical claims are verified and EXAL outperforms recent NeSy methods when scaling up the MNIST addition and Warcraft pathfinding problems. △ Less

Submitted 15 August, 2024; originally announced August 2024.

arXiv:2407.11244 [pdf, other]

(Deep) Generative Geodesics

Authors: Beomsu Kim, Michael Puthawala, Jong Chul Ye, Emanuele Sansone

Abstract: In this work, we propose to study the global geometrical properties of generative models. We introduce a new Riemannian metric to assess the similarity between any two data points. Importantly, our metric is agnostic to the parametrization of the generative model and requires only the evaluation of its data likelihood. Moreover, the metric leads to the conceptual definition of generative distances… ▽ More In this work, we propose to study the global geometrical properties of generative models. We introduce a new Riemannian metric to assess the similarity between any two data points. Importantly, our metric is agnostic to the parametrization of the generative model and requires only the evaluation of its data likelihood. Moreover, the metric leads to the conceptual definition of generative distances and generative geodesics, whose computation can be done efficiently in the data space. Their approximations are proven to converge to their true values under mild conditions. We showcase three proof-of-concept applications of this global metric, including clustering, data visualization, and data interpolation, thus providing new tools to support the geometrical understanding of generative models. △ Less

Submitted 15 July, 2024; originally announced July 2024.

Comments: 10 pages, 9 figures

arXiv:2401.00873 [pdf, other]

Unifying Self-Supervised Clustering and Energy-Based Models

Authors: Emanuele Sansone, Robin Manhaeve

Abstract: Self-supervised learning excels at learning representations from large amounts of data. At the same time, generative models offer the complementary property of learning information about the underlying data generation process. In this study, we aim at establishing a principled connection between these two paradigms and highlight the benefits of their complementarity. In particular, we perform an a… ▽ More Self-supervised learning excels at learning representations from large amounts of data. At the same time, generative models offer the complementary property of learning information about the underlying data generation process. In this study, we aim at establishing a principled connection between these two paradigms and highlight the benefits of their complementarity. In particular, we perform an analysis of self-supervised learning objectives, elucidating the underlying probabilistic graphical models and presenting a standardized methodology for their derivation from first principles. The analysis suggests a natural means of integrating self-supervised learning with likelihood-based generative models. We instantiate this concept within the realm of cluster-based self-supervised learning and energy models, introducing a lower bound proven to reliably penalize the most important failure modes. Our theoretical findings are substantiated through experiments on synthetic and real-world data, including SVHN, CIFAR10, and CIFAR100, demonstrating that our objective function allows to jointly train a backbone network in a discriminative and generative fashion, consequently outperforming existing self-supervised learning strategies in terms of clustering, generation and out-of-distribution detection performance by a wide margin. We also demonstrate that the solution can be integrated into a neuro-symbolic framework to tackle a simple yet non-trivial instantiation of the symbol grounding problem. △ Less

Submitted 9 March, 2025; v1 submitted 29 December, 2023; originally announced January 2024.

Comments: Changes from previous version: added mean and standard deviations in experiments. Integral version of workshop paper arXiv:2309.15420. Improved GEDI version (from two stages to single stage training) arxiv:2212.13425

arXiv:2311.12569 [pdf, other]

Differentiable Sampling of Categorical Distributions Using the CatLog-Derivative Trick

Authors: Lennert De Smet, Emanuele Sansone, Pedro Zuidberg Dos Martires

Abstract: Categorical random variables can faithfully represent the discrete and uncertain aspects of data as part of a discrete latent variable model. Learning in such models necessitates taking gradients with respect to the parameters of the categorical probability distributions, which is often intractable due to their combinatorial nature. A popular technique to estimate these otherwise intractable gradi… ▽ More Categorical random variables can faithfully represent the discrete and uncertain aspects of data as part of a discrete latent variable model. Learning in such models necessitates taking gradients with respect to the parameters of the categorical probability distributions, which is often intractable due to their combinatorial nature. A popular technique to estimate these otherwise intractable gradients is the Log-Derivative trick. This trick forms the basis of the well-known REINFORCE gradient estimator and its many extensions. While the Log-Derivative trick allows us to differentiate through samples drawn from categorical distributions, it does not take into account the discrete nature of the distribution itself. Our first contribution addresses this shortcoming by introducing the CatLog-Derivative trick - a variation of the Log-Derivative trick tailored towards categorical distributions. Secondly, we use the CatLog-Derivative trick to introduce IndeCateR, a novel and unbiased gradient estimator for the important case of products of independent categorical distributions with provably lower variance than REINFORCE. Thirdly, we empirically show that IndeCateR can be efficiently implemented and that its gradient estimates have significantly lower bias and variance for the same number of samples compared to the state of the art. △ Less

Submitted 21 November, 2023; originally announced November 2023.

MSC Class: 68T05 ACM Class: G.3; G.4; I.2.6

arXiv:2309.15420 [pdf, other]

The Triad of Failure Modes and a Possible Way Out

Authors: Emanuele Sansone

Abstract: We present a novel objective function for cluster-based self-supervised learning (SSL) that is designed to circumvent the triad of failure modes, namely representation collapse, cluster collapse, and the problem of invariance to permutations of cluster assignments. This objective consists of three key components: (i) A generative term that penalizes representation collapse, (ii) a term that promot… ▽ More We present a novel objective function for cluster-based self-supervised learning (SSL) that is designed to circumvent the triad of failure modes, namely representation collapse, cluster collapse, and the problem of invariance to permutations of cluster assignments. This objective consists of three key components: (i) A generative term that penalizes representation collapse, (ii) a term that promotes invariance to data augmentations, thereby addressing the issue of label permutations and (ii) a uniformity term that penalizes cluster collapse. Additionally, our proposed objective possesses two notable advantages. Firstly, it can be interpreted from a Bayesian perspective as a lower bound on the data log-likelihood. Secondly, it enables the training of a standard backbone architecture without the need for asymmetric elements like stop gradients, momentum encoders, or specialized clustering layers. Due to its simplicity and theoretical foundation, our proposed objective is well-suited for optimization. Experiments on both toy and real world data demonstrate its effectiveness △ Less

Submitted 27 September, 2023; originally announced September 2023.

Comments: Some sentences in the Background Section are overlapping with Section 2 in arXiv:2304.11357 However, the main technical content and all other sections are different

arXiv:2304.11357 [pdf, other]

Learning Symbolic Representations Through Joint GEnerative and DIscriminative Training

Authors: Emanuele Sansone, Robin Manhaeve

Abstract: We introduce GEDI, a Bayesian framework that combines existing self-supervised learning objectives with likelihood-based generative models. This framework leverages the benefits of both GEnerative and DIscriminative approaches, resulting in improved symbolic representations over standalone solutions. Additionally, GEDI can be easily integrated and trained jointly with existing neuro-symbolic frame… ▽ More We introduce GEDI, a Bayesian framework that combines existing self-supervised learning objectives with likelihood-based generative models. This framework leverages the benefits of both GEnerative and DIscriminative approaches, resulting in improved symbolic representations over standalone solutions. Additionally, GEDI can be easily integrated and trained jointly with existing neuro-symbolic frameworks without the need for additional supervision or costly pre-training steps. We demonstrate through experiments on real-world data, including SVHN, CIFAR10, and CIFAR100, that GEDI outperforms existing self-supervised learning strategies in terms of clustering performance by a significant margin. The symbolic component further allows it to leverage knowledge in the form of logical constraints to improve performance in the small data regime. △ Less

Submitted 22 April, 2023; originally announced April 2023.

Comments: ICLR 2023 Workshop NeSy-GeMs. arXiv admin note: substantial text overlap with arXiv:2212.13425

Journal ref: ICLR 2023 Workshop NeSy-GeMs

arXiv:2212.13425 [pdf, other]

GEDI: GEnerative and DIscriminative Training for Self-Supervised Learning

Authors: Emanuele Sansone, Robin Manhaeve

Abstract: Self-supervised learning is a popular and powerful method for utilizing large amounts of unlabeled data, for which a wide variety of training objectives have been proposed in the literature. In this study, we perform a Bayesian analysis of state-of-the-art self-supervised learning objectives and propose a unified formulation based on likelihood learning. Our analysis suggests a simple method for i… ▽ More Self-supervised learning is a popular and powerful method for utilizing large amounts of unlabeled data, for which a wide variety of training objectives have been proposed in the literature. In this study, we perform a Bayesian analysis of state-of-the-art self-supervised learning objectives and propose a unified formulation based on likelihood learning. Our analysis suggests a simple method for integrating self-supervised learning with generative models, allowing for the joint training of these two seemingly distinct approaches. We refer to this combined framework as GEDI, which stands for GEnerative and DIscriminative training. Additionally, we demonstrate an instantiation of the GEDI framework by integrating an energy-based model with a cluster-based self-supervised learning model. Through experiments on synthetic and real-world data, including SVHN, CIFAR10, and CIFAR100, we show that GEDI outperforms existing self-supervised learning strategies in terms of clustering performance by a wide margin. We also demonstrate that GEDI can be integrated into a neural-symbolic framework to address tasks in the small data regime, where it can use logical constraints to further improve clustering and classification performance. △ Less

Submitted 6 February, 2023; v1 submitted 27 December, 2022; originally announced December 2022.

Comments: Fixed typos/cleaned the experimental section

arXiv:2202.04178 [pdf, other]

VAEL: Bridging Variational Autoencoders and Probabilistic Logic Programming

Authors: Eleonora Misino, Giuseppe Marra, Emanuele Sansone

Abstract: We present VAEL, a neuro-symbolic generative model integrating variational autoencoders (VAE) with the reasoning capabilities of probabilistic logic (L) programming. Besides standard latent subsymbolic variables, our model exploits a probabilistic logic program to define a further structured representation, which is used for logical reasoning. The entire process is end-to-end differentiable. Once… ▽ More We present VAEL, a neuro-symbolic generative model integrating variational autoencoders (VAE) with the reasoning capabilities of probabilistic logic (L) programming. Besides standard latent subsymbolic variables, our model exploits a probabilistic logic program to define a further structured representation, which is used for logical reasoning. The entire process is end-to-end differentiable. Once trained, VAEL can solve new unseen generation tasks by (i) leveraging the previously acquired knowledge encoded in the neural component and (ii) exploiting new logical programs on the structured latent space. Our experiments provide support on the benefits of this neuro-symbolic integration both in terms of task generalization and data efficiency. To the best of our knowledge, this work is the first to propose a general-purpose end-to-end framework integrating probabilistic logic programming into a deep generative model. △ Less

Submitted 25 May, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

arXiv:2109.03867 [pdf, other]

LSB: Local Self-Balancing MCMC in Discrete Spaces

Authors: Emanuele Sansone

Abstract: We present the Local Self-Balancing sampler (LSB), a local Markov Chain Monte Carlo (MCMC) method for sampling in purely discrete domains, which is able to autonomously adapt to the target distribution and to reduce the number of target evaluations required to converge. LSB is based on (i) a parametrization of locally balanced proposals, (ii) a newly proposed objective function based on mutual inf… ▽ More We present the Local Self-Balancing sampler (LSB), a local Markov Chain Monte Carlo (MCMC) method for sampling in purely discrete domains, which is able to autonomously adapt to the target distribution and to reduce the number of target evaluations required to converge. LSB is based on (i) a parametrization of locally balanced proposals, (ii) a newly proposed objective function based on mutual information and (iii) a self-balancing learning procedure, which minimises the proposed objective to update the proposal parameters. Experiments on energy-based models and Markov networks show that LSB converges using a smaller number of queries to the oracle distribution compared to recent local MCMC samplers. △ Less

Submitted 4 July, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

Comments: ICML 2022

arXiv:2106.16060 [pdf, other]

Leveraging Hidden Structure in Self-Supervised Learning

Authors: Emanuele Sansone

Abstract: This work considers the problem of learning structured representations from raw images using self-supervised learning. We propose a principled framework based on a mutual information objective, which integrates self-supervised and structure learning. Furthermore, we devise a post-hoc procedure to interpret the meaning of the learnt representations. Preliminary experiments on CIFAR-10 show that the… ▽ More This work considers the problem of learning structured representations from raw images using self-supervised learning. We propose a principled framework based on a mutual information objective, which integrates self-supervised and structure learning. Furthermore, we devise a post-hoc procedure to interpret the meaning of the learnt representations. Preliminary experiments on CIFAR-10 show that the proposed framework achieves higher generalization performance in downstream classification tasks and provides more interpretable representations compared to the ones learnt through traditional self-supervised learning. △ Less

Submitted 30 June, 2021; originally announced June 2021.

arXiv:1802.03505 [pdf, other]

Coulomb Autoencoders

Authors: Emanuele Sansone, Hafiz Tiomoko Ali, Sun Jiacheng

Abstract: Learning the true density in high-dimensional feature spaces is a well-known problem in machine learning. In this work, we consider generative autoencoders based on maximum-mean discrepancy (MMD) and provide theoretical insights. In particular, (i) we prove that MMD coupled with Coulomb kernels has optimal convergence properties, which are similar to convex functionals, thus improving the training… ▽ More Learning the true density in high-dimensional feature spaces is a well-known problem in machine learning. In this work, we consider generative autoencoders based on maximum-mean discrepancy (MMD) and provide theoretical insights. In particular, (i) we prove that MMD coupled with Coulomb kernels has optimal convergence properties, which are similar to convex functionals, thus improving the training of autoencoders, and (ii) we provide a probabilistic bound on the generalization performance, highlighting some fundamental conditions to achieve better generalization. We validate the theory on synthetic examples and on the popular dataset of celebrities' faces, showing that our model, called Coulomb autoencoders, outperform the state-of-the-art. △ Less

Submitted 26 November, 2019; v1 submitted 9 February, 2018; originally announced February 2018.

arXiv:1710.01013 [pdf, other]

Training Feedforward Neural Networks with Standard Logistic Activations is Feasible

Authors: Emanuele Sansone, Francesco G. B. De Natale

Abstract: Training feedforward neural networks with standard logistic activations is considered difficult because of the intrinsic properties of these sigmoidal functions. This work aims at showing that these networks can be trained to achieve generalization performance comparable to those based on hyperbolic tangent activations. The solution consists on applying a set of conditions in parameter initializat… ▽ More Training feedforward neural networks with standard logistic activations is considered difficult because of the intrinsic properties of these sigmoidal functions. This work aims at showing that these networks can be trained to achieve generalization performance comparable to those based on hyperbolic tangent activations. The solution consists on applying a set of conditions in parameter initialization, which have been derived from the study of the properties of a single neuron from an information-theoretic perspective. The proposed initialization is validated through an extensive experimental analysis. △ Less

Submitted 3 October, 2017; originally announced October 2017.

arXiv:1608.06807 [pdf, ps, other]

doi 10.1109/TPAMI.2018.2860995

Efficient Training for Positive Unlabeled Learning

Authors: Emanuele Sansone, Francesco G. B. De Natale, Zhi-Hua Zhou

Abstract: Positive unlabeled (PU) learning is useful in various practical situations, where there is a need to learn a classifier for a class of interest from an unlabeled data set, which may contain anomalies as well as samples from unknown classes. The learning task can be formulated as an optimization problem under the framework of statistical learning theory. Recent studies have theoretically analyzed i… ▽ More Positive unlabeled (PU) learning is useful in various practical situations, where there is a need to learn a classifier for a class of interest from an unlabeled data set, which may contain anomalies as well as samples from unknown classes. The learning task can be formulated as an optimization problem under the framework of statistical learning theory. Recent studies have theoretically analyzed its properties and generalization performance, nevertheless, little effort has been made to consider the problem of scalability, especially when large sets of unlabeled data are available. In this work we propose a novel scalable PU learning algorithm that is theoretically proven to provide the optimal solution, while showing superior computational and memory performance. Experimental evaluation confirms the theoretical evidence and shows that the proposed method can be successfully applied to a large variety of real-world problems involving PU learning. △ Less

Submitted 14 March, 2018; v1 submitted 24 August, 2016; originally announced August 2016.

Comments: Submitted to IEEE TPAMI

Journal ref: 31 July 2018

arXiv:1608.06770 [pdf, other]

Automatic Synchronization of Multi-User Photo Galleries

Authors: E. Sansone, K. Apostolidis, N. Conci, G. Boato, V. Mezaris, F. G. B. De Natale

Abstract: In this paper we address the issue of photo galleries synchronization, where pictures related to the same event are collected by different users. Existing solutions to address the problem are usually based on unrealistic assumptions, like time consistency across photo galleries, and often heavily rely on heuristics, limiting therefore the applicability to real-world scenarios. We propose a solutio… ▽ More In this paper we address the issue of photo galleries synchronization, where pictures related to the same event are collected by different users. Existing solutions to address the problem are usually based on unrealistic assumptions, like time consistency across photo galleries, and often heavily rely on heuristics, limiting therefore the applicability to real-world scenarios. We propose a solution that achieves better generalization performance for the synchronization task compared to the available literature. The method is characterized by three stages: at first, deep convolutional neural network features are used to assess the visual similarity among the photos; then, pairs of similar photos are detected across different galleries and used to construct a graph; eventually, a probabilistic graphical model is used to estimate the temporal offset of each pair of galleries, by traversing the minimum spanning tree extracted from this graph. The experimental evaluation is conducted on four publicly available datasets covering different types of events, demonstrating the strength of our proposed method. A thorough discussion of the obtained results is provided for a critical assessment of the quality in synchronization. △ Less

Submitted 16 January, 2017; v1 submitted 24 August, 2016; originally announced August 2016.

Comments: ACCEPTED to IEEE Transactions on Multimedia

Showing 1–15 of 15 results for author: Sansone, E