Search | arXiv e-print repository

arXiv:2505.19682 [pdf, other]

Deep Actor-Critics with Tight Risk Certificates

Authors: Bahareh Tasdighi, Manuel Haussmann, Yi-Shan Wu, Andres R. Masegosa, Melih Kandemir

Abstract: After a period of research, deep actor-critic algorithms have reached a level where they influence our everyday lives. They serve as the driving force behind the continual improvement of large language models through user-collected feedback. However, their deployment in physical systems is not yet widely adopted, mainly because no validation scheme that quantifies their risk of malfunction. We dem… ▽ More After a period of research, deep actor-critic algorithms have reached a level where they influence our everyday lives. They serve as the driving force behind the continual improvement of large language models through user-collected feedback. However, their deployment in physical systems is not yet widely adopted, mainly because no validation scheme that quantifies their risk of malfunction. We demonstrate that it is possible to develop tight risk certificates for deep actor-critic algorithms that predict generalization performance from validation-time observations. Our key insight centers on the effectiveness of minimal evaluation data. Surprisingly, a small feasible of evaluation roll-outs collected from a pretrained policy suffices to produce accurate risk certificates when combined with a simple adaptation of PAC-Bayes theory. Specifically, we adopt a recently introduced recursive PAC-Bayes approach, which splits validation data into portions and recursively builds PAC-Bayes bounds on the excess loss of each portion's predictor, using the predictor from the previous portion as a data-informed prior. Our empirical results across multiple locomotion tasks and policy expertise levels demonstrate risk certificates that are tight enough to be considered for practical use. △ Less

Submitted 26 May, 2025; originally announced May 2025.

arXiv:2411.02125 [pdf, ps, other]

Revisiting K-mer Profile for Effective and Scalable Genome Representation Learning

Authors: Abdulkadir Celikkanat, Andres R. Masegosa, Thomas D. Nielsen

Abstract: Obtaining effective representations of DNA sequences is crucial for genome analysis. Metagenomic binning, for instance, relies on genome representations to cluster complex mixtures of DNA fragments from biological samples with the aim of determining their microbial compositions. In this paper, we revisit k-mer-based representations of genomes and provide a theoretical analysis of their use in repr… ▽ More Obtaining effective representations of DNA sequences is crucial for genome analysis. Metagenomic binning, for instance, relies on genome representations to cluster complex mixtures of DNA fragments from biological samples with the aim of determining their microbial compositions. In this paper, we revisit k-mer-based representations of genomes and provide a theoretical analysis of their use in representation learning. Based on the analysis, we propose a lightweight and scalable model for performing metagenomic binning at the genome read level, relying only on the k-mer compositions of the DNA fragments. We compare the model to recent genome foundation models and demonstrate that while the models are comparable in performance, the proposed model is significantly more effective in terms of scalability, a crucial aspect for performing metagenomic binning of real-world datasets. △ Less

Submitted 4 November, 2024; originally announced November 2024.

Comments: Accepted to the Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024)

arXiv:2401.01148 [pdf, other]

PAC-Bayes-Chernoff bounds for unbounded losses

Authors: Ioar Casado, Luis A. Ortega, Aritz Pérez, Andrés R. Masegosa

Abstract: We introduce a new PAC-Bayes oracle bound for unbounded losses that extends Cramér-Chernoff bounds to the PAC-Bayesian setting. The proof technique relies on controlling the tails of certain random variables involving the Cramér transform of the loss. Our approach naturally leverages properties of Cramér-Chernoff bounds, such as exact optimization of the free parameter in many PAC-Bayes bounds. We… ▽ More We introduce a new PAC-Bayes oracle bound for unbounded losses that extends Cramér-Chernoff bounds to the PAC-Bayesian setting. The proof technique relies on controlling the tails of certain random variables involving the Cramér transform of the loss. Our approach naturally leverages properties of Cramér-Chernoff bounds, such as exact optimization of the free parameter in many PAC-Bayes bounds. We highlight several applications of the main theorem. Firstly, we show that our bound recovers and generalizes previous results. Additionally, our approach allows working with richer assumptions that result in more informative and potentially tighter bounds. In this direction, we provide a general bound under a new \textit{model-dependent} assumption from which we obtain bounds based on parameter norms and log-Sobolev inequalities. Notably, many of these bounds can be minimized to obtain distributions beyond the Gibbs posterior and provide novel theoretical coverage to existing regularization techniques. △ Less

Submitted 30 October, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

Comments: Camera-ready version for NeurIPS 2024

arXiv:2310.01189 [pdf, other]

If there is no underfitting, there is no Cold Posterior Effect

Authors: Yijie Zhang, Yi-Shan Wu, Luis A. Ortega, Andrés R. Masegosa

Abstract: The cold posterior effect (CPE) (Wenzel et al., 2020) in Bayesian deep learning shows that, for posteriors with a temperature $T<1$, the resulting posterior predictive could have better performances than the Bayesian posterior ($T=1$). As the Bayesian posterior is known to be optimal under perfect model specification, many recent works have studied the presence of CPE as a model misspecification p… ▽ More The cold posterior effect (CPE) (Wenzel et al., 2020) in Bayesian deep learning shows that, for posteriors with a temperature $T<1$, the resulting posterior predictive could have better performances than the Bayesian posterior ($T=1$). As the Bayesian posterior is known to be optimal under perfect model specification, many recent works have studied the presence of CPE as a model misspecification problem, arising from the prior and/or from the likelihood function. In this work, we provide a more nuanced understanding of the CPE as we show that misspecification leads to CPE only when the resulting Bayesian posterior underfits. In fact, we theoretically show that if there is no underfitting, there is no CPE. △ Less

Submitted 2 October, 2023; originally announced October 2023.

Comments: 9 pages, 3 figures, ICLR 2024

arXiv:2306.10947 [pdf, other]

doi 10.1613/jair.1.17036

PAC-Chernoff Bounds: Understanding Generalization in the Interpolation Regime

Authors: Andrés R. Masegosa, Luis A. Ortega

Abstract: This paper introduces a distribution-dependent PAC-Chernoff bound that exhibits perfect tightness for interpolators, even within over-parameterized model classes. This bound, which relies on basic principles of Large Deviation Theory, defines a natural measure of the smoothness of a model, characterized by simple real-valued functions. Building upon this bound and the new concept of smoothness, we… ▽ More This paper introduces a distribution-dependent PAC-Chernoff bound that exhibits perfect tightness for interpolators, even within over-parameterized model classes. This bound, which relies on basic principles of Large Deviation Theory, defines a natural measure of the smoothness of a model, characterized by simple real-valued functions. Building upon this bound and the new concept of smoothness, we present an unified theoretical framework revealing why certain interpolators show an exceptional generalization, while others falter. We theoretically show how a wide spectrum of modern learning methodologies, encompassing techniques such as $\ell_2$-norm, distance-from-initialization and input-gradient regularization, in combination with data augmentation, invariant architectures, and over-parameterization, collectively guide the optimizer toward smoother interpolators, which, according to our theoretical framework, are the ones exhibiting superior generalization performance. This study shows that distribution-dependent bounds serve as a powerful tool to understand the complex dynamics behind the generalization capabilities of over-parameterized interpolators. △ Less

Submitted 10 February, 2025; v1 submitted 19 June, 2023; originally announced June 2023.

Comments: 60 pages, 12 figures, published at JAIR 2025

Journal ref: Journal of Artificial Intelligence Research 82 (2025) 503-562

arXiv:2112.06610 [pdf, other]

From Anecdote to Evidence: The Relationship Between Personality and Need for Cognition of Developers

Authors: Daniel Russo, Andres R. Masegosa, Klaas-Jan Stol

Abstract: There is considerable anecdotal evidence suggesting that software engineers enjoy engaging in solving puzzles and other cognitive efforts. A tendency to engage in and enjoy effortful thinking is referred to as a person's 'need for cognition.' In this article we study the relationship between software engineers' personality traits and their need for cognition. Through a large-scale sample study of… ▽ More There is considerable anecdotal evidence suggesting that software engineers enjoy engaging in solving puzzles and other cognitive efforts. A tendency to engage in and enjoy effortful thinking is referred to as a person's 'need for cognition.' In this article we study the relationship between software engineers' personality traits and their need for cognition. Through a large-scale sample study of 483 respondents we collected data to capture the six 'bright' personality traits of the HEXACO model of personality, and three `dark' personality traits. Data were analyzed using several methods including a multiple Bayesian linear regression analysis. The results indicate that ca. 33% of variation in developers' need for cognition can be explained by personality traits. The Bayesian analysis suggests four traits to be of particular interest in predicting need for cognition: openness to experience, conscientiousness, honesty-humility, and emotionality. Further, we also find that need for cognition of software engineers is, on average, higher than in the general population, based on a comparison with prior studies. Given the importance of human factors for software engineers' performance in general, and problem solving skills in particular, our findings suggest several implications for recruitment, working behavior, and teaming. △ Less

Submitted 13 December, 2021; originally announced December 2021.

Journal ref: Empirical Software Engineering, 2022

arXiv:2110.13786 [pdf, other]

Diversity and Generalization in Neural Network Ensembles

Authors: Luis A. Ortega, Rafael Cabañas, Andrés R. Masegosa

Abstract: Ensembles are widely used in machine learning and, usually, provide state-of-the-art performance in many prediction tasks. From the very beginning, the diversity of an ensemble has been identified as a key factor for the superior performance of these models. But the exact role that diversity plays in ensemble models is poorly understood, specially in the context of neural networks. In this work, w… ▽ More Ensembles are widely used in machine learning and, usually, provide state-of-the-art performance in many prediction tasks. From the very beginning, the diversity of an ensemble has been identified as a key factor for the superior performance of these models. But the exact role that diversity plays in ensemble models is poorly understood, specially in the context of neural networks. In this work, we combine and expand previously published results in a theoretically sound framework that describes the relationship between diversity and ensemble performance for a wide range of ensemble methods. More precisely, we provide sound answers to the following questions: how to measure diversity, how diversity relates to the generalization error of an ensemble, and how diversity is promoted by neural network ensemble algorithms. This analysis covers three widely used loss functions, namely, the squared loss, the cross-entropy loss, and the 0-1 loss; and two widely used model combination strategies, namely, model averaging and weighted majority vote. We empirically validate this theoretical analysis with neural network ensembles. △ Less

Submitted 16 February, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

arXiv:2106.13624 [pdf, other]

Chebyshev-Cantelli PAC-Bayes-Bennett Inequality for the Weighted Majority Vote

Authors: Yi-Shan Wu, Andrés R. Masegosa, Stephan S. Lorenzen, Christian Igel, Yevgeny Seldin

Abstract: We present a new second-order oracle bound for the expected risk of a weighted majority vote. The bound is based on a novel parametric form of the Chebyshev- Cantelli inequality (a.k.a. one-sided Chebyshev's), which is amenable to efficient minimization. The new form resolves the optimization challenge faced by prior oracle bounds based on the Chebyshev-Cantelli inequality, the C-bounds [Germain e… ▽ More We present a new second-order oracle bound for the expected risk of a weighted majority vote. The bound is based on a novel parametric form of the Chebyshev- Cantelli inequality (a.k.a. one-sided Chebyshev's), which is amenable to efficient minimization. The new form resolves the optimization challenge faced by prior oracle bounds based on the Chebyshev-Cantelli inequality, the C-bounds [Germain et al., 2015], and, at the same time, it improves on the oracle bound based on second order Markov's inequality introduced by Masegosa et al. [2020]. We also derive a new concentration of measure inequality, which we name PAC-Bayes-Bennett, since it combines PAC-Bayesian bounding with Bennett's inequality. We use it for empirical estimation of the oracle bound. The PAC-Bayes-Bennett inequality improves on the PAC-Bayes-Bernstein inequality of Seldin et al. [2012]. We provide an empirical evaluation demonstrating that the new bounds can improve on the work of Masegosa et al. [2020]. Both the parametric form of the Chebyshev-Cantelli inequality and the PAC-Bayes-Bennett inequality may be of independent interest for the study of concentration of measure in other domains. △ Less

Submitted 17 January, 2023; v1 submitted 25 June, 2021; originally announced June 2021.

Comments: aligned with the camera-ready version published at NeurIPS 2021

arXiv:2007.13532 [pdf, other]

Second Order PAC-Bayesian Bounds for the Weighted Majority Vote

Authors: Andrés R. Masegosa, Stephan S. Lorenzen, Christian Igel, Yevgeny Seldin

Abstract: We present a novel analysis of the expected risk of weighted majority vote in multiclass classification. The analysis takes correlation of predictions by ensemble members into account and provides a bound that is amenable to efficient minimization, which yields improved weighting for the majority vote. We also provide a specialized version of our bound for binary classification, which allows to ex… ▽ More We present a novel analysis of the expected risk of weighted majority vote in multiclass classification. The analysis takes correlation of predictions by ensemble members into account and provides a bound that is amenable to efficient minimization, which yields improved weighting for the majority vote. We also provide a specialized version of our bound for binary classification, which allows to exploit additional unlabeled data for tighter risk estimation. In experiments, we apply the bound to improve weighting of trees in random forests and show that, in contrast to the commonly used first order bound, minimization of the new bound typically does not lead to degradation of the test error of the ensemble. △ Less

Submitted 17 December, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

arXiv:1912.08335 [pdf, other]

Learning under Model Misspecification: Applications to Variational and Ensemble methods

Authors: Andres R. Masegosa

Abstract: Virtually any model we use in machine learning to make predictions does not perfectly represent reality. So, most of the learning happens under model misspecification. In this work, we present a novel analysis of the generalization performance of Bayesian model averaging under model misspecification and i.i.d. data using a new family of second-order PAC-Bayes bounds. This analysis shows, in simple… ▽ More Virtually any model we use in machine learning to make predictions does not perfectly represent reality. So, most of the learning happens under model misspecification. In this work, we present a novel analysis of the generalization performance of Bayesian model averaging under model misspecification and i.i.d. data using a new family of second-order PAC-Bayes bounds. This analysis shows, in simple and intuitive terms, that Bayesian model averaging provides suboptimal generalization performance when the model is misspecified. In consequence, we provide strong theoretical arguments showing that Bayesian methods are not optimal for learning predictive models, unless the model class is perfectly specified. Using novel second-order PAC-Bayes bounds, we derive a new family of Bayesian-like algorithms, which can be implemented as variational and ensemble methods. The output of these algorithms is a new posterior distribution, different from the Bayesian posterior, which induces a posterior predictive distribution with better generalization performance. Experiments with Bayesian neural networks illustrate these findings. △ Less

Submitted 22 October, 2020; v1 submitted 17 December, 2019; originally announced December 2019.

Comments: Camera-Ready Version. NeurIPS 2020. Minor changes

arXiv:1908.11161 [pdf, other]

InferPy: Probabilistic Modeling with Deep Neural Networks Made Easy

Authors: Javier Cózar, Rafael Cabañas, Antonio Salmerón, Andrés R. Masegosa

Abstract: InferPy is a Python package for probabilistic modeling with deep neural networks. It defines a user-friendly API that trades-off model complexity with ease of use, unlike other libraries whose focus is on dealing with very general probabilistic models at the cost of having a more complex API. In particular, this package allows to define, learn and evaluate general hierarchical probabilistic models… ▽ More InferPy is a Python package for probabilistic modeling with deep neural networks. It defines a user-friendly API that trades-off model complexity with ease of use, unlike other libraries whose focus is on dealing with very general probabilistic models at the cost of having a more complex API. In particular, this package allows to define, learn and evaluate general hierarchical probabilistic models containing deep neural networks in a compact and simple way. InferPy is built on top of Tensorflow Probability and Keras. △ Less

Submitted 12 February, 2020; v1 submitted 29 August, 2019; originally announced August 2019.

Comments: 5 pages limit (paper submitted to an original software publication track). This paper briefly describes a scientific software

arXiv:1908.03442 [pdf, other]

Probabilistic Models with Deep Neural Networks

Authors: Andrés R. Masegosa, Rafael Cabañas, Helge Langseth, Thomas D. Nielsen, Antonio Salmerón

Abstract: Recent advances in statistical inference have significantly expanded the toolbox of probabilistic modeling. Historically, probabilistic modeling has been constrained to (i) very restricted model classes where exact or approximate probabilistic inference were feasible, and (ii) small or medium-sized data sets which fit within the main memory of the computer. However, developments in variational inf… ▽ More Recent advances in statistical inference have significantly expanded the toolbox of probabilistic modeling. Historically, probabilistic modeling has been constrained to (i) very restricted model classes where exact or approximate probabilistic inference were feasible, and (ii) small or medium-sized data sets which fit within the main memory of the computer. However, developments in variational inference, a general form of approximate probabilistic inference originated in statistical physics, are allowing probabilistic modeling to overcome these restrictions: (i) Approximate probabilistic inference is now possible over a broad class of probabilistic models containing a large number of parameters, and (ii) scalable inference methods based on stochastic gradient descent and distributed computation engines allow to apply probabilistic modeling over massive data sets. One important practical consequence of these advances is the possibility to include deep neural networks within a probabilistic model to capture complex non-linear stochastic relationships between random variables. These advances in conjunction with the release of novel probabilistic modeling toolboxes have greatly expanded the scope of application of probabilistic models, and allow these models to take advantage of the recent strides made by the deep learning community. In this paper we review the main concepts, methods and tools needed to use deep neural networks within a probabilistic modeling framework. △ Less

Submitted 2 October, 2019; v1 submitted 9 August, 2019; originally announced August 2019.

arXiv:1704.01427 [pdf, ps, other]

doi 10.1016/j.knosys.2018.09.019

AMIDST: a Java Toolbox for Scalable Probabilistic Machine Learning

Authors: Andrés R. Masegosa, Ana M. Martínez, Darío Ramos-López, Rafael Cabañas, Antonio Salmerón, Thomas D. Nielsen, Helge Langseth, Anders L. Madsen

Abstract: The AMIDST Toolbox is a software for scalable probabilistic machine learning with a spe- cial focus on (massive) streaming data. The toolbox supports a flexible modeling language based on probabilistic graphical models with latent variables and temporal dependencies. The specified models can be learnt from large data sets using parallel or distributed implementa- tions of Bayesian learning algorit… ▽ More The AMIDST Toolbox is a software for scalable probabilistic machine learning with a spe- cial focus on (massive) streaming data. The toolbox supports a flexible modeling language based on probabilistic graphical models with latent variables and temporal dependencies. The specified models can be learnt from large data sets using parallel or distributed implementa- tions of Bayesian learning algorithms for either streaming or batch data. These algorithms are based on a flexible variational message passing scheme, which supports discrete and continu- ous variables from a wide range of probability distributions. AMIDST also leverages existing functionality and algorithms by interfacing to software tools such as Flink, Spark, MOA, Weka, R and HUGIN. AMIDST is an open source toolbox written in Java and available at http://www.amidsttoolbox.com under the Apache Software License version 2.0. △ Less

Submitted 4 April, 2017; originally announced April 2017.

ACM Class: I.2.6

arXiv:1604.07990 [pdf, other]

doi 10.1109/MCI.2016.2532267

Probabilistic Graphical Models on Multi-Core CPUs using Java 8

Authors: Andres R. Masegosa, Ana M. Martinez, Hanen Borchani

Abstract: In this paper, we discuss software design issues related to the development of parallel computational intelligence algorithms on multi-core CPUs, using the new Java 8 functional programming features. In particular, we focus on probabilistic graphical models (PGMs) and present the parallelisation of a collection of algorithms that deal with inference and learning of PGMs from data. Namely, maximum… ▽ More In this paper, we discuss software design issues related to the development of parallel computational intelligence algorithms on multi-core CPUs, using the new Java 8 functional programming features. In particular, we focus on probabilistic graphical models (PGMs) and present the parallelisation of a collection of algorithms that deal with inference and learning of PGMs from data. Namely, maximum likelihood estimation, importance sampling, and greedy search for solving combinatorial optimisation problems. Through these concrete examples, we tackle the problem of defining efficient data structures for PGMs and parallel processing of same-size batches of data sets using Java 8 features. We also provide straightforward techniques to code parallel algorithms that seamlessly exploit multi-core processors. The experimental analysis, carried out using our open source AMIDST (Analysis of MassIve Data STreams) Java toolbox, shows the merits of the proposed solutions. △ Less

Submitted 27 April, 2016; originally announced April 2016.

Comments: Pre-print version of the paper presented in the special issue on Computational Intelligence Software at IEEE Computational Intelligence Magazine journal

Journal ref: IEEE Computational Intelligence Magazine, 11(2), 41-54. 2016

arXiv:1410.1784 [pdf, ps, other]

Stochastic Discriminative EM

Authors: Andres R. Masegosa

Abstract: Stochastic discriminative EM (sdEM) is an online-EM-type algorithm for discriminative training of probabilistic generative models belonging to the exponential family. In this work, we introduce and justify this algorithm as a stochastic natural gradient descent method, i.e. a method which accounts for the information geometry in the parameter space of the statistical model. We show how this learni… ▽ More Stochastic discriminative EM (sdEM) is an online-EM-type algorithm for discriminative training of probabilistic generative models belonging to the exponential family. In this work, we introduce and justify this algorithm as a stochastic natural gradient descent method, i.e. a method which accounts for the information geometry in the parameter space of the statistical model. We show how this learning algorithm can be used to train probabilistic generative models by minimizing different discriminative loss functions, such as the negative conditional log-likelihood and the Hinge loss. The resulting models trained by sdEM are always generative (i.e. they define a joint probability distribution) and, in consequence, allows to deal with missing data and latent variables in a principled way either when being learned or when making predictions. The performance of this method is illustrated by several text classification problems for which a multinomial naive Bayes and a latent Dirichlet allocation based classifier are learned using different discriminative loss functions. △ Less

Submitted 2 October, 2014; originally announced October 2014.

Comments: UAI 2014 paper + Supplementary Material. In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence (UAI 2014), edited by Nevin L. Zhang and Jian Tian. AUAI Press

Journal ref: Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence UAI-2014 (pp. 573-582). AUAI Press

Showing 1–15 of 15 results for author: Masegosa, A R