Showing 1–4 of 4 results for author: Penedones, H

  1. arXiv:2201.13157  [pdf, other]

    cs.LG cs.DM

    Equivariant neural networks for recovery of Hadamard matrices

    Authors: Augusto Peres, Eduardo Dias, Luís Sarmento, Hugo Penedones

    Abstract: We propose a message passing neural network architecture designed to be equivariant to column and row permutations of a matrix. We illustrate its advantages over traditional architectures like multi-layer perceptrons (MLPs), convolutional neural networks (CNNs) and even Transformers, on the combinatorial optimization task of recovering a set of deleted entries of a Hadamard matrix. We argue that t…

    Submitted 31 January, 2022; originally announced January 2022.

  2. arXiv:1906.07987  [pdf, other]

    cs.LG cs.AI stat.ML

    Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

    Authors: Hugo Penedones, Carlos Riquelme, Damien Vincent, Hartmut Maennel, Timothy Mann, Andre Barreto, Sylvain Gelly, Gergely Neu

    Abstract: We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation. The two methods are known to achieve complementary bias-variance trade-off properties, with TD tending to achieve lower variance but potentially higher bias. In this pa…

    Submitted 19 June, 2019; originally announced June 2019.

  3. arXiv:1807.03064  [pdf, other]

    cs.LG stat.ML

    Temporal Difference Learning with Neural Networks - Study of the Leakage Propagation Problem

    Authors: Hugo Penedones, Damien Vincent, Hartmut Maennel, Sylvain Gelly, Timothy Mann, Andre Barreto

    Abstract: Temporal-Difference learning (TD) [Sutton, 1988] with function approximation can converge to solutions that are worse than those obtained by Monte-Carlo regression, even in the simple case of on-policy evaluation. To increase our understanding of the problem, we investigate the issue of approximation errors in areas of sharp discontinuities of the value function being further propagated by bootstr…

    Submitted 9 July, 2018; originally announced July 2018.

  4. arXiv:1612.09465  [pdf, other]

    cs.LG cs.AI stat.ML

    Adaptive Lambda Least-Squares Temporal Difference Learning

    Authors: Timothy A. Mann, Hugo Penedones, Shie Mannor, Todd Hester

    Abstract: Temporal Difference learning or TD($λ$) is a fundamental algorithm in the field of reinforcement learning. However, setting TD's $λ$ parameter, which controls the timescale of TD updates, is generally left up to the practitioner. We formalize the $λ$ selection problem as a bias-variance trade-off where the solution is the value of $λ$ that leads to the smallest Mean Squared Value Error (MSVE). To…

    Submitted 30 December, 2016; originally announced December 2016.