
Showing 1–40 of 40 results for author: van Hoof, H

  1. arXiv:2503.01871

    cs.LG cs.AI cs.RO

    Data Augmentation for Instruction Following Policies via Trajectory Segmentation

    Authors: Niklas Höpner, Ilaria Tiddi, Herke van Hoof

    Abstract: The scalability of instructable agents in robotics or gaming is often hindered by limited data that pairs instructions with agent trajectories. However, large datasets of unannotated trajectories containing sequences of various agent behaviour (play trajectories) are often available. In a semi-supervised setup, we explore methods to extract labelled segments from play trajectories. The goal is to…

    Submitted 25 February, 2025; originally announced March 2025.

  2. arXiv:2502.14777

    cs.AI

    Making Universal Policies Universal

    Authors: Niklas Höpner, David Kuric, Herke van Hoof

    Abstract: The development of a generalist agent capable of solving a wide range of sequential decision-making tasks remains a significant challenge. We address this problem in a cross-agent setup where agents share the same observation space but differ in their action spaces. Our approach builds on the universal policy framework, which decouples policy learning into two stages: a diffusion-based planner tha…

    Submitted 20 February, 2025; originally announced February 2025.

  3. arXiv:2501.03264

    cs.LG cs.AI cs.NE

    Bridge the Inference Gaps of Neural Processes via Expectation Maximization

    Authors: Qi Wang, Marco Federici, Herke van Hoof

    Abstract: The neural process (NP) is a family of computationally efficient models for learning distributions over functions. However, it suffers from under-fitting and shows suboptimal performance in practice. Researchers have primarily focused on incorporating diverse structural inductive biases, e.g., attention or convolution, in modeling. The topic of inference suboptimality and an analysis of th…

    Submitted 3 January, 2025; originally announced January 2025.

    Comments: ICLR 2023

  4. arXiv:2408.04332

    cs.IR

    Mitigating Exposure Bias in Online Learning to Rank Recommendation: A Novel Reward Model for Cascading Bandits

    Authors: Masoud Mansoury, Bamshad Mobasher, Herke van Hoof

    Abstract: Exposure bias is a well-known issue in recommender systems where items and suppliers are not equally represented in the recommendation results. This bias becomes particularly problematic over time as a few items are repeatedly over-represented in recommendation lists, leading to a feedback loop that further amplifies this bias. Although extensive research has addressed this issue in model-based or…

    Submitted 8 August, 2024; originally announced August 2024.

  5. Going Beyond Popularity and Positivity Bias: Correcting for Multifactorial Bias in Recommender Systems

    Authors: Jin Huang, Harrie Oosterhuis, Masoud Mansoury, Herke van Hoof, Maarten de Rijke

    Abstract: Two typical forms of bias in user interaction data with recommender systems (RSs) are popularity bias and positivity bias, which manifest themselves as the over-representation of interactions with popular items or items that users prefer, respectively. Debiasing methods aim to mitigate the effect of selection bias on the evaluation and optimization of RSs. However, existing debiasing methods only…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: SIGIR 2024

  6. arXiv:2403.15301

    cs.LG cs.AI

    Planning with a Learned Policy Basis to Optimally Solve Complex Tasks

    Authors: Guillermo Infante, David Kuric, Anders Jonsson, Vicenç Gómez, Herke van Hoof

    Abstract: Conventional reinforcement learning (RL) methods can successfully solve a wide range of sequential decision problems. However, learning policies that can generalize predictably across multiple tasks in a setting with non-Markovian reward specifications is a challenging problem. We propose to use successor features to learn a policy basis so that each (sub)policy in it solves a well-defined subprob…

    Submitted 3 June, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  7. arXiv:2311.02129

    cs.LG cs.AI eess.SY

    Hierarchical Reinforcement Learning for Power Network Topology Control

    Authors: Blazej Manczak, Jan Viebahn, Herke van Hoof

    Abstract: Learning in high-dimensional action spaces is a key challenge in applying reinforcement learning (RL) to real-world systems. In this paper, we study the possibility of controlling power networks using RL methods. Power networks are critical infrastructures that are complex to control. In particular, the combinatorial nature of the action space poses a challenge to both conventional optimizers and…

    Submitted 3 November, 2023; originally announced November 2023.

  8. arXiv:2309.05477

    cs.LG

    Learning Objective-Specific Active Learning Strategies with Attentive Neural Processes

    Authors: Tim Bakker, Herke van Hoof, Max Welling

    Abstract: Pool-based active learning (AL) is a promising technology for increasing data-efficiency of machine learning models. However, surveys show that performance of recent AL methods is very sensitive to the choice of dataset and training setting, making them unsuitable for general application. In order to tackle this problem, the field of Learning Active Learning (LAL) proposes learning the active learnin…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: Accepted at ECML 2023

  9. arXiv:2302.03438

    cs.LG cs.AI cs.MA

    Uncoupled Learning of Differential Stackelberg Equilibria with Commitments

    Authors: Robert Loftin, Mustafa Mert Çelikok, Herke van Hoof, Samuel Kaski, Frans A. Oliehoek

    Abstract: In multi-agent problems requiring a high degree of cooperation, success often depends on the ability of the agents to adapt to each other's behavior. A natural solution concept in such settings is the Stackelberg equilibrium, in which the "leader" agent selects the strategy that maximizes its own payoff given that the "follower" agent will choose their best response to this strategy. Recent wo…

    Submitted 13 June, 2024; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) 2024

  10. arXiv:2212.11726

    cs.LG

    Reusable Options through Gradient-based Meta Learning

    Authors: David Kuric, Herke van Hoof

    Abstract: Hierarchical methods in reinforcement learning have the potential to reduce the number of decisions that the agent needs to perform when learning new tasks. However, finding reusable useful temporal abstractions that facilitate fast learning remains a challenging problem. Recently, several deep learning approaches were proposed to learn such temporal abstractions in the form of options in an end-t…

    Submitted 4 April, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

    Comments: Published in Transactions on Machine Learning Research (TMLR)

  11. arXiv:2209.01665

    cs.IR

    Exposure-Aware Recommendation using Contextual Bandits

    Authors: Masoud Mansoury, Bamshad Mobasher, Herke van Hoof

    Abstract: Exposure bias is a well-known issue in recommender systems where items and suppliers are not equally represented in the recommendation results. This is especially problematic when bias is amplified over time as a few items (e.g., popular ones) are repeatedly over-represented in recommendation lists and users' interactions with those items will amplify bias towards those items over time resulting i…

    Submitted 4 September, 2022; originally announced September 2022.

  12. arXiv:2208.09570

    cs.LG

    Calculus on MDPs: Potential Shaping as a Gradient

    Authors: Erik Jenner, Herke van Hoof, Adam Gleave

    Abstract: In reinforcement learning, different reward functions can be equivalent in terms of the optimal policies they induce. A particularly well-known and important example is potential shaping, a class of functions that can be added to any reward function without changing the optimal policy set under arbitrary transition dynamics. Potential shaping is conceptually similar to potentials, conservative vec…

    Submitted 2 December, 2022; v1 submitted 19 August, 2022; originally announced August 2022.

    Comments: Fixed mistake in proof that affected several results

  13. arXiv:2207.05899

    cs.LG

    Neural Topological Ordering for Computation Graphs

    Authors: Mukul Gagrani, Corrado Rainone, Yang Yang, Harris Teague, Wonseok Jeon, Herke Van Hoof, Weiliang Will Zeng, Piero Zappi, Christopher Lott, Roberto Bondesan

    Abstract: Recent works on machine learning for combinatorial optimization have shown that learning-based approaches can outperform heuristic methods in terms of speed and performance. In this paper, we consider the problem of finding an optimal topological order on a directed acyclic graph with a focus on the memory minimization problem which arises in compilers. We propose an end-to-end machine learning base…

    Submitted 7 October, 2022; v1 submitted 12 July, 2022; originally announced July 2022.

    Comments: To appear in NeurIPS 2022

  14. arXiv:2203.04378

    cs.AI cs.LG

    Logic-based AI for Interpretable Board Game Winner Prediction with Tsetlin Machine

    Authors: Charul Giri, Ole-Christoffer Granmo, Herke van Hoof, Christian D. Blakely

    Abstract: Hex is a turn-based two-player connection game with a high branching factor, making the game arbitrarily complex with increasing board sizes. As such, top-performing algorithms for playing Hex rely on accurate evaluation of board positions using neural networks. However, the limited interpretability of neural networks is problematic when the user wants to understand the reasoning behind the predic…

    Submitted 8 March, 2022; originally announced March 2022.

  15. arXiv:2203.03355

    cs.AI cs.LG cs.MA

    Reliably Re-Acting to Partner's Actions with the Social Intrinsic Motivation of Transfer Empowerment

    Authors: Tessa van der Heiden, Herke van Hoof, Efstratios Gavves, Christoph Salge

    Abstract: We consider multi-agent reinforcement learning (MARL) for cooperative communication and coordination tasks. MARL agents can be brittle because they can overfit their training partners' policies. This overfitting can produce agents that adopt policies that act under the expectation that other agents will act in a certain way rather than react to their actions. Our objective is to bias the learning…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: arXiv admin note: text overlap with arXiv:2012.08255

  16. arXiv:2203.03078

    cs.LG cs.AI

    Fast and Data Efficient Reinforcement Learning from Pixels via Non-Parametric Value Approximation

    Authors: Alexander Long, Alan Blair, Herke van Hoof

    Abstract: We present Nonparametric Approximation of Inter-Trace returns (NAIT), a Reinforcement Learning algorithm for discrete action, pixel-based environments that is both highly sample and computation efficient. NAIT is a lazy-learning approach with an update that is equivalent to episodic Monte-Carlo on episode completion, but that allows the stable incorporation of rewards while an episode is ongoing.…

    Submitted 6 March, 2022; originally announced March 2022.

    Comments: AAAI 2022

  17. arXiv:2201.12126

    cs.AI cs.LG

    Leveraging class abstraction for commonsense reinforcement learning via residual policy gradient methods

    Authors: Niklas Höpner, Ilaria Tiddi, Herke van Hoof

    Abstract: Enabling reinforcement learning (RL) agents to leverage a knowledge base while learning from experience promises to advance RL in knowledge-intensive domains. However, it has proven difficult to leverage knowledge that is not manually tailored to the environment. We propose to use the subclass relationships present in open-source knowledge graphs to abstract away from specific objects. We develop…

    Submitted 1 May, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

  18. arXiv:2110.04495

    cs.LG cs.MA

    Multi-Agent MDP Homomorphic Networks

    Authors: Elise van der Pol, Herke van Hoof, Frans A. Oliehoek, Max Welling

    Abstract: This paper introduces Multi-Agent MDP Homomorphic Networks, a class of networks that allows distributed execution using only local information, yet is able to share experience between global symmetries in the joint state-action space of cooperative multi-agent systems. In cooperative multi-agent systems, complex symmetries arise between different configurations of the agents and their local observ…

    Submitted 29 April, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

    Comments: Camera ready version

  19. Hierarchies of Planning and Reinforcement Learning for Robot Navigation

    Authors: Jan Wöhlke, Felix Schmitt, Herke van Hoof

    Abstract: Solving robotic navigation tasks via reinforcement learning (RL) is challenging due to their sparse reward and long decision horizon nature. However, in many navigation tasks, high-level (HL) task representations, like a rough floor plan, are available. Previous work has demonstrated efficient learning by hierarchical approaches consisting of path planning in the HL representation and using sub-goal…

    Submitted 5 November, 2021; v1 submitted 23 September, 2021; originally announced September 2021.

    Comments: 7 pages, 5 figures, 2021 IEEE International Conference on Robotics and Automation (ICRA), v2: DOI number added

  20. arXiv:2109.00157

    cs.LG cs.AI

    A Survey of Exploration Methods in Reinforcement Learning

    Authors: Susan Amin, Maziar Gomrokchi, Harsh Satija, Herke van Hoof, Doina Precup

    Abstract: Exploration is an essential component of reinforcement learning algorithms, where agents need to learn how to predict and control unknown and often stochastic environments. Reinforcement learning agents depend crucially on exploration to obtain informative data for the learning process as the lack of enough information could hinder effective learning. In this article, we provide a survey of modern…

    Submitted 2 September, 2021; v1 submitted 31 August, 2021; originally announced September 2021.

  21. arXiv:2103.12142

    cs.LG cs.AI

    Combining Reward Information from Multiple Sources

    Authors: Dmitrii Krasheninnikov, Rohin Shah, Herke van Hoof

    Abstract: Given two sources of evidence about a latent variable, one can combine the information from both by multiplying the likelihoods of each piece of evidence. However, when one or both of the observation models are misspecified, the distributions will conflict. We study this problem in the setting with two conflicting reward functions learned from different sources. In such a setting, we would like to…

    Submitted 22 March, 2021; originally announced March 2021.

  22. arXiv:2102.11756

    cs.LG stat.ML

    Deep Policy Dynamic Programming for Vehicle Routing Problems

    Authors: Wouter Kool, Herke van Hoof, Joaquim Gromicho, Max Welling

    Abstract: Routing problems are a class of combinatorial problems with many practical applications. Recently, end-to-end deep learning methods have been proposed to learn approximate solution heuristics for such problems. In contrast, classical dynamic programming (DP) algorithms guarantee optimal solutions, but scale badly with the problem size. We propose Deep Policy Dynamic Programming (DPDP), which aims…

    Submitted 2 December, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: 21 pages

  23. arXiv:2102.08291

    cs.LG

    Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models

    Authors: Qi Wang, Herke van Hoof

    Abstract: Reinforcement learning is a promising paradigm for solving sequential decision-making problems, but low data efficiency and weak generalization across tasks are bottlenecks in real-world applications. Model-based meta reinforcement learning addresses these issues by learning dynamics and leveraging knowledge from prior experience. In this paper, we take a closer look at this framework, and propose…

    Submitted 16 February, 2021; originally announced February 2021.

  24. arXiv:2012.08255

    cs.MA

    Robust Multi-Agent Reinforcement Learning with Social Empowerment for Coordination and Communication

    Authors: T. van der Heiden, C. Salge, E. Gavves, H. van Hoof

    Abstract: We consider the problem of robust multi-agent reinforcement learning (MARL) for cooperative communication and coordination tasks. MARL agents, mainly those trained in a centralized way, can be brittle because they can adopt policies that act under the expectation that other agents will act a certain way rather than react to their actions. Our objective is to bias the learning process towards findi…

    Submitted 15 December, 2020; originally announced December 2020.

  25. arXiv:2010.16262

    cs.CV cs.LG cs.NE

    Experimental design for MRI by greedy policy search

    Authors: Tim Bakker, Herke van Hoof, Max Welling

    Abstract: In today's clinical practice, magnetic resonance imaging (MRI) is routinely accelerated through subsampling of the associated Fourier domain. Currently, the construction of these subsampling strategies - known as experimental design - relies primarily on heuristics. We propose to learn experimental design strategies for accelerated MRI with policy gradient methods. Unexpectedly, our experiments sh…

    Submitted 15 December, 2020; v1 submitted 30 October, 2020; originally announced October 2020.

    Comments: Accepted to NeurIPS 2020 (spotlight), 15-12-2020: Fixed typos, Figure 9, and pseudocode

  26. arXiv:2008.09469

    cs.LG stat.AP stat.ML

    Doubly Stochastic Variational Inference for Neural Processes with Hierarchical Latent Variables

    Authors: Qi Wang, Herke van Hoof

    Abstract: Neural processes (NPs) constitute a family of variational approximate models for stochastic processes with promising properties in computational efficiency and uncertainty quantification. These processes use neural networks with latent variable inputs to induce predictive distributions. However, the expressiveness of vanilla NPs is limited as they only use a global latent variable, while target sp…

    Submitted 30 October, 2020; v1 submitted 21 August, 2020; originally announced August 2020.

  27. arXiv:2007.01599

    cs.AI cs.LG

    An Autonomous Free Airspace En-route Controller using Deep Reinforcement Learning Techniques

    Authors: Joris Mollinga, Herke van Hoof

    Abstract: Air traffic control is becoming a more and more complex task due to the increasing number of aircraft. Current air traffic control methods are not suitable for managing this increased traffic. Autonomous air traffic control is deemed a promising alternative. In this paper an air traffic control model is presented that guides an arbitrary number of aircraft across a three-dimensional, unstructured…

    Submitted 3 July, 2020; originally announced July 2020.

    Comments: Published at ICRAT 2020

  28. arXiv:2006.16908

    cs.LG stat.ML

    MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning

    Authors: Elise van der Pol, Daniel E. Worrall, Herke van Hoof, Frans A. Oliehoek, Max Welling

    Abstract: This paper introduces MDP homomorphic networks for deep reinforcement learning. MDP homomorphic networks are neural networks that are equivariant under symmetries in the joint state-action space of an MDP. Current approaches to deep reinforcement learning do not usually exploit knowledge about such structure. By building this prior knowledge into policy and value networks using an equivariance con…

    Submitted 20 January, 2021; v1 submitted 30 June, 2020; originally announced June 2020.

  29. arXiv:2003.08158

    cs.MA cs.AI cs.LG

    Social Navigation with Human Empowerment driven Deep Reinforcement Learning

    Authors: Tessa van der Heiden, Florian Mirus, Herke van Hoof

    Abstract: Mobile robot navigation has seen extensive research in the last decades. The aspect of collaboration with robots and humans sharing workspaces will become increasingly important in the future. Therefore, the next generation of mobile robots needs to be socially-compliant to be accepted by their human collaborators. However, a formal definition of compliance is not straightforward. On the other han…

    Submitted 5 August, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

  30. arXiv:2002.06043

    cs.LG stat.ML

    Estimating Gradients for Discrete Random Variables by Sampling without Replacement

    Authors: Wouter Kool, Herke van Hoof, Max Welling

    Abstract: We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement, which reduces variance as it avoids duplicate samples. We show that our estimator can be derived as the Rao-Blackwellization of three different estimators. Combining our estimator with REINFORCE, we obtain a policy gradient estimator and we reduce its variance using a built-in con…

    Submitted 14 February, 2020; originally announced February 2020.

    Comments: ICLR 2020

  31. arXiv:1910.10367

    stat.ML cs.LG

    Unifying Variational Inference and PAC-Bayes for Supervised Learning that Scales

    Authors: Sanjay Thakur, Herke Van Hoof, Gunshi Gupta, David Meger

    Abstract: Neural network-based controllers hold enormous potential to learn complex, high-dimensional functions. However, they are prone to overfitting and unwarranted extrapolations. PAC-Bayes is a generalized framework which is more resistant to overfitting and that yields performance bounds that hold with arbitrarily high probability even on the unjustified extrapolations. However, optimizing to learn su…

    Submitted 17 December, 2019; v1 submitted 23 October, 2019; originally announced October 2019.

    Comments: 13 pages, 8 figures, 8 tables

  32. Reinforcement Learning with Non-uniform State Representations for Adaptive Search

    Authors: Sandeep Manjanna, Herke van Hoof, Gregory Dudek

    Abstract: Efficient spatial exploration is a key aspect of search and rescue. In this paper, we present a search algorithm that generates efficient trajectories that optimize the rate at which probability mass is covered by a searcher. This should allow an autonomous vehicle to find one or more lost targets as rapidly as possible. We do this by performing non-uniform sampling of the search region. The path gen…

    Submitted 15 June, 2019; originally announced June 2019.

    Comments: Published at IEEE International Symposium on Safety, Security and Rescue Robotics 2018

  33. arXiv:1903.06059

    cs.LG stat.ML

    Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement

    Authors: Wouter Kool, Herke van Hoof, Max Welling

    Abstract: The well-known Gumbel-Max trick for sampling from a categorical distribution can be extended to sample $k$ elements without replacement. We show how to implicitly apply this 'Gumbel-Top-$k$' trick on a factorized distribution over sequences, allowing to draw exact samples without replacement using a Stochastic Beam Search. Even for exponentially large domains, the number of model evaluations grows…

    Submitted 29 May, 2019; v1 submitted 14 March, 2019; originally announced March 2019.

    Comments: ICML 2019; 13 pages, 4 figures

  34. arXiv:1903.05697

    cs.RO cs.LG

    Uncertainty Aware Learning from Demonstrations in Multiple Contexts using Bayesian Neural Networks

    Authors: Sanjay Thakur, Herke van Hoof, Juan Camilo Gamboa Higuera, Doina Precup, David Meger

    Abstract: Diversity of environments is a key challenge that causes learned robotic controllers to fail due to the discrepancies between the training and evaluation conditions. Training from demonstrations in various conditions can mitigate, but not completely prevent, such failures. Learned controllers such as neural networks typically do not have a notion of uncertainty that allows to diagnose an offset…

    Submitted 13 March, 2019; originally announced March 2019.


  35. arXiv:1901.01777

    stat.ML cs.LG

    Understanding partition comparison indices based on counting object pairs

    Authors: Matthijs J. Warrens, Hanneke van der Hoef

    Abstract: In unsupervised machine learning, agreement between partitions is commonly assessed with so-called external validity indices. Researchers tend to use and report indices that quantify agreement between two partitions for all clusters simultaneously. Commonly used examples are the Rand index and the adjusted Rand index. Since these overall measures give a general notion of what is going on, their va…

    Submitted 7 January, 2019; originally announced January 2019.

    Comments: 29 pages, 7 tables

    MSC Class: 62H30; 62H20

  36. arXiv:1812.01180

    cs.CV

    Deep Generative Modeling of LiDAR Data

    Authors: Lucas Caccia, Herke van Hoof, Aaron Courville, Joelle Pineau

    Abstract: Building models capable of generating structured output is a key challenge for AI and robotics. While generative models have been explored on many types of data, little work has been done on synthesizing lidar scans, which play a key role in robot mapping and localization. In this work, we show that one can adapt deep generative models for this task by unravelling lidar scans into a 2D point map.…

    Submitted 2 December, 2019; v1 submitted 3 December, 2018; originally announced December 2018.

    Comments: Presented at IROS 2019

  37. arXiv:1809.09672

    cs.CL

    BanditSum: Extractive Summarization as a Contextual Bandit

    Authors: Yue Dong, Yikang Shen, Eric Crawford, Herke van Hoof, Jackie Chi Kit Cheung

    Abstract: In this work, we propose a novel method for training neural networks to perform single-document extractive summarization without heuristically-generated extractive labels. We call our approach BanditSum as it treats extractive summarization as a contextual bandit (CB) problem, where the model receives a document to summarize (the context), and chooses a sequence of sentences to include in the summ…

    Submitted 7 May, 2019; v1 submitted 25 September, 2018; originally announced September 2018.

    Comments: 12 pages, 2 figures, EMNLP 2018

  38. arXiv:1803.08475

    stat.ML cs.LG

    Attention, Learn to Solve Routing Problems!

    Authors: Wouter Kool, Herke van Hoof, Max Welling

    Abstract: The recently presented idea to learn heuristics for combinatorial optimization problems is promising as it can save costly development. However, to push this idea towards practical implementation, we need better models and better ways of training. We contribute in both directions: we propose a model based on attention layers with benefits over the Pointer Network and we show how to train this mode…

    Submitted 7 February, 2019; v1 submitted 22 March, 2018; originally announced March 2018.

    Comments: Accepted at ICLR 2019. 25 pages, 7 figures

  39. arXiv:1802.09477

    cs.AI cs.LG stat.ML

    Addressing Function Approximation Error in Actor-Critic Methods

    Authors: Scott Fujimoto, Herke van Hoof, David Meger

    Abstract: In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning, by taking the minimum value bet…

    Submitted 22 October, 2018; v1 submitted 26 February, 2018; originally announced February 2018.

    Comments: Accepted at ICML 2018

  40. arXiv:1611.03231

    stat.ML cs.LG

    Policy Search with High-Dimensional Context Variables

    Authors: Voot Tangkaratt, Herke van Hoof, Simone Parisi, Gerhard Neumann, Jan Peters, Masashi Sugiyama

    Abstract: Direct contextual policy search methods learn to improve policy parameters and simultaneously generalize these parameters to different context or task variables. However, learning from high-dimensional context variables, such as camera images, is still a prominent problem in many real-world tasks. A naive application of unsupervised dimensionality reduction methods to the context variables, such a…

    Submitted 10 November, 2016; originally announced November 2016.