Search | arXiv e-print repository

Regular Decision Processes for Grid Worlds

Authors: Nicky Lenaers, Martijn van Otterlo

Abstract: Markov decision processes are typically used for sequential decision making under uncertainty. For many aspects, however, ranging from constrained or safe specifications to various kinds of temporal (non-Markovian) dependencies in task and reward structures, extensions are needed. To that end, in recent years interest has grown in combinations of reinforcement learning and temporal logic, that is, combinations of flexible behavior learning methods with robust verification and guarantees. In this paper we describe an experimental investigation of the recently introduced regular decision processes, which support both non-Markovian reward functions and non-Markovian transition functions. In particular, we provide a tool chain for regular decision processes, algorithmic extensions relating to online, incremental learning, an empirical evaluation of model-free and model-based solution algorithms, and applications in regular, but non-Markovian, grid worlds.

Submitted 9 November, 2021; v1 submitted 5 November, 2021; originally announced November 2021.

Comments: 21 pages, 10 figures, accepted for oral presentation at the AI & ML conference for Belgium, Netherlands & Luxembourg (BNAIC/BeneLearn 2021), 10-12 November, Luxembourg

MSC Class: 68T05 (Primary); 68Q45 (Secondary)

ACM Class: I.2.8
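
The regular decision process abstract above rests on the idea that rewards (and transitions) may depend on history, provided that the dependence is regular, that is, trackable by a finite automaton. Here is a minimal sketch of that idea for rewards only. It assumes a hypothetical 3x3 deterministic grid, hypothetical landmark cells `A` and `B`, and a reward for visiting `A` and later `B`. Pairing the grid cell with the automaton state makes the reward Markovian again, so plain value iteration applies. This is not the paper's tool chain or its solution algorithms. The sketch also assumes the automaton is known up front, whereas the abstract mentions online, incremental learning.

```python
from itertools import product

# Hypothetical 3x3 deterministic grid world; cells are (row, col).
ROWS, COLS = 3, 3
A, B = (0, 2), (2, 0)  # hypothetical landmark cells
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def move(cell, action):
    """Deterministic grid move; bumping into a wall leaves the agent in place."""
    dr, dc = ACTIONS[action]
    r, c = cell[0] + dr, cell[1] + dc
    return (r, c) if 0 <= r < ROWS and 0 <= c < COLS else cell


def dfa_step(q, cell):
    """DFA for the regular pattern 'visit A, later visit B'.
    q=0: A not yet seen; q=1: A seen; q=2: A then B seen (accepting, absorbing)."""
    if q == 0 and cell == A:
        return 1
    if q == 1 and cell == B:
        return 2
    return q


def product_step(state, action):
    """One step of the product system over (grid cell, DFA state).
    The reward is 1 only on entering the accepting DFA state: non-Markovian
    in the cell alone, but Markovian in the (cell, q) pair."""
    cell, q = state
    nxt = move(cell, action)
    q2 = dfa_step(q, nxt)
    reward = 1.0 if (q2 == 2 and q != 2) else 0.0
    return (nxt, q2), reward


def value_iteration(gamma=0.9, iters=200):
    """Standard value iteration over the finite product state space."""
    states = [(cell, q) for cell in product(range(ROWS), range(COLS)) for q in range(3)]
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        new_V = {}
        for s in states:
            best = float("-inf")
            for a in ACTIONS:
                s2, r = product_step(s, a)
                best = max(best, r + gamma * V[s2])
            new_V[s] = best
        V = new_V
    return V
```

After `V = value_iteration()`, the same cell `(2, 1)` has value 1.0 when paired with automaton state 1 (one step from `B` with `A` already visited), but only about 0.53 when paired with state 0. The value of a cell therefore depends on the history summarized by the automaton.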

arXiv:1804.03592

A clustering-based reinforcement learning approach for tailored personalization of e-Health interventions

Authors: Ali el Hassouni, Mark Hoogendoorn, Martijn van Otterlo, A. E. Eiben, Vesa Muhonen, Eduardo Barbaro

Abstract: Personalization is very powerful in improving the effectiveness of health interventions. Reinforcement learning (RL) algorithms are suitable for learning these tailored interventions from sequential data collected about individuals. However, learning can be very fragile. The time to learn intervention policies is limited, as disengagement from the user can occur quickly. Also, in e-Health, the timing of an intervention can be crucial: it must be delivered before the optimal window passes. We present an approach that learns tailored personalization policies for groups of users by combining RL and clustering. The benefits are two-fold: learning is sped up to prevent disengagement, while a high level of personalization is maintained. Our clustering approach utilizes dynamic time warping to compare user trajectories consisting of states and rewards. We apply online and batch RL to learn policies over clusters of individuals and introduce our self-developed and publicly available simulator for e-Health interventions to evaluate our approach. We compare our methods with an e-Health intervention benchmark. We demonstrate that batch learning outperforms online learning for our setting. Furthermore, our proposed clustering approach for RL finds near-optimal clusterings, which lead to significantly better policies in terms of cumulative reward compared to learning a policy per individual or learning one non-personalized policy across all individuals.
Our findings also indicate that the learned policies accurately learn to send interventions at the right moments and that the users work out more and at the right times of the day.

Submitted 21 May, 2020; v1 submitted 10 April, 2018; originally announced April 2018.
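
The clustering abstract above compares user trajectories of states and rewards with dynamic time warping (DTW). Here is a minimal sketch of classic DTW. It assumes each time step is encoded as a numeric vector (for example, hypothetical state features plus the reward) and uses Euclidean distance as the local cost. The abstract does not specify the paper's actual encoding, step pattern, or the clustering algorithm applied afterwards.

```python
import math
from typing import List, Sequence

# One time step of a trajectory, e.g. (state features..., reward); an assumed encoding.
Step = Sequence[float]


def euclidean(a: Step, b: Step) -> float:
    """Local cost between two time steps of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(s: Sequence[Step], t: Sequence[Step]) -> float:
    """Classic O(len(s) * len(t)) dynamic time warping distance."""
    n, m = len(s), len(t)
    inf = float("inf")
    # D[i][j] = cost of the best alignment of s[:i] with t[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = euclidean(s[i - 1], t[j - 1])
            # Extend from an insertion, a deletion, or a match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def pairwise_dtw(trajectories: List[Sequence[Step]]) -> List[List[float]]:
    """Symmetric matrix of DTW distances between all pairs of trajectories."""
    k = len(trajectories)
    return [[dtw_distance(trajectories[i], trajectories[j]) for j in range(k)]
            for i in range(k)]
```

Because DTW allows one step to align with several steps of the other sequence, two users who follow the same pattern at slightly different paces end up close together. The matrix from `pairwise_dtw` could then feed a distance-based clustering method such as k-medoids. That choice is an assumption here, not necessarily the one used in the paper.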

arXiv:1801.01705

Gatekeeping Algorithms with Human Ethical Bias: The ethics of algorithms in archives, libraries and society

Authors: Martijn van Otterlo

Abstract: In the age of algorithms, I focus on the question of how to ensure that algorithms that will take over many of our familiar archival and library tasks behave according to human ethical norms that have evolved over many years. I start by characterizing physical archives in the context of related institutions such as libraries and museums. In this setting I analyze how ethical principles, in particular about access to information, have been formalized and communicated in the form of ethical codes, or: codes of conduct. After that I describe two main developments: digitalization, in which physical aspects of the world are turned into digital data, and algorithmization, in which intelligent computer programs turn this data into predictions and decisions. Both affect interactions that were once physical but are now digital. In this new setting I survey and analyze the ethical aspects of algorithms and how they shape a vision of the future of archivists and librarians, in the form of algorithmic documentalists, or: codementalists. Finally I outline a general research strategy, called IntERMEeDIUM, to obtain algorithms that obey the human ethical values encoded in codes of ethics.

Submitted 5 January, 2018; originally announced January 2018.

Comments: Submitted (Nov 2017)

arXiv:1711.06035

From Algorithmic Black Boxes to Adaptive White Boxes: Declarative Decision-Theoretic Ethical Programs as Codes of Ethics

Authors: Martijn van Otterlo

Abstract: Ethics of algorithms is an emerging topic in various disciplines such as social science, law, and philosophy, but also artificial intelligence (AI). The value alignment problem expresses the challenge of (machine) learning values that are, in some way, aligned with human requirements or values. In this paper I argue for looking at how humans have formalized and communicated values in professional codes of ethics, and for exploring declarative decision-theoretic ethical programs (DDTEP) to formalize codes of ethics. This renders machine ethical reasoning and decision-making, as well as learning, more transparent and hopefully more accountable. The paper includes proof-of-concept examples of known toy dilemmas and gatekeeping domains such as archives and libraries.

Submitted 16 November, 2017; originally announced November 2017.

Comments: 7 pages, 1 figure, submitted

Showing 1–4 of 4 results for author: van Otterlo, M