-
Decentralized Deep Reinforcement Learning for a Distributed and Adaptive Locomotion Controller of a Hexapod Robot
Authors:
Malte Schilling,
Kai Konen,
Frank W. Ohl,
Timo Korthals
Abstract:
Locomotion is a prime example of adaptive behavior in animals, and biological control principles have inspired control architectures for legged robots. While machine learning has been successfully applied to many tasks in recent years, Deep Reinforcement Learning approaches still appear to struggle when applied to real world robots in continuous control tasks and in particular do not appear as robust solutions that can handle uncertainties well. Therefore, there is a new interest in incorporating biological principles into such learning architectures. While inducing a hierarchical organization as found in motor control has already shown some success, we here propose a decentralized organization as found in insect motor control for coordination of different legs. A decentralized and distributed architecture is introduced on a simulated hexapod robot and the details of the controller are learned through Deep Reinforcement Learning. We first show that such a concurrent local structure is able to learn better walking behavior. Second, we show that the simpler organization is learned faster than holistic approaches.
Submitted 21 May, 2020;
originally announced May 2020.
-
From Crystallized Adaptivity to Fluid Adaptivity in Deep Reinforcement Learning -- Insights from Biological Systems on Adaptive Flexibility
Authors:
Malte Schilling,
Helge Ritter,
Frank W. Ohl
Abstract:
Recent developments in machine-learning algorithms have led to impressive performance increases in many traditional application scenarios of artificial intelligence research. In the area of deep reinforcement learning, deep learning functional architectures are combined with incremental learning schemes for sequential tasks that include interaction-based but often delayed feedback. Despite their impressive successes, modern machine-learning approaches, including deep reinforcement learning, still perform weakly when compared to flexibly adaptive biological systems in certain naturally occurring scenarios. Such scenarios include transfers to environments different from the ones in which the training took place or environments that dynamically change, both of which are often mastered by biological systems through a capability that we here term "fluid adaptivity" to contrast it with the much slower adaptivity ("crystallized adaptivity") of the prior learning from which the behavior emerged. In this article, we derive and discuss research strategies, based on analyses of fluid adaptivity in biological systems and its neuronal modeling, that might aid in equipping future artificially intelligent systems with capabilities of fluid adaptivity more similar to those seen in some biologically intelligent systems. A key component of this research strategy is the dynamization of the problem space itself and the implementation of this dynamization by suitably designed flexibly interacting modules.
Submitted 13 August, 2019;
originally announced August 2019.
-
Setup of a Recurrent Neural Network as a Body Model for Solving Inverse and Forward Kinematics as well as Dynamics for a Redundant Manipulator
Authors:
Malte Schilling
Abstract:
An internal model of one's own body can be assumed to be a fundamental and evolutionarily early representation, as it is present throughout the animal kingdom. Such functional models are, on the one hand, required in motor control, for example solving the inverse kinematic or dynamic task in goal-directed movements or a forward task in ballistic movements. On the other hand, such models are recruited in cognitive tasks such as planning ahead or observing the actions of a conspecific. Here, we present a functional internal body model that is based on the Mean of Multiple Computations principle. For the first time such a model is completely realized in a recurrent neural network as necessary normalization steps are integrated into the neural model itself. Secondly, a dynamic extension is applied to the model. It is shown how the neural network solves a series of inverse tasks. Furthermore, emerging representations in transformational layers are analyzed that show a form of prototypical population-coding as found in place or direction cells.
Submitted 12 April, 2019;
originally announced April 2019.
-
Is Basketball a Game of Runs?
Authors:
Mark F. Schilling
Abstract:
Basketball is often referred to as "a game of runs." We investigate the appropriateness of this claim using data from the full NBA 2016-17 season, comparing actual longest runs of scoring events to what long run theory predicts under the assumption that team "momentum" is not present. We provide several different variations of the analysis. Our results consistently indicate that the lengths of longest runs in NBA games are no longer than those that would occur naturally when scoring events are generated by a random process, rather than one that is influenced by "momentum".
Submitted 20 March, 2019;
originally announced March 2019.
-
Modularization of End-to-End Learning: Case Study in Arcade Games
Authors:
Andrew Melnik,
Sascha Fleer,
Malte Schilling,
Helge Ritter
Abstract:
Complex environments and tasks pose a difficult problem for holistic end-to-end learning approaches. Decomposition of an environment into interacting controllable and non-controllable objects allows supervised learning for non-controllable objects and universal value function approximator learning for controllable objects. Such decomposition should lead to a shorter learning time and better generalisation capability. Here, we consider arcade-game environments as sets of interacting objects (controllable, non-controllable) and propose a set of functional modules that are specialized on mastering different types of interactions in a broad range of environments. The modules utilize regression, supervised learning, and reinforcement learning algorithms. Results of this case study in different Atari games suggest that human-level performance can be achieved by a learning agent within a human amount of game experience (10-15 minutes game time) when a proper decomposition of an environment or a task is provided. However, automatization of such decomposition remains a challenging problem. This case study shows how a model of a causal structure underlying an environment or a task can benefit learning time and generalization capability of the agent, and argues in favor of exploiting modular structure in contrast to using pure end-to-end learning approaches.
Submitted 27 January, 2019;
originally announced January 2019.
-
Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments
Authors:
Łukasz Kidziński,
Sharada Prasanna Mohanty,
Carmichael Ong,
Zhewei Huang,
Shuchang Zhou,
Anton Pechenko,
Adam Stelmaszczyk,
Piotr Jarosik,
Mikhail Pavlov,
Sergey Kolesnikov,
Sergey Plis,
Zhibo Chen,
Zhizheng Zhang,
Jiale Chen,
Jun Shi,
Zhuobin Zheng,
Chun Yuan,
Zhihui Lin,
Henryk Michalewski,
Piotr Miłoś,
Błażej Osiński,
Andrew Melnik,
Malte Schilling,
Helge Ritter,
Sean Carroll,
et al. (4 additional authors not shown)
Abstract:
In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches, based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each of the eight teams implemented different modifications of the known algorithms.
Submitted 1 April, 2018;
originally announced April 2018.
-
Exploring Unpopular Presidential Elections
Authors:
Michael Neubauer,
Mark Schilling,
Joel Zeitlin
Abstract:
There have been several instances in our nation's history in which the presidential candidate who received the most popular votes did not win the presidency. Using a principal components analysis of recent presidential elections, we estimate the likelihood of such an outcome under modern conditions to be approximately 5%. We also investigate the effect on this estimate of eliminating from the Electoral College the two electors per state awarded due to their representation in the Senate, as well as the effect of increasing them. We conclude with an analysis of the likely consequences of The National Popular Vote Bill, which would award the presidency to the candidate with the largest national popular vote total if it took effect.
Submitted 12 June, 2012;
originally announced June 2012.
-
Evidence of Systematic Bias in 2008 Presidential Polling (preliminary report)
Authors:
Leonard Adleman,
Mark Schilling
Abstract:
Political polls achieve their results by sampling a small number of potential voters rather than the population as a whole. This leads to sampling error which most polling agencies dutifully report. But factors such as nonrepresentative samples, question wording and nonresponse can produce non-sampling errors. While pollsters are aware of such errors, they are difficult to quantify and seldom reported. When a polling agency, whether by intention or not, produces results with non-sampling errors that systematically favor one candidate over another, then that agency's poll is biased. We analyzed polling data for the (on-going) 2008 Presidential race, and though our methods do not allow us to identify which agencies' polls are biased, they do provide significant evidence that some agencies' polls are.
We compared polls produced by major television networks with those produced by Gallup and Rasmussen. We found that, taken as a whole, polls produced by the networks were significantly to the left of those produced by Gallup and Rasmussen. We used the available data to provide a tentative ordering of the major television networks' polls from right to left. Our order was: FOX, CNN, NBC (which partners with the Wall Street Journal), ABC (which partners with the Washington Post), CBS (which partners with the New York Times). These results appear to comport well with the informal perceptions of the political leanings of these agencies.
Our findings are preliminary, but they make a case for further research into the causes of and remedies for polling bias.
Submitted 30 October, 2008;
originally announced October 2008.