Reward-based learning under hardware constraints - Using a RISC processor embedded in a neuromorphic substrate

Friedmann, Simon; Frémaux, Nicolas; Schemmel, Johannes; Gerstner, Wulfram; Meier, Karlheinz

arXiv:1303.6708 (q-bio)

[Submitted on 27 Mar 2013 (v1), last revised 20 Aug 2013 (this version, v2)]

Abstract: In this study, we propose and analyze in simulations a new, highly flexible method of implementing synaptic plasticity in a wafer-scale, accelerated neuromorphic hardware system. The study focuses on globally modulated STDP as a special use case of this method. Flexibility is achieved by embedding a general-purpose processor dedicated to plasticity into the wafer. To evaluate the suitability of the proposed system, we use a reward-modulated STDP rule in a spike train learning task. A single layer of neurons is trained to fire at specific points in time with only the reward as feedback. This model is simulated to measure its performance, i.e., the increase in received reward after learning. Using this performance as a baseline, we then simulate the model with various constraints imposed by the proposed implementation and compare the performance. The simulated constraints include discretized synaptic weights, a restricted interface between analog synapses and the embedded processor, and mismatch of analog circuits. We find that probabilistic updates can increase the performance of low-resolution weights, that a simple interface between analog synapses and processor is sufficient for learning, and that performance is insensitive to mismatch. Further, we consider communication latency between the wafer and the conventional control computer system that simulates the environment. This latency increases the delay with which the reward is sent to the embedded processor. Because the analog synapses operate continuously in time, this delay can cause the updates to deviate from those in the undelayed case. We find that for highly accelerated systems latency has to be kept to a minimum. This study demonstrates the suitability of the proposed implementation to emulate the selected reward-modulated STDP learning rule.
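The abstract does not say how the probabilistic updates are carried out. One common way to apply small updates to low-resolution weights is stochastic rounding, and the following is a minimal sketch under that assumption, not the authors' implementation. The function names, the 4-bit default resolution, and the clipping behaviour are all illustrative choices that do not come from the paper.

```python
import math
import random


def deterministic_update(weight, delta, n_bits=4):
    """Round the update to the nearest integer step, then clip.

    Updates smaller than half a weight step are always lost.
    """
    w_max = 2 ** n_bits - 1
    return min(w_max, max(0, weight + round(delta)))


def probabilistic_update(weight, delta, n_bits=4, rng=random):
    """Stochastically round the update, then clip (assumed scheme).

    The fractional part of `delta` is used as the probability of taking
    one extra integer step, so the expected change equals `delta` as long
    as the weight stays away from the clipping bounds.
    """
    w_max = 2 ** n_bits - 1
    step = math.floor(delta)
    frac = delta - step  # always in [0, 1)
    if rng.random() < frac:
        step += 1
    return min(w_max, max(0, weight + step))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    w_det, w_prob = 0, 0
    for _ in range(50):
        # Hypothetical update of 0.1 weight steps per learning trial.
        w_det = deterministic_update(w_det, 0.1)
        w_prob = probabilistic_update(w_prob, 0.1, rng=rng)
    print("deterministic:", w_det, "probabilistic:", w_prob)
```

With deterministic rounding, an update below half a weight step never changes the stored weight, however often it is applied. Stochastic rounding preserves such updates on average, which is one plausible reading of why probabilistic updates help at low resolution.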

Comments:	37 pages, 11 figures, to be published in Frontiers in Neuromorphic Engineering. This version contains major additions to the result and discussion parts
Subjects:	Neurons and Cognition (q-bio.NC)
Cite as:	arXiv:1303.6708 [q-bio.NC]
	(or arXiv:1303.6708v2 [q-bio.NC] for this version)
	https://doi.org/10.48550/arXiv.1303.6708

Submission history

From: Simon Friedmann
[v1] Wed, 27 Mar 2013 00:05:35 UTC (4,452 KB)
[v2] Tue, 20 Aug 2013 08:04:03 UTC (5,807 KB)
