-
Flat Channels to Infinity in Neural Loss Landscapes
Authors:
Flavio Martinelli,
Alexander Van Meegen,
Berfin Şimşek,
Wulfram Gerstner,
Johanni Brea
Abstract:
The loss landscapes of neural networks contain minima and saddle points that may be connected in flat regions or appear in isolation. We identify and characterize a special structure in the loss landscape: channels along which the loss decreases extremely slowly, while the output weights of at least two neurons, $a_i$ and $a_j$, diverge to $\pm$infinity, and their input weight vectors, $\mathbf{w_i}$ and $\mathbf{w_j}$, become equal to each other. At convergence, the two neurons implement a gated linear unit: $a_iσ(\mathbf{w_i} \cdot \mathbf{x}) + a_jσ(\mathbf{w_j} \cdot \mathbf{x}) \rightarrow σ(\mathbf{w} \cdot \mathbf{x}) + (\mathbf{v} \cdot \mathbf{x}) σ'(\mathbf{w} \cdot \mathbf{x})$. Geometrically, these channels to infinity are asymptotically parallel to symmetry-induced lines of critical points. Gradient flow solvers, and related optimization methods like SGD or ADAM, reach the channels with high probability in diverse regression settings, but without careful inspection they look like flat local minima with finite parameter values. Our characterization provides a comprehensive picture of these quasi-flat regions in terms of gradient dynamics, geometry, and functional interpretation. The emergence of gated linear units at the end of the channels highlights a surprising aspect of the computational capabilities of fully connected layers.
Submitted 17 June, 2025;
originally announced June 2025.
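The gated-linear-unit limit stated in the abstract above can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes $σ = \tanh$ and a hypothetical one-parameter path $\mathbf{w_i} = \mathbf{w} + ε\mathbf{v}$, $\mathbf{w_j} = \mathbf{w} - ε\mathbf{v}$, $a_i = c/2 + 1/(2ε)$, $a_j = c/2 - 1/(2ε)$, chosen so that $a_i + a_j = c$ stays finite while both output weights diverge as $ε \to 0$. A Taylor expansion around $\mathbf{w}$ then gives $cσ(\mathbf{w} \cdot \mathbf{x}) + (\mathbf{v} \cdot \mathbf{x}) σ'(\mathbf{w} \cdot \mathbf{x}) + O(ε^2)$; $c = 1$ matches the form written in the abstract. All function names are illustrative.

```python
import math


def sigma(z):
    """Activation function; tanh is an assumption made for this sketch."""
    return math.tanh(z)


def dsigma(z):
    """Derivative of tanh."""
    return 1.0 - math.tanh(z) ** 2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def two_neuron_output(x, w, v, c, eps):
    """a_i*sigma(w_i.x) + a_j*sigma(w_j.x) on the hypothetical path.

    As eps -> 0 the input weights merge (w_i, w_j -> w) while the
    output weights a_i, a_j diverge to +infinity and -infinity.
    """
    w_i = [wk + eps * vk for wk, vk in zip(w, v)]
    w_j = [wk - eps * vk for wk, vk in zip(w, v)]
    a_i = c / 2 + 1 / (2 * eps)
    a_j = c / 2 - 1 / (2 * eps)
    return a_i * sigma(dot(w_i, x)) + a_j * sigma(dot(w_j, x))


def gated_linear_unit(x, w, v, c):
    """Limiting function c*sigma(w.x) + (v.x)*sigma'(w.x)."""
    z = dot(w, x)
    return c * sigma(z) + dot(v, x) * dsigma(z)


if __name__ == "__main__":
    x, w, v = [0.3, -0.7], [0.5, 1.2], [0.8, -0.4]
    target = gated_linear_unit(x, w, v, c=1.0)
    for eps in (1e-1, 1e-2, 1e-3):
        gap = abs(two_neuron_output(x, w, v, 1.0, eps) - target)
        print(f"eps={eps:g}  |two neurons - GLU| = {gap:.3e}")
```

On this path the gap to the gated linear unit shrinks roughly quadratically in $ε$, while $|a_i|$ and $|a_j|$ grow like $1/(2ε)$.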
-
Should Under-parameterized Student Networks Copy or Average Teacher Weights?
Authors:
Berfin Şimşek,
Amire Bendjeddou,
Wulfram Gerstner,
Johanni Brea
Abstract:
Any continuous function $f^*$ can be approximated arbitrarily well by a neural network with sufficiently many neurons $k$. We consider the case when $f^*$ itself is a neural network with one hidden layer and $k$ neurons. Approximating $f^*$ with a neural network with $n< k$ neurons can thus be seen as fitting an under-parameterized "student" network with $n$ neurons to a "teacher" network with $k$ neurons. As the student has fewer neurons than the teacher, it is unclear whether each of the $n$ student neurons should copy one of the teacher neurons or rather average a group of teacher neurons. For shallow neural networks with erf activation function and for the standard Gaussian input distribution, we prove that "copy-average" configurations are critical points if the teacher's incoming vectors are orthonormal and its outgoing weights are unitary. Moreover, the optimum among such configurations is reached when $n-1$ student neurons each copy one teacher neuron and the $n$-th student neuron averages the remaining $k-n+1$ teacher neurons. For the student network with $n=1$ neuron, we additionally provide a closed-form solution of the non-trivial critical point(s) for commonly used activation functions through solving an equivalent constrained optimization problem. Empirically, we find for the erf activation function that gradient flow converges either to the optimal copy-average critical point or to another point where each student neuron approximately copies a different teacher neuron. Finally, we find similar results for the ReLU activation function, suggesting that the optimal solution of under-parameterized networks has a universal structure.
Submitted 15 January, 2024; v1 submitted 2 November, 2023;
originally announced November 2023.
-
High-performance deep spiking neural networks with 0.3 spikes per neuron
Authors:
Ana Stanojevic,
Stanisław Woźniak,
Guillaume Bellec,
Giovanni Cherubini,
Angeliki Pantazi,
Wulfram Gerstner
Abstract:
Communication by rare, binary spikes is a key factor for the energy efficiency of biological brains. However, it is harder to train biologically-inspired spiking neural networks (SNNs) than artificial neural networks (ANNs). This is puzzling given that theoretical results provide exact mapping algorithms from ANNs to SNNs with time-to-first-spike (TTFS) coding. In this paper we analyze in theory and simulation the learning dynamics of TTFS-networks and identify a specific instance of the vanishing-or-exploding gradient problem. While two choices of SNN mappings solve this problem at initialization, only the one with a constant slope of the neuron membrane potential at threshold guarantees the equivalence of the training trajectory between SNNs and ANNs with rectified linear units. We demonstrate that training deep SNN models achieves the exact same performance as that of ANNs, surpassing previous SNNs on image classification datasets such as MNIST/Fashion-MNIST, CIFAR10/CIFAR100 and PLACES365. Our SNN accomplishes high-performance classification with less than 0.3 spikes per neuron, lending itself to an energy-efficient implementation. We show that fine-tuning SNNs with our robust gradient descent algorithm enables their optimization for hardware implementations with low latency and resilience to noise and quantization.
Submitted 20 November, 2023; v1 submitted 14 June, 2023;
originally announced June 2023.
-
Trial matching: capturing variability with data-constrained spiking neural networks
Authors:
Christos Sourmpis,
Carl Petersen,
Wulfram Gerstner,
Guillaume Bellec
Abstract:
Simultaneous behavioral and electrophysiological recordings call for new methods to reveal the interactions between neural activity and behavior. A milestone would be an interpretable model of the co-variability of spiking activity and behavior across trials. Here, we model a mouse cortical sensory-motor pathway in a tactile detection task reported by licking with a large recurrent spiking neural network (RSNN), fitted to the recordings via gradient-based optimization. We focus specifically on the difficulty of matching the trial-to-trial variability in the data. Our solution relies on optimal transport to define a distance between the distributions of generated and recorded trials. The technique is applied to artificial data and neural recordings covering six cortical areas. We find that the resulting RSNN can generate realistic cortical activity and predict jaw movements across the main modes of trial-to-trial variability. Our analysis also identifies an unexpected mode of variability in the data corresponding to task-irrelevant movements of the mouse.
Submitted 1 December, 2023; v1 submitted 6 June, 2023;
originally announced June 2023.
-
Context selectivity with dynamic availability enables lifelong continual learning
Authors:
Martin Barry,
Wulfram Gerstner,
Guillaume Bellec
Abstract:
"You never forget how to ride a bike" -- but how is that possible? The brain is able to learn complex skills, stop the practice for years, learn other skills in between, and still retrieve the original knowledge when necessary. The mechanisms of this capability, referred to as lifelong learning (or continual learning, CL), are unknown. We suggest a bio-plausible meta-plasticity rule building on classical work in CL which we summarize in two principles: (i) neurons are context selective, and (ii) a local availability variable partially freezes the plasticity if the neuron was relevant for previous tasks. In a new neuro-centric formalization of these principles, we suggest that neuron selectivity and neuron-wide consolidation is a simple and viable meta-plasticity hypothesis to enable CL in the brain. In simulation, this simple model balances forgetting and consolidation, leading to better transfer learning than contemporary CL algorithms on image recognition and natural language processing CL benchmarks.
Submitted 25 January, 2024; v1 submitted 2 June, 2023;
originally announced June 2023.
-
Expand-and-Cluster: Parameter Recovery of Neural Networks
Authors:
Flavio Martinelli,
Berfin Simsek,
Wulfram Gerstner,
Johanni Brea
Abstract:
Can we identify the weights of a neural network by probing its input-output mapping? At first glance, this problem seems to have many solutions because of permutation, overparameterisation and activation function symmetries. Yet, we show that the incoming weight vector of each neuron is identifiable up to sign or scaling, depending on the activation function. Our novel method 'Expand-and-Cluster' can identify layer sizes and weights of a target network for all commonly used activation functions. Expand-and-Cluster consists of two phases: (i) to relax the non-convex optimisation problem, we train multiple overparameterised student networks to best imitate the target function; (ii) to reverse engineer the target network's weights, we employ an ad-hoc clustering procedure that reveals the learnt weight vectors shared between students -- these correspond to the target weight vectors. We demonstrate successful weight and size recovery of trained shallow and deep networks with less than 10% overhead in the layer size and describe an 'ease-of-identifiability' axis by analysing 150 synthetic problems of variable difficulty.
Submitted 27 June, 2024; v1 submitted 25 April, 2023;
originally announced April 2023.
-
Emergent rate-based dynamics in duplicate-free populations of spiking neurons
Authors:
Valentin Schmutz,
Johanni Brea,
Wulfram Gerstner
Abstract:
Can Spiking Neural Networks (SNNs) approximate the dynamics of Recurrent Neural Networks (RNNs)? Arguments in classical mean-field theory based on laws of large numbers provide a positive answer when each neuron in the network has many "duplicates", i.e. other neurons with almost perfectly correlated inputs. Using a disordered network model that guarantees the absence of duplicates, we show that duplicate-free SNNs can converge to RNNs, thanks to the concentration of measure phenomenon. This result reveals a general mechanism underlying the emergence of rate-based dynamics in large SNNs.
Submitted 7 November, 2024; v1 submitted 9 March, 2023;
originally announced March 2023.
-
MLPGradientFlow: going with the flow of multilayer perceptrons (and finding minima fast and accurately)
Authors:
Johanni Brea,
Flavio Martinelli,
Berfin Şimşek,
Wulfram Gerstner
Abstract:
MLPGradientFlow is a software package to solve numerically the gradient flow differential equation $\dot θ= -\nabla \mathcal L(θ; \mathcal D)$, where $θ$ are the parameters of a multi-layer perceptron, $\mathcal D$ is some data set, and $\nabla \mathcal L$ is the gradient of a loss function. We show numerically that adaptive first- or higher-order integration methods based on Runge-Kutta schemes have better accuracy and convergence speed than gradient descent with the Adam optimizer. However, we find Newton's method and approximations like BFGS preferable to find fixed points (local and global minima of $\mathcal L$) efficiently and accurately. For small networks and data sets, gradients are usually computed faster than in PyTorch and Hessians are computed at least $5\times$ faster. Additionally, the package features an integrator for a teacher-student setup with bias-free, two-layer networks trained with standard Gaussian input in the limit of infinite data. The code is accessible at https://github.com/jbrea/MLPGradientFlow.jl.
Submitted 25 January, 2023;
originally announced January 2023.
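The gradient flow equation $\dot θ= -\nabla \mathcal L(θ)$ from the abstract above can be illustrated on a toy problem. The following is a minimal sketch and does not use or reproduce the MLPGradientFlow.jl API. It assumes a hypothetical quadratic loss $\mathcal L(θ) = \frac{1}{2}\sum_k λ_k θ_k^2$, for which the exact flow is $θ_k(t) = θ_k(0) e^{-λ_k t}$, and compares a fixed-step explicit Euler integrator (plain gradient descent with learning rate $h$) with a fixed-step classical fourth-order Runge-Kutta integrator. The adaptive step-size control mentioned in the abstract is left out for brevity.

```python
import math


def grad_quadratic(theta, curvatures):
    """Gradient of the assumed toy loss L(theta) = 0.5 * sum_k lam_k * theta_k**2."""
    return [lam * th for lam, th in zip(curvatures, theta)]


def euler_step(theta, grad, h):
    """One explicit Euler step of d(theta)/dt = -grad(theta)."""
    return [th - h * g for th, g in zip(theta, grad(theta))]


def rk4_step(theta, grad, h):
    """One classical fourth-order Runge-Kutta step of d(theta)/dt = -grad(theta)."""
    def f(th):
        return [-g for g in grad(th)]

    def shifted(th, k, s):
        return [a + s * b for a, b in zip(th, k)]

    k1 = f(theta)
    k2 = f(shifted(theta, k1, h / 2))
    k3 = f(shifted(theta, k2, h / 2))
    k4 = f(shifted(theta, k3, h))
    return [th + h / 6 * (a + 2 * b + 2 * c + d)
            for th, a, b, c, d in zip(theta, k1, k2, k3, k4)]


def integrate(step, theta0, grad, h, n_steps):
    """Apply `step` n_steps times, i.e. integrate up to time t = n_steps * h."""
    theta = list(theta0)
    for _ in range(n_steps):
        theta = step(theta, grad, h)
    return theta


def max_error(theta, theta0, curvatures, t):
    """Largest deviation from the exact flow theta_k(0) * exp(-lam_k * t)."""
    exact = [th0 * math.exp(-lam * t) for th0, lam in zip(theta0, curvatures)]
    return max(abs(a - b) for a, b in zip(theta, exact))


if __name__ == "__main__":
    curvatures, theta0 = [1.0, 10.0], [1.0, 1.0]
    grad = lambda th: grad_quadratic(th, curvatures)
    for name, step in (("Euler", euler_step), ("RK4", rk4_step)):
        theta = integrate(step, theta0, grad, h=0.1, n_steps=10)
        print(name, max_error(theta, theta0, curvatures, t=1.0))
```

With the same step size, the higher-order scheme tracks the exact flow more closely than the Euler scheme on this toy loss. This is only a small-scale illustration of the kind of comparison the abstract reports.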
-
An Exact Mapping From ReLU Networks to Spiking Neural Networks
Authors:
Ana Stanojevic,
Stanisław Woźniak,
Guillaume Bellec,
Giovanni Cherubini,
Angeliki Pantazi,
Wulfram Gerstner
Abstract:
Deep spiking neural networks (SNNs) offer the promise of low-power artificial intelligence. However, training deep SNNs from scratch or converting deep artificial neural networks to SNNs without loss of performance has been a challenge. Here we propose an exact mapping from a network with Rectified Linear Units (ReLUs) to an SNN that fires exactly one spike per neuron. For our constructive proof, we assume that an arbitrary multi-layer ReLU network with or without convolutional layers, batch normalization and max pooling layers was trained to high performance on some training set. Furthermore, we assume that we have access to a representative example of input data used during training and to the exact parameters (weights and biases) of the trained ReLU network. The mapping from deep ReLU networks to SNNs causes zero percent drop in accuracy on CIFAR10, CIFAR100 and the ImageNet-like data sets Places365 and PASS. More generally our work shows that an arbitrary deep ReLU network can be replaced by an energy-efficient single-spike neural network without any loss of performance.
Submitted 23 December, 2022;
originally announced December 2022.
-
A taxonomy of surprise definitions
Authors:
Alireza Modirshanechi,
Johanni Brea,
Wulfram Gerstner
Abstract:
Surprising events trigger measurable brain activity and influence human behavior by affecting learning, memory, and decision-making. Currently there is, however, no consensus on the definition of surprise. Here we identify 18 mathematical definitions of surprise in a unifying framework. We first propose a technical classification of these definitions into three groups based on their dependence on an agent's belief, show how they relate to each other, and prove under what conditions they are indistinguishable. Going beyond this technical analysis, we propose a taxonomy of surprise definitions and classify them into four conceptual categories based on the quantity they measure: (i) 'prediction surprise' measures a mismatch between a prediction and an observation; (ii) 'change-point detection surprise' measures the probability of a change in the environment; (iii) 'confidence-corrected surprise' explicitly accounts for the effect of confidence; and (iv) 'information gain surprise' measures the belief-update upon a new observation. The taxonomy lays the foundation for principled studies of the functional roles and physiological signatures of surprise in the brain.
Submitted 2 September, 2022;
originally announced September 2022.
-
Kernel Memory Networks: A Unifying Framework for Memory Modeling
Authors:
Georgios Iatropoulos,
Johanni Brea,
Wulfram Gerstner
Abstract:
We consider the problem of training a neural network to store a set of patterns with maximal noise robustness. A solution, in terms of optimal weights and state update rules, is derived by training each individual neuron to perform either kernel classification or interpolation with a minimum weight norm. By applying this method to feed-forward and recurrent networks, we derive optimal models, termed kernel memory networks, that include, as special cases, many of the hetero- and auto-associative memory models that have been proposed over the past years, such as modern Hopfield networks and Kanerva's sparse distributed memory. We modify Kanerva's model and demonstrate a simple way to design a kernel memory network that can store an exponential number of continuous-valued patterns with a finite basin of attraction. The framework of kernel memory networks offers a simple and intuitive way to understand the storage capacity of previous memory models, and allows for new biological interpretations in terms of dendritic non-linearities and synaptic cross-talk.
Submitted 23 July, 2024; v1 submitted 19 August, 2022;
originally announced August 2022.
-
Mesoscopic modeling of hidden spiking neurons
Authors:
Shuqi Wang,
Valentin Schmutz,
Guillaume Bellec,
Wulfram Gerstner
Abstract:
Can we use spiking neural networks (SNN) as generative models of multi-neuronal recordings, while taking into account that most neurons are unobserved? Modeling the unobserved neurons with large pools of hidden spiking neurons leads to severely underconstrained problems that are hard to tackle with maximum likelihood estimation. In this work, we use coarse-graining and mean-field approximations to derive a bottom-up, neuronally-grounded latent variable model (neuLVM), where the activity of the unobserved neurons is reduced to a low-dimensional mesoscopic description. In contrast to previous latent variable models, neuLVM can be explicitly mapped to a recurrent, multi-population SNN, giving it a transparent biological interpretation. We show, on synthetic spike trains, that a few observed neurons are sufficient for neuLVM to perform efficient model inversion of large SNNs, in the sense that it can recover connectivity parameters, infer single-trial latent population activity, reproduce ongoing metastable dynamics, and generalize when subjected to perturbations mimicking photo-stimulation.
Submitted 7 January, 2023; v1 submitted 26 May, 2022;
originally announced May 2022.
-
Fitting summary statistics of neural data with a differentiable spiking network simulator
Authors:
Guillaume Bellec,
Shuqi Wang,
Alireza Modirshanechi,
Johanni Brea,
Wulfram Gerstner
Abstract:
Fitting network models to neural activity is an important tool in neuroscience. A popular approach is to model a brain area with a probabilistic recurrent spiking network whose parameters maximize the likelihood of the recorded activity. Although this approach is widely used, we show that the resulting model does not produce realistic neural activity. To correct for this, we suggest augmenting the log-likelihood with terms that measure the dissimilarity between simulated and recorded activity. This dissimilarity is defined via summary statistics commonly used in neuroscience and the optimization is efficient because it relies on back-propagation through the stochastically simulated spike trains. We analyze this method theoretically and show empirically that it generates more realistic activity statistics. We find that it improves upon other fitting algorithms for spiking network models like GLMs (Generalized Linear Models) which do not usually rely on back-propagation. This new fitting algorithm also enables the consideration of hidden neurons, which is otherwise notoriously hard, and we show that it can be crucial when trying to infer the network connectivity from spike recordings.
Submitted 14 November, 2021; v1 submitted 18 June, 2021;
originally announced June 2021.
-
Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances
Authors:
Berfin Şimşek,
François Ged,
Arthur Jacot,
Francesco Spadaro,
Clément Hongler,
Wulfram Gerstner,
Johanni Brea
Abstract:
We study how permutation symmetries in overparameterized multi-layer neural networks generate 'symmetry-induced' critical points. Assuming a network with $ L $ layers of minimal widths $ r_1^*, \ldots, r_{L-1}^* $ reaches a zero-loss minimum at $ r_1^*! \cdots r_{L-1}^*! $ isolated points that are permutations of one another, we show that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold. For a two-layer overparameterized network of width $ r^*+ h =: m $ we explicitly describe the manifold of global minima: it consists of $ T(r^*, m) $ affine subspaces of dimension at least $ h $ that are connected to one another. For a network of width $m$, we identify the number $G(r,m)$ of affine subspaces containing only symmetry-induced critical points that are related to the critical points of a smaller network of width $r<r^*$. Via a combinatorial analysis, we derive closed-form formulas for $ T $ and $ G $ and show that the number of symmetry-induced critical subspaces dominates the number of affine subspaces forming the global minima manifold in the mildly overparameterized regime (small $ h $) and vice versa in the vastly overparameterized regime ($h \gg r^*$). Our results provide new insights into the minimization of the non-convex loss function of overparameterized neural networks.
Submitted 12 September, 2021; v1 submitted 25 May, 2021;
originally announced May 2021.
-
Correlation-invariant synaptic plasticity
Authors:
Carlos Stein N. Brito,
Wulfram Gerstner
Abstract:
Cortical populations of neurons develop sparse representations adapted to the statistics of the environment. While existing synaptic plasticity models reproduce some of the observed receptive-field properties, a major obstacle is the sensitivity of Hebbian learning to omnipresent spurious correlations in cortical networks which can overshadow relevant latent input features. Here we develop a theory for synaptic plasticity that is invariant to second-order correlations in the input. Going beyond classical Hebbian learning, we show how Hebbian long-term depression (LTD) cancels the sensitivity to second-order correlations, so that receptive fields become aligned with features hidden in higher-order statistics. Our simulations demonstrate how correlation-invariance enables biologically realistic models to develop sparse population codes, despite diverse levels of variability and heterogeneity. The theory advances our understanding of local unsupervised learning in cortical circuits and assigns a specific functional role to synaptic LTD mechanisms in pyramidal neurons.
Submitted 15 September, 2022; v1 submitted 20 May, 2021;
originally announced May 2021.
-
Local plasticity rules can learn deep representations using self-supervised contrastive predictions
Authors:
Bernd Illing,
Jean Ventura,
Guillaume Bellec,
Wulfram Gerstner
Abstract:
Learning in the brain is poorly understood and learning rules that respect biological constraints, yet yield deep hierarchical representations, are still unknown. Here, we propose a learning rule that takes inspiration from neuroscience and recent advances in self-supervised deep learning. Learning minimizes a simple layer-specific loss function and does not need to back-propagate error signals within or between layers. Instead, weight updates follow a local, Hebbian, learning rule that only depends on pre- and post-synaptic neuronal activity, predictive dendritic input and widely broadcasted modulation factors which are identical for large groups of neurons. The learning rule applies contrastive predictive learning to a causal, biological setting using saccades (i.e. rapid shifts in gaze direction). We find that networks trained with this self-supervised and local rule build deep hierarchical representations of images, speech and video.
Submitted 25 October, 2021; v1 submitted 16 October, 2020;
originally announced October 2020.
-
Paradoxical Results of Long-Term Potentiation explained by Voltage-based Plasticity Rule
Authors:
Claire Meissner-Bernard,
Matthias Tsai,
Laureline Logiaco,
Wulfram Gerstner
Abstract:
Experiments have shown that the same stimulation pattern that causes Long-Term Potentiation in proximal synapses will induce Long-Term Depression in distal ones. To understand these and other surprising observations, we use a phenomenological model of Hebbian plasticity at the location of the synapse. Our computational model describes the Hebbian condition of joint activity of pre- and post-synaptic neurons in a compact form as the interaction of the glutamate trace left by a presynaptic spike with the time course of the postsynaptic voltage. We test the model using experimentally recorded dendritic voltage traces in hippocampus and neocortex. We find that the time course of the voltage in the neighborhood of a stimulated synapse is a reliable predictor of whether a stimulated synapse undergoes potentiation, depression, or no change. Our model can explain the existence of different -- at first glance seemingly paradoxical -- outcomes of synaptic potentiation and depression experiments depending on the dendritic location of the synapse and the frequency or timing of the stimulation.
Submitted 18 November, 2020; v1 submitted 10 January, 2020;
originally announced January 2020.
-
Working memory facilitates reward-modulated Hebbian learning in recurrent neural networks
Authors:
Roman Pogodin,
Dane Corneil,
Alexander Seeholzer,
Joseph Heng,
Wulfram Gerstner
Abstract:
Reservoir computing is a powerful tool to explain how the brain learns temporal sequences, such as movements, but existing learning schemes are either biologically implausible or too inefficient to explain animal performance. We show that a network can learn complicated sequences with a reward-modulated Hebbian learning rule if the network of reservoir neurons is combined with a second network that serves as a dynamic working memory and provides a spatio-temporal backbone signal to the reservoir. In combination with the working memory, reward-modulated Hebbian learning of the readout neurons performs as well as FORCE learning, but with the advantage of a biologically plausible interpretation of both the learning rule and the learning paradigm.
Submitted 23 October, 2019;
originally announced October 2019.
-
Learning in Volatile Environments with the Bayes Factor Surprise
Authors:
Vasiliki Liakoni,
Alireza Modirshanechi,
Wulfram Gerstner,
Johanni Brea
Abstract:
Surprise-based learning allows agents to rapidly adapt to non-stationary stochastic environments characterized by sudden changes. We show that exact Bayesian inference in a hierarchical model gives rise to a surprise-modulated trade-off between forgetting old observations and integrating them with the new ones. The modulation depends on a probability ratio, which we call "Bayes Factor Surprise", that tests the prior belief against the current belief. We demonstrate that in several existing approximate algorithms the Bayes Factor Surprise modulates the rate of adaptation to new observations. We derive three novel surprise-based algorithms, one in the family of particle filters, one in the family of variational learning, and the other in the family of message passing, that have constant scaling in observation sequence length and particularly simple update dynamics for any distribution in the exponential family. Empirical results show that these surprise-based algorithms estimate parameters better than alternative approximate approaches and reach levels of performance comparable to computationally more expensive algorithms. The Bayes Factor Surprise is related to but different from Shannon Surprise. In two hypothetical experiments, we make testable predictions for physiological indicators that dissociate the Bayes Factor Surprise from Shannon Surprise. The theoretical insight of casting various approaches as surprise-based learning, as well as the proposed online algorithms, may be applied to the analysis of animal and human behavior, and to reinforcement learning in non-stationary environments.
Submitted 23 September, 2020; v1 submitted 5 July, 2019;
originally announced July 2019.
-
Weight-space symmetry in deep networks gives rise to permutation saddles, connected by equal-loss valleys across the loss landscape
Authors:
Johanni Brea,
Berfin Simsek,
Bernd Illing,
Wulfram Gerstner
Abstract:
The permutation symmetry of neurons in each layer of a deep neural network gives rise not only to multiple equivalent global minima of the loss function, but also to first-order saddle points located on the path between the global minima. In a network of $d-1$ hidden layers with $n_k$ neurons in layers $k = 1, \ldots, d$, we construct smooth paths between equivalent global minima that lead through a `permutation point' where the input and output weight vectors of two neurons in the same hidden layer $k$ collide and interchange. We show that such permutation points are critical points with at least $n_{k+1}$ vanishing eigenvalues of the Hessian matrix of second derivatives indicating a local plateau of the loss function. We find that a permutation point for the exchange of neurons $i$ and $j$ transits into a flat valley (or generally, an extended plateau of $n_{k+1}$ flat dimensions) that enables all $n_k!$ permutations of neurons in a given layer $k$ at the same loss value. Moreover, we introduce high-order permutation points by exploiting the recursive structure in neural network functions, and find that the number of $K^{\text{th}}$-order permutation points is at least by a factor $\sum_{k=1}^{d-1}\frac{1}{2!^K}{n_k-K \choose K}$ larger than the (already huge) number of equivalent global minima. In two tasks, we illustrate numerically that some of the permutation points correspond to first-order saddles (`permutation saddles'): first, in a toy network with a single hidden layer on a function approximation task and, second, in a multilayer network on the MNIST task. Our geometric approach yields a lower bound on the number of critical points generated by weight-space symmetries and provides a simple intuitive link between previous mathematical results and numerical observations.
Submitted 5 July, 2019;
originally announced July 2019.
-
Biologically plausible deep learning -- but how far can we go with shallow networks?
Authors:
Bernd Illing,
Wulfram Gerstner,
Johanni Brea
Abstract:
Training deep neural networks with the error backpropagation algorithm is considered implausible from a biological perspective. Numerous recent publications suggest elaborate models for biologically plausible variants of deep learning, typically defining success as reaching around 98% test accuracy on the MNIST data set. Here, we investigate how far we can go on digit (MNIST) and object (CIFAR10) classification with biologically plausible, local learning rules in a network with one hidden layer and a single readout layer. The hidden layer weights are either fixed (random or random Gabor filters) or trained with unsupervised methods (PCA, ICA or Sparse Coding) that can be implemented by local learning rules. The readout layer is trained with a supervised, local learning rule. We first implement these models with rate neurons. This comparison reveals, first, that unsupervised learning does not lead to better performance than fixed random projections or Gabor filters for large hidden layers. Second, networks with localized receptive fields perform significantly better than networks with all-to-all connectivity and can reach backpropagation performance on MNIST. We then implement two of the networks - fixed, localized, random & random Gabor filters in the hidden layer - with spiking leaky integrate-and-fire neurons and spike timing dependent plasticity to train the readout layer. These spiking models achieve > 98.2% test accuracy on MNIST, which is close to the performance of rate networks with one hidden layer trained with backpropagation. The performance of our shallow network models is comparable to most current biologically plausible models of deep learning. Furthermore, our results with a shallow spiking network provide an important reference and suggest the use of datasets other than MNIST for testing the performance of future models of biologically plausible deep learning.
Submitted 17 June, 2019; v1 submitted 27 February, 2019;
originally announced May 2019.
-
Mesoscopic population equations for spiking neural networks with synaptic short-term plasticity
Authors:
Valentin Schmutz,
Wulfram Gerstner,
Tilo Schwalger
Abstract:
Coarse-graining microscopic models of biological neural networks to obtain mesoscopic models of neural activities is an essential step towards multi-scale models of the brain. Here, we extend a recent theory for mesoscopic population dynamics with static synapses to the case of dynamic synapses exhibiting short-term plasticity (STP). Under the assumption that spike arrivals at synapses have Poisson statistics, we derive analytically stochastic mean-field dynamics for the effective synaptic coupling between finite-size populations undergoing Tsodyks-Markram STP. The novel mean-field equations account for both finite number of synapses and correlations between the neurotransmitter release probability and the fraction of available synaptic resources. Comparisons with Monte Carlo simulations of the microscopic model show that in both feedforward and recurrent networks the mesoscopic mean-field model accurately reproduces stochastic realizations of the total synaptic input into a postsynaptic neuron and accounts for stochastic switches between Up and Down states as well as for population spikes. The extended mesoscopic population theory of spiking neural networks with STP may be useful for a systematic reduction of detailed biophysical models of cortical microcircuits to efficient and mathematically tractable mean-field models.
Submitted 21 December, 2018;
originally announced December 2018.
-
How single neuron properties shape chaotic dynamics and signal transmission in random neural networks
Authors:
Samuel P. Muscinelli,
Wulfram Gerstner,
Tilo Schwalger
Abstract:
While most models of randomly connected networks assume nodes with simple dynamics, nodes in realistic highly connected networks, such as neurons in the brain, exhibit intrinsic dynamics over multiple timescales. We analyze how the dynamical properties of nodes (such as single neurons) and recurrent connections interact to shape the effective dynamics in large randomly connected networks. A novel dynamical mean-field theory for strongly connected networks of multi-dimensional rate units shows that the power spectrum of the network activity in the chaotic phase emerges from a nonlinear sharpening of the frequency response function of single units. For the case of two-dimensional rate units with strong adaptation, we find that the network exhibits a state of "resonant chaos", characterized by robust, narrow-band stochastic oscillations. The coherence of stochastic oscillations is maximal at the onset of chaos and their correlation time scales with the adaptation timescale of single units. Surprisingly, the resonance frequency can be predicted from the properties of isolated units, even in the presence of heterogeneity in the adaptation parameters. In the presence of these internally-generated chaotic fluctuations, the transmission of weak, low-frequency signals is strongly enhanced by adaptation, whereas signal transmission is not influenced by adaptation in the non-chaotic regime. Our theoretical framework can be applied to other mechanisms at the level of single nodes, such as synaptic filtering, refractoriness or spike synchronization. These results advance our understanding of the interaction between the dynamics of single units and recurrent connectivity, which is a fundamental step toward the description of biologically realistic network models in the brain, or, more generally, networks of other physical or man-made complex dynamical units.
Submitted 28 February, 2019; v1 submitted 17 December, 2018;
originally announced December 2018.
-
Learning to Generate Music with BachProp
Authors:
Florian Colombo,
Johanni Brea,
Wulfram Gerstner
Abstract:
As deep learning advances, algorithms of music composition increase in performance. However, most of the successful models are designed for specific musical structures. Here, we present BachProp, an algorithmic composer that can generate music scores in many styles given sufficient training data. To adapt BachProp to a broad range of musical styles, we propose a novel representation of music and train a deep network to predict the note transition probabilities of a given music corpus. In this paper, new music scores generated by BachProp are compared with the original corpora as well as with different network architectures and other related models. We show that BachProp captures important features of the original datasets better than other models and invite the reader to a qualitative comparison on a large collection of generated songs.
Submitted 12 June, 2019; v1 submitted 17 December, 2018;
originally announced December 2018.
-
On the choice of metric in gradient-based theories of brain function
Authors:
Simone Carlo Surace,
Jean-Pascal Pfister,
Wulfram Gerstner,
Johanni Brea
Abstract:
The idea that the brain functions so as to minimize certain costs pervades theoretical neuroscience. Since a cost function by itself does not predict how the brain finds its minima, additional assumptions about the optimization method need to be made to predict the dynamics of physiological quantities. In this context, steepest descent (also called gradient descent) is often suggested as an algorithmic principle of optimization potentially implemented by the brain. In practice, researchers often consider the vector of partial derivatives as the gradient. However, the definition of the gradient and the notion of a steepest direction depend on the choice of a metric. Since the choice of the metric involves a large number of degrees of freedom, the predictive power of models that are based on gradient descent must be called into question, unless there are strong constraints on the choice of the metric. Here we provide a didactic review of the mathematics of gradient descent, illustrate common pitfalls of using gradient descent as a principle of brain function with examples from the literature and propose ways forward to constrain the metric.
Submitted 21 December, 2018; v1 submitted 30 May, 2018;
originally announced May 2018.
-
Optimal stimulation protocol in a bistable synaptic consolidation model
Authors:
Chiara Gastaldi,
Samuel P. Muscinelli,
Wulfram Gerstner
Abstract:
Consolidation of synaptic changes in response to neural activity is thought to be fundamental for memory maintenance over a timescale of hours. In experiments, synaptic consolidation can be induced by repeatedly stimulating presynaptic neurons. However, the effectiveness of such protocols depends crucially on the repetition frequency of the stimulations and the mechanisms that cause this complex dependence are unknown. Here we propose a simple mathematical model that allows us to systematically study the interaction between the stimulation protocol and synaptic consolidation. We show the existence of optimal stimulation protocols for our model and, similarly to LTP experiments, the repetition frequency of the stimulation plays a crucial role in achieving consolidation. Our results show that the complex dependence of LTP on the stimulation frequency emerges naturally from a model which satisfies only minimal bistability requirements.
Submitted 25 May, 2018;
originally announced May 2018.
-
BachProp: Learning to Compose Music in Multiple Styles
Authors:
Florian Colombo,
Wulfram Gerstner
Abstract:
Hand in hand with deep learning advancements, algorithms of music composition increase in performance. However, most of the successful models are designed for specific musical structures. Here, we present BachProp, an algorithmic composer that can generate music scores in any style given sufficient training data. To adapt BachProp to a broad range of musical styles, we propose a novel normalized representation of music and train a deep network to predict the note transition probabilities of a given music corpus. In this paper, new music scores sampled by BachProp are compared with the original corpora via crowdsourcing. This evaluation indicates that the music scores generated by BachProp are not less preferred than the original music corpus the algorithm was provided with.
Submitted 20 February, 2018; v1 submitted 14 February, 2018;
originally announced February 2018.
-
Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation
Authors:
Dane Corneil,
Wulfram Gerstner,
Johanni Brea
Abstract:
Modern reinforcement learning algorithms reach super-human performance on many board and video games, but they are sample inefficient, i.e. they typically require significantly more playing experience than humans to reach an equal performance level. To improve sample efficiency, an agent may build a model of the environment and use planning methods to update its policy. In this article we introduce Variational State Tabulation (VaST), which maps an environment with a high-dimensional state space (e.g. the space of visual inputs) to an abstract tabular model. Prioritized sweeping with small backups, a highly efficient planning method, can then be used to update state-action values. We show how VaST can rapidly learn to maximize reward in tasks like 3D navigation and efficiently adapt to sudden changes in rewards or transition probabilities.
Submitted 11 June, 2018; v1 submitted 12 February, 2018;
originally announced February 2018.
-
Eligibility Traces and Plasticity on Behavioral Time Scales: Experimental Support of neoHebbian Three-Factor Learning Rules
Authors:
Wulfram Gerstner,
Marco Lehmann,
Vasiliki Liakoni,
Dane Corneil,
Johanni Brea
Abstract:
Most elementary behaviors such as moving the arm to grasp an object or walking into the next room to explore a museum evolve on the time scale of seconds; in contrast, neuronal action potentials occur on the time scale of a few milliseconds. Learning rules of the brain must therefore bridge the gap between these two different time scales.
Modern theories of synaptic plasticity have postulated that the co-activation of pre- and postsynaptic neurons sets a flag at the synapse, called an eligibility trace, that leads to a weight change only if an additional factor is present while the flag is set. This third factor, signaling reward, punishment, surprise, or novelty, could be implemented by the phasic activity of neuromodulators or specific neuronal inputs signaling special events. While the theoretical framework has been developed over the last decades, experimental evidence in support of eligibility traces on the time scale of seconds has been collected only during the last few years.
Here we review, in the context of three-factor rules of synaptic plasticity, four key experiments that support the role of synaptic eligibility traces in combination with a third factor as a biological implementation of neoHebbian three-factor learning rules.
Submitted 16 January, 2018;
originally announced January 2018.
-
Non-linear motor control by local learning in spiking neural networks
Authors:
Aditya Gilra,
Wulfram Gerstner
Abstract:
Learning weights in a spiking neural network with hidden neurons, using local, stable and online rules, to control non-linear body dynamics is an open problem. Here, we employ a supervised scheme, Feedback-based Online Local Learning Of Weights (FOLLOW), to train a network of heterogeneous spiking neurons with hidden layers, to control a two-link arm so as to reproduce a desired state trajectory. The network first learns an inverse model of the non-linear dynamics, i.e. from state trajectory as input to the network, it learns to infer the continuous-time command that produced the trajectory. Connection weights are adjusted via a local plasticity rule that involves pre-synaptic firing and post-synaptic feedback of the error in the inferred command. We choose a network architecture, termed differential feedforward, that gives the lowest test error from different feedforward and recurrent architectures. The learned inverse model is then used to generate a continuous-time motor command to control the arm, given a desired trajectory.
Submitted 29 December, 2017;
originally announced December 2017.
-
Multi-timescale memory dynamics in a reinforcement learning network with attention-gated memory
Authors:
Marco Martinolli,
Wulfram Gerstner,
Aditya Gilra
Abstract:
Learning and memory are intertwined in our brain and their relationship is at the core of several recent neural network models. In particular, the Attention-Gated MEmory Tagging model (AuGMEnT) is a reinforcement learning network with an emphasis on biological plausibility of memory dynamics and learning. We find that the AuGMEnT network does not solve some hierarchical tasks, where higher-level stimuli have to be maintained over a long time, while lower-level stimuli need to be remembered and forgotten over a shorter timescale. To overcome this limitation, we introduce hybrid AuGMEnT, with leaky (short-timescale) and non-leaky (long-timescale) units in memory, which allow lower-level information to be exchanged while higher-level information is maintained, thus solving both hierarchical and distractor tasks.
Submitted 28 December, 2017;
originally announced December 2017.
-
Efficient low-dimensional approximation of continuous attractor networks
Authors:
Alexander Seeholzer,
Moritz Deger,
Wulfram Gerstner
Abstract:
Continuous "bump" attractors are an established model of cortical working memory for continuous variables and can be implemented using various neuron and network models. Here, we develop a generalizable approach for the approximation of bump states of continuous attractor networks implemented in networks of both rate-based and spiking neurons. The method relies on a low-dimensional parametrization of the spatial shape of firing rates, which allows efficient numerical optimization methods to be applied. Using our theory, we can establish a mapping between network structure and attractor properties that allows the prediction of the effects of network parameters on the steady state firing rate profile and the existence of bumps, and vice versa, to fine-tune a network to produce bumps of a given shape.
Submitted 21 November, 2017;
originally announced November 2017.
-
One-shot learning and behavioral eligibility traces in sequential decision making
Authors:
Marco Lehmann,
He Xu,
Vasiliki Liakoni,
Michael Herzog,
Wulfram Gerstner,
Kerstin Preuschoff
Abstract:
In many daily tasks we make multiple decisions before reaching a goal. In order to learn such sequences of decisions, a mechanism to link earlier actions to later reward is necessary. Reinforcement learning theory suggests two classes of algorithms solving this credit assignment problem: In classic temporal-difference learning, earlier actions receive reward information only after multiple repetitions of the task, whereas models with eligibility traces reinforce entire sequences of actions from a single experience (one-shot). Here we asked whether humans use eligibility traces. We developed a novel paradigm to directly observe which actions and states along a multi-step sequence are reinforced after a single reward. By focusing our analysis on those states for which RL with and without eligibility trace make qualitatively distinct predictions, we find direct behavioral (choice probability) and physiological (pupil dilation) signatures of reinforcement learning with eligibility trace across multiple sensory modalities.
Submitted 12 November, 2019; v1 submitted 13 July, 2017;
originally announced July 2017.
-
Predicting non-linear dynamics by stable local learning in a recurrent spiking neural network
Authors:
Aditya Gilra,
Wulfram Gerstner
Abstract:
Brains need to predict how the body reacts to motor commands. It is an open question how networks of spiking neurons can learn to reproduce the non-linear body dynamics caused by motor commands, using local, online and stable learning rules. Here, we present a supervised learning scheme for the feedforward and recurrent connections in a network of heterogeneous spiking neurons. The error in the output is fed back through fixed random connections with a negative gain, causing the network to follow the desired dynamics, while an online and local rule changes the weights. The rule for Feedback-based Online Local Learning Of Weights (FOLLOW) is local in the sense that weight changes depend on the presynaptic activity and the error signal projected onto the postsynaptic neuron. We provide examples of learning linear, non-linear and chaotic dynamics, as well as the dynamics of a two-link arm. Using the Lyapunov method, and under reasonable assumptions and approximations, we show that FOLLOW learning is stable uniformly, with the error going to zero asymptotically.
Submitted 26 April, 2017; v1 submitted 21 February, 2017;
originally announced February 2017.
-
Towards deep learning with spiking neurons in energy based models with contrastive Hebbian plasticity
Authors:
Thomas Mesnard,
Wulfram Gerstner,
Johanni Brea
Abstract:
In machine learning, error back-propagation in multi-layer neural networks (deep learning) has been impressively successful in supervised and reinforcement learning tasks. As a model for learning in the brain, however, deep learning has long been regarded as implausible, since it relies in its basic form on a non-local plasticity rule. To overcome this problem, energy-based models with local contrastive Hebbian learning were proposed and tested on a classification task with networks of rate neurons. We extended this work by implementing and testing such a model with networks of leaky integrate-and-fire neurons. Preliminary results indicate that it is possible to learn a non-linear regression task with hidden layers, spiking neurons and a local synaptic plasticity rule.
Submitted 9 December, 2016;
originally announced December 2016.
-
Towards a theory of cortical columns: From spiking neurons to interacting neural populations of finite size
Authors:
Tilo Schwalger,
Moritz Deger,
Wulfram Gerstner
Abstract:
Neural population equations such as neural mass or field models are widely used to study brain activity on a large scale. However, the relation of these models to the properties of single neurons is unclear. Here we derive an equation for several interacting populations at the mesoscopic scale starting from a microscopic model of randomly connected generalized integrate-and-fire neuron models. Each population consists of 50 -- 2000 neurons of the same type but different populations account for different neuron types. The stochastic population equations that we find reveal how spike-history effects in single-neuron dynamics such as refractoriness and adaptation interact with finite-size fluctuations on the population level. Efficient integration of the stochastic mesoscopic equations reproduces the statistical behavior of the population activities obtained from microscopic simulations of a full spiking neural network model. The theory describes nonlinear emergent dynamics like finite-size-induced stochastic transitions in multistable networks and synchronization in balanced networks of excitatory and inhibitory neurons. The mesoscopic equations are employed to rapidly simulate a model of a local cortical microcircuit consisting of eight neuron types. Our theory establishes a general framework for modeling finite-size neural population dynamics based on single cell and synapse parameters and offers an efficient approach to analyzing cortical circuits and computations.
Submitted 21 April, 2017; v1 submitted 1 November, 2016;
originally announced November 2016.
-
Multi-contact synapses for stable networks: a spike-timing dependent model of dendritic spine plasticity and turnover
Authors:
Moritz Deger,
Alexander Seeholzer,
Wulfram Gerstner
Abstract:
Excitatory synaptic connections in the adult neocortex consist of multiple synaptic contacts, almost exclusively formed on dendritic spines. Changes of dendritic spine shape and volume, a correlate of synaptic strength, can be tracked in vivo for weeks. Here, we present a combined model of spike-timing dependent dendritic spine plasticity and turnover that explains the steady state multi-contact configuration of synapses in adult neocortical networks. In this model, many presynaptic neurons compete to make strong synaptic connections onto postsynaptic neurons, while the synaptic contacts comprising each connection cooperate via postsynaptic firing. We demonstrate that the model is consistent with experimentally observed long-term dendritic spine dynamics under steady-state and lesion induced conditions, and show that cooperation of multiple synaptic contacts is crucial for stable, long-term synaptic memories. In simulations of a simplified network of barrel cortex, our plasticity rule reproduces whisker-trimming induced rewiring of thalamo-cortical and recurrent synaptic connectivity on realistic time scales.
Submitted 19 September, 2016;
originally announced September 2016.
-
Algorithmic Composition of Melodies with Deep Recurrent Neural Networks
Authors:
Florian Colombo,
Samuel P. Muscinelli,
Alexander Seeholzer,
Johanni Brea,
Wulfram Gerstner
Abstract:
A big challenge in algorithmic composition is to devise a model that is both easily trainable and able to reproduce the long-range temporal dependencies typical of music. Here we investigate how artificial neural networks can be trained on a large corpus of melodies and turned into automated music composers able to generate new melodies coherent with the style they have been trained on. We employ gated recurrent unit networks that have been shown to be particularly efficient in learning complex sequential activations with arbitrarily long time lags. Our model processes rhythm and melody in parallel while modeling the relation between these two features. Using such an approach, we were able to generate interesting complete melodies or suggest possible continuations of a melody fragment that are coherent with the characteristics of the fragment itself.
Submitted 23 June, 2016;
originally announced June 2016.
-
Balancing New Against Old Information: The Role of Surprise in Learning
Authors:
Mohammadjavad Faraji,
Kerstin Preuschoff,
Wulfram Gerstner
Abstract:
Surprise describes a range of phenomena from unexpected events to behavioral responses. We propose a measure of surprise and use it for surprise-driven learning. Our surprise measure takes into account data likelihood as well as the degree of commitment to a belief via the entropy of the belief distribution. We find that surprise-minimizing learning dynamically adjusts the balance between new and old information without requiring knowledge about the temporal statistics of the environment. We apply our framework to a dynamic decision-making task and a maze exploration task. Our surprise-minimizing framework is suitable for learning in complex environments, even if the environment undergoes gradual or sudden changes, and could eventually provide a framework to study the behavior of humans and animals encountering surprising events.
Submitted 1 March, 2017; v1 submitted 17 June, 2016;
originally announced June 2016.
-
Automated point-neuron simplification of data-driven microcircuit models
Authors:
Christian Rössert,
Christian Pozzorini,
Giuseppe Chindemi,
Andrew P. Davison,
Csaba Eroe,
James King,
Taylor H. Newton,
Max Nolte,
Srikanth Ramaswamy,
Michael W. Reimann,
Willem Wybo,
Marc-Oliver Gewaltig,
Wulfram Gerstner,
Henry Markram,
Idan Segev,
Eilif Muller
Abstract:
A method is presented for the reduction of morphologically detailed microcircuit models to a point-neuron representation without human intervention. The simplification occurs in a modular workflow, in the neighborhood of a user specified network activity state for the reference model, the "operating point". First, synapses are moved to the soma, correcting for dendritic filtering by low-pass filtering the delivered synaptic current. Filter parameters are computed numerically and independently for inhibitory and excitatory input using a Green's function approach. Next, point-neuron models for each neuron in the microcircuit are fit to their respective morphologically detailed counterparts. Here, generalized integrate-and-fire point neuron models are used, leveraging a recently published fitting toolbox. The fits are constrained by currents and voltages computed in the morphologically detailed partner neurons with soma corrected synapses at three depolarizations about the user specified operating point. The result is a simplified circuit which is well constrained by the reference circuit, and can be continuously updated as the latter iteratively integrates new data. The modularity of the approach makes it applicable also for other point-neuron and synapse models. The approach is demonstrated on a recently reported reconstruction of a neocortical microcircuit around an in vivo-like working point. The resulting simplified network model is benchmarked to the reference morphologically detailed microcircuit model for a range of simulated network protocols. The simplified network is found to be slightly more sub-critical than the reference, with otherwise good agreement for both quantitative and qualitative validations.
Submitted 30 March, 2017; v1 submitted 31 March, 2016;
originally announced April 2016.
-
Nonlinear Hebbian learning as a unifying principle in receptive field formation
Authors:
Carlos S. N. Brito,
Wulfram Gerstner
Abstract:
The development of sensory receptive fields has been modeled in the past by a variety of models including normative models such as sparse coding or independent component analysis and bottom-up models such as spike-timing dependent plasticity or the Bienenstock-Cooper-Munro model of synaptic plasticity. Here we show that the above variety of approaches can all be unified into a single common principle, namely Nonlinear Hebbian Learning. When Nonlinear Hebbian Learning is applied to natural images, receptive field shapes are strongly constrained by the input statistics and preprocessing, but exhibit only modest variation across different choices of nonlinearities in neuron models or synaptic plasticity rules. Neither overcompleteness nor sparse network activity is necessary for the development of localized receptive fields. The analysis of alternative sensory modalities such as auditory models or V2 development leads to the same conclusions. In all examples, receptive fields can be predicted a priori by reformulating an abstract model as nonlinear Hebbian learning. Thus nonlinear Hebbian learning and natural statistics can account for many aspects of receptive field formation across models and sensory modalities.
Submitted 4 January, 2016;
originally announced January 2016.
-
Fluctuations and information filtering in coupled populations of spiking neurons with adaptation
Authors:
Moritz Deger,
Tilo Schwalger,
Richard Naud,
Wulfram Gerstner
Abstract:
Finite-sized populations of spiking elements are fundamental to brain function, but also used in many areas of physics. Here we present a theory of the dynamics of finite-sized populations of spiking units, based on a quasi-renewal description of neurons with adaptation. We derive an integral equation with colored noise that governs the stochastic dynamics of the population activity in response to time-dependent stimulation and calculate the spectral density in the asynchronous state. We show that systems of coupled populations with adaptation can generate a frequency band in which sensory information is preferentially encoded. The theory is applicable to fully as well as randomly connected networks, and to leaky integrate-and-fire as well as to generalized spiking neurons with adaptation on multiple time scales.
Submitted 3 March, 2015; v1 submitted 17 November, 2013;
originally announced November 2013.
-
Spike timing prediction with active dendrites
Authors:
Richard Naud,
Brice Bathellier,
Wulfram Gerstner
Abstract:
A complete single-neuron model must correctly reproduce the firing of spikes and bursts. We present a study of a simplified model of deep pyramidal cells of the cortex with active dendrites. We hypothesized that we can model the soma and its apical tuft with only two compartments, without significant loss in the accuracy of spike-timing predictions. The model is based on experimentally measurable impulse-response functions, which transfer the effect of current injected in one compartment to current reaching the other. Each compartment was modeled with a pair of non-linear differential equations and a small number of parameters that approximate the Hodgkin-and-Huxley equations. The predictive power of this model was tested on electrophysiological experiments where noisy current was injected in both the soma and the apical dendrite simultaneously. We conclude that a simple two-compartment model can predict spike times of pyramidal cells stimulated in the soma and dendrites simultaneously. Our results support the view that regenerating activity in the dendritic tuft is required to properly account for the dynamics of layer 5 pyramidal cells under in-vivo-like conditions.
Submitted 2 December, 2013; v1 submitted 14 November, 2013;
originally announced November 2013.
-
Reward-based learning under hardware constraints - Using a RISC processor embedded in a neuromorphic substrate
Authors:
Simon Friedmann,
Nicolas Frémaux,
Johannes Schemmel,
Wulfram Gerstner,
Karlheinz Meier
Abstract:
In this study, we propose and analyze in simulations a new, highly flexible method of implementing synaptic plasticity in a wafer-scale, accelerated neuromorphic hardware system. The study focuses on globally modulated STDP, as a special use-case of this method. Flexibility is achieved by embedding a general-purpose processor dedicated to plasticity into the wafer. To evaluate the suitability of the proposed system, we use a reward-modulated STDP rule in a spike train learning task. A single layer of neurons is trained to fire at specific points in time with only the reward as feedback. This model is simulated to measure its performance, i.e. the increase in received reward after learning. Using this performance as a baseline, we then simulate the model with various constraints imposed by the proposed implementation and compare the performance. The simulated constraints include discretized synaptic weights, a restricted interface between analog synapses and embedded processor, and mismatch of analog circuits. We find that probabilistic updates can increase the performance of low-resolution weights, a simple interface between analog synapses and processor is sufficient for learning, and performance is insensitive to mismatch. Further, we consider communication latency between the wafer and the conventional control computer system that is simulating the environment. This latency increases the delay with which the reward is sent to the embedded processor. Because of the time-continuous operation of the analog synapses, delay can cause a deviation of the updates as compared to the non-delayed situation. We find that for highly accelerated systems latency has to be kept to a minimum. This study demonstrates the suitability of the proposed implementation to emulate the selected reward-modulated STDP learning rule.
Submitted 20 August, 2013; v1 submitted 26 March, 2013;
originally announced March 2013.
-
Nonnormal amplification in random balanced neuronal networks
Authors:
Guillaume Hennequin,
Tim P. Vogels,
Wulfram Gerstner
Abstract:
In dynamical models of cortical networks, the recurrent connectivity can amplify the input given to the network in two distinct ways. One is induced by the presence of near-critical eigenvalues in the connectivity matrix W, producing large but slow activity fluctuations along the corresponding eigenvectors (dynamical slowing). The other relies on W being nonnormal, which allows the network activity to make large but fast excursions along specific directions. Here we investigate the tradeoff between nonnormal amplification and dynamical slowing in the spontaneous activity of large random neuronal networks composed of excitatory and inhibitory neurons. We use a Schur decomposition of W to separate the two amplification mechanisms. Assuming linear stochastic dynamics, we derive an exact expression for the expected amount of purely nonnormal amplification. We find that amplification is very limited if dynamical slowing must be kept weak. We conclude that, to achieve strong transient amplification with little slowing, the connectivity must be structured. We show that unidirectional connections between neurons of the same type, together with reciprocal connections between neurons of different types, allow for amplification already in the fast dynamical regime. Finally, our results also shed light on the differences between balanced networks in which inhibition exactly cancels excitation, and those where inhibition dominates.
Submitted 13 April, 2012;
originally announced April 2012.
-
Rescaling, thinning or complementing? On goodness-of-fit procedures for point process models and Generalized Linear Models
Authors:
Felipe Gerhard,
Wulfram Gerstner
Abstract:
Generalized Linear Models (GLMs) are an increasingly popular framework for modeling neural spike trains. They have been linked to the theory of stochastic point processes and researchers have used this relation to assess goodness-of-fit using methods from point-process theory, e.g. the time-rescaling theorem. However, high neural firing rates or coarse discretization lead to a breakdown of the assumptions necessary for this connection. Here, we show how goodness-of-fit tests from point-process theory can still be applied to GLMs by constructing equivalent surrogate point processes out of time-series observations. Furthermore, two additional tests based on thinning and complementing point processes are introduced. They augment the instruments available for checking model adequacy of point processes as well as discretized models.
Submitted 18 November, 2010;
originally announced November 2010.
-
Noise-enhanced computation in a model of a cortical column
Authors:
Julien Mayor,
Wulfram Gerstner
Abstract:
Varied sensory systems use noise in order to enhance detection of weak signals. It has been conjectured in the literature that this effect, known as stochastic resonance, may take place in central cognitive processes such as the memory retrieval of arithmetical multiplication. We show, in a simplified model of cortical tissue, that complex arithmetical calculations can be carried out and are enhanced in the presence of a stochastic background. The performance is shown to be positively correlated with the susceptibility of the network, defined as its sensitivity to a variation of the mean of its inputs. For nontrivial arithmetic tasks such as multiplication, stochastic resonance is an emergent property of the microcircuitry of the model network.
Submitted 4 May, 2005;
originally announced May 2005.
-
Optimal Spike-Timing Dependent Plasticity for Precise Action Potential Firing
Authors:
Jean-Pascal Pfister,
Taro Toyoizumi,
David Barber,
Wulfram Gerstner
Abstract:
In timing-based neural codes, neurons have to emit action potentials at precise moments in time. We use a supervised learning paradigm to derive a synaptic update rule that optimizes via gradient ascent the likelihood of postsynaptic firing at one or several desired firing times. We find that the optimal strategy of up- and downregulating synaptic efficacies can be described by a two-phase learning window similar to that of Spike-Timing Dependent Plasticity (STDP). If the presynaptic spike arrives before the desired postsynaptic spike timing, our optimal learning rule predicts that the synapse should become potentiated. The dependence of the potentiation on spike timing directly reflects the time course of an excitatory postsynaptic potential. The presence and amplitude of depression of synaptic efficacies for reversed spike timing depend on how constraints are implemented in the optimization problem. Two different constraints, i.e., control of postsynaptic rates or control of temporal locality, are discussed.
Submitted 24 February, 2005;
originally announced February 2005.
-
Signal buffering in random networks of spiking neurons: microscopic vs. macroscopic phenomena
Authors:
Julien Mayor,
Wulfram Gerstner
Abstract:
In randomly connected networks of pulse-coupled elements, a time-dependent input signal can be buffered over a short time. We studied the signal buffering properties in simulated networks as a function of the network state, characterized by both the Lyapunov exponent of the microscopic dynamics and the macroscopic activity derived from mean-field theory. If all network elements receive the same signal, signal buffering over delays comparable to the intrinsic time constant of the network elements can be explained by macroscopic properties and works best at the phase transition to chaos. However, if only 20 percent of the network units receive a common time-dependent signal, signal buffering properties improve and can no longer be attributed to the macroscopic dynamics.
Submitted 23 February, 2005;
originally announced February 2005.
-
Transient Information Flow in a Network of Excitatory and Inhibitory Model Neurons: Role of Noise and Signal Autocorrelation
Authors:
Julien Mayor,
Wulfram Gerstner
Abstract:
We investigate the performance of sparsely-connected networks of integrate-and-fire neurons for ultra-short term information processing. We exploit the fact that the population activity of networks with balanced excitation and inhibition can switch from an oscillatory firing regime to a state of asynchronous irregular firing or quiescence depending on the rate of external background spikes.
We find that in terms of information buffering the network performs best for a moderate, non-zero amount of noise. Analogous to the phenomenon of stochastic resonance, the performance decreases for higher and lower noise levels. The optimal amount of noise corresponds to the transition zone between a quiescent state and a regime of stochastic dynamics. This provides a potential explanation of the role of non-oscillatory population activity in a simplified model of cortical micro-circuits.
Submitted 23 February, 2005;
originally announced February 2005.