-
Competitive plasticity to reduce the energetic costs of learning
Authors:
Mark CW van Rossum
Abstract:
The brain is not only constrained by energy needed to fuel computation, but it is also constrained by energy needed to form memories. Experiments have shown that learning simple conditioning tasks already carries a significant metabolic cost. Yet, learning a task like MNIST to 95% accuracy appears to require at least 10^{8} synaptic updates. Therefore the brain has likely evolved to be able to learn using as little energy as possible. We explored the energy required for learning in feedforward neural networks. Based on a parsimonious energy model, we propose two plasticity-restricting algorithms that save energy: 1) only modify synapses with large updates, and 2) restrict plasticity to subsets of synapses that form a path through the network. Combining these two methods leads to substantial energy savings while only incurring a small increase in learning time. Biological networks are often much larger than the task requires, and in that case particularly large savings can be achieved. Thus competitively restricting plasticity helps to save metabolic energy associated with synaptic plasticity. The results might lead to a better understanding of biological plasticity and a better match between artificial and biological learning. Moreover, the algorithms might also benefit hardware because in electronics memory storage is energetically costly as well.
Submitted 4 April, 2023;
originally announced April 2023.
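The first restriction named in the abstract above, modifying only synapses with large updates, can be illustrated with a minimal sketch. The function name `apply_large_updates`, the `keep_fraction` parameter, and the top-k selection rule are assumptions made for illustration; the abstract does not specify how "large" is decided or how the energy model is defined.

```python
def apply_large_updates(weights, updates, keep_fraction=0.1):
    """Apply only the largest-magnitude proposed updates (hypothetical rule).

    weights, updates: lists of floats of equal length.
    keep_fraction: assumed fraction of synapses allowed to change per step.
    Returns the new weights and the sorted indices that were modified.
    """
    n_keep = max(1, int(len(updates) * keep_fraction))
    # Rank synapses by the absolute size of their proposed update.
    ranked = sorted(range(len(updates)), key=lambda i: abs(updates[i]), reverse=True)
    chosen = set(ranked[:n_keep])
    # Synapses outside the chosen set are left untouched.
    new_weights = [
        w + (u if i in chosen else 0.0)
        for i, (w, u) in enumerate(zip(weights, updates))
    ]
    return new_weights, sorted(chosen)


new_w, changed = apply_large_updates(
    [0.0, 0.0, 0.0, 0.0], [0.1, -0.9, 0.05, 0.4], keep_fraction=0.5
)
print(new_w, changed)
```

In this toy call only the two synapses with the largest proposed changes are modified; the small updates are discarded rather than accumulated, which is one of several possible design choices.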
-
Lazy learning: a biologically-inspired plasticity rule for fast and energy efficient synaptic plasticity
Authors:
Aaron Pache,
Mark CW van Rossum
Abstract:
When training neural networks for classification tasks with backpropagation, parameters are updated on every trial, even if the sample is classified correctly. In contrast, humans concentrate their learning effort on errors. Inspired by human learning, we introduce lazy learning, which only learns on incorrect samples. Lazy learning can be implemented in a few lines of code and requires no hyperparameter tuning. Lazy learning achieves state-of-the-art performance and is particularly well suited to large datasets. For instance, it reaches 99.2% test accuracy on Extended MNIST using a single-layer MLP, and does so 7.6x faster than a matched backprop network.
Submitted 26 March, 2023;
originally announced March 2023.
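The idea stated in the abstract above, learning only on incorrectly classified samples, can be sketched as follows. This is a toy illustration and not the authors' implementation: the linear softmax classifier, the zero initialisation, the learning rate, and the two-sample dataset are all assumptions.

```python
import math


def scores(W, x):
    """Linear class scores for input x under weight matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def predict(W, x):
    """Index of the highest-scoring class (first index wins ties)."""
    z = scores(W, x)
    return max(range(len(z)), key=lambda k: z[k])


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train_lazy(samples, n_in, n_out, lr=0.1, epochs=20):
    """Cross-entropy gradient steps, skipped whenever the sample is already correct."""
    W = [[0.0] * n_in for _ in range(n_out)]
    n_updates = 0
    for _ in range(epochs):
        for x, label in samples:
            if predict(W, x) == label:
                continue  # lazy step: correct samples trigger no weight change
            p = softmax(scores(W, x))
            for k in range(n_out):
                err = p[k] - (1.0 if k == label else 0.0)
                for i in range(n_in):
                    W[k][i] -= lr * err * x[i]
            n_updates += 1
    return W, n_updates


# Toy, linearly separable data; the last input component acts as a bias term.
samples = [([1.0, 0.0, 1.0], 0), ([0.0, 1.0, 1.0], 1)]
W, n_updates = train_lazy(samples, n_in=3, n_out=2)
print(n_updates, [predict(W, x) for x, _ in samples])
```

Counting `n_updates` makes the saving visible: once every sample is classified correctly, further passes over the data change no weights.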
-
Estimating the energy requirements for long term memory formation
Authors:
Maxime Girard,
Jiamu Jiang,
Mark CW van Rossum
Abstract:
Brains consume metabolic energy to process information, but also to store memories. The energy required for memory formation can be substantial; for instance, in fruit flies memory formation leads to a shorter lifespan upon subsequent starvation (Mery and Kawecki, 2005). Here we estimate that the energy required corresponds to about 10 mJ/bit and compare this to biophysical estimates as well as energy requirements in computer hardware. We conclude that biological memory storage is expensive, but the reason behind it is not known.
Submitted 8 February, 2023; v1 submitted 16 January, 2023;
originally announced January 2023.
-
Program synthesis performance constrained by non-linear spatial relations in Synthetic Visual Reasoning Test
Authors:
Lu Yihe,
Scott C. Lowe,
Penelope A. Lewis,
Mark C. W. van Rossum
Abstract:
Despite remarkable advances in automated visual recognition by machines, some visual tasks remain challenging for machines. Fleuret et al. (2011) introduced the Synthetic Visual Reasoning Test (SVRT) to highlight this point, which required classification of images consisting of randomly generated shapes based on hidden abstract rules using only a few examples. Ellis et al. (2015) demonstrated that a program synthesis approach could solve some of the SVRT problems with unsupervised, few-shot learning, whereas they remained challenging for several convolutional neural networks trained with thousands of examples. Here we re-considered the human and machine experiments, because they followed different protocols and yielded different statistics. We thus proposed a quantitative reinterpretation of the data between the protocols, so that we could make a fair comparison between human and machine performance. We improved the program synthesis classifier by correcting the image parsings, and compared the results to the performance of other machine agents and human subjects. We grouped the SVRT problems into different types by the two aspects of the core characteristics for classification: shape specification and location relation. We found that the program synthesis classifier could not solve problems involving shape distances, because it relied on symbolic computation which scales poorly with input dimension and adding distances into such computation would increase the dimension combinatorially with the number of shapes in an image. Therefore, although the program synthesis classifier is capable of abstract reasoning, its performance is highly constrained by the accessible information in image parsings.
Submitted 19 November, 2019; v1 submitted 18 November, 2019;
originally announced November 2019.
-
The effect of neural adaptation of population coding accuracy
Authors:
J. M. Cortes,
D. Marinazzo,
P. Series,
M. W. Oram,
T. J. Sejnowski,
M. C. W. van Rossum
Abstract:
Most neurons in the primary visual cortex initially respond vigorously when a preferred stimulus is presented, but adapt as stimulation continues. The functional consequences of adaptation are unclear. Typically a reduction of firing rate would reduce single neuron accuracy as fewer spikes are available for decoding, but it has been suggested that on the population level, adaptation increases coding accuracy. This question requires careful analysis as adaptation not only changes the firing rates of neurons, but also the neural variability and correlations between neurons, which affect coding accuracy as well. We calculate the coding accuracy using a computational model that implements two forms of adaptation: spike frequency adaptation and synaptic adaptation in the form of short-term synaptic plasticity. We find that the net effect of adaptation is subtle and heterogeneous. Depending on adaptation mechanism and test stimulus, adaptation can either increase or decrease coding accuracy. We discuss the neurophysiological and psychophysical implications of the findings and relate them to published experimental data.
Submitted 14 March, 2011;
originally announced March 2011.
-
Shannon Information Capacity of Discrete Synapses
Authors:
Adam B. Barrett,
M. C. W. van Rossum
Abstract:
There is evidence that biological synapses have only a fixed number of discrete weight states. Memory storage with such synapses behaves quite differently from synapses with unbounded, continuous weights as old memories are automatically overwritten by new memories. We calculate the storage capacity of discrete, bounded synapses in terms of Shannon information. For optimal learning rules, we investigate how information storage depends on the number of synapses, the number of synaptic states and the coding sparseness.
Submitted 13 March, 2008;
originally announced March 2008.
-
Dynamics and robustness of familiarity memory
Authors:
J. M. Cortes,
A. Greve,
A. B. Barrett,
M. C. W. van Rossum
Abstract:
When one is presented with an item or a face, one can sometimes have a sense of recognition without being able to recall where or when one has encountered it before. This sense of recognition is known as familiarity. Following previous computational models of familiarity memory we investigate the dynamical properties of familiarity discrimination, and contrast two different familiarity discriminators: one based on the energy of the neural network, and the other based on the time derivative of the energy. We show how the familiarity signal decays after a stimulus is presented, and examine the robustness of the familiarity discriminator in the presence of random fluctuations in neural activity. For both discriminators we establish, via a combined method of signal-to-noise ratio and mean field analysis, how the maximum number of successfully discriminated stimuli depends on the noise level.
Submitted 6 October, 2007;
originally announced October 2007.
-
Multiple scattering of classical waves: from microscopy to mesoscopy and diffusion
Authors:
M. C. W. van Rossum,
Th. M. Nieuwenhuizen
Abstract:
A tutorial discussion of the propagation of waves in random media is presented. In first approximation the transport of the multiple scattered waves is given by diffusion theory, but important corrections are present. These corrections are calculated with the radiative transfer or Schwarzschild-Milne equation, which describes intensity transport at the "mesoscopic" level and is derived from the "microscopic" wave equation. A precise treatment of the diffuse intensity is derived which automatically includes the effects of boundary layers. Effects such as the enhanced backscatter cone and imaging of objects in opaque media are also discussed within this framework. In the second part the approach is extended to mesoscopic correlations between multiple scattered intensities which arise when scattering is strong. These correlations arise from the underlying wave character. The derivation of correlation functions and intensity distribution functions is given and experimental data are discussed. Although the focus is on light scattering, the theory is also applicable to microwaves, sound waves and non-interacting electrons.
Submitted 14 April, 1998;
originally announced April 1998.
-
Deviations from the Gaussian distribution of mesoscopic conductance fluctuations
Authors:
M. C. W. van Rossum,
Igor V. Lerner,
Boris L. Altshuler,
Th. M. Nieuwenhuizen
Abstract:
The conductance distribution of metallic mesoscopic systems is considered. The variance of this distribution describes the universal conductance fluctuations, yielding a Gaussian distribution of the conductance. We calculate diagrammatically the third cumulant of this distribution, the leading deviation from the Gaussian. We confirm random matrix theory calculations that the leading contribution in quasi-one dimension vanishes. However, in quasi-two dimensions the third cumulant is negative, whereas in three dimensions it is positive.
Submitted 28 January, 1997;
originally announced January 1997.
-
Mesoscopic phenomena in multiple light scattering
Authors:
M. C. W. van Rossum
Abstract:
In my thesis I study mesoscopic corrections on diffuse transport. I first describe the diffuse transport of light, using the scalar approximation and the radiative transfer approach. Next, I focus on the correlations in transmission; I discuss the so-called C_1, C_2, C_3 decomposition and calculate each term in detail. Finally, I discuss the full distribution functions in the transmission.
Many references and figures are included. Note, however, that much of the work was already published or is present on the cond-mat archive.
A limited number is available as hardcopy on request ([email protected]) else 132 pages Postscript.
Submitted 25 April, 1995;
originally announced April 1995.
-
Third Cumulant of the total Transmission of diffuse Waves
Authors:
M. C. W. van Rossum,
Johannes F. de Boer,
Th. M. Nieuwenhuizen
Abstract:
The probability distribution of the total transmission is studied for waves multiple scattered from a random, static configuration of scatterers. A theoretical study of the second and third cumulant of this distribution is presented. Within a diagrammatic approach a theory is developed which relates the third cumulant normalized to the average, $\langle \langle T_a^3 \rangle \rangle$, to the normalized second cumulant $\langle \langle T_a^2 \rangle \rangle$. For a broad Gaussian beam profile it is found that $\langle \langle T_a^3 \rangle \rangle= \frac{16}{5} \langle \langle T_a^2 \rangle \rangle^2 $. This is in good agreement with data of optical experiments.
Submitted 30 December, 1994;
originally announced December 1994.
-
"Optical conductance fluctuations: diagrammatic analysis in Landauer approach and non-universal effects"
Authors:
M. C. W. van Rossum,
Th. M. Nieuwenhuizen,
R. Vlaming
Abstract:
The optical conductance of a multiple scattering medium is the total transmitted light of a diffuse incoming beam. This quantity, very analogous to the electronic conductance, exhibits universal conductance fluctuations. We perform a detailed diagrammatic analysis of these fluctuations. With a Kadanoff-Baym technique all the leading diagrams are systematically generated. A cancellation of the short distance divergences occurs, which yields a well-behaved theory. The analytical form of the fluctuations is calculated and applied to optical systems. Absorption and internal reflections reduce the fluctuations significantly.
Submitted 8 December, 1994;
originally announced December 1994.
-
Intensity Distribution of Waves Transmitted Through a Multiple Scattering Medium
Authors:
Th. M. Nieuwenhuizen,
M. C. W. van Rossum
Abstract:
The distributions of the angular transmission coefficient and of the total transmission are calculated for multiple scattered waves. The calculation is based on a mapping to the distribution of eigenvalues of the transmission matrix. The distributions depend on the profile of the incoming beam. The distribution function of the angular transmission has a stretched exponential decay. The total-transmission distribution grows log-normally whereas it decays exponentially.
Submitted 13 May, 1994;
originally announced May 1994.