Flat Channels to Infinity in Neural Loss Landscapes

Authors: Flavio Martinelli, Alexander Van Meegen, Berfin Şimşek, Wulfram Gerstner, Johanni Brea

Abstract: The loss landscapes of neural networks contain minima and saddle points that may be connected in flat regions or appear in isolation. We identify and characterize a special structure in the loss landscape: channels along which the loss decreases extremely slowly, while the output weights of at least two neurons, $a_i$ and $a_j$, diverge to $\pm\infty$, and their input weight vectors, $\mathbf{w_i}$ and $\mathbf{w_j}$, become equal to each other. At convergence, the two neurons implement a gated linear unit: $a_i\sigma(\mathbf{w_i} \cdot \mathbf{x}) + a_j\sigma(\mathbf{w_j} \cdot \mathbf{x}) \rightarrow \sigma(\mathbf{w} \cdot \mathbf{x}) + (\mathbf{v} \cdot \mathbf{x}) \sigma'(\mathbf{w} \cdot \mathbf{x})$. Geometrically, these channels to infinity are asymptotically parallel to symmetry-induced lines of critical points. Gradient flow solvers, and related optimization methods like SGD or ADAM, reach the channels with high probability in diverse regression settings, but without careful inspection they look like flat local minima with finite parameter values. Our characterization provides a comprehensive picture of these quasi-flat regions in terms of gradient dynamics, geometry, and functional interpretation. The emergence of gated linear units at the end of the channels highlights a surprising aspect of the computational capabilities of fully connected layers.

Submitted 17 June, 2025; originally announced June 2025.
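
Illustration (not part of the listing): the gated-linear-unit limit quoted in the abstract above can be checked numerically. The sketch below is only one hypothetical path into such a channel, chosen for illustration and not taken from the paper: it assumes $\sigma = \tanh$, input weights $\mathbf{w} \pm \epsilon\mathbf{v}$, and output weights $a_{i,j} = \tfrac{1}{2} \pm \tfrac{1}{2\epsilon}$, so that the output weights diverge with opposite signs while the input weights merge as $\epsilon \to 0$. The function names and the parameter `eps` are assumptions of this sketch.

```python
import numpy as np

def two_neuron_output(w, v, x, eps, sigma=np.tanh):
    """a_i*sigma(w_i.x) + a_j*sigma(w_j.x) along an assumed path indexed by eps."""
    a_i = 0.5 + 0.5 / eps   # diverges to +infinity as eps -> 0
    a_j = 0.5 - 0.5 / eps   # diverges to -infinity as eps -> 0
    w_i = w + eps * v       # both input weight vectors approach w
    w_j = w - eps * v
    return a_i * sigma(x @ w_i) + a_j * sigma(x @ w_j)

def gated_linear_unit(w, v, x):
    """Limit function sigma(w.x) + (v.x) * sigma'(w.x), with sigma = tanh."""
    z = x @ w
    return np.tanh(z) + (x @ v) * (1.0 - np.tanh(z) ** 2)

rng = np.random.default_rng(0)
w, v = rng.normal(size=3), rng.normal(size=3)
x = rng.normal(size=(5, 3))  # five random test inputs

# Maximum deviation from the gated linear unit for decreasing eps.
errors = {
    eps: float(np.max(np.abs(two_neuron_output(w, v, x, eps)
                             - gated_linear_unit(w, v, x))))
    for eps in (1e-1, 1e-2, 1e-3)
}
for eps, err in errors.items():
    print(f"eps={eps:g}  max deviation={err:.2e}")
```

Under these assumptions a Taylor expansion in `eps` leaves a remainder of order `eps**2`, so the printed deviation should shrink as `eps` decreases. For much smaller `eps` the difference of two nearly equal terms becomes sensitive to floating-point cancellation.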

arXiv:2409.01969

Connectivity structure and dynamics of nonlinear recurrent neural networks

Authors: David G. Clark, Owen Marschall, Alexander van Meegen, Ashok Litwin-Kumar

Abstract: We develop a theory to analyze how structure in connectivity shapes the high-dimensional, internally generated activity of nonlinear recurrent neural networks. Using two complementary methods -- a path-integral calculation of fluctuations around the saddle point, and a recently introduced two-site cavity approach -- we derive analytic expressions that characterize important features of collective activity, including its dimensionality and temporal correlations. To model structure in the coupling matrices of real neural circuits, such as synaptic connectomes obtained through electron microscopy, we introduce the random-mode model, which parameterizes a coupling matrix using random input and output modes and a specified spectrum. This model enables systematic study of the effects of low-dimensional structure in connectivity on neural activity. These effects manifest in features of collective activity, which we calculate, and can be undetectable when analyzing only single-neuron activities. We derive a relation between the effective rank of the coupling matrix and the dimension of activity. By extending the random-mode model, we compare the effects of single-neuron heterogeneity and low-dimensional connectivity. We also investigate the impact of structured overlaps between input and output modes, a feature of biological coupling matrices. Our theory provides tools to relate neural-network architecture and collective dynamics in artificial and biological systems.

Submitted 3 September, 2024; originally announced September 2024.

Comments: 35 pages, 11 figures

arXiv:2406.16689

doi 10.1038/s41467-025-58276-6

Coding schemes in neural networks learning classification tasks

Authors: Alexander van Meegen, Haim Sompolinsky

Abstract: Neural networks possess the crucial ability to generate meaningful representations of task-dependent features. Indeed, with appropriate scaling, supervised learning in neural networks can result in strong, task-dependent feature learning. However, the nature of the emergent representations, which we call the `coding scheme', is still unclear. To understand the emergent coding scheme, we investigate fully-connected, wide neural networks learning classification tasks using the Bayesian framework where learning shapes the posterior distribution of the network weights. Consistent with previous findings, our analysis of the feature learning regime (also known as `non-lazy', `rich', or `mean-field' regime) shows that the networks acquire strong, data-dependent features. Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity. In linear networks, an analog coding scheme of the task emerges. Despite the strong representations, the mean predictor is identical to the lazy case. In nonlinear networks, spontaneous symmetry breaking leads to either redundant or sparse coding schemes. Our findings highlight how network properties such as scaling of weights and neuronal nonlinearity can profoundly influence the emergent representations.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2309.14973

Linking Network and Neuron-level Correlations by Renormalized Field Theory

Authors: Michael Dick, Alexander van Meegen, Moritz Helias

Abstract: It is frequently hypothesized that cortical networks operate close to a critical point. Advantages of criticality include rich dynamics well-suited for computation and critical slowing down, which may offer a mechanism for dynamic memory. However, mean-field approximations, while versatile and popular, inherently neglect the fluctuations responsible for such critical dynamics. Thus, a renormalized theory is necessary. We consider the Sompolinsky-Crisanti-Sommers model, which displays a well-studied chaotic as well as a magnetic transition. Based on the analogue of a quantum effective action, we derive self-consistency equations for the first two renormalized Green's functions. Their self-consistent solution reveals a coupling between the population-level activity and single-neuron heterogeneity. The quantitative theory explains the population autocorrelation function, the single-unit autocorrelation function with its multiple temporal scales, and cross correlations.

Submitted 27 January, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

arXiv:2210.07877

The Distribution of Unstable Fixed Points in Chaotic Neural Networks

Authors: Jakob Stubenrauch, Christian Keup, Anno C. Kurth, Moritz Helias, Alexander van Meegen

Abstract: We analytically determine the number and distribution of fixed points in a canonical model of a chaotic neural network. This distribution reveals that fixed points and dynamics are confined to separate shells in phase space. Furthermore, the distribution enables us to determine the eigenvalue spectra of the Jacobian at the fixed points. Despite the radial separation of fixed points and dynamics, we find that nearby fixed points act as partially attracting landmarks for the dynamics.

Submitted 11 December, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: 37 pages, 11 figures. Main changes include: additional analysis of the fixed points' role for the dynamics; replica calculation for the random determinant (removing a previous assumption); extensive investigation of finite size effects; asymptotic results for the number of fixed points and their distribution at the edge of chaos and in the strongly chaotic limit

arXiv:2112.05589

doi 10.1088/1742-5468/ac8e57

Unified field theoretical approach to deep and recurrent neuronal networks

Authors: Kai Segadlo, Bastian Epping, Alexander van Meegen, David Dahmen, Michael Krämer, Moritz Helias

Abstract: Understanding capabilities and limitations of different network architectures is of fundamental importance to machine learning. Bayesian inference on Gaussian processes has proven to be a viable approach for studying recurrent and deep networks in the limit of infinite layer width, $n\to\infty$. Here we present a unified and systematic derivation of the mean-field theory for both architectures that starts from first principles by employing established methods from statistical physics of disordered systems. The theory elucidates that while the mean-field equations are different with regard to their temporal structure, they yet yield identical Gaussian kernels when readouts are taken at a single time point or layer, respectively. Bayesian inference applied to classification then predicts identical performance and capabilities for the two architectures. Numerically, we find that convergence towards the mean-field theory is typically slower for recurrent networks than for deep networks and the convergence speed depends non-trivially on the parameters of the weight prior as well as the depth or number of time steps, respectively. Our method exposes that Gaussian processes are but the lowest order of a systematic expansion in $1/n$ and we compute next-to-leading-order corrections which turn out to be architecture-specific. The formalism thus paves the way to investigate the fundamental differences between recurrent and deep architectures at finite widths $n$.

Submitted 24 June, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

Comments: Revision including next-to-leading-order corrections

arXiv:2011.11335

doi 10.1007/978-3-030-82427-3_4

Usage and Scaling of an Open-Source Spiking Multi-Area Model of Monkey Cortex

Authors: Sacha Jennifer van Albada, Jari Pronold, Alexander van Meegen, Markus Diesmann

Abstract: We are entering an age of `big' computational neuroscience, in which neural network models are increasing in size and in numbers of underlying data sets. Consolidating the zoo of models into large-scale models simultaneously consistent with a wide range of data is only possible through the effort of large teams, which can be spread across multiple research institutions. To ensure that computational neuroscientists can build on each other's work, it is important to make models publicly available as well-documented code. This chapter describes such an open-source model, which relates the connectivity structure of all vision-related cortical areas of the macaque monkey with their resting-state dynamics. We give a brief overview of how to use the executable model specification, which employs NEST as simulation engine, and show its runtime scaling. The solutions found serve as an example for organizing the workflow of future models from the raw experimental data to the visualization of the results, expose the challenges, and give guidance for the construction of ICT infrastructure for neuroscience.

Submitted 23 November, 2020; originally announced November 2020.

ACM Class: J.3

arXiv:2009.08889

doi 10.1103/PhysRevLett.127.158302

Large Deviations Approach to Random Recurrent Neuronal Networks: Parameter Inference and Fluctuation-Induced Transitions

Authors: Alexander van Meegen, Tobias Kühn, Moritz Helias

Abstract: We here unify the field theoretical approach to neuronal networks with large deviations theory. For a prototypical random recurrent network model with continuous-valued units, we show that the effective action is identical to the rate function and derive the latter using field theory. This rate function takes the form of a Kullback-Leibler divergence which enables data-driven inference of model parameters and calculation of fluctuations beyond mean-field theory. Lastly, we expose a regime with fluctuation-induced transitions between mean-field solutions.

Submitted 19 August, 2021; v1 submitted 18 September, 2020; originally announced September 2020.

Comments: Extension to multiple populations

Journal ref: Phys. Rev. Lett. 127, 158302 (2021)

arXiv:1909.01908

doi 10.1103/PhysRevResearch.3.043077

A Microscopic Theory of Intrinsic Timescales in Spiking Neural Networks

Authors: Alexander van Meegen, Sacha J. van Albada

Abstract: A complex interplay of single-neuron properties and the recurrent network structure shapes the activity of cortical neurons. The single-neuron activity statistics differ in general from the respective population statistics, including spectra and, correspondingly, autocorrelation times. We develop a theory for self-consistent second-order single-neuron statistics in block-structured sparse random networks of spiking neurons. In particular, the theory predicts the neuron-level autocorrelation times, also known as intrinsic timescales, of the neuronal activity. The theory is based on an extension of dynamic mean-field theory from rate networks to spiking networks, which is validated via simulations. It accounts for both static variability, e.g. due to a distributed number of incoming synapses per neuron, and temporal fluctuations of the input. We apply the theory to balanced random networks of generalized linear model neurons, balanced random networks of leaky integrate-and-fire neurons, and a biologically constrained network of leaky integrate-and-fire neurons. For the generalized linear model network with an error function nonlinearity, a novel analytical solution of the colored noise problem allows us to obtain self-consistent firing rate distributions, single-neuron power spectra, and intrinsic timescales. For the leaky integrate-and-fire networks, we derive an approximate analytical solution of the colored noise problem, based on the Stratonovich approximation of the Wiener-Rice series and a novel analytical solution for the free upcrossing statistics. Again closing the system self-consistently, in the fluctuation-driven regime this approximation yields reliable estimates of the mean firing rate and its variance across neurons, the inter-spike interval distribution, the single-neuron power spectra, and intrinsic timescales.

Submitted 15 September, 2021; v1 submitted 4 September, 2019; originally announced September 2019.

Journal ref: Phys. Rev. Research 3, 043077 (2021)

Showing 1–9 of 9 results for author: Van Meegen, A