-
Quantifying Emergence in Neural Networks: Insights from Pruning and Training Dynamics
Authors:
Faisal AlShinaifi,
Zeyad Almoaigel,
Johnny Jingze Li,
Abdulla Kuleib,
Gabriel A. Silva
Abstract:
Emergence, where complex behaviors develop from the interactions of simpler components within a network, plays a crucial role in enhancing neural network capabilities. We introduce a quantitative framework to measure emergence during the training process and examine its impact on network performance, particularly in relation to pruning and training dynamics. Our hypothesis posits that the degree of emergence, defined by the connectivity between active and inactive nodes, can predict the development of emergent behaviors in the network. Through experiments with feedforward and convolutional architectures on benchmark datasets, we demonstrate that higher emergence correlates with improved trainability and performance. We further explore the relationship between network complexity and the loss landscape, suggesting that higher emergence indicates a greater concentration of local minima and a more rugged loss landscape. Pruning, which reduces network complexity by removing redundant nodes and connections, is shown to enhance training efficiency and convergence speed, though it may lead to a reduction in final accuracy. These findings provide new insights into the interplay between emergence, complexity, and performance in neural networks, offering valuable implications for the design and optimization of more efficient architectures.
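The abstract does not state which pruning criterion the paper uses. To show what "removing redundant connections" can mean in practice, here is a minimal sketch of generic unstructured magnitude pruning. It is an assumption for illustration and not the authors' method. The function name and the `sparsity` parameter are hypothetical.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out roughly the smallest-magnitude fraction `sparsity` of weights.

    Generic unstructured magnitude pruning (hypothetical helper), shown only
    to illustrate connection removal; not necessarily the paper's criterion.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of connections to remove
    if k == 0:
        return weights.copy()
    # Threshold at the k-th smallest magnitude; ties at the threshold are also pruned.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask
```

For example, `magnitude_prune(np.array([[0.1, -0.5], [2.0, -0.05]]), 0.5)` keeps only `-0.5` and `2.0`.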
Submitted 2 September, 2024;
originally announced September 2024.
-
Advancing Neural Network Performance through Emergence-Promoting Initialization Scheme
Authors:
Johnny Jingze Li,
Vivek Kurien George,
Gabriel A. Silva
Abstract:
Emergence in machine learning refers to the spontaneous appearance of complex behaviors or capabilities that arise from the scale and structure of training data and model architectures, despite not being explicitly programmed. We introduce a novel yet straightforward neural network initialization scheme that aims to achieve greater potential for emergence. Measuring emergence as a kind of structural nonlinearity, our method adjusts the layer-wise weight scaling factors to achieve higher emergence values. This enhancement is easy to implement and, unlike GradInit, requires no additional optimization steps for initialization. We evaluate our approach across various architectures, including MLP and convolutional architectures for image recognition and transformers for machine translation. We demonstrate substantial improvements in both model accuracy and training speed, with and without batch normalization. The simplicity, theoretical innovation, and demonstrable empirical advantages of our method make it a potent enhancement to neural network initialization practices. These results suggest a promising direction for leveraging emergence to improve neural network training methodologies. Code is available at: https://github.com/johnnyjingzeli/EmergenceInit.
Submitted 3 January, 2025; v1 submitted 26 July, 2024;
originally announced July 2024.
-
Leveraging Quantum Superposition to Infer the Dynamic Behavior of a Spatial-Temporal Neural Network Signaling Model
Authors:
Gabriel A. Silva
Abstract:
The exploration of new problem classes for quantum computation is an active area of research. In this paper, we introduce and solve a novel problem class related to dynamics on large-scale networks relevant to neurobiology and machine learning. Specifically, we ask whether a network can sustain inherent dynamic activity beyond some arbitrary observation time or whether the activity ceases through quiescence or saturation via an epileptic-like state. We show that this class of problems can be formulated and structured to take advantage of quantum superposition and solved efficiently using the Deutsch-Jozsa and Grover quantum algorithms. To do so, we extend their functionality to address the unique requirements of how the input (sub)sets to the algorithms must be mathematically structured, while simultaneously constructing the inputs so that measurement outputs can be interpreted as meaningful properties of the network dynamics. This, in turn, allows us to answer the question we pose.
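The paper's extended versions of these algorithms are not reproduced here. As background only, the following is a minimal NumPy state-vector simulation of textbook Grover search over 2^n basis states with a single marked item, assuming an ideal noiseless oracle. The function name and its parameters are hypothetical.

```python
import numpy as np


def grover_search(n_qubits: int, marked: int) -> tuple[int, float]:
    """Simulate textbook Grover search; return (most likely index, its probability)."""
    n_states = 2 ** n_qubits
    # Uniform superposition, as produced by Hadamards on |0...0>.
    state = np.full(n_states, 1.0 / np.sqrt(n_states))
    # Near-optimal number of Grover iterations for one marked item.
    iterations = int(np.floor(np.pi / 4 * np.sqrt(n_states)))
    for _ in range(iterations):
        state[marked] *= -1.0          # oracle: phase flip on the marked state
        state = 2.0 * state.mean() - state  # diffusion: inversion about the mean
    probs = state ** 2                 # amplitudes stay real in this simulation
    best = int(np.argmax(probs))
    return best, float(probs[best])
```

With 4 qubits (16 states) the loop runs 3 times, and the marked index is measured with probability of about 0.96, compared with 1/16 for a single classical guess.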
Submitted 20 January, 2025; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Comparison of robust, reliability-based and non-probabilistic topology optimization under uncertain loads and stress constraints
Authors:
Gustavo Assis da Silva,
Eduardo Lenz Cardoso,
Andre T. Beck
Abstract:
It is nowadays widely acknowledged that optimal structural design should be robust with respect to the uncertainties in loads and material parameters. However, there are several ways to account for such uncertainties in structural optimization problems. This paper presents a comprehensive comparison between the results of three different approaches to topology optimization under uncertain loading, considering stress constraints: 1) the robust formulation, which requires only the mean and standard deviation of stresses at each element; 2) the reliability-based formulation, which imposes a reliability constraint on computed stresses; 3) the non-probabilistic formulation, which considers a worst-case scenario for the stresses caused by uncertain loads. The methods differ considerably both in the information they require about the uncertain loads and in the uncertainty propagation approach they use. The robust formulation requires only the mean and standard deviation of the uncertain loads; stresses are computed via a first-order perturbation approach. The reliability-based formulation requires full probability distributions of the random loads; reliability constraints are computed via a first-order performance measure approach. The non-probabilistic formulation is applicable to bounded uncertain loads; only lower and upper bounds are used, and worst-case stresses are computed via a nested optimization with anti-optimization. The three approaches handle uncertainties quite differently; however, the basic topology optimization framework is the same: the traditional density approach is employed for material parameterization, while the augmented Lagrangian method is employed to solve the resulting problem, in order to handle the large number of stress constraints.
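To contrast the information that the robust and non-probabilistic formulations use, here is a simplified sketch. It assumes stress components that are exactly linear in the loads, sigma = S f, where `S` is a hypothetical stress-load sensitivity matrix. In that case first-order propagation of the mean and covariance is exact, and the worst case over a box of load bounds has a closed form. The paper works with von Mises stresses, which are nonlinear, and uses anti-optimization. Neither is reproduced here, and the weighting factor `k` is an assumed example value.

```python
import numpy as np


def robust_stress_bound(S, mu_f, cov_f, k=3.0):
    """Mean + k * std of sigma = S @ f, using only load mean and covariance."""
    mean = S @ mu_f
    std = np.sqrt(np.diag(S @ cov_f @ S.T))  # exact for a linear map
    return mean + k * std


def worst_case_stress(S, f_lo, f_hi):
    """Max of each component of S @ f over the box f_lo <= f <= f_hi (bounds only)."""
    center = 0.5 * (f_lo + f_hi)
    radius = 0.5 * (f_hi - f_lo)
    # A linear function attains its maximum over a box at a vertex:
    # s . center + |s| . radius.
    return S @ center + np.abs(S) @ radius
```

The robust bound needs only the first two moments of the loads, while the worst-case value needs only their bounds. This mirrors the different input requirements described above.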
Submitted 25 January, 2022;
originally announced January 2022.
-
Information entropy re-defined in a category theory context using preradicals
Authors:
Sebastian Pardo G.,
Gabriel A. Silva
Abstract:
Algebraically, entropy can be defined for abelian groups and their endomorphisms, and was later extended to consider objects in a Flow category derived from abelian categories, such as $R\textit{-}Mod$ with $R$ a ring. Preradicals are endofunctors which can be realized as compatible choice assignments in the category where they are defined. Here we present a formal definition of entropy for preradicals on $R$-Mod and show that the concept of entropy for preradicals respects their order as a big lattice. Also, due to the connection between modules and complete bounded modular lattices, we provide a definition of entropy for lattice preradicals, and show that this notion is equivalent, from a functorial perspective, to the one defined for module preradicals.
Submitted 11 December, 2021;
originally announced December 2021.
-
Learning without gradient descent encoded by the dynamics of a neurobiological model
Authors:
Vivek Kurien George,
Vikash Morar,
Weiwei Yang,
Jonathan Larson,
Bryan Tower,
Shweti Mahajan,
Arkin Gupta,
Christopher White,
Gabriel A. Silva
Abstract:
The success of state-of-the-art machine learning is based almost entirely on variations of gradient descent algorithms that minimize some version of a cost or loss function. A fundamental limitation, however, is the need to train these systems in either supervised or unsupervised ways by exposing them to typically large numbers of training examples. Here, we introduce a fundamentally novel conceptual approach to machine learning that takes advantage of a neurobiologically derived model of dynamic signaling, constrained by the geometric structure of a network. We show that MNIST images can be uniquely encoded and classified by the dynamics of geometric networks with nearly state-of-the-art accuracy in an unsupervised way, and without the need for any training.
Submitted 23 March, 2021; v1 submitted 16 March, 2021;
originally announced March 2021.
-
Generalizable Machine Learning in Neuroscience using Graph Neural Networks
Authors:
Paul Y. Wang,
Sandalika Sapra,
Vivek Kurien George,
Gabriel A. Silva
Abstract:
Although a number of studies have explored deep learning in neuroscience, the application of these algorithms to neural systems on a microscopic scale, i.e. parameters relevant to lower scales of organization, remains relatively novel. Motivated by advances in whole-brain imaging, we examined the performance of deep learning models on microscopic neural dynamics and resulting emergent behaviors using calcium imaging data from the nematode C. elegans. We show that neural networks perform remarkably well on both neuron-level dynamics prediction and behavioral state classification. In addition, we compared the performance of structure-agnostic neural networks and graph neural networks to investigate whether graph structure can be exploited as a favorable inductive bias. To perform this experiment, we designed a graph neural network which explicitly infers relations between neurons from neural activity and leverages the inferred graph structure during computations. In our experiments, we found that graph neural networks generally outperformed structure-agnostic models and excelled at generalizing to unseen organisms, implying a potential path to generalizable machine learning in neuroscience.
Submitted 16 October, 2020;
originally announced October 2020.
-
Large scale three-dimensional manufacturing tolerant stress-constrained topology optimization
Authors:
Gustavo Assis da Silva,
Niels Aage,
André Teófilo Beck,
Ole Sigmund
Abstract:
In topology optimization, the treatment of stress constraints for very large scale problems has so far not been tractable due to the failure of robust agglomeration methods, i.e. their inability to accurately handle the locality of the stress constraints. This paper presents a three-dimensional design methodology that alleviates this shortcoming using both deterministic and robust problem formulations. The robust formulation, based on the three-field density projection approach, is extended to handle manufacturing uncertainty in three-dimensional stress-constrained problems. Several numerical examples are solved and further post-processed with body-fitted meshes using commercial software. The numerical investigations demonstrate that: (1) the employed solution approach based on the augmented Lagrangian method is able to handle large problems, with hundreds of millions of stress constraints; (2) if appropriate interpolation parameters are adopted, voxel-based (fixed grid) models can be used to compute von Mises stresses with excellent accuracy; and (3) in order to ensure manufacturing tolerance in three-dimensional stress-constrained topology optimization, a combination of double filtering and more than three realizations may be required.
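The abstract names the three-field density projection approach. In its common form, design variables are smoothed by a density filter and then passed through a tanh-based smoothed Heaviside projection. Robust formulations evaluate eroded, intermediate and dilated designs obtained with different projection thresholds eta. The 1D sketch below illustrates this. The hat-shaped filter, and the `radius`, `beta` and `eta` values, are assumptions for illustration and are not the paper's settings.

```python
import numpy as np


def density_filter_1d(x: np.ndarray, radius: float) -> np.ndarray:
    """Linear (hat) density filter on a 1D grid; radius in element units."""
    idx = np.arange(x.size)
    w = np.maximum(0.0, radius - np.abs(idx[:, None] - idx[None, :]))
    return (w @ x) / w.sum(axis=1)


def heaviside_project(x_tilde: np.ndarray, beta: float, eta: float) -> np.ndarray:
    """Smoothed Heaviside projection with sharpness beta and threshold eta."""
    num = np.tanh(beta * eta) + np.tanh(beta * (x_tilde - eta))
    den = np.tanh(beta * eta) + np.tanh(beta * (1.0 - eta))
    return num / den


def three_field_realizations(x, radius=3.0, beta=8.0, etas=(0.75, 0.5, 0.25)):
    """Return eroded, intermediate and dilated projected fields (assumed etas)."""
    x_tilde = density_filter_1d(x, radius)
    return [heaviside_project(x_tilde, beta, eta) for eta in etas]
```

A higher threshold removes more material, so the eroded field is pointwise no denser than the intermediate one, which in turn is no denser than the dilated one. The abstract notes that more than three realizations may be required for manufacturing tolerance in 3D; such additional realizations would correspond to additional `etas` entries in this sketch.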
Submitted 23 June, 2020;
originally announced June 2020.
-
Fuzzy neural networks to create an expert system for detecting attacks by SQL Injection
Authors:
Lucas Oliveira Batista,
Gabriel Adriano de Silva,
Vanessa Souza Araújo,
Vinícius Jonathan Silva Araújo,
Thiago Silva Rezende,
Augusto Junio Guimarães,
Paulo Vitor de Campos Souza
Abstract:
The contemporary world is characterized by constant technological evolution, and every day processes that were once manual become computerized. Data are stored in cyberspace, and as a consequence, concern for the security of this environment must increase. Cyber-attacks are growing in scale worldwide and are one of the significant challenges of the century. This article proposes a computational system based on intelligent hybrid models which, through fuzzy rules, allows the construction of expert systems for cybernetic data attacks, focusing on the SQL Injection attack. The tests were performed with real databases of SQL Injection attacks on government computers, using fuzzy neural networks. The results show that it is feasible to build a system based on fuzzy rules whose accuracy in classifying cyber intrusions lies within the standard-deviation margin of the state-of-the-art model for this type of problem. The model helps countries prepare to protect their data networks and information systems, and creates opportunities for expert systems to automate the identification of attacks in cyberspace.
Submitted 9 January, 2019;
originally announced January 2019.
-
Multimodal vs. Unimodal Physiological Control in Videogames for Enhanced Realism and Depth
Authors:
Gonçalo Amaral da Silva
Abstract:
(arXiv abridged abstract) In the last two decades, videogames have evolved in a nearly explosive way from the pixelated graphics to today's near-realistic 3D environments. The interaction devices traditionally used in videogames have not evolved with the same intensity, but recent HCI studies have explored biofeedback interaction - the explicit manipulation of a person's physiological data as input to a system - as an alternative to them. Traditional biofeedback prototypes apply 1 sensor to each game mechanic (unimodality).
In this dissertation, we introduce the combination of two physiological sensors simultaneously per game mechanic (multimodality) and present a First-Person Shooter game comprising eight game mechanics with three interaction flavours (no biofeedback/vanilla, unimodal and multimodal). An empirical study with 32 regular players was conducted to explore the differences between the three interaction types and where each can best be employed.
Players compared the three games in terms of Fun, Ease of Use, Originality, Playability and Favourite Condition. For the sake of completeness, other evaluation methods were used as well: the IMI Questionnaire, keyword association and open-ended comments. The vanilla version was considered easier to use, but both biofeedback versions were considered the most fun. The two biofeedback versions were praised for different reasons: the unimodal version for its simplicity of use, and the multimodal version for its realism, the activation safety of its game mechanics and the depth it added to the game. We conclude that multimodal biofeedback can have a relevant impact in terms of added depth, depending on how it is used inside the game. In boundary cases, it can be used to increase the player's feeling of empowerment when using certain abilities, or to intentionally make in-game actions more difficult by demanding more physical effort from the player.
Submitted 2 June, 2014;
originally announced June 2014.
-
Mapping the spatiotemporal dynamics of calcium signaling in cellular neural networks using optical flow
Authors:
Marius Buibas,
Diana Yu,
Krystal Nizar,
Gabriel A. Silva
Abstract:
An optical flow gradient algorithm was applied to spontaneously forming networks of neurons and glia in culture, imaged by fluorescence optical microscopy, in order to map functional calcium signaling with single-pixel resolution. Optical flow estimates the direction and speed of motion of objects in an image between subsequent frames in a recorded digital sequence of images (i.e. a movie). The vector fields computed by the algorithm were able to track the spatiotemporal dynamics of calcium signaling patterns. We begin by briefly reviewing the mathematics of the optical flow algorithm, and then describe how to solve for the displacement vectors and how to measure their reliability. We then compare computed flow vectors with manually estimated vectors for the progression of a calcium signal recorded from representative astrocyte cultures. Finally, we apply the algorithm to preparations of primary astrocytes and hippocampal neurons and to the rMC-1 Muller glial cell line in order to illustrate the capability of the algorithm for capturing different types of spatiotemporal calcium activity. We discuss the imaging requirements, parameter selection and threshold selection for reliable measurements, and offer perspectives on uses of the vector data.
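The abstract does not state the exact gradient-based formulation. A widely used one is the Lucas-Kanade least-squares solution of the brightness-constancy equation Ix*u + Iy*v + It = 0 over a window. The sketch below applies it to a single window and returns the smallest eigenvalue of A^T A, a common reliability indicator. Both the formulation and this reliability measure are assumptions here, not necessarily those used in the paper, and the function name is hypothetical.

```python
import numpy as np


def lucas_kanade_window(frame0: np.ndarray, frame1: np.ndarray):
    """Estimate one displacement (u, v) for a window; also return a reliability score.

    u is along columns (x) and v along rows (y), in pixels per frame.
    """
    f0 = frame0.astype(float)
    f1 = frame1.astype(float)
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (cols, x).
    Iy, Ix = np.gradient(f0)
    It = f1 - f0  # temporal derivative from two consecutive frames
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    # Small min eigenvalue of A^T A means an ill-conditioned (unreliable) estimate.
    min_eig = float(np.linalg.eigvalsh(A.T @ A).min())
    return float(u), float(v), min_eig
```

In practice the solve would be repeated over a sliding window for every pixel to produce a dense vector field, and estimates would be kept only where the reliability score exceeds a chosen threshold.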
Submitted 22 January, 2010; v1 submitted 1 December, 2009;
originally announced December 2009.