-
The Importance of the Current Input in Sequence Modeling
Authors:
Christian Oliva,
Luis F. Lago-Fernández
Abstract:
The last advances in sequence modeling are mainly based on deep learning approaches. The current state of the art involves the use of variations of the standard LSTM architecture, combined with several tricks that improve the final prediction rates of the trained neural networks. However, in some cases, these adaptations might be too much tuned to the particular problems being addressed. In this a…
▽ More
The last advances in sequence modeling are mainly based on deep learning approaches. The current state of the art involves the use of variations of the standard LSTM architecture, combined with several tricks that improve the final prediction rates of the trained neural networks. However, in some cases, these adaptations might be too much tuned to the particular problems being addressed. In this article, we show that a very simple idea, to add a direct connection between the input and the output, skipping the recurrent module, leads to an increase of the prediction accuracy in sequence modeling problems related to natural language processing. Experiments carried out on different problems show that the addition of this kind of connection to a recurrent network always improves the results, regardless of the architecture and training-specific details. When this idea is introduced into the models that lead the field, the resulting networks achieve a new state-of-the-art perplexity in language modeling problems.
△ Less
Submitted 22 December, 2021;
originally announced December 2021.
-
Backward Gradient Normalization in Deep Neural Networks
Authors:
Alejandro Cabana,
Luis F. Lago-Fernández
Abstract:
We introduce a new technique for gradient normalization during neural network training. The gradients are rescaled during the backward pass using normalization layers introduced at certain points within the network architecture. These normalization nodes do not affect forward activity propagation, but modify backpropagation equations to permit a well-scaled gradient flow that reaches the deepest n…
▽ More
We introduce a new technique for gradient normalization during neural network training. The gradients are rescaled during the backward pass using normalization layers introduced at certain points within the network architecture. These normalization nodes do not affect forward activity propagation, but modify backpropagation equations to permit a well-scaled gradient flow that reaches the deepest network layers without experimenting vanishing or explosion. Results on tests with very deep neural networks show that the new technique can do an effective control of the gradient norm, allowing the update of weights in the deepest layers and improving network accuracy on several experimental conditions.
△ Less
Submitted 17 June, 2021;
originally announced June 2021.
-
Stability of Internal States in Recurrent Neural Networks Trained on Regular Languages
Authors:
Christian Oliva,
Luis F. Lago-Fernández
Abstract:
We provide an empirical study of the stability of recurrent neural networks trained to recognize regular languages. When a small amount of noise is introduced into the activation function, the neurons in the recurrent layer tend to saturate in order to compensate the variability. In this saturated regime, analysis of the network activation shows a set of clusters that resemble discrete states in a…
▽ More
We provide an empirical study of the stability of recurrent neural networks trained to recognize regular languages. When a small amount of noise is introduced into the activation function, the neurons in the recurrent layer tend to saturate in order to compensate the variability. In this saturated regime, analysis of the network activation shows a set of clusters that resemble discrete states in a finite state machine. We show that transitions between these states in response to input symbols are deterministic and stable. The networks display a stable behavior for arbitrarily long strings, and when random perturbations are applied to any of the states, they are able to recover and their evolution converges to the original clusters. This observation reinforces the interpretation of the networks as finite automata, with neurons or groups of neurons coding specific and meaningful input patterns.
△ Less
Submitted 18 June, 2020;
originally announced June 2020.
-
Separation of Memory and Processing in Dual Recurrent Neural Networks
Authors:
Christian Oliva,
Luis F. Lago-Fernández
Abstract:
We explore a neural network architecture that stacks a recurrent layer and a feedforward layer that is also connected to the input, and compare it to standard Elman and LSTM architectures in terms of accuracy and interpretability. When noise is introduced into the activation function of the recurrent units, these neurons are forced into a binary activation regime that makes the networks behave muc…
▽ More
We explore a neural network architecture that stacks a recurrent layer and a feedforward layer that is also connected to the input, and compare it to standard Elman and LSTM architectures in terms of accuracy and interpretability. When noise is introduced into the activation function of the recurrent units, these neurons are forced into a binary activation regime that makes the networks behave much as finite automata. The resulting models are simpler, easier to interpret and get higher accuracy on different sample problems, including the recognition of regular languages, the computation of additions in different bases and the generation of arithmetic expressions.
△ Less
Submitted 17 May, 2020;
originally announced May 2020.
-
A compression based framework for the detection of anomalies in heterogeneous data sources
Authors:
Gonzalo de la Torre-Abaitua,
Luis F. Lago-Fernández,
David Arroyo
Abstract:
Nowadays, information and communications technology systems are fundamental assets of our social and economical model, and thus they should be properly protected against the malicious activity of cybercriminals. Defence mechanisms are generally articulated around tools that trace and store information in several ways, the simplest one being the generation of plain text files coined as security log…
▽ More
Nowadays, information and communications technology systems are fundamental assets of our social and economical model, and thus they should be properly protected against the malicious activity of cybercriminals. Defence mechanisms are generally articulated around tools that trace and store information in several ways, the simplest one being the generation of plain text files coined as security logs. This log files are usually inspected, in a semi-automatic way, by security analysts to detect events that may affect system integrity. On this basis, we propose a parameter-free methodology to detect security incidents from structured text regardless its nature. We use the Normalized Compression Distance to obtain a set of features that can be used by a Support Vector Machine to classify events from a heterogeneous cybersecurity environment. In specific, we explore and validate the application of our methodology in four different cybersecurity domains: HTTP anomaly identification, spam detection, Domain Generation Algorithms tracking and sentiment analysis. The results obtained show the validity and flexibility of our approach in different security scenarios with a low configuration burden.
△ Less
Submitted 1 August, 2019;
originally announced August 2019.