Search | arXiv e-print repository

The Importance of the Current Input in Sequence Modeling

Authors: Christian Oliva, Luis F. Lago-Fernández

Abstract: The last advances in sequence modeling are mainly based on deep learning approaches. The current state of the art involves the use of variations of the standard LSTM architecture, combined with several tricks that improve the final prediction rates of the trained neural networks. However, in some cases, these adaptations might be too much tuned to the particular problems being addressed. In this a… ▽ More The last advances in sequence modeling are mainly based on deep learning approaches. The current state of the art involves the use of variations of the standard LSTM architecture, combined with several tricks that improve the final prediction rates of the trained neural networks. However, in some cases, these adaptations might be too much tuned to the particular problems being addressed. In this article, we show that a very simple idea, to add a direct connection between the input and the output, skipping the recurrent module, leads to an increase of the prediction accuracy in sequence modeling problems related to natural language processing. Experiments carried out on different problems show that the addition of this kind of connection to a recurrent network always improves the results, regardless of the architecture and training-specific details. When this idea is introduced into the models that lead the field, the resulting networks achieve a new state-of-the-art perplexity in language modeling problems. △ Less

Submitted 22 December, 2021; originally announced December 2021.

Comments: 11 pages, 2 appendix pages

arXiv:2006.10828 [pdf, other]

doi 10.1016/j.neucom.2021.04.058

Stability of Internal States in Recurrent Neural Networks Trained on Regular Languages

Authors: Christian Oliva, Luis F. Lago-Fernández

Abstract: We provide an empirical study of the stability of recurrent neural networks trained to recognize regular languages. When a small amount of noise is introduced into the activation function, the neurons in the recurrent layer tend to saturate in order to compensate the variability. In this saturated regime, analysis of the network activation shows a set of clusters that resemble discrete states in a… ▽ More We provide an empirical study of the stability of recurrent neural networks trained to recognize regular languages. When a small amount of noise is introduced into the activation function, the neurons in the recurrent layer tend to saturate in order to compensate the variability. In this saturated regime, analysis of the network activation shows a set of clusters that resemble discrete states in a finite state machine. We show that transitions between these states in response to input symbols are deterministic and stable. The networks display a stable behavior for arbitrarily long strings, and when random perturbations are applied to any of the states, they are able to recover and their evolution converges to the original clusters. This observation reinforces the interpretation of the networks as finite automata, with neurons or groups of neurons coding specific and meaningful input patterns. △ Less

Submitted 18 June, 2020; originally announced June 2020.

Comments: 11 pages, 9 figures

arXiv:2005.13971 [pdf, other]

Separation of Memory and Processing in Dual Recurrent Neural Networks

Authors: Christian Oliva, Luis F. Lago-Fernández

Abstract: We explore a neural network architecture that stacks a recurrent layer and a feedforward layer that is also connected to the input, and compare it to standard Elman and LSTM architectures in terms of accuracy and interpretability. When noise is introduced into the activation function of the recurrent units, these neurons are forced into a binary activation regime that makes the networks behave muc… ▽ More We explore a neural network architecture that stacks a recurrent layer and a feedforward layer that is also connected to the input, and compare it to standard Elman and LSTM architectures in terms of accuracy and interpretability. When noise is introduced into the activation function of the recurrent units, these neurons are forced into a binary activation regime that makes the networks behave much as finite automata. The resulting models are simpler, easier to interpret and get higher accuracy on different sample problems, including the recognition of regular languages, the computation of additions in different bases and the generation of arithmetic expressions. △ Less

Submitted 17 May, 2020; originally announced May 2020.

Comments: 10 pages

Showing 1–3 of 3 results for author: Oliva, C