-
A Minimum Description Length Approach to Regularization in Neural Networks
Authors:
Matan Abudy,
Orr Well,
Emmanuel Chemla,
Roni Katzir,
Nur Lan
Abstract:
State-of-the-art neural networks can be trained to become remarkable solutions to many problems. But while these architectures can express symbolic, perfect solutions, trained models often arrive at approximations instead. We show that the choice of regularization method plays a crucial role: when trained on formal languages with standard regularization ($L_1$, $L_2$, or none), expressive architec…
▽ More
State-of-the-art neural networks can be trained to become remarkable solutions to many problems. But while these architectures can express symbolic, perfect solutions, trained models often arrive at approximations instead. We show that the choice of regularization method plays a crucial role: when trained on formal languages with standard regularization ($L_1$, $L_2$, or none), expressive architectures not only fail to converge to correct solutions but are actively pushed away from perfect initializations. In contrast, applying the Minimum Description Length (MDL) principle to balance model complexity with data fit provides a theoretically grounded regularization method. Using MDL, perfect solutions are selected over approximations, independently of the optimization algorithm. We propose that unlike existing regularization techniques, MDL introduces the appropriate inductive bias to effectively counteract overfitting and promote generalization.
△ Less
Submitted 19 May, 2025;
originally announced May 2025.
-
Visual Environment-Interactive Planning for Embodied Complex-Question Answering
Authors:
Ning Lan,
Baoshan Ou,
Xuemei Xie,
Guangming Shi
Abstract:
This study focuses on Embodied Complex-Question Answering task, which means the embodied robot need to understand human questions with intricate structures and abstract semantics. The core of this task lies in making appropriate plans based on the perception of the visual environment. Existing methods often generate plans in a once-for-all manner, i.e., one-step planning. Such approach rely on lar…
▽ More
This study focuses on Embodied Complex-Question Answering task, which means the embodied robot need to understand human questions with intricate structures and abstract semantics. The core of this task lies in making appropriate plans based on the perception of the visual environment. Existing methods often generate plans in a once-for-all manner, i.e., one-step planning. Such approach rely on large models, without sufficient understanding of the environment. Considering multi-step planning, the framework for formulating plans in a sequential manner is proposed in this paper. To ensure the ability of our framework to tackle complex questions, we create a structured semantic space, where hierarchical visual perception and chain expression of the question essence can achieve iterative interaction. This space makes sequential task planning possible. Within the framework, we first parse human natural language based on a visual hierarchical scene graph, which can clarify the intention of the question. Then, we incorporate external rules to make a plan for current step, weakening the reliance on large models. Every plan is generated based on feedback from visual perception, with multiple rounds of interaction until an answer is obtained. This approach enables continuous feedback and adjustment, allowing the robot to optimize its action strategy. To test our framework, we contribute a new dataset with more complex questions. Experimental results demonstrate that our approach performs excellently and stably on complex tasks. And also, the feasibility of our approach in real-world scenarios has been established, indicating its practical applicability.
△ Less
Submitted 1 April, 2025;
originally announced April 2025.
-
Large Language Models as Proxies for Theories of Human Linguistic Cognition
Authors:
Imry Ziv,
Nur Lan,
Emmanuel Chemla,
Roni Katzir
Abstract:
We consider the possible role of current large language models (LLMs) in the study of human linguistic cognition. We focus on the use of such models as proxies for theories of cognition that are relatively linguistically-neutral in their representations and learning but differ from current LLMs in key ways. We illustrate this potential use of LLMs as proxies for theories of cognition in the contex…
▽ More
We consider the possible role of current large language models (LLMs) in the study of human linguistic cognition. We focus on the use of such models as proxies for theories of cognition that are relatively linguistically-neutral in their representations and learning but differ from current LLMs in key ways. We illustrate this potential use of LLMs as proxies for theories of cognition in the context of two kinds of questions: (a) whether the target theory accounts for the acquisition of a given pattern from a given corpus; and (b) whether the target theory makes a given typologically-attested pattern easier to acquire than another, typologically-unattested pattern. For each of the two questions we show, building on recent literature, how current LLMs can potentially be of help, but we note that at present this help is quite limited.
△ Less
Submitted 11 February, 2025;
originally announced February 2025.
-
What Makes Two Language Models Think Alike?
Authors:
Jeanne Salle,
Louis Jalouzot,
Nur Lan,
Emmanuel Chemla,
Yair Lakretz
Abstract:
Do architectural differences significantly affect the way models represent and process language? We propose a new approach, based on metric-learning encoding models (MLEMs), as a first step to answer this question. The approach provides a feature-based comparison of how any two layers of any two models represent linguistic information. We apply the method to BERT, GPT-2 and Mamba. Unlike previous…
▽ More
Do architectural differences significantly affect the way models represent and process language? We propose a new approach, based on metric-learning encoding models (MLEMs), as a first step to answer this question. The approach provides a feature-based comparison of how any two layers of any two models represent linguistic information. We apply the method to BERT, GPT-2 and Mamba. Unlike previous methods, MLEMs offer a transparent comparison, by identifying the specific linguistic features responsible for similarities and differences. More generally, the method uses formal, symbolic descriptions of a domain, and use these to compare neural representations. As such, the approach can straightforwardly be extended to other domains, such as speech and vision, and to other neural systems, including human brains.
△ Less
Submitted 24 June, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
The Invariant Rauch-Tung-Striebel Smoother
Authors:
Niels van der Laan,
Mitchell Cohen,
Jonathan Arsenault,
James Richard Forbes
Abstract:
This paper presents an invariant Rauch-Tung- Striebel (IRTS) smoother applicable to systems with states that are an element of a matrix Lie group. In particular, the extended Rauch-Tung-Striebel (RTS) smoother is adapted to work within a matrix Lie group framework. The main advantage of the invariant RTS (IRTS) smoother is that the linearization of the process and measurement models is independent…
▽ More
This paper presents an invariant Rauch-Tung- Striebel (IRTS) smoother applicable to systems with states that are an element of a matrix Lie group. In particular, the extended Rauch-Tung-Striebel (RTS) smoother is adapted to work within a matrix Lie group framework. The main advantage of the invariant RTS (IRTS) smoother is that the linearization of the process and measurement models is independent of the state estimate resulting in state-estimate-independent Jacobians when certain technical requirements are met. A sample problem is considered that involves estimation of the three dimensional pose of a rigid body on SE(3), along with sensor biases. The multiplicative RTS (MRTS) smoother is also reviewed and is used as a direct comparison to the proposed IRTS smoother using experimental data. Both smoothing methods are also compared to invariant and multiplicative versions of the Gauss-Newton approach to solving the batch state estimation problem.
△ Less
Submitted 29 February, 2024;
originally announced March 2024.
-
Metric-Learning Encoding Models Identify Processing Profiles of Linguistic Features in BERT's Representations
Authors:
Louis Jalouzot,
Robin Sobczyk,
Bastien Lhopitallier,
Jeanne Salle,
Nur Lan,
Emmanuel Chemla,
Yair Lakretz
Abstract:
We introduce Metric-Learning Encoding Models (MLEMs) as a new approach to understand how neural systems represent the theoretical features of the objects they process. As a proof-of-concept, we apply MLEMs to neural representations extracted from BERT, and track a wide variety of linguistic features (e.g., tense, subject person, clause type, clause embedding). We find that: (1) linguistic features…
▽ More
We introduce Metric-Learning Encoding Models (MLEMs) as a new approach to understand how neural systems represent the theoretical features of the objects they process. As a proof-of-concept, we apply MLEMs to neural representations extracted from BERT, and track a wide variety of linguistic features (e.g., tense, subject person, clause type, clause embedding). We find that: (1) linguistic features are ordered: they separate representations of sentences to different degrees in different layers; (2) neural representations are organized hierarchically: in some layers, we find clusters of representations nested within larger clusters, following successively important linguistic features; (3) linguistic features are disentangled in middle layers: distinct, selective units are activated by distinct linguistic features. Methodologically, MLEMs are superior (4) to multivariate decoding methods, being more robust to type-I errors, and (5) to univariate encoding methods, in being able to predict both local and distributed representations. Together, this demonstrates the utility of Metric-Learning Encoding Methods for studying how linguistic features are neurally encoded in language models and the advantage of MLEMs over traditional methods. MLEMs can be extended to other domains (e.g. vision) and to other neural systems, such as the human brain.
△ Less
Submitted 18 February, 2024;
originally announced February 2024.
-
Bridging the Empirical-Theoretical Gap in Neural Network Formal Language Learning Using Minimum Description Length
Authors:
Nur Lan,
Emmanuel Chemla,
Roni Katzir
Abstract:
Neural networks offer good approximation to many tasks but consistently fail to reach perfect generalization, even when theoretical work shows that such perfect solutions can be expressed by certain architectures. Using the task of formal language learning, we focus on one simple formal language and show that the theoretically correct solution is in fact not an optimum of commonly used objectives…
▽ More
Neural networks offer good approximation to many tasks but consistently fail to reach perfect generalization, even when theoretical work shows that such perfect solutions can be expressed by certain architectures. Using the task of formal language learning, we focus on one simple formal language and show that the theoretically correct solution is in fact not an optimum of commonly used objectives -- even with regularization techniques that according to common wisdom should lead to simple weights and good generalization (L1, L2) or other meta-heuristics (early-stopping, dropout). On the other hand, replacing standard targets with the Minimum Description Length objective (MDL) results in the correct solution being an optimum.
△ Less
Submitted 6 June, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
Minimum Description Length Hopfield Networks
Authors:
Matan Abudy,
Nur Lan,
Emmanuel Chemla,
Roni Katzir
Abstract:
Associative memory architectures are designed for memorization but also offer, through their retrieval method, a form of generalization to unseen inputs: stored memories can be seen as prototypes from this point of view. Focusing on Modern Hopfield Networks (MHN), we show that a large memorization capacity undermines the generalization opportunity. We offer a solution to better optimize this trade…
▽ More
Associative memory architectures are designed for memorization but also offer, through their retrieval method, a form of generalization to unseen inputs: stored memories can be seen as prototypes from this point of view. Focusing on Modern Hopfield Networks (MHN), we show that a large memorization capacity undermines the generalization opportunity. We offer a solution to better optimize this tradeoff. It relies on Minimum Description Length (MDL) to determine during training which memories to store, as well as how many of them.
△ Less
Submitted 11 November, 2023;
originally announced November 2023.
-
Benchmarking Neural Network Generalization for Grammar Induction
Authors:
Nur Lan,
Emmanuel Chemla,
Roni Katzir
Abstract:
How well do neural networks generalize? Even for grammar induction tasks, where the target generalization is fully known, previous works have left the question open, testing very limited ranges beyond the training set and using different success criteria. We provide a measure of neural network generalization based on fully specified formal languages. Given a model and a formal grammar, the method…
▽ More
How well do neural networks generalize? Even for grammar induction tasks, where the target generalization is fully known, previous works have left the question open, testing very limited ranges beyond the training set and using different success criteria. We provide a measure of neural network generalization based on fully specified formal languages. Given a model and a formal grammar, the method assigns a generalization score representing how well a model generalizes to unseen samples in inverse relation to the amount of data it was trained on. The benchmark includes languages such as $a^nb^n$, $a^nb^nc^n$, $a^nb^mc^{n+m}$, and Dyck-1 and 2. We evaluate selected architectures using the benchmark and find that networks trained with a Minimum Description Length objective (MDL) generalize better and using less data than networks trained using standard loss functions. The benchmark is available at https://github.com/taucompling/bliss.
△ Less
Submitted 25 August, 2023; v1 submitted 16 August, 2023;
originally announced August 2023.
-
BlazeNeo: Blazing fast polyp segmentation and neoplasm detection
Authors:
Nguyen Sy An,
Phan Ngoc Lan,
Dao Viet Hang,
Dao Van Long,
Tran Quang Trung,
Nguyen Thi Thuy,
Dinh Viet Sang
Abstract:
In recent years, computer-aided automatic polyp segmentation and neoplasm detection have been an emerging topic in medical image analysis, providing valuable support to colonoscopy procedures. Attentions have been paid to improving the accuracy of polyp detection and segmentation. However, not much focus has been given to latency and throughput for performing these tasks on dedicated devices, whic…
▽ More
In recent years, computer-aided automatic polyp segmentation and neoplasm detection have been an emerging topic in medical image analysis, providing valuable support to colonoscopy procedures. Attentions have been paid to improving the accuracy of polyp detection and segmentation. However, not much focus has been given to latency and throughput for performing these tasks on dedicated devices, which can be crucial for practical applications. This paper introduces a novel deep neural network architecture called BlazeNeo, for the task of polyp segmentation and neoplasm detection with an emphasis on compactness and speed while maintaining high accuracy. The model leverages the highly efficient HarDNet backbone alongside lightweight Receptive Field Blocks for computational efficiency, and an auxiliary training mechanism to take full advantage of the training data for the segmentation quality. Our experiments on a challenging dataset show that BlazeNeo achieves improvements in latency and model size while maintaining comparable accuracy against state-of-the-art methods. When deploying on the Jetson AGX Xavier edge device in INT8 precision, our BlazeNeo achieves over 155 fps while yielding the best accuracy among all compared methods.
△ Less
Submitted 28 February, 2022;
originally announced March 2022.
-
Minimum Description Length Recurrent Neural Networks
Authors:
Nur Lan,
Michal Geyer,
Emmanuel Chemla,
Roni Katzir
Abstract:
We train neural networks to optimize a Minimum Description Length score, i.e., to balance between the complexity of the network and its accuracy at a task. We show that networks optimizing this objective function master tasks involving memory challenges and go beyond context-free languages. These learners master languages such as $a^nb^n$, $a^nb^nc^n$, $a^nb^{2n}$, $a^nb^mc^{n+m}$, and they perfor…
▽ More
We train neural networks to optimize a Minimum Description Length score, i.e., to balance between the complexity of the network and its accuracy at a task. We show that networks optimizing this objective function master tasks involving memory challenges and go beyond context-free languages. These learners master languages such as $a^nb^n$, $a^nb^nc^n$, $a^nb^{2n}$, $a^nb^mc^{n+m}$, and they perform addition. Moreover, they often do so with 100% accuracy. The networks are small, and their inner workings are transparent. We thus provide formal proofs that their perfect accuracy holds not only on a given test set, but for any input sequence. To our knowledge, no other connectionist model has been shown to capture the underlying grammars for these languages in full generality.
△ Less
Submitted 31 March, 2022; v1 submitted 31 October, 2021;
originally announced November 2021.
-
NeoUNet: Towards accurate colon polyp segmentation and neoplasm detection
Authors:
Phan Ngoc Lan,
Nguyen Sy An,
Dao Viet Hang,
Dao Van Long,
Tran Quang Trung,
Nguyen Thi Thuy,
Dinh Viet Sang
Abstract:
Automatic polyp segmentation has proven to be immensely helpful for endoscopy procedures, reducing the missing rate of adenoma detection for endoscopists while increasing efficiency. However, classifying a polyp as being neoplasm or not and segmenting it at the pixel level is still a challenging task for doctors to perform in a limited time. In this work, we propose a fine-grained formulation for…
▽ More
Automatic polyp segmentation has proven to be immensely helpful for endoscopy procedures, reducing the missing rate of adenoma detection for endoscopists while increasing efficiency. However, classifying a polyp as being neoplasm or not and segmenting it at the pixel level is still a challenging task for doctors to perform in a limited time. In this work, we propose a fine-grained formulation for the polyp segmentation problem. Our formulation aims to not only segment polyp regions, but also identify those at high risk of malignancy with high accuracy. In addition, we present a UNet-based neural network architecture called NeoUNet, along with a hybrid loss function to solve this problem. Experiments show highly competitive results for NeoUNet on our benchmark dataset compared to existing polyp segmentation models.
△ Less
Submitted 11 July, 2021;
originally announced July 2021.
-
AG-CUResNeSt: A Novel Method for Colon Polyp Segmentation
Authors:
Dinh Viet Sang,
Tran Quang Chung,
Phan Ngoc Lan,
Dao Viet Hang,
Dao Van Long,
Nguyen Thi Thuy
Abstract:
Colorectal cancer is among the most common malignancies and can develop from high-risk colon polyps. Colonoscopy is an effective screening tool to detect and remove polyps, especially in the case of precancerous lesions. However, the missing rate in clinical practice is relatively high due to many factors. The procedure could benefit greatly from using AI models for automatic polyp segmentation, w…
▽ More
Colorectal cancer is among the most common malignancies and can develop from high-risk colon polyps. Colonoscopy is an effective screening tool to detect and remove polyps, especially in the case of precancerous lesions. However, the missing rate in clinical practice is relatively high due to many factors. The procedure could benefit greatly from using AI models for automatic polyp segmentation, which provide valuable insights for improving colon polyp detection. However, precise segmentation is still challenging due to variations of polyps in size, shape, texture, and color. This paper proposes a novel neural network architecture called AG-CUResNeSt, which enhances Coupled UNets using the robust ResNeSt backbone and attention gates. The network is capable of effectively combining multi-level features to yield accurate polyp segmentation. Experimental results on five popular benchmark datasets show that our proposed method achieves state-of-the-art accuracy compared to existing methods.
△ Less
Submitted 1 March, 2022; v1 submitted 2 May, 2021;
originally announced May 2021.
-
On the Spontaneous Emergence of Discrete and Compositional Signals
Authors:
Nur Geffen Lan,
Emmanuel Chemla,
Shane Steinert-Threlkeld
Abstract:
We propose a general framework to study language emergence through signaling games with neural agents. Using a continuous latent space, we are able to (i) train using backpropagation, (ii) show that discrete messages nonetheless naturally emerge. We explore whether categorical perception effects follow and show that the messages are not compositional.
We propose a general framework to study language emergence through signaling games with neural agents. Using a continuous latent space, we are able to (i) train using backpropagation, (ii) show that discrete messages nonetheless naturally emerge. We explore whether categorical perception effects follow and show that the messages are not compositional.
△ Less
Submitted 30 April, 2020;
originally announced May 2020.