
Showing 1–17 of 17 results for author: Chemla, E

  1. arXiv:2506.13450

    cs.CL cs.AI

    A Neural Model for Word Repetition

    Authors: Daniel Dager, Robin Sobczyk, Emmanuel Chemla, Yair Lakretz

    Abstract: It takes several years for the developing brain of a baby to fully master word repetition, the task of hearing a word and repeating it aloud. Repeating a new word, such as from a new language, can also be a challenging task for adults. Additionally, brain damage, such as from a stroke, may lead to systematic speech errors with specific characteristics dependent on the location of the brain damage.…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: To appear at Cognitive Computational Neuroscience 2025 (CCN)

  2. arXiv:2505.13398

    cs.LG cs.CL

    A Minimum Description Length Approach to Regularization in Neural Networks

    Authors: Matan Abudy, Orr Well, Emmanuel Chemla, Roni Katzir, Nur Lan

    Abstract: State-of-the-art neural networks can be trained to become remarkable solutions to many problems. But while these architectures can express symbolic, perfect solutions, trained models often arrive at approximations instead. We show that the choice of regularization method plays a crucial role: when trained on formal languages with standard regularization ($L_1$, $L_2$, or none), expressive architec…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: 9 pages

  3. arXiv:2505.02692

    cs.CL cs.SD eess.AS

    fastabx: A library for efficient computation of ABX discriminability

    Authors: Maxime Poli, Emmanuel Chemla, Emmanuel Dupoux

    Abstract: We introduce fastabx, a high-performance Python library for building ABX discrimination tasks. ABX is a measure of the separation between generic categories of interest. It has been used extensively to evaluate phonetic discriminability in self-supervised speech representations. However, its broader adoption has been limited by the absence of adequate tools. fastabx addresses this gap by providing…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 8 pages, 6 figures

  4. arXiv:2502.07687

    cs.CL

    Large Language Models as Proxies for Theories of Human Linguistic Cognition

    Authors: Imry Ziv, Nur Lan, Emmanuel Chemla, Roni Katzir

    Abstract: We consider the possible role of current large language models (LLMs) in the study of human linguistic cognition. We focus on the use of such models as proxies for theories of cognition that are relatively linguistically-neutral in their representations and learning but differ from current LLMs in key ways. We illustrate this potential use of LLMs as proxies for theories of cognition in the contex…

    Submitted 11 February, 2025; originally announced February 2025.

  5. arXiv:2412.10446

    cs.CV cs.AI

    Disentanglement and Compositionality of Letter Identity and Letter Position in Variational Auto-Encoder Vision Models

    Authors: Bruno Bianchi, Aakash Agrawal, Stanislas Dehaene, Emmanuel Chemla, Yair Lakretz

    Abstract: Human readers can accurately count how many letters are in a word (e.g., 7 in "buffalo"), remove a letter from a given position (e.g., "bufflo") or add a new one. The human brain of readers must have therefore learned to disentangle information related to the position of a letter and its identity. Such disentanglement is necessary for the compositional, unbounded, ability of humans to create a…

    Submitted 11 December, 2024; originally announced December 2024.

  6. arXiv:2412.05571

    cs.CL

    A polar coordinate system represents syntax in large language models

    Authors: Pablo Diego-Simón, Stéphane D'Ascoli, Emmanuel Chemla, Yair Lakretz, Jean-Rémi King

    Abstract: Originally formalized with symbolic representations, syntactic trees may also be effectively represented in the activations of large language models (LLMs). Indeed, a 'Structural Probe' can find a subspace of neural activations, where syntactically related words are relatively close to one another. However, this syntactic code remains incomplete: the distance between the Structural Probe word embe…

    Submitted 7 December, 2024; originally announced December 2024.

    Journal ref: NeurIPS 2024

  7. arXiv:2410.00025

    cs.CL cs.SD eess.AS

    Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach

    Authors: Maxime Poli, Emmanuel Chemla, Emmanuel Dupoux

    Abstract: Recent progress in Spoken Language Modeling has shown that learning language directly from speech is feasible. Generating speech through a pipeline that operates at the text level typically loses nuances, intonations, and non-verbal vocalizations. Modeling directly from speech opens up the path to more natural and expressive systems. On the other hand, speech-only systems require up to three order…

    Submitted 30 October, 2024; v1 submitted 16 September, 2024; originally announced October 2024.

    Comments: Accepted at EMNLP 2024 main conference. 9 pages, 4 figures

  8. arXiv:2408.09544

    cs.CL

    No Such Thing as a General Learner: Language models and their dual optimization

    Authors: Emmanuel Chemla, Ryan M. Nefdt

    Abstract: What role can the otherwise successful Large Language Models (LLMs) play in the understanding of human cognition, and in particular in terms of informing language acquisition debates? To contribute to this question, we first argue that neither humans nor LLMs are general learners, in a variety of senses. We make a novel case for how in particular LLMs follow a dual-optimization process: they are o…

    Submitted 21 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: 11 pages, 4 figures

  9. arXiv:2406.12620

    cs.CL

    What Makes Two Language Models Think Alike?

    Authors: Jeanne Salle, Louis Jalouzot, Nur Lan, Emmanuel Chemla, Yair Lakretz

    Abstract: Do architectural differences significantly affect the way models represent and process language? We propose a new approach, based on metric-learning encoding models (MLEMs), as a first step to answer this question. The approach provides a feature-based comparison of how any two layers of any two models represent linguistic information. We apply the method to BERT, GPT-2 and Mamba. Unlike previous…

    Submitted 24 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: 7 pages, 6 figures

  10. arXiv:2403.18031

    cs.CL

    The Impact of Syntactic and Semantic Proximity on Machine Translation with Back-Translation

    Authors: Nicolas Guerin, Shane Steinert-Threlkeld, Emmanuel Chemla

    Abstract: Unsupervised on-the-fly back-translation, in conjunction with multilingual pretraining, is the dominant method for unsupervised neural machine translation. Theoretically, however, the method should not work in general. We therefore conduct controlled experiments with artificial languages to determine what properties of languages make back-translation an effective training method, covering lexical,…

    Submitted 26 March, 2024; originally announced March 2024.

  11. arXiv:2402.11608

    cs.CL

    Metric-Learning Encoding Models Identify Processing Profiles of Linguistic Features in BERT's Representations

    Authors: Louis Jalouzot, Robin Sobczyk, Bastien Lhopitallier, Jeanne Salle, Nur Lan, Emmanuel Chemla, Yair Lakretz

    Abstract: We introduce Metric-Learning Encoding Models (MLEMs) as a new approach to understand how neural systems represent the theoretical features of the objects they process. As a proof-of-concept, we apply MLEMs to neural representations extracted from BERT, and track a wide variety of linguistic features (e.g., tense, subject person, clause type, clause embedding). We find that: (1) linguistic features…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: 17 pages, 13 figures

  12. arXiv:2402.10013

    cs.CL cs.FL

    Bridging the Empirical-Theoretical Gap in Neural Network Formal Language Learning Using Minimum Description Length

    Authors: Nur Lan, Emmanuel Chemla, Roni Katzir

    Abstract: Neural networks offer good approximation to many tasks but consistently fail to reach perfect generalization, even when theoretical work shows that such perfect solutions can be expressed by certain architectures. Using the task of formal language learning, we focus on one simple formal language and show that the theoretically correct solution is in fact not an optimum of commonly used objectives…

    Submitted 6 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: 9 pages, 5 figures, 3 appendix pages

  13. arXiv:2311.06518

    cs.LG cs.CL

    Minimum Description Length Hopfield Networks

    Authors: Matan Abudy, Nur Lan, Emmanuel Chemla, Roni Katzir

    Abstract: Associative memory architectures are designed for memorization but also offer, through their retrieval method, a form of generalization to unseen inputs: stored memories can be seen as prototypes from this point of view. Focusing on Modern Hopfield Networks (MHN), we show that a large memorization capacity undermines the generalization opportunity. We offer a solution to better optimize this trade…

    Submitted 11 November, 2023; originally announced November 2023.

    Comments: 4 pages, Associative Memory & Hopfield Networks Workshop at NeurIPS 2023

  14. arXiv:2308.08253

    cs.CL

    Benchmarking Neural Network Generalization for Grammar Induction

    Authors: Nur Lan, Emmanuel Chemla, Roni Katzir

    Abstract: How well do neural networks generalize? Even for grammar induction tasks, where the target generalization is fully known, previous works have left the question open, testing very limited ranges beyond the training set and using different success criteria. We provide a measure of neural network generalization based on fully specified formal languages. Given a model and a formal grammar, the method…

    Submitted 25 August, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: 10 pages, 4 figures, 2 tables. Conference: Learning with Small Data 2023

  15. arXiv:2111.00600

    cs.CL

    Minimum Description Length Recurrent Neural Networks

    Authors: Nur Lan, Michal Geyer, Emmanuel Chemla, Roni Katzir

    Abstract: We train neural networks to optimize a Minimum Description Length score, i.e., to balance between the complexity of the network and its accuracy at a task. We show that networks optimizing this objective function master tasks involving memory challenges and go beyond context-free languages. These learners master languages such as $a^nb^n$, $a^nb^nc^n$, $a^nb^{2n}$, $a^nb^mc^{n+m}$, and they perfor…

    Submitted 31 March, 2022; v1 submitted 31 October, 2021; originally announced November 2021.

    Comments: 15 pages

  16. arXiv:2005.00110

    cs.CL cs.AI cs.MA

    On the Spontaneous Emergence of Discrete and Compositional Signals

    Authors: Nur Geffen Lan, Emmanuel Chemla, Shane Steinert-Threlkeld

    Abstract: We propose a general framework to study language emergence through signaling games with neural agents. Using a continuous latent space, we are able to (i) train using backpropagation, (ii) show that discrete messages nonetheless naturally emerge. We explore whether categorical perception effects follow and show that the messages are not compositional.

    Submitted 30 April, 2020; originally announced May 2020.

    Comments: ACL 2020

  17. arXiv:1707.08017

    math.LO cs.LO

    Suszko's Problem: Mixed Consequence and Compositionality

    Authors: Emmanuel Chemla, Paul Egré

    Abstract: Suszko's problem is the problem of finding the minimal number of truth values needed to semantically characterize a syntactic consequence relation. Suszko proved that every Tarskian consequence relation can be characterized using only two truth values. Malinowski showed that this number can equal three if some of Tarski's structural constraints are relaxed. By so doing, Malinowski introduced a cas…

    Submitted 9 February, 2019; v1 submitted 25 July, 2017; originally announced July 2017.

    Comments: Keywords: Suszko's thesis; truth value; logical consequence; mixed consequence; compositionality; truth-functionality; many-valued logic; algebraic logic; substructural logics; regular connectives

    MSC Class: 03B47; 03B50; 03G27