-
Transformers as Transducers
Authors:
Lena Strobl,
Dana Angluin,
David Chiang,
Jonathan Rawski,
Ashish Sabharwal
Abstract:
We study the sequence-to-sequence mapping capacity of transformers by relating them to finite transducers, and find that they can express surprisingly large classes of transductions. We do so using variants of RASP, a programming language designed to help people "think like transformers," as an intermediate representation. We extend the existing Boolean variant B-RASP to sequence-to-sequence functions and show that it computes exactly the first-order rational functions (such as string rotation). Then, we introduce two new extensions. B-RASP[pos] enables calculations on positions (such as copying the first half of a string) and contains all first-order regular functions. S-RASP adds prefix sum, which enables additional arithmetic operations (such as squaring a string) and contains all first-order polyregular functions. Finally, we show that masked average-hard attention transformers can simulate S-RASP.
Submitted 5 November, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Benchmarking Compositionality with Formal Languages
Authors:
Josef Valvoda,
Naomi Saphra,
Jonathan Rawski,
Adina Williams,
Ryan Cotterell
Abstract:
Recombining known primitive concepts into larger novel combinations is a quintessentially human cognitive capability. Whether large neural models in NLP can acquire this ability while learning from data is an open question. In this paper, we investigate this problem from the perspective of formal languages. We use deterministic finite-state transducers to make an unbounded number of datasets with controllable properties governing compositionality. By randomly sampling over many transducers, we explore which of their properties contribute to learnability of a compositional relation by a neural network. We find that the models either learn the relations completely or not at all. The key is transition coverage, setting a soft learnability limit at 400 examples per transition.
Submitted 1 August, 2023; v1 submitted 17 August, 2022;
originally announced August 2022.
-
Tensor Product Representations of Subregular Formal Languages
Authors:
Jonathan Rawski
Abstract:
This paper provides a geometric characterization of subclasses of the regular languages. We use finite model theory to characterize objects like strings and trees as relational structures. Logical statements meeting certain criteria over these models define subregular classes of languages. The semantics of such statements can be compiled into tensor structures, using multilinear maps as function application for evaluation. This method is applied to consider two properly subregular languages over different string models.
Submitted 21 August, 2019;
originally announced August 2019.
-
Learning with Partially Ordered Representations
Authors:
Jane Chandlee,
Remi Eyraud,
Jeffrey Heinz,
Adam Jardine,
Jonathan Rawski
Abstract:
This paper examines the characterization and learning of grammars defined with enriched representational models. Model-theoretic approaches to formal language theory traditionally assume that each position in a string belongs to exactly one unary relation. We consider unconventional string models where positions can have multiple, shared properties, which are arguably useful in many applications. We show the structures given by these models are partially ordered, and present a learning algorithm that exploits this ordering relation to effectively prune the hypothesis space. We prove this learning algorithm, which takes positive examples as input, finds the most general grammar which covers the data.
Submitted 23 June, 2019; v1 submitted 18 June, 2019;
originally announced June 2019.