-
My Life in Artificial Intelligence: People, anecdotes, and some lessons learnt
Authors:
Kees van Deemter
Abstract:
In this very personal workography, I relate my 40-year experiences as a researcher and educator in and around Artificial Intelligence (AI), more specifically Natural Language Processing. I describe how curiosity, and the circumstances of the day, led me to work in both industry and academia, and in various countries, including The Netherlands (Amsterdam, Eindhoven, and Utrecht), the USA (Stanford)…
▽ More
In this very personal workography, I relate my 40-year experiences as a researcher and educator in and around Artificial Intelligence (AI), more specifically Natural Language Processing. I describe how curiosity, and the circumstances of the day, led me to work in both industry and academia, and in various countries, including The Netherlands (Amsterdam, Eindhoven, and Utrecht), the USA (Stanford), England (Brighton), Scotland (Aberdeen), and China (Beijing and Harbin). People and anecdotes play a large role in my story; the history of AI forms its backdrop. I focus on things that might be of interest to (even) younger colleagues, given the choices they face in their own work and life at a time when AI is finally emerging from the shadows.
△ Less
Submitted 5 April, 2025;
originally announced April 2025.
-
Reference-free Evaluation Metrics for Text Generation: A Survey
Authors:
Takumi Ito,
Kees van Deemter,
Jun Suzuki
Abstract:
A number of automatic evaluation metrics have been proposed for natural language generation systems. The most common approach to automatic evaluation is the use of a reference-based metric that compares the model's output with gold-standard references written by humans. However, it is expensive to create such references, and for some tasks, such as response generation in dialogue, creating referen…
▽ More
A number of automatic evaluation metrics have been proposed for natural language generation systems. The most common approach to automatic evaluation is the use of a reference-based metric that compares the model's output with gold-standard references written by humans. However, it is expensive to create such references, and for some tasks, such as response generation in dialogue, creating references is not a simple matter. Therefore, various reference-free metrics have been developed in recent years. In this survey, which intends to cover the full breadth of all NLG tasks, we investigate the most commonly used approaches, their application, and their other uses beyond evaluating models. The survey concludes by highlighting some promising directions for future research.
△ Less
Submitted 21 January, 2025;
originally announced January 2025.
-
Computational Modelling of Plurality and Definiteness in Chinese Noun Phrases
Authors:
Yuqi Liu,
Guanyi Chen,
Kees van Deemter
Abstract:
Theoretical linguists have suggested that some languages (e.g., Chinese and Japanese) are "cooler" than other languages based on the observation that the intended meaning of phrases in these languages depends more on their contexts. As a result, many expressions in these languages are shortened, and their meaning is inferred from the context. In this paper, we focus on the omission of the pluralit…
▽ More
Theoretical linguists have suggested that some languages (e.g., Chinese and Japanese) are "cooler" than other languages based on the observation that the intended meaning of phrases in these languages depends more on their contexts. As a result, many expressions in these languages are shortened, and their meaning is inferred from the context. In this paper, we focus on the omission of the plurality and definiteness markers in Chinese noun phrases (NPs) to investigate the predictability of their intended meaning given the contexts. To this end, we built a corpus of Chinese NPs, each of which is accompanied by its corresponding context, and by labels indicating its singularity/plurality and definiteness/indefiniteness. We carried out corpus assessments and analyses. The results suggest that Chinese speakers indeed drop plurality and definiteness markers very frequently. Building on the corpus, we train a bank of computational models using both classic machine learning models and state-of-the-art pre-trained language models to predict the plurality and definiteness of each NP. We report on the performance of these models and analyse their behaviours.
△ Less
Submitted 7 March, 2024;
originally announced March 2024.
-
Intrinsic Task-based Evaluation for Referring Expression Generation
Authors:
Guanyi Chen,
Fahime Same,
Kees van Deemter
Abstract:
Recently, a human evaluation study of Referring Expression Generation (REG) models had an unexpected conclusion: on \textsc{webnlg}, Referring Expressions (REs) generated by the state-of-the-art neural models were not only indistinguishable from the REs in \textsc{webnlg} but also from the REs generated by a simple rule-based system. Here, we argue that this limitation could stem from the use of a…
▽ More
Recently, a human evaluation study of Referring Expression Generation (REG) models had an unexpected conclusion: on \textsc{webnlg}, Referring Expressions (REs) generated by the state-of-the-art neural models were not only indistinguishable from the REs in \textsc{webnlg} but also from the REs generated by a simple rule-based system. Here, we argue that this limitation could stem from the use of a purely ratings-based human evaluation (which is a common practice in Natural Language Generation). To investigate these issues, we propose an intrinsic task-based evaluation for REG models, in which, in addition to rating the quality of REs, participants were asked to accomplish two meta-level tasks. One of these tasks concerns the referential success of each RE; the other task asks participants to suggest a better alternative for each RE. The outcomes suggest that, in comparison to previous evaluations, the new evaluation protocol assesses the performance of each REG model more comprehensively and makes the participants' ratings more reliable and discriminable.
△ Less
Submitted 12 February, 2024;
originally announced February 2024.
-
Textual Summarisation of Large Sets: Towards a General Approach
Authors:
Kittipitch Kuptavanich,
Ehud Reiter,
Kees Van Deemter,
Advaith Siddharthan
Abstract:
We are developing techniques to generate summary descriptions of sets of objects. In this paper, we present and evaluate a rule-based NLG technique for summarising sets of bibliographical references in academic papers. This extends our previous work on summarising sets of consumer products and shows how our model generalises across these two very different domains.
We are developing techniques to generate summary descriptions of sets of objects. In this paper, we present and evaluate a rule-based NLG technique for summarising sets of bibliographical references in academic papers. This extends our previous work on summarising sets of consumer products and shows how our model generalises across these two very different domains.
△ Less
Submitted 17 January, 2024;
originally announced January 2024.
-
The Pitfalls of Defining Hallucination
Authors:
Kees van Deemter
Abstract:
Despite impressive advances in Natural Language Generation (NLG) and Large Language Models (LLMs), researchers are still unclear about important aspects of NLG evaluation. To substantiate this claim, I examine current classifications of hallucination and omission in Data-text NLG, and I propose a logic-based synthesis of these classfications. I conclude by highlighting some remaining limitations o…
▽ More
Despite impressive advances in Natural Language Generation (NLG) and Large Language Models (LLMs), researchers are still unclear about important aspects of NLG evaluation. To substantiate this claim, I examine current classifications of hallucination and omission in Data-text NLG, and I propose a logic-based synthesis of these classfications. I conclude by highlighting some remaining limitations of all current thinking about hallucination and by discussing implications for LLMs.
△ Less
Submitted 15 January, 2024;
originally announced January 2024.
-
Models of reference production: How do they withstand the test of time?
Authors:
Fahime Same,
Guanyi Chen,
Kees van Deemter
Abstract:
In recent years, many NLP studies have focused solely on performance improvement. In this work, we focus on the linguistic and scientific aspects of NLP. We use the task of generating referring expressions in context (REG-in-context) as a case study and start our analysis from GREC, a comprehensive set of shared tasks in English that addressed this topic over a decade ago. We ask what the performa…
▽ More
In recent years, many NLP studies have focused solely on performance improvement. In this work, we focus on the linguistic and scientific aspects of NLP. We use the task of generating referring expressions in context (REG-in-context) as a case study and start our analysis from GREC, a comprehensive set of shared tasks in English that addressed this topic over a decade ago. We ask what the performance of models would be if we assessed them (1) on more realistic datasets, and (2) using more advanced methods. We test the models using different evaluation metrics and feature selection experiments. We conclude that GREC can no longer be regarded as offering a reliable assessment of models' ability to mimic human reference production, because the results are highly impacted by the choice of corpus and evaluation metrics. Our results also suggest that pre-trained language models are less dependent on the choice of corpus than classic Machine Learning models, and therefore make more robust class predictions.
△ Less
Submitted 27 July, 2023;
originally announced July 2023.
-
Does ChatGPT have Theory of Mind?
Authors:
Bart Holterman,
Kees van Deemter
Abstract:
Theory of Mind (ToM) is the ability to understand human thinking and decision-making, an ability that plays a crucial role in social interaction between people, including linguistic communication. This paper investigates to what extent recent Large Language Models in the ChatGPT tradition possess ToM. We posed six well-known problems that address biases in human reasoning and decision making to tw…
▽ More
Theory of Mind (ToM) is the ability to understand human thinking and decision-making, an ability that plays a crucial role in social interaction between people, including linguistic communication. This paper investigates to what extent recent Large Language Models in the ChatGPT tradition possess ToM. We posed six well-known problems that address biases in human reasoning and decision making to two versions of ChatGPT and we compared the results under a range of prompting strategies. While the results concerning ChatGPT-3 were somewhat inconclusive, ChatGPT-4 was shown to arrive at the correct answers more often than would be expected based on chance, although correct answers were often arrived at on the basis of false assumptions or invalid reasoning.
△ Less
Submitted 13 September, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
Authors:
Anya Belz,
Craig Thomson,
Ehud Reiter,
Gavin Abercrombie,
Jose M. Alonso-Moral,
Mohammad Arvan,
Anouck Braggaar,
Mark Cieliebak,
Elizabeth Clark,
Kees van Deemter,
Tanvi Dinkar,
Ondřej Dušek,
Steffen Eger,
Qixiang Fang,
Mingqi Gao,
Albert Gatt,
Dimitra Gkatzia,
Javier González-Corbelle,
Dirk Hovy,
Manuela Hürlimann,
Takumi Ito,
John D. Kelleher,
Filip Klubicka,
Emiel Krahmer,
Huiyuan Lai
, et al. (17 additional authors not shown)
Abstract:
We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13\% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, a…
▽ More
We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13\% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, and that all but one of the experiments we selected for reproduction was discovered to have flaws that made the meaningfulness of conducting a reproduction questionable. As a result, we had to change our coordinated study design from a reproduce approach to a standardise-then-reproduce-twice approach. Our overall (negative) finding that the great majority of human evaluations in NLP is not repeatable and/or not reproducible and/or too flawed to justify reproduction, paints a dire picture, but presents an opportunity for a rethink about how to design and report human evaluations in NLP.
△ Less
Submitted 7 August, 2023; v1 submitted 2 May, 2023;
originally announced May 2023.
-
Interpreting Vision and Language Generative Models with Semantic Visual Priors
Authors:
Michele Cafagna,
Lina M. Rojas-Barahona,
Kees van Deemter,
Albert Gatt
Abstract:
When applied to Image-to-text models, interpretability methods often provide token-by-token explanations namely, they compute a visual explanation for each token of the generated sequence. Those explanations are expensive to compute and unable to comprehensively explain the model's output. Therefore, these models often require some sort of approximation that eventually leads to misleading explanat…
▽ More
When applied to Image-to-text models, interpretability methods often provide token-by-token explanations namely, they compute a visual explanation for each token of the generated sequence. Those explanations are expensive to compute and unable to comprehensively explain the model's output. Therefore, these models often require some sort of approximation that eventually leads to misleading explanations. We develop a framework based on SHAP, that allows for generating comprehensive, meaningful explanations leveraging the meaning representation of the output sequence as a whole. Moreover, by exploiting semantic priors in the visual backbone, we extract an arbitrary number of features that allows the efficient computation of Shapley values on large-scale models, generating at the same time highly meaningful visual explanations. We demonstrate that our method generates semantically more expressive explanations than traditional methods at a lower compute cost and that it can be generalized over other explainability methods.
△ Less
Submitted 4 May, 2023; v1 submitted 28 April, 2023;
originally announced April 2023.
-
HL Dataset: Visually-grounded Description of Scenes, Actions and Rationales
Authors:
Michele Cafagna,
Kees van Deemter,
Albert Gatt
Abstract:
Current captioning datasets focus on object-centric captions, describing the visible objects in the image, e.g. "people eating food in a park". Although these datasets are useful to evaluate the ability of Vision & Language models to recognize and describe visual content, they do not support controlled experiments involving model testing or fine-tuning, with more high-level captions, which humans…
▽ More
Current captioning datasets focus on object-centric captions, describing the visible objects in the image, e.g. "people eating food in a park". Although these datasets are useful to evaluate the ability of Vision & Language models to recognize and describe visual content, they do not support controlled experiments involving model testing or fine-tuning, with more high-level captions, which humans find easy and natural to produce. For example, people often describe images based on the type of scene they depict ('people at a holiday resort') and the actions they perform ('people having a picnic'). Such descriptions draw on personal experience and commonsense assumptions. We present the High-Level Dataset a dataset extending 14997 images from the COCO dataset, aligned with a new set of 134,973 human-annotated (high-level) captions collected along three axes: scenes, actions, and rationales. We further extend this dataset with confidence scores collected from an independent set of readers, as well as a set of narrative captions generated synthetically, by combining each of the three axes. We describe this dataset and analyse it extensively. We also present baseline results for the High-Level Captioning task.
△ Less
Submitted 25 September, 2023; v1 submitted 23 February, 2023;
originally announced February 2023.
-
Understanding Cross-modal Interactions in V&L Models that Generate Scene Descriptions
Authors:
Michele Cafagna,
Kees van Deemter,
Albert Gatt
Abstract:
Image captioning models tend to describe images in an object-centric way, emphasising visible objects. But image descriptions can also abstract away from objects and describe the type of scene depicted. In this paper, we explore the potential of a state-of-the-art Vision and Language model, VinVL, to caption images at the scene level using (1) a novel dataset which pairs images with both object-ce…
▽ More
Image captioning models tend to describe images in an object-centric way, emphasising visible objects. But image descriptions can also abstract away from objects and describe the type of scene depicted. In this paper, we explore the potential of a state-of-the-art Vision and Language model, VinVL, to caption images at the scene level using (1) a novel dataset which pairs images with both object-centric and scene descriptions. Through (2) an in-depth analysis of the effect of the fine-tuning, we show (3) that a small amount of curated data suffices to generate scene descriptions without losing the capability to identify object-level concepts in the scene; the model acquires a more holistic view of the image compared to when object-centric descriptions are generated. We discuss the parallels between these results and insights from computational and cognitive science research on scene perception.
△ Less
Submitted 10 November, 2022; v1 submitted 9 November, 2022;
originally announced November 2022.
-
Assessing Neural Referential Form Selectors on a Realistic Multilingual Dataset
Authors:
Guanyi Chen,
Fahime Same,
Kees van Deemter
Abstract:
Previous work on Neural Referring Expression Generation (REG) all uses WebNLG, an English dataset that has been shown to reflect a very limited range of referring expression (RE) use. To tackle this issue, we build a dataset based on the OntoNotes corpus that contains a broader range of RE use in both English and Chinese (a language that uses zero pronouns). We build neural Referential Form Select…
▽ More
Previous work on Neural Referring Expression Generation (REG) all uses WebNLG, an English dataset that has been shown to reflect a very limited range of referring expression (RE) use. To tackle this issue, we build a dataset based on the OntoNotes corpus that contains a broader range of RE use in both English and Chinese (a language that uses zero pronouns). We build neural Referential Form Selection (RFS) models accordingly, assess them on the dataset and conduct probing experiments. The experiments suggest that, compared to WebNLG, OntoNotes is better for assessing REG/RFS models. We compare English and Chinese RFS and confirm that, in line with linguistic theories, Chinese RFS depends more on discourse context than English.
△ Less
Submitted 11 October, 2022; v1 submitted 10 October, 2022;
originally announced October 2022.
-
Understanding the Use of Quantifiers in Mandarin
Authors:
Guanyi Chen,
Kees van Deemter
Abstract:
We introduce a corpus of short texts in Mandarin, in which quantified expressions figure prominently. We illustrate the significance of the corpus by examining the hypothesis (known as Huang's "coolness" hypothesis) that speakers of East Asian Languages tend to speak more briefly but less informatively than, for example, speakers of West-European languages. The corpus results from an elicitation e…
▽ More
We introduce a corpus of short texts in Mandarin, in which quantified expressions figure prominently. We illustrate the significance of the corpus by examining the hypothesis (known as Huang's "coolness" hypothesis) that speakers of East Asian Languages tend to speak more briefly but less informatively than, for example, speakers of West-European languages. The corpus results from an elicitation experiment in which participants were asked to describe abstract visual scenes. We compare the resulting corpus, called MQTUNA, with an English corpus that was collected using the same experimental paradigm. The comparison reveals that some, though not all, aspects of quantifier use support the above-mentioned hypothesis. Implications of these findings for the generation of quantified noun phrases are discussed.
△ Less
Submitted 24 September, 2022;
originally announced September 2022.
-
The Role of Explanatory Value in Natural Language Processing
Authors:
Kees van Deemter
Abstract:
A key aim of science is explanation, yet the idea of explaining language phenomena has taken a backseat in mainstream Natural Language Processing (NLP) and many other areas of Artificial Intelligence. I argue that explanation of linguistic behaviour should be a main goal of NLP, and that this is not the same as making NLP models explainable. To illustrate these ideas, some recent models of human l…
▽ More
A key aim of science is explanation, yet the idea of explaining language phenomena has taken a backseat in mainstream Natural Language Processing (NLP) and many other areas of Artificial Intelligence. I argue that explanation of linguistic behaviour should be a main goal of NLP, and that this is not the same as making NLP models explainable. To illustrate these ideas, some recent models of human language production are compared with each other. I conclude by asking what it would mean for NLP research and institutional policies if our community took explanatory value seriously, while heeding some possible pitfalls.
△ Less
Submitted 13 September, 2022;
originally announced September 2022.
-
Semeval-2022 Task 1: CODWOE -- Comparing Dictionaries and Word Embeddings
Authors:
Timothee Mickus,
Kees van Deemter,
Mathieu Constant,
Denis Paperno
Abstract:
Word embeddings have advanced the state of the art in NLP across numerous tasks. Understanding the contents of dense neural representations is of utmost interest to the computational semantics community. We propose to focus on relating these opaque word vectors with human-readable definitions, as found in dictionaries. This problem naturally divides into two subtasks: converting definitions into e…
▽ More
Word embeddings have advanced the state of the art in NLP across numerous tasks. Understanding the contents of dense neural representations is of utmost interest to the computational semantics community. We propose to focus on relating these opaque word vectors with human-readable definitions, as found in dictionaries. This problem naturally divides into two subtasks: converting definitions into embeddings, and converting embeddings into definitions. This task was conducted in a multilingual setting, using comparable sets of embeddings trained homogeneously.
△ Less
Submitted 27 May, 2022;
originally announced May 2022.
-
Evaluating Automatic Difficulty Estimation of Logic Formalization Exercises
Authors:
Alexandra Mayn,
Kees van Deemter
Abstract:
Teaching logic effectively requires an understanding of the factors which cause logic students to struggle. Formalization exercises, which require the student to produce a formula corresponding to the natural language sentence, are a good candidate for scrutiny since they tap into the students' understanding of various aspects of logic. We correlate the difficulty of formalization exercises predic…
▽ More
Teaching logic effectively requires an understanding of the factors which cause logic students to struggle. Formalization exercises, which require the student to produce a formula corresponding to the natural language sentence, are a good candidate for scrutiny since they tap into the students' understanding of various aspects of logic. We correlate the difficulty of formalization exercises predicted by a previously proposed difficulty estimation algorithm with two empirical difficulty measures on the Grade Grinder corpus, which contains student solutions to FOL exercises. We obtain a moderate correlation with both measures, suggesting that the said algorithm indeed taps into important sources of difficulty but leaves a fair amount of variance uncaptured. We conduct an error analysis, closely examining exercises which were misclassified, with the aim of identifying additional sources of difficulty. We identify three additional factors which emerge from the difficulty analysis, namely predicate complexity, pragmatic factors and typicality of the exercises, and discuss the implications of automated difficulty estimation for logic teaching and explainable AI.
△ Less
Submitted 26 April, 2022;
originally announced April 2022.
-
Non-neural Models Matter: A Re-evaluation of Neural Referring Expression Generation Systems
Authors:
Fahime Same,
Guanyi Chen,
Kees van Deemter
Abstract:
In recent years, neural models have often outperformed rule-based and classic Machine Learning approaches in NLG. These classic approaches are now often disregarded, for example when new neural models are evaluated. We argue that they should not be overlooked, since, for some tasks, well-designed non-neural approaches achieve better performance than neural ones. In this paper, the task of generati…
▽ More
In recent years, neural models have often outperformed rule-based and classic Machine Learning approaches in NLG. These classic approaches are now often disregarded, for example when new neural models are evaluated. We argue that they should not be overlooked, since, for some tasks, well-designed non-neural approaches achieve better performance than neural ones. In this paper, the task of generating referring expressions in linguistic context is used as an example. We examined two very different English datasets (WEBNLG and WSJ), and evaluated each algorithm using both automatic and human evaluations. Overall, the results of these evaluations suggest that rule-based systems with simple rule sets achieve on-par or better performance on both datasets compared to state-of-the-art neural REG systems. In the case of the more realistic dataset, WSJ, a machine learning-based system with well-designed linguistic features performed best. We hope that our work can encourage researchers to consider non-neural models in future.
△ Less
Submitted 15 March, 2022;
originally announced March 2022.
-
What Vision-Language Models `See' when they See Scenes
Authors:
Michele Cafagna,
Kees van Deemter,
Albert Gatt
Abstract:
Images can be described in terms of the objects they contain, or in terms of the types of scene or place that they instantiate. In this paper we address to what extent pretrained Vision and Language models can learn to align descriptions of both types with images. We compare 3 state-of-the-art models, VisualBERT, LXMERT and CLIP. We find that (i) V&L models are susceptible to stylistic biases acqu…
▽ More
Images can be described in terms of the objects they contain, or in terms of the types of scene or place that they instantiate. In this paper we address to what extent pretrained Vision and Language models can learn to align descriptions of both types with images. We compare 3 state-of-the-art models, VisualBERT, LXMERT and CLIP. We find that (i) V&L models are susceptible to stylistic biases acquired during pretraining; (ii) only CLIP performs consistently well on both object- and scene-level descriptions. A follow-up ablation study shows that CLIP uses object-level information in the visual modality to align with scene-level textual descriptions.
△ Less
Submitted 15 September, 2021;
originally announced September 2021.
-
What can Neural Referential Form Selectors Learn?
Authors:
Guanyi Chen,
Fahime Same,
Kees van Deemter
Abstract:
Despite achieving encouraging results, neural Referring Expression Generation models are often thought to lack transparency. We probed neural Referential Form Selection (RFS) models to find out to what extent the linguistic features influencing the RE form are learnt and captured by state-of-the-art RFS models. The results of 8 probing tasks show that all the defined features were learnt to some e…
▽ More
Despite achieving encouraging results, neural Referring Expression Generation models are often thought to lack transparency. We probed neural Referential Form Selection (RFS) models to find out to what extent the linguistic features influencing the RE form are learnt and captured by state-of-the-art RFS models. The results of 8 probing tasks show that all the defined features were learnt to some extent. The probing tasks pertaining to referential status and syntactic position exhibited the highest performance. The lowest performance was achieved by the probing models designed to predict discourse structure properties beyond the sentence level.
△ Less
Submitted 15 August, 2021;
originally announced August 2021.
-
Lessons from Computational Modelling of Reference Production in Mandarin and English
Authors:
Guanyi Chen,
Kees van Deemter
Abstract:
Referring expression generation (REG) algorithms offer computational models of the production of referring expressions. In earlier work, a corpus of referring expressions (REs) in Mandarin was introduced. In the present paper, we annotate this corpus, evaluate classic REG algorithms on it, and compare the results with earlier results on the evaluation of REG for English referring expressions. Next…
▽ More
Referring expression generation (REG) algorithms offer computational models of the production of referring expressions. In earlier work, a corpus of referring expressions (REs) in Mandarin was introduced. In the present paper, we annotate this corpus, evaluate classic REG algorithms on it, and compare the results with earlier results on the evaluation of REG for English referring expressions. Next, we offer an in-depth analysis of the corpus, focusing on issues that arise from the grammar of Mandarin. We discuss shortcomings of previous REG evaluations that came to light during our investigation and we highlight some surprising results. Perhaps most strikingly, we found a much higher proportion of under-specified expressions than previous studies had suggested, not just in Mandarin but in English as well.
△ Less
Submitted 15 August, 2021; v1 submitted 14 November, 2020;
originally announced November 2020.
-
A Text Reassembling Approach to Natural Language Generation
Authors:
Xiao Li,
Kees van Deemter,
Chenghua Lin
Abstract:
Recent years have seen a number of proposals for performing Natural Language Generation (NLG) based in large part on statistical techniques. Despite having many attractive features, we argue that these existing approaches nonetheless have some important drawbacks, sometimes because the approach in question is not fully statistical (i.e., relies on a certain amount of handcrafting), sometimes becau…
▽ More
Recent years have seen a number of proposals for performing Natural Language Generation (NLG) based in large part on statistical techniques. Despite having many attractive features, we argue that these existing approaches nonetheless have some important drawbacks, sometimes because the approach in question is not fully statistical (i.e., relies on a certain amount of handcrafting), sometimes because the approach in question lacks transparency. Focussing on some of the key NLG tasks (namely Content Selection, Lexical Choice, and Linguistic Realisation), we propose a novel approach, called the Text Reassembling approach to NLG (TRG), which approaches the ideal of a purely statistical approach very closely, and which is at the same time highly transparent. We evaluate the TRG approach and discuss how TRG may be extended to deal with other NLG tasks, such as Document Structuring, and Aggregation. We discuss the strengths and limitations of TRG, concluding that the method may hold particular promise for domain experts who want to build an NLG system despite having little expertise in linguistics and NLG.
△ Less
Submitted 25 December, 2020; v1 submitted 16 May, 2020;
originally announced May 2020.
-
What do you mean, BERT? Assessing BERT as a Distributional Semantics Model
Authors:
Timothee Mickus,
Denis Paperno,
Mathieu Constant,
Kees van Deemter
Abstract:
Contextualized word embeddings, i.e. vector representations for words in context, are naturally seen as an extension of previous noncontextual distributional semantic models. In this work, we focus on BERT, a deep neural network that produces contextualized embeddings and has set the state-of-the-art in several semantic tasks, and study the semantic coherence of its embedding space. While showing…
▽ More
Contextualized word embeddings, i.e. vector representations for words in context, are naturally seen as an extension of previous noncontextual distributional semantic models. In this work, we focus on BERT, a deep neural network that produces contextualized embeddings and has set the state-of-the-art in several semantic tasks, and study the semantic coherence of its embedding space. While showing a tendency towards coherence, BERT does not fully live up to the natural expectations for a semantic vector space. In particular, we find that the position of the sentence in which a word occurs, while having no meaning correlates, leaves a noticeable trace on the word embeddings and disturbs similarity relationships.
△ Less
Submitted 8 May, 2020; v1 submitted 13 November, 2019;
originally announced November 2019.
-
Meteorologists and Students: A resource for language grounding of geographical descriptors
Authors:
Alejandro Ramos-Soto,
Ehud Reiter,
Kees van Deemter,
Jose M. Alonso,
Albert Gatt
Abstract:
We present a data resource which can be useful for research purposes on language grounding tasks in the context of geographical referring expression generation. The resource is composed of two data sets that encompass 25 different geographical descriptors and a set of associated graphical representations, drawn as polygons on a map by two groups of human subjects: teenage students and expert meteo…
▽ More
We present a data resource which can be useful for research purposes on language grounding tasks in the context of geographical referring expression generation. The resource is composed of two data sets that encompass 25 different geographical descriptors and a set of associated graphical representations, drawn as polygons on a map by two groups of human subjects: teenage students and expert meteorologists.
△ Less
Submitted 7 September, 2018;
originally announced September 2018.
-
An Empirical Approach for Modeling Fuzzy Geographical Descriptors
Authors:
Alejandro Ramos-Soto,
Jose M. Alonso,
Ehud Reiter,
Kees van Deemter,
Albert Gatt
Abstract:
We present a novel heuristic approach that defines fuzzy geographical descriptors using data gathered from a survey with human subjects. The participants were asked to provide graphical interpretations of the descriptors `north' and `south' for the Galician region (Spain). Based on these interpretations, our approach builds fuzzy descriptors that are able to compute membership degrees for geograph…
▽ More
We present a novel heuristic approach that defines fuzzy geographical descriptors using data gathered from a survey with human subjects. The participants were asked to provide graphical interpretations of the descriptors `north' and `south' for the Galician region (Spain). Based on these interpretations, our approach builds fuzzy descriptors that are able to compute membership degrees for geographical locations. We evaluated our approach in terms of efficiency and precision. The fuzzy descriptors are meant to be used as the cornerstones of a geographical referring expression generation algorithm that is able to linguistically characterize geographical locations and regions. This work is also part of a general research effort that intends to establish a methodology which reunites the empirical studies traditionally practiced in data-to-text and the use of fuzzy sets to model imprecision and vagueness in words and expressions for text generation purposes.
△ Less
Submitted 30 March, 2017;
originally announced March 2017.
-
Dialogue as Discourse: Controlling Global Properties of Scripted Dialogue
Authors:
Paul Piwek,
Kees van Deemter
Abstract:
This paper explains why scripted dialogue shares some crucial properties with discourse. In particular, when scripted dialogues are generated by a Natural Language Generation system, the generator can apply revision strategies that cannot normally be used when the dialogue results from an interaction between autonomous agents (i.e., when the dialogue is not scripted). The paper explains that the…
▽ More
This paper explains why scripted dialogue shares some crucial properties with discourse. In particular, when scripted dialogues are generated by a Natural Language Generation system, the generator can apply revision strategies that cannot normally be used when the dialogue results from an interaction between autonomous agents (i.e., when the dialogue is not scripted). The paper explains that the relevant revision operators are best applied at the level of a dialogue plan and discusses how the generator may decide when to apply a given revision operator.
△ Less
Submitted 22 December, 2003;
originally announced December 2003.
-
Towards Automated Generation of Scripted Dialogue: Some Time-Honoured Strategies
Authors:
Paul Piwek,
Kees van Deemter
Abstract:
The main aim of this paper is to introduce automated generation of scripted dialogue as a worthwhile topic of investigation. In particular the fact that scripted dialogue involves two layers of communication, i.e., uni-directional communication between the author and the audience of a scripted dialogue and bi-directional pretended communication between the characters featuring in the dialogue, i…
▽ More
The main aim of this paper is to introduce automated generation of scripted dialogue as a worthwhile topic of investigation. In particular the fact that scripted dialogue involves two layers of communication, i.e., uni-directional communication between the author and the audience of a scripted dialogue and bi-directional pretended communication between the characters featuring in the dialogue, is argued to raise some interesting issues. Our hope is that the combined study of the two layers will forge links between research in text generation and dialogue processing. The paper presents a first attempt at creating such links by studying three types of strategies for the automated generation of scripted dialogue. The strategies are derived from examples of human-authored and naturally occurring dialogue.
△ Less
Submitted 22 December, 2003;
originally announced December 2003.