-
Automatic extraction of requirements expressed in industrial standards: a way towards machine readable standards?
Authors:
Helene de Ribaupierre,
Anne-Francoise Cutting-Decelle,
Nathalie Baumier,
Serge Blumental
Abstract:
The project presented in this publication, carried out under industrial funding, aims at the semantic analysis of a normative document describing requirements applicable to electrical appliances. The objective is to build a semantic approach to extract and automatically process information related to the requirements contained in the standard. To this end, the project has been divided into three parts: the analysis of the requirements document, the extraction of relevant information and creation of the ontology, and the comparison with other approaches. The first part of our work deals with the analysis of the requirements document under study. It focuses on the specificity of the sentence structure and on the use of particular words and vocabulary related to the representation of the requirements. The aim is to propose a representation facilitating the extraction of information, used in the second part of the study. In the second part, the extraction of relevant information is conducted in two ways: manually (the ontology being built by hand) and semi-automatically (using semantic annotation software and natural language processing techniques). Whatever the method used, the aim of this extraction is to create the concept dictionary, then the ontology, enriched as the document is scanned and understood by the system. Once the relevant terms have been identified, the work focuses on identifying and representing the requirements, separating the running text from the information given in the tables. The automatic processing of requirements involves the extraction of sentences containing terms identified as relevant to a requirement. The identified requirement is then indexed and stored in a representation that can be used for query processing.
Submitted 24 December, 2021;
originally announced December 2021.
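The final step described in the abstract above (extracting sentences that contain relevant terms, then indexing them for query processing) can be illustrated with a minimal sketch. This is not the authors' implementation: the term list, the naive sentence splitter, and every function name below are assumptions made for illustration only, and a real system would take its terms from the concept dictionary and ontology.

```python
import re
from collections import defaultdict

# Hypothetical term list; in the described project the terms would come
# from the concept dictionary, not from a hard-coded set.
RELEVANT_TERMS = {"shall", "insulation", "temperature"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words_of(sentence):
    """Lower-cased alphabetic tokens of a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def extract_requirements(text, terms=RELEVANT_TERMS):
    """Keep only the sentences that contain at least one relevant term."""
    return [s for s in split_sentences(text) if words_of(s) & terms]


def build_index(requirements, terms=RELEVANT_TERMS):
    """Inverted index mapping each term to the ids of matching requirements."""
    index = defaultdict(list)
    for req_id, sentence in enumerate(requirements):
        for term in sorted(words_of(sentence) & terms):
            index[term].append(req_id)
    return dict(index)


def query(index, requirements, term):
    """Return all stored requirements that mention the given term."""
    return [requirements[i] for i in index.get(term.lower(), [])]
```

Calling `extract_requirements` on a clause, passing the result to `build_index`, and then calling `query(index, requirements, "insulation")` returns the stored sentences mentioning that term. An inverted index is used here only because it is the simplest structure that supports term lookup; the abstract does not say which representation the project uses.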
-
Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification
Authors:
Aleksandra Edwards,
Asahi Ushio,
Jose Camacho-Collados,
Hélène de Ribaupierre,
Alun Preece
Abstract:
Data augmentation techniques are widely used for enhancing the performance of machine learning models by tackling class imbalance issues and data sparsity. State-of-the-art generative language models have been shown to provide significant gains across different NLP tasks. However, their applicability to data augmentation for text classification tasks in few-shot settings has not been fully explored, especially for specialised domains. In this paper, we leverage GPT-2 (Radford et al., 2019) for generating artificial training instances in order to improve classification performance. Our aim is to analyse the impact that the selection process of seed training examples has on the quality of GPT-generated samples and, consequently, on classifier performance. We perform experiments with several seed selection strategies that, among others, exploit class hierarchical structures and domain expert selection. Our results show that fine-tuning GPT-2 on a handful of labelled instances leads to consistent classification improvements and outperforms competitive baselines. Finally, we show that guiding this process through domain expert selection can lead to further improvements, which opens up interesting research avenues for combining generative models and active learning.
Submitted 9 January, 2023; v1 submitted 17 November, 2021;
originally announced November 2021.
-
Interface to Query and Visualise Definitions from a Knowledge Base
Authors:
Anelia Kurteva,
Hélène De Ribaupierre
Abstract:
The semantic linked data model is at the core of the Web due to its ability to model real-world entities, connect them via relationships and provide context, which could help to transform data into information and information into knowledge. Linked Data, in the form of ontologies and knowledge graphs, could be stored locally or made available to everyone online. For example, the DBpedia knowledge base, which provides global and unified access to knowledge graphs, is open access. However, both access to and usage of Linked Data require individuals to have expert knowledge in the field of the Semantic Web. Many of the existing solutions that are powered by Linked Data are developed for specific use cases, such as building and exploring ontologies visually, and are aimed at researchers with knowledge of semantic technology. The solutions that are aimed at non-experts are generic and, in most cases, information visualisation is not available. Instead, information is presented in textual format, which does not ease cognitive processes such as comprehension and could lead to problems such as information overload. In this paper, we present a web application with a user interface (UI), which combines features from applications for both experts and non-experts. The UI allows individuals with no previous knowledge of the Semantic Web to query the DBpedia knowledge base for definitions of a specific word and to view a graphical visualisation of the query results (the search keyword itself and concepts related to it).
Submitted 11 March, 2021;
originally announced March 2021.
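To give a sense of what querying DBpedia for a definition involves behind such a UI, the sketch below builds a SPARQL query and a request URL for the public DBpedia endpoint. It is an illustration under stated assumptions, not the application described in the abstract above: the choice of `rdfs:comment` as the "definition" property, the keyword-to-resource mapping, and both function names are hypothetical.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_definition_query(keyword, lang="en"):
    """Build a SPARQL query for a short description of one DBpedia resource.

    Assumption: the keyword maps directly onto a resource name (spaces become
    underscores, first letter capitalised) and rdfs:comment serves as the
    definition. No escaping is done, so the keyword must be trusted input.
    """
    resource = keyword.strip().replace(" ", "_")
    resource = resource[:1].upper() + resource[1:]
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?definition WHERE {\n"
        f"  <http://dbpedia.org/resource/{resource}> rdfs:comment ?definition .\n"
        f'  FILTER (lang(?definition) = "{lang}")\n'
        "}\n"
        "LIMIT 1"
    )


def build_request_url(keyword, lang="en"):
    """URL-encode the query as a GET request asking for JSON results."""
    params = {
        "query": build_definition_query(keyword, lang),
        "format": "application/sparql-results+json",
    }
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)
```

The URL returned by `build_request_url("knowledge graph")` can be fetched with any HTTP client; the related concepts shown in a visualisation would need further queries that are not sketched here.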
-
Predicting Themes within Complex Unstructured Texts: A Case Study on Safeguarding Reports
Authors:
Aleksandra Edwards,
David Rogers,
Jose Camacho-Collados,
Hélène de Ribaupierre,
Alun Preece
Abstract:
The task of text and sentence classification is associated with the need for large amounts of labelled training data. The acquisition of high volumes of labelled datasets can be expensive or unfeasible, especially for highly specialised domains for which documents are hard to obtain. Research on the application of supervised classification based on small amounts of training data is limited. In this paper, we address the combination of state-of-the-art deep learning and classification methods and provide an insight into which combination of methods fits the needs of small, domain-specific, and terminologically rich corpora. We focus on a real-world scenario related to a collection of safeguarding reports comprising learning experiences and reflections on tackling serious incidents involving children and vulnerable adults. The relatively small volume of available reports and their use of highly domain-specific terminology make the application of automated approaches difficult. We focus on the problem of automatically identifying the main themes in a safeguarding report using supervised classification approaches. Our results show the potential of deep learning models to simulate subject-expert behaviour even for complex tasks with limited labelled data.
Submitted 4 June, 2021; v1 submitted 27 October, 2020;
originally announced October 2020.