-
Classification or Prompting: A Case Study on Legal Requirements Traceability
Authors:
Romina Etezadi,
Sallam Abualhaija,
Chetan Arora,
Lionel Briand
Abstract:
New regulations are continuously introduced to ensure that software development complies with the ethical concerns and prioritizes public safety. A prerequisite for demonstrating compliance involves tracing software requirements to legal provisions. Requirements traceability is a fundamental task where requirements engineers are supposed to analyze technical requirements against target artifacts,…
▽ More
New regulations are continuously introduced to ensure that software development complies with the ethical concerns and prioritizes public safety. A prerequisite for demonstrating compliance involves tracing software requirements to legal provisions. Requirements traceability is a fundamental task where requirements engineers are supposed to analyze technical requirements against target artifacts, often under limited time budget. Doing this analysis manually for complex systems with hundreds of requirements is infeasible. The legal dimension introduces additional challenges that only exacerbate manual effort.
In this paper, we investigate two automated solutions based on large language models (LLMs) to predict trace links between requirements and legal provisions. The first solution, Kashif, is a classifier that leverages sentence transformers. The second solution prompts a recent generative LLM based on Rice, a prompt engineering framework.
On a benchmark dataset, we empirically evaluate Kashif and compare it against a baseline classifier from the literature. Kashif can identify trace links with an average recall of ~67%, outperforming the baseline with a substantial gain of 54 percentage points (pp) in recall. However, on unseen, more complex requirements documents traced to the European general data protection regulation (GDPR), Kashif performs poorly, yielding an average recall of 15%. On the same documents, however, our Rice-based solution yields an average recall of 84%, with a remarkable gain of about 69 pp over Kashif. Our results suggest that requirements traceability in the legal context cannot be simply addressed by building classifiers, as such solutions do not generalize and fail to perform well on complex regulations and requirements. Resorting to generative LLMs, with careful prompt engineering, is thus a more promising alternative.
△ Less
Submitted 11 February, 2025; v1 submitted 7 February, 2025;
originally announced February 2025.
-
GDPR-Relevant Privacy Concerns in Mobile Apps Research: A Systematic Literature Review
Authors:
Orlando Amaral Cejas,
Nicolas Sannier,
Sallam Abualhaija,
Marcello Ceci,
Domenico Bianculli
Abstract:
The General Data Protection Regulation (GDPR) is the benchmark in the European Union (EU) for privacy and data protection standards. Substantial research has been conducted in the requirements engineering (RE) literature investigating the elicitation, representation, and verification of privacy requirements in GDPR. Software systems including mobile apps must comply with the GDPR. With the growing…
▽ More
The General Data Protection Regulation (GDPR) is the benchmark in the European Union (EU) for privacy and data protection standards. Substantial research has been conducted in the requirements engineering (RE) literature investigating the elicitation, representation, and verification of privacy requirements in GDPR. Software systems including mobile apps must comply with the GDPR. With the growing pervasiveness of mobile apps and their increasing demand for personal data, privacy concerns have acquired further interest within the software engineering (SE) community at large. Despite the extensive literature on GDPR-relevant privacy concerns in mobile apps, there is no secondary study that describes, analyzes, and categorizes the current focus. Research gaps and persistent challenges are thus left unnoticed. In this article, we aim to systematically review existing primary studies highlighting various GDPR concepts and how these concepts are addressed in mobile apps research. The objective is to reconcile the existing work on GDPR in the RE literature with the research on GDPR-related privacy concepts in mobile apps in the SE literature. Our findings show that the current research landscape reflects a rather shallow understanding of GDPR requirements. Some GDPR concepts such as data subject rights (i.e., the rights of individuals over their personal data) are fundamental to GDPR, yet under-explored in the literature. In this article, we highlight future directions to be pursued by the SE community for supporting the development of GDPR-compliant mobile apps.
△ Less
Submitted 28 November, 2024;
originally announced November 2024.
-
Model Generation with LLMs: From Requirements to UML Sequence Diagrams
Authors:
Alessio Ferrari,
Sallam Abualhaija,
Chetan Arora
Abstract:
Complementing natural language (NL) requirements with graphical models can improve stakeholders' communication and provide directions for system design. However, creating models from requirements involves manual effort. The advent of generative large language models (LLMs), ChatGPT being a notable example, offers promising avenues for automated assistance in model generation. This paper investigat…
▽ More
Complementing natural language (NL) requirements with graphical models can improve stakeholders' communication and provide directions for system design. However, creating models from requirements involves manual effort. The advent of generative large language models (LLMs), ChatGPT being a notable example, offers promising avenues for automated assistance in model generation. This paper investigates the capability of ChatGPT to generate a specific type of model, i.e., UML sequence diagrams, from NL requirements. We conduct a qualitative study in which we examine the sequence diagrams generated by ChatGPT for 28 requirements documents of various types and from different domains. Observations from the analysis of the generated diagrams have systematically been captured through evaluation logs, and categorized through thematic analysis. Our results indicate that, although the models generally conform to the standard and exhibit a reasonable level of understandability, their completeness and correctness with respect to the specified requirements often present challenges. This issue is particularly pronounced in the presence of requirements smells, such as ambiguity and inconsistency. The insights derived from this study can influence the practical utilization of LLMs in the RE process, and open the door to novel RE-specific prompting strategies targeting effective model generation.
△ Less
Submitted 1 July, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
A Multi-solution Study on GDPR AI-enabled Completeness Checking of DPAs
Authors:
Muhammad Ilyas Azeem,
Sallam Abualhaija
Abstract:
Specifying legal requirements for software systems to ensure their compliance with the applicable regulations is a major concern to requirements engineering (RE). Personal data which is collected by an organization is often shared with other organizations to perform certain processing activities. In such cases, the General Data Protection Regulation (GDPR) requires issuing a data processing agreem…
▽ More
Specifying legal requirements for software systems to ensure their compliance with the applicable regulations is a major concern to requirements engineering (RE). Personal data which is collected by an organization is often shared with other organizations to perform certain processing activities. In such cases, the General Data Protection Regulation (GDPR) requires issuing a data processing agreement (DPA) which regulates the processing and further ensures that personal data remains protected. Violating GDPR can lead to huge fines reaching to billions of Euros. Software systems involving personal data processing must adhere to the legal obligations stipulated in GDPR and outlined in DPAs. Requirements engineers can elicit from DPAs legal requirements for regulating the data processing activities in software systems. Checking the completeness of a DPA according to the GDPR provisions is therefore an essential prerequisite to ensure that the elicited requirements are complete. Analyzing DPAs entirely manually is time consuming and requires adequate legal expertise. In this paper, we propose an automation strategy to address the completeness checking of DPAs against GDPR. Specifically, we pursue ten alternative solutions which are enabled by different technologies, namely traditional machine learning, deep learning, language modeling, and few-shot learning. The goal of our work is to empirically examine how these different technologies fare in the legal domain. We computed F2 score on a set of 30 real DPAs. Our evaluation shows that best-performing solutions yield F2 score of 86.7% and 89.7% are based on pre-trained BERT and RoBERTa language models. Our analysis further shows that other alternative solutions based on deep learning (e.g., BiLSTM) and few-shot learning (e.g., SetFit) can achieve comparable accuracy, yet are more efficient to develop.
△ Less
Submitted 23 November, 2023;
originally announced November 2023.
-
Legal Requirements Analysis
Authors:
Sallam Abualhaija,
Marcello Ceci,
Lionel Briand
Abstract:
Modern software has been an integral part of everyday activities in many disciplines and application contexts. Introducing intelligent automation by leveraging artificial intelligence (AI) led to break-throughs in many fields. The effectiveness of AI can be attributed to several factors, among which is the increasing availability of data. Regulations such as the general data protection regulation…
▽ More
Modern software has been an integral part of everyday activities in many disciplines and application contexts. Introducing intelligent automation by leveraging artificial intelligence (AI) led to break-throughs in many fields. The effectiveness of AI can be attributed to several factors, among which is the increasing availability of data. Regulations such as the general data protection regulation (GDPR) in the European Union (EU) are introduced to ensure the protection of personal data. Software systems that collect, process, or share personal data are subject to compliance with such regulations. Developing compliant software depends heavily on addressing legal requirements stipulated in applicable regulations, a central activity in the requirements engineering (RE) phase of the software development process. RE is concerned with specifying and maintaining requirements of a system-to-be, including legal requirements. Legal agreements which describe the policies organizations implement for processing personal data can provide an additional source to regulations for eliciting legal requirements. In this chapter, we explore a variety of methods for analyzing legal requirements and exemplify them on GDPR. Specifically, we describe possible alternatives for creating machine-analyzable representations from regulations, survey the existing automated means for enabling compliance verification against regulations, and further reflect on the current challenges of legal requirements analysis.
△ Less
Submitted 17 February, 2024; v1 submitted 23 November, 2023;
originally announced November 2023.
-
Replication in Requirements Engineering: the NLP for RE Case
Authors:
Sallam Abualhaija,
F. BaŞAk Aydemir,
Fabiano Dalpiaz,
Davide Dell'Anna,
Alessio Ferrari,
Xavier Franch,
Davide Fucci
Abstract:
[Context]} Natural language processing (NLP) techniques have been widely applied in the requirements engineering (RE) field to support tasks such as classification and ambiguity detection. Despite its empirical vocation, RE research has given limited attention to replication of NLP for RE studies. Replication is hampered by several factors, including the context specificity of the studies, the het…
▽ More
[Context]} Natural language processing (NLP) techniques have been widely applied in the requirements engineering (RE) field to support tasks such as classification and ambiguity detection. Despite its empirical vocation, RE research has given limited attention to replication of NLP for RE studies. Replication is hampered by several factors, including the context specificity of the studies, the heterogeneity of the tasks involving NLP, the tasks' inherent hairiness, and, in turn, the heterogeneous reporting structure. [Solution] To address these issues, we propose a new artifact, referred to as ID-Card, whose goal is to provide a structured summary of research papers emphasizing replication-relevant information. We construct the ID-Card through a structured, iterative process based on design science. [Results] In this paper: (i) we report on hands-on experiences of replication, (ii) we review the state-of-the-art and extract replication-relevant information, (iii) we identify, through focus groups, challenges across two typical dimensions of replication: data annotation and tool reconstruction, and (iv) we present the concept and structure of the ID-Card to mitigate the identified challenges. [Contribution] This study aims to create awareness of replication in NLP for RE. We propose an ID-Card that is intended to foster study replication, but can also be used in other contexts, e.g., for educational purposes.
△ Less
Submitted 18 April, 2024; v1 submitted 20 April, 2023;
originally announced April 2023.
-
AI-based Question Answering Assistance for Analyzing Natural-language Requirements
Authors:
Saad Ezzini,
Sallam Abualhaija,
Chetan Arora,
Mehrdad Sabetzadeh
Abstract:
By virtue of being prevalently written in natural language (NL), requirements are prone to various defects, e.g., inconsistency and incompleteness. As such, requirements are frequently subject to quality assurance processes. These processes, when carried out entirely manually, are tedious and may further overlook important quality issues due to time and budget pressures. In this paper, we propose…
▽ More
By virtue of being prevalently written in natural language (NL), requirements are prone to various defects, e.g., inconsistency and incompleteness. As such, requirements are frequently subject to quality assurance processes. These processes, when carried out entirely manually, are tedious and may further overlook important quality issues due to time and budget pressures. In this paper, we propose QAssist -- a question-answering (QA) approach that provides automated assistance to stakeholders, including requirements engineers, during the analysis of NL requirements. Posing a question and getting an instant answer is beneficial in various quality-assurance scenarios, e.g., incompleteness detection. Answering requirements-related questions automatically is challenging since the scope of the search for answers can go beyond the given requirements specification. To that end, QAssist provides support for mining external domain-knowledge resources. Our work is one of the first initiatives to bring together QA and external domain knowledge for addressing requirements engineering challenges. We evaluate QAssist on a dataset covering three application domains and containing a total of 387 question-answer pairs. We experiment with state-of-the-art QA methods, based primarily on recent large-scale language models. In our empirical study, QAssist localizes the answer to a question to three passages within the requirements specification and within the external domain-knowledge resource with an average recall of 90.1% and 96.5%, respectively. QAssist extracts the actual answer to the posed question with an average accuracy of 84.2%.
Keywords: Natural-language Requirements, Question Answering (QA), Language Models, Natural Language Processing (NLP), Natural Language Generation (NLG), BERT, T5.
△ Less
Submitted 9 February, 2023;
originally announced February 2023.
-
NLP-based Automated Compliance Checking of Data Processing Agreements against GDPR
Authors:
Orlando Amaral,
Muhammad Ilyas Azeem,
Sallam Abualhaija,
Lionel C Briand
Abstract:
Processing personal data is regulated in Europe by the General Data Protection Regulation (GDPR) through data processing agreements (DPAs). Checking the compliance of DPAs contributes to the compliance verification of software systems as DPAs are an important source of requirements for software development involving the processing of personal data. However, manually checking whether a given DPA co…
▽ More
Processing personal data is regulated in Europe by the General Data Protection Regulation (GDPR) through data processing agreements (DPAs). Checking the compliance of DPAs contributes to the compliance verification of software systems as DPAs are an important source of requirements for software development involving the processing of personal data. However, manually checking whether a given DPA complies with GDPR is challenging as it requires significant time and effort for understanding and identifying DPA-relevant compliance requirements in GDPR and then verifying these requirements in the DPA. In this paper, we propose an automated solution to check the compliance of a given DPA against GDPR. In close interaction with legal experts, we first built two artifacts: (i) the "shall" requirements extracted from the GDPR provisions relevant to DPA compliance and (ii) a glossary table defining the legal concepts in the requirements. Then, we developed an automated solution that leverages natural language processing (NLP) technologies to check the compliance of a given DPA against these "shall" requirements. Specifically, our approach automatically generates phrasal-level representations for the textual content of the DPA and compares it against predefined representations of the "shall" requirements. Over a dataset of 30 actual DPAs, the approach correctly finds 618 out of 750 genuine violations while raising 76 false violations, and further correctly identifies 524 satisfied requirements. The approach has thus an average precision of 89.1%, a recall of 82.4%, and an accuracy of 84.6%. Compared to a baseline that relies on off-the-shelf NLP tools, our approach provides an average accuracy gain of ~20 percentage points. The accuracy of our approach can be improved to ~94% with limited manual verification effort.
△ Less
Submitted 18 June, 2023; v1 submitted 20 September, 2022;
originally announced September 2022.
-
COREQQA -- A COmpliance REQuirements Understanding using Question Answering Tool
Authors:
Sallam Abualhaija,
Chetan Arora,
Lionel Briand
Abstract:
We introduce COREQQA, a tool for assisting requirements engineers in acquiring a better understanding of compliance requirements by means of automated Question Answering. Extracting compliance-related requirements by manually navigating through a legal document is both time-consuming and error-prone. COREQQA enables requirements engineers to pose questions in natural language about a compliance-re…
▽ More
We introduce COREQQA, a tool for assisting requirements engineers in acquiring a better understanding of compliance requirements by means of automated Question Answering. Extracting compliance-related requirements by manually navigating through a legal document is both time-consuming and error-prone. COREQQA enables requirements engineers to pose questions in natural language about a compliance-related topic given some legal document, e.g., asking about data breach. The tool then automatically navigates through the legal document and returns to the requirements engineer a list of text passages containing the possible answers to the input question. For better readability, the tool also highlights the likely answers in these passages. The engineer can then use this output for specifying compliance requirements. COREQQA is developed using advanced large-scale language models from BERT's family. COREQQA has been evaluated on four legal documents. The results of this evaluation are briefly presented in the paper. The tool is publicly available on Zenodo (DOI: 10.5281/zenodo.6653514).
△ Less
Submitted 21 June, 2022;
originally announced June 2022.
-
TAPHSIR: Towards AnaPHoric Ambiguity Detection and ReSolution In Requirements
Authors:
Saad Ezzini,
Sallam Abualhaija,
Chetan Arora,
Mehrdad Sabetzadeh
Abstract:
We introduce TAPHSIR, a tool for anaphoric ambiguity detection and anaphora resolution in requirements. TAPHSIR facilities reviewing the use of pronouns in a requirements specification and revising those pronouns that can lead to misunderstandings during the development process. To this end, TAPHSIR detects the requirements which have potential anaphoric ambiguity and further attempts interpreting…
▽ More
We introduce TAPHSIR, a tool for anaphoric ambiguity detection and anaphora resolution in requirements. TAPHSIR facilities reviewing the use of pronouns in a requirements specification and revising those pronouns that can lead to misunderstandings during the development process. To this end, TAPHSIR detects the requirements which have potential anaphoric ambiguity and further attempts interpreting anaphora occurrences automatically. TAPHSIR employs a hybrid solution composed of an ambiguity detection solution based on machine learning and an anaphora resolution solution based on a variant of the BERT language model. Given a requirements specification, TAPHSIR decides for each pronoun occurrence in the specification whether the pronoun is ambiguous or unambiguous, and further provides an automatic interpretation for the pronoun. The output generated by TAPHSIR can be easily reviewed and validated by requirements engineers. TAPHSIR is publicly available on Zenodo (DOI: 10.5281/zenodo.5902117).
△ Less
Submitted 21 June, 2022;
originally announced June 2022.
-
WikiDoMiner: Wikipedia Domain-specific Miner
Authors:
Saad Ezzini,
Sallam Abualhaija,
Mehrdad Sabetzadeh
Abstract:
We introduce WikiDoMiner, a tool for automatically generating domain-specific corpora by crawling Wikipedia. WikiDoMiner helps requirements engineers create an external knowledge resource that is specific to the underlying domain of a given requirements specification (RS). Being able to build such a resource is important since domain-specific datasets are scarce. WikiDoMiner generates a corpus by…
▽ More
We introduce WikiDoMiner, a tool for automatically generating domain-specific corpora by crawling Wikipedia. WikiDoMiner helps requirements engineers create an external knowledge resource that is specific to the underlying domain of a given requirements specification (RS). Being able to build such a resource is important since domain-specific datasets are scarce. WikiDoMiner generates a corpus by first extracting a set of domain-specific keywords from a given RS, and then querying Wikipedia for these keywords. The output of WikiDoMiner is a set of Wikipedia articles relevant to the domain of the input RS. Mining Wikipedia for domain-specific knowledge can be beneficial for multiple requirements engineering tasks, e.g., ambiguity handling, requirements classification, and question answering. WikiDoMiner is publicly available on Zenodo under an open-source license (DOI: 10.5281/zenodo.6671357).
△ Less
Submitted 21 June, 2022;
originally announced June 2022.
-
AI-enabled Automation for Completeness Checking of Privacy Policies
Authors:
Orlando Amaral,
Sallam Abualhaija,
Damiano Torre,
Mehrdad Sabetzadeh,
Lionel C. Briand
Abstract:
Technological advances in information sharing have raised concerns about data protection. Privacy policies contain privacy-related requirements about how the personal data of individuals will be handled by an organization or a software system (e.g., a web service or an app). In Europe, privacy policies are subject to compliance with the General Data Protection Regulation (GDPR). A prerequisite for…
▽ More
Technological advances in information sharing have raised concerns about data protection. Privacy policies contain privacy-related requirements about how the personal data of individuals will be handled by an organization or a software system (e.g., a web service or an app). In Europe, privacy policies are subject to compliance with the General Data Protection Regulation (GDPR). A prerequisite for GDPR compliance checking is to verify whether the content of a privacy policy is complete according to the provisions of GDPR. Incomplete privacy policies might result in large fines on violating organization as well as incomplete privacy-related software specifications. Manual completeness checking is both time-consuming and error-prone. In this paper, we propose AI-based automation for the completeness checking of privacy policies. Through systematic qualitative methods, we first build two artifacts to characterize the privacy-related provisions of GDPR, namely a conceptual model and a set of completeness criteria. Then, we develop an automated solution on top of these artifacts by leveraging a combination of natural language processing and supervised machine learning. Specifically, we identify the GDPR-relevant information content in privacy policies and subsequently check them against the completeness criteria. To evaluate our approach, we collected 234 real privacy policies from the fund industry. Over a set of 48 unseen privacy policies, our approach detected 300 of the total of 334 violations of some completeness criteria correctly, while producing 23 false positives. The approach thus has a precision of 92.9% and recall of 89.8%. Compared to a baseline that applies keyword search only, our approach results in an improvement of 24.5% in precision and 38% in recall.
△ Less
Submitted 5 October, 2021; v1 submitted 10 June, 2021;
originally announced June 2021.
-
D-Bees: A Novel Method Inspired by Bee Colony Optimization for Solving Word Sense Disambiguation
Authors:
Sallam Abualhaija,
Karl-Heinz Zimmermann
Abstract:
Word sense disambiguation (WSD) is a problem in the field of computational linguistics given as finding the intended sense of a word (or a set of words) when it is activated within a certain context. WSD was recently addressed as a combinatorial optimization problem in which the goal is to find a sequence of senses that maximize the semantic relatedness among the target words. In this article, a n…
▽ More
Word sense disambiguation (WSD) is a problem in the field of computational linguistics given as finding the intended sense of a word (or a set of words) when it is activated within a certain context. WSD was recently addressed as a combinatorial optimization problem in which the goal is to find a sequence of senses that maximize the semantic relatedness among the target words. In this article, a novel algorithm for solving the WSD problem called D-Bees is proposed which is inspired by bee colony optimization (BCO)where artificial bee agents collaborate to solve the problem. The D-Bees algorithm is evaluated on a standard dataset (SemEval 2007 coarse-grained English all-words task corpus)and is compared to simulated annealing, genetic algorithms, and two ant colony optimization techniques (ACO). It will be observed that the BCO and ACO approaches are on par.
△ Less
Submitted 6 May, 2014;
originally announced May 2014.