-
From Data-Driven to Purpose-Driven Artificial Intelligence: Systems Thinking for Data-Analytic Automation of Patient Care
Authors:
Daniel Anadria,
Roel Dobbe,
Anastasia Giachanou,
Ruurd Kuiper,
Richard Bartels,
Wouter van Amsterdam,
Íñigo Martínez de Rituerto de Troya,
Carmen Zürcher,
Daniel Oberski
Abstract:
In this work, we reflect on the data-driven modeling paradigm that is gaining ground in AI-driven automation of patient care. We argue that the repurposing of existing real-world patient datasets for machine learning may not always represent an optimal approach to model development as it could lead to undesirable outcomes in patient care. We reflect on the history of data analysis to explain how t…
▽ More
In this work, we reflect on the data-driven modeling paradigm that is gaining ground in AI-driven automation of patient care. We argue that the repurposing of existing real-world patient datasets for machine learning may not always represent an optimal approach to model development as it could lead to undesirable outcomes in patient care. We reflect on the history of data analysis to explain how the data-driven paradigm rose to popularity, and we envision ways in which systems thinking and clinical domain theory could complement the existing model development approaches in reaching human-centric outcomes. We call for a purpose-driven machine learning paradigm that is grounded in clinical theory and the sociotechnical realities of real-world operational contexts. We argue that understanding the utility of existing patient datasets requires looking in two directions: upstream towards the data generation, and downstream towards the automation objectives. This purpose-driven perspective to AI system development opens up new methodological opportunities and holds promise for AI automation of patient care.
△ Less
Submitted 18 June, 2025; v1 submitted 16 June, 2025;
originally announced June 2025.
-
Explainability-Based Token Replacement on LLM-Generated Text
Authors:
Hadi Mohammadi,
Anastasia Giachanou,
Daniel L. Oberski,
Ayoub Bagheri
Abstract:
Generative models, especially large language models (LLMs), have shown remarkable progress in producing text that appears human-like. However, they often exhibit patterns that make their output easier to detect than text written by humans. In this paper, we investigate how explainable AI (XAI) methods can be used to reduce the detectability of AI-generated text (AIGT) while also introducing a robu…
▽ More
Generative models, especially large language models (LLMs), have shown remarkable progress in producing text that appears human-like. However, they often exhibit patterns that make their output easier to detect than text written by humans. In this paper, we investigate how explainable AI (XAI) methods can be used to reduce the detectability of AI-generated text (AIGT) while also introducing a robust ensemble-based detection approach. We begin by training an ensemble classifier to distinguish AIGT from human-written text, then apply SHAP and LIME to identify tokens that most strongly influence its predictions. We propose four explainability-based token replacement strategies to modify these influential tokens. Our findings show that these token replacement approaches can significantly diminish a single classifier's ability to detect AIGT. However, our ensemble classifier maintains strong performance across multiple languages and domains, showing that a multi-model approach can mitigate the impact of token-level manipulations. These results show that XAI methods can make AIGT harder to detect by focusing on the most influential tokens. At the same time, they highlight the need for robust, ensemble-based detection strategies that can adapt to evolving approaches for hiding AIGT.
△ Less
Submitted 4 June, 2025;
originally announced June 2025.
-
Stereotype Detection in Natural Language Processing
Authors:
Alessandra Teresa Cignarella,
Anastasia Giachanou,
Els Lefever
Abstract:
Stereotypes influence social perceptions and can escalate into discrimination and violence. While NLP research has extensively addressed gender bias and hate speech, stereotype detection remains an emerging field with significant societal implications. In this work is presented a survey of existing research, analyzing definitions from psychology, sociology, and philosophy. A semi-automatic literat…
▽ More
Stereotypes influence social perceptions and can escalate into discrimination and violence. While NLP research has extensively addressed gender bias and hate speech, stereotype detection remains an emerging field with significant societal implications. In this work is presented a survey of existing research, analyzing definitions from psychology, sociology, and philosophy. A semi-automatic literature review was performed by using Semantic Scholar. We retrieved and filtered over 6,000 papers (in the year range 2000-2025), identifying key trends, methodologies, challenges and future directions. The findings emphasize stereotype detection as a potential early-monitoring tool to prevent bias escalation and the rise of hate speech. Conclusions highlight the need for a broader, multilingual, and intersectional approach in NLP studies.
△ Less
Submitted 23 May, 2025;
originally announced May 2025.
-
Explainability in Practice: A Survey of Explainable NLP Across Various Domains
Authors:
Hadi Mohammadi,
Ayoub Bagheri,
Anastasia Giachanou,
Daniel L. Oberski
Abstract:
Natural Language Processing (NLP) has become a cornerstone in many critical sectors, including healthcare, finance, and customer relationship management. This is especially true with the development and use of advanced models such as GPT-based architectures and BERT, which are widely used in decision-making processes. However, the black-box nature of these advanced NLP models has created an urgent…
▽ More
Natural Language Processing (NLP) has become a cornerstone in many critical sectors, including healthcare, finance, and customer relationship management. This is especially true with the development and use of advanced models such as GPT-based architectures and BERT, which are widely used in decision-making processes. However, the black-box nature of these advanced NLP models has created an urgent need for transparency and explainability. This review explores explainable NLP (XNLP) with a focus on its practical deployment and real-world applications, examining its implementation and the challenges faced in domain-specific contexts. The paper underscores the importance of explainability in NLP and provides a comprehensive perspective on how XNLP can be designed to meet the unique demands of various sectors, from healthcare's need for clear insights to finance's emphasis on fraud detection and risk assessment. Additionally, this review aims to bridge the knowledge gap in XNLP literature by offering a domain-specific exploration and discussing underrepresented areas such as real-world applicability, metric evaluation, and the role of human interaction in model assessment. The paper concludes by suggesting future research directions that could enhance the understanding and broader application of XNLP.
△ Less
Submitted 5 June, 2025; v1 submitted 2 February, 2025;
originally announced February 2025.
-
On Text-based Personality Computing: Challenges and Future Directions
Authors:
Qixiang Fang,
Anastasia Giachanou,
Ayoub Bagheri,
Laura Boeschoten,
Erik-Jan van Kesteren,
Mahdi Shafiee Kamalabad,
Daniel L Oberski
Abstract:
Text-based personality computing (TPC) has gained many research interests in NLP. In this paper, we describe 15 challenges that we consider deserving the attention of the research community. These challenges are organized by the following topics: personality taxonomies, measurement quality, datasets, performance evaluation, modelling choices, as well as ethics and fairness. When addressing each ch…
▽ More
Text-based personality computing (TPC) has gained many research interests in NLP. In this paper, we describe 15 challenges that we consider deserving the attention of the research community. These challenges are organized by the following topics: personality taxonomies, measurement quality, datasets, performance evaluation, modelling choices, as well as ethics and fairness. When addressing each challenge, not only do we combine perspectives from both NLP and social sciences, but also offer concrete suggestions. We hope to inspire more valid and reliable TPC research.
△ Less
Submitted 22 May, 2023; v1 submitted 13 December, 2022;
originally announced December 2022.
-
Improving Stance Detection by Leveraging Measurement Knowledge from Social Sciences: A Case Study of Dutch Political Tweets and Traditional Gender Role Division
Authors:
Qixiang Fang,
Anastasia Giachanou,
Ayoub Bagheri
Abstract:
Stance detection (SD) concerns automatically determining the viewpoint (i.e., in favour of, against, or neutral) of a text's author towards a target. SD has been applied to many research topics, among which the detection of stances behind political tweets is an important one. In this paper, we apply SD to a dataset of tweets from official party accounts in the Netherlands between 2017 and 2021, wi…
▽ More
Stance detection (SD) concerns automatically determining the viewpoint (i.e., in favour of, against, or neutral) of a text's author towards a target. SD has been applied to many research topics, among which the detection of stances behind political tweets is an important one. In this paper, we apply SD to a dataset of tweets from official party accounts in the Netherlands between 2017 and 2021, with a focus on stances towards traditional gender role division, a dividing issue between (some) Dutch political parties. To implement and improve SD of traditional gender role division, we propose to leverage an established survey instrument from social sciences, which has been validated for the purpose of measuring attitudes towards traditional gender role division. Based on our experiments, we show that using such a validated survey instrument helps to improve SD performance.
△ Less
Submitted 25 July, 2024; v1 submitted 13 December, 2022;
originally announced December 2022.
-
Studying Fake News Spreading, Polarisation Dynamics, and Manipulation by Bots: a Tale of Networks and Language
Authors:
Giancarlo Ruffo,
Alfonso Semeraro,
Anastasia Giachanou,
Paolo Rosso
Abstract:
With the explosive growth of online social media, the ancient problem of information disorders interfering with news diffusion has surfaced with a renewed intensity threatening our democracies, public health, and news outlets' credibility. Therefore, thousands of scientific papers have been published in a relatively short period, making researchers of different disciplines struggle with an informa…
▽ More
With the explosive growth of online social media, the ancient problem of information disorders interfering with news diffusion has surfaced with a renewed intensity threatening our democracies, public health, and news outlets' credibility. Therefore, thousands of scientific papers have been published in a relatively short period, making researchers of different disciplines struggle with an information overload problem. The aim of this survey is threefold: (1) we present the results of a network-based analysis of the existing multidisciplinary literature to support the search for relevant trends and central publications; (2) we describe the main results and necessary background to attack the problem under a computational perspective; (3) we review selected contributions using network science as a unifying framework and computational linguistics as the tool to make sense of the shared content. Despite scholars working on computational linguistics and networks traditionally belong to different scientific communities, we expect that those interested in the area of fake news should be aware of crucial aspects of both disciplines.
△ Less
Submitted 14 January, 2023; v1 submitted 13 September, 2021;
originally announced September 2021.
-
Comparative Opinion Mining: A Review
Authors:
Kasturi Dewi Varathan,
Anastasia Giachanou,
Fabio Crestani
Abstract:
Opinion mining refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in textual material. Opinion mining, also known as sentiment analysis, has received a lot of attention in recent times, as it provides a number of tools to analyse the public opinion on a number of different topics. Comparative opinion mining i…
▽ More
Opinion mining refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in textual material. Opinion mining, also known as sentiment analysis, has received a lot of attention in recent times, as it provides a number of tools to analyse the public opinion on a number of different topics. Comparative opinion mining is a subfield of opinion mining that deals with identifying and extracting information that is expressed in a comparative form (e.g.~"paper X is better than the Y"). Comparative opinion mining plays a very important role when ones tries to evaluate something, as it provides a reference point for the comparison. This paper provides a review of the area of comparative opinion mining. It is the first review that cover specifically this topic as all previous reviews dealt mostly with general opinion mining. This survey covers comparative opinion mining from two different angles. One from perspective of techniques and the other from perspective of comparative opinion elements. It also incorporates preprocessing tools as well as dataset that were used by the past researchers that can be useful to the future researchers in the field of comparative opinion mining.
△ Less
Submitted 24 December, 2017;
originally announced December 2017.