Search | arXiv e-print repository

EvalxNLP: A Framework for Benchmarking Post-Hoc Explainability Methods on NLP Models

Authors: Mahdi Dhaini, Kafaite Zahra Hussain, Efstratios Zaradoukas, Gjergji Kasneci

Abstract: As Natural Language Processing (NLP) models continue to evolve and become integral to high-stakes applications, ensuring their interpretability remains a critical challenge. Given the growing variety of explainability methods and diverse stakeholder requirements, frameworks that help stakeholders select appropriate explanations tailored to their specific use cases are increasingly important. To ad… ▽ More As Natural Language Processing (NLP) models continue to evolve and become integral to high-stakes applications, ensuring their interpretability remains a critical challenge. Given the growing variety of explainability methods and diverse stakeholder requirements, frameworks that help stakeholders select appropriate explanations tailored to their specific use cases are increasingly important. To address this need, we introduce EvalxNLP, a Python framework for benchmarking state-of-the-art feature attribution methods for transformer-based NLP models. EvalxNLP integrates eight widely recognized explainability techniques from the Explainable AI (XAI) literature, enabling users to generate and evaluate explanations based on key properties such as faithfulness, plausibility, and complexity. Our framework also provides interactive, LLM-based textual explanations, facilitating user understanding of the generated explanations and evaluation outcomes. Human evaluation results indicate high user satisfaction with EvalxNLP, suggesting it is a promising framework for benchmarking explanation methods across diverse user groups. By offering a user-friendly and extensible platform, EvalxNLP aims at democratizing explainability tools and supporting the systematic comparison and advancement of XAI techniques in NLP. △ Less

Submitted 2 May, 2025; originally announced May 2025.

Comments: Accepted to the xAI World Conference (2025) - System Demonstration

arXiv:2501.12206 [pdf, other]

PAINT: Paying Attention to INformed Tokens to Mitigate Hallucination in Large Vision-Language Model

Authors: Kazi Hasan Ibn Arif, Sajib Acharjee Dip, Khizar Hussain, Lang Zhang, Chris Thomas

Abstract: Large Vision Language Models (LVLMs) have demonstrated remarkable capabilities in understanding and describing visual content, achieving state-of-the-art performance across various vision-language tasks. However, these models often generate descriptions containing objects or details that are absent in the input image, a phenomenon commonly known as hallucination. Our work investigates the key reas… ▽ More Large Vision Language Models (LVLMs) have demonstrated remarkable capabilities in understanding and describing visual content, achieving state-of-the-art performance across various vision-language tasks. However, these models often generate descriptions containing objects or details that are absent in the input image, a phenomenon commonly known as hallucination. Our work investigates the key reasons behind this issue by analyzing the pattern of self-attention in transformer layers. We find that hallucinations often arise from the progressive weakening of attention weight to visual tokens in the deeper layers of the LLM. Some previous works naively boost the attention of all visual tokens to mitigate this issue, resulting in suboptimal hallucination reduction. To address this, we identify two critical sets of visual tokens that facilitate the transfer of visual information from the vision encoder to the LLM. Local tokens encode grounded information about objects present in an image, while summary tokens capture the overall aggregated representation of the image. Importantly, these two sets of tokens require different levels of weight enhancement. To this end, we propose \textbf{PAINT} (\textbf{P}aying \textbf{A}ttention to \textbf{IN}formed \textbf{T}okens), a plug-and-play framework that intervenes in the self-attention mechanism of the LLM, selectively boosting the attention weights of local and summary tokens with experimentally learned margins. Evaluation on the MSCOCO image captioning dataset demonstrate that our approach reduces hallucination rates by up to 62.3\% compared to baseline models while maintaining accuracy. Code is available at \href{https://github.com/hasanar1f/PAINT}{https://github.com/hasanar1f/PAINT} △ Less

Submitted 25 March, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

Comments: 6 pages, 4 tables, 3 figures

arXiv:2410.22912 [pdf, other]

Self-optimization in distributed manufacturing systems using Modular State-based Stackelberg Games

Authors: Steve Yuwono, Ahmar Kamal Hussain, Dorothea Schwung, Andreas Schwung

Abstract: In this study, we introduce Modular State-based Stackelberg Games (Mod-SbSG), a novel game structure developed for distributed self-learning in modular manufacturing systems. Mod-SbSG enhances cooperative decision-making among self-learning agents within production systems by integrating State-based Potential Games (SbPG) with Stackelberg games. This hierarchical structure assigns more important m… ▽ More In this study, we introduce Modular State-based Stackelberg Games (Mod-SbSG), a novel game structure developed for distributed self-learning in modular manufacturing systems. Mod-SbSG enhances cooperative decision-making among self-learning agents within production systems by integrating State-based Potential Games (SbPG) with Stackelberg games. This hierarchical structure assigns more important modules of the manufacturing system a first-mover advantage, while less important modules respond optimally to the leaders' decisions. This decision-making process differs from typical multi-agent learning algorithms in manufacturing systems, where decisions are made simultaneously. We provide convergence guarantees for the novel game structure and design learning algorithms to account for the hierarchical game structure. We further analyse the effects of single-leader/multiple-follower and multiple-leader/multiple-follower scenarios within a Mod-SbSG. To assess its effectiveness, we implement and test Mod-SbSG in an industrial control setting using two laboratory-scale testbeds featuring sequential and serial-parallel processes. The proposed approach delivers promising results compared to the vanilla SbPG, which reduces overflow by 97.1%, and in some cases, prevents overflow entirely. Additionally, it decreases power consumption by 5-13% while satisfying the production demand, which significantly improves potential (global objective) values. △ Less

Submitted 30 October, 2024; originally announced October 2024.

Comments: This pre-print was submitted to Journal of Manufacturing Systems on October 30, 2024

arXiv:2403.00830 [pdf, other]

MedAide: Leveraging Large Language Models for On-Premise Medical Assistance on Edge Devices

Authors: Abdul Basit, Khizar Hussain, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Large language models (LLMs) are revolutionizing various domains with their remarkable natural language processing (NLP) abilities. However, deploying LLMs in resource-constrained edge computing and embedded systems presents significant challenges. Another challenge lies in delivering medical assistance in remote areas with limited healthcare facilities and infrastructure. To address this, we intr… ▽ More Large language models (LLMs) are revolutionizing various domains with their remarkable natural language processing (NLP) abilities. However, deploying LLMs in resource-constrained edge computing and embedded systems presents significant challenges. Another challenge lies in delivering medical assistance in remote areas with limited healthcare facilities and infrastructure. To address this, we introduce MedAide, an on-premise healthcare chatbot. It leverages tiny-LLMs integrated with LangChain, providing efficient edge-based preliminary medical diagnostics and support. MedAide employs model optimizations for minimal memory footprint and latency on embedded edge devices without server infrastructure. The training process is optimized using low-rank adaptation (LoRA). Additionally, the model is trained on diverse medical datasets, employing reinforcement learning from human feedback (RLHF) to enhance its domain-specific capabilities. The system is implemented on various consumer GPUs and Nvidia Jetson development board. MedAide achieves 77\% accuracy in medical consultations and scores 56 in USMLE benchmark, enabling an energy-efficient healthcare assistance platform that alleviates privacy concerns due to edge-based deployment, thereby empowering the community. △ Less

Submitted 28 February, 2024; originally announced March 2024.

Comments: 7 pages, 11 figures, ACM conference paper, 33 references

ACM Class: I.2.7

arXiv:2308.03767 [pdf, other]

Enhancing image captioning with depth information using a Transformer-based framework

Authors: Aya Mahmoud Ahmed, Mohamed Yousef, Khaled F. Hussain, Yousef Bassyouni Mahdy

Abstract: Captioning images is a challenging scene-understanding task that connects computer vision and natural language processing. While image captioning models have been successful in producing excellent descriptions, the field has primarily focused on generating a single sentence for 2D images. This paper investigates whether integrating depth information with RGB images can enhance the captioning task… ▽ More Captioning images is a challenging scene-understanding task that connects computer vision and natural language processing. While image captioning models have been successful in producing excellent descriptions, the field has primarily focused on generating a single sentence for 2D images. This paper investigates whether integrating depth information with RGB images can enhance the captioning task and generate better descriptions. For this purpose, we propose a Transformer-based encoder-decoder framework for generating a multi-sentence description of a 3D scene. The RGB image and its corresponding depth map are provided as inputs to our framework, which combines them to produce a better understanding of the input scene. Depth maps could be ground truth or estimated, which makes our framework widely applicable to any RGB captioning dataset. We explored different fusion approaches to fuse RGB and depth images. The experiments are performed on the NYU-v2 dataset and the Stanford image paragraph captioning dataset. During our work with the NYU-v2 dataset, we found inconsistent labeling that prevents the benefit of using depth information to enhance the captioning task. The results were even worse than using RGB images only. As a result, we propose a cleaned version of the NYU-v2 dataset that is more consistent and informative. Our results on both datasets demonstrate that the proposed framework effectively benefits from depth information, whether it is ground truth or estimated, and generates better captions. Code, pre-trained models, and the cleaned version of the NYU-v2 dataset will be made publically available. △ Less

Submitted 24 July, 2023; originally announced August 2023.

Comments: 19 pages, 5 figures, 13 tables

arXiv:2209.04796 [pdf, other]

Multiple Object Tracking in Recent Times: A Literature Review

Authors: Mk Bashar, Samia Islam, Kashifa Kawaakib Hussain, Md. Bakhtiar Hasan, A. B. M. Ashikur Rahman, Md. Hasanul Kabir

Abstract: Multiple object tracking gained a lot of interest from researchers in recent years, and it has become one of the trending problems in computer vision, especially with the recent advancement of autonomous driving. MOT is one of the critical vision tasks for different issues like occlusion in crowded scenes, similar appearance, small object detection difficulty, ID switching, etc. To tackle these ch… ▽ More Multiple object tracking gained a lot of interest from researchers in recent years, and it has become one of the trending problems in computer vision, especially with the recent advancement of autonomous driving. MOT is one of the critical vision tasks for different issues like occlusion in crowded scenes, similar appearance, small object detection difficulty, ID switching, etc. To tackle these challenges, as researchers tried to utilize the attention mechanism of transformer, interrelation of tracklets with graph convolutional neural network, appearance similarity of objects in different frames with the siamese network, they also tried simple IOU matching based CNN network, motion prediction with LSTM. To take these scattered techniques under an umbrella, we have studied more than a hundred papers published over the last three years and have tried to extract the techniques that are more focused on by researchers in recent times to solve the problems of MOT. We have enlisted numerous applications, possibilities, and how MOT can be related to real life. Our review has tried to show the different perspectives of techniques that researchers used overtimes and give some future direction for the potential researchers. Moreover, we have included popular benchmark datasets and metrics in this review. △ Less

Submitted 11 September, 2022; originally announced September 2022.

arXiv:1906.08864 [pdf, other]

Accurate and Energy-Efficient Classification with Spiking Random Neural Network: Corrected and Expanded Version

Authors: Khaled F. Hussain, Mohamed Yousef Bassyouni, Erol Gelenbe

Abstract: Artificial Neural Network (ANN) based techniques have dominated state-of-the-art results in most problems related to computer vision, audio recognition, and natural language processing in the past few years, resulting in strong industrial adoption from all leading technology companies worldwide. One of the major obstacles that have historically delayed large scale adoption of ANNs is the huge comp… ▽ More Artificial Neural Network (ANN) based techniques have dominated state-of-the-art results in most problems related to computer vision, audio recognition, and natural language processing in the past few years, resulting in strong industrial adoption from all leading technology companies worldwide. One of the major obstacles that have historically delayed large scale adoption of ANNs is the huge computational and power costs associated with training and testing (deploying) them. In the mean-time, Neuromorphic Computing platforms have recently achieved remarkable performance running more bio-realistic Spiking Neural Networks at high throughput and very low power consumption making them a natural alternative to ANNs. Here, we propose using the Random Neural Network (RNN), a spiking neural network with both theoretical and practical appealing properties, as a general purpose classifier that can match the classification power of ANNs on a number of tasks while enjoying all the features of a spiking neural network. This is demonstrated on a number of real-world classification datasets. △ Less

Submitted 1 June, 2019; originally announced June 2019.

ACM Class: I.2; G.3

arXiv:1812.11894 [pdf, other]

Accurate, Data-Efficient, Unconstrained Text Recognition with Convolutional Neural Networks

Authors: Mohamed Yousef, Khaled F. Hussain, Usama S. Mohammed

Abstract: Unconstrained text recognition is an important computer vision task, featuring a wide variety of different sub-tasks, each with its own set of challenges. One of the biggest promises of deep neural networks has been the convergence and automation of feature extractors from input raw signals, allowing for the highest possible performance with minimum required domain knowledge. To this end, we propo… ▽ More Unconstrained text recognition is an important computer vision task, featuring a wide variety of different sub-tasks, each with its own set of challenges. One of the biggest promises of deep neural networks has been the convergence and automation of feature extractors from input raw signals, allowing for the highest possible performance with minimum required domain knowledge. To this end, we propose a data-efficient, end-to-end neural network model for generic, unconstrained text recognition. In our proposed architecture we strive for simplicity and efficiency without sacrificing recognition accuracy. Our proposed architecture is a fully convolutional network without any recurrent connections trained with the CTC loss function. Thus it operates on arbitrary input sizes and produces strings of arbitrary length in a very efficient and parallelizable manner. We show the generality and superiority of our proposed text recognition architecture by achieving state of the art results on seven public benchmark datasets, covering a wide spectrum of text recognition tasks, namely: Handwriting Recognition, CAPTCHA recognition, OCR, License Plate Recognition, and Scene Text Recognition. Our proposed architecture has won the ICFHR2018 Competition on Automated Text Recognition on a READ Dataset. △ Less

Submitted 31 December, 2018; originally announced December 2018.

Comments: Submitted for publication

arXiv:1711.00972 [pdf, other]

The Achievement of Higher Flexibility in Multiple Choice-based Tests Using Image Classification Techniques

Authors: Mahmoud Afifi, Khaled F. Hussain

Abstract: In spite of the high accuracy of the existing optical mark reading (OMR) systems and devices, a few restrictions remain existent. In this work, we aim to reduce the restrictions of multiple choice questions (MCQ) within tests. We use an image registration technique to extract the answer boxes from answer sheets. Unlike other systems that rely on simple image processing steps to recognize the extra… ▽ More In spite of the high accuracy of the existing optical mark reading (OMR) systems and devices, a few restrictions remain existent. In this work, we aim to reduce the restrictions of multiple choice questions (MCQ) within tests. We use an image registration technique to extract the answer boxes from answer sheets. Unlike other systems that rely on simple image processing steps to recognize the extracted answer boxes, we address the problem from another perspective by training a machine learning classifier to recognize the class of each answer box (i.e., confirmed, crossed out, or blank answer). This gives us the ability to deal with a variety of shading and mark patterns, and distinguish between chosen (i.e., confirmed) and canceled answers (i.e., crossed out). All existing machine learning techniques require a large number of examples in order to train a model for classification, therefore we present a dataset including six real MCQ assessments with different answer sheet templates. We evaluate two strategies of classification: a straight-forward approach and a two-stage classifier approach. We test two handcrafted feature methods and a convolutional neural network. In the end, we present an easy-to-use graphical user interface of the proposed system. Compared with existing OMR systems, the proposed system has the least constraints and achieves a high accuracy. We believe that the presented work will further direct the development of OMR systems towards reducing the restrictions of the MCQ tests. △ Less

Submitted 11 January, 2019; v1 submitted 2 November, 2017; originally announced November 2017.

arXiv:1610.06085 [pdf]

doi 10.5121/ijcses.2015.6502

The survey of sentiment and opinion mining for behavior analysis of social media

Authors: Saqib Iqbal, Ali Zulqurnain, Yaqoob Wani, Khalid Hussain

Abstract: Nowadays, internet has changed the world into a global village. Social Media has reduced the gaps among the individuals. Previously communication was a time consuming and expensive task between the people. Social Media has earned fame because it is a cheaper and faster communication provider. Besides, social media has allowed us to reduce the gaps of physical distance, it also generates and preser… ▽ More Nowadays, internet has changed the world into a global village. Social Media has reduced the gaps among the individuals. Previously communication was a time consuming and expensive task between the people. Social Media has earned fame because it is a cheaper and faster communication provider. Besides, social media has allowed us to reduce the gaps of physical distance, it also generates and preserves huge amount of data. The data are very valuable and it presents association degree between people and their opinions. The comprehensive analysis of the methods which are used on user behavior prediction is presented in this paper. This comparison will provide a detailed information, pros and cons in the domain of sentiment and opinion mining. △ Less

Submitted 7 November, 2015; originally announced October 2016.

Comments: 21-28, International Journal of Computer Science & Engineering Survey (IJCSES), Vol.6, No.5, October 2015

Showing 1–10 of 10 results for author: Hussain, K