Search | arXiv e-print repository

Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models

Authors: Patrick Leask, Neel Nanda, Noura Al Moubayed

Abstract: Sparse autoencoders (SAEs) are a popular method for decomposing Large Langage Models (LLM) activations into interpretable latents. However, due to their substantial training cost, most academic research uses open-source SAEs which are only available for a restricted set of models of up to 27B parameters. SAE latents are also learned from a dataset of activations, which means they do not transfer b… ▽ More Sparse autoencoders (SAEs) are a popular method for decomposing Large Langage Models (LLM) activations into interpretable latents. However, due to their substantial training cost, most academic research uses open-source SAEs which are only available for a restricted set of models of up to 27B parameters. SAE latents are also learned from a dataset of activations, which means they do not transfer between models. Motivated by relative representation similarity measures, we introduce Inference-Time Decomposition of Activations (ITDA) models, an alternative method for decomposing language model activations. To train an ITDA, we greedily construct a dictionary of language model activations on a dataset of prompts, selecting those activations which were worst approximated by matching pursuit on the existing dictionary. ITDAs can be trained in just 1% of the time required for SAEs, using 1% of the data. This allowed us to train ITDAs on Llama-3.1 70B and 405B on a single consumer GPU. ITDAs can achieve similar reconstruction performance to SAEs on some target LLMs, but generally incur a performance penalty. However, ITDA dictionaries enable cross-model comparisons, and a simple Jaccard similarity index on ITDA dictionaries outperforms existing methods like CKA, SVCCA, and relative representation similarity metrics. ITDAs provide a cheap alternative to SAEs where computational resources are limited, or when cross model comparisons are necessary. Code available at https://github.com/pleask/itda. △ Less

Submitted 12 June, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

Comments: ICML 2025

arXiv:2504.13816 [pdf, ps, other]

Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations

Authors: Chenghao Xiao, Hou Pong Chan, Hao Zhang, Mahani Aljunied, Lidong Bing, Noura Al Moubayed, Yu Rong

Abstract: While understanding the knowledge boundaries of LLMs is crucial to prevent hallucination, research on the knowledge boundaries of LLMs has predominantly focused on English. In this work, we present the first study to analyze how LLMs recognize knowledge boundaries across different languages by probing their internal representations when processing known and unknown questions in multiple languages.… ▽ More While understanding the knowledge boundaries of LLMs is crucial to prevent hallucination, research on the knowledge boundaries of LLMs has predominantly focused on English. In this work, we present the first study to analyze how LLMs recognize knowledge boundaries across different languages by probing their internal representations when processing known and unknown questions in multiple languages. Our empirical studies reveal three key findings: 1) LLMs' perceptions of knowledge boundaries are encoded in the middle to middle-upper layers across different languages. 2) Language differences in knowledge boundary perception follow a linear structure, which motivates our proposal of a training-free alignment method that effectively transfers knowledge boundary perception ability across languages, thereby helping reduce hallucination risk in low-resource languages; 3) Fine-tuning on bilingual question pair translation further enhances LLMs' recognition of knowledge boundaries across languages. Given the absence of standard testbeds for cross-lingual knowledge boundary analysis, we construct a multilingual evaluation suite comprising three representative types of knowledge boundary data. Our code and datasets are publicly available at https://github.com/DAMO-NLP-SG/LLM-Multilingual-Knowledge-Boundaries. △ Less

Submitted 6 June, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

Comments: ACL 2025 main; camera ready

arXiv:2504.10471 [pdf, other]

MIEB: Massive Image Embedding Benchmark

Authors: Chenghao Xiao, Isaac Chung, Imene Kerboua, Jamie Stirling, Xin Zhang, Márton Kardos, Roman Solomatin, Noura Al Moubayed, Kenneth Enevoldsen, Niklas Muennighoff

Abstract: Image representations are often evaluated through disjointed, task-specific protocols, leading to a fragmented understanding of model capabilities. For instance, it is unclear whether an image embedding model adept at clustering images is equally good at retrieving relevant images given a piece of text. We introduce the Massive Image Embedding Benchmark (MIEB) to evaluate the performance of image… ▽ More Image representations are often evaluated through disjointed, task-specific protocols, leading to a fragmented understanding of model capabilities. For instance, it is unclear whether an image embedding model adept at clustering images is equally good at retrieving relevant images given a piece of text. We introduce the Massive Image Embedding Benchmark (MIEB) to evaluate the performance of image and image-text embedding models across the broadest spectrum to date. MIEB spans 38 languages across 130 individual tasks, which we group into 8 high-level categories. We benchmark 50 models across our benchmark, finding that no single method dominates across all task categories. We reveal hidden capabilities in advanced vision models such as their accurate visual representation of texts, and their yet limited capabilities in interleaved encodings and matching images and texts in the presence of confounders. We also show that the performance of vision encoders on MIEB correlates highly with their performance when used in multimodal large language models. Our code, dataset, and leaderboard are publicly available at https://github.com/embeddings-benchmark/mteb. △ Less

Submitted 14 April, 2025; originally announced April 2025.

arXiv:2502.04878 [pdf, other]

Sparse Autoencoders Do Not Find Canonical Units of Analysis

Authors: Patrick Leask, Bart Bussmann, Michael Pearce, Joseph Bloom, Curt Tigges, Noura Al Moubayed, Lee Sharkey, Neel Nanda

Abstract: A common goal of mechanistic interpretability is to decompose the activations of neural networks into features: interpretable properties of the input computed by the model. Sparse autoencoders (SAEs) are a popular method for finding these features in LLMs, and it has been postulated that they can be used to find a \textit{canonical} set of units: a unique and complete list of atomic features. We c… ▽ More A common goal of mechanistic interpretability is to decompose the activations of neural networks into features: interpretable properties of the input computed by the model. Sparse autoencoders (SAEs) are a popular method for finding these features in LLMs, and it has been postulated that they can be used to find a \textit{canonical} set of units: a unique and complete list of atomic features. We cast doubt on this belief using two novel techniques: SAE stitching to show they are incomplete, and meta-SAEs to show they are not atomic. SAE stitching involves inserting or swapping latents from a larger SAE into a smaller one. Latents from the larger SAE can be divided into two categories: \emph{novel latents}, which improve performance when added to the smaller SAE, indicating they capture novel information, and \emph{reconstruction latents}, which can replace corresponding latents in the smaller SAE that have similar behavior. The existence of novel features indicates incompleteness of smaller SAEs. Using meta-SAEs -- SAEs trained on the decoder matrix of another SAE -- we find that latents in SAEs often decompose into combinations of latents from a smaller SAE, showing that larger SAE latents are not atomic. The resulting decompositions are often interpretable; e.g. a latent representing ``Einstein'' decomposes into ``scientist'', ``Germany'', and ``famous person''. Even if SAEs do not find canonical units of analysis, they may still be useful tools. We suggest that future research should either pursue different approaches for identifying such units, or pragmatically choose the SAE size suited to their task. We provide an interactive dashboard to explore meta-SAEs: https://metasaes.streamlit.app/ △ Less

Submitted 7 February, 2025; originally announced February 2025.

Comments: Accepted to ICLR 2025

arXiv:2411.10503 [pdf, other]

Everything is a Video: Unifying Modalities through Next-Frame Prediction

Authors: G. Thomas Hudson, Dean Slack, Thomas Winterbottom, Jamie Sterling, Chenghao Xiao, Junjie Shentu, Noura Al Moubayed

Abstract: Multimodal learning, which involves integrating information from various modalities such as text, images, audio, and video, is pivotal for numerous complex tasks like visual question answering, cross-modal retrieval, and caption generation. Traditional approaches rely on modality-specific encoders and late fusion techniques, which can hinder scalability and flexibility when adapting to new tasks o… ▽ More Multimodal learning, which involves integrating information from various modalities such as text, images, audio, and video, is pivotal for numerous complex tasks like visual question answering, cross-modal retrieval, and caption generation. Traditional approaches rely on modality-specific encoders and late fusion techniques, which can hinder scalability and flexibility when adapting to new tasks or modalities. To address these limitations, we introduce a novel framework that extends the concept of task reformulation beyond natural language processing (NLP) to multimodal learning. We propose to reformulate diverse multimodal tasks into a unified next-frame prediction problem, allowing a single model to handle different modalities without modality-specific components. This method treats all inputs and outputs as sequential frames in a video, enabling seamless integration of modalities and effective knowledge transfer across tasks. Our approach is evaluated on a range of tasks, including text-to-text, image-to-text, video-to-video, video-to-text, and audio-to-text, demonstrating the model's ability to generalize across modalities with minimal adaptation. We show that task reformulation can significantly simplify multimodal model design across various tasks, laying the groundwork for more generalized multimodal foundation models. △ Less

Submitted 15 November, 2024; originally announced November 2024.

Comments: 10 pages, 10 figures

arXiv:2409.00438 [pdf, other]

Breaking Down Financial News Impact: A Novel AI Approach with Geometric Hypergraphs

Authors: Anoushka Harit, Zhongtian Sun, Jongmin Yu, Noura Al Moubayed

Abstract: In the fast-paced and volatile financial markets, accurately predicting stock movements based on financial news is critical for investors and analysts. Traditional models often struggle to capture the intricate and dynamic relationships between news events and market reactions, limiting their ability to provide actionable insights. This paper introduces a novel approach leveraging Explainable Arti… ▽ More In the fast-paced and volatile financial markets, accurately predicting stock movements based on financial news is critical for investors and analysts. Traditional models often struggle to capture the intricate and dynamic relationships between news events and market reactions, limiting their ability to provide actionable insights. This paper introduces a novel approach leveraging Explainable Artificial Intelligence (XAI) through the development of a Geometric Hypergraph Attention Network (GHAN) to analyze the impact of financial news on market behaviours. Geometric hypergraphs extend traditional graph structures by allowing edges to connect multiple nodes, effectively modelling high-order relationships and interactions among financial entities and news events. This unique capability enables the capture of complex dependencies, such as the simultaneous impact of a single news event on multiple stocks or sectors, which traditional models frequently overlook. By incorporating attention mechanisms within hypergraphs, GHAN enhances the model's ability to focus on the most relevant information, ensuring more accurate predictions and better interpretability. Additionally, we employ BERT-based embeddings to capture the semantic richness of financial news texts, providing a nuanced understanding of the content. Using a comprehensive financial news dataset, our GHAN model addresses key challenges in financial news impact analysis, including the complexity of high-order interactions, the necessity for model interpretability, and the dynamic nature of financial markets. Integrating attention mechanisms and SHAP values within GHAN ensures transparency, highlighting the most influential factors driving market predictions. Empirical validation demonstrates the superior effectiveness of our approach over traditional sentiment analysis and time-series models. △ Less

Submitted 31 August, 2024; originally announced September 2024.

Comments: 16 pages, conference

arXiv:2406.17911 [pdf, other]

X-ray Made Simple: Lay Radiology Report Generation and Robust Evaluation

Authors: Kun Zhao, Chenghao Xiao, Sixing Yan, Haoteng Tang, William K. Cheung, Noura Al Moubayed, Liang Zhan, Chenghua Lin

Abstract: Radiology Report Generation (RRG) has advanced considerably with the development of multimodal generative models. Despite the progress, the field still faces significant challenges in evaluation, as existing metrics lack robustness and fairness. We reveal that, RRG with high performance on existing lexical-based metrics (e.g. BLEU) might be more of a mirage - a model can get a high BLEU only by le… ▽ More Radiology Report Generation (RRG) has advanced considerably with the development of multimodal generative models. Despite the progress, the field still faces significant challenges in evaluation, as existing metrics lack robustness and fairness. We reveal that, RRG with high performance on existing lexical-based metrics (e.g. BLEU) might be more of a mirage - a model can get a high BLEU only by learning the template of reports. This has become a pressing issue for RRG due to the highly patternized nature of these reports. In addition, standard radiology reports are often highly technical. Helping patients understand these reports is crucial from a patient's perspective, yet this has been largely overlooked in previous work. In this work, we un-intuitively approach these problems by proposing the Layman's RRG framework that can systematically improve RRG with day-to-day language. Specifically, our framework first contributes a translated Layman's terms dataset. Building upon the dataset, we then propose a semantics-based evaluation method, which is effective in mitigating the inflated numbers of BLEU and provides more robust evaluation. We show that training on the layman's terms dataset encourages models to focus on the semantics of the reports, as opposed to overfitting to learning the report templates. Last, we reveal a promising scaling law between the number of training examples and semantics gain provided by our dataset, compared to the inverse pattern brought by the original formats. △ Less

Submitted 19 May, 2025; v1 submitted 25 June, 2024; originally announced June 2024.

Comments: BioLaySumm shared-task 2025 official dataset

arXiv:2405.17965 [pdf, other]

AttenCraft: Attention-guided Disentanglement of Multiple Concepts for Text-to-Image Customization

Authors: Junjie Shentu, Matthew Watson, Noura Al Moubayed

Abstract: With the unprecedented performance being achieved by text-to-image (T2I) diffusion models, T2I customization further empowers users to tailor the diffusion model to new concepts absent in the pre-training dataset, termed subject-driven generation. Moreover, extracting several new concepts from a single image enables the model to learn multiple concepts, and simultaneously decreases the difficultie… ▽ More With the unprecedented performance being achieved by text-to-image (T2I) diffusion models, T2I customization further empowers users to tailor the diffusion model to new concepts absent in the pre-training dataset, termed subject-driven generation. Moreover, extracting several new concepts from a single image enables the model to learn multiple concepts, and simultaneously decreases the difficulties of training data preparation, urging the disentanglement of multiple concepts to be a new challenge. However, existing models for disentanglement commonly require pre-determined masks or retain background elements. To this end, we propose an attention-guided method, AttenCraft, for multiple concept disentanglement. In particular, our method leverages self-attention and cross-attention maps to create accurate masks for each concept within a single initialization step, omitting any required mask preparation by humans or other models. The created masks are then applied to guide the cross-attention activation of each target concept during training and achieve concept disentanglement. Additionally, we introduce Uniform sampling and Reweighted sampling schemes to alleviate the non-synchronicity of feature acquisition from different concepts, and improve generation quality. Our method outperforms baseline models in terms of image-alignment, and behaves comparably on text-alignment. Finally, we showcase the applicability of AttenCraft to more complicated settings, such as an input image containing three concepts. The project is available at https://github.com/junjie-shentu/AttenCraft. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17450 [pdf, other]

The Power of Next-Frame Prediction for Learning Physical Laws

Authors: Thomas Winterbottom, G. Thomas Hudson, Daniel Kluvanec, Dean Slack, Jamie Sterling, Junjie Shentu, Chenghao Xiao, Zheming Zhou, Noura Al Moubayed

Abstract: Next-frame prediction is a useful and powerful method for modelling and understanding the dynamics of video data. Inspired by the empirical success of causal language modelling and next-token prediction in language modelling, we explore the extent to which next-frame prediction serves as a strong foundational learning strategy (analogous to language modelling) for inducing an understanding of the… ▽ More Next-frame prediction is a useful and powerful method for modelling and understanding the dynamics of video data. Inspired by the empirical success of causal language modelling and next-token prediction in language modelling, we explore the extent to which next-frame prediction serves as a strong foundational learning strategy (analogous to language modelling) for inducing an understanding of the visual world. In order to quantify the specific visual understanding induced by next-frame prediction, we introduce six diagnostic simulation video datasets derived from fundamental physical laws created by varying physical constants such as gravity and mass. We demonstrate that our models trained only on next-frame prediction are capable of predicting the value of these physical constants (e.g. gravity) without having been trained directly to learn these constants via a regression task. We find that the generative training phase alone induces a model state that can predict physical constants significantly better than that of a random model, improving the loss by a factor of between 1.28 to 6.24. We conclude that next-frame prediction shows great promise as a general learning strategy to induce understanding of the many `laws' that govern the visual domain without the need for explicit labelling. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: 7 Figures, 12 Pages, 1 Table

MSC Class: 68T45 ACM Class: I.2.6; I.2.10

arXiv:2404.06347 [pdf, other]

RAR-b: Reasoning as Retrieval Benchmark

Authors: Chenghao Xiao, G Thomas Hudson, Noura Al Moubayed

Abstract: Semantic textual similartiy (STS) and information retrieval tasks (IR) tasks have been the two major avenues to record the progress of embedding models in the past few years. Under the emerging Retrieval-augmented Generation (RAG) paradigm, we envision the need to evaluate next-level language understanding abilities of embedding models, and take a conscious look at the reasoning abilities stored i… ▽ More Semantic textual similartiy (STS) and information retrieval tasks (IR) tasks have been the two major avenues to record the progress of embedding models in the past few years. Under the emerging Retrieval-augmented Generation (RAG) paradigm, we envision the need to evaluate next-level language understanding abilities of embedding models, and take a conscious look at the reasoning abilities stored in them. Addressing this, we pose the question: Can retrievers solve reasoning problems? By transforming reasoning tasks into retrieval tasks, we find that without specifically trained for reasoning-level language understanding, current state-of-the-art retriever models may still be far from being competent for playing the role of assisting LLMs, especially in reasoning-intensive tasks. Moreover, albeit trained to be aware of instructions, instruction-aware IR models are often better off without instructions in inference time for reasoning tasks, posing an overlooked retriever-LLM behavioral gap for the research community to align. However, recent decoder-based embedding models show great promise in narrowing the gap, highlighting the pathway for embedding models to achieve reasoning-level language understanding. We also show that, although current off-the-shelf re-ranker models fail on these tasks, injecting reasoning abilities into them through fine-tuning still appears easier than doing so to bi-encoders, and we are able to achieve state-of-the-art performance across all tasks by fine-tuning a reranking model. We release Reasoning as Retrieval Benchmark (RAR-b), a holistic suite of tasks and settings to evaluate the reasoning abilities stored in retriever models. RAR-b is available at https://github.com/gowitheflow-1998/RAR-b. △ Less

Submitted 12 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

Comments: v2, small typo fixes

arXiv:2403.19897 [pdf, other]

Disentangling Racial Phenotypes: Fine-Grained Control of Race-related Facial Phenotype Characteristics

Authors: Seyma Yucer, Amir Atapour Abarghouei, Noura Al Moubayed, Toby P. Breckon

Abstract: Achieving an effective fine-grained appearance variation over 2D facial images, whilst preserving facial identity, is a challenging task due to the high complexity and entanglement of common 2D facial feature encoding spaces. Despite these challenges, such fine-grained control, by way of disentanglement is a crucial enabler for data-driven racial bias mitigation strategies across multiple automate… ▽ More Achieving an effective fine-grained appearance variation over 2D facial images, whilst preserving facial identity, is a challenging task due to the high complexity and entanglement of common 2D facial feature encoding spaces. Despite these challenges, such fine-grained control, by way of disentanglement is a crucial enabler for data-driven racial bias mitigation strategies across multiple automated facial analysis tasks, as it allows to analyse, characterise and synthesise human facial diversity. In this paper, we propose a novel GAN framework to enable fine-grained control over individual race-related phenotype attributes of the facial images. Our framework factors the latent (feature) space into elements that correspond to race-related facial phenotype representations, thereby separating phenotype aspects (e.g. skin, hair colour, nose, eye, mouth shapes), which are notoriously difficult to annotate robustly in real-world facial data. Concurrently, we also introduce a high quality augmented, diverse 2D face image dataset drawn from CelebA-HQ for GAN training. Unlike prior work, our framework only relies upon 2D imagery and related parameters to achieve state-of-the-art individual control over race-related phenotype attributes with improved photo-realistic output. △ Less

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2402.09966 [pdf, other]

Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation

Authors: Junjie Shentu, Matthew Watson, Noura Al Moubayed

Abstract: Subject-driven text-to-image diffusion models empower users to tailor the model to new concepts absent in the pre-training dataset using a few sample images. However, prevalent subject-driven models primarily rely on single-concept input images, facing challenges in specifying the target concept when dealing with multi-concept input images. To this end, we introduce a textual localized text-to-ima… ▽ More Subject-driven text-to-image diffusion models empower users to tailor the model to new concepts absent in the pre-training dataset using a few sample images. However, prevalent subject-driven models primarily rely on single-concept input images, facing challenges in specifying the target concept when dealing with multi-concept input images. To this end, we introduce a textual localized text-to-image model (Texual Localization) to handle multi-concept input images. During fine-tuning, our method incorporates a novel cross-attention guidance to decompose multiple concepts, establishing distinct connections between the visual representation of the target concept and the identifier token in the text prompt. Experimental results reveal that our method outperforms or performs comparably to the baseline models in terms of image fidelity and image-text alignment on multi-concept input images. In comparison to Custom Diffusion, our method with hard guidance achieves CLIP-I scores that are 7.04%, 8.13% higher and CLIP-T scores that are 2.22%, 5.85% higher in single-concept and multi-concept generation, respectively. Notably, our method generates cross-attention maps consistent with the target concept in the generated images, a capability absent in existing models. △ Less

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.08183 [pdf, other]

Pixel Sentence Representation Learning

Authors: Chenghao Xiao, Zhuoxu Huang, Danlu Chen, G Thomas Hudson, Yizhi Li, Haoran Duan, Chenghua Lin, Jie Fu, Jungong Han, Noura Al Moubayed

Abstract: Pretrained language models are long known to be subpar in capturing sentence and document-level semantics. Though heavily investigated, transferring perturbation-based methods from unsupervised visual representation learning to NLP remains an unsolved problem. This is largely due to the discreteness of subword units brought by tokenization of language models, limiting small perturbations of inputs… ▽ More Pretrained language models are long known to be subpar in capturing sentence and document-level semantics. Though heavily investigated, transferring perturbation-based methods from unsupervised visual representation learning to NLP remains an unsolved problem. This is largely due to the discreteness of subword units brought by tokenization of language models, limiting small perturbations of inputs to form semantics-preserved positive pairs. In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process. Drawing from cognitive and linguistic sciences, we introduce an unsupervised visual sentence representation learning framework, employing visually-grounded text perturbation methods like typos and word order shuffling, resonating with human cognitive patterns, and enabling perturbation to texts to be perceived as continuous. Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision, achieving comparable performance in semantic textual similarity (STS) to existing state-of-the-art NLP methods. Additionally, we unveil our method's inherent zero-shot cross-lingual transferability and a unique leapfrogging pattern across languages during iterative training. To our knowledge, this is the first representation learning method devoid of traditional language models for understanding sentence and document semantics, marking a stride closer to human-like textual comprehension. Our code is available at https://github.com/gowitheflow-1998/Pixel-Linguist △ Less

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.01655 [pdf, other]

A Deep Learning Approach Towards Student Performance Prediction in Online Courses: Challenges Based on a Global Perspective

Authors: Abdallah Moubayed, MohammadNoor Injadat, Nouh Alhindawi, Ghassan Samara, Sara Abuasal, Raed Alazaidah

Abstract: Analyzing and evaluating students' progress in any learning environment is stressful and time consuming if done using traditional analysis methods. This is further exasperated by the increasing number of students due to the shift of focus toward integrating the Internet technologies in education and the focus of academic institutions on moving toward e-Learning, blended, or online learning models.… ▽ More Analyzing and evaluating students' progress in any learning environment is stressful and time consuming if done using traditional analysis methods. This is further exasperated by the increasing number of students due to the shift of focus toward integrating the Internet technologies in education and the focus of academic institutions on moving toward e-Learning, blended, or online learning models. As a result, the topic of student performance prediction has become a vibrant research area in recent years. To address this, machine learning and data mining techniques have emerged as a viable solution. To that end, this work proposes the use of deep learning techniques (CNN and RNN-LSTM) to predict the students' performance at the midpoint stage of the online course delivery using three distinct datasets collected from three different regions of the world. Experimental results show that deep learning models have promising performance as they outperform other optimized traditional ML models in two of the three considered datasets while also having comparable performance for the third dataset. △ Less

Submitted 10 January, 2024; originally announced February 2024.

Comments: Accepted and presented in 24th International Arab Conference on Information Technology (ACIT'2023)

arXiv:2401.13478 [pdf, other]

SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval

Authors: Siwei Wu, Yizhi Li, Kang Zhu, Ge Zhang, Yiming Liang, Kaijing Ma, Chenghao Xiao, Haoran Zhang, Bohao Yang, Wenhu Chen, Wenhao Huang, Noura Al Moubayed, Jie Fu, Chenghua Lin

Abstract: Multi-modal information retrieval (MMIR) is a rapidly evolving field, where significant progress, particularly in image-text pairing, has been made through advanced representation learning and cross-modality alignment research. However, current benchmarks for evaluating MMIR performance in image-text pairing within the scientific domain show a notable gap, where chart and table images described in… ▽ More Multi-modal information retrieval (MMIR) is a rapidly evolving field, where significant progress, particularly in image-text pairing, has been made through advanced representation learning and cross-modality alignment research. However, current benchmarks for evaluating MMIR performance in image-text pairing within the scientific domain show a notable gap, where chart and table images described in scholarly language usually do not play a significant role. To bridge this gap, we develop a specialised scientific MMIR (SciMMIR) benchmark by leveraging open-access paper collections to extract data relevant to the scientific domain. This benchmark comprises 530K meticulously curated image-text pairs, extracted from figures and tables with detailed captions in scientific documents. We further annotate the image-text pairs with two-level subset-subcategory hierarchy annotations to facilitate a more comprehensive evaluation of the baselines. We conducted zero-shot and fine-tuning evaluations on prominent multi-modal image-captioning and visual language models, such as CLIP and BLIP. Our analysis offers critical insights for MMIR in the scientific domain, including the impact of pre-training and fine-tuning settings and the influence of the visual and textual encoders. All our data and checkpoints are publicly available at https://github.com/Wusiwei0410/SciMMIR. △ Less

Submitted 11 June, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: camera-ready version for ACL 2024 Findings

arXiv:2310.16193 [pdf, other]

Length is a Curse and a Blessing for Document-level Semantics

Authors: Chenghao Xiao, Yizhi Li, G Thomas Hudson, Chenghua Lin, Noura Al Moubayed

Abstract: In recent years, contrastive learning (CL) has been extensively utilized to recover sentence and document-level encoding capability from pre-trained language models. In this work, we question the length generalizability of CL-based models, i.e., their vulnerability towards length-induced semantic shift. We verify not only that length vulnerability is a significant yet overlooked research gap, but… ▽ More In recent years, contrastive learning (CL) has been extensively utilized to recover sentence and document-level encoding capability from pre-trained language models. In this work, we question the length generalizability of CL-based models, i.e., their vulnerability towards length-induced semantic shift. We verify not only that length vulnerability is a significant yet overlooked research gap, but we can devise unsupervised CL methods solely depending on the semantic signal provided by document length. We first derive the theoretical foundations underlying length attacks, showing that elongating a document would intensify the high intra-document similarity that is already brought by CL. Moreover, we found that isotropy promised by CL is highly dependent on the length range of text exposed in training. Inspired by these findings, we introduce a simple yet universal document representation learning framework, LA(SER)$^{3}$: length-agnostic self-reference for semantically robust sentence representation learning, achieving state-of-the-art unsupervised performance on the standard information retrieval benchmark. △ Less

Submitted 24 October, 2023; originally announced October 2023.

Comments: Accepted at EMNLP 2023. Our code is publicly available at https://github.com/gowitheflow-1998/LA-SER-cubed

arXiv:2309.11895 [pdf, other]

Audio Contrastive based Fine-tuning

Authors: Yang Wang, Qibin Liang, Chenghao Xiao, Yizhi Li, Noura Al Moubayed, Chenghua Lin

Abstract: Audio classification plays a crucial role in speech and sound processing tasks with a wide range of applications. There still remains a challenge of striking the right balance between fitting the model to the training data (avoiding overfitting) and enabling it to generalise well to a new domain. Leveraging the transferability of contrastive learning, we introduce Audio Contrastive-based Fine-tuni… ▽ More Audio classification plays a crucial role in speech and sound processing tasks with a wide range of applications. There still remains a challenge of striking the right balance between fitting the model to the training data (avoiding overfitting) and enabling it to generalise well to a new domain. Leveraging the transferability of contrastive learning, we introduce Audio Contrastive-based Fine-tuning (AudioConFit), an efficient approach characterised by robust generalisability. Empirical experiments on a variety of audio classification tasks demonstrate the effectiveness and robustness of our approach, which achieves state-of-the-art results in various settings. △ Less

Submitted 19 October, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: Under review

arXiv:2308.09171 [pdf, other]

doi 10.1142/9789811273209_0004

Forensic Data Analytics for Anomaly Detection in Evolving Networks

Authors: Li Yang, Abdallah Moubayed, Abdallah Shami, Amine Boukhtouta, Parisa Heidari, Stere Preda, Richard Brunner, Daniel Migault, Adel Larabi

Abstract: In the prevailing convergence of traditional infrastructure-based deployment (i.e., Telco and industry operational networks) towards evolving deployments enabled by 5G and virtualization, there is a keen interest in elaborating effective security controls to protect these deployments in-depth. By considering key enabling technologies like 5G and virtualization, evolving networks are democratized,… ▽ More In the prevailing convergence of traditional infrastructure-based deployment (i.e., Telco and industry operational networks) towards evolving deployments enabled by 5G and virtualization, there is a keen interest in elaborating effective security controls to protect these deployments in-depth. By considering key enabling technologies like 5G and virtualization, evolving networks are democratized, facilitating the establishment of point presences integrating different business models ranging from media, dynamic web content, gaming, and a plethora of IoT use cases. Despite the increasing services provided by evolving networks, many cybercrimes and attacks have been launched in evolving networks to perform malicious activities. Due to the limitations of traditional security artifacts (e.g., firewalls and intrusion detection systems), the research on digital forensic data analytics has attracted more attention. Digital forensic analytics enables people to derive detailed information and comprehensive conclusions from different perspectives of cybercrimes to assist in convicting criminals and preventing future crimes. This chapter presents a digital analytics framework for network anomaly detection, including multi-perspective feature engineering, unsupervised anomaly detection, and comprehensive result correction procedures. Experiments on real-world evolving network data show the effectiveness of the proposed forensic data analytics solution. △ Less

Submitted 17 August, 2023; originally announced August 2023.

Comments: Electronic version of an article published as [Book Series: World Scientific Series in Digital Forensics and Cybersecurity, Volume 2, Innovations in Digital Forensics, 2023, Pages 99-137] [DOI:10.1142/9789811273209_0004] \c{opyright} copyright World Scientific Publishing Company [https://doi.org/10.1142/9789811273209_0004]

MSC Class: 68T01 ACM Class: I.2.6; C.2.0

arXiv:2305.00817 [pdf, other]

Racial Bias within Face Recognition: A Survey

Authors: Seyma Yucer, Furkan Tektas, Noura Al Moubayed, Toby P. Breckon

Abstract: Facial recognition is one of the most academically studied and industrially developed areas within computer vision where we readily find associated applications deployed globally. This widespread adoption has uncovered significant performance variation across subjects of different racial profiles leading to focused research attention on racial bias within face recognition spanning both current cau… ▽ More Facial recognition is one of the most academically studied and industrially developed areas within computer vision where we readily find associated applications deployed globally. This widespread adoption has uncovered significant performance variation across subjects of different racial profiles leading to focused research attention on racial bias within face recognition spanning both current causation and future potential solutions. In support, this study provides an extensive taxonomic review of research on racial bias within face recognition exploring every aspect and stage of the face recognition processing pipeline. Firstly, we discuss the problem definition of racial bias, starting with race definition, grouping strategies, and the societal implications of using race or race-related groupings. Secondly, we divide the common face recognition processing pipeline into four stages: image acquisition, face localisation, face representation, face verification and identification, and review the relevant corresponding literature associated with each stage. The overall aim is to provide comprehensive coverage of the racial bias problem with respect to each and every stage of the face recognition processing pipeline whilst also highlighting the potential pitfalls and limitations of contemporary mitigation strategies that need to be considered within future research endeavours or commercial applications alike. △ Less

Submitted 1 May, 2023; originally announced May 2023.

arXiv:2301.02275 [pdf, other]

doi 10.1016/j.aiopen.2023.05.001

Language as a Latent Sequence: deep latent variable models for semi-supervised paraphrase generation

Authors: Jialin Yu, Alexandra I. Cristea, Anoushka Harit, Zhongtian Sun, Olanrewaju Tahir Aduragba, Lei Shi, Noura Al Moubayed

Abstract: This paper explores deep latent variable models for semi-supervised paraphrase generation, where the missing target pair for unlabelled data is modelled as a latent paraphrase sequence. We present a novel unsupervised model named variational sequence auto-encoding reconstruction (VSAR), which performs latent sequence inference given an observed text. To leverage information from text pairs, we add… ▽ More This paper explores deep latent variable models for semi-supervised paraphrase generation, where the missing target pair for unlabelled data is modelled as a latent paraphrase sequence. We present a novel unsupervised model named variational sequence auto-encoding reconstruction (VSAR), which performs latent sequence inference given an observed text. To leverage information from text pairs, we additionally introduce a novel supervised model we call dual directional learning (DDL), which is designed to integrate with our proposed VSAR model. Combining VSAR with DDL (DDL+VSAR) enables us to conduct semi-supervised learning. Still, the combined model suffers from a cold-start problem. To further combat this issue, we propose an improved weight initialisation solution, leading to a novel two-stage training scheme we call knowledge-reinforced-learning (KRL). Our empirical evaluations suggest that the combined model yields competitive performance against the state-of-the-art supervised baselines on complete data. Furthermore, in scenarios where only a fraction of the labelled pairs are available, our combined model consistently outperforms the strong supervised model baseline (DDL) by a significant margin (p <.05; Wilcoxon test). Our code is publicly available at "https://github.com/jialin-yu/latent-sequence-paraphrase". △ Less

Submitted 8 September, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

arXiv:2212.09170 [pdf, other]

On Isotropy, Contextualization and Learning Dynamics of Contrastive-based Sentence Representation Learning

Authors: Chenghao Xiao, Yang Long, Noura Al Moubayed

Abstract: Incorporating contrastive learning objectives in sentence representation learning (SRL) has yielded significant improvements on many sentence-level NLP tasks. However, it is not well understood why contrastive learning works for learning sentence-level semantics. In this paper, we aim to help guide future designs of sentence representation learning methods by taking a closer look at contrastive SR… ▽ More Incorporating contrastive learning objectives in sentence representation learning (SRL) has yielded significant improvements on many sentence-level NLP tasks. However, it is not well understood why contrastive learning works for learning sentence-level semantics. In this paper, we aim to help guide future designs of sentence representation learning methods by taking a closer look at contrastive SRL through the lens of isotropy, contextualization and learning dynamics. We interpret its successes through the geometry of the representation shifts and show that contrastive learning brings isotropy, and drives high intra-sentence similarity: when in the same sentence, tokens converge to similar positions in the semantic space. We also find that what we formalize as "spurious contextualization" is mitigated for semantically meaningful tokens, while augmented for functional ones. We find that the embedding space is directed towards the origin during training, with more areas now better defined. We ablate these findings by observing the learning dynamics with different training temperatures, batch sizes and pooling methods. △ Less

Submitted 26 May, 2023; v1 submitted 18 December, 2022; originally announced December 2022.

Comments: Accepted by ACL 2023 (Findings, long paper)

arXiv:2211.01266 [pdf, other]

Knowing the Past to Predict the Future: Reinforcement Virtual Learning

Authors: Peng Zhang, Yawen Huang, Bingzhang Hu, Shizheng Wang, Haoran Duan, Noura Al Moubayed, Yefeng Zheng, Yang Long

Abstract: Reinforcement Learning (RL)-based control system has received considerable attention in recent decades. However, in many real-world problems, such as Batch Process Control, the environment is uncertain, which requires expensive interaction to acquire the state and reward values. In this paper, we present a cost-efficient framework, such that the RL model can evolve for itself in a Virtual Space us… ▽ More Reinforcement Learning (RL)-based control system has received considerable attention in recent decades. However, in many real-world problems, such as Batch Process Control, the environment is uncertain, which requires expensive interaction to acquire the state and reward values. In this paper, we present a cost-efficient framework, such that the RL model can evolve for itself in a Virtual Space using the predictive models with only historical data. The proposed framework enables a step-by-step RL model to predict the future state and select optimal actions for long-sight decisions. The main focuses are summarized as: 1) how to balance the long-sight and short-sight rewards with an optimal strategy; 2) how to make the virtual model interacting with real environment to converge to a final learning policy. Under the experimental settings of Fed-Batch Process, our method consistently outperforms the existing state-of-the-art methods. △ Less

Submitted 2 November, 2022; originally announced November 2022.

arXiv:2209.01061 [pdf, other]

INTERACTION: A Generative XAI Framework for Natural Language Inference Explanations

Authors: Jialin Yu, Alexandra I. Cristea, Anoushka Harit, Zhongtian Sun, Olanrewaju Tahir Aduragba, Lei Shi, Noura Al Moubayed

Abstract: XAI with natural language processing aims to produce human-readable explanations as evidence for AI decision-making, which addresses explainability and transparency. However, from an HCI perspective, the current approaches only focus on delivering a single explanation, which fails to account for the diversity of human thoughts and experiences in language. This paper thus addresses this gap, by pro… ▽ More XAI with natural language processing aims to produce human-readable explanations as evidence for AI decision-making, which addresses explainability and transparency. However, from an HCI perspective, the current approaches only focus on delivering a single explanation, which fails to account for the diversity of human thoughts and experiences in language. This paper thus addresses this gap, by proposing a generative XAI framework, INTERACTION (explaIn aNd predicT thEn queRy with contextuAl CondiTional varIational autO-eNcoder). Our novel framework presents explanation in two steps: (step one) Explanation and Label Prediction; and (step two) Diverse Evidence Generation. We conduct intensive experiments with the Transformer architecture on a benchmark dataset, e-SNLI. Our method achieves competitive or better performance against state-of-the-art baseline models on explanation generation (up to 4.7% gain in BLEU) and prediction (up to 4.4% gain in accuracy) in step one; it can also generate multiple diverse explanations in step two. △ Less

Submitted 2 September, 2022; originally announced September 2022.

arXiv:2208.07613 [pdf, other]

Does lossy image compression affect racial bias within face recognition?

Authors: Seyma Yucer, Matt Poyser, Noura Al Moubayed, Toby P. Breckon

Abstract: Yes - This study investigates the impact of commonplace lossy image compression on face recognition algorithms with regard to the racial characteristics of the subject. We adopt a recently proposed racial phenotype-based bias analysis methodology to measure the effect of varying levels of lossy compression across racial phenotype categories. Additionally, we determine the relationship between chro… ▽ More Yes - This study investigates the impact of commonplace lossy image compression on face recognition algorithms with regard to the racial characteristics of the subject. We adopt a recently proposed racial phenotype-based bias analysis methodology to measure the effect of varying levels of lossy compression across racial phenotype categories. Additionally, we determine the relationship between chroma-subsampling and race-related phenotypes for recognition performance. Prior work investigates the impact of lossy JPEG compression algorithm on contemporary face recognition performance. However, there is a gap in how this impact varies with different race-related inter-sectional groups and the cause of this impact. Via an extensive experimental setup, we demonstrate that common lossy image compression approaches have a more pronounced negative impact on facial recognition performance for specific racial phenotype categories such as darker skin tones (by up to 34.55\%). Furthermore, removing chroma-subsampling during compression improves the false matching rate (up to 15.95\%) across all phenotype categories affected by the compression, including darker skin tones, wide noses, big lips, and monolid eye categories. In addition, we outline the characteristics that may be attributable as the underlying cause of such phenomenon for lossy compression algorithms such as JPEG. △ Less

Submitted 16 August, 2022; originally announced August 2022.

arXiv:2208.03824 [pdf, other]

Towards Graph Representation Learning Based Surgical Workflow Anticipation

Authors: Xiatian Zhang, Noura Al Moubayed, Hubert P. H. Shum

Abstract: Surgical workflow anticipation can give predictions on what steps to conduct or what instruments to use next, which is an essential part of the computer-assisted intervention system for surgery, e.g. workflow reasoning in robotic surgery. However, current approaches are limited to their insufficient expressive power for relationships between instruments. Hence, we propose a graph representation le… ▽ More Surgical workflow anticipation can give predictions on what steps to conduct or what instruments to use next, which is an essential part of the computer-assisted intervention system for surgery, e.g. workflow reasoning in robotic surgery. However, current approaches are limited to their insufficient expressive power for relationships between instruments. Hence, we propose a graph representation learning framework to comprehensively represent instrument motions in the surgical workflow anticipation problem. In our proposed graph representation, we maps the bounding box information of instruments to the graph nodes in the consecutive frames and build inter-frame/inter-instrument graph edges to represent the trajectory and interaction of the instruments over time. This design enhances the ability of our network on modeling both the spatial and temporal patterns of surgical instruments and their interactions. In addition, we design a multi-horizon learning strategy to balance the understanding of various horizons indifferent anticipation tasks, which significantly improves the model performance in anticipation with various horizons. Experiments on the Cholec80 dataset demonstrate the performance of our proposed method can exceed the state-of-the-art method based on richer backbones, especially in instrument anticipation (1.27 v.s. 1.48 for inMAE; 1.48 v.s. 2.68 for eMAE). To the best of our knowledge, we are the first to introduce a spatial-temporal graph representation into surgical workflow anticipation. △ Less

Submitted 7 August, 2022; originally announced August 2022.

Comments: Proceedings of the 2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), 2022

arXiv:2205.14040 [pdf, other]

Intelligent Transportation Systems' Orchestration: Lessons Learned & Potential Opportunities

Authors: Abdallah Moubayed, Abdallah Shami, Abbas Ibrahim

Abstract: The growing deployment efforts of 5G networks globally has led to the acceleration of the businesses/services' digital transformation. This growth has led to the need for new communication technologies that will promote this transformation. 6G is being proposed as the set of technologies and architectures that will achieve this target. Among the main use cases that have emerged for 5G networks and… ▽ More The growing deployment efforts of 5G networks globally has led to the acceleration of the businesses/services' digital transformation. This growth has led to the need for new communication technologies that will promote this transformation. 6G is being proposed as the set of technologies and architectures that will achieve this target. Among the main use cases that have emerged for 5G networks and will continue to play a pivotal role in 6G networks is that of Intelligent Transportation Systems (ITSs). With all the projected benefits of developing and deploying efficient and effective ITSs comes a group of unique challenges that need to be addressed. One prominent challenge is ITS orchestration due to the various supporting technologies and heterogeneous networks used to offer the desired ITS applications/services. To that end, this paper focuses on the ITS orchestration challenge in detail by highlighting the related previous works from the literature and listing the lessons learned from current ITS deployment orchestration efforts. It also presents multiple potential data-driven research opportunities in which paradigms such as reinforcement learning and federated learning can be deployed to offer effective and efficient ITS orchestration. △ Less

Submitted 5 May, 2022; originally announced May 2022.

Comments: 6 pages, 3 figures, accepted and presented in 1320th International Conference on Recent Innovations in Engineering and Technology (ICRIET-2022)

ACM Class: J.2; I.2.11

arXiv:2203.13004 [pdf, other]

Using Orientation to Distinguish Overlapping Chromosomes

Authors: Daniel Kluvanec, Thomas B. Phillips, Kenneth J. W. McCaffrey, Noura Al Moubayed

Abstract: A difficult step in the process of karyotyping is segmenting chromosomes that touch or overlap. In an attempt to automate the process, previous studies turned to Deep Learning methods, with some formulating the task as a semantic segmentation problem. These models treat separate chromosome instances as semantic classes, which we show to be problematic, since it is uncertain which chromosome should… ▽ More A difficult step in the process of karyotyping is segmenting chromosomes that touch or overlap. In an attempt to automate the process, previous studies turned to Deep Learning methods, with some formulating the task as a semantic segmentation problem. These models treat separate chromosome instances as semantic classes, which we show to be problematic, since it is uncertain which chromosome should be classed as #1 and #2. Assigning class labels based on comparison rules, such as the shorter/longer chromosome alleviates, but does not fully resolve the issue. Instead, we separate the chromosome instances in a second stage, predicting the orientation of the chromosomes by the model and use it as one of the key distinguishing factors of the chromosomes. We demonstrate this method to be effective. Furthermore, we introduce a novel Double-Angle representation that a neural network can use to predict the orientation. The representation maps any direction and its reverse to the same point. Lastly, we present a new expanded synthetic dataset, which is based on Pommier's dataset, but addresses its issues with insufficient separation between its training and testing sets. △ Less

Submitted 24 March, 2022; originally announced March 2022.

Comments: Conference for Health, Inference, and Learning (CHIL) 2022 - Invited non-archival presentation

arXiv:2202.07362 [pdf, other]

MuLD: The Multitask Long Document Benchmark

Authors: G Thomas Hudson, Noura Al Moubayed

Abstract: The impressive progress in NLP techniques has been driven by the development of multi-task benchmarks such as GLUE and SuperGLUE. While these benchmarks focus on tasks for one or two input sentences, there has been exciting work in designing efficient techniques for processing much longer inputs. In this paper, we present MuLD: a new long document benchmark consisting of only documents over 10,000… ▽ More The impressive progress in NLP techniques has been driven by the development of multi-task benchmarks such as GLUE and SuperGLUE. While these benchmarks focus on tasks for one or two input sentences, there has been exciting work in designing efficient techniques for processing much longer inputs. In this paper, we present MuLD: a new long document benchmark consisting of only documents over 10,000 tokens. By modifying existing NLP tasks, we create a diverse benchmark which requires models to successfully model long-term dependencies in the text. We evaluate how existing models perform, and find that our benchmark is much more challenging than their `short document' equivalents. Furthermore, by evaluating both regular and efficient transformers, we show that models with increased context length are better able to solve the tasks presented, suggesting that future improvements in these models are vital for solving similar long document problems. We release the data and code for baselines to encourage further research on efficient NLP models. △ Less

Submitted 15 February, 2022; originally announced February 2022.

arXiv:2110.09839 [pdf, other]

doi 10.1109/WACV51458.2022.00326

Measuring Hidden Bias within Face Recognition via Racial Phenotypes

Authors: Seyma Yucer, Furkan Tektas, Noura Al Moubayed, Toby P. Breckon

Abstract: Recent work reports disparate performance for intersectional racial groups across face recognition tasks: face verification and identification. However, the definition of those racial groups has a significant impact on the underlying findings of such racial bias analysis. Previous studies define these groups based on either demographic information (e.g. African, Asian etc.) or skin tone (e.g. ligh… ▽ More Recent work reports disparate performance for intersectional racial groups across face recognition tasks: face verification and identification. However, the definition of those racial groups has a significant impact on the underlying findings of such racial bias analysis. Previous studies define these groups based on either demographic information (e.g. African, Asian etc.) or skin tone (e.g. lighter or darker skins). The use of such sensitive or broad group definitions has disadvantages for bias investigation and subsequent counter-bias solutions design. By contrast, this study introduces an alternative racial bias analysis methodology via facial phenotype attributes for face recognition. We use the set of observable characteristics of an individual face where a race-related facial phenotype is hence specific to the human face and correlated to the racial profile of the subject. We propose categorical test cases to investigate the individual influence of those attributes on bias within face recognition tasks. We compare our phenotype-based grouping methodology with previous grouping strategies and show that phenotype-based groupings uncover hidden bias without reliance upon any potentially protected attributes or ill-defined grouping strategies. Furthermore, we contribute corresponding phenotype attribute category labels for two face recognition tasks: RFW for face verification and VGGFace2 (test set) for face identification. △ Less

Submitted 19 October, 2021; originally announced October 2021.

Comments: published in IEEE Winter Conference on Applications of Computer Vision, WACV, 2022

arXiv:2110.07808 [pdf, other]

doi 10.1109/ISNCC52172.2021.9615795

Mobility Aware Edge Computing Segmentation Towards Localized Orchestration

Authors: Sam Aleyadeh, Abdallah Moubayed, Abdallah Shami

Abstract: The current trend in end-user devices' advancements in computing and communication capabilities makes edge computing an attractive solution to pave the way for the coveted ultra-low latency services. The success of the edge computing networking paradigm depends on the proper orchestration of the edge servers. Several Edge applications and services are intolerant to latency, especially in 5G and be… ▽ More The current trend in end-user devices' advancements in computing and communication capabilities makes edge computing an attractive solution to pave the way for the coveted ultra-low latency services. The success of the edge computing networking paradigm depends on the proper orchestration of the edge servers. Several Edge applications and services are intolerant to latency, especially in 5G and beyond networks, such as intelligent video surveillance, E-health, Internet of Vehicles, and augmented reality applications. The edge devices underwent rapid growth in both capabilities and size to cope with the service demands. Orchestrating it on the cloud was a prominent trend during the past decade. However, the increasing number of edge devices poses a significant burden on the orchestration delay. In addition to the growth in edge devices, the high mobility of users renders traditional orchestration schemes impractical for contemporary edge networks. Proper segmentation of the edge space becomes necessary to adapt these schemes to address these challenges. In this paper, we introduce a segmentation technique employing lax clustering and segregated mobility-based clustering. We then apply latency mapping to these clusters. The proposed scheme's main objective is to create subspaces (segments) that enable light and efficient edge orchestration by reducing the processing time and the core cloud communication overhead. A bench-marking simulation is conducted with the results showing decreased mobility-related failures and reduced orchestration delay. △ Less

Submitted 14 October, 2021; originally announced October 2021.

Comments: accepted at ISNCC 2021

arXiv:2108.01589 [pdf, other]

ExBERT: An External Knowledge Enhanced BERT for Natural Language Inference

Authors: Amit Gajbhiye, Noura Al Moubayed, Steven Bradley

Abstract: Neural language representation models such as BERT, pre-trained on large-scale unstructured corpora lack explicit grounding to real-world commonsense knowledge and are often unable to remember facts required for reasoning and inference. Natural Language Inference (NLI) is a challenging reasoning task that relies on common human understanding of language and real-world commonsense knowledge. We int… ▽ More Neural language representation models such as BERT, pre-trained on large-scale unstructured corpora lack explicit grounding to real-world commonsense knowledge and are often unable to remember facts required for reasoning and inference. Natural Language Inference (NLI) is a challenging reasoning task that relies on common human understanding of language and real-world commonsense knowledge. We introduce a new model for NLI called External Knowledge Enhanced BERT (ExBERT), to enrich the contextual representation with real-world commonsense knowledge from external knowledge sources and enhance BERT's language understanding and reasoning capabilities. ExBERT takes full advantage of contextual word representations obtained from BERT and employs them to retrieve relevant external knowledge from knowledge graphs and to encode the retrieved external knowledge. Our model adaptively incorporates the external knowledge context required for reasoning over the inputs. Extensive experiments on the challenging SciTail and SNLI benchmarks demonstrate the effectiveness of ExBERT: in comparison to the previous state-of-the-art, we obtain an accuracy of 95.9% on SciTail and 91.5% on SNLI. △ Less

Submitted 3 August, 2021; originally announced August 2021.

arXiv:2107.11514 [pdf, other]

doi 10.1109/TNSM.2021.3100308

Multi-Perspective Content Delivery Networks Security Framework Using Optimized Unsupervised Anomaly Detection

Authors: Li Yang, Abdallah Moubayed, Abdallah Shami, Parisa Heidari, Amine Boukhtouta, Adel Larabi, Richard Brunner, Stere Preda, Daniel Migault

Abstract: Content delivery networks (CDNs) provide efficient content distribution over the Internet. CDNs improve the connectivity and efficiency of global communications, but their caching mechanisms may be breached by cyber-attackers. Among the security mechanisms, effective anomaly detection forms an important part of CDN security enhancement. In this work, we propose a multi-perspective unsupervised lea… ▽ More Content delivery networks (CDNs) provide efficient content distribution over the Internet. CDNs improve the connectivity and efficiency of global communications, but their caching mechanisms may be breached by cyber-attackers. Among the security mechanisms, effective anomaly detection forms an important part of CDN security enhancement. In this work, we propose a multi-perspective unsupervised learning framework for anomaly detection in CDNs. In the proposed framework, a multi-perspective feature engineering approach, an optimized unsupervised anomaly detection model that utilizes an isolation forest and a Gaussian mixture model, and a multi-perspective validation method, are developed to detect abnormal behaviors in CDNs mainly from the client Internet Protocol (IP) and node perspectives, therefore to identify the denial of service (DoS) and cache pollution attack (CPA) patterns. Experimental results are presented based on the analytics of eight days of real-world CDN log data provided by a major CDN operator. Through experiments, the abnormal contents, compromised nodes, malicious IPs, as well as their corresponding attack types, are identified effectively by the proposed framework and validated by multiple cybersecurity experts. This shows the effectiveness of the proposed method when applied to real-world CDN data. △ Less

Submitted 23 July, 2021; originally announced July 2021.

Comments: Accepted and to Appear in IEEE Transactions on Network and Service Management

MSC Class: 68T01 ACM Class: I.2.6; C.2.0

arXiv:2106.02183 [pdf, other]

Towards Equal Gender Representation in the Annotations of Toxic Language Detection

Authors: Elizabeth Excell, Noura Al Moubayed

Abstract: Classifiers tend to propagate biases present in the data on which they are trained. Hence, it is important to understand how the demographic identities of the annotators of comments affect the fairness of the resulting model. In this paper, we focus on the differences in the ways men and women annotate comments for toxicity, investigating how these differences result in models that amplify the opi… ▽ More Classifiers tend to propagate biases present in the data on which they are trained. Hence, it is important to understand how the demographic identities of the annotators of comments affect the fairness of the resulting model. In this paper, we focus on the differences in the ways men and women annotate comments for toxicity, investigating how these differences result in models that amplify the opinions of male annotators. We find that the BERT model as-sociates toxic comments containing offensive words with male annotators, causing the model to predict 67.7% of toxic comments as having been annotated by men. We show that this disparity between gender predictions can be mitigated by removing offensive words and highly toxic comments from the training data. We then apply the learned associations between gender and language to toxic language classifiers, finding that models trained exclusively on female-annotated data perform 1.8% better than those trained solely on male-annotated data and that training models on data after removing all offensive words reduces bias in the model by 55.5% while increasing the sensitivity by 0.4%. △ Less

Submitted 3 June, 2021; originally announced June 2021.

Comments: Paper is accepted at GeBNLP2021 workshop at ACL-IJCNLP 2021

arXiv:2105.13289 [pdf, other]

doi 10.1109/JIOT.2021.3084796

MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for Internet of Vehicles

Authors: Li Yang, Abdallah Moubayed, Abdallah Shami

Abstract: Modern vehicles, including connected vehicles and autonomous vehicles, nowadays involve many electronic control units connected through intra-vehicle networks to implement various functionalities and perform actions. Modern vehicles are also connected to external networks through vehicle-to-everything technologies, enabling their communications with other vehicles, infrastructures, and smart devic… ▽ More Modern vehicles, including connected vehicles and autonomous vehicles, nowadays involve many electronic control units connected through intra-vehicle networks to implement various functionalities and perform actions. Modern vehicles are also connected to external networks through vehicle-to-everything technologies, enabling their communications with other vehicles, infrastructures, and smart devices. However, the improving functionality and connectivity of modern vehicles also increase their vulnerabilities to cyber-attacks targeting both intra-vehicle and external networks due to the large attack surfaces. To secure vehicular networks, many researchers have focused on developing intrusion detection systems (IDSs) that capitalize on machine learning methods to detect malicious cyber-attacks. In this paper, the vulnerabilities of intra-vehicle and external networks are discussed, and a multi-tiered hybrid IDS that incorporates a signature-based IDS and an anomaly-based IDS is proposed to detect both known and unknown attacks on vehicular networks. Experimental results illustrate that the proposed system can detect various types of known attacks with 99.99% accuracy on the CAN-intrusion-dataset representing the intra-vehicle network data and 99.88% accuracy on the CICIDS2017 dataset illustrating the external vehicular network data. For the zero-day attack detection, the proposed system achieves high F1-scores of 0.963 and 0.800 on the above two datasets, respectively. The average processing time of each data packet on a vehicle-level machine is less than 0.6 ms, which shows the feasibility of implementing the proposed system in real-time vehicle systems. This emphasizes the effectiveness and efficiency of the proposed IDS. △ Less

Submitted 26 May, 2021; originally announced May 2021.

Comments: Accepted and to appear in IEEE Internet of Things Journal; Code is available at Github link: https://github.com/Western-OC2-Lab/Intrusion-Detection-System-Using-Machine-Learning

MSC Class: 68T01 ACM Class: I.2.6; C.2.0

arXiv:2105.06791 [pdf, other]

Agree to Disagree: When Deep Learning Models With Identical Architectures Produce Distinct Explanations

Authors: Matthew Watson, Bashar Awwad Shiekh Hasan, Noura Al Moubayed

Abstract: Deep Learning of neural networks has progressively become more prominent in healthcare with models reaching, or even surpassing, expert accuracy levels. However, these success stories are tainted by concerning reports on the lack of model transparency and bias against some medical conditions or patients' sub-groups. Explainable methods are considered the gateway to alleviate many of these concerns… ▽ More Deep Learning of neural networks has progressively become more prominent in healthcare with models reaching, or even surpassing, expert accuracy levels. However, these success stories are tainted by concerning reports on the lack of model transparency and bias against some medical conditions or patients' sub-groups. Explainable methods are considered the gateway to alleviate many of these concerns. In this study we demonstrate that the generated explanations are volatile to changes in model training that are perpendicular to the classification task and model structure. This raises further questions about trust in deep learning models for healthcare. Mainly, whether the models capture underlying causal links in the data or just rely on spurious correlations that are made visible via explanation methods. We demonstrate that the output of explainability methods on deep neural networks can vary significantly by changes of hyper-parameters, such as the random seed or how the training set is shuffled. We introduce a measure of explanation consistency which we use to highlight the identified problems on the MIMIC-CXR dataset. We find explanations of identical models but with different training setups have a low consistency: $\approx$ 33% on average. On the contrary, kernel methods are robust against any orthogonal changes, with explanation consistency at 94%. We conclude that current trends in model explanation are not sufficient to mitigate the risks of deploying models in real life healthcare applications. △ Less

Submitted 30 October, 2021; v1 submitted 14 May, 2021; originally announced May 2021.

Comments: 9 pages, 5 figures, 3 tables

ACM Class: I.2

arXiv:2105.01959 [pdf, other]

Attack-agnostic Adversarial Detection on Medical Data Using Explainable Machine Learning

Authors: Matthew Watson, Noura Al Moubayed

Abstract: Explainable machine learning has become increasingly prevalent, especially in healthcare where explainable models are vital for ethical and trusted automated decision making. Work on the susceptibility of deep learning models to adversarial attacks has shown the ease of designing samples to mislead a model into making incorrect predictions. In this work, we propose a model agnostic explainability-… ▽ More Explainable machine learning has become increasingly prevalent, especially in healthcare where explainable models are vital for ethical and trusted automated decision making. Work on the susceptibility of deep learning models to adversarial attacks has shown the ease of designing samples to mislead a model into making incorrect predictions. In this work, we propose a model agnostic explainability-based method for the accurate detection of adversarial samples on two datasets with different complexity and properties: Electronic Health Record (EHR) and chest X-ray (CXR) data. On the MIMIC-III and Henan-Renmin EHR datasets, we report a detection accuracy of 77% against the Longitudinal Adversarial Attack. On the MIMIC-CXR dataset, we achieve an accuracy of 88%; significantly improving on the state of the art of adversarial detection in both datasets by over 10% in all settings. We propose an anomaly detection based method using explainability techniques to detect adversarial samples which is able to generalise to different attack methods without a need for retraining. △ Less

Submitted 5 May, 2021; originally announced May 2021.

Comments: 13 pages, 6 figures, accepted to ICPR 2020

ACM Class: I.2; I.4

arXiv:2101.03655 [pdf, other]

doi 10.1007/s10462-020-09948-w

Machine Learning Towards Intelligent Systems: Applications, Challenges, and Opportunities

Authors: MohammadNoor Injadat, Abdallah Moubayed, Ali Bou Nassif, Abdallah Shami

Abstract: The emergence and continued reliance on the Internet and related technologies has resulted in the generation of large amounts of data that can be made available for analyses. However, humans do not possess the cognitive capabilities to understand such large amounts of data. Machine learning (ML) provides a mechanism for humans to process large amounts of data, gain insights about the behavior of t… ▽ More The emergence and continued reliance on the Internet and related technologies has resulted in the generation of large amounts of data that can be made available for analyses. However, humans do not possess the cognitive capabilities to understand such large amounts of data. Machine learning (ML) provides a mechanism for humans to process large amounts of data, gain insights about the behavior of the data, and make more informed decision based on the resulting analysis. ML has applications in various fields. This review focuses on some of the fields and applications such as education, healthcare, network security, banking and finance, and social media. Within these fields, there are multiple unique challenges that exist. However, ML can provide solutions to these challenges, as well as create further research opportunities. Accordingly, this work surveys some of the challenges facing the aforementioned fields and presents some of the previous literature works that tackled them. Moreover, it suggests several research opportunities that benefit from the use of ML to address these challenges. △ Less

Submitted 10 January, 2021; originally announced January 2021.

Comments: 46 pages, 7 figures, 5 tables, journal

arXiv:2101.03581 [pdf, other]

doi 10.1016/j.techfore.2021.121127

Curvature-based Feature Selection with Application in Classifying Electronic Health Records

Authors: Zheming Zuo, Jie Li, Han Xu, Noura Al Moubayed

Abstract: Disruptive technologies provides unparalleled opportunities to contribute to the identifications of many aspects in pervasive healthcare, from the adoption of the Internet of Things through to Machine Learning (ML) techniques. As a powerful tool, ML has been widely applied in patient-centric healthcare solutions. To further improve the quality of patient care, Electronic Health Records (EHRs) are… ▽ More Disruptive technologies provides unparalleled opportunities to contribute to the identifications of many aspects in pervasive healthcare, from the adoption of the Internet of Things through to Machine Learning (ML) techniques. As a powerful tool, ML has been widely applied in patient-centric healthcare solutions. To further improve the quality of patient care, Electronic Health Records (EHRs) are commonly adopted in healthcare facilities for analysis. It is a crucial task to apply AI and ML to analyse those EHRs for prediction and diagnostics due to their highly unstructured, unbalanced, incomplete, and high-dimensional nature. Dimensionality reduction is a common data preprocessing technique to cope with high-dimensional EHR data, which aims to reduce the number of features of EHR representation while improving the performance of the subsequent data analysis, e.g. classification. In this work, an efficient filter-based feature selection method, namely Curvature-based Feature Selection (CFS), is presented. The proposed CFS applied the concept of Menger Curvature to rank the weights of all features in the given data set. The performance of the proposed CFS has been evaluated in four well-known EHR data sets, including Cervical Cancer Risk Factors (CCRFDS), Breast Cancer Coimbra (BCCDS), Breast Tissue (BTDS), and Diabetic Retinopathy Debrecen (DRDDS). The experimental results show that the proposed CFS achieved state-of-the-art performance on the above data sets against conventional PCA and other most recent approaches. The source code of the proposed approach is publicly available at https://github.com/zhemingzuo/CFS. △ Less

Submitted 30 November, 2021; v1 submitted 10 January, 2021; originally announced January 2021.

Comments: Accepted by Technological Forecasting and Social Change; Source code available

arXiv:2101.02006 [pdf, other]

doi 10.1109/EDUNINE.2018.8451005

Relationship between Student Engagement and Performance in e-Learning Environment Using Association Rules

Authors: Abdallah Moubayed, MohammadNoor Injadat, Abdallah Shami, Hanan Lutfiyya

Abstract: The field of e-learning has emerged as a topic of interest in academia due to the increased ease of accessing the Internet using using smart-phones and wireless devices. One of the challenges facing e-learning platforms is how to keep students motivated and engaged. Moreover, it is also crucial to identify the students that might need help in order to make sure their academic performance doesn't s… ▽ More The field of e-learning has emerged as a topic of interest in academia due to the increased ease of accessing the Internet using using smart-phones and wireless devices. One of the challenges facing e-learning platforms is how to keep students motivated and engaged. Moreover, it is also crucial to identify the students that might need help in order to make sure their academic performance doesn't suffer. To that end, this paper tries to investigate the relationship between student engagement and their academic performance. Apriori association rules algorithm is used to derive a set of rules that relate student engagement to academic performance. Experimental results' analysis done using confidence and lift metrics show that a positive correlation exists between students' engagement level and their academic performance in a blended e-learning environment. In particular, it is shown that higher engagement often leads to better academic performance. This cements the previous work that linked engagement and academic performance in traditional classrooms. △ Less

Submitted 25 December, 2020; originally announced January 2021.

Comments: 1 Table, 1 Figure, published in 2018 IEEE World Engineering Education Conference (EDUNINE)

Journal ref: 2018 IEEE World Engineering Education Conference (EDUNINE), 2018, pp. 1-6

arXiv:2012.13604 [pdf, other]

doi 10.1109/GLOCOM.2018.8647679

DNS Typo-squatting Domain Detection: A Data Analytics & Machine Learning Based Approach

Authors: Abdallah Moubayed, MohammadNoor Injadat, Abdallah Shami, Hanan Lutfiyya

Abstract: Domain Name System (DNS) is a crucial component of current IP-based networks as it is the standard mechanism for name to IP resolution. However, due to its lack of data integrity and origin authentication processes, it is vulnerable to a variety of attacks. One such attack is Typosquatting. Detecting this attack is particularly important as it can be a threat to corporate secrets and can be used t… ▽ More Domain Name System (DNS) is a crucial component of current IP-based networks as it is the standard mechanism for name to IP resolution. However, due to its lack of data integrity and origin authentication processes, it is vulnerable to a variety of attacks. One such attack is Typosquatting. Detecting this attack is particularly important as it can be a threat to corporate secrets and can be used to steal information or commit fraud. In this paper, a machine learning-based approach is proposed to tackle the typosquatting vulnerability. To that end, exploratory data analytics is first used to better understand the trends observed in eight domain name-based extracted features. Furthermore, a majority voting-based ensemble learning classifier built using five classification algorithms is proposed that can detect suspicious domains with high accuracy. Moreover, the observed trends are validated by studying the same features in an unlabeled dataset using K-means clustering algorithm and through applying the developed ensemble learning classifier. Results show that legitimate domains have a smaller domain name length and fewer unique characters. Moreover, the developed ensemble learning classifier performs better in terms of accuracy, precision, and F-score. Furthermore, it is shown that similar trends are observed when clustering is used. However, the number of domains identified as potentially suspicious is high. Hence, the ensemble learning classifier is applied with results showing that the number of domains identified as potentially suspicious is reduced by almost a factor of five while still maintaining the same trends in terms of features' statistics. △ Less

Submitted 25 December, 2020; originally announced December 2020.

Comments: 7 pages, 6 figures, 3 tables, published in 2018 IEEE Global Communications Conference (GLOBECOM)

Journal ref: 2018 IEEE Global Communications Conference (GLOBECOM), 2018, pp. 1-7

arXiv:2012.11326 [pdf, other]

Optimized Random Forest Model for Botnet Detection Based on DNS Queries

Authors: Abdallah Moubayed, MohammadNoor Injadat, Abdallah Shami

Abstract: The Domain Name System (DNS) protocol plays a major role in today's Internet as it translates between website names and corresponding IP addresses. However, due to the lack of processes for data integrity and origin authentication, the DNS protocol has several security vulnerabilities. This often leads to a variety of cyber-attacks, including botnet network attacks. One promising solution to detec… ▽ More The Domain Name System (DNS) protocol plays a major role in today's Internet as it translates between website names and corresponding IP addresses. However, due to the lack of processes for data integrity and origin authentication, the DNS protocol has several security vulnerabilities. This often leads to a variety of cyber-attacks, including botnet network attacks. One promising solution to detect DNS-based botnet attacks is adopting machine learning (ML) based solutions. To that end, this paper proposes a novel optimized ML-based framework to detect botnets based on their corresponding DNS queries. More specifically, the framework consists of using information gain as a feature selection method and genetic algorithm (GA) as a hyper-parameter optimization model to tune the parameters of a random forest (RF) classifier. The proposed framework is evaluated using a state-of-the-art TI-2016 DNS dataset. Experimental results show that the proposed optimized framework reduced the feature set size by up to 60%. Moreover, it achieved a high detection accuracy, precision, recall, and F-score compared to the default classifier. This highlights the effectiveness and robustness of the proposed framework in detecting botnet attacks. △ Less

Submitted 16 December, 2020; originally announced December 2020.

Comments: 4 pages, 3 figures, 1 table, Accepted and presented in IEEE 32nd International Conference on Microelectronics (IEEE-ICM2020)

arXiv:2012.11325 [pdf, other]

Detecting Botnet Attacks in IoT Environments: An Optimized Machine Learning Approach

Authors: MohammadNoor Injadat, Abdallah Moubayed, Abdallah Shami

Abstract: The increased reliance on the Internet and the corresponding surge in connectivity demand has led to a significant growth in Internet-of-Things (IoT) devices. The continued deployment of IoT devices has in turn led to an increase in network attacks due to the larger number of potential attack surfaces as illustrated by the recent reports that IoT malware attacks increased by 215.7% from 10.3 milli… ▽ More The increased reliance on the Internet and the corresponding surge in connectivity demand has led to a significant growth in Internet-of-Things (IoT) devices. The continued deployment of IoT devices has in turn led to an increase in network attacks due to the larger number of potential attack surfaces as illustrated by the recent reports that IoT malware attacks increased by 215.7% from 10.3 million in 2017 to 32.7 million in 2018. This illustrates the increased vulnerability and susceptibility of IoT devices and networks. Therefore, there is a need for proper effective and efficient attack detection and mitigation techniques in such environments. Machine learning (ML) has emerged as one potential solution due to the abundance of data generated and available for IoT devices and networks. Hence, they have significant potential to be adopted for intrusion detection for IoT environments. To that end, this paper proposes an optimized ML-based framework consisting of a combination of Bayesian optimization Gaussian Process (BO-GP) algorithm and decision tree (DT) classification model to detect attacks on IoT devices in an effective and efficient manner. The performance of the proposed framework is evaluated using the Bot-IoT-2018 dataset. Experimental results show that the proposed optimized framework has a high detection accuracy, precision, recall, and F-score, highlighting its effectiveness and robustness for the detection of botnet attacks in IoT environments. △ Less

Submitted 16 December, 2020; originally announced December 2020.

Comments: 4 pages, 2 figures, 1 table, Accepted and presented at IEEE 32nd International Conference on Microelectronics (IEEE-ICM2020)

arXiv:2012.10285 [pdf, other]

Trying Bilinear Pooling in Video-QA

Authors: Thomas Winterbottom, Sarah Xiao, Alistair McLean, Noura Al Moubayed

Abstract: Bilinear pooling (BLP) refers to a family of operations recently developed for fusing features from different modalities predominantly developed for VQA models. A bilinear (outer-product) expansion is thought to encourage models to learn interactions between two feature spaces and has experimentally outperformed `simpler' vector operations (concatenation and element-wise-addition/multiplication) o… ▽ More Bilinear pooling (BLP) refers to a family of operations recently developed for fusing features from different modalities predominantly developed for VQA models. A bilinear (outer-product) expansion is thought to encourage models to learn interactions between two feature spaces and has experimentally outperformed `simpler' vector operations (concatenation and element-wise-addition/multiplication) on VQA benchmarks. Successive BLP techniques have yielded higher performance with lower computational expense and are often implemented alongside attention mechanisms. However, despite significant progress in VQA, BLP methods have not been widely applied to more recently explored video question answering (video-QA) tasks. In this paper, we begin to bridge this research gap by applying BLP techniques to various video-QA benchmarks, namely: TVQA, TGIF-QA, Ego-VQA and MSVD-QA. We share our results on the TVQA baseline model, and the recently proposed heterogeneous-memory-enchanced multimodal attention (HME) model. Our experiments include both simply replacing feature concatenation in the existing models with BLP, and a modified version of the TVQA baseline to accommodate BLP we name the `dual-stream' model. We find that our relatively simple integration of BLP does not increase, and mostly harms, performance on these video-QA benchmarks. Using recently proposed theoretical multimodal fusion taxonomies, we offer insight into why BLP-driven performance gain for video-QA benchmarks may be more difficult to achieve than in earlier VQA models. We suggest a few additional `best-practices' to consider when applying BLP to video-QA. We stress that video-QA models should carefully consider where the complex representational potential from BLP is actually needed to avoid computational expense on `redundant' fusion. △ Less

Submitted 18 December, 2020; originally announced December 2020.

Comments: 16 Pages, 8 Figures, 4 Tables, +Supp Mats

MSC Class: 68T99 ACM Class: I.2.10; I.2.7

arXiv:2012.10210 [pdf, other]

On Modality Bias in the TVQA Dataset

Authors: Thomas Winterbottom, Sarah Xiao, Alistair McLean, Noura Al Moubayed

Abstract: TVQA is a large scale video question answering (video-QA) dataset based on popular TV shows. The questions were specifically designed to require "both vision and language understanding to answer". In this work, we demonstrate an inherent bias in the dataset towards the textual subtitle modality. We infer said bias both directly and indirectly, notably finding that models trained with subtitles lea… ▽ More TVQA is a large scale video question answering (video-QA) dataset based on popular TV shows. The questions were specifically designed to require "both vision and language understanding to answer". In this work, we demonstrate an inherent bias in the dataset towards the textual subtitle modality. We infer said bias both directly and indirectly, notably finding that models trained with subtitles learn, on-average, to suppress video feature contribution. Our results demonstrate that models trained on only the visual information can answer ~45% of the questions, while using only the subtitles achieves ~68%. We find that a bilinear pooling based joint representation of modalities damages model performance by 9% implying a reliance on modality specific information. We also show that TVQA fails to benefit from the RUBi modality bias reduction technique popularised in VQA. By simply improving text processing using BERT embeddings with the simple model first proposed for TVQA, we achieve state-of-the-art results (72.13%) compared to the highly complex STAGE model (70.50%). We recommend a multimodal evaluation framework that can highlight biases in models and isolate visual and textual reliant subsets of data. Using this framework we propose subsets of TVQA that respond exclusively to either or both modalities in order to facilitate multimodal modelling as TVQA originally intended. △ Less

Submitted 18 December, 2020; originally announced December 2020.

Comments: 10 pages, 4 Figures, 2 Tables, +Supp Mats, BMVC 2020

MSC Class: 68T99 ACM Class: I.2.10; I.2.7; I.2.4

arXiv:2010.11562 [pdf, other]

doi 10.1007/978-3-030-61609-0_50

Bilinear Fusion of Commonsense Knowledge with Attention-Based NLI Models

Authors: Amit Gajbhiye, Thomas Winterbottom, Noura Al Moubayed, Steven Bradley

Abstract: We consider the task of incorporating real-world commonsense knowledge into deep Natural Language Inference (NLI) models. Existing external knowledge incorporation methods are limited to lexical level knowledge and lack generalization across NLI models, datasets, and commonsense knowledge sources. To address these issues, we propose a novel NLI model-independent neural framework, BiCAM. BiCAM inco… ▽ More We consider the task of incorporating real-world commonsense knowledge into deep Natural Language Inference (NLI) models. Existing external knowledge incorporation methods are limited to lexical level knowledge and lack generalization across NLI models, datasets, and commonsense knowledge sources. To address these issues, we propose a novel NLI model-independent neural framework, BiCAM. BiCAM incorporates real-world commonsense knowledge into NLI models. Combined with convolutional feature detectors and bilinear feature fusion, BiCAM provides a conceptually simple mechanism that generalizes well. Quantitative evaluations with two state-of-the-art NLI baselines on SNLI and SciTail datasets in conjunction with ConceptNet and Aristo Tuple KGs show that BiCAM considerably improves the accuracy the incorporated NLI baselines. For example, our BiECAM model, an instance of BiCAM, on the challenging SciTail dataset, improves the accuracy of incorporated baselines by 7.0% with ConceptNet, and 8.0% with Aristo Tuple KG. △ Less

Submitted 22 October, 2020; originally announced October 2020.

Comments: Published in Lecture Notes in Computer Science, Springer International Publishing

arXiv:2010.07223 [pdf, other]

Cost-optimal V2X Service Placement in Distributed Cloud/Edge Environment

Authors: Abdallah Moubayed, Abdallah Shami, Parisa Heidari, Adel Larabi, Richard Brunner

Abstract: Deploying V2X services has become a challenging task. This is mainly due to the fact that such services have strict latency requirements. To meet these requirements, one potential solution is adopting mobile edge computing (MEC). However, this presents new challenges including how to find a cost efficient placement that meets other requirements such as latency. In this work, the problem of cost-op… ▽ More Deploying V2X services has become a challenging task. This is mainly due to the fact that such services have strict latency requirements. To meet these requirements, one potential solution is adopting mobile edge computing (MEC). However, this presents new challenges including how to find a cost efficient placement that meets other requirements such as latency. In this work, the problem of cost-optimal V2X service placement (CO-VSP) in a distributed cloud/edge environment is formulated. Additionally, a cost-focused delay-aware V2X service placement (DA-VSP) heuristic algorithm is proposed. Simulation results show that both CO-VSP model and DA-VSP algorithm guarantee the QoS requirements of all such services and illustrates the trade-off between latency and deployment cost. △ Less

Submitted 14 October, 2020; originally announced October 2020.

Comments: 6 pages, 4 figures, 1 table, 1 algorithm pseudocode, Accepted & presented in IEEE 16th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob 2020)

arXiv:2009.03756 [pdf, other]

Machine Learning Towards Enabling Spectrum-as-a-Service Dynamic Sharing

Authors: Abdallah Moubayed, Tanveer Ahmed, Anwar Haque, Abdallah Shami

Abstract: The growth in wireless broadband users, devices, and novel applications has led to a significant increase in the demand for new radio frequency spectrum. This is expected to grow even further given the projection that the global traffic per year will reach 4.8 zettabytes by 2022. Moreover, it is projected that the number of Internet users will reach 4.8 billion and the number of connected devices… ▽ More The growth in wireless broadband users, devices, and novel applications has led to a significant increase in the demand for new radio frequency spectrum. This is expected to grow even further given the projection that the global traffic per year will reach 4.8 zettabytes by 2022. Moreover, it is projected that the number of Internet users will reach 4.8 billion and the number of connected devices will be close 28.5 billion devices. However, due to the spectrum being mostly allocated and divided, providing more spectrum to expand existing services or offer new ones has become more challenging. To address this, spectrum sharing has been proposed as a potential solution to improve spectrum utilization efficiency. Adopting effective and efficient spectrum sharing mechanisms is in itself a challenging task given the multitude of levels and techniques that can be integrated to enable it. To that end, this paper provides an overview of the different spectrum sharing levels and techniques that have been proposed in the literature. Moreover, it discusses the potential of adopting dynamic sharing mechanisms by offering Spectrum-as-a-Service architecture. Furthermore, it describes the potential role of machine learning models in facilitating the automated and efficient dynamic sharing of the spectrum and offering Spectrum-as-a-Service. △ Less

Submitted 4 September, 2020; originally announced September 2020.

Comments: 6 pages, 2 figures, Accepted and presented in 2020 IEEE Canadian Conference On Electrical And Computer Engineering (CCECE 2020)

arXiv:2008.03297 [pdf, other]

doi 10.1109/TNSM.2020.3014929

Multi-Stage Optimized Machine Learning Framework for Network Intrusion Detection

Authors: MohammadNoor Injadat, Abdallah Moubayed, Ali Bou Nassif, Abdallah Shami

Abstract: Cyber-security garnered significant attention due to the increased dependency of individuals and organizations on the Internet and their concern about the security and privacy of their online activities. Several previous machine learning (ML)-based network intrusion detection systems (NIDSs) have been developed to protect against malicious online behavior. This paper proposes a novel multi-stage o… ▽ More Cyber-security garnered significant attention due to the increased dependency of individuals and organizations on the Internet and their concern about the security and privacy of their online activities. Several previous machine learning (ML)-based network intrusion detection systems (NIDSs) have been developed to protect against malicious online behavior. This paper proposes a novel multi-stage optimized ML-based NIDS framework that reduces computational complexity while maintaining its detection performance. This work studies the impact of oversampling techniques on the models' training sample size and determines the minimal suitable training sample size. Furthermore, it compares between two feature selection techniques, information gain and correlation-based, and explores their effect on detection performance and time complexity. Moreover, different ML hyper-parameter optimization techniques are investigated to enhance the NIDS's performance. The performance of the proposed framework is evaluated using two recent intrusion detection datasets, the CICIDS 2017 and the UNSW-NB 2015 datasets. Experimental results show that the proposed model significantly reduces the required training sample size (up to 74%) and feature set size (up to 50%). Moreover, the model performance is enhanced with hyper-parameter optimization with detection accuracies over 99% for both datasets, outperforming recent literature works by 1-2% higher accuracy and 1-2% lower false alarm rate. △ Less

Submitted 8 August, 2020; originally announced August 2020.

Comments: 14 Pages, 13 Figures, 4 tables, Published IEEE Transactions on Network and Service Management ( Early Access )

Journal ref: Electronic ISSN: 1932-4537

arXiv:2006.09272 [pdf, other]

Ensemble-based Feature Selection and Classification Model for DNS Typo-squatting Detection

Authors: Abdallah Moubayed, Emad Aqeeli, Abdallah Shami

Abstract: Domain Name System (DNS) plays in important role in the current IP-based Internet architecture. This is because it performs the domain name to IP resolution. However, the DNS protocol has several security vulnerabilities due to the lack of data integrity and origin authentication within it. This paper focuses on one particular security vulnerability, namely typo-squatting. Typo-squatting refers to… ▽ More Domain Name System (DNS) plays in important role in the current IP-based Internet architecture. This is because it performs the domain name to IP resolution. However, the DNS protocol has several security vulnerabilities due to the lack of data integrity and origin authentication within it. This paper focuses on one particular security vulnerability, namely typo-squatting. Typo-squatting refers to the registration of a domain name that is extremely similar to that of an existing popular brand with the goal of redirecting users to malicious/suspicious websites. The danger of typo-squatting is that it can lead to information threat, corporate secret leakage, and can facilitate fraud. This paper builds on our previous work in [1], which only proposed majority-voting based classifier, by proposing an ensemble-based feature selection and bagging classification model to detect DNS typo-squatting attack. Experimental results show that the proposed framework achieves high accuracy and precision in identifying the malicious/suspicious typo-squatting domains (a loss of at most 1.5% in accuracy and 5% in precision when compared to the model that used the complete feature set) while having a lower computational complexity due to the smaller feature set (a reduction of more than 50% in feature set size). △ Less

Submitted 8 June, 2020; originally announced June 2020.

Comments: 6 pages, 2 figures, 6 tables, Accepted in 2020 IEEE CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE 2020)

arXiv:2006.05031 [pdf, other]

doi 10.1007/s10489-020-01776-3

Multi-split Optimized Bagging Ensemble Model Selection for Multi-class Educational Data Mining

Authors: MohammadNoor Injadat, Abdallah Moubayed, Ali Bou Nassif, Abdallah Shami

Abstract: Predicting students' academic performance has been a research area of interest in recent years with many institutions focusing on improving the students' performance and the education quality. The analysis and prediction of students' performance can be achieved using various data mining techniques. Moreover, such techniques allow instructors to determine possible factors that may affect the studen… ▽ More Predicting students' academic performance has been a research area of interest in recent years with many institutions focusing on improving the students' performance and the education quality. The analysis and prediction of students' performance can be achieved using various data mining techniques. Moreover, such techniques allow instructors to determine possible factors that may affect the students' final marks. To that end, this work analyzes two different undergraduate datasets at two different universities. Furthermore, this work aims to predict the students' performance at two stages of course delivery (20% and 50% respectively). This analysis allows for properly choosing the appropriate machine learning algorithms to use as well as optimize the algorithms' parameters. Furthermore, this work adopts a systematic multi-split approach based on Gini index and p-value. This is done by optimizing a suitable bagging ensemble learner that is built from any combination of six potential base machine learning algorithms. It is shown through experimental results that the posited bagging ensemble models achieve high accuracy for the target group for both datasets. △ Less

Submitted 8 June, 2020; originally announced June 2020.

Comments: 29 Pages, 13 Figures, 19 Tables, Accepted in Springer's Applied Intelligence

Showing 1–50 of 59 results for author: Moubayed, A