Showing 1–50 of 59 results for author: Moubayed, A

  1. arXiv:2505.17769  [pdf, ps, other]

    cs.LG

    Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models

    Authors: Patrick Leask, Neel Nanda, Noura Al Moubayed

    Abstract: Sparse autoencoders (SAEs) are a popular method for decomposing Large Language Model (LLM) activations into interpretable latents. However, due to their substantial training cost, most academic research uses open-source SAEs which are only available for a restricted set of models of up to 27B parameters. SAE latents are also learned from a dataset of activations, which means they do not transfer b… ▽ More

    Submitted 12 June, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

    Comments: ICML 2025

  2. arXiv:2504.13816  [pdf, ps, other]

    cs.CL

    Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations

    Authors: Chenghao Xiao, Hou Pong Chan, Hao Zhang, Mahani Aljunied, Lidong Bing, Noura Al Moubayed, Yu Rong

    Abstract: While understanding the knowledge boundaries of LLMs is crucial to prevent hallucination, research on the knowledge boundaries of LLMs has predominantly focused on English. In this work, we present the first study to analyze how LLMs recognize knowledge boundaries across different languages by probing their internal representations when processing known and unknown questions in multiple languages.… ▽ More

    Submitted 6 June, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

    Comments: ACL 2025 main; camera ready

  3. arXiv:2504.10471  [pdf, other]

    cs.CV cs.CL

    MIEB: Massive Image Embedding Benchmark

    Authors: Chenghao Xiao, Isaac Chung, Imene Kerboua, Jamie Stirling, Xin Zhang, Márton Kardos, Roman Solomatin, Noura Al Moubayed, Kenneth Enevoldsen, Niklas Muennighoff

    Abstract: Image representations are often evaluated through disjointed, task-specific protocols, leading to a fragmented understanding of model capabilities. For instance, it is unclear whether an image embedding model adept at clustering images is equally good at retrieving relevant images given a piece of text. We introduce the Massive Image Embedding Benchmark (MIEB) to evaluate the performance of image… ▽ More

    Submitted 14 April, 2025; originally announced April 2025.

  4. arXiv:2502.04878  [pdf, other]

    cs.LG cs.AI

    Sparse Autoencoders Do Not Find Canonical Units of Analysis

    Authors: Patrick Leask, Bart Bussmann, Michael Pearce, Joseph Bloom, Curt Tigges, Noura Al Moubayed, Lee Sharkey, Neel Nanda

    Abstract: A common goal of mechanistic interpretability is to decompose the activations of neural networks into features: interpretable properties of the input computed by the model. Sparse autoencoders (SAEs) are a popular method for finding these features in LLMs, and it has been postulated that they can be used to find a canonical set of units: a unique and complete list of atomic features. We c… ▽ More

    Submitted 7 February, 2025; originally announced February 2025.

    Comments: Accepted to ICLR 2025

  5. arXiv:2411.10503  [pdf, other]

    cs.CV cs.CL cs.LG

    Everything is a Video: Unifying Modalities through Next-Frame Prediction

    Authors: G. Thomas Hudson, Dean Slack, Thomas Winterbottom, Jamie Sterling, Chenghao Xiao, Junjie Shentu, Noura Al Moubayed

    Abstract: Multimodal learning, which involves integrating information from various modalities such as text, images, audio, and video, is pivotal for numerous complex tasks like visual question answering, cross-modal retrieval, and caption generation. Traditional approaches rely on modality-specific encoders and late fusion techniques, which can hinder scalability and flexibility when adapting to new tasks o… ▽ More

    Submitted 15 November, 2024; originally announced November 2024.

    Comments: 10 pages, 10 figures

  6. arXiv:2409.00438  [pdf, other]

    cs.LG cs.AI

    Breaking Down Financial News Impact: A Novel AI Approach with Geometric Hypergraphs

    Authors: Anoushka Harit, Zhongtian Sun, Jongmin Yu, Noura Al Moubayed

    Abstract: In the fast-paced and volatile financial markets, accurately predicting stock movements based on financial news is critical for investors and analysts. Traditional models often struggle to capture the intricate and dynamic relationships between news events and market reactions, limiting their ability to provide actionable insights. This paper introduces a novel approach leveraging Explainable Arti… ▽ More

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: 16 pages, conference

  7. arXiv:2406.17911  [pdf, other]

    cs.CL

    X-ray Made Simple: Lay Radiology Report Generation and Robust Evaluation

    Authors: Kun Zhao, Chenghao Xiao, Sixing Yan, Haoteng Tang, William K. Cheung, Noura Al Moubayed, Liang Zhan, Chenghua Lin

    Abstract: Radiology Report Generation (RRG) has advanced considerably with the development of multimodal generative models. Despite the progress, the field still faces significant challenges in evaluation, as existing metrics lack robustness and fairness. We reveal that RRG with high performance on existing lexical-based metrics (e.g. BLEU) might be more of a mirage - a model can get a high BLEU only by le… ▽ More

    Submitted 19 May, 2025; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: BioLaySumm shared-task 2025 official dataset

  8. arXiv:2405.17965  [pdf, other]

    cs.CV

    AttenCraft: Attention-guided Disentanglement of Multiple Concepts for Text-to-Image Customization

    Authors: Junjie Shentu, Matthew Watson, Noura Al Moubayed

    Abstract: With the unprecedented performance being achieved by text-to-image (T2I) diffusion models, T2I customization further empowers users to tailor the diffusion model to new concepts absent in the pre-training dataset, termed subject-driven generation. Moreover, extracting several new concepts from a single image enables the model to learn multiple concepts, and simultaneously decreases the difficultie… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  9. arXiv:2405.17450  [pdf, other]

    cs.CV cs.LG

    The Power of Next-Frame Prediction for Learning Physical Laws

    Authors: Thomas Winterbottom, G. Thomas Hudson, Daniel Kluvanec, Dean Slack, Jamie Sterling, Junjie Shentu, Chenghao Xiao, Zheming Zhou, Noura Al Moubayed

    Abstract: Next-frame prediction is a useful and powerful method for modelling and understanding the dynamics of video data. Inspired by the empirical success of causal language modelling and next-token prediction in language modelling, we explore the extent to which next-frame prediction serves as a strong foundational learning strategy (analogous to language modelling) for inducing an understanding of the… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 7 Figures, 12 Pages, 1 Table

    MSC Class: 68T45 ACM Class: I.2.6; I.2.10

  10. arXiv:2404.06347  [pdf, other]

    cs.CL cs.IR

    RAR-b: Reasoning as Retrieval Benchmark

    Authors: Chenghao Xiao, G Thomas Hudson, Noura Al Moubayed

    Abstract: Semantic textual similarity (STS) and information retrieval (IR) tasks have been the two major avenues to record the progress of embedding models in the past few years. Under the emerging Retrieval-augmented Generation (RAG) paradigm, we envision the need to evaluate next-level language understanding abilities of embedding models, and take a conscious look at the reasoning abilities stored i… ▽ More

    Submitted 12 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: v2, small typo fixes

  11. arXiv:2403.19897  [pdf, other]

    cs.CV cs.LG

    Disentangling Racial Phenotypes: Fine-Grained Control of Race-related Facial Phenotype Characteristics

    Authors: Seyma Yucer, Amir Atapour Abarghouei, Noura Al Moubayed, Toby P. Breckon

    Abstract: Achieving an effective fine-grained appearance variation over 2D facial images, whilst preserving facial identity, is a challenging task due to the high complexity and entanglement of common 2D facial feature encoding spaces. Despite these challenges, such fine-grained control, by way of disentanglement is a crucial enabler for data-driven racial bias mitigation strategies across multiple automate… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

  12. arXiv:2402.09966  [pdf, other]

    cs.CV

    Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation

    Authors: Junjie Shentu, Matthew Watson, Noura Al Moubayed

    Abstract: Subject-driven text-to-image diffusion models empower users to tailor the model to new concepts absent in the pre-training dataset using a few sample images. However, prevalent subject-driven models primarily rely on single-concept input images, facing challenges in specifying the target concept when dealing with multi-concept input images. To this end, we introduce a textual localized text-to-ima… ▽ More

    Submitted 15 February, 2024; originally announced February 2024.

  13. arXiv:2402.08183  [pdf, other]

    cs.CL cs.CV

    Pixel Sentence Representation Learning

    Authors: Chenghao Xiao, Zhuoxu Huang, Danlu Chen, G Thomas Hudson, Yizhi Li, Haoran Duan, Chenghua Lin, Jie Fu, Jungong Han, Noura Al Moubayed

    Abstract: Pretrained language models are long known to be subpar in capturing sentence and document-level semantics. Though heavily investigated, transferring perturbation-based methods from unsupervised visual representation learning to NLP remains an unsolved problem. This is largely due to the discreteness of subword units brought by tokenization of language models, limiting small perturbations of inputs… ▽ More

    Submitted 12 February, 2024; originally announced February 2024.

  14. arXiv:2402.01655  [pdf, other]

    cs.CY cs.AI cs.LG

    A Deep Learning Approach Towards Student Performance Prediction in Online Courses: Challenges Based on a Global Perspective

    Authors: Abdallah Moubayed, MohammadNoor Injadat, Nouh Alhindawi, Ghassan Samara, Sara Abuasal, Raed Alazaidah

    Abstract: Analyzing and evaluating students' progress in any learning environment is stressful and time consuming if done using traditional analysis methods. This is further exacerbated by the increasing number of students due to the shift of focus toward integrating Internet technologies in education and the focus of academic institutions on moving toward e-Learning, blended, or online learning models.… ▽ More

    Submitted 10 January, 2024; originally announced February 2024.

    Comments: Accepted and presented in 24th International Arab Conference on Information Technology (ACIT'2023)

  15. arXiv:2401.13478  [pdf, other]

    cs.IR cs.CL cs.CV cs.MM

    SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval

    Authors: Siwei Wu, Yizhi Li, Kang Zhu, Ge Zhang, Yiming Liang, Kaijing Ma, Chenghao Xiao, Haoran Zhang, Bohao Yang, Wenhu Chen, Wenhao Huang, Noura Al Moubayed, Jie Fu, Chenghua Lin

    Abstract: Multi-modal information retrieval (MMIR) is a rapidly evolving field, where significant progress, particularly in image-text pairing, has been made through advanced representation learning and cross-modality alignment research. However, current benchmarks for evaluating MMIR performance in image-text pairing within the scientific domain show a notable gap, where chart and table images described in… ▽ More

    Submitted 11 June, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: camera-ready version for ACL 2024 Findings

  16. arXiv:2310.16193  [pdf, other]

    cs.CL cs.AI

    Length is a Curse and a Blessing for Document-level Semantics

    Authors: Chenghao Xiao, Yizhi Li, G Thomas Hudson, Chenghua Lin, Noura Al Moubayed

    Abstract: In recent years, contrastive learning (CL) has been extensively utilized to recover sentence and document-level encoding capability from pre-trained language models. In this work, we question the length generalizability of CL-based models, i.e., their vulnerability towards length-induced semantic shift. We verify not only that length vulnerability is a significant yet overlooked research gap, but… ▽ More

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023. Our code is publicly available at https://github.com/gowitheflow-1998/LA-SER-cubed

  17. arXiv:2309.11895  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Audio Contrastive based Fine-tuning

    Authors: Yang Wang, Qibin Liang, Chenghao Xiao, Yizhi Li, Noura Al Moubayed, Chenghua Lin

    Abstract: Audio classification plays a crucial role in speech and sound processing tasks with a wide range of applications. There still remains a challenge of striking the right balance between fitting the model to the training data (avoiding overfitting) and enabling it to generalise well to a new domain. Leveraging the transferability of contrastive learning, we introduce Audio Contrastive-based Fine-tuni… ▽ More

    Submitted 19 October, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: Under review

  18. arXiv:2308.09171  [pdf, other]

    cs.CR cs.AI cs.LG cs.NI

    Forensic Data Analytics for Anomaly Detection in Evolving Networks

    Authors: Li Yang, Abdallah Moubayed, Abdallah Shami, Amine Boukhtouta, Parisa Heidari, Stere Preda, Richard Brunner, Daniel Migault, Adel Larabi

    Abstract: In the prevailing convergence of traditional infrastructure-based deployment (i.e., Telco and industry operational networks) towards evolving deployments enabled by 5G and virtualization, there is a keen interest in elaborating effective security controls to protect these deployments in-depth. By considering key enabling technologies like 5G and virtualization, evolving networks are democratized,… ▽ More

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Electronic version of an article published as [Book Series: World Scientific Series in Digital Forensics and Cybersecurity, Volume 2, Innovations in Digital Forensics, 2023, Pages 99-137] [DOI:10.1142/9789811273209_0004] © World Scientific Publishing Company [https://doi.org/10.1142/9789811273209_0004]

    MSC Class: 68T01 ACM Class: I.2.6; C.2.0

  19. arXiv:2305.00817  [pdf, other]

    cs.CV

    Racial Bias within Face Recognition: A Survey

    Authors: Seyma Yucer, Furkan Tektas, Noura Al Moubayed, Toby P. Breckon

    Abstract: Facial recognition is one of the most academically studied and industrially developed areas within computer vision where we readily find associated applications deployed globally. This widespread adoption has uncovered significant performance variation across subjects of different racial profiles leading to focused research attention on racial bias within face recognition spanning both current cau… ▽ More

    Submitted 1 May, 2023; originally announced May 2023.

  20. Language as a Latent Sequence: deep latent variable models for semi-supervised paraphrase generation

    Authors: Jialin Yu, Alexandra I. Cristea, Anoushka Harit, Zhongtian Sun, Olanrewaju Tahir Aduragba, Lei Shi, Noura Al Moubayed

    Abstract: This paper explores deep latent variable models for semi-supervised paraphrase generation, where the missing target pair for unlabelled data is modelled as a latent paraphrase sequence. We present a novel unsupervised model named variational sequence auto-encoding reconstruction (VSAR), which performs latent sequence inference given an observed text. To leverage information from text pairs, we add… ▽ More

    Submitted 8 September, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

  21. arXiv:2212.09170  [pdf, other]

    cs.CL cs.AI

    On Isotropy, Contextualization and Learning Dynamics of Contrastive-based Sentence Representation Learning

    Authors: Chenghao Xiao, Yang Long, Noura Al Moubayed

    Abstract: Incorporating contrastive learning objectives in sentence representation learning (SRL) has yielded significant improvements on many sentence-level NLP tasks. However, it is not well understood why contrastive learning works for learning sentence-level semantics. In this paper, we aim to help guide future designs of sentence representation learning methods by taking a closer look at contrastive SR… ▽ More

    Submitted 26 May, 2023; v1 submitted 18 December, 2022; originally announced December 2022.

    Comments: Accepted by ACL 2023 (Findings, long paper)

  22. arXiv:2211.01266  [pdf, other]

    cs.LG cs.AI eess.SY

    Knowing the Past to Predict the Future: Reinforcement Virtual Learning

    Authors: Peng Zhang, Yawen Huang, Bingzhang Hu, Shizheng Wang, Haoran Duan, Noura Al Moubayed, Yefeng Zheng, Yang Long

    Abstract: Reinforcement Learning (RL)-based control system has received considerable attention in recent decades. However, in many real-world problems, such as Batch Process Control, the environment is uncertain, which requires expensive interaction to acquire the state and reward values. In this paper, we present a cost-efficient framework, such that the RL model can evolve for itself in a Virtual Space us… ▽ More

    Submitted 2 November, 2022; originally announced November 2022.

  23. arXiv:2209.01061  [pdf, other]

    cs.CL cs.AI

    INTERACTION: A Generative XAI Framework for Natural Language Inference Explanations

    Authors: Jialin Yu, Alexandra I. Cristea, Anoushka Harit, Zhongtian Sun, Olanrewaju Tahir Aduragba, Lei Shi, Noura Al Moubayed

    Abstract: XAI with natural language processing aims to produce human-readable explanations as evidence for AI decision-making, which addresses explainability and transparency. However, from an HCI perspective, the current approaches only focus on delivering a single explanation, which fails to account for the diversity of human thoughts and experiences in language. This paper thus addresses this gap, by pro… ▽ More

    Submitted 2 September, 2022; originally announced September 2022.

  24. arXiv:2208.07613  [pdf, other]

    cs.CV cs.CY cs.LG

    Does lossy image compression affect racial bias within face recognition?

    Authors: Seyma Yucer, Matt Poyser, Noura Al Moubayed, Toby P. Breckon

    Abstract: Yes - This study investigates the impact of commonplace lossy image compression on face recognition algorithms with regard to the racial characteristics of the subject. We adopt a recently proposed racial phenotype-based bias analysis methodology to measure the effect of varying levels of lossy compression across racial phenotype categories. Additionally, we determine the relationship between chro… ▽ More

    Submitted 16 August, 2022; originally announced August 2022.

  25. arXiv:2208.03824  [pdf, other]

    cs.CV cs.LG

    Towards Graph Representation Learning Based Surgical Workflow Anticipation

    Authors: Xiatian Zhang, Noura Al Moubayed, Hubert P. H. Shum

    Abstract: Surgical workflow anticipation can give predictions on what steps to conduct or what instruments to use next, which is an essential part of the computer-assisted intervention system for surgery, e.g. workflow reasoning in robotic surgery. However, current approaches are limited to their insufficient expressive power for relationships between instruments. Hence, we propose a graph representation le… ▽ More

    Submitted 7 August, 2022; originally announced August 2022.

    Comments: Proceedings of the 2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), 2022

  26. arXiv:2205.14040  [pdf, other]

    cs.NI cs.LG

    Intelligent Transportation Systems' Orchestration: Lessons Learned & Potential Opportunities

    Authors: Abdallah Moubayed, Abdallah Shami, Abbas Ibrahim

    Abstract: The growing deployment efforts of 5G networks globally has led to the acceleration of the businesses/services' digital transformation. This growth has led to the need for new communication technologies that will promote this transformation. 6G is being proposed as the set of technologies and architectures that will achieve this target. Among the main use cases that have emerged for 5G networks and… ▽ More

    Submitted 5 May, 2022; originally announced May 2022.

    Comments: 6 pages, 3 figures, accepted and presented in 1320th International Conference on Recent Innovations in Engineering and Technology (ICRIET-2022)

    ACM Class: J.2; I.2.11

  27. arXiv:2203.13004  [pdf, other]

    cs.LG

    Using Orientation to Distinguish Overlapping Chromosomes

    Authors: Daniel Kluvanec, Thomas B. Phillips, Kenneth J. W. McCaffrey, Noura Al Moubayed

    Abstract: A difficult step in the process of karyotyping is segmenting chromosomes that touch or overlap. In an attempt to automate the process, previous studies turned to Deep Learning methods, with some formulating the task as a semantic segmentation problem. These models treat separate chromosome instances as semantic classes, which we show to be problematic, since it is uncertain which chromosome should… ▽ More

    Submitted 24 March, 2022; originally announced March 2022.

    Comments: Conference for Health, Inference, and Learning (CHIL) 2022 - Invited non-archival presentation

  28. arXiv:2202.07362  [pdf, other]

    cs.CL cs.AI

    MuLD: The Multitask Long Document Benchmark

    Authors: G Thomas Hudson, Noura Al Moubayed

    Abstract: The impressive progress in NLP techniques has been driven by the development of multi-task benchmarks such as GLUE and SuperGLUE. While these benchmarks focus on tasks for one or two input sentences, there has been exciting work in designing efficient techniques for processing much longer inputs. In this paper, we present MuLD: a new long document benchmark consisting of only documents over 10,000… ▽ More

    Submitted 15 February, 2022; originally announced February 2022.

  29. Measuring Hidden Bias within Face Recognition via Racial Phenotypes

    Authors: Seyma Yucer, Furkan Tektas, Noura Al Moubayed, Toby P. Breckon

    Abstract: Recent work reports disparate performance for intersectional racial groups across face recognition tasks: face verification and identification. However, the definition of those racial groups has a significant impact on the underlying findings of such racial bias analysis. Previous studies define these groups based on either demographic information (e.g. African, Asian etc.) or skin tone (e.g. ligh… ▽ More

    Submitted 19 October, 2021; originally announced October 2021.

    Comments: published in IEEE Winter Conference on Applications of Computer Vision, WACV, 2022

  30. Mobility Aware Edge Computing Segmentation Towards Localized Orchestration

    Authors: Sam Aleyadeh, Abdallah Moubayed, Abdallah Shami

    Abstract: The current trend in end-user devices' advancements in computing and communication capabilities makes edge computing an attractive solution to pave the way for the coveted ultra-low latency services. The success of the edge computing networking paradigm depends on the proper orchestration of the edge servers. Several Edge applications and services are intolerant to latency, especially in 5G and be… ▽ More

    Submitted 14 October, 2021; originally announced October 2021.

    Comments: accepted at ISNCC 2021

  31. arXiv:2108.01589  [pdf, other]

    cs.CL

    ExBERT: An External Knowledge Enhanced BERT for Natural Language Inference

    Authors: Amit Gajbhiye, Noura Al Moubayed, Steven Bradley

    Abstract: Neural language representation models such as BERT, pre-trained on large-scale unstructured corpora lack explicit grounding to real-world commonsense knowledge and are often unable to remember facts required for reasoning and inference. Natural Language Inference (NLI) is a challenging reasoning task that relies on common human understanding of language and real-world commonsense knowledge. We int… ▽ More

    Submitted 3 August, 2021; originally announced August 2021.

  32. arXiv:2107.11514  [pdf, other]

    cs.CR cs.AI cs.LG cs.NI

    Multi-Perspective Content Delivery Networks Security Framework Using Optimized Unsupervised Anomaly Detection

    Authors: Li Yang, Abdallah Moubayed, Abdallah Shami, Parisa Heidari, Amine Boukhtouta, Adel Larabi, Richard Brunner, Stere Preda, Daniel Migault

    Abstract: Content delivery networks (CDNs) provide efficient content distribution over the Internet. CDNs improve the connectivity and efficiency of global communications, but their caching mechanisms may be breached by cyber-attackers. Among the security mechanisms, effective anomaly detection forms an important part of CDN security enhancement. In this work, we propose a multi-perspective unsupervised lea… ▽ More

    Submitted 23 July, 2021; originally announced July 2021.

    Comments: Accepted and to Appear in IEEE Transactions on Network and Service Management

    MSC Class: 68T01 ACM Class: I.2.6; C.2.0

  33. arXiv:2106.02183  [pdf, other]

    cs.CL cs.AI cs.LG

    Towards Equal Gender Representation in the Annotations of Toxic Language Detection

    Authors: Elizabeth Excell, Noura Al Moubayed

    Abstract: Classifiers tend to propagate biases present in the data on which they are trained. Hence, it is important to understand how the demographic identities of the annotators of comments affect the fairness of the resulting model. In this paper, we focus on the differences in the ways men and women annotate comments for toxicity, investigating how these differences result in models that amplify the opi… ▽ More

    Submitted 3 June, 2021; originally announced June 2021.

    Comments: Paper is accepted at GeBNLP2021 workshop at ACL-IJCNLP 2021

  34. arXiv:2105.13289  [pdf, other]

    cs.CR cs.AI cs.LG cs.NI

    MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for Internet of Vehicles

    Authors: Li Yang, Abdallah Moubayed, Abdallah Shami

    Abstract: Modern vehicles, including connected vehicles and autonomous vehicles, nowadays involve many electronic control units connected through intra-vehicle networks to implement various functionalities and perform actions. Modern vehicles are also connected to external networks through vehicle-to-everything technologies, enabling their communications with other vehicles, infrastructures, and smart devic… ▽ More

    Submitted 26 May, 2021; originally announced May 2021.

    Comments: Accepted and to appear in IEEE Internet of Things Journal; Code is available at Github link: https://github.com/Western-OC2-Lab/Intrusion-Detection-System-Using-Machine-Learning

    MSC Class: 68T01 ACM Class: I.2.6; C.2.0

  35. arXiv:2105.06791  [pdf, other]

    cs.LG

    Agree to Disagree: When Deep Learning Models With Identical Architectures Produce Distinct Explanations

    Authors: Matthew Watson, Bashar Awwad Shiekh Hasan, Noura Al Moubayed

    Abstract: Deep Learning of neural networks has progressively become more prominent in healthcare with models reaching, or even surpassing, expert accuracy levels. However, these success stories are tainted by concerning reports on the lack of model transparency and bias against some medical conditions or patients' sub-groups. Explainable methods are considered the gateway to alleviate many of these concerns… ▽ More

    Submitted 30 October, 2021; v1 submitted 14 May, 2021; originally announced May 2021.

    Comments: 9 pages, 5 figures, 3 tables

    ACM Class: I.2

  36. arXiv:2105.01959  [pdf, other]

    cs.LG cs.CR

    Attack-agnostic Adversarial Detection on Medical Data Using Explainable Machine Learning

    Authors: Matthew Watson, Noura Al Moubayed

    Abstract: Explainable machine learning has become increasingly prevalent, especially in healthcare where explainable models are vital for ethical and trusted automated decision making. Work on the susceptibility of deep learning models to adversarial attacks has shown the ease of designing samples to mislead a model into making incorrect predictions. In this work, we propose a model agnostic explainability-… ▽ More

    Submitted 5 May, 2021; originally announced May 2021.

    Comments: 13 pages, 6 figures, accepted to ICPR 2020

    ACM Class: I.2; I.4

  37. Machine Learning Towards Intelligent Systems: Applications, Challenges, and Opportunities

    Authors: MohammadNoor Injadat, Abdallah Moubayed, Ali Bou Nassif, Abdallah Shami

    Abstract: The emergence and continued reliance on the Internet and related technologies has resulted in the generation of large amounts of data that can be made available for analyses. However, humans do not possess the cognitive capabilities to understand such large amounts of data. Machine learning (ML) provides a mechanism for humans to process large amounts of data, gain insights about the behavior of t… ▽ More

    Submitted 10 January, 2021; originally announced January 2021.

    Comments: 46 pages, 7 figures, 5 tables, journal

  38. Curvature-based Feature Selection with Application in Classifying Electronic Health Records

    Authors: Zheming Zuo, Jie Li, Han Xu, Noura Al Moubayed

    Abstract: Disruptive technologies provide unparalleled opportunities to contribute to the identification of many aspects in pervasive healthcare, from the adoption of the Internet of Things through to Machine Learning (ML) techniques. As a powerful tool, ML has been widely applied in patient-centric healthcare solutions. To further improve the quality of patient care, Electronic Health Records (EHRs) are… ▽ More

    Submitted 30 November, 2021; v1 submitted 10 January, 2021; originally announced January 2021.

    Comments: Accepted by Technological Forecasting and Social Change; Source code available

  39. Relationship between Student Engagement and Performance in e-Learning Environment Using Association Rules

    Authors: Abdallah Moubayed, MohammadNoor Injadat, Abdallah Shami, Hanan Lutfiyya

    Abstract: The field of e-learning has emerged as a topic of interest in academia due to the increased ease of accessing the Internet using smart-phones and wireless devices. One of the challenges facing e-learning platforms is how to keep students motivated and engaged. Moreover, it is also crucial to identify the students that might need help in order to make sure their academic performance doesn't s… ▽ More

    Submitted 25 December, 2020; originally announced January 2021.

    Comments: 1 Table, 1 Figure, published in 2018 IEEE World Engineering Education Conference (EDUNINE)

    Journal ref: 2018 IEEE World Engineering Education Conference (EDUNINE), 2018, pp. 1-6

  40. DNS Typo-squatting Domain Detection: A Data Analytics & Machine Learning Based Approach

    Authors: Abdallah Moubayed, MohammadNoor Injadat, Abdallah Shami, Hanan Lutfiyya

    Abstract: Domain Name System (DNS) is a crucial component of current IP-based networks as it is the standard mechanism for name to IP resolution. However, due to its lack of data integrity and origin authentication processes, it is vulnerable to a variety of attacks. One such attack is Typosquatting. Detecting this attack is particularly important as it can be a threat to corporate secrets and can be used t… ▽ More

    Submitted 25 December, 2020; originally announced December 2020.

    Comments: 7 pages, 6 figures, 3 tables, published in 2018 IEEE Global Communications Conference (GLOBECOM)

    Journal ref: 2018 IEEE Global Communications Conference (GLOBECOM), 2018, pp. 1-7

  41. arXiv:2012.11326  [pdf, other]

    cs.CR cs.LG cs.NI

    Optimized Random Forest Model for Botnet Detection Based on DNS Queries

    Authors: Abdallah Moubayed, MohammadNoor Injadat, Abdallah Shami

    Abstract: The Domain Name System (DNS) protocol plays a major role in today's Internet as it translates between website names and corresponding IP addresses. However, due to the lack of processes for data integrity and origin authentication, the DNS protocol has several security vulnerabilities. This often leads to a variety of cyber-attacks, including botnet network attacks. One promising solution to detec…

    Submitted 16 December, 2020; originally announced December 2020.

    Comments: 4 pages, 3 figures, 1 table, Accepted and presented in IEEE 32nd International Conference on Microelectronics (IEEE-ICM2020)

  42. arXiv:2012.11325  [pdf, other]

    cs.CR cs.LG cs.NI

    Detecting Botnet Attacks in IoT Environments: An Optimized Machine Learning Approach

    Authors: MohammadNoor Injadat, Abdallah Moubayed, Abdallah Shami

    Abstract: The increased reliance on the Internet and the corresponding surge in connectivity demand has led to a significant growth in Internet-of-Things (IoT) devices. The continued deployment of IoT devices has in turn led to an increase in network attacks due to the larger number of potential attack surfaces as illustrated by the recent reports that IoT malware attacks increased by 215.7% from 10.3 milli…

    Submitted 16 December, 2020; originally announced December 2020.

    Comments: 4 pages, 2 figures, 1 table, Accepted and presented at IEEE 32nd International Conference on Microelectronics (IEEE-ICM2020)

  43. arXiv:2012.10285  [pdf, other]

    cs.CV cs.CL

    Trying Bilinear Pooling in Video-QA

    Authors: Thomas Winterbottom, Sarah Xiao, Alistair McLean, Noura Al Moubayed

    Abstract: Bilinear pooling (BLP) refers to a family of operations recently developed for fusing features from different modalities predominantly developed for VQA models. A bilinear (outer-product) expansion is thought to encourage models to learn interactions between two feature spaces and has experimentally outperformed 'simpler' vector operations (concatenation and element-wise-addition/multiplication) o…

    Submitted 18 December, 2020; originally announced December 2020.

    Comments: 16 Pages, 8 Figures, 4 Tables, +Supp Mats

    MSC Class: 68T99 ACM Class: I.2.10; I.2.7

  44. arXiv:2012.10210  [pdf, other]

    cs.CV cs.AI cs.CL

    On Modality Bias in the TVQA Dataset

    Authors: Thomas Winterbottom, Sarah Xiao, Alistair McLean, Noura Al Moubayed

    Abstract: TVQA is a large scale video question answering (video-QA) dataset based on popular TV shows. The questions were specifically designed to require "both vision and language understanding to answer". In this work, we demonstrate an inherent bias in the dataset towards the textual subtitle modality. We infer said bias both directly and indirectly, notably finding that models trained with subtitles lea…

    Submitted 18 December, 2020; originally announced December 2020.

    Comments: 10 pages, 4 Figures, 2 Tables, +Supp Mats, BMVC 2020

    MSC Class: 68T99 ACM Class: I.2.10; I.2.7; I.2.4

  45. Bilinear Fusion of Commonsense Knowledge with Attention-Based NLI Models

    Authors: Amit Gajbhiye, Thomas Winterbottom, Noura Al Moubayed, Steven Bradley

    Abstract: We consider the task of incorporating real-world commonsense knowledge into deep Natural Language Inference (NLI) models. Existing external knowledge incorporation methods are limited to lexical level knowledge and lack generalization across NLI models, datasets, and commonsense knowledge sources. To address these issues, we propose a novel NLI model-independent neural framework, BiCAM. BiCAM inco…

    Submitted 22 October, 2020; originally announced October 2020.

    Comments: Published in Lecture Notes in Computer Science, Springer International Publishing

  46. arXiv:2010.07223  [pdf, other]

    cs.NI eess.SP

    Cost-optimal V2X Service Placement in Distributed Cloud/Edge Environment

    Authors: Abdallah Moubayed, Abdallah Shami, Parisa Heidari, Adel Larabi, Richard Brunner

    Abstract: Deploying V2X services has become a challenging task. This is mainly due to the fact that such services have strict latency requirements. To meet these requirements, one potential solution is adopting mobile edge computing (MEC). However, this presents new challenges including how to find a cost-efficient placement that meets other requirements such as latency. In this work, the problem of cost-op…

    Submitted 14 October, 2020; originally announced October 2020.

    Comments: 6 pages, 4 figures, 1 table, 1 algorithm pseudocode, Accepted & presented in IEEE 16th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob 2020)

  47. arXiv:2009.03756  [pdf, other]

    cs.NI cs.LG

    Machine Learning Towards Enabling Spectrum-as-a-Service Dynamic Sharing

    Authors: Abdallah Moubayed, Tanveer Ahmed, Anwar Haque, Abdallah Shami

    Abstract: The growth in wireless broadband users, devices, and novel applications has led to a significant increase in the demand for new radio frequency spectrum. This is expected to grow even further given the projection that the global traffic per year will reach 4.8 zettabytes by 2022. Moreover, it is projected that the number of Internet users will reach 4.8 billion and the number of connected devices…

    Submitted 4 September, 2020; originally announced September 2020.

    Comments: 6 pages, 2 figures, Accepted and presented in 2020 IEEE Canadian Conference On Electrical And Computer Engineering (CCECE 2020)

  48. Multi-Stage Optimized Machine Learning Framework for Network Intrusion Detection

    Authors: MohammadNoor Injadat, Abdallah Moubayed, Ali Bou Nassif, Abdallah Shami

    Abstract: Cyber-security has garnered significant attention due to the increased dependency of individuals and organizations on the Internet and their concern about the security and privacy of their online activities. Several previous machine learning (ML)-based network intrusion detection systems (NIDSs) have been developed to protect against malicious online behavior. This paper proposes a novel multi-stage o…

    Submitted 8 August, 2020; originally announced August 2020.

    Comments: 14 Pages, 13 Figures, 4 tables, Published in IEEE Transactions on Network and Service Management (Early Access)

    Journal ref: Electronic ISSN: 1932-4537

  49. arXiv:2006.09272  [pdf, other]

    cs.CR cs.LG cs.NI stat.ML

    Ensemble-based Feature Selection and Classification Model for DNS Typo-squatting Detection

    Authors: Abdallah Moubayed, Emad Aqeeli, Abdallah Shami

    Abstract: Domain Name System (DNS) plays an important role in the current IP-based Internet architecture. This is because it performs the domain name to IP resolution. However, the DNS protocol has several security vulnerabilities due to the lack of data integrity and origin authentication within it. This paper focuses on one particular security vulnerability, namely typo-squatting. Typo-squatting refers to…

    Submitted 8 June, 2020; originally announced June 2020.

    Comments: 6 pages, 2 figures, 6 tables, Accepted in 2020 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE 2020)

  50. Multi-split Optimized Bagging Ensemble Model Selection for Multi-class Educational Data Mining

    Authors: MohammadNoor Injadat, Abdallah Moubayed, Ali Bou Nassif, Abdallah Shami

    Abstract: Predicting students' academic performance has been a research area of interest in recent years with many institutions focusing on improving the students' performance and the education quality. The analysis and prediction of students' performance can be achieved using various data mining techniques. Moreover, such techniques allow instructors to determine possible factors that may affect the studen…

    Submitted 8 June, 2020; originally announced June 2020.

    Comments: 29 Pages, 13 Figures, 19 Tables, Accepted in Springer's Applied Intelligence