-
DETONATE: A Benchmark for Text-to-Image Alignment and Kernelized Direct Preference Optimization
Authors:
Renjith Prasad,
Abhilekh Borah,
Hasnat Md Abdullah,
Chathurangi Shyalika,
Gurpreet Singh,
Ritvik Garimella,
Rajarshi Roy,
Harshul Surana,
Nasrin Imanpour,
Suranjana Trivedy,
Amit Sheth,
Amitava Das
Abstract:
Alignment is crucial for text-to-image (T2I) models to ensure that generated images faithfully capture user intent while maintaining safety and fairness. Direct Preference Optimization (DPO), prominent in large language models (LLMs), is extending its influence to T2I systems. This paper introduces DPO-Kernels for T2I models, a novel extension enhancing alignment across three dimensions: (i) Hybrid Loss, integrating embedding-based objectives with traditional probability-based loss for improved optimization; (ii) Kernelized Representations, employing Radial Basis Function (RBF), Polynomial, and Wavelet kernels for richer feature transformations and better separation between safe and unsafe inputs; and (iii) Divergence Selection, expanding beyond DPO's default Kullback-Leibler (KL) regularizer by incorporating Wasserstein and Rényi divergences for enhanced stability and robustness. We introduce DETONATE, the first large-scale benchmark of its kind, comprising approximately 100K curated image pairs categorized as chosen and rejected. DETONATE encapsulates three axes of social bias and discrimination: Race, Gender, and Disability. Prompts are sourced from hate speech datasets, with images generated by leading T2I models including Stable Diffusion 3.5 Large, Stable Diffusion XL, and Midjourney. Additionally, we propose the Alignment Quality Index (AQI), a novel geometric measure quantifying latent-space separability of safe/unsafe image activations, revealing hidden vulnerabilities. Empirically, we demonstrate that DPO-Kernels maintain strong generalization bounds via Heavy-Tailed Self-Regularization (HT-SR). DETONATE and complete code are publicly released.
Submitted 17 June, 2025;
originally announced June 2025.
-
Alignment Quality Index (AQI) : Beyond Refusals: AQI as an Intrinsic Alignment Diagnostic via Latent Geometry, Cluster Divergence, and Layer wise Pooled Representations
Authors:
Abhilekh Borah,
Chhavi Sharma,
Danush Khanna,
Utkarsh Bhatt,
Gurpreet Singh,
Hasnat Md Abdullah,
Raghav Kaushik Ravi,
Vinija Jain,
Jyoti Patel,
Shubham Singh,
Vasu Sharma,
Arpita Vats,
Rahul Raja,
Aman Chadha,
Amitava Das
Abstract:
Alignment is no longer a luxury, it is a necessity. As large language models (LLMs) enter high-stakes domains like education, healthcare, governance, and law, their behavior must reliably reflect human-aligned values and safety constraints. Yet current evaluations rely heavily on behavioral proxies such as refusal rates, G-Eval scores, and toxicity classifiers, all of which have critical blind spots. Aligned models are often vulnerable to jailbreaking, stochasticity of generation, and alignment faking.
To address this issue, we introduce the Alignment Quality Index (AQI). This novel geometric and prompt-invariant metric empirically assesses LLM alignment by analyzing the separation of safe and unsafe activations in latent space. By combining measures such as the Davies-Bouldin Score (DBS), Dunn Index (DI), Xie-Beni Index (XBI), and Calinski-Harabasz Index (CHI) across various formulations, AQI captures clustering quality to detect hidden misalignments and jailbreak risks, even when outputs appear compliant. AQI also serves as an early warning signal for alignment faking, offering a robust, decoding invariant tool for behavior agnostic safety auditing.
Additionally, we propose the LITMUS dataset to facilitate robust evaluation under these challenging conditions. Empirical tests on LITMUS across different models trained under DPO, GRPO, and RLHF conditions demonstrate AQI's correlation with external judges and ability to reveal vulnerabilities missed by refusal metrics. We make our implementation publicly available to foster future research in this area.
Submitted 16 June, 2025;
originally announced June 2025.
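As a rough illustration of one of the clustering measures named in the AQI abstract above, the sketch below computes the standard Davies-Bouldin score for two groups of vectors standing in for "safe" and "unsafe" activations. It is not the paper's AQI formula, which the abstract does not give; the function names and toy points are assumptions made for this example.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def mean_dist_to_centroid(points, center):
    """Average Euclidean distance from each point to the cluster center."""
    return sum(math.dist(p, center) for p in points) / len(points)

def davies_bouldin(clusters):
    """Standard Davies-Bouldin score; lower means tighter, better-separated clusters."""
    centers = [centroid(c) for c in clusters]
    scatter = [mean_dist_to_centroid(c, m) for c, m in zip(clusters, centers)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k) if j != i
        )
    return total / k

# Hypothetical 2-D "activations"; real ones would be high-dimensional.
safe = [(0.0, 0.0), (0.0, 2.0)]
unsafe_far = [(10.0, 0.0), (10.0, 2.0)]
unsafe_near = [(1.0, 0.0), (1.0, 2.0)]

print(davies_bouldin([safe, unsafe_far]))   # well separated: small score
print(davies_bouldin([safe, unsafe_near]))  # overlapping: larger score
```

A lower score for the well-separated pair matches the intuition in the abstract that clearer separation of safe and unsafe activations indicates better alignment.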
-
Is the Fitness Dependent Optimizer Ready for the Future of Optimization?
Authors:
Ardalan H. Awlla,
Tarik A. Rashid,
Ronak M. Abdullah
Abstract:
Metaheuristic algorithms are optimization methods inspired by real phenomena in nature or the behavior of living beings, e.g., animals, and are used to solve complex problems in engineering, energy optimization, health care, etc. One such algorithm is the Fitness Dependent Optimizer (FDO), introduced in 2019, which is based on bee-inspired swarm intelligence and provides efficient optimization. This paper presents a comprehensive review of FDO, including its basic concepts, main variations, and applications from the beginning. It systematically gathers and examines every relevant paper, providing significant insights into the algorithm's pros and cons. The objective is to assess FDO's performance in several dimensions and to identify its strengths and weaknesses. This study uses a comparative analysis to show how well FDO and its variations work at solving real-world optimization problems, which helps us understand what they can do. Finally, this paper proposes future research directions that can help researchers further enhance the performance of FDO.
Submitted 23 January, 2025;
originally announced June 2025.
-
Where Journalism Silenced Voices: Exploring Discrimination in the Representation of Indigenous Communities in Bangladesh
Authors:
Abhijit Paul,
Adity Khisa,
Zarif Masud,
Sharif Md. Abdullah,
Ahmedul Kabir,
Shebuti Rayana
Abstract:
In this paper, we examine the intersections of indigeneity and media representation in shaping perceptions of indigenous communities in Bangladesh. Using a mixed-methods approach, we combine quantitative analysis of media data with qualitative insights from focus group discussions (FGD). First, we identify a total of 4,893 indigenous-related articles from our initial dataset of 2.2 million newspaper articles, using a combination of keyword-based filtering and an LLM, achieving 77% accuracy and an F1-score of 81.9%. From manually inspecting 3 prominent Bangla newspapers, we identify 15 genres that we use as our topics for semi-supervised topic modeling using CorEx. Results show indigenous news articles have higher representation of culture and entertainment (19%, 10% higher than general news articles), and a disproportionate focus on conflict and protest (9%, 7% higher than general news). On the other hand, sentiment analysis reveals that 57% of articles on indigenous topics carry a negative tone, compared to 27% for non-indigenous related news. Drawing from communication studies, we further analyze framing, priming, and agenda-setting (frequency of themes) to support the case for discrimination in representation of indigenous news coverage. For the qualitative part of our analysis, we facilitated FGD, where participants further validated these findings. Participants unanimously expressed their feeling of being under-represented, and that critical issues affecting their communities (such as education, healthcare, and land rights) are systematically marginalized in news media coverage. By highlighting 8 cases of discrimination and media misrepresentation that were frequently mentioned by participants in the FGD, this study emphasizes the urgent need for more equitable media practices that accurately reflect the experiences and struggles of marginalized communities.
Submitted 11 June, 2025;
originally announced June 2025.
-
It's Not a Walk in the Park! Challenges of Idiom Translation in Speech-to-text Systems
Authors:
Iuliia Zaitova,
Badr M. Abdullah,
Wei Xue,
Dietrich Klakow,
Bernd Möbius,
Tania Avgustinova
Abstract:
Idioms are defined as a group of words with a figurative meaning not deducible from their individual components. Although modern machine translation systems have made remarkable progress, translating idioms remains a major challenge, especially for speech-to-text systems, where research on this topic is notably sparse. In this paper, we systematically evaluate idiom translation as compared to conventional news translation in both text-to-text machine translation (MT) and speech-to-text translation (SLT) systems across two language pairs (German to English, Russian to English). We compare state-of-the-art end-to-end SLT systems (SeamlessM4T SLT-to-text, Whisper Large v3) with MT systems (SeamlessM4T SLT-to-text, No Language Left Behind), Large Language Models (DeepSeek, LLaMA) and cascaded alternatives. Our results reveal that SLT systems experience a pronounced performance drop on idiomatic data, often reverting to literal translations even in higher layers, whereas MT systems and Large Language Models demonstrate better handling of idioms. These findings underscore the need for idiom-specific strategies and improved internal representations in SLT architectures.
Submitted 3 June, 2025;
originally announced June 2025.
-
Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification
Authors:
Badr M. Abdullah,
Matthew Baas,
Bernd Möbius,
Dietrich Klakow
Abstract:
Arabic dialect identification (ADI) systems are essential for large-scale data collection pipelines that enable the development of inclusive speech technologies for Arabic language varieties. However, the reliability of current ADI systems is limited by poor generalization to out-of-domain speech. In this paper, we present an effective approach based on voice conversion for training ADI models that achieves state-of-the-art performance and significantly improves robustness in cross-domain scenarios. Evaluated on a newly collected real-world test set spanning four different domains, our approach yields consistent improvements of up to +34.1% in accuracy across domains. Furthermore, we present an analysis of our approach and demonstrate that voice conversion helps mitigate the speaker bias in the ADI dataset. We release our robust ADI model and cross-domain evaluation dataset to support the development of inclusive speech technologies for Arabic.
Submitted 30 May, 2025;
originally announced May 2025.
-
Mitigating Gender Bias via Fostering Exploratory Thinking in LLMs
Authors:
Kangda Wei,
Hasnat Md Abdullah,
Ruihong Huang
Abstract:
Large Language Models (LLMs) often exhibit gender bias, resulting in unequal treatment of male and female subjects across different contexts. To address this issue, we propose a novel data generation framework that fosters exploratory thinking in LLMs. Our approach prompts models to generate story pairs featuring male and female protagonists in structurally identical, morally ambiguous scenarios, then elicits and compares their moral judgments. When inconsistencies arise, the model is guided to produce balanced, gender-neutral judgments. These story-judgment pairs are used to fine-tune or optimize the models via Direct Preference Optimization (DPO). Experimental results show that our method significantly reduces gender bias while preserving or even enhancing general model capabilities. We will release the code and generated data.
Submitted 22 May, 2025;
originally announced May 2025.
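For readers unfamiliar with the Direct Preference Optimization objective that the abstract above relies on, the following is a minimal sketch of the standard per-pair DPO loss. It is not taken from the paper's code; the function name, the argument names, and the value of beta are assumptions chosen for illustration.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probabilities.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # Numerically stable form of -log(sigmoid(x)) = log(1 + exp(-x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))

# Hypothetical log-probabilities for a balanced (chosen) and a biased (rejected) judgment.
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))  # policy equals reference: loss is log(2)
print(dpo_loss(-4.0, -6.0, -5.0, -5.0))  # policy favors the chosen judgment: lower loss
```

The loss falls below log(2) only when the policy raises the chosen response's likelihood relative to the reference model more than it raises the rejected one's.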
-
Attention on Multiword Expressions: A Multilingual Study of BERT-based Models with Regard to Idiomaticity and Microsyntax
Authors:
Iuliia Zaitova,
Vitalii Hirak,
Badr M. Abdullah,
Dietrich Klakow,
Bernd Möbius,
Tania Avgustinova
Abstract:
This study analyzes the attention patterns of fine-tuned encoder-only models based on the BERT architecture (BERT-based models) towards two distinct types of Multiword Expressions (MWEs): idioms and microsyntactic units (MSUs). Idioms present challenges in semantic non-compositionality, whereas MSUs demonstrate unconventional syntactic behavior that does not conform to standard grammatical categorizations. We aim to understand whether fine-tuning BERT-based models on specific tasks influences their attention to MWEs, and how this attention differs between semantic and syntactic tasks. We examine attention scores to MWEs in both pre-trained and fine-tuned BERT-based models. We utilize monolingual models and datasets in six Indo-European languages - English, German, Dutch, Polish, Russian, and Ukrainian. Our results show that fine-tuning significantly influences how models allocate attention to MWEs. Specifically, models fine-tuned on semantic tasks tend to distribute attention to idiomatic expressions more evenly across layers. Models fine-tuned on syntactic tasks show an increase in attention to MSUs in the lower layers, corresponding with syntactic processing requirements.
Submitted 9 May, 2025;
originally announced May 2025.
-
CliME: Evaluating Multimodal Climate Discourse on Social Media and the Climate Alignment Quotient (CAQ)
Authors:
Abhilekh Borah,
Hasnat Md Abdullah,
Kangda Wei,
Ruihong Huang
Abstract:
The rise of Large Language Models (LLMs) has raised questions about their ability to understand climate-related contexts. Though climate change dominates social media, analyzing its multimodal expressions is understudied, and current tools have failed to determine whether LLMs amplify credible solutions or spread unsubstantiated claims. To address this, we introduce CliME (Climate Change Multimodal Evaluation), a first-of-its-kind multimodal dataset, comprising 2579 Twitter and Reddit posts. The benchmark features a diverse collection of humorous memes and skeptical posts, capturing how these formats distill complex issues into viral narratives that shape public opinion and policy discussions. To systematically evaluate LLM performance, we present the Climate Alignment Quotient (CAQ), a novel metric comprising five distinct dimensions: Articulation, Evidence, Resonance, Transition, and Specificity. Additionally, we propose three analytical lenses: Actionability, Criticality, and Justice, to guide the assessment of LLM-generated climate discourse using CAQ. Our findings, based on the CAQ metric, indicate that while most evaluated LLMs perform relatively well in Criticality and Justice, they consistently underperform on the Actionability axis. Among the models evaluated, Claude 3.7 Sonnet achieves the highest overall performance. We publicly release our CliME dataset and code to foster further research in this domain.
Submitted 4 April, 2025;
originally announced April 2025.
-
State-of-the-Art Translation of Text-to-Gloss using mBART : A case study of Bangla
Authors:
Sharif Md. Abdullah,
Abhijit Paul,
Shebuti Rayana,
Ahmedul Kabir,
Zarif Masud
Abstract:
Despite a large deaf and dumb population of 1.7 million, Bangla Sign Language (BdSL) remains an understudied domain. Specifically, there is no prior work on the Bangla text-to-gloss translation task. To address this gap, we begin by addressing the dataset problem. We take inspiration from the grammatical rule-based gloss generation used in German and American Sign Language (ASL) and adapt it for BdSL. We also leverage an LLM to generate synthetic data and use back-translation and text generation for data augmentation. With the dataset prepared, we started experimentation. We fine-tuned pretrained mBART-50 and mBERT-multiclass-uncased models on our dataset. We also trained GRU, RNN, and a novel seq-to-seq model with multi-head attention. We observe significantly high performance (SacreBLEU=79.53) when fine-tuning the pretrained mBART-50 multilingual model from Facebook. We then explored why we observe such high performance with mBART. We noticed an interesting property of mBART -- it was trained on shuffled and masked text data. And as we know, gloss form has a shuffling property. So we hypothesize that mBART is inherently good at text-to-gloss tasks. To test this hypothesis, we trained mBART-50 on the PHOENIX-14T benchmark and compared it with existing literature. Our fine-tuned mBART-50 demonstrated state-of-the-art performance on the PHOENIX-14T benchmark, far outperforming existing models in all 6 metrics (SacreBLEU = 63.89, BLEU-1 = 55.14, BLEU-2 = 38.07, BLEU-3 = 27.13, BLEU-4 = 20.68, COMET = 0.624). Based on the results, this study proposes a new paradigm for the text-to-gloss task using mBART models. Additionally, our results show that the BdSL text-to-gloss task can greatly benefit from a rule-based synthetic dataset.
Submitted 3 April, 2025;
originally announced April 2025.
-
Towards Continuous Experiment-driven MLOps
Authors:
Keerthiga Rajenthiram,
Milad Abdullah,
Ilias Gerostathopoulos,
Petr Hnetynka,
Tomáš Bureš,
Gerard Pons,
Besim Bilalli,
Anna Queralt
Abstract:
Despite advancements in MLOps and AutoML, ML development still remains challenging for data scientists. First, there is poor support for and limited control over optimizing and evolving ML models. Second, there is a lack of efficient mechanisms for continuous evolution of ML models which would leverage the knowledge gained in previous optimizations of the same or different models. We propose an experiment-driven MLOps approach which tackles these problems. Our approach relies on the concept of an experiment, which embodies a fully controllable optimization process. It introduces full traceability and repeatability to the optimization process, allows humans to be in full control of it, and enables continuous improvement of the ML system. Importantly, it also establishes knowledge, which is carried over and built across a series of experiments and allows for improving the efficiency of experimentation over time. We demonstrate our approach through its realization and application in the ExtremeXP project (Horizon Europe).
Submitted 5 March, 2025;
originally announced March 2025.
-
A Comparative Performance Analysis of Classification and Segmentation Models on Bangladeshi Pothole Dataset
Authors:
Antara Firoz Parsa,
S. M. Abdullah,
Anika Hasan Talukder,
Md. Asif Shahidullah Kabbya,
Shakib Al Hasan,
Md. Farhadul Islam,
Jannatun Noor
Abstract:
The study involves a comprehensive performance analysis of popular classification and segmentation models, applied to a Bangladeshi pothole dataset developed by the authors of this research. This custom dataset of 824 samples, collected from the streets of Dhaka and Bogura, performs competitively against the existing industrial and custom datasets utilized in the present literature. The dataset was further augmented four-fold for segmentation and ten-fold for classification evaluation. We tested nine classification models (CCT, CNN, INN, Swin Transformer, ConvMixer, VGG16, ResNet50, DenseNet201, and Xception) and four segmentation models (U-Net, ResU-Net, U-Net++, and Attention-Unet) over both datasets. Among the classification models, lightweight models, namely CCT, CNN, INN, Swin Transformer, and ConvMixer, were emphasized due to their low computational requirements and faster prediction times. The lightweight models performed respectably, oftentimes matching the performance of heavyweight models. In addition, augmentation was found to enhance the performance of all the tested models. The experimental results show that our dataset performs on par with or outperforms the datasets used with similar classification models in the existing literature, reaching accuracy and F1-scores over 99%. The dataset also performed on par with the existing datasets for segmentation, achieving model Dice Similarity Coefficient up to 67.54% and IoU scores up to 59.39%.
Submitted 11 January, 2025;
originally announced January 2025.
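The two segmentation metrics reported in the abstract above, the Dice Similarity Coefficient and IoU, can be sketched as follows for binary masks. This is a generic illustration using the standard definitions, not the authors' evaluation code; the function name and the flattened toy masks are assumptions.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equal-length binary masks given as flat sequences."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty; treat this as a perfect match by convention.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou

# Hypothetical 4-pixel masks: one pixel overlaps, each mask marks two pixels.
pred_mask = [1, 1, 0, 0]
true_mask = [1, 0, 1, 0]
dice, iou = dice_and_iou(pred_mask, true_mask)
print(dice, iou)
```

For any single pair of masks the two scores are linked by Dice = 2 * IoU / (1 + IoU), so Dice is never lower than IoU.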
-
Visual Counter Turing Test (VCT^2): Discovering the Challenges for AI-Generated Image Detection and Introducing Visual AI Index (V_AI)
Authors:
Nasrin Imanpour,
Shashwat Bajpai,
Subhankar Ghosh,
Sainath Reddy Sankepally,
Abhilekh Borah,
Hasnat Md Abdullah,
Nishoak Kosaraju,
Shreyas Dixit,
Ashhar Aziz,
Shwetangshu Biswas,
Vinija Jain,
Aman Chadha,
Amit Sheth,
Amitava Das
Abstract:
The proliferation of AI techniques for image generation, coupled with their increasing accessibility, has raised significant concerns about the potential misuse of these images to spread misinformation. Recent AI-generated image detection (AGID) methods include CNNDetection, NPR, DM Image Detection, Fake Image Detection, DIRE, LASTED, GAN Image Detection, AIDE, SSP, DRCT, RINE, OCC-CLIP, De-Fake, and Deep Fake Detection. However, we argue that the current state-of-the-art AGID techniques are inadequate for effectively detecting contemporary AI-generated images and advocate for a comprehensive reevaluation of these methods. We introduce the Visual Counter Turing Test (VCT^2), a benchmark comprising ~130K images generated by contemporary text-to-image models (Stable Diffusion 2.1, Stable Diffusion XL, Stable Diffusion 3, DALL-E 3, and Midjourney 6). VCT^2 includes two sets of prompts sourced from tweets by the New York Times Twitter account and captions from the MS COCO dataset. We also evaluate the performance of the aforementioned AGID techniques on the VCT^2 benchmark, highlighting their ineffectiveness in detecting AI-generated images. As image-generative AI models continue to evolve, the need for a quantifiable framework to evaluate these models becomes increasingly critical. To meet this need, we propose the Visual AI Index (V_AI), which assesses generated images from various visual perspectives, including texture complexity and object coherence, setting a new standard for evaluating image-generative AI models. To foster research in this domain, we make our https://huggingface.co/datasets/anonymous1233/COCO_AI and https://huggingface.co/datasets/anonymous1233/twitter_AI datasets publicly available.
Submitted 24 November, 2024;
originally announced November 2024.
-
UAL-Bench: The First Comprehensive Unusual Activity Localization Benchmark
Authors:
Hasnat Md Abdullah,
Tian Liu,
Kangda Wei,
Shu Kong,
Ruihong Huang
Abstract:
Localizing unusual activities, such as human errors or surveillance incidents, in videos holds practical significance. However, current video understanding models struggle with localizing these unusual events likely because of their insufficient representation in models' pretraining datasets. To explore foundation models' capability in localizing unusual activity, we introduce UAL-Bench, a comprehensive benchmark for unusual activity localization, featuring three video datasets: UAG-OOPS, UAG-SSBD, UAG-FunQA, and an instruction-tune dataset: OOPS-UAG-Instruct, to improve model capabilities. UAL-Bench evaluates three approaches: Video-Language Models (Vid-LLMs), instruction-tuned Vid-LLMs, and a novel integration of Vision-Language Models and Large Language Models (VLM-LLM). Our results show the VLM-LLM approach excels in localizing short-span unusual events and predicting their onset (start time) more accurately than Vid-LLMs. We also propose a new metric, R@1, TD <= p, to address limitations in existing evaluation methods. Our findings highlight the challenges posed by long-duration videos, particularly in autism diagnosis scenarios, and the need for further advancements in localization techniques. Our work not only provides a benchmark for unusual activity localization but also outlines the key challenges for existing foundation models, suggesting future research directions on this important task.
Submitted 1 October, 2024;
originally announced October 2024.
-
On the Encoding of Gender in Transformer-based ASR Representations
Authors:
Aravind Krishnan,
Badr M. Abdullah,
Dietrich Klakow
Abstract:
While existing literature relies on performance differences to uncover gender biases in ASR models, a deeper analysis is essential to understand how gender is encoded and utilized during transcript generation. This work investigates the encoding and utilization of gender in the latent representations of two transformer-based ASR models, Wav2Vec2 and HuBERT. Using linear erasure, we demonstrate the feasibility of removing gender information from each layer of an ASR model and show that such an intervention has minimal impacts on the ASR performance. Additionally, our analysis reveals a concentration of gender information within the first and last frames in the final layers, explaining the ease of erasing gender in these layers. Our findings suggest the prospect of creating gender-neutral embeddings that can be integrated into ASR frameworks without compromising their efficacy.
Submitted 14 June, 2024;
originally announced June 2024.
-
Calculation of Femur Caput Collum Diaphyseal angle for X-Rays images using Semantic Segmentation
Authors:
Muhammad Abdullah,
Anne Querfurth,
Deepak Bhatia,
Mahdi Mantash
Abstract:
This paper investigates the use of deep learning approaches to estimate the femur caput-collum-diaphyseal (CCD) angle from X-ray images. The CCD angle is an important measurement in the diagnosis of hip problems, and correct prediction can help in the planning of surgical procedures. Manual measurement of this angle, on the other hand, can be time-intensive and vulnerable to inter-observer variability. In this paper, we present a deep-learning algorithm that can reliably estimate the femur CCD angle from X-ray images. To train and test the performance of our model, we employed an X-ray image dataset with associated femur CCD angle measurements. Furthermore, we built a prototype to display the resulting predictions and to allow the user to interact with the predictions. As this is happening in a sterile setting during surgery, we expanded our interface to the possibility of being used only by voice commands.
Our results show that our deep learning model predicts the femur CCD angle on X-ray images with great accuracy, with a mean absolute error of 4.3 degrees on the left femur and 4.9 degrees on the right femur on the test dataset. Our results suggest that deep learning has the potential to give a more efficient and accurate technique for predicting the femur CCD angle, which might have substantial therapeutic implications for the diagnosis and management of hip problems.
Submitted 26 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape
Authors:
Sifat Muhammad Abdullah,
Aravind Cheruvu,
Shravya Kanchi,
Taejoong Chung,
Peng Gao,
Murtuza Jadliwala,
Bimal Viswanath
Abstract:
Deepfake or synthetic images produced using deep generative models pose serious risks to online platforms. This has triggered several research efforts to accurately detect deepfake images, achieving excellent performance on publicly available deepfake datasets. In this work, we study 8 state-of-the-art detectors and argue that they are far from being ready for deployment due to two recent developments. First, the emergence of lightweight methods to customize large generative models can enable an attacker to create many customized generators (to create deepfakes), thereby substantially increasing the threat surface. We show that existing defenses fail to generalize well to such user-customized generative models that are publicly available today. We discuss new machine learning approaches based on content-agnostic features and ensemble modeling to improve generalization performance against user-customized models. Second, the emergence of vision foundation models (machine learning models trained on broad data that can be easily adapted to several downstream tasks) can be misused by attackers to craft adversarial deepfakes that evade existing defenses. We propose a simple adversarial attack that leverages existing foundation models to craft adversarial samples without adding any adversarial noise, through careful semantic manipulation of the image content. We highlight the vulnerabilities of several defenses against our attack, and explore directions leveraging advanced foundation models and adversarial training to defend against this new threat.
Submitted 24 April, 2024;
originally announced April 2024.
-
Rethinking Software Engineering in the Foundation Model Era: A Curated Catalogue of Challenges in the Development of Trustworthy FMware
Authors:
Ahmed E. Hassan,
Dayi Lin,
Gopi Krishnan Rajbahadur,
Keheliya Gallaba,
Filipe R. Cogo,
Boyuan Chen,
Haoxiang Zhang,
Kishanthan Thangarajah,
Gustavo Ansaldi Oliva,
Jiahuei Lin,
Wali Mohammad Abdullah,
Zhen Ming Jiang
Abstract:
Foundation models (FMs), such as Large Language Models (LLMs), have revolutionized software development by enabling new use cases and business models. We refer to software built using FMs as FMware. The unique properties of FMware (e.g., prompts, agents, and the need for orchestration), coupled with the intrinsic limitations of FMs (e.g., hallucination) lead to a completely new set of software engineering challenges. Based on our industrial experience, we identified 10 key SE4FMware challenges that have caused enterprise FMware development to be unproductive, costly, and risky. In this paper, we discuss these challenges in detail and state the path for innovation that we envision. Next, we present FMArts, which is our long-term effort towards creating a cradle-to-grave platform for the engineering of trustworthy FMware. Finally, we (i) show how the unique properties of FMArts enabled us to design and develop a complex FMware for a large customer in a timely manner and (ii) discuss the lessons that we learned in doing so. We hope that the disclosure of the aforementioned challenges and our associated efforts to tackle them will not only raise awareness but also promote deeper and further discussions, knowledge sharing, and innovative solutions across the software engineering discipline.
Submitted 3 March, 2024; v1 submitted 24 February, 2024;
originally announced February 2024.
-
Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification
Authors:
Mohammed Maqsood Shaik,
Dietrich Klakow,
Badr M. Abdullah
Abstract:
Pre-trained Transformer-based speech models have shown striking performance when fine-tuned on various downstream tasks such as automatic speech recognition and spoken language identification (SLID). However, the problem of domain mismatch remains a challenge in this area, where the domain of the pre-training data might differ from that of the downstream labeled data used for fine-tuning. In multilingual tasks such as SLID, the pre-trained speech model may not support all the languages in the downstream task. To address this challenge, we propose self-supervised adaptive pre-training (SAPT) to adapt the pre-trained model to the target domain and languages of the downstream task. We apply SAPT to the XLSR-128 model and investigate the effectiveness of this approach for the SLID task. First, we demonstrate that SAPT improves XLSR performance on the FLEURS benchmark with substantial gains up to 40.1% for under-represented languages. Second, we apply SAPT on four different datasets in a few-shot learning setting, showing that our approach improves the sample efficiency of XLSR during fine-tuning. Our experiments provide strong empirical evidence that continual adaptation via self-supervision improves downstream performance for multilingual speech models.
Submitted 12 December, 2023;
originally announced December 2023.
-
SynthEnsemble: A Fusion of CNN, Vision Transformer, and Hybrid Models for Multi-Label Chest X-Ray Classification
Authors:
S. M. Nabil Ashraf,
Md. Adyelullahil Mamun,
Hasnat Md. Abdullah,
Md. Golam Rabiul Alam
Abstract:
Chest X-rays are widely used to diagnose thoracic diseases, but the lack of detailed information about these abnormalities makes it challenging to develop accurate automated diagnosis systems, which is crucial for early detection and effective treatment. To address this challenge, we employed deep learning techniques to identify patterns in chest X-rays that correspond to different diseases. We conducted experiments on the "ChestX-ray14" dataset using various pre-trained CNNs, transformers, hybrid (CNN+Transformer) models, and classical models. The best individual model was the CoAtNet, which achieved an area under the receiver operating characteristic curve (AUROC) of 84.2%. By combining the predictions of all trained models using a weighted average ensemble where the weight of each model was determined using differential evolution, we further improved the AUROC to 85.4%, outperforming other state-of-the-art methods in this field. Our findings demonstrate the potential of deep learning techniques, particularly ensemble deep learning, for improving the accuracy of automatic diagnosis of thoracic diseases from chest X-rays. Code available at: https://github.com/syednabilashraf/SynthEnsemble
Submitted 22 May, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
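The weighted average ensemble mentioned in the SynthEnsemble abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `weighted_average_ensemble`, the example probabilities, and the weights are all hypothetical, and the differential-evolution search that the abstract uses to choose the weights is not shown here.

```python
def weighted_average_ensemble(model_probs, weights):
    """Combine per-model label probabilities with a normalized weighted average.

    model_probs: one list of per-label probabilities per model (hypothetical inputs).
    weights: one non-negative weight per model; in the abstract these come from
             differential evolution, which this sketch does not implement.
    """
    if len(model_probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = [0.0] * len(model_probs[0])
    for probs, weight in zip(model_probs, weights):
        for label_index, prob in enumerate(probs):
            # Each model contributes in proportion to its normalized weight.
            combined[label_index] += (weight / total) * prob
    return combined


# Two hypothetical models scoring two labels; the second model is trusted 3x more.
ensemble_scores = weighted_average_ensemble([[0.2, 0.8], [0.6, 0.4]], [1.0, 3.0])
```

Normalizing by the weight sum keeps the combined scores in the same range as the inputs, so any weight vector proposed by an optimizer can be evaluated directly against a validation metric such as AUROC.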
-
An Evaluation of Machine Learning Approaches for Early Diagnosis of Autism Spectrum Disorder
Authors:
Rownak Ara Rasul,
Promy Saha,
Diponkor Bala,
S M Rakib Ul Karim,
Md. Ibrahim Abdullah,
Bishwajit Saha
Abstract:
Autistic Spectrum Disorder (ASD) is a neurological disease characterized by difficulties with social interaction, communication, and repetitive activities. While its primary origin lies in genetics, early detection is crucial, and leveraging machine learning offers a promising avenue for a faster and more cost-effective diagnosis. This study employs diverse machine learning methods to identify crucial ASD traits, aiming to enhance and automate the diagnostic process. We study eight state-of-the-art classification models to determine their effectiveness in ASD detection. We evaluate the models using accuracy, precision, recall, specificity, F1-score, area under the curve (AUC), kappa, and log loss metrics to find the best classifier for these binary datasets. Among all the classification models, the SVM and LR models achieve the highest accuracy of 100% on the children dataset, and the LR model produces the highest accuracy of 97.14% on the adult dataset. Our proposed ANN model provides the highest accuracy of 94.24% on the new combined dataset when hyperparameters are precisely tuned for each model. Because almost all classification models, which rely on true labels, achieve high accuracy, we also examine five popular clustering algorithms to understand model behavior in scenarios without true labels. We calculate Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), and Silhouette Coefficient (SC) metrics to select the best clustering models. Our evaluation finds that spectral clustering outperforms all other benchmarked clustering models in terms of NMI and ARI while achieving an SC comparable to the best value, which is obtained by k-means. The implemented code is available at GitHub.
Submitted 28 December, 2023; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Ensemble-based modeling abstractions for modern self-optimizing systems
Authors:
Michal Töpfer,
Milad Abdullah,
Tomáš Bureš,
Petr Hnětynka,
Martin Kruliš
Abstract:
In this paper, we extend our ensemble-based component model DEECo with the capability to use machine-learning and optimization heuristics in the establishment and reconfiguration of autonomic component ensembles. We show how to capture these concepts on the model level and give an example of how such a model can be beneficially used for modeling an access-control-related problem in an Industry 4.0 setting. We argue that incorporating machine-learning and optimization heuristics is a key feature for modern smart systems, which are to learn over time and optimize their behavior at runtime to deal with uncertainty in their environment.
Submitted 11 September, 2023;
originally announced September 2023.
-
A New Approach to Overcoming Zero Trade in Gravity Models to Avoid Indefinite Values in Linear Logarithmic Equations and Parameter Verification Using Machine Learning
Authors:
Mikrajuddin Abdullah
Abstract:
The presence of a high number of zero trade flows continues to pose a challenge in identifying gravity parameters to explain international trade using the gravity model. Linear regression with a log-linear equation encounters an undefined value for the logarithm of zero trade. Although several approaches to solving this problem have been proposed, the majority of them are no longer based on linear regression, making the process of finding solutions more complex. In this work, we suggest a two-step technique for determining the gravity parameters: first, performing linear regression locally to establish a dummy value to substitute for zero trade flow, and then estimating the gravity parameters. Iterative techniques are used to determine the optimum parameters. Machine learning is used to test the estimated parameters by analyzing their position in the cluster. We calculated international trade figures for 2004, 2009, 2014, and 2019. We examine only the classic gravity equation and find that the powers of GDP and distance are in the same cluster and are both roughly equal to one. The strategy presented here can be used to solve other problems involving log-linear regression.
Submitted 11 August, 2023;
originally announced August 2023.
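The zero-trade problem described in the abstract above can be seen in a minimal sketch of the classic gravity equation in log-linear form, ln T_ij = ln G + a ln GDP_i + b ln GDP_j - c ln D_ij. The function names and default parameter values below are hypothetical, and the paper's two-step dummy-value procedure is not reproduced; the sketch only shows why a zero flow has no usable left-hand side.

```python
import math


def log_gravity(gdp_i, gdp_j, distance, g=1.0, a=1.0, b=1.0, c=1.0):
    """Right-hand side of the log-linear gravity equation.

    ln T_ij = ln G + a*ln(GDP_i) + b*ln(GDP_j) - c*ln(D_ij)
    The defaults (all equal to one) are illustrative assumptions only.
    """
    return (math.log(g) + a * math.log(gdp_i)
            + b * math.log(gdp_j) - c * math.log(distance))


def log_trade(flow):
    """Left-hand side ln(T_ij); None marks a zero flow, whose log is undefined."""
    return math.log(flow) if flow > 0 else None


# A zero flow cannot enter an ordinary log-linear regression as-is.
usable = [f for f in (log_trade(0.0), log_trade(1.0)) if f is not None]
```

Observations for which `log_trade` returns `None` are exactly the ones that, per the abstract, need a substitute value before a linear regression can be fitted.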
-
An Information-Theoretic Analysis of Self-supervised Discrete Representations of Speech
Authors:
Badr M. Abdullah,
Mohammed Maqsood Shaik,
Bernd Möbius,
Dietrich Klakow
Abstract:
Self-supervised representation learning for speech often involves a quantization step that transforms the acoustic input into discrete units. However, it remains unclear how to characterize the relationship between these discrete units and abstract phonetic categories such as phonemes. In this paper, we develop an information-theoretic framework whereby we represent each phonetic category as a distribution over discrete units. We then apply our framework to two different self-supervised models (namely wav2vec 2.0 and XLSR) and use American English speech as a case study. Our study demonstrates that the entropy of phonetic distributions reflects the variability of the underlying speech sounds, with phonetically similar sounds exhibiting similar distributions. While our study confirms the lack of direct, one-to-one correspondence, we find an intriguing, indirect relationship between phonetic categories and discrete units.
Submitted 4 June, 2023;
originally announced June 2023.
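The idea in the abstract above of representing a phonetic category as a distribution over discrete units, and then measuring its entropy, can be sketched as follows. This is an illustrative assumption about the computation, not the authors' implementation; the function names and the unit IDs in the example are hypothetical.

```python
import math
from collections import Counter


def unit_distribution(unit_ids):
    """Empirical distribution over discrete unit IDs observed for one phonetic category."""
    counts = Counter(unit_ids)
    total = sum(counts.values())
    return {unit: count / total for unit, count in counts.items()}


def entropy_bits(distribution):
    """Shannon entropy in bits; higher values mean the category spreads over more units."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)


# Hypothetical unit IDs assigned to frames of two phonemes.
stable_phone = unit_distribution([17, 17, 17, 17])   # always the same unit
variable_phone = unit_distribution([3, 3, 42, 42])   # split evenly over two units
```

Under this sketch, a category that always maps to one unit has zero entropy, while one split evenly across two units has an entropy of one bit.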
-
Affective social anthropomorphic intelligent system
Authors:
Md. Adyelullahil Mamun,
Hasnat Md. Abdullah,
Md. Golam Rabiul Alam,
Muhammad Mehedi Hassan,
Md. Zia Uddin
Abstract:
Human conversational styles are measured by the sense of humor, personality, and tone of voice. These characteristics have become essential for conversational intelligent virtual assistants. However, most state-of-the-art intelligent virtual assistants (IVAs) fail to interpret the affective semantics of human voices. This research proposes an anthropomorphic intelligent system that can hold a proper human-like conversation with emotion and personality. A voice style transfer method is also proposed to map the attributes of a specific emotion. Initially, the frequency domain data (Mel-Spectrogram) is created by converting the temporal audio wave data, which comprises discrete patterns for audio features such as notes, pitch, rhythm, and melody. A collateral CNN-Transformer-Encoder is used to predict seven different affective states from voice. In parallel, the voice is fed to deep-speech, an RNN model that generates the text transcription from the spectrogram. The transcribed text is then passed to the multi-domain conversation agent, which uses blended skill talk, a transformer-based retrieve-and-generate strategy, and beam-search decoding to generate an appropriate textual response. For voice synthesis and style transfer, the system learns an invertible mapping of data to a latent space that can be manipulated and generates a Mel-spectrogram frame based on previous Mel-spectrogram frames. Finally, the waveform is generated from the spectrogram using WaveGlow. The outcomes of the studies we conducted on individual models were auspicious. Furthermore, users who interacted with the system provided positive feedback, demonstrating the system's effectiveness.
Submitted 19 April, 2023;
originally announced April 2023.
-
Developing the Reliable Shallow Supervised Learning for Thermal Comfort using ASHRAE RP-884 and ASHRAE Global Thermal Comfort Database II
Authors:
Kanisius Karyono,
Badr M. Abdullah,
Alison J. Cotgrave,
Ana Bras,
Jeff Cullen
Abstract:
Designers of artificial intelligence (AI) systems for thermal comfort face either insufficient data recorded from the current user or overfitting due to unreliable training data. This work introduces a reliable data set for training the AI subsystem for thermal comfort. This paper presents a control algorithm based on shallow supervised learning, which is simple enough to be implemented in an Internet of Things (IoT) system for residential usage, using ASHRAE RP-884 and ASHRAE Global Thermal Comfort Database II. No other available training data for thermal comfort is as reliable as this dataset, but the direct use of this data can lead to overfitting. This work offers an algorithm for data filtering and semantic data augmentation of the ASHRAE database for the supervised learning process. Overfitting is a persistent problem because of the psychological aspect involved in the thermal comfort decision. A method for checking the AI system against overfitting, based on the psychrometric chart, is presented. This paper also assesses the most important parameters needed to achieve human thermal comfort. This method can support the development of reinforcement learning for thermal comfort.
Submitted 3 March, 2023;
originally announced March 2023.
-
Multi objective Fitness Dependent Optimizer Algorithm
Authors:
Jaza M. Abdullah,
Tarik A. Rashid,
Bestan B. Maaroof,
Seyedali Mirjalili
Abstract:
This paper proposes a multi objective variant of the recently introduced fitness dependent optimizer (FDO). The algorithm is called the Multi objective Fitness Dependent Optimizer (MOFDO) and is equipped with all five types of knowledge (situational, normative, topographical, domain, and historical knowledge) as in FDO. To assess its performance, MOFDO is tested on two standard benchmarks: the classical ZDT test functions, a widespread test suite that takes its name from its authors Zitzler, Deb, and Thiele, and the IEEE Congress of Evolutionary Computation benchmark (CEC 2019) multi modal multi objective functions. MOFDO results are compared to the latest variant of multi objective particle swarm optimization (MOPSO), non-dominated sorting genetic algorithm third improvement (NSGA-III), and multi objective dragonfly algorithm (MODA). The comparative study shows the superiority of MOFDO in most cases and comparable results in other cases. Moreover, MOFDO is used for optimizing real-world engineering problems (e.g., welded beam design problems). It is observed that the proposed algorithm successfully provides a wide variety of well-distributed feasible solutions, which gives decision-makers a broader set of applicable choices to consider.
Submitted 26 January, 2023;
originally announced February 2023.
-
Analyzing the Representational Geometry of Acoustic Word Embeddings
Authors:
Badr M. Abdullah,
Dietrich Klakow
Abstract:
Acoustic word embeddings (AWEs) are vector representations such that different acoustic exemplars of the same word are projected nearby in the embedding space. In addition to their use in speech technology applications such as spoken term discovery and keyword spotting, AWE models have been adopted as models of spoken-word processing in several cognitively motivated studies and have been shown to exhibit human-like performance in some auditory processing tasks. Nevertheless, the representational geometry of AWEs remains an under-explored topic that has not been studied in the literature. In this paper, we take a closer analytical look at AWEs learned from English speech and study how the choice of the learning objective and the architecture shapes their representational profile. To this end, we employ a set of analytic techniques from machine learning and neuroscience in three different analyses: embedding space uniformity, word discriminability, and representational consistency. Our main findings highlight the prominent role of the learning objective on shaping the representation profile compared to the model architecture.
Submitted 8 January, 2023;
originally announced January 2023.
-
Huruf: An Application for Arabic Handwritten Character Recognition Using Deep Learning
Authors:
Minhaz Kamal,
Fairuz Shaiara,
Chowdhury Mohammad Abdullah,
Sabbir Ahmed,
Tasnim Ahmed,
Md. Hasanul Kabir
Abstract:
Handwriting Recognition has been a field of great interest in the Artificial Intelligence domain. Due to its broad use cases in real life, research has been conducted widely on it. Prominent work has been done in this field focusing mainly on Latin characters. However, the domain of Arabic handwritten character recognition is still relatively unexplored. The inherent cursive nature of the Arabic characters and variations in writing styles across individuals make the task even more challenging. We identified some probable reasons behind this and proposed a lightweight Convolutional Neural Network-based architecture for recognizing Arabic characters and digits. The proposed pipeline consists of a total of 18 layers containing four layers each for convolution, pooling, batch normalization, dropout, and finally one Global average pooling and a Dense layer. Furthermore, we thoroughly investigated the different choices of hyperparameters such as the choice of the optimizer, kernel initializer, activation function, etc. Evaluated on the publicly available 'Arabic Handwritten Character Dataset (AHCD)' and 'Modified Arabic handwritten digits Database (MadBase)' datasets, the proposed model achieved accuracies of 96.93% and 99.35%, respectively, which is comparable to the state-of-the-art and makes it a suitable solution for real-life end-level applications.
Submitted 24 December, 2022; v1 submitted 16 December, 2022;
originally announced December 2022.
-
Deepfake Text Detection: Limitations and Opportunities
Authors:
Jiameng Pu,
Zain Sarwar,
Sifat Muhammad Abdullah,
Abdullah Rehman,
Yoonjin Kim,
Parantapa Bhattacharya,
Mobin Javed,
Bimal Viswanath
Abstract:
Recent advances in generative models for language have enabled the creation of convincing synthetic text or deepfake text. Prior work has demonstrated the potential for misuse of deepfake text to mislead content consumers. Therefore, deepfake text detection, the task of discriminating between human and machine-generated text, is becoming increasingly critical. Several defenses have been proposed for deepfake text detection. However, we lack a thorough understanding of their real-world applicability. In this paper, we collect deepfake text from 4 online services powered by Transformer-based tools to evaluate the generalization ability of the defenses on content in the wild. We develop several low-cost adversarial attacks, and investigate the robustness of existing defenses against an adaptive attacker. We find that many defenses show significant degradation in performance under our evaluation scenarios compared to their original claimed performance. Our evaluation shows that tapping into the semantic information in the text content is a promising approach for improving the robustness and generalization performance of deepfake text detection schemes.
Submitted 17 October, 2022;
originally announced October 2022.
-
Integrating Form and Meaning: A Multi-Task Learning Model for Acoustic Word Embeddings
Authors:
Badr M. Abdullah,
Bernd Möbius,
Dietrich Klakow
Abstract:
Models of acoustic word embeddings (AWEs) learn to map variable-length spoken word segments onto fixed-dimensionality vector representations such that different acoustic exemplars of the same word are projected nearby in the embedding space. In addition to their speech technology applications, AWE models have been shown to predict human performance on a variety of auditory lexical processing tasks. Current AWE models are based on neural networks and trained in a bottom-up approach that integrates acoustic cues to build up a word representation given an acoustic or symbolic supervision signal. Therefore, these models do not leverage or capture high-level lexical knowledge during the learning process. In this paper, we propose a multi-task learning model that incorporates top-down lexical knowledge into the training procedure of AWEs. Our model learns a mapping between the acoustic input and a lexical representation that encodes high-level information such as word semantics in addition to bottom-up form-based supervision. We experiment with three languages and demonstrate that incorporating lexical knowledge improves the embedding space discriminability and encourages the model to better separate lexical categories.
Submitted 18 September, 2022; v1 submitted 14 September, 2022;
originally announced September 2022.
-
Harmony Search: Current Studies and Uses on Healthcare Systems
Authors:
Maryam T. Abdulkhaleq,
Tarik A. Rashid,
Abeer Alsadoon,
Bryar A. Hassan,
Mokhtar Mohammadi,
Jaza M. Abdullah,
Amit Chhabra,
Sazan L. Ali,
Rawshan N. Othman,
Hadil A. Hasan,
Sara Azad,
Naz A. Mahmood,
Sivan S. Abdalrahman,
Hezha O. Rasul,
Nebojsa Bacanin,
S. Vimal
Abstract:
One of the popular metaheuristic search algorithms is Harmony Search (HS). It has been verified that HS can find solutions to optimization problems due to its balanced exploratory and convergence behavior and its simple and flexible structure. This capability makes the algorithm well suited to real-world applications in various fields, including healthcare systems, different engineering fields, and computer science. The popularity of HS urges us to provide a comprehensive survey of the literature on HS and its variants in health systems, analyze its strengths and weaknesses, and suggest future research directions. In this review paper, the current studies and uses of harmony search are studied in four main domains: (i) the variants of HS, including its modifications and hybridization; (ii) a summary of the previous review works; (iii) applications of HS in healthcare systems; and (iv) a proposed operational framework for the applications of HS in healthcare systems. The main contribution of this review is to provide a thorough examination of HS in healthcare systems while also serving as a valuable resource for prospective scholars who want to investigate or implement this method.
Submitted 19 July, 2022;
originally announced July 2022.
-
Fitness Dependent Optimizer for IoT Healthcare using Adapted Parameters: A Case Study Implementation
Authors:
Aso M. Aladdin,
Jaza M. Abdullah,
Kazhan Othman Mohammed Salih,
Tarik A. Rashid,
Rafid Sagban,
Abeer Alsaddon,
Nebojsa Bacanin,
Amit Chhabra,
S. Vimal,
Indradip Banerjee
Abstract:
This chapter discusses a case study on the Fitness Dependent Optimizer (FDO) and the adaptation of its parameters to Internet of Things (IoT) healthcare. FDO is inspired by the reproductive process of bee swarms and their collaborative decision-making. Despite this inspiration, the algorithm has no connection to the honey bee or artificial bee colony algorithms. In FDO, the search agent's position is updated using speed or velocity, but in a different way: the algorithm creates weights based on the fitness function value of the problem, which help lead the agents through the exploration and exploitation processes. In the original work, FDO was evaluated against and compared with other algorithms such as the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). Key current algorithms, namely the Salp Swarm Algorithm (SSA), Dragonfly Algorithm (DA), and Whale Optimization Algorithm (WOA), have also been evaluated against FDO in terms of their results. From these experimental findings, we may conclude that FDO outperforms the other techniques stated. There are two primary goals for this chapter. First, the implementation of FDO is shown step by step so that readers can better comprehend the algorithm and apply FDO to solve real-world applications quickly. Second, we address how to tune the FDO settings so that the meta-heuristic evolutionary algorithm better evaluates large quantities of information in the IoT health service system. Ultimately, the aim of this chapter is to adapt the IoT healthcare framework based on FDO to produce effective IoT healthcare applications for solving real-world optimization, aggregation, prediction, segmentation, and other technological problems.
Submitted 18 May, 2022;
originally announced July 2022.
-
A Web-Based Tool for Comparative Process Mining
Authors:
Madhavi Bangalore Shankara Narayana,
Elisabetta Benevento,
Marco Pegoraro,
Muhammad Abdullah,
Rahim Bin Shahid,
Qasim Sajid,
Muhammad Usman Mansoor,
Wil M. P. van der Aalst
Abstract:
Process mining techniques enable the analysis of a wide variety of processes using event data. Among the available process mining techniques, most consider a single process perspective at a time, in the shape of a model or log. In this paper, we present a tool that can compare and visualize the same process under different constraints, making it possible to analyze multiple aspects of the process. We describe the architecture, structure and use of the tool, and we provide a full open-source implementation.
Submitted 4 April, 2022; v1 submitted 1 April, 2022;
originally announced April 2022.
-
How Familiar Does That Sound? Cross-Lingual Representational Similarity Analysis of Acoustic Word Embeddings
Authors:
Badr M. Abdullah,
Iuliia Zaitova,
Tania Avgustinova,
Bernd Möbius,
Dietrich Klakow
Abstract:
How do neural networks "perceive" speech sounds from unknown languages? Does the typological similarity between the model's training language (L1) and an unknown language (L2) have an impact on the model representations of L2 speech signals? To answer these questions, we present a novel experimental design based on representational similarity analysis (RSA) to analyze acoustic word embeddings (AWEs) -- vector representations of variable-duration spoken-word segments. First, we train monolingual AWE models on seven Indo-European languages with various degrees of typological similarity. We then employ RSA to quantify the cross-lingual similarity by simulating native and non-native spoken-word processing using AWEs. Our experiments show that typological similarity indeed affects the representational similarity of the models in our study. We further discuss the implications of our work on modeling speech processing and language similarity with neural networks.
Submitted 21 September, 2021;
originally announced September 2021.
-
Do Acoustic Word Embeddings Capture Phonological Similarity? An Empirical Study
Authors:
Badr M. Abdullah,
Marius Mosbach,
Iuliia Zaitova,
Bernd Möbius,
Dietrich Klakow
Abstract:
Several variants of deep neural networks have been successfully employed for building parametric models that project variable-duration spoken word segments onto fixed-size vector representations, or acoustic word embeddings (AWEs). However, it remains unclear to what degree we can rely on the distance in the emerging AWE space as an estimate of word-form similarity. In this paper, we ask: does the distance in the acoustic embedding space correlate with phonological dissimilarity? To answer this question, we empirically investigate the performance of supervised approaches for AWEs with different neural architectures and learning objectives. We train AWE models in controlled settings for two languages (German and Czech) and evaluate the embeddings on two tasks: word discrimination and phonological similarity. Our experiments show that (1) the distance in the embedding space in the best cases only moderately correlates with phonological distance, and (2) improving the performance on the word discrimination task does not necessarily yield models that better reflect word phonological similarity. Our findings highlight the necessity to rethink the current intrinsic evaluations for AWEs.
Submitted 16 June, 2021;
originally announced June 2021.
-
SIGTYP 2021 Shared Task: Robust Spoken Language Identification
Authors:
Elizabeth Salesky,
Badr M. Abdullah,
Sabrina J. Mielke,
Elena Klyachko,
Oleg Serikov,
Edoardo Ponti,
Ritesh Kumar,
Ryan Cotterell,
Ekaterina Vylomova
Abstract:
While language identification is a fundamental speech and language processing task, it remains challenging for many languages and language families. For many low-resource and endangered languages this is in part due to resource availability: where larger datasets exist, they may be single-speaker or come from different domains than the desired application scenarios, creating a need for domain- and speaker-invariant language identification systems. This year's shared task on robust spoken language identification sought to investigate just this scenario: systems were to be trained on largely single-speaker speech from one domain, but evaluated on data in other domains recorded from speakers under different recording circumstances, mimicking realistic low-resource scenarios. We see that domain and speaker mismatch proves very challenging for current methods, which can exceed 95% accuracy in-domain; domain adaptation can address this to some degree, but these conditions merit further investigation to make spoken language identification accessible in many scenarios.
Submitted 7 June, 2021;
originally announced June 2021.
-
HEVC Watermarking Techniques for Authentication and Copyright Applications: Challenges and Opportunities
Authors:
Ali A. Elrowayati,
Mohamed A. Alrshah,
M. F. L. Abdullah,
Rohaya Latip
Abstract:
Recently, High-Efficiency Video Coding (HEVC/H.265) has been chosen to replace previous video coding standards, such as H.263 and H.264. Despite its efficiency, HEVC still lacks reliable and practical functionalities to support authentication and copyright applications. To provide this support, researchers have proposed several watermarking techniques over the last few years. However, those techniques still suffer from many issues that need to be considered in future designs. In this paper, a Systematic Literature Review (SLR) is introduced to identify HEVC challenges and potential research directions for interested researchers and developers. The time scope of this SLR covers all research articles published during the last six years, from January 2014 to the end of April 2020. Forty-two articles met the selection criteria out of 343 articles published in this area during that time scope. A new classification is drawn, followed by an identification of the challenges of implementing HEVC watermarking techniques based on the analysis and discussion of the chosen articles. Finally, recommendations for HEVC watermarking techniques are listed to help researchers improve the existing techniques or design new, efficient ones.
Submitted 14 February, 2021;
originally announced February 2021.
-
A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English
Authors:
Marius Mosbach,
Stefania Degaetano-Ortlieb,
Marie-Pauline Krielke,
Badr M. Abdullah,
Dietrich Klakow
Abstract:
Transformer-based language models achieve high performance on various tasks, but we still lack understanding of the kind of linguistic knowledge they learn and rely on. We evaluate three models (BERT, RoBERTa, and ALBERT), testing their grammatical and semantic knowledge by sentence-level probing, diagnostic cases, and masked prediction tasks. We focus on relative clauses (in American English) as a complex phenomenon needing contextual information and antecedent identification to be resolved. Based on a naturalistic dataset, probing shows that all three models indeed capture linguistic knowledge about grammaticality, achieving high performance. Evaluation on diagnostic cases and masked prediction tasks considering fine-grained linguistic knowledge, however, shows pronounced model-specific weaknesses, especially on semantic knowledge, strongly impacting models' performance. Our results highlight the importance of (a) model comparison in evaluation tasks and (b) building up claims of model performance and the linguistic knowledge they capture beyond purely probing-based evaluations.
Submitted 2 November, 2020;
originally announced November 2020.
-
Classifying Eye-Tracking Data Using Saliency Maps
Authors:
Shafin Rahman,
Sejuti Rahman,
Omar Shahid,
Md. Tahmeed Abdullah,
Jubair Ahmed Sourov
Abstract:
A plethora of research in the literature shows how human eye fixation patterns vary depending on different factors, including genetics, age, social functioning, cognitive functioning, and so on. Analysis of these variations in visual attention has already elicited two potential research avenues: 1) determining the physiological or psychological state of the subject and 2) predicting the tasks associated with the act of viewing from the recorded eye-fixation data. To this end, this paper proposes a novel visual-saliency-based feature extraction method for automatic and quantitative classification of eye-tracking data, which is applicable to both research directions. Instead of directly extracting features from the fixation data, this method employs several well-known computational models of visual attention to predict eye fixation locations as saliency maps. Comparing the saliency amplitudes, similarity and dissimilarity of saliency maps with the corresponding eye fixation maps gives an extra dimension of information, which is effectively utilized to generate discriminative features to classify the eye-tracking data. Extensive experimentation using the Saliency4ASD, Age Prediction, and Visual Perceptual Task datasets shows that our saliency-based feature can achieve superior performance, outperforming the previous state-of-the-art methods by a considerable margin. Moreover, unlike the existing application-specific solutions, our method demonstrates performance improvement across three distinct problems from the real-life domain: Autism Spectrum Disorder screening, toddler age prediction, and human visual perceptual task classification, providing a general paradigm that utilizes the extra information inherent in saliency maps for a more accurate classification.
Submitted 24 October, 2020;
originally announced October 2020.
-
Rediscovering the Slavic Continuum in Representations Emerging from Neural Models of Spoken Language Identification
Authors:
Badr M. Abdullah,
Jacek Kudera,
Tania Avgustinova,
Bernd Möbius,
Dietrich Klakow
Abstract:
Deep neural networks have been employed for various spoken language recognition tasks, including tasks that are multilingual by definition such as spoken language identification. In this paper, we present a neural model for Slavic language identification in speech signals and analyze its emergent representations to investigate whether they reflect objective measures of language relatedness and/or non-linguists' perception of language similarity. While our analysis shows that the language representation space indeed captures language relatedness to a great extent, we find perceptual confusability between languages in our study to be the best predictor of the language representation similarity.
Submitted 22 October, 2020;
originally announced October 2020.
-
The 2ST-UNet for Pneumothorax Segmentation in Chest X-Rays using ResNet34 as a Backbone for U-Net
Authors:
Ayat Abedalla,
Malak Abdullah,
Mahmoud Al-Ayyoub,
Elhadj Benkhelifa
Abstract:
Pneumothorax, also called a collapsed lung, refers to the presence of air in the pleural space between the lung and chest wall. It can be small (requiring no treatment) or large, and it can cause death if it is not identified and treated in time. It is easily seen and identified by experts using a chest X-ray. Although this method is mostly error-free, it is time-consuming and needs expert radiologists. Recently, Computer Vision has been providing great assistance in detecting and segmenting pneumothorax. In this paper, we propose a 2-Stage Training system (2ST-UNet) to segment images with pneumothorax. This system is built on U-Net with a Residual Network (ResNet-34) backbone that is pre-trained on the ImageNet dataset. We start by training the network at a lower resolution and then load the trained model weights to retrain the network at a higher resolution. Moreover, we utilize different techniques including Stochastic Weight Averaging (SWA), data augmentation, and Test-Time Augmentation (TTA). We use the chest X-ray dataset that is provided by the 2019 SIIM-ACR Pneumothorax Segmentation Challenge, which contains 12,047 training images and 3,205 testing images. Our experiments show that 2-Stage Training leads to better and faster network convergence. Our method achieves a mean Dice Similarity Coefficient (DSC) of 0.8356, placing it among the top 9% of models with a rank of 124 out of 1,475.
Submitted 6 September, 2020;
originally announced September 2020.
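The 2ST-UNet entry above reports its result as a mean Dice Similarity Coefficient (DSC). For readers unfamiliar with the metric, here is a minimal sketch of the standard definition, DSC = 2|A ∩ B| / (|A| + |B|), applied to flattened binary masks. The function name `dice_coefficient` is hypothetical, and returning 1.0 when both masks are empty is an assumed convention, not something the abstract states.

```python
def dice_coefficient(pred, target):
    """Dice Similarity Coefficient between two flat binary masks (0/1 values)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")

    # |A ∩ B|: pixels marked positive in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # |A| + |B|: positive pixels in each mask, counted separately.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)

    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total
```

For example, a prediction `[1, 1, 0, 0]` against a ground truth `[1, 0, 0, 0]` shares one positive pixel out of three positives in total, giving 2 · 1 / 3 ≈ 0.667. A mean DSC would average this value over all test images.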
-
Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages
Authors:
Badr M. Abdullah,
Tania Avgustinova,
Bernd Möbius,
Dietrich Klakow
Abstract:
State-of-the-art spoken language identification (LID) systems, which are based on end-to-end deep neural networks, have shown remarkable success not only in discriminating between distant languages but also between closely-related languages or even different spoken varieties of the same language. However, it is still unclear to what extent neural LID models generalize to speech samples with different acoustic conditions due to domain shift. In this paper, we present a set of experiments to investigate the impact of domain mismatch on the performance of neural LID systems for a subset of six Slavic languages across two domains (read speech and radio broadcast) and examine two low-level signal descriptors (spectral and cepstral features) for this task. Our experiments show that (1) out-of-domain speech samples severely hinder the performance of neural LID models, and (2) while both spectral and cepstral features show comparable performance within-domain, spectral features show more robustness under domain mismatch. Moreover, we apply unsupervised domain adaptation to minimize the discrepancy between the two domains in our study. We achieve relative accuracy improvements that range from 9% to 77% depending on the diversity of acoustic conditions in the source domain.
Submitted 6 August, 2020; v1 submitted 2 August, 2020;
originally announced August 2020.
-
SAMBA: Safe Model-Based & Active Reinforcement Learning
Authors:
Alexander I. Cowen-Rivers,
Daniel Palenicek,
Vincent Moens,
Mohammed Abdullah,
Aivar Sootla,
Jun Wang,
Haitham Ammar
Abstract:
In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. Our method builds upon PILCO to enable active exploration using novel (semi-)metrics for out-of-sample Gaussian process evaluation, optimised through a multi-objective problem that supports conditional-value-at-risk constraints. We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low- and high-dimensional state representations. Our results show orders-of-magnitude reductions in samples and violations compared to state-of-the-art methods. Lastly, we provide intuition as to the effectiveness of the framework through a detailed analysis of our active metrics and safety constraints.
Submitted 12 June, 2020;
originally announced June 2020.
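The SAMBA entry above mentions conditional-value-at-risk (CVaR) constraints. As a rough illustration of that quantity only, and not of the paper's Gaussian-process formulation, the sketch below estimates CVaR from a list of sampled costs as the mean of the worst (1 - alpha) fraction of samples. The function name `empirical_cvar`, the parameter `alpha`, and the tail-size rule are assumptions made for this example.

```python
import math


def empirical_cvar(costs, alpha=0.95):
    """Mean of the worst (1 - alpha) fraction of sampled costs (higher = worse)."""
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")

    n = len(costs)
    # Tail size: at least one sample. Rounding first guards against
    # floating-point noise such as (1 - 0.8) * 10 == 1.9999999999999996.
    k = max(1, math.ceil(round((1.0 - alpha) * n, 9)))
    tail = sorted(costs, reverse=True)[:k]
    return sum(tail) / k
```

With costs 1 through 10 and `alpha=0.8`, the worst 20% of samples are 10 and 9, so the estimate is 9.5. A CVaR constraint would then require this tail average, rather than the overall mean, to stay below a chosen safety threshold.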
-
Exploiting ML algorithms for Efficient Detection and Prevention of JavaScript-XSS Attacks in Android Based Hybrid Applications
Authors:
Usama Khalid,
Muhammad Abdullah,
Kashif Inayat
Abstract:
The development and analysis of mobile applications in terms of security has been an active research area for many years, as many apps are vulnerable to different attacks. In particular, the concept of hybrid applications, which are developed in both native and web languages, has emerged in the last three years. The use of web languages raises certain security risks in hybrid mobile applications, as it creates possible channels through which malicious code can be injected into the application. WebView is an important component of hybrid mobile applications that implements a sandbox mechanism to protect the local resources of smartphone devices from unauthorized JavaScript access. However, the WebView application programming interfaces (APIs) also have security issues. For example, an attacker can attack the hybrid application via JavaScript code, bypassing the sandbox security by accessing the public methods of the application. Cross-site scripting (XSS) is one of the most popular malicious code injection techniques for accessing the public methods of the application through JavaScript. This research proposes a framework for the detection and prevention of XSS attacks in hybrid applications using state-of-the-art machine learning (ML) algorithms. The attacks are detected by exploiting the registered Java object features. The dataset and the sample hybrid applications were developed using Android Studio. The widely used toolkit RapidMiner was then used for empirical analysis. The results reveal that the ensemble-based Random Forest algorithm outperforms the other algorithms, achieving accuracy and F-measure as high as 99%.
Submitted 30 July, 2020; v1 submitted 12 June, 2020;
originally announced June 2020.
-
Robust Baggage Detection and Classification Based on Local Tri-directional Pattern
Authors:
Shahbano,
Muhammad Abdullah,
Kashif Inayat
Abstract:
In recent decades, automatic video surveillance systems have gained significant importance in the computer vision community. The crucial objective of surveillance is monitoring and security in public places. In the traditional Local Binary Pattern, the feature description is somewhat inaccurate and the feature size is large. To overcome these shortcomings, our research proposes an algorithm for detecting humans with or without baggage. The Local tri-directional pattern (LtriDP) descriptor is used to extract features of different human body parts, including the head, trunk, and limbs. The extracted features are then trained and evaluated with a support vector machine. Experimental results on the INRIA and MSMT17 V1 datasets show that LtriDP outperforms several state-of-the-art feature descriptors and validate its effectiveness.
Submitted 31 January, 2021; v1 submitted 12 June, 2020;
originally announced June 2020.
-
SmartCoAuth: Smart-Contract privacy preservation mechanism on querying sensitive records in the cloud
Authors:
Muhammed Siraj,
Mohd. Izuan Hafez Hj. Ninggal,
Nur Izura Udzir,
Muhammad Daniel Hafiz Abdullah,
Aziah Asmawi
Abstract:
Sensitive records stored in the cloud, such as healthcare records, private conversations and credit card information, are targets of hackers and privacy abuse. Current information and record management systems have difficulty protecting the privacy of such sensitive records in a secure, transparent, decentralized and trustless environment. Blockchain is a nascent and promising technology that facilitates data sharing and access in a secure, decentralized and trustless environment. The technology enables the use of smart contracts that can be leveraged to complement existing traditional systems and achieve security objectives that were not possible before. In this paper, we propose a framework based on Blockchain technology to enable privacy preservation in a secure, decentralized, transparent and trustless environment. We name our framework SmartCoAuth. It uses Ethereum smart contract functions as the secure, decentralized, transparent authentication and authorization mechanism in the framework. It also enables tamper-proof auditing of access to the protected records. We analysed how SmartCoAuth could be integrated into a cloud application to provide reliable privacy preservation among stakeholders of healthcare records stored in the cloud. The proposed framework provides a satisfactory level of data utility and privacy preservation.
Submitted 6 April, 2020;
originally announced April 2020.
-
Convolutional neural networks model improvements using demographics and image processing filters on chest x-rays
Authors:
Mir Muhammad Abdullah,
Mir Muhammad Abdur Rahman,
Mir Mohammed Assadullah
Abstract:
Purpose: The purpose of this study was to observe changes in the accuracy of convolutional neural network (CNN) models (the ratio of correct classifications to total predictions) on thoracic radiological images by creating different binary classification models based on age, gender, and image pre-processing filters for 14 pathologies.
Methodology: This is a quantitative study exploring variation in CNN model accuracies. Radiological thoracic images were divided by age and gender and pre-processed by various image processing filters.
Findings: We found partial support for improving model accuracy by segregating modeling images by age and gender and applying image processing filters, even though image processing filters are sometimes thought of as information filters.
Research limitations: This study may be biased because it is based on radiological images from another research project that tagged the images using an automated process that was not checked by a human.
Practical implications: Researchers may want to focus on creating models segregated by demographics and pre-process the modeling images using image processing filters. Practitioners developing assistive technologies for thoracic diagnoses may benefit from incorporating demographics and employing multiple models simultaneously with varying statistical likelihood.
Originality/value: This study uses demographics in model creation and utilizes image processing filters to improve model performance.
Keywords: Convolutional Neural Network (CNN), Chest X-Ray, ChestX-ray14, Lung, Atelectasis, Cardiomegaly, Consolidation, Edema, Effusion, Emphysema, Infiltration, Mass, Nodule, Pleural Thickening, Pneumonia, Pneumothorax
Submitted 30 November, 2019;
originally announced December 2019.
-
Wasserstein Robust Reinforcement Learning
Authors:
Mohammed Amin Abdullah,
Hang Ren,
Haitham Bou Ammar,
Vladimir Milenkovic,
Rui Luo,
Mingtian Zhang,
Jun Wang
Abstract:
Reinforcement learning algorithms, though successful, tend to over-fit to training environments, hampering their application to the real world. This paper proposes $\text{W}\text{R}^{2}\text{L}$ -- a robust reinforcement learning algorithm with significant robust performance on low and high-dimensional control tasks. Our method formalises robust reinforcement learning as a novel min-max game with a Wasserstein constraint for a correct and convergent solver. Apart from the formulation, we also propose an efficient and scalable solver following a novel zero-order optimisation method that we believe can be useful to numerical optimisation in general. We empirically demonstrate significant gains compared to standard and robust state-of-the-art algorithms on high-dimensional MuJoCo environments.
Submitted 16 September, 2019; v1 submitted 30 July, 2019;
originally announced July 2019.
-
Optimal Downlink Transmission for Cell Free SWIPT Massive MIMO Systems with Active Eavesdropping
Authors:
Mahmoud Alageli,
Aissa Ikhlef,
Fahad Alsifiany,
Mohammed A. M. Abdullah,
Gaojie Chen,
Jonathon Chambers
Abstract:
This paper considers secure simultaneous wireless information and power transfer (SWIPT) in cell-free massive multiple-input multiple-output (MIMO) systems. The system consists of a large number of randomly (Poisson-distributed) located access points (APs) serving multiple information users (IUs) and an information-untrusted dual-antenna active energy harvester (EH). The active EH uses one antenna to legitimately harvest energy and the other antenna to eavesdrop on information. The APs are networked by a centralized infinite backhaul which allows the APs to synchronize and cooperate via a central processing unit (CPU). Closed-form expressions for the average harvested energy (AHE) and a tight lower bound on the ergodic secrecy rate (ESR) are derived. The obtained lower bound on the ESR takes into account the IUs' knowledge attained by downlink effective precoded-channel training. Since the transmit power constraint is per AP, the ESR is nonlinear in terms of the transmit power elements of the APs, which imposes new challenges in formulating a convex power control problem for the downlink transmission. To deal with these nonlinearities, a new method of balancing the transmit power among the APs via relaxed semidefinite programming (SDP) is derived and proved to be rank-one globally optimal. A fair comparison between the proposed cell-free and the colocated massive MIMO systems shows that the cell-free MIMO outperforms the colocated MIMO over the interval in which the AHE constraint is low and vice versa. Also, the cell-free MIMO is found to be more immune to the increase in the active eavesdropping power than the colocated MIMO.
Submitted 23 April, 2019;
originally announced April 2019.