-
Pneumatic Multi-mode Silicone Actuator with Pressure, Vibration, and Cold Thermal Feedback
Authors:
Mohammad Shadman Hashem,
Ahsan Raza,
Sama E Shan,
Seokhee Jeon
Abstract:
A wide range of haptic feedback is crucial for achieving high realism and immersion in virtual environments. Therefore, a multi-modal haptic interface that provides various haptic signals simultaneously is highly beneficial. This paper introduces a novel silicone fingertip actuator that is pneumatically actuated, delivering a realistic and effective haptic experience by simultaneously providing pr…
▽ More
A wide range of haptic feedback is crucial for achieving high realism and immersion in virtual environments. Therefore, a multi-modal haptic interface that provides various haptic signals simultaneously is highly beneficial. This paper introduces a novel silicone fingertip actuator that is pneumatically actuated, delivering a realistic and effective haptic experience by simultaneously providing pressure, vibrotactile, and cold thermal feedback. The actuator features a design with multiple air chambers, each with controllable volume achieved through pneumatic valves connected to compressed air tanks. The lower air chamber generates pressure feedback, while the upper chamber produces vibrotactile feedback. In addition, two integrated lateral air nozzles create a cold thermal sensation. To showcase the system's capabilities, we designed two unique 3D surfaces in the virtual environment: a frozen meat surface and an abrasive icy surface. These surfaces simulate tactile perceptions of coldness, pressure, and texture. Comprehensive performance assessments and user studies were conducted to validate the actuator's effectiveness, highlighting its diverse feedback capabilities compared to traditional actuators that offer only single feedback modalities.
△ Less
Submitted 28 March, 2025;
originally announced March 2025.
-
From Traditional to Deep Learning Approaches in Whole Slide Image Registration: A Methodological Review
Authors:
Behnaz Elhaminia,
Abdullah Alsalemi,
Esha Nasir,
Mostafa Jahanifar,
Ruqayya Awan,
Lawrence S. Young,
Nasir M. Rajpoot,
Fayyaz Minhas,
Shan E Ahmed Raza
Abstract:
Whole slide image (WSI) registration is an essential task for analysing the tumour microenvironment (TME) in histopathology. It involves the alignment of spatial information between WSIs of the same section or serial sections of a tissue sample. The tissue sections are usually stained with single or multiple biomarkers before imaging, and the goal is to identify neighbouring nuclei along the Z-axi…
▽ More
Whole slide image (WSI) registration is an essential task for analysing the tumour microenvironment (TME) in histopathology. It involves the alignment of spatial information between WSIs of the same section or serial sections of a tissue sample. The tissue sections are usually stained with single or multiple biomarkers before imaging, and the goal is to identify neighbouring nuclei along the Z-axis for creating a 3D image or identifying subclasses of cells in the TME. This task is considerably more challenging compared to radiology image registration, such as magnetic resonance imaging or computed tomography, due to various factors. These include gigapixel size of images, variations in appearance between differently stained tissues, changes in structure and morphology between non-consecutive sections, and the presence of artefacts, tears, and deformations. Currently, there is a noticeable gap in the literature regarding a review of the current approaches and their limitations, as well as the challenges and opportunities they present. We aim to provide a comprehensive understanding of the available approaches and their application for various purposes. Furthermore, we investigate current deep learning methods used for WSI registration, emphasising their diverse methodologies. We examine the available datasets and explore tools and software employed in the field. Finally, we identify open challenges and potential future trends in this area of research.
△ Less
Submitted 26 February, 2025;
originally announced February 2025.
-
Deep Learning Based Segmentation of Blood Vessels from H&E Stained Oesophageal Adenocarcinoma Whole-Slide Images
Authors:
Jiaqi Lv,
Stefan S Antonowicz,
Shan E Ahmed Raza
Abstract:
Blood vessels (BVs) play a critical role in the Tumor Micro-Environment (TME), potentially influencing cancer progression and treatment response. However, manually quantifying BVs in Hematoxylin and Eosin (H&E) stained images is challenging and labor-intensive due to their heterogeneous appearances. We propose a novel approach of constructing guiding maps to improve the performance of state-of-the…
▽ More
Blood vessels (BVs) play a critical role in the Tumor Micro-Environment (TME), potentially influencing cancer progression and treatment response. However, manually quantifying BVs in Hematoxylin and Eosin (H&E) stained images is challenging and labor-intensive due to their heterogeneous appearances. We propose a novel approach of constructing guiding maps to improve the performance of state-of-the-art segmentation models for BV segmentation, the guiding maps encourage the models to learn representative features of BVs. This is particularly beneficial for computational pathology, where labeled training data is often limited and large models are prone to overfitting. We have quantitative and qualitative results to demonstrate the efficacy of our approach in improving segmentation accuracy. In future, we plan to validate this method to segment BVs across various tissue types and investigate the role of cellular structures in relation to BVs in the TME.
△ Less
Submitted 24 January, 2025; v1 submitted 21 January, 2025;
originally announced January 2025.
-
A Lightweight and Interpretable Deepfakes Detection Framework
Authors:
Muhammad Umar Farooq,
Ali Javed,
Khalid Mahmood Malik,
Muhammad Anas Raza
Abstract:
The recent realistic creation and dissemination of so-called deepfakes poses a serious threat to social life, civil rest, and law. Celebrity defaming, election manipulation, and deepfakes as evidence in court of law are few potential consequences of deepfakes. The availability of open source trained models based on modern frameworks such as PyTorch or TensorFlow, video manipulations Apps such as F…
▽ More
The recent realistic creation and dissemination of so-called deepfakes poses a serious threat to social life, civil rest, and law. Celebrity defaming, election manipulation, and deepfakes as evidence in court of law are few potential consequences of deepfakes. The availability of open source trained models based on modern frameworks such as PyTorch or TensorFlow, video manipulations Apps such as FaceApp and REFACE, and economical computing infrastructure has easen the creation of deepfakes. Most of the existing detectors focus on detecting either face-swap, lip-sync, or puppet master deepfakes, but a unified framework to detect all three types of deepfakes is hardly explored. This paper presents a unified framework that exploits the power of proposed feature fusion of hybrid facial landmarks and our novel heart rate features for detection of all types of deepfakes. We propose novel heart rate features and fused them with the facial landmark features to better extract the facial artifacts of fake videos and natural variations available in the original videos. We used these features to train a light-weight XGBoost to classify between the deepfake and bonafide videos. We evaluated the performance of our framework on the world leaders dataset (WLDR) that contains all types of deepfakes. Experimental results illustrate that the proposed framework offers superior detection performance over the comparative deepfakes detection methods. Performance comparison of our framework against the LSTM-FCN, a candidate of deep learning model, shows that proposed model achieves similar results, however, it is more interpretable.
△ Less
Submitted 21 January, 2025;
originally announced January 2025.
-
CellOMaps: A Compact Representation for Robust Classification of Lung Adenocarcinoma Growth Patterns
Authors:
Arwa Al-Rubaian,
Gozde N. Gunesli,
Wajd A. Althakfi,
Ayesha Azam,
David Snead,
Nasir M. Rajpoot,
Shan E Ahmed Raza
Abstract:
Lung adenocarcinoma (LUAD) is a morphologically heterogeneous disease, characterized by five primary histological growth patterns. The classification of such patterns is crucial due to their direct relation to prognosis but the high subjectivity and observer variability pose a major challenge. Although several studies have developed machine learning methods for growth pattern classification, they…
▽ More
Lung adenocarcinoma (LUAD) is a morphologically heterogeneous disease, characterized by five primary histological growth patterns. The classification of such patterns is crucial due to their direct relation to prognosis but the high subjectivity and observer variability pose a major challenge. Although several studies have developed machine learning methods for growth pattern classification, they either only report the predominant pattern per slide or lack proper evaluation. We propose a generalizable machine learning pipeline capable of classifying lung tissue into one of the five patterns or as non-tumor. The proposed pipeline's strength lies in a novel compact Cell Organization Maps (cellOMaps) representation that captures the cellular spatial patterns from Hematoxylin and Eosin whole slide images (WSIs). The proposed pipeline provides state-of-the-art performance on LUAD growth pattern classification when evaluated on both internal unseen slides and external datasets, significantly outperforming the current approaches. In addition, our preliminary results show that the model's outputs can be used to predict patients Tumor Mutational Burden (TMB) levels.
△ Less
Submitted 14 January, 2025;
originally announced January 2025.
-
Quantifying Haptic Affection of Car Door through Data-Driven Analysis of Force Profile
Authors:
Mudassir Ibrahim Awan,
Ahsan Raza,
Waseem Hassan,
Ki-Uk Kyung,
Seokhee Jeon
Abstract:
Haptic affection plays a crucial role in user experience, particularly in the automotive industry where the tactile quality of components can influence customer satisfaction. This study aims to accurately predict the affective property of a car door by only watching the force or torque profile of it when opening. To this end, a deep learning model is designed to capture the underlying relationship…
▽ More
Haptic affection plays a crucial role in user experience, particularly in the automotive industry where the tactile quality of components can influence customer satisfaction. This study aims to accurately predict the affective property of a car door by only watching the force or torque profile of it when opening. To this end, a deep learning model is designed to capture the underlying relationships between force profiles and user-defined adjective ratings, providing insights into the door-opening experience. The dataset employed in this research includes force profiles and user adjective ratings collected from six distinct car models, reflecting a diverse set of door-opening characteristics and tactile feedback. The model's performance is assessed using Leave-One-Out Cross-Validation, a method that measures its generalization capability on unseen data. The results demonstrate that the proposed model achieves a high level of prediction accuracy, indicating its potential in various applications related to haptic affection and design optimization in the automotive industry.
△ Less
Submitted 22 May, 2025; v1 submitted 18 November, 2024;
originally announced November 2024.
-
Pneumatically Controlled Tactile Actuating Modules for Enhanced VR Safety Training
Authors:
Ahsan Raza,
Seokhee Jeon,
Mohammad Shadman Hashem
Abstract:
Our system introduces a modularized pneumatic actuating unit capable of delivering vibration, pressure, and impact feedback. Designed for adaptability, these modular tactile actuating units can be rapidly customized and reconfigured to suit a wide range of virtual reality (VR) scenarios, with a particular emphasis on safety training applications. This flexibility is demonstrated through scenarios…
▽ More
Our system introduces a modularized pneumatic actuating unit capable of delivering vibration, pressure, and impact feedback. Designed for adaptability, these modular tactile actuating units can be rapidly customized and reconfigured to suit a wide range of virtual reality (VR) scenarios, with a particular emphasis on safety training applications. This flexibility is demonstrated through scenarios such as using construction tools in a virtual environment and simulating safety protocols against falling objects. Innovative mounting solutions securely attach the actuators to various body sites, ensuring both comfort and stability during use. Our approach enables seamless integration into diverse VR safety training programs, enhancing the realism and effectiveness of simulations with precise and reliable haptic feedback.
△ Less
Submitted 7 November, 2024;
originally announced November 2024.
-
Silicone-made Tactile Actuator Integrated with Hot Thermo-fiber Finger Sleeve
Authors:
Mohammad Shadman Hashem,
Ahsan Raza,
Seokhee Jeon
Abstract:
Multi-mode haptic feedback is essential to achieve high realism and immersion in virtual environments. This paper proposed a novel silicone fingertip actuator integrated with a hot thermal fabric finger sleeve to render pressure, vibration, and hot thermal feedback simultaneously. The actuator is pneumatically actuated to render a realistic and effective tactile experience in accordance with hot t…
▽ More
Multi-mode haptic feedback is essential to achieve high realism and immersion in virtual environments. This paper proposed a novel silicone fingertip actuator integrated with a hot thermal fabric finger sleeve to render pressure, vibration, and hot thermal feedback simultaneously. The actuator is pneumatically actuated to render a realistic and effective tactile experience in accordance with hot thermal sensation. The silicone actuator, with two air chambers controlled by pneumatic valves connected to compressed air tanks. Simultaneously, a PWM signal from a microcontroller regulates the temperature of the thermal fabric sleeve, enhancing overall system functionality. The lower chamber of the silicone actuator is responsible for pressure feedback, whereas the upper chamber is devoted to vibrotactile feedback. The conductive yarn or thread was utilized to spread the thermal feedback actuation points on the thermal fabric's surface. To demonstrate the actuator's capability, a VR environment consisting of a bowl of liquid and a stove with fire was designed. Based on different functionalities the scenario can simulate the tactile perception of pressure, vibration, and temperature simultaneously or consecutively.
△ Less
Submitted 7 November, 2024;
originally announced November 2024.
-
A Graph-Based Model for Vehicle-Centric Data Sharing Ecosystem
Authors:
Haiyue Yuan,
Ali Raza,
Nikolay Matyunin,
Jibesh Patra,
Shujun Li
Abstract:
The development of technologies has prompted a paradigm shift in the automotive industry, with an increasing focus on connected services and autonomous driving capabilities. This transformation allows vehicles to collect and share vast amounts of vehicle-specific and personal data. While these technological advancements offer enhanced user experiences, they also raise privacy concerns. To understa…
▽ More
The development of technologies has prompted a paradigm shift in the automotive industry, with an increasing focus on connected services and autonomous driving capabilities. This transformation allows vehicles to collect and share vast amounts of vehicle-specific and personal data. While these technological advancements offer enhanced user experiences, they also raise privacy concerns. To understand the ecosystem of data collection and sharing in modern vehicles, we adopted the ontology 101 methodology to incorporate information extracted from different sources, including analysis of privacy policies using GPT-4, a small-scale systematic literature review, and an existing ontology, to develop a high-level conceptual graph-based model, aiming to get insights into how modern vehicles handle data exchange among different parties. This serves as a foundational model with the flexibility and scalability to further expand for modelling and analysing data sharing practices across diverse contexts. Two realistic examples were developed to demonstrate the usefulness and effectiveness of discovering insights into privacy regarding vehicle-related data sharing. We also recommend several future research directions, such as exploring advanced ontology languages for reasoning tasks, supporting topological analysis for discovering data privacy risks/concerns, and developing useful tools for comparative analysis, to strengthen the understanding of the vehicle-centric data sharing ecosystem.
△ Less
Submitted 21 April, 2025; v1 submitted 30 October, 2024;
originally announced October 2024.
-
With a Grain of SALT: Are LLMs Fair Across Social Dimensions?
Authors:
Samee Arif,
Zohaib Khan,
Maaidah Kaleem,
Suhaib Rashid,
Agha Ali Raza,
Awais Athar
Abstract:
This paper presents a systematic analysis of biases in open-source Large Language Models (LLMs), across gender, religion, and race. Our study evaluates bias in smaller-scale Llama and Gemma models using the SALT ($\textbf{S}$ocial $\textbf{A}$ppropriateness in $\textbf{L}$LM-Generated $\textbf{T}$ext) dataset, which incorporates five distinct bias triggers: General Debate, Positioned Debate, Caree…
▽ More
This paper presents a systematic analysis of biases in open-source Large Language Models (LLMs), across gender, religion, and race. Our study evaluates bias in smaller-scale Llama and Gemma models using the SALT ($\textbf{S}$ocial $\textbf{A}$ppropriateness in $\textbf{L}$LM-Generated $\textbf{T}$ext) dataset, which incorporates five distinct bias triggers: General Debate, Positioned Debate, Career Advice, Problem Solving, and CV Generation. To quantify bias, we measure win rates in General Debate and the assignment of negative roles in Positioned Debate. For real-world use cases, such as Career Advice, Problem Solving, and CV Generation, we anonymize the outputs to remove explicit demographic identifiers and use DeepSeek-R1 as an automated evaluator. We also address inherent biases in LLM-based evaluation, including evaluation bias, positional bias, and length bias, and validate our results through human evaluations. Our findings reveal consistent polarization across models, with certain demographic groups receiving systematically favorable or unfavorable treatment. By introducing SALT, we provide a comprehensive benchmark for bias analysis and underscore the need for robust bias mitigation strategies in the development of equitable AI systems.
△ Less
Submitted 18 February, 2025; v1 submitted 16 October, 2024;
originally announced October 2024.
-
Language Model-Driven Data Pruning Enables Efficient Active Learning
Authors:
Abdul Hameed Azeemi,
Ihsan Ayyub Qazi,
Agha Ali Raza
Abstract:
Active learning (AL) optimizes data labeling efficiency by selecting the most informative instances for annotation. A key component in this procedure is an acquisition function that guides the selection process and identifies the suitable instances for labeling from the unlabeled pool. However, these acquisition methods suffer from high computational costs with large unlabeled data pools, posing a…
▽ More
Active learning (AL) optimizes data labeling efficiency by selecting the most informative instances for annotation. A key component in this procedure is an acquisition function that guides the selection process and identifies the suitable instances for labeling from the unlabeled pool. However, these acquisition methods suffer from high computational costs with large unlabeled data pools, posing a roadblock to their applicability on large datasets. To address this challenge and bridge this gap, we introduce a novel plug-and-play unlabeled data pruning strategy, ActivePrune, which leverages language models to prune the unlabeled pool. ActivePrune implements a two-stage pruning process: an initial fast evaluation using perplexity scores from an n-gram language model, followed by a high-quality selection using metrics for data quality computed through a quantized LLM. Additionally, to enhance the diversity in the unlabeled pool, we propose a novel perplexity reweighting method that systematically brings forward underrepresented instances for selection in subsequent labeling iterations. Experiments on translation, sentiment analysis, topic classification, and summarization tasks on four diverse datasets and four active learning strategies demonstrate that ActivePrune outperforms existing data pruning methods. Finally, we compare the selection quality $\leftrightarrow$ efficiency tradeoff of the data pruning methods and demonstrate that ActivePrune is computationally more efficient than other LLM score-based pruning methods, and provides up to 74% reduction in the end-to-end time required for active learning.
△ Less
Submitted 5 October, 2024;
originally announced October 2024.
-
The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives
Authors:
Samee Arif,
Taimoor Arif,
Muhammad Saad Haroon,
Aamina Jamal Khan,
Agha Ali Raza,
Awais Athar
Abstract:
This paper introduces the concept of an education tool that utilizes Generative Artificial Intelligence (GenAI) to enhance storytelling for children. The system combines GenAI-driven narrative co-creation, text-to-speech conversion, and text-to-video generation to produce an engaging experience for learners. We describe the co-creation process, the adaptation of narratives into spoken words using…
▽ More
This paper introduces the concept of an education tool that utilizes Generative Artificial Intelligence (GenAI) to enhance storytelling for children. The system combines GenAI-driven narrative co-creation, text-to-speech conversion, and text-to-video generation to produce an engaging experience for learners. We describe the co-creation process, the adaptation of narratives into spoken words using text-to-speech models, and the transformation of these narratives into contextually relevant visuals through text-to-video technology. Our evaluation covers the linguistics of the generated stories, the text-to-speech conversion quality, and the accuracy of the generated visuals.
△ Less
Submitted 18 February, 2025; v1 submitted 17 September, 2024;
originally announced September 2024.
-
WER We Stand: Benchmarking Urdu ASR Models
Authors:
Samee Arif,
Sualeha Farid,
Aamina Jamal Khan,
Mustafa Abbas,
Agha Ali Raza,
Awais Athar
Abstract:
This paper presents a comprehensive evaluation of Urdu Automatic Speech Recognition (ASR) models. We analyze the performance of three ASR model families: Whisper, MMS, and Seamless-M4T using Word Error Rate (WER), along with a detailed examination of the most frequent wrong words and error types including insertions, deletions, and substitutions. Our analysis is conducted using two types of datase…
▽ More
This paper presents a comprehensive evaluation of Urdu Automatic Speech Recognition (ASR) models. We analyze the performance of three ASR model families: Whisper, MMS, and Seamless-M4T using Word Error Rate (WER), along with a detailed examination of the most frequent wrong words and error types including insertions, deletions, and substitutions. Our analysis is conducted using two types of datasets, read speech and conversational speech. Notably, we present the first conversational speech dataset designed for benchmarking Urdu ASR models. We find that seamless-large outperforms other ASR models on the read speech dataset, while whisper-large performs best on the conversational speech dataset. Furthermore, this evaluation highlights the complexities of assessing ASR models for low-resource languages like Urdu using quantitative metrics alone and emphasizes the need for a robust Urdu text normalization system. Our findings contribute valuable insights for developing robust ASR systems for low-resource languages like Urdu.
△ Less
Submitted 5 June, 2025; v1 submitted 17 September, 2024;
originally announced September 2024.
-
The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation
Authors:
Samee Arif,
Sualeha Farid,
Abdul Hameed Azeemi,
Awais Athar,
Agha Ali Raza
Abstract:
This paper presents a novel methodology for generating synthetic Preference Optimization (PO) datasets using multi-model workflows. We evaluate the effectiveness and potential of these workflows in automating and enhancing the dataset generation process. PO dataset generation requires two modules: (1) $\textit{response evaluation}$, and (2) $\textit{response generation}$. In the…
▽ More
This paper presents a novel methodology for generating synthetic Preference Optimization (PO) datasets using multi-model workflows. We evaluate the effectiveness and potential of these workflows in automating and enhancing the dataset generation process. PO dataset generation requires two modules: (1) $\textit{response evaluation}$, and (2) $\textit{response generation}$. In the $\textit{response evaluation}$ module, the responses from Large Language Models (LLMs) are evaluated and ranked - a task typically carried out by human annotators that we automate using LLMs. We assess the response evaluation module in a 2 step process. In step 1, we assess LLMs as evaluators using three distinct prompting strategies. In step 2, we apply the winning prompting strategy to compare the performance of LLM-as-a-Judge, LLMs-as-a-Jury, and LLM Debate. Our evaluation shows that GPT-4o-as-a-Judge is more consistent across all datasets. For the $\textit{response generation}$ module, we use the identified LLM evaluator configuration and compare different configurations of the LLM Feedback Loop. We use the win rate to determine the best multi-model configuration for generation. Experimenting with various configurations, we find that the LLM Feedback Loop, with Llama as the generator and Gemma as the reviewer, achieves a notable 71.8% and 73.8% win rate over single-model Llama and Gemma, respectively. After identifying the best configurations for both modules, we generate our PO datasets using the above pipeline.
△ Less
Submitted 11 June, 2025; v1 submitted 16 August, 2024;
originally announced August 2024.
-
Beyond Uniform Query Distribution: Key-Driven Grouped Query Attention
Authors:
Zohaib Khan,
Muhammad Khaquan,
Omer Tafveez,
Burhanuddin Samiwala,
Agha Ali Raza
Abstract:
The Transformer architecture has revolutionized deep learning through its Self-Attention mechanism, which effectively captures contextual information. However, the memory footprint of Self-Attention presents significant challenges for long-sequence tasks. Grouped Query Attention (GQA) addresses this issue by grouping queries and mean-pooling the corresponding key-value heads - reducing the number…
▽ More
The Transformer architecture has revolutionized deep learning through its Self-Attention mechanism, which effectively captures contextual information. However, the memory footprint of Self-Attention presents significant challenges for long-sequence tasks. Grouped Query Attention (GQA) addresses this issue by grouping queries and mean-pooling the corresponding key-value heads - reducing the number of overall parameters and memory requirements in a flexible manner without adversely compromising model accuracy. In this work, we introduce enhancements to GQA, focusing on two novel approaches that deviate from the static nature of grouping: Key-Distributed GQA (KDGQA) and Dynamic Key-Distributed GQA (DGQA), which leverage information from the norms of the key heads to inform query allocation. Specifically, KDGQA looks at the ratios of the norms of the key heads during each forward pass, while DGQA examines the ratios of the norms as they evolve through training. Additionally, we present Perturbed GQA (PGQA) as a case-study, which introduces variability in (static) group formation via subtracting noise from the attention maps. Our experiments with up-trained Vision Transformers, for Image Classification on datasets such as CIFAR-10, CIFAR-100, Food101, and Tiny ImageNet, demonstrate the promise of these variants in improving upon the original GQA through more informed and adaptive grouping mechanisms: specifically ViT-L experiences accuracy gains of up to 8% when utilizing DGQA in comparison to GQA and other variants. We further analyze the impact of the number of Key-Value Heads on performance, underscoring the importance of utilizing query-key affinities. Code is available on GitHub.
△ Less
Submitted 28 August, 2024; v1 submitted 15 August, 2024;
originally announced August 2024.
-
Generalists vs. Specialists: Evaluating Large Language Models for Urdu
Authors:
Samee Arif,
Abdul Hameed Azeemi,
Agha Ali Raza,
Awais Athar
Abstract:
In this paper, we compare general-purpose models, GPT-4-Turbo and Llama-3-8b, with special-purpose models--XLM-Roberta-large, mT5-large, and Llama-3-8b--that have been fine-tuned on specific tasks. We focus on seven classification and seven generation tasks to evaluate the performance of these models on Urdu language. Urdu has 70 million native speakers, yet it remains underrepresented in Natural…
▽ More
In this paper, we compare general-purpose models, GPT-4-Turbo and Llama-3-8b, with special-purpose models--XLM-Roberta-large, mT5-large, and Llama-3-8b--that have been fine-tuned on specific tasks. We focus on seven classification and seven generation tasks to evaluate the performance of these models on Urdu language. Urdu has 70 million native speakers, yet it remains underrepresented in Natural Language Processing (NLP). Despite the frequent advancements in Large Language Models (LLMs), their performance in low-resource languages, including Urdu, still needs to be explored. We also conduct a human evaluation for the generation tasks and compare the results with the evaluations performed by GPT-4-Turbo, Llama-3-8b and Claude 3.5 Sonnet. We find that special-purpose models consistently outperform general-purpose models across various tasks. We also find that the evaluation done by GPT-4-Turbo for generation tasks aligns more closely with human evaluation compared to the evaluation the evaluation done by Llama-3-8b. This paper contributes to the NLP community by providing insights into the effectiveness of general and specific-purpose LLMs for low-resource languages.
△ Less
Submitted 3 October, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
A Novel Labeled Human Voice Signal Dataset for Misbehavior Detection
Authors:
Ali Raza,
Faizan Younas
Abstract:
Voice signal classification based on human behaviours involves analyzing various aspects of speech patterns and delivery styles. In this study, a real-time dataset collection is performed where participants are instructed to speak twelve psychology questions in two distinct manners: first, in a harsh voice, which is categorized as "misbehaved"; and second, in a polite manner, categorized as "norma…
▽ More
Voice signal classification based on human behaviours involves analyzing various aspects of speech patterns and delivery styles. In this study, a real-time dataset collection is performed where participants are instructed to speak twelve psychology questions in two distinct manners: first, in a harsh voice, which is categorized as "misbehaved"; and second, in a polite manner, categorized as "normal". These classifications are crucial in understanding how different vocal behaviours affect the interpretation and classification of voice signals. This research highlights the significance of voice tone and delivery in automated machine-learning systems for voice analysis and recognition. This research contributes to the broader field of voice signal analysis by elucidating the impact of human behaviour on the perception and categorization of voice signals, thereby enhancing the development of more accurate and context-aware voice recognition technologies.
△ Less
Submitted 28 June, 2024;
originally announced July 2024.
-
Unobtrusive Monitoring of Physical Weakness: A Simulated Approach
Authors:
Chen Long-fei,
Muhammad Ahmed Raza,
Craig Innes,
Subramanian Ramamoorthy,
Robert B. Fisher
Abstract:
Aging and chronic conditions affect older adults' daily lives, making early detection of developing health issues crucial. Weakness, common in many conditions, alters physical movements and daily activities subtly. However, detecting such changes can be challenging due to their subtle and gradual nature. To address this, we employ a non-intrusive camera sensor to monitor individuals' daily sitting…
▽ More
Aging and chronic conditions affect older adults' daily lives, making early detection of developing health issues crucial. Weakness, common in many conditions, alters physical movements and daily activities subtly. However, detecting such changes can be challenging due to their subtle and gradual nature. To address this, we employ a non-intrusive camera sensor to monitor individuals' daily sitting and relaxing activities for signs of weakness. We simulate weakness in healthy subjects by having them perform physical exercise and observing the behavioral changes in their daily activities before and after workouts. The proposed system captures fine-grained features related to body motion, inactivity, and environmental context in real-time while prioritizing privacy. A Bayesian Network is used to model the relationships between features, activities, and health conditions. We aim to identify specific features and activities that indicate such changes and determine the most suitable time scale for observing the change. Results show 0.97 accuracy in distinguishing simulated weakness at the daily level. Fine-grained behavioral features, including non-dominant upper body motion speed and scale, and inactivity distribution, along with a 300-second window, are found most effective. However, individual-specific models are recommended as no universal set of optimal features and activities was identified across all participants.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
Online learning of quantum processes
Authors:
Asad Raza,
Matthias C. Caro,
Jens Eisert,
Sumeet Khatri
Abstract:
Among recent insights into learning quantum states, online learning and shadow tomography procedures are notable for their ability to accurately predict expectation values even of adaptively chosen observables. In contrast to the state case, quantum process learning tasks with a similarly adaptive nature have received little attention. In this work, we investigate online learning tasks for quantum…
▽ More
Among recent insights into learning quantum states, online learning and shadow tomography procedures are notable for their ability to accurately predict expectation values even of adaptively chosen observables. In contrast to the state case, quantum process learning tasks with a similarly adaptive nature have received little attention. In this work, we investigate online learning tasks for quantum processes. Whereas online learning is infeasible for general quantum channels, we show that channels of bounded gate complexity as well as Pauli channels can be online learned in the regret and mistake-bounded models of online learning. In fact, we can online learn probabilistic mixtures of any exponentially large set of known channels. We also provide a provably sample-efficient shadow tomography procedure for Pauli channels. Our results extend beyond quantum channels to non-Markovian multi-time processes, with favorable regret and mistake bounds, as well as a shadow tomography procedure. We complement our online learning upper bounds with mistake as well as computational lower bounds. On the technical side, we make use of the multiplicative weights update algorithm, classical adaptive data analysis, and Bell sampling, as well as tools from the theory of quantum combs for multi-time quantum processes. Our work initiates a study of online learning for classes of quantum channels and, more generally, non-Markovian quantum processes. Given the importance of online learning for state shadow tomography, this may serve as a step towards quantum channel variants of adaptive shadow tomography.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
An Attention Based Pipeline for Identifying Pre-Cancer Lesions in Head and Neck Clinical Images
Authors:
Abdullah Alsalemi,
Anza Shakeel,
Mollie Clark,
Syed Ali Khurram,
Shan E Ahmed Raza
Abstract:
Early detection of cancer can help improve patient prognosis by early intervention. Head and neck cancer is diagnosed in specialist centres after a surgical biopsy, however, there is a potential for these to be missed leading to delayed diagnosis. To overcome these challenges, we present an attention based pipeline that identifies suspected lesions, segments, and classifies them as non-dysplastic,…
▽ More
Early detection of cancer can help improve patient prognosis by early intervention. Head and neck cancer is diagnosed in specialist centres after a surgical biopsy, however, there is a potential for these to be missed leading to delayed diagnosis. To overcome these challenges, we present an attention based pipeline that identifies suspected lesions, segments, and classifies them as non-dysplastic, dysplastic and cancerous lesions. We propose (a) a vision transformer based Mask R-CNN network for lesion detection and segmentation of clinical images, and (b) Multiple Instance Learning (MIL) based scheme for classification. Current results show that the segmentation model produces segmentation masks and bounding boxes with up to 82% overlap accuracy score on unseen external test data and surpassing reviewed segmentation benchmarks. Next, a classification F1-score of 85% on the internal cohort test set. An app has been developed to perform lesion segmentation taken via a smart device. Future work involves employing endoscopic video data for precise early detection and prognosis.
△ Less
Submitted 7 May, 2024; v1 submitted 3 May, 2024;
originally announced May 2024.
-
UQA: Corpus for Urdu Question Answering
Authors:
Samee Arif,
Sualeha Farid,
Awais Athar,
Agha Ali Raza
Abstract:
This paper introduces UQA, a novel dataset for question answering and text comprehension in Urdu, a low-resource language with over 70 million native speakers. UQA is generated by translating the Stanford Question Answering Dataset (SQuAD2.0), a large-scale English QA dataset, using a technique called EATS (Enclose to Anchor, Translate, Seek), which preserves the answer spans in the translated con…
▽ More
This paper introduces UQA, a novel dataset for question answering and text comprehension in Urdu, a low-resource language with over 70 million native speakers. UQA is generated by translating the Stanford Question Answering Dataset (SQuAD2.0), a large-scale English QA dataset, using a technique called EATS (Enclose to Anchor, Translate, Seek), which preserves the answer spans in the translated context paragraphs. The paper describes the process of selecting and evaluating the best translation model among two candidates: Google Translator and Seamless M4T. The paper also benchmarks several state-of-the-art multilingual QA models on UQA, including mBERT, XLM-RoBERTa, and mT5, and reports promising results. For XLM-RoBERTa-XL, we have an F1 score of 85.99 and 74.56 EM. UQA is a valuable resource for developing and testing multilingual NLP systems for Urdu and for enhancing the cross-lingual transferability of existing models. Further, the paper demonstrates the effectiveness of EATS for creating high-quality datasets for other languages and domains. The UQA dataset and the code are publicly available at www.github.com/sameearif/UQA.
△ Less
Submitted 22 July, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
Nuclei-Location Based Point Set Registration of Multi-Stained Whole Slide Images
Authors:
Adith Jeyasangar,
Abdullah Alsalemi,
Shan E Ahmed Raza
Abstract:
Whole Slide Images (WSIs) provide exceptional detail for studying tissue architecture at the cell level. To study tumour microenvironment (TME) with the context of various protein biomarkers and cell sub-types, analysis and registration of features using multi-stained WSIs is often required. Multi-stained WSI pairs normally suffer from rigid and non-rigid deformities in addition to slide artefacts…
▽ More
Whole Slide Images (WSIs) provide exceptional detail for studying tissue architecture at the cell level. To study tumour microenvironment (TME) with the context of various protein biomarkers and cell sub-types, analysis and registration of features using multi-stained WSIs is often required. Multi-stained WSI pairs normally suffer from rigid and non-rigid deformities in addition to slide artefacts and control tissue which present challenges at precise registration. Traditional registration methods mainly focus on global rigid/non-rigid registration but struggle with aligning slides with complex tissue deformations at the nuclei level. However, nuclei level non-rigid registration is essential for downstream tasks such as cell sub-type analysis in the context of protein biomarker signatures. This paper focuses on local level non-rigid registration using a nuclei-location based point set registration approach for aligning multi-stained WSIs. We exploit the spatial distribution of nuclei that is prominent and consistent (to a large level) across different stains to establish a spatial correspondence. We evaluate our approach using the HYRECO dataset consisting of 54 re-stained images of H\&E and PHH3 image pairs. The approach can be extended to other IHC and IF stained WSIs considering a good nuclei detection algorithm is accessible. The performance of the model is tested against established registration algorithms and is shown to outperform the model for nuclei level registration.
△ Less
Submitted 25 April, 2024;
originally announced April 2024.
-
Declarative Concurrent Data Structures
Authors:
Aun Raza,
Hamish Nicholson,
Ioanna Tsakalidou,
Anna Herlihy,
Prathamesh Tagore,
Anastasia Ailamaki
Abstract:
Implementing concurrent data structures is challenging and requires a deep understanding of concurrency concepts and careful design to ensure correctness, performance, and scalability. Further, composing operations on two or more concurrent data structures often requires a synchronization wrapper to ensure the operations are applied together atomically, resulting in serialization and, thereby, giv…
▽ More
Implementing concurrent data structures is challenging and requires a deep understanding of concurrency concepts and careful design to ensure correctness, performance, and scalability. Further, composing operations on two or more concurrent data structures often requires a synchronization wrapper to ensure the operations are applied together atomically, resulting in serialization and, thereby, giving up the performance benefit of the individual data structures. DBMS provides generalized concurrency control (CC) and is a good fit for implementing concurrent data structures. However, DBMSs are over-generalized for this use case, which fails to match the performance of specialized implementations.
This paper makes the case for the Declarative Concurrent Data Structures (DCDS) framework for automatically generating concurrent data structures from a serial specification. In DCDS, users declare the attributes and methods needed for their desired data structure through an embedded DSL at design time. DCDS automatically injects CC at build-time, generating a concurrent intermediate representation (IR) compiled into machine code. A declarative interface for designing data structure enables efficient composability through co-optimizing component structures; optimizations are applied to both the composed serial specification and the generated concurrent IR. We realize the DCDS framework in our prototype system Rosti and experimentally show that data structures declared in Rosti can be efficiently composed by co-optimizing their logical functionality and the generated CC protocol. Our evaluation shows that composing a map and a list to create an LRU container can benefit up to 2X performance scalability in Rosti compared to an open-source library. We demonstrate the applicability of DCDS as an in-process OLTP by comparing it with in-memory DBMS, Proteus, and showing up to 2X performance gains.
△ Less
Submitted 20 April, 2024;
originally announced April 2024.
-
To Label or Not to Label: Hybrid Active Learning for Neural Machine Translation
Authors:
Abdul Hameed Azeemi,
Ihsan Ayyub Qazi,
Agha Ali Raza
Abstract:
Active learning (AL) techniques reduce labeling costs for training neural machine translation (NMT) models by selecting smaller representative subsets from unlabeled data for annotation. Diversity sampling techniques select heterogeneous instances, while uncertainty sampling methods select instances with the highest model uncertainty. Both approaches have limitations - diversity methods may extrac…
▽ More
Active learning (AL) techniques reduce labeling costs for training neural machine translation (NMT) models by selecting smaller representative subsets from unlabeled data for annotation. Diversity sampling techniques select heterogeneous instances, while uncertainty sampling methods select instances with the highest model uncertainty. Both approaches have limitations - diversity methods may extract varied but trivial examples, while uncertainty sampling can yield repetitive, uninformative instances. To bridge this gap, we propose Hybrid Uncertainty and Diversity Sampling (HUDS), an AL strategy for domain adaptation in NMT that combines uncertainty and diversity for sentence selection. HUDS computes uncertainty scores for unlabeled sentences and subsequently stratifies them. It then clusters sentence embeddings within each stratum and computes diversity scores by distance to the centroid. A weighted hybrid score that combines uncertainty and diversity is then used to select the top instances for annotation in each AL iteration. Experiments on multi-domain German-English and French-English datasets demonstrate the better performance of HUDS over other strong AL baselines. We analyze the sentence selection with HUDS and show that it prioritizes diverse instances having high model uncertainty for annotation in early AL iterations.
△ Less
Submitted 18 December, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
An AI based Digital Score of Tumour-Immune Microenvironment Predicts Benefit to Maintenance Immunotherapy in Advanced Oesophagogastric Adenocarcinoma
Authors:
Quoc Dang Vu,
Caroline Fong,
Anderley Gordon,
Tom Lund,
Tatiany L Silveira,
Daniel Rodrigues,
Katharina von Loga,
Shan E Ahmed Raza,
David Cunningham,
Nasir Rajpoot
Abstract:
Gastric and oesophageal (OG) cancers are the leading causes of cancer mortality worldwide. In OG cancers, recent studies have showed that PDL1 immune checkpoint inhibitors (ICI) in combination with chemotherapy improves patient survival. However, our understanding of the tumour immune microenvironment in OG cancers remains limited. In this study, we interrogate multiplex immunofluorescence (mIF) i…
▽ More
Gastric and oesophageal (OG) cancers are the leading causes of cancer mortality worldwide. In OG cancers, recent studies have showed that PDL1 immune checkpoint inhibitors (ICI) in combination with chemotherapy improves patient survival. However, our understanding of the tumour immune microenvironment in OG cancers remains limited. In this study, we interrogate multiplex immunofluorescence (mIF) images taken from patients with advanced Oesophagogastric Adenocarcinoma (OGA) who received first-line fluoropyrimidine and platinum-based chemotherapy in the PLATFORM trial (NCT02678182) to predict the efficacy of the treatment and to explore the biological basis of patients responding to maintenance durvalumab (PDL1 inhibitor). Our proposed Artificial Intelligence (AI) based marker successfully identified responder from non-responder (p < 0.05) as well as those who could potentially benefit from ICI with statistical significance (p < 0.05) for both progression free and overall survival. Our findings suggest that T cells that express FOXP3 seem to heavily influence the patient treatment response and survival outcome. We also observed that higher levels of CD8+PD1+ cells are consistently linked to poor prognosis for both OS and PFS, regardless of ICI.
△ Less
Submitted 29 February, 2024;
originally announced February 2024.
-
Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning
Authors:
Bang Yang,
Yong Dai,
Xuxin Cheng,
Yaowei Li,
Asif Raza,
Yuexian Zou
Abstract:
While vision-language pre-trained models (VL-PTMs) have advanced multimodal research in recent years, their mastery in a few languages like English restricts their applicability in broader communities. To this end, there is an increasing interest in developing multilingual VL models via a joint-learning setup, which, however, could be unrealistic due to expensive costs and data availability. In th…
▽ More
While vision-language pre-trained models (VL-PTMs) have advanced multimodal research in recent years, their mastery in a few languages like English restricts their applicability in broader communities. To this end, there is an increasing interest in developing multilingual VL models via a joint-learning setup, which, however, could be unrealistic due to expensive costs and data availability. In this work, we propose to extend VL-PTMs' language capacity by continual language learning (CLL), where a model needs to update its linguistic knowledge incrementally without suffering from catastrophic forgetting (CF). We begin our study by introducing a model dubbed CLL-CLIP, which builds upon CLIP, a prevailing VL-PTM that has acquired image-English text alignment. Specifically, CLL-CLIP contains an expandable token embedding layer to handle linguistic differences. It solely trains token embeddings to improve memory stability and is optimized under cross-modal and cross-lingual objectives to learn the alignment between images and multilingual texts. To alleviate CF raised by covariate shift and lexical overlap, we further propose a novel approach that ensures the identical distribution of all token embeddings during initialization and regularizes token embedding learning during training. We construct a CLL benchmark covering 36 languages based on MSCOCO and XM3600 datasets and then evaluate multilingual image-text retrieval performance. Extensive experiments verify the effectiveness of CLL-CLIP and show that our approach can boost CLL-CLIP, e.g., by 6.7% in text-to-image average Recall@1 on XM3600, and improve various state-of-the-art methods consistently. Our code and data are available at \url{https://github.com/yangbang18/CLFM}.
△ Less
Submitted 30 January, 2024;
originally announced January 2024.
-
Designing Visual Learning Analytics for Supporting Equity in STEM Classrooms
Authors:
Ali Raza,
William R. Penuel,
Tamara Sumner
Abstract:
Supporting equitable instruction is an important issue for teachers attending diverse STEM classrooms. Visual learning analytics along with effective student survey measures can support providing on time feedback to teachers in making instruction more culturally relevant to all students. We adopted a user-centered approach, where we engaged seven middle school science teachers in iterative testing…
▽ More
Supporting equitable instruction is an important issue for teachers attending diverse STEM classrooms. Visual learning analytics along with effective student survey measures can support providing on time feedback to teachers in making instruction more culturally relevant to all students. We adopted a user-centered approach, where we engaged seven middle school science teachers in iterative testing of thirty data visualizations disaggregated over markers such as gender and race for implementation of selected displays in a visual learning analytics tool- Student Electronic Exit Ticket (SEET). This process helped us gather insights into teachers' sensemaking in identifying patterns of student data related to gender and race, selecting and improving the design of the feedback displays for the SEET [10].
△ Less
Submitted 14 January, 2024;
originally announced January 2024.
-
Cell Maps Representation For Lung Adenocarcinoma Growth Patterns Classification In Whole Slide Images
Authors:
Arwa Al-Rubaian,
Gozde N. Gunesli,
Wajd A. Althakfi,
Ayesha Azam,
Nasir Rajpoot,
Shan E Ahmed Raza
Abstract:
Lung adenocarcinoma is a morphologically heterogeneous disease, characterized by five primary histologic growth patterns. The quantity of these patterns can be related to tumor behavior and has a significant impact on patient prognosis. In this work, we propose a novel machine learning pipeline capable of classifying tissue tiles into one of the five patterns or as non-tumor, with an Area Under th…
▽ More
Lung adenocarcinoma is a morphologically heterogeneous disease, characterized by five primary histologic growth patterns. The quantity of these patterns can be related to tumor behavior and has a significant impact on patient prognosis. In this work, we propose a novel machine learning pipeline capable of classifying tissue tiles into one of the five patterns or as non-tumor, with an Area Under the Receiver Operating Characteristic Curve (AUCROC) score of 0.97. Our model's strength lies in its comprehensive consideration of cellular spatial patterns, where it first generates cell maps from Hematoxylin and Eosin (H&E) whole slide images (WSIs), which are then fed into a convolutional neural network classification model. Exploiting these cell maps provides the model with robust generalizability to new data, achieving approximately 30% higher accuracy on unseen test-sets compared to current state of the art approaches. The insights derived from our model can be used to predict prognosis, enhancing patient outcomes.
△ Less
Submitted 16 May, 2024; v1 submitted 27 November, 2023;
originally announced November 2023.
-
Emotion-Oriented Behavior Model Using Deep Learning
Authors:
Muhammad Arslan Raza,
Muhammad Shoaib Farooq,
Adel Khelifi,
Atif Alvi
Abstract:
Emotions, as a fundamental ingredient of any social interaction, lead to behaviors that represent the effectiveness of the interaction through facial expressions and gestures in humans. Hence an agent must possess the social and cognitive abilities to understand human social parameters and behave accordingly. However, no such emotion-oriented behavior model is presented yet in the existing researc…
▽ More
Emotions, as a fundamental ingredient of any social interaction, lead to behaviors that represent the effectiveness of the interaction through facial expressions and gestures in humans. Hence an agent must possess the social and cognitive abilities to understand human social parameters and behave accordingly. However, no such emotion-oriented behavior model is presented yet in the existing research. The emotion prediction may generate appropriate agents' behaviors for effective interaction using conversation modality. Considering the importance of emotions, and behaviors, for an agent's social interaction, an Emotion-based Behavior model is presented in this paper for Socio-cognitive artificial agents. The proposed model is implemented using tweets data trained on multiple models like Long Short-Term Memory (LSTM), Convolution Neural Network (CNN) and Bidirectional Encoder Representations from Transformers (BERT) for emotion prediction with an average accuracy of 92%, and 55% respectively. Further, using emotion predictions from CNN-LSTM, the behavior module responds using facial expressions and gestures using Behavioral Markup Language (BML). The accuracy of emotion-based behavior predictions is statistically validated using the 2-tailed Pearson correlation on the data collected from human users through questionnaires. Analysis shows that all emotion-based behaviors accurately depict human-like gestures and facial expressions based on the significant correlation at the 0.01 and 0.05 levels. This study is a steppingstone to a multi-faceted artificial agent interaction based on emotion-oriented behaviors. Cognition has significance regarding social interaction among humans.
△ Less
Submitted 28 October, 2023;
originally announced November 2023.
-
An Automated Pipeline for Tumour-Infiltrating Lymphocyte Scoring in Breast Cancer
Authors:
Adam J Shephard,
Mostafa Jahanifar,
Ruoyu Wang,
Muhammad Dawood,
Simon Graham,
Kastytis Sidlauskas,
Syed Ali Khurram,
Nasir M Rajpoot,
Shan E Ahmed Raza
Abstract:
Tumour-infiltrating lymphocytes (TILs) are considered as a valuable prognostic markers in both triple-negative and human epidermal growth factor receptor 2 (HER2) positive breast cancer. In this study, we introduce an innovative deep learning pipeline based on the Efficient-UNet architecture to predict the TILs score for breast cancer whole-slide images (WSIs). We first segment tumour and stromal…
▽ More
Tumour-infiltrating lymphocytes (TILs) are considered as a valuable prognostic markers in both triple-negative and human epidermal growth factor receptor 2 (HER2) positive breast cancer. In this study, we introduce an innovative deep learning pipeline based on the Efficient-UNet architecture to predict the TILs score for breast cancer whole-slide images (WSIs). We first segment tumour and stromal regions in order to compute a tumour bulk mask. We then detect TILs within the tumour-associated stroma, generating a TILs score by closely mirroring the pathologist's workflow. Our method exhibits state-of-the-art performance in segmenting tumour/stroma areas and TILs detection, as demonstrated by internal cross-validation on the TiGER Challenge training dataset and evaluation on the final leaderboards. Additionally, our TILs score proves competitive in predicting survival outcomes within the same challenge, underscoring the clinical relevance and potential of our automated TILs scoring pipeline as a breast cancer prognostic tool.
△ Less
Submitted 21 November, 2023; v1 submitted 10 November, 2023;
originally announced November 2023.
-
Transformer-based Model for Oral Epithelial Dysplasia Segmentation
Authors:
Adam J Shephard,
Hanya Mahmood,
Shan E Ahmed Raza,
Anna Luiza Damaceno Araujo,
Alan Roger Santos-Silva,
Marcio Ajudarte Lopes,
Pablo Agustin Vargas,
Kris McCombe,
Stephanie Craig,
Jacqueline James,
Jill Brooks,
Paul Nankivell,
Hisham Mehanna,
Syed Ali Khurram,
Nasir M Rajpoot
Abstract:
Oral epithelial dysplasia (OED) is a premalignant histopathological diagnosis given to lesions of the oral cavity. OED grading is subject to large inter/intra-rater variability, resulting in the under/over-treatment of patients. We developed a new Transformer-based pipeline to improve detection and segmentation of OED in haematoxylin and eosin (H&E) stained whole slide images (WSIs). Our model was…
▽ More
Oral epithelial dysplasia (OED) is a premalignant histopathological diagnosis given to lesions of the oral cavity. OED grading is subject to large inter/intra-rater variability, resulting in the under/over-treatment of patients. We developed a new Transformer-based pipeline to improve detection and segmentation of OED in haematoxylin and eosin (H&E) stained whole slide images (WSIs). Our model was trained on OED cases (n = 260) and controls (n = 105) collected using three different scanners, and validated on test data from three external centres in the United Kingdom and Brazil (n = 78). Our internal experiments yield a mean F1-score of 0.81 for OED segmentation, which reduced slightly to 0.71 on external testing, showing good generalisability, and gaining state-of-the-art results. This is the first externally validated study to use Transformers for segmentation in precancerous histology images. Our publicly available model shows great promise to be the first step of a fully-integrated pipeline, allowing earlier and more efficient OED diagnosis, ultimately benefiting patient outcomes.
△ Less
Submitted 9 November, 2023;
originally announced November 2023.
-
Domain Generalization in Computational Pathology: Survey and Guidelines
Authors:
Mostafa Jahanifar,
Manahil Raza,
Kesi Xu,
Trinh Vuong,
Rob Jewsbury,
Adam Shephard,
Neda Zamanitajeddin,
Jin Tae Kwak,
Shan E Ahmed Raza,
Fayyaz Minhas,
Nasir Rajpoot
Abstract:
Deep learning models have exhibited exceptional effectiveness in Computational Pathology (CPath) by tackling intricate tasks across an array of histology image analysis applications. Nevertheless, the presence of out-of-distribution data (stemming from a multitude of sources such as disparate imaging devices and diverse tissue preparation methods) can cause \emph{domain shift} (DS). DS decreases t…
▽ More
Deep learning models have exhibited exceptional effectiveness in Computational Pathology (CPath) by tackling intricate tasks across an array of histology image analysis applications. Nevertheless, the presence of out-of-distribution data (stemming from a multitude of sources such as disparate imaging devices and diverse tissue preparation methods) can cause \emph{domain shift} (DS). DS decreases the generalization of trained models to unseen datasets with slightly different data distributions, prompting the need for innovative \emph{domain generalization} (DG) solutions. Recognizing the potential of DG methods to significantly influence diagnostic and prognostic models in cancer studies and clinical practice, we present this survey along with guidelines on achieving DG in CPath. We rigorously define various DS types, systematically review and categorize existing DG approaches and resources in CPath, and provide insights into their advantages, limitations, and applicability. We also conduct thorough benchmarking experiments with 28 cutting-edge DG algorithms to address a complex DG problem. Our findings suggest that careful experiment design and CPath-specific Stain Augmentation technique can be very effective. However, there is no one-size-fits-all solution for DG in CPath. Therefore, we establish clear guidelines for detecting and managing DS depending on different scenarios. While most of the concepts, guidelines, and recommendations are given for applications in CPath, we believe that they are applicable to most medical image analysis tasks as well.
△ Less
Submitted 30 October, 2023;
originally announced October 2023.
-
A Fully Automated and Explainable Algorithm for the Prediction of Malignant Transformation in Oral Epithelial Dysplasia
Authors:
Adam J Shephard,
Raja Muhammad Saad Bashir,
Hanya Mahmood,
Mostafa Jahanifar,
Fayyaz Minhas,
Shan E Ahmed Raza,
Kris D McCombe,
Stephanie G Craig,
Jacqueline James,
Jill Brooks,
Paul Nankivell,
Hisham Mehanna,
Syed Ali Khurram,
Nasir M Rajpoot
Abstract:
Oral epithelial dysplasia (OED) is a premalignant histopathological diagnosis given to lesions of the oral cavity. Its grading suffers from significant inter-/intra- observer variability, and does not reliably predict malignancy progression, potentially leading to suboptimal treatment decisions. To address this, we developed a novel artificial intelligence algorithm that can assign an Oral Maligna…
▽ More
Oral epithelial dysplasia (OED) is a premalignant histopathological diagnosis given to lesions of the oral cavity. Its grading suffers from significant inter-/intra- observer variability, and does not reliably predict malignancy progression, potentially leading to suboptimal treatment decisions. To address this, we developed a novel artificial intelligence algorithm that can assign an Oral Malignant Transformation (OMT) risk score, based on histological patterns in the in Haematoxylin and Eosin stained whole slide images, to quantify the risk of OED progression. The algorithm is based on the detection and segmentation of nuclei within (and around) the epithelium using an in-house segmentation model. We then employed a shallow neural network fed with interpretable morphological/spatial features, emulating histological markers. We conducted internal cross-validation on our development cohort (Sheffield; n = 193 cases) followed by independent validation on two external cohorts (Birmingham and Belfast; n = 92 cases). The proposed OMTscore yields an AUROC = 0.74 in predicting whether an OED progresses to malignancy or not. Survival analyses showed the prognostic value of our OMTscore for predicting malignancy transformation, when compared to the manually-assigned WHO and binary grades. Analysis of the correctly predicted cases elucidated the presence of peri-epithelial and epithelium-infiltrating lymphocytes in the most predictive patches of cases that transformed (p < 0.0001). This is the first study to propose a completely automated algorithm for predicting OED transformation based on interpretable nuclear features, whilst being validated on external datasets. The algorithm shows better-than-human-level performance for prediction of OED malignant transformation and offers a promising solution to the challenges of grading OED in routine clinical practice.
△ Less
Submitted 6 July, 2023;
originally announced July 2023.
-
Customizing General-Purpose Foundation Models for Medical Report Generation
Authors:
Bang Yang,
Asif Raza,
Yuexian Zou,
Tong Zhang
Abstract:
Medical caption prediction which can be regarded as a task of medical report generation (MRG), requires the automatic generation of coherent and accurate captions for the given medical images. However, the scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks capable of harnessing the potential artificial general intell…
▽ More
Medical caption prediction which can be regarded as a task of medical report generation (MRG), requires the automatic generation of coherent and accurate captions for the given medical images. However, the scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks capable of harnessing the potential artificial general intelligence power like large language models (LLMs). In this work, we propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs), in computer vision and natural language processing with a specific focus on medical report generation. Specifically, following BLIP-2, a state-of-the-art vision-language pre-training approach, we introduce our encoder-decoder-based MRG model. This model utilizes a lightweight query Transformer to connect two FMs: the giant vision Transformer EVA-ViT-g and a bilingual LLM trained to align with human intentions (referred to as ChatGLM-6B). Furthermore, we conduct ablative experiments on the trainable components of the model to identify the crucial factors for effective transfer learning. Our findings demonstrate that unfreezing EVA-ViT-g to learn medical image representations, followed by parameter-efficient training of ChatGLM-6B to capture the writing styles of medical reports, is essential for achieving optimal results. Our best attempt (PCLmed Team) achieved the 4th and the 2nd, respectively, out of 13 participating teams, based on the BERTScore and ROUGE-1 metrics, in the ImageCLEFmedical Caption 2023 Caption Prediction Task competition.
△ Less
Submitted 8 June, 2023;
originally announced June 2023.
-
Abstractive Summary Generation for the Urdu Language
Authors:
Ali Raza,
Hadia Sultan Raja,
Usman Maratib
Abstract:
Abstractive summary generation is a challenging task that requires the model to comprehend the source text and generate a concise and coherent summary that captures the essential information. In this paper, we explore the use of an encoder/decoder approach for abstractive summary generation in the Urdu language. We employ a transformer-based model that utilizes self-attention mechanisms to encode…
▽ More
Abstractive summary generation is a challenging task that requires the model to comprehend the source text and generate a concise and coherent summary that captures the essential information. In this paper, we explore the use of an encoder/decoder approach for abstractive summary generation in the Urdu language. We employ a transformer-based model that utilizes self-attention mechanisms to encode the input text and generate a summary. Our experiments show that our model can produce summaries that are grammatically correct and semantically meaningful. We evaluate our model on a publicly available dataset and achieve state-of-the-art results in terms of Rouge scores. We also conduct a qualitative analysis of our model's output to assess its effectiveness and limitations. Our findings suggest that the encoder/decoder approach is a promising method for abstractive summary generation in Urdu and can be extended to other languages with suitable modifications.
△ Less
Submitted 25 May, 2023;
originally announced May 2023.
-
CoNIC Challenge: Pushing the Frontiers of Nuclear Detection, Segmentation, Classification and Counting
Authors:
Simon Graham,
Quoc Dang Vu,
Mostafa Jahanifar,
Martin Weigert,
Uwe Schmidt,
Wenhua Zhang,
Jun Zhang,
Sen Yang,
Jinxi Xiang,
Xiyue Wang,
Josef Lorenz Rumberger,
Elias Baumann,
Peter Hirsch,
Lihao Liu,
Chenyang Hong,
Angelica I. Aviles-Rivero,
Ayushi Jain,
Heeyoung Ahn,
Yiyu Hong,
Hussam Azzuni,
Min Xu,
Mohammad Yaqub,
Marie-Claire Blache,
Benoît Piégu,
Bertrand Vernay
, et al. (64 additional authors not shown)
Abstract:
Nuclear detection, segmentation and morphometric profiling are essential in helping us further understand the relationship between histology and patient outcome. To drive innovation in this area, we setup a community-wide challenge using the largest available dataset of its kind to assess nuclear segmentation and cellular composition. Our challenge, named CoNIC, stimulated the development of repro…
▽ More
Nuclear detection, segmentation and morphometric profiling are essential in helping us further understand the relationship between histology and patient outcome. To drive innovation in this area, we setup a community-wide challenge using the largest available dataset of its kind to assess nuclear segmentation and cellular composition. Our challenge, named CoNIC, stimulated the development of reproducible algorithms for cellular recognition with real-time result inspection on public leaderboards. We conducted an extensive post-challenge analysis based on the top-performing models using 1,658 whole-slide images of colon tissue. With around 700 million detected nuclei per model, associated features were used for dysplasia grading and survival analysis, where we demonstrated that the challenge's improvement over the previous state-of-the-art led to significant boosts in downstream performance. Our findings also suggest that eosinophils and neutrophils play an important role in the tumour microevironment. We release challenge models and WSI-level results to foster the development of further methods for biomarker discovery.
△ Less
Submitted 14 March, 2023; v1 submitted 10 March, 2023;
originally announced March 2023.
-
Consistency Regularisation in Varying Contexts and Feature Perturbations for Semi-Supervised Semantic Segmentation of Histology Images
Authors:
Raja Muhammad Saad Bashir,
Talha Qaiser,
Shan E Ahmed Raza,
Nasir M. Rajpoot
Abstract:
Semantic segmentation of various tissue and nuclei types in histology images is fundamental to many downstream tasks in the area of computational pathology (CPath). In recent years, Deep Learning (DL) methods have been shown to perform well on segmentation tasks but DL methods generally require a large amount of pixel-wise annotated data. Pixel-wise annotation sometimes requires expert's knowledge…
▽ More
Semantic segmentation of various tissue and nuclei types in histology images is fundamental to many downstream tasks in the area of computational pathology (CPath). In recent years, Deep Learning (DL) methods have been shown to perform well on segmentation tasks but DL methods generally require a large amount of pixel-wise annotated data. Pixel-wise annotation sometimes requires expert's knowledge and time which is laborious and costly to obtain. In this paper, we present a consistency based semi-supervised learning (SSL) approach that can help mitigate this challenge by exploiting a large amount of unlabelled data for model training thus alleviating the need for a large annotated dataset. However, SSL models might also be susceptible to changing context and features perturbations exhibiting poor generalisation due to the limited training data. We propose an SSL method that learns robust features from both labelled and unlabelled images by enforcing consistency against varying contexts and feature perturbations. The proposed method incorporates context-aware consistency by contrasting pairs of overlapping images in a pixel-wise manner from changing contexts resulting in robust and context invariant features. We show that cross-consistency training makes the encoder features invariant to different perturbations and improves the prediction confidence. Finally, entropy minimisation is employed to further boost the confidence of the final prediction maps from unlabelled data. We conduct an extensive set of experiments on two publicly available large datasets (BCSS and MoNuSeg) and show superior performance compared to the state-of-the-art methods.
△ Less
Submitted 11 February, 2023; v1 submitted 30 January, 2023;
originally announced January 2023.
-
LYSTO: The Lymphocyte Assessment Hackathon and Benchmark Dataset
Authors:
Yiping Jiao,
Jeroen van der Laak,
Shadi Albarqouni,
Zhang Li,
Tao Tan,
Abhir Bhalerao,
Jiabo Ma,
Jiamei Sun,
Johnathan Pocock,
Josien P. W. Pluim,
Navid Alemi Koohbanani,
Raja Muhammad Saad Bashir,
Shan E Ahmed Raza,
Sibo Liu,
Simon Graham,
Suzanne Wetstein,
Syed Ali Khurram,
Thomas Watson,
Nasir Rajpoot,
Mitko Veta,
Francesco Ciompi
Abstract:
We introduce LYSTO, the Lymphocyte Assessment Hackathon, which was held in conjunction with the MICCAI 2019 Conference in Shenzen (China). The competition required participants to automatically assess the number of lymphocytes, in particular T-cells, in histopathological images of colon, breast, and prostate cancer stained with CD3 and CD8 immunohistochemistry. Differently from other challenges se…
▽ More
We introduce LYSTO, the Lymphocyte Assessment Hackathon, which was held in conjunction with the MICCAI 2019 Conference in Shenzen (China). The competition required participants to automatically assess the number of lymphocytes, in particular T-cells, in histopathological images of colon, breast, and prostate cancer stained with CD3 and CD8 immunohistochemistry. Differently from other challenges setup in medical image analysis, LYSTO participants were solely given a few hours to address this problem. In this paper, we describe the goal and the multi-phase organization of the hackathon; we describe the proposed methods and the on-site results. Additionally, we present post-competition results where we show how the presented methods perform on an independent set of lung cancer slides, which was not part of the initial competition, as well as a comparison on lymphocyte assessment between presented methods and a panel of pathologists. We show that some of the participants were capable to achieve pathologist-level performance at lymphocyte assessment. After the hackathon, LYSTO was left as a lightweight plug-and-play benchmark dataset on grand-challenge website, together with an automatic evaluation platform. LYSTO has supported a number of research in lymphocyte assessment in oncology. LYSTO will be a long-lasting educational challenge for deep learning and digital pathology, it is available at https://lysto.grand-challenge.org/.
△ Less
Submitted 13 April, 2023; v1 submitted 16 January, 2023;
originally announced January 2023.
-
Nuclear Segmentation and Classification: On Color & Compression Generalization
Authors:
Quoc Dang Vu,
Robert Jewsbury,
Simon Graham,
Mostafa Jahanifar,
Shan E Ahmed Raza,
Fayyaz Minhas,
Abhir Bhalerao,
Nasir Rajpoot
Abstract:
Since the introduction of digital and computational pathology as a field, one of the major problems in the clinical application of algorithms has been the struggle to generalize well to examples outside the distribution of the training data. Existing work to address this in both pathology and natural images has focused almost exclusively on classification tasks. We explore and evaluate the robustn…
▽ More
Since the introduction of digital and computational pathology as a field, one of the major problems in the clinical application of algorithms has been the struggle to generalize well to examples outside the distribution of the training data. Existing work to address this in both pathology and natural images has focused almost exclusively on classification tasks. We explore and evaluate the robustness of the 7 best performing nuclear segmentation and classification models from the largest computational pathology challenge for this problem to date, the CoNIC challenge. We demonstrate that existing state-of-the-art (SoTA) models are robust towards compression artifacts but suffer substantial performance reduction when subjected to shifts in the color domain. We find that using stain normalization to address the domain shift problem can be detrimental to the model performance. On the other hand, neural style transfer is more consistent in improving test performance when presented with large color variations in the wild.
△ Less
Submitted 9 January, 2023;
originally announced January 2023.
-
Proof of Swarm Based Ensemble Learning for Federated Learning Applications
Authors:
Ali Raza,
Kim Phuc Tran,
Ludovic Koehl,
Shujun Li
Abstract:
Ensemble learning combines results from multiple machine learning models in order to provide a better and optimised predictive model with reduced bias, variance and improved predictions. However, in federated learning it is not feasible to apply centralised ensemble learning directly due to privacy concerns. Hence, a mechanism is required to combine results of local models to produce a global mode…
▽ More
Ensemble learning combines results from multiple machine learning models in order to provide a better and optimised predictive model with reduced bias, variance and improved predictions. However, in federated learning it is not feasible to apply centralised ensemble learning directly due to privacy concerns. Hence, a mechanism is required to combine results of local models to produce a global model. Most distributed consensus algorithms, such as Byzantine fault tolerance (BFT), do not normally perform well in such applications. This is because, in such methods predictions of some of the peers are disregarded, so a majority of peers can win without even considering other peers' decisions. Additionally, the confidence score of the result of each peer is not normally taken into account, although it is an important feature to consider for ensemble learning. Moreover, the problem of a tie event is often left un-addressed by methods such as BFT. To fill these research gaps, we propose PoSw (Proof of Swarm), a novel distributed consensus algorithm for ensemble learning in a federated setting, which was inspired by particle swarm based algorithms for solving optimisation problems. The proposed algorithm is theoretically proved to always converge in a relatively small number of steps and has mechanisms to resolve tie events while trying to achieve sub-optimum solutions. We experimentally validated the performance of the proposed algorithm using ECG classification as an example application in healthcare, showing that the ensemble learning model outperformed all local models and even the FL-based global model. To the best of our knowledge, the proposed algorithm is the first attempt to make consensus over the output results of distributed models trained using federated learning.
△ Less
Submitted 2 January, 2023; v1 submitted 28 December, 2022;
originally announced December 2022.
-
Mitosis Detection, Fast and Slow: Robust and Efficient Detection of Mitotic Figures
Authors:
Mostafa Jahanifar,
Adam Shephard,
Neda Zamanitajeddin,
Simon Graham,
Shan E Ahmed Raza,
Fayyaz Minhas,
Nasir Rajpoot
Abstract:
Counting of mitotic figures is a fundamental step in grading and prognostication of several cancers. However, manual mitosis counting is tedious and time-consuming. In addition, variation in the appearance of mitotic figures causes a high degree of discordance among pathologists. With advances in deep learning models, several automatic mitosis detection algorithms have been proposed but they are s…
▽ More
Counting of mitotic figures is a fundamental step in grading and prognostication of several cancers. However, manual mitosis counting is tedious and time-consuming. In addition, variation in the appearance of mitotic figures causes a high degree of discordance among pathologists. With advances in deep learning models, several automatic mitosis detection algorithms have been proposed but they are sensitive to {\em domain shift} often seen in histology images. We propose a robust and efficient two-stage mitosis detection framework, which comprises mitosis candidate segmentation ({\em Detecting Fast}) and candidate refinement ({\em Detecting Slow}) stages. The proposed candidate segmentation model, termed \textit{EUNet}, is fast and accurate due to its architectural design. EUNet can precisely segment candidates at a lower resolution to considerably speed up candidate detection. Candidates are then refined using a deeper classifier network, EfficientNet-B7, in the second stage. We make sure both stages are robust against domain shift by incorporating domain generalization methods. We demonstrate state-of-the-art performance and generalizability of the proposed model on the three largest publicly available mitosis datasets, winning the two mitosis domain generalization challenge contests (MIDOG21 and MIDOG22). Finally, we showcase the utility of the proposed algorithm by processing the TCGA breast cancer cohort (1,125 whole-slide images) to generate and release a repository of more than 620K mitotic figures.
△ Less
Submitted 25 September, 2023; v1 submitted 26 August, 2022;
originally announced August 2022.
-
A Blockchain-based Decentralised and Dynamic Authorisation Scheme for the Internet of Things
Authors:
Khizar Hameed,
Ali Raza,
Saurabh Garg,
Muhammad Bilal Amin
Abstract:
An authorisation has been recognised as an important security measure for preventing unauthorised access to critical resources, such as devices and data, within the Internet of Things (IoT) networks. Existing authorisation methods for the IoT network are based on traditional access control models, which have several drawbacks, including architecture centralisation, policy tampering, access rights…
▽ More
An authorisation has been recognised as an important security measure for preventing unauthorised access to critical resources, such as devices and data, within the Internet of Things (IoT) networks. Existing authorisation methods for the IoT network are based on traditional access control models, which have several drawbacks, including architecture centralisation, policy tampering, access rights validation, malicious third-party policy assignment and control, and network-related overheads. The increasing trend of integrating Blockchain technology with IoT networks demonstrates its importance and potential to address the shortcomings of traditional IoT network authorisation mechanisms. This paper proposes a decentralised, secure, dynamic, and flexible authorisation scheme for IoT networks based on attribute-based access control (ABAC) fine-grained policies stored on a distributed immutable ledger. We design a Blockchain-based ABAC policy management framework divided into Attribute Management Authority (AMA) and Policy Management Authority (PMA) frameworks that use smart contract features to initialise, store, and manage attributes and policies on the Blockchain. To achieve flexibility and dynamicity in the authorisation process, we capture and utilise the environmental-related attributes in conjunction with the subject and object attributes of the ABAC model to define the policies. Furthermore, we designed the Blockchain-based Access Management Framework (AMF) to manage user requests to access IoT devices while maintaining the privacy and auditability of user requests and assigned policies. We implemented a prototype of our proposed scheme and executed it on the local Ethereum Blockchain. Finally, we demonstrated the applicability and flexibility of our proposed scheme for an IoT-based smart home scenario, taking into account deployment, execution and financial costs.
△ Less
Submitted 15 August, 2022;
originally announced August 2022.
-
Using Anomaly Detection to Detect Poisoning Attacks in Federated Learning Applications
Authors:
Ali Raza,
Shujun Li,
Kim-Phuc Tran,
Ludovic Koehl,
Kim Duc Tran
Abstract:
Adversarial attacks such as poisoning attacks have attracted the attention of many machine learning researchers. Traditionally, poisoning attacks attempt to inject adversarial training data in order to manipulate the trained model. In federated learning (FL), data poisoning attacks can be generalized to model poisoning attacks, which cannot be detected by simpler methods due to the lack of access…
▽ More
Adversarial attacks such as poisoning attacks have attracted the attention of many machine learning researchers. Traditionally, poisoning attacks attempt to inject adversarial training data in order to manipulate the trained model. In federated learning (FL), data poisoning attacks can be generalized to model poisoning attacks, which cannot be detected by simpler methods due to the lack of access to local training data by the detector. State-of-the-art poisoning attack detection methods for FL have various weaknesses, e.g., the number of attackers has to be known or not high enough, working with i.i.d. data only, and high computational complexity. To overcome above weaknesses, we propose a novel framework for detecting poisoning attacks in FL, which employs a reference model based on a public dataset and an auditor model to detect malicious updates. We implemented a detector based on the proposed framework and using a one-class support vector machine (OC-SVM), which reaches the lowest possible computational complexity O(K) where K is the number of clients. We evaluated our detector's performance against state-of-the-art (SOTA) poisoning attacks for two typical applications of FL: electrocardiograph (ECG) classification and human activity recognition (HAR). Our experimental results validated the performance of our detector over other SOTA detection methods.
△ Less
Submitted 25 March, 2025; v1 submitted 18 July, 2022;
originally announced July 2022.
-
TIAger: Tumor-Infiltrating Lymphocyte Scoring in Breast Cancer for the TiGER Challenge
Authors:
Adam Shephard,
Mostafa Jahanifar,
Ruoyu Wang,
Muhammad Dawood,
Simon Graham,
Kastytis Sidlauskas,
Syed Ali Khurram,
Nasir Rajpoot,
Shan E Ahmed Raza
Abstract:
The quantification of tumor-infiltrating lymphocytes (TILs) has been shown to be an independent predictor for prognosis of breast cancer patients. Typically, pathologists give an estimate of the proportion of the stromal region that contains TILs to obtain a TILs score. The Tumor InfiltratinG lymphocytes in breast cancER (TiGER) challenge, aims to assess the prognostic significance of computer-gen…
▽ More
The quantification of tumor-infiltrating lymphocytes (TILs) has been shown to be an independent predictor for prognosis of breast cancer patients. Typically, pathologists give an estimate of the proportion of the stromal region that contains TILs to obtain a TILs score. The Tumor InfiltratinG lymphocytes in breast cancER (TiGER) challenge, aims to assess the prognostic significance of computer-generated TILs scores for predicting survival as part of a Cox proportional hazards model. For this challenge, as the TIAger team, we have developed an algorithm to first segment tumor vs. stroma, before localising the tumor bulk region for TILs detection. Finally, we use these outputs to generate a TILs score for each case. On preliminary testing, our approach achieved a tumor-stroma weighted Dice score of 0.791 and a FROC score of 0.572 for lymphocytic detection. For predicting survival, our model achieved a C-index of 0.719. These results achieved first place across the preliminary testing leaderboards of the TiGER challenge.
△ Less
Submitted 23 June, 2022;
originally announced June 2022.
-
Unikernel Linux (UKL)
Authors:
Ali Raza,
Thomas Unger,
Matthew Boyd,
Eric Munson,
Parul Sohal,
Ulrich Drepper,
Richard Jones,
Daniel Bristot de Oliveira,
Larry Woodman,
Renato Mancuso,
Jonathan Appavoo,
Orran Krieger
Abstract:
This paper presents Unikernel Linux (UKL), a path toward integrating unikernel optimization techniques in Linux, a general purpose operating system. UKL adds a configuration option to Linux allowing for a single, optimized process to link with the kernel directly, and run at supervisor privilege. This UKL process does not require application source code modification, only a re-link with our, sligh…
▽ More
This paper presents Unikernel Linux (UKL), a path toward integrating unikernel optimization techniques in Linux, a general purpose operating system. UKL adds a configuration option to Linux allowing for a single, optimized process to link with the kernel directly, and run at supervisor privilege. This UKL process does not require application source code modification, only a re-link with our, slightly modified, Linux kernel and glibc. Unmodified applications show modest performance gains out of the box, and developers can further optimize applications for more significant gains (e.g. 26% throughput improvement for Redis). UKL retains support for co-running multiple user level processes capable of communicating with the UKL process using standard IPC. UKL preserves Linux's battle-tested codebase, community, and ecosystem of tools, applications, and hardware support. UKL runs both on bare-metal and virtual servers and supports multi-core execution. The changes to the Linux kernel are modest (1250 LOC).
△ Less
Submitted 22 June, 2023; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Representative Subset Selection for Efficient Fine-Tuning in Self-Supervised Speech Recognition
Authors:
Abdul Hameed Azeemi,
Ihsan Ayyub Qazi,
Agha Ali Raza
Abstract:
Self-supervised speech recognition models require considerable labeled training data for learning high-fidelity representations for Automatic Speech Recognition (ASR) which is computationally demanding and time-consuming. We consider the task of identifying an optimal subset of data for efficient fine-tuning in self-supervised speech models for ASR. We discover that the dataset pruning strategies…
▽ More
Self-supervised speech recognition models require considerable labeled training data for learning high-fidelity representations for Automatic Speech Recognition (ASR) which is computationally demanding and time-consuming. We consider the task of identifying an optimal subset of data for efficient fine-tuning in self-supervised speech models for ASR. We discover that the dataset pruning strategies used in vision tasks for sampling the most informative examples do not perform better than random subset selection on fine-tuning self-supervised ASR. We then present the COWERAGE algorithm for representative subset selection in self-supervised ASR. COWERAGE is based on our finding that ensuring the coverage of examples based on training Word Error Rate (WER) in the early training epochs leads to better generalization performance. Extensive experiments with the wav2vec 2.0 and HuBERT model on TIMIT, Librispeech, and LJSpeech datasets show the effectiveness of COWERAGE and its transferability across models, with up to 17% relative WER improvement over existing dataset pruning methods and random sampling. We also demonstrate that the coverage of training instances in terms of WER values ensures the inclusion of phonemically diverse examples, leading to better test accuracy in self-supervised speech recognition models.
△ Less
Submitted 11 April, 2023; v1 submitted 18 March, 2022;
originally announced March 2022.
-
Non-equilibrium molecular geometries in graph neural networks
Authors:
Ali Raza,
E. Adrian Henle,
Xiaoli Fern
Abstract:
Graph neural networks have become a powerful framework for learning complex structure-property relationships and fast screening of chemical compounds. Recently proposed methods have demonstrated that using 3D geometry information of the molecule along with the bonding structure can lead to more accurate prediction on a wide range of properties. A common practice is to use 3D geometries computed th…
▽ More
Graph neural networks have become a powerful framework for learning complex structure-property relationships and fast screening of chemical compounds. Recently proposed methods have demonstrated that using 3D geometry information of the molecule along with the bonding structure can lead to more accurate prediction on a wide range of properties. A common practice is to use 3D geometries computed through density functional theory (DFT) for both training and testing of models. However, the computational time needed for DFT calculations can be prohibitively large. Moreover, many of the properties that we aim to predict can often be obtained with little or no overhead on top of the DFT calculations used to produce the 3D geometry information, voiding the need for a predictive model. To be practically useful for high-throughput chemical screening and drug discovery, it is desirable to work with 3D geometries obtained using less-accurate but much more efficient non-DFT methods. In this work we investigate the impact of using non-DFT conformations in the training and the testing of existing models and propose a data augmentation method for improving the prediction accuracy of classical forcefield-derived geometries.
△ Less
Submitted 7 March, 2022;
originally announced March 2022.
-
Deep Learning based Prediction of MSI using MMR Markers in Colorectal Cancer
Authors:
Ruqayya Awan,
Mohammed Nimir,
Shan E Ahmed Raza,
Mohsin Bilal,
Johannes Lotz,
David Snead,
Andrew Robinson,
Nasir Rajpoot
Abstract:
The accurate diagnosis and molecular profiling of colorectal cancers are critical for planning the best treatment options for patients. Microsatellite instability (MSI) or mismatch repair (MMR) status plays a vital role in appropriate treatment selection, has prognostic implications and is used to investigate the possibility of patients having underlying genetic disorders (Lynch syndrome). NICE re…
▽ More
The accurate diagnosis and molecular profiling of colorectal cancers are critical for planning the best treatment options for patients. Microsatellite instability (MSI) or mismatch repair (MMR) status plays a vital role in appropriate treatment selection, has prognostic implications and is used to investigate the possibility of patients having underlying genetic disorders (Lynch syndrome). NICE recommends that all CRC patients should be offered MMR/MSI testing. Immunohistochemistry is commonly used to assess MMR status with subsequent molecular testing performed as required. This incurs significant extra costs and requires additional resources. The introduction of automated methods that can predict MSI or MMR status from a target image could substantially reduce the cost associated with MMR testing. Unlike previous studies on MSI prediction involving training a CNN using coarse labels (MSI vs Microsatellite Stable (MSS)), we have utilised fine-grain MMR labels for training purposes. In this paper, we present our work on predicting MSI status in a two-stage process using a single target slide either stained with CK8/18 or H&E. First, we trained a multi-headed convolutional neural network model where each head was responsible for predicting one of the MMR protein expressions. To this end, we performed the registration of MMR stained slides to the target slide as a pre-processing step. In the second stage, statistical features computed from the MMR prediction maps were used for the final MSI prediction. Our results demonstrated that MSI classification can be improved by incorporating fine-grained MMR labels in comparison to the previous approaches in which only coarse labels were utilised.
△ Less
Submitted 26 April, 2022; v1 submitted 24 February, 2022;
originally announced March 2022.
-
Deep Camera Pose Regression Using Pseudo-LiDAR
Authors:
Ali Raza,
Lazar Lolic,
Shahmir Akhter,
Alfonso Dela Cruz,
Michael Liut
Abstract:
An accurate and robust large-scale localization system is an integral component for active areas of research such as autonomous vehicles and augmented reality. To this end, many learning algorithms have been proposed that predict 6DOF camera pose from RGB or RGB-D images. However, previous methods that incorporate depth typically treat the data the same way as RGB images, often adding depth maps a…
▽ More
An accurate and robust large-scale localization system is an integral component for active areas of research such as autonomous vehicles and augmented reality. To this end, many learning algorithms have been proposed that predict 6DOF camera pose from RGB or RGB-D images. However, previous methods that incorporate depth typically treat the data the same way as RGB images, often adding depth maps as additional channels to RGB images and passing them through convolutional neural networks (CNNs). In this paper, we show that converting depth maps into pseudo-LiDAR signals, previously shown to be useful for 3D object detection, is a better representation for camera localization tasks by projecting point clouds that can accurately determine 6DOF camera pose. This is demonstrated by first comparing localization accuracies of a network operating exclusively on pseudo-LiDAR representations, with networks operating exclusively on depth maps. We then propose FusionLoc, a novel architecture that uses pseudo-LiDAR to regress a 6DOF camera pose. FusionLoc is a dual stream neural network, which aims to remedy common issues with typical 2D CNNs operating on RGB-D images. The results from this architecture are compared against various other state-of-the-art deep pose regression implementations using the 7 Scenes dataset. The findings are that FusionLoc performs better than a number of other camera localization methods, with a notable improvement being, on average, 0.33m and 4.35° more accurate than RGB-D PoseNet. By proving the validity of using pseudo-LiDAR signals over depth maps for localization, there are new considerations when implementing large-scale localization systems.
△ Less
Submitted 28 February, 2022;
originally announced March 2022.
-
One Model is All You Need: Multi-Task Learning Enables Simultaneous Histology Image Segmentation and Classification
Authors:
Simon Graham,
Quoc Dang Vu,
Mostafa Jahanifar,
Shan E Ahmed Raza,
Fayyaz Minhas,
David Snead,
Nasir Rajpoot
Abstract:
The recent surge in performance for image analysis of digitised pathology slides can largely be attributed to the advances in deep learning. Deep models can be used to initially localise various structures in the tissue and hence facilitate the extraction of interpretable features for biomarker discovery. However, these models are typically trained for a single task and therefore scale poorly as w…
▽ More
The recent surge in performance for image analysis of digitised pathology slides can largely be attributed to the advances in deep learning. Deep models can be used to initially localise various structures in the tissue and hence facilitate the extraction of interpretable features for biomarker discovery. However, these models are typically trained for a single task and therefore scale poorly as we wish to adapt the model for an increasing number of different tasks. Also, supervised deep learning models are very data hungry and therefore rely on large amounts of training data to perform well. In this paper, we present a multi-task learning approach for segmentation and classification of nuclei, glands, lumina and different tissue regions that leverages data from multiple independent data sources. While ensuring that our tasks are aligned by the same tissue type and resolution, we enable meaningful simultaneous prediction with a single network. As a result of feature sharing, we also show that the learned representation can be used to improve the performance of additional tasks via transfer learning, including nuclear classification and signet ring cell detection. As part of this work, we train our developed Cerberus model on a huge amount of data, consisting of over 600K objects for segmentation and 440K patches for classification. We use our approach to process 599 colorectal whole-slide images from TCGA, where we localise 377 million, 900K and 2.1 million nuclei, glands and lumina, respectively and make the results available to the community for downstream analysis.
△ Less
Submitted 14 November, 2022; v1 submitted 28 February, 2022;
originally announced March 2022.