-
Labels in Extremes: How Well Calibrated are Extreme Multi-label Classifiers?
Authors:
Nasib Ullah,
Erik Schultheis,
Jinbin Zhang,
Rohit Babbar
Abstract:
Extreme multilabel classification (XMLC) problems occur in settings such as related product recommendation, large-scale document tagging, or ad prediction, and are characterized by a label space that can span millions of possible labels. There are two implicit tasks that the classifier performs: \emph{Evaluating} each potential label for its expected worth, and then \emph{selecting} the best candidates. For the latter task, only the relative order of scores matters, and this is what is captured by the standard evaluation procedure in the XMLC literature. However, in many practical applications, it is important to have a good estimate of the actual probability of a label being relevant, e.g., to decide whether to pay the fee to be allowed to display the corresponding ad. To judge whether an extreme classifier is indeed suited to this task, one can look, for example, to whether it returns \emph{calibrated} probabilities, which has hitherto not been done in this field. Therefore, this paper aims to establish the current status quo of calibration in XMLC by providing a systematic evaluation, comprising nine models from four different model families across seven benchmark datasets. As naive application of Expected Calibration Error (ECE) leads to meaningless results in long-tailed XMC datasets, we instead introduce the notion of \emph{calibration@k} (e.g., ECE@k), which focusses on the top-$k$ probability mass, offering a more appropriate measure for evaluating probability calibration in XMLC scenarios. While we find that different models can exhibit widely varying reliability plots, we also show that post-training calibration via a computationally efficient isotonic regression method enhances model calibration without sacrificing prediction accuracy. Thus, the practitioner can choose the model family based on accuracy considerations, and leave calibration to isotonic regression.
Submitted 6 November, 2024;
originally announced November 2024.
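The abstract above does not spell out how calibration@k is computed. As a rough illustration only, the sketch below assumes ECE@k means the usual binned Expected Calibration Error restricted to each instance's k highest-scored labels. The function name `ece_at_k`, its arguments, and the equal-width binning are assumptions made for this example, not the authors' implementation.

```python
def ece_at_k(scores, labels, k, n_bins=10):
    """Binned calibration error over each instance's top-k scored labels.

    scores: list of lists of predicted probabilities (one inner list per instance)
    labels: list of lists of 0/1 relevance indicators, same shape as scores
    Assumption: equal-width bins on [0, 1]; hypothetical helper for illustration.
    """
    bin_conf = [0.0] * n_bins   # sum of predicted probabilities per bin
    bin_hits = [0.0] * n_bins   # number of relevant labels per bin
    bin_count = [0] * n_bins    # number of (instance, label) pairs per bin

    for row_scores, row_labels in zip(scores, labels):
        # Indices of the k highest-scored labels for this instance.
        order = sorted(range(len(row_scores)),
                       key=lambda j: row_scores[j], reverse=True)
        for j in order[:k]:
            p = row_scores[j]
            b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
            bin_conf[b] += p
            bin_hits[b] += row_labels[j]
            bin_count[b] += 1

    total = sum(bin_count)
    if total == 0:
        return 0.0
    # sum_b (n_b / N) * |accuracy_b - confidence_b|, written with per-bin sums
    gap = sum(abs(bin_hits[b] - bin_conf[b])
              for b in range(n_bins) if bin_count[b])
    return gap / total
```

Restricting the sum to the top-k labels keeps the near-zero scores of the long tail of irrelevant labels from dominating the average, which is the motivation the abstract gives for moving away from naive ECE.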
-
Navigating Extremes: Dynamic Sparsity in Large Output Spaces
Authors:
Nasib Ullah,
Erik Schultheis,
Mike Lasby,
Yani Ioannou,
Rohit Babbar
Abstract:
In recent years, Dynamic Sparse Training (DST) has emerged as an alternative to post-training pruning for generating efficient models. In principle, DST allows for a more memory efficient training process, as it maintains sparsity throughout the entire training run. However, current DST implementations fail to capitalize on this in practice. Because sparse matrix multiplication is much less efficient than dense matrix multiplication on GPUs, most implementations simulate sparsity by masking weights. In this paper, we leverage recent advances in semi-structured sparse training to apply DST in the domain of classification with large output spaces, where memory-efficiency is paramount. With a label space of possibly millions of candidates, the classification layer alone will consume several gigabytes of memory. Switching from a dense to a fixed fan-in sparse layer updated with sparse evolutionary training (SET), however, severely hampers training convergence, especially at the largest label spaces. We find that poor gradient flow from the sparse classifier to the dense text encoder makes it difficult to learn good input representations. By employing an intermediate layer or adding an auxiliary training objective, we recover most of the generalisation performance of the dense model. Overall, we demonstrate the applicability and practical benefits of DST in a challenging domain -- characterized by a highly skewed label distribution that differs substantially from typical DST benchmark datasets -- which enables end-to-end training with millions of labels on commodity hardware.
Submitted 9 February, 2025; v1 submitted 5 November, 2024;
originally announced November 2024.
-
Large Language Model as a Teacher for Zero-shot Tagging at Extreme Scales
Authors:
Jinbin Zhang,
Nasib Ullah,
Rohit Babbar
Abstract:
Extreme Multi-label Text Classification (XMC) entails selecting the most relevant labels for an instance from a vast label set. Extreme Zero-shot XMC (EZ-XMC) extends this challenge by operating without annotated data, relying only on raw text instances and a predefined label set, making it particularly critical for addressing cold-start problems in large-scale recommendation and categorization systems. State-of-the-art methods, such as MACLR and RTS, leverage lightweight bi-encoders but rely on suboptimal pseudo labels for training, such as document titles (MACLR) or document segments (RTS), which may not align well with the intended tagging or categorization tasks. On the other hand, LLM-based approaches, like ICXML, achieve better label-instance alignment but are computationally expensive and impractical for real-world EZ-XMC applications due to their heavy inference costs. In this paper, we introduce LMTX (Large language Model as Teacher for eXtreme classification), a novel framework that bridges the gap between these two approaches. LMTX utilizes an LLM to identify high-quality pseudo labels during training, while employing a lightweight bi-encoder for efficient inference. This design eliminates the need for LLMs at inference time, offering the benefits of improved label alignment without sacrificing computational efficiency. Our approach achieves superior performance and efficiency over both LLM and non-LLM based approaches, establishing a new state-of-the-art in EZ-XMC.
Submitted 24 February, 2025; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Optimal Synthesis of Finite State Machines with Universal Gates using Evolutionary Algorithm
Authors:
Noor Ullah,
Khawaja M. Yahya,
Irfan Ahmed
Abstract:
This work presents an optimization method for the synthesis of finite state machines. The focus is on the reduction in the on-chip area and the cost of the circuit. A list of finite state machines from the MCNC91 benchmark circuits has been evolved using Cartesian Genetic Programming. On average, a reduction of almost 30% in the total number of gates has been achieved. The effects of some parameters on the evolutionary process are also discussed in the paper.
Submitted 2 January, 2024;
originally announced January 2024.
-
CAMP: A Context-Aware Cricket Players Performance Metric
Authors:
Muhammad Sohaib Ayub,
Naimat Ullah,
Sarwan Ali,
Imdad Ullah Khan,
Mian Muhammad Awais,
Muhammad Asad Khan,
Safiullah Faizullah
Abstract:
Cricket is the second most popular sport after soccer in terms of viewership. However, the assessment of individual player performance, a fundamental task in team sports, is currently primarily based on aggregate performance statistics, including average runs and wickets taken. We propose Context-Aware Metric of player Performance, CAMP, to quantify individual players' contributions toward a cricket match outcome. CAMP employs data mining methods and enables effective data-driven decision-making for selection and drafting, coaching and training, team line-ups, and strategy development. CAMP incorporates the exact context of performance, such as opponents' strengths and specific circumstances of games, such as pressure situations. We empirically evaluate CAMP on data of limited-over cricket matches between 2001 and 2019. In every match, a committee of experts declares one player as the best player, called Man of the Match (MoM). The top two rated players by CAMP match with MoM in 83% of the 961 games. Thus, the CAMP rating of the best player closely matches that of the domain experts. By this measure, CAMP significantly outperforms the current best-known players' contribution measure based on the Duckworth-Lewis-Stern (DLS) method.
Submitted 14 July, 2023;
originally announced July 2023.
-
ImageCAS: A Large-Scale Dataset and Benchmark for Coronary Artery Segmentation based on Computed Tomography Angiography Images
Authors:
An Zeng,
Chunbiao Wu,
Meiping Huang,
Jian Zhuang,
Shanshan Bi,
Dan Pan,
Najeeb Ullah,
Kaleem Nawaz Khan,
Tianchen Wang,
Yiyu Shi,
Xiaomeng Li,
Guisen Lin,
Xiaowei Xu
Abstract:
Cardiovascular disease (CVD) accounts for about half of non-communicable diseases. Vessel stenosis in the coronary artery is considered to be the major risk of CVD. Computed tomography angiography (CTA) is one of the widely used noninvasive imaging modalities in coronary artery diagnosis due to its superior image resolution. Clinically, segmentation of coronary arteries is essential for the diagnosis and quantification of coronary artery disease. Recently, a variety of works have been proposed to address this problem. However, on one hand, most works rely on in-house datasets, and only a few works have published their datasets, which contain only tens of images. On the other hand, their source code has not been published, and most follow-up works have not made comparisons with existing works, which makes it difficult to judge the effectiveness of the methods and hinders the further exploration of this challenging yet critical problem in the community. In this paper, we propose a large-scale dataset for coronary artery segmentation on CTA images. In addition, we have implemented a benchmark in which we have tried our best to implement several typical existing methods. Furthermore, we propose a strong baseline method which combines multi-scale patch fusion and two-stage processing to extract the details of vessels. Comprehensive experiments show that the proposed method achieves better performance than existing works on the proposed large-scale dataset. The benchmark and the dataset are published at https://github.com/XiaoweiXu/ImageCAS-A-Large-Scale-Dataset-and-Benchmark-for-Coronary-Artery-Segmentation-based-on-CT.
Submitted 17 October, 2023; v1 submitted 3 November, 2022;
originally announced November 2022.
-
Thinking Hallucination for Video Captioning
Authors:
Nasib Ullah,
Partha Pratim Mohanta
Abstract:
With the advent of rich visual representations and pre-trained language models, video captioning has seen continuous improvement over time. Despite the performance improvement, video captioning models are prone to hallucination. Hallucination refers to the generation of highly pathological descriptions that are detached from the source material. In video captioning, there are two kinds of hallucination: object and action hallucination. Instead of endeavoring to learn better representations of a video, in this work, we investigate the fundamental sources of the hallucination problem. We identify three main factors: (i) inadequate visual features extracted from pre-trained models, (ii) improper influences of source and target contexts during multi-modal fusion, and (iii) exposure bias in the training strategy. To alleviate these problems, we propose two robust solutions: (a) the introduction of auxiliary heads trained in multi-label settings on top of the extracted visual features and (b) the addition of context gates, which dynamically select the features during fusion. The standard evaluation metrics for video captioning measure similarity with ground truth captions and do not adequately capture object and action relevance. To this end, we propose a new metric, COAHA (caption object and action hallucination assessment), which assesses the degree of hallucination. Our method achieves state-of-the-art performance on the MSR-Video to Text (MSR-VTT) and the Microsoft Research Video Description Corpus (MSVD) datasets, especially by a massive margin in CIDEr score.
Submitted 28 September, 2022;
originally announced September 2022.
-
A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
Authors:
R. Gnana Praveen,
Wheidima Carneiro de Melo,
Nasib Ullah,
Haseeb Aslam,
Osama Zeeshan,
Théo Denorme,
Marco Pedersoli,
Alessandro Koerich,
Simon Bacon,
Patrick Cardinal,
Eric Granger
Abstract:
Multimodal emotion recognition has recently gained much attention since it can leverage diverse and complementary relationships over multiple modalities (e.g., audio, visual, biosignals, etc.), and can provide some robustness to noisy modalities. Most state-of-the-art methods for audio-visual (A-V) fusion rely on recurrent networks or conventional attention mechanisms that do not effectively leverage the complementary nature of A-V modalities. In this paper, we focus on dimensional emotion recognition based on the fusion of facial and vocal modalities extracted from videos. Specifically, we propose a joint cross-attention model that relies on the complementary relationships to extract the salient features across A-V modalities, allowing for accurate prediction of continuous values of valence and arousal. The proposed fusion model efficiently leverages the inter-modal relationships, while reducing the heterogeneity between the features. In particular, it computes the cross-attention weights based on correlation between the combined feature representation and individual modalities. By deploying the combined A-V feature representation into the cross-attention module, the performance of our fusion module improves significantly over the vanilla cross-attention module. Experimental results on validation-set videos from the AffWild2 dataset indicate that our proposed A-V fusion model provides a cost-effective solution that can outperform state-of-the-art approaches. The code is available on GitHub: https://github.com/praveena2j/JointCrossAttentional-AV-Fusion.
Submitted 6 July, 2024; v1 submitted 28 March, 2022;
originally announced March 2022.
-
A k-mer Based Approach for SARS-CoV-2 Variant Identification
Authors:
Sarwan Ali,
Bikram Sahoo,
Naimat Ullah,
Alexander Zelikovskiy,
Murray Patterson,
Imdadullah Khan
Abstract:
With the rapid spread of the novel coronavirus (COVID-19) across the globe and its continuous mutation, it is of pivotal importance to design a system to identify different known (and unknown) variants of SARS-CoV-2. Identifying particular variants helps to understand and model their spread patterns, design effective mitigation strategies, and prevent future outbreaks. It also plays a crucial role in studying the efficacy of known vaccines against each variant and modeling the likelihood of breakthrough infections. It is well known that the spike protein contains most of the information/variation pertaining to coronavirus variants.
In this paper, we use spike sequences to classify different variants of the coronavirus in humans. We show that preserving the order of the amino acids helps the underlying classifiers to achieve better performance. We also show that we can train our model to outperform the baseline algorithms using only a small number of training samples ($1\%$ of the data). Finally, we show the importance of the different amino acids which play a key role in identifying variants and how they coincide with those reported by the USA's Centers for Disease Control and Prevention (CDC).
Submitted 12 October, 2021; v1 submitted 7 August, 2021;
originally announced August 2021.
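To make the order-preserving idea in the abstract above concrete, here is a minimal sketch of k-mer extraction from an amino-acid sequence: every length-k window is read off left to right, so local residue order is kept. The helper names `kmers` and `kmer_counts`, the choice of k, and the example string are illustrative assumptions; the paper's actual feature pipeline may differ.

```python
from collections import Counter


def kmers(sequence, k):
    """Return all overlapping length-k substrings of `sequence`, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_counts(sequence, k):
    """Count how often each k-mer occurs (a simple frequency-vector feature)."""
    return Counter(kmers(sequence, k))


# Hypothetical five-residue fragment, used only to show the sliding window.
print(kmers("MFVFL", 3))
```

A count vector like the one returned by `kmer_counts` can then be fed to any standard classifier; a sequence shorter than k simply yields no k-mers.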
-
Boosting Video Captioning with Dynamic Loss Network
Authors:
Nasib Ullah,
Partha Pratim Mohanta
Abstract:
Video captioning is one of the challenging problems at the intersection of vision and language, having many real-life applications in video retrieval, video surveillance, assisting visually challenged people, human-machine interfaces, and many more. Recent deep learning based methods have shown promising results but still lag behind other vision tasks (such as image classification, object detection). A significant drawback with existing video captioning methods is that they are optimized over a cross-entropy loss function, which is uncorrelated to the de facto evaluation metrics (BLEU, METEOR, CIDER, ROUGE). In other words, cross-entropy is not a proper surrogate of the true loss function for video captioning. To mitigate this, methods like REINFORCE, Actor-Critic, and Minimum Risk Training (MRT) have been applied but have limitations and are not very effective. This paper proposes an alternate solution by introducing a dynamic loss network (DLN), providing an additional feedback signal that reflects the evaluation metrics directly. Our solution proves to be more efficient than other solutions and can be easily adapted to similar tasks. Our results on Microsoft Research Video Description Corpus (MSVD) and MSR-Video to Text (MSRVTT) datasets outperform previous methods.
Submitted 1 February, 2022; v1 submitted 24 July, 2021;
originally announced July 2021.
-
Op2Vec: An Opcode Embedding Technique and Dataset Design for End-to-End Detection of Android Malware
Authors:
Kaleem Nawaz Khan,
Najeeb Ullah,
Sikandar Ali,
Muhammad Salman Khan,
Mohammad Nauman,
Anwar Ghani
Abstract:
Android is one of the leading operating systems for smart phones in terms of market share and usage. Unfortunately, it is also an appealing target for attackers to compromise its security through malicious applications. To tackle this issue, domain experts and researchers are trying different techniques to stop such attacks. All the attempts at securing the Android platform are somewhat successful. However, existing detection techniques have severe shortcomings, including the cumbersome process of feature engineering. Designing representative features requires expert domain knowledge. There is a need for minimizing human experts' intervention by circumventing handcrafted feature engineering. Deep learning could be exploited by extracting deep features automatically. Previous work has shown that operational codes (opcodes) of executables provide key information to be used with deep learning models for the detection of malicious applications. The only challenge is to feed opcode information to deep learning models. Existing techniques use one-hot encoding to tackle the challenge. However, the one-hot encoding scheme has severe limitations. In this paper, we introduce: (1) a novel technique for opcode embedding, which we name Op2Vec, and (2) a dataset, developed from the learned Op2Vec, for end-to-end detection of Android malware. Introducing the end-to-end Android malware detection technique avoids expert-intensive handcrafted feature extraction, and ensures automation. Some of the recent deep learning-based techniques showed significantly improved results when tested with the proposed approach and achieved an average detection accuracy of 97.47%, precision of 0.976 and F1 score of 0.979.
Submitted 1 March, 2022; v1 submitted 10 April, 2021;
originally announced April 2021.
-
A Survey of Home Energy Management Systems in Future Smart Grid Communications
Authors:
I. Khan,
N. Javaid,
M. N. Ullah,
A. Mahmood,
M. U. Farooq
Abstract:
In this paper we present a systematic review of various home energy management (HEM) schemes. Employment of home energy management programs will make electricity consumption smarter and more efficient. Advantages of HEM include increased savings for consumers as well as utilities, and reduced peak to average ratio (PAR) and peak demand. Although there are numerous applications of smart grid technologies, home energy management is probably the most important one to be addressed. Utilities across the globe have taken various steps for efficient consumption of electricity. New pricing schemes like Real Time Pricing (RTP), Time of Use (ToU), Inclining Block Rates (IBR), Critical Peak Pricing (CPP), etc., have been proposed for the smart grid. Distributed Energy Resources (DER) (local generation) and/or home appliance coordination along with different tariff schemes lead towards efficient consumption of electricity. This work also discusses the general architecture of HEM systems and various challenges in implementing this architecture in the smart grid.
Submitted 26 July, 2013;
originally announced July 2013.
-
A Survey of Different Residential Energy Consumption Controlling Techniques for Autonomous DSM in Future Smart Grid Communications
Authors:
M. N. Ullah,
A. Mahmood,
S. Razzaq,
M. Ilahi,
R. D. Khan,
N. Javaid
Abstract:
In this work, we present a survey of residential load controlling techniques to implement demand side management in the future smart grid. The power generation sector is facing important challenges both in quality and quantity to meet the increasing requirements of consumers. Energy efficiency, reliability, economics and integration of new energy resources are important issues to enhance the stability of power system infrastructure. Optimal energy consumption scheduling minimizes the energy consumption cost and reduces the peak-to-average ratio (PAR) as well as peak load demand in peak hours. In this work, we discuss different energy consumption scheduling schemes that schedule the household appliances in real-time to achieve minimum energy consumption cost and reduce the peak load curve in peak hours to shape the peak load demand.
Submitted 5 June, 2013;
originally announced June 2013.
-
An Overview of IEEE 802.15.6 Standard
Authors:
Kyung Sup Kwak,
Sana Ullah,
Niamat Ullah
Abstract:
Wireless Body Area Networks (WBAN) have emerged as a key technology to provide real-time health monitoring of a patient and diagnose many life-threatening diseases. A WBAN operates in close vicinity to, on, or inside a human body and supports a variety of medical and non-medical applications. IEEE 802 has established a Task Group called IEEE 802.15.6 for the standardization of WBAN. The purpose of the group is to establish a communication standard optimized for low-power in-body/on-body nodes to serve a variety of medical and non-medical applications. This paper explains the most important features of the new IEEE 802.15.6 standard. The standard defines a Medium Access Control (MAC) layer supporting several Physical (PHY) layers. We briefly overview the PHY and MAC layer specifications together with the bandwidth efficiency of the IEEE 802.15.6 standard. We also discuss the security paradigm of the standard.
Submitted 20 February, 2011;
originally announced February 2011.
-
A Review of Wireless Body Area Networks for Medical Applications
Authors:
Sana Ullah,
Pervez Khan,
Niamat Ullah,
Shahnaz Saleem,
Henry Higgins,
Kyung Sup Kwak
Abstract:
Recent advances in Micro-Electro-Mechanical Systems (MEMS) technology, integrated circuits, and wireless communication have allowed the realization of Wireless Body Area Networks (WBANs). WBANs promise unobtrusive ambulatory health monitoring for a long period of time and provide real-time updates of the patient's status to the physician. They are widely used for ubiquitous healthcare, entertainment, and military applications. This paper reviews the key aspects of WBANs for numerous applications. We present a WBAN infrastructure that provides solutions to on-demand, emergency, and normal traffic. We further discuss in-body antenna design and low-power MAC protocol for WBAN. In addition, we briefly outline some of the WBAN applications with examples. Our discussion realizes a need for new power-efficient solutions towards in-body and on-body sensor networks.
Submitted 3 August, 2010; v1 submitted 6 January, 2010;
originally announced January 2010.