-
PAEFF: Precise Alignment and Enhanced Gated Feature Fusion for Face-Voice Association
Authors:
Abdul Hannan,
Muhammad Arslan Manzoor,
Shah Nawaz,
Muhammad Irzam Liaqat,
Markus Schedl,
Mubashir Noman
Abstract:
We study the task of learning association between faces and voices, which is gaining interest in the multimodal community lately. These methods suffer from the deliberate crafting of negative mining procedures as well as the reliance on the distant margin parameter. These issues are addressed by learning a joint embedding space in which orthogonality constraints are applied to the fused embeddings of faces and voices. However, embedding spaces of faces and voices possess different characteristics and require spaces to be aligned before fusing them. To this end, we propose a method that accurately aligns the embedding spaces and fuses them with an enhanced gated fusion, thereby improving the performance of face-voice association. Extensive experiments on the VoxCeleb dataset reveal the merits of the proposed approach.
Submitted 28 May, 2025; v1 submitted 22 May, 2025;
originally announced May 2025.
-
UrduFactCheck: An Agentic Fact-Checking Framework for Urdu with Evidence Boosting and Benchmarking
Authors:
Sarfraz Ahmad,
Hasan Iqbal,
Momina Ahsan,
Numaan Naeem,
Muhammad Ahsan Riaz Khan,
Arham Riaz,
Muhammad Arslan Manzoor,
Yuxia Wang,
Preslav Nakov
Abstract:
The rapid use of large language models (LLMs) has raised critical concerns regarding the factual reliability of their outputs, especially in low-resource languages such as Urdu. Existing automated fact-checking solutions overwhelmingly focus on English, leaving a significant gap for the 200+ million Urdu speakers worldwide. In this work, we introduce UrduFactCheck, the first comprehensive, modular fact-checking framework specifically tailored for Urdu. Our system features a dynamic, multi-strategy evidence retrieval pipeline that combines monolingual and translation-based approaches to address the scarcity of high-quality Urdu evidence. We curate and release two new hand-annotated benchmarks: UrduFactBench for claim verification and UrduFactQA for evaluating LLM factuality. Extensive experiments demonstrate that UrduFactCheck, particularly its translation-augmented variants, consistently outperforms baselines and open-source alternatives on multiple metrics. We further benchmark twelve state-of-the-art (SOTA) LLMs on factual question answering in Urdu, highlighting persistent gaps between proprietary and open-source models. UrduFactCheck's code and datasets are open-sourced and publicly available at https://github.com/mbzuai-nlp/UrduFactCheck.
Submitted 20 May, 2025;
originally announced May 2025.
-
A Decade of Deep Learning: A Survey on The Magnificent Seven
Authors:
Dilshod Azizov,
Muhammad Arslan Manzoor,
Velibor Bojkovic,
Yingxu Wang,
Zixiao Wang,
Zangir Iklassov,
Kailong Zhao,
Liang Li,
Siwei Liu,
Yu Zhong,
Wei Liu,
Shangsong Liang
Abstract:
Deep learning has fundamentally reshaped the landscape of artificial intelligence over the past decade, enabling remarkable achievements across diverse domains. At the heart of these developments lie multi-layered neural network architectures that excel at automatic feature extraction, leading to significant improvements in machine learning tasks. To demystify these advances and offer accessible guidance, we present a comprehensive overview of the most influential deep learning algorithms selected through a broad-based survey of the field. Our discussion centers on pivotal architectures, including Residual Networks, Transformers, Generative Adversarial Networks, Variational Autoencoders, Graph Neural Networks, Contrastive Language-Image Pre-training, and Diffusion models. We detail their historical context, highlight their mathematical foundations and algorithmic principles, and examine subsequent variants, extensions, and practical considerations such as training methodologies, normalization techniques, and learning rate schedules. Beyond historical and technical insights, we also address their applications, challenges, and potential research directions. This survey aims to serve as a practical manual for both newcomers seeking an entry point into cutting-edge deep learning methods and experienced researchers transitioning into this rapidly evolving domain.
Submitted 13 December, 2024;
originally announced December 2024.
-
MGM: Global Understanding of Audience Overlap Graphs for Predicting the Factuality and the Bias of News Media
Authors:
Muhammad Arslan Manzoor,
Ruihong Zeng,
Dilshod Azizov,
Preslav Nakov,
Shangsong Liang
Abstract:
In the current era of rapidly growing digital data, evaluating the political bias and factuality of news outlets has become more important for seeking reliable information online. In this work, we study the classification problem of profiling news media from the lens of political bias and factuality. Traditional profiling methods, such as Pre-trained Language Models (PLMs) and Graph Neural Networks (GNNs), have shown promising results, but they face notable challenges. PLMs focus solely on textual features, causing them to overlook the complex relationships between entities, while GNNs often struggle with media graphs containing disconnected components and insufficient labels. To address these limitations, we propose MediaGraphMind (MGM), an effective solution within a variational Expectation-Maximization (EM) framework. Instead of relying on limited neighboring nodes, MGM leverages features, structural patterns, and label information from globally similar nodes. Such a framework not only enables GNNs to capture long-range dependencies for learning expressive node representations but also enhances PLMs by integrating structural information, thereby improving the performance of both models. The extensive experiments demonstrate the effectiveness of the proposed framework and achieve new state-of-the-art results. Further, we share our repository, which contains the dataset, code, and documentation.
Submitted 12 December, 2024;
originally announced December 2024.
-
FineFACE: Fair Facial Attribute Classification Leveraging Fine-grained Features
Authors:
Ayesha Manzoor,
Ajita Rattani
Abstract:
Published research highlights the presence of demographic bias in automated facial attribute classification algorithms, particularly impacting women and individuals with darker skin tones. Existing bias mitigation techniques typically require demographic annotations and often obtain a trade-off between fairness and accuracy, i.e., Pareto inefficiency. Facial attributes, whether common ones like gender or others such as "chubby" or "high cheekbones", exhibit high interclass similarity and intraclass variation across demographics leading to unequal accuracy. This requires the use of local and subtle cues using fine-grained analysis for differentiation. This paper proposes a novel approach to fair facial attribute classification by framing it as a fine-grained classification problem. Our approach effectively integrates both low-level local features (like edges and color) and high-level semantic features (like shapes and structures) through cross-layer mutual attention learning. Here, shallow to deep CNN layers function as experts, offering category predictions and attention regions. An exhaustive evaluation on facial attribute annotated datasets demonstrates that our FineFACE model improves accuracy by 1.32% to 1.74% and fairness by 67% to 83.6%, over the SOTA bias mitigation techniques. Importantly, our approach obtains a Pareto-efficient balance between accuracy and fairness between demographic groups. In addition, our approach does not require demographic annotations and is applicable to diverse downstream classification tasks. To facilitate reproducibility, the code and dataset information is available at https://github.com/VCBSL-Fairness/FineFACE.
Submitted 29 August, 2024;
originally announced August 2024.
-
Deformable Convolution Based Road Scene Semantic Segmentation of Fisheye Images in Autonomous Driving
Authors:
Anam Manzoor,
Aryan Singh,
Ganesh Sistu,
Reenu Mohandas,
Eoin Grua,
Anthony Scanlan,
Ciarán Eising
Abstract:
This study investigates the effectiveness of modern Deformable Convolutional Neural Networks (DCNNs) for semantic segmentation tasks, particularly in autonomous driving scenarios with fisheye images. These images, providing a wide field of view, pose unique challenges for extracting spatial and geometric information due to dynamic changes in object attributes. Our experiments focus on segmenting the WoodScape fisheye image dataset into ten distinct classes, assessing the Deformable Networks' ability to capture intricate spatial relationships and improve segmentation accuracy. Additionally, we explore different loss functions to address class imbalance issues and compare the performance of conventional CNN architectures with Deformable Convolution-based CNNs, including Vanilla U-Net and Residual U-Net architectures. The significant improvement in mIoU score resulting from integrating Deformable CNNs demonstrates their effectiveness in handling the geometric distortions present in fisheye imagery, exceeding the performance of traditional CNN architectures. This underscores the significant role of Deformable convolution in enhancing semantic segmentation performance for fisheye imagery.
Submitted 1 October, 2024; v1 submitted 23 July, 2024;
originally announced July 2024.
-
Can Machines Resonate with Humans? Evaluating the Emotional and Empathic Comprehension of LMs
Authors:
Muhammad Arslan Manzoor,
Yuxia Wang,
Minghan Wang,
Preslav Nakov
Abstract:
Empathy plays a pivotal role in fostering prosocial behavior, often triggered by the sharing of personal experiences through narratives. However, modeling empathy using NLP approaches remains challenging due to its deep interconnection with human interaction dynamics. Previous approaches, which involve fine-tuning language models (LMs) on human-annotated empathic datasets, have had limited success. In our pursuit of improving empathy understanding in LMs, we propose several strategies, including contrastive learning with masked LMs and supervised fine-tuning with large language models. While these methods show improvements over previous methods, the overall results remain unsatisfactory. To better understand this trend, we performed an analysis which reveals a low agreement among annotators. This lack of consensus hinders training and highlights the subjective nature of the task. We also explore the cultural impact on annotations. To study this, we meticulously collected story pairs in Urdu language and find that subjectivity in interpreting empathy among annotators appears to be independent of cultural background. Our systematic exploration of LMs' understanding of empathy reveals substantial opportunities for further investigation in both task formulation and modeling.
Submitted 31 October, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
Factuality of Large Language Models: A Survey
Authors:
Yuxia Wang,
Minghan Wang,
Muhammad Arslan Manzoor,
Fei Liu,
Georgi Georgiev,
Rocktim Jyoti Das,
Preslav Nakov
Abstract:
Large language models (LLMs), especially when instruction-tuned for chat, have become part of our daily lives, freeing people from the process of searching, extracting, and integrating information from multiple sources by offering a straightforward answer to a variety of questions in a single place. Unfortunately, in many cases, LLM responses are factually incorrect, which limits their applicability in real-world scenarios. As a result, research on evaluating and improving the factuality of LLMs has attracted a lot of attention recently. In this survey, we critically analyze existing work with the aim of identifying the major challenges and their associated causes, pointing out potential solutions for improving the factuality of LLMs, and analyzing the obstacles to automated factuality evaluation for open-ended text generation. We further offer an outlook on where future research should go.
Submitted 31 October, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
AntiPhishStack: LSTM-based Stacked Generalization Model for Optimized Phishing URL Detection
Authors:
Saba Aslam,
Hafsa Aslam,
Arslan Manzoor,
Chen Hui,
Abdur Rasool
Abstract:
The escalating reliance on revolutionary online web services has introduced heightened security risks, with persistent challenges posed by phishing despite extensive security measures. Traditional phishing systems, reliant on machine learning and manual features, struggle with evolving tactics. Recent advances in deep learning offer promising avenues for tackling novel phishing challenges and malicious URLs. This paper introduces a two-phase stack generalized model named AntiPhishStack, designed to detect phishing sites. The model leverages the learning of URLs and character-level TF-IDF features symmetrically, enhancing its ability to combat emerging phishing threats. In Phase I, features are trained on a base machine learning classifier, employing K-fold cross-validation for robust mean prediction. Phase II employs a two-layered stacked-based LSTM network with five adaptive optimizers for dynamic compilation, ensuring premier prediction on these features. Additionally, the symmetrical predictions from both phases are optimized and integrated to train a meta-XGBoost classifier, contributing to a final robust prediction. The significance of this work lies in advancing phishing detection with AntiPhishStack, operating without prior phishing-specific feature knowledge. Experimental validation on two benchmark datasets, comprising benign and phishing or malicious URLs, demonstrates the model's exceptional performance, achieving a notable 96.04% accuracy compared to existing studies. This research adds value to the ongoing discourse on symmetry and asymmetry in information security and provides a forward-thinking solution for enhancing network security in the face of evolving cyber threats.
Submitted 21 January, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
ACPO: AI-Enabled Compiler Framework
Authors:
Amir H. Ashouri,
Muhammad Asif Manzoor,
Duc Minh Vu,
Raymond Zhang,
Colin Toft,
Ziwen Wang,
Angel Zhang,
Bryan Chan,
Tomasz S. Czajkowski,
Yaoqing Gao
Abstract:
The key to performance optimization of a program is to decide correctly when a certain transformation should be applied by a compiler. This is an ideal opportunity to apply machine-learning models to speed up the tuning process; while this realization has been around since the late 90s, only recent advancements in ML enabled a practical application of ML to compilers as an end-to-end framework.
This paper presents ACPO: An AI-Enabled Compiler Framework, a novel framework that provides LLVM with simple and comprehensive tools to benefit from employing ML models for different optimization passes. We first showcase the high-level view, class hierarchy, and functionalities of ACPO and subsequently demonstrate a couple of use cases of ACPO by ML-enabling the Loop Unroll and Function Inlining passes used in LLVM's O3, and finally describe how ACPO can be leveraged to optimize other passes. Experimental results reveal that the ACPO model for Loop Unroll can gain on average 4%, 3%, 5.4%, and 0.2% compared to LLVM's vanilla O3 optimization when deployed on Polybench, Coral-2, CoreMark, and Graph-500, respectively. Furthermore, by including both Function Inlining and Loop Unroll models, ACPO can provide a combined speedup of 4.5% on Polybench and 2.4% on Cbench when compared with LLVM's O3, respectively.
Submitted 13 January, 2025; v1 submitted 15 December, 2023;
originally announced December 2023.
-
Transfer Learning Based Diagnosis and Analysis of Lung Sound Aberrations
Authors:
Hafsa Gulzar,
Jiyun Li,
Arslan Manzoor,
Sadaf Rehmat,
Usman Amjad,
Hadiqa Jalil Khan
Abstract:
With the development of computer systems that can collect and analyze enormous volumes of data, the medical profession is establishing several non-invasive tools. This work attempts to develop a non-invasive technique for identifying respiratory sounds acquired by a stethoscope and voice recording software via machine learning techniques. This study suggests a trained and proven CNN-based approach for categorizing respiratory sounds. A visual representation of each audio sample is constructed, allowing resource identification for classification using methods like those used to effectively describe visuals. We used a technique called Mel Frequency Cepstral Coefficients (MFCCs). Here, features are retrieved and categorized via VGG16 (transfer learning) and prediction is accomplished using 5-fold cross-validation. Employing various data splitting techniques, Respiratory Sound Database obtained cutting-edge results, including accuracy of 95%, precision of 88%, recall score of 86%, and F1 score of 81%. The ICBHI dataset is used to train and test the model.
Submitted 15 March, 2023;
originally announced March 2023.
-
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications
Authors:
Muhammad Arslan Manzoor,
Sarah Albarri,
Ziting Xian,
Zaiqiao Meng,
Preslav Nakov,
Shangsong Liang
Abstract:
Multimodality Representation Learning, as a technique of learning to embed information from different modalities and their correlations, has achieved remarkable success on a variety of applications, such as Visual Question Answering (VQA), Natural Language for Visual Reasoning (NLVR), and Vision Language Retrieval (VLR). Among these applications, cross-modal interaction and complementary information from different modalities are crucial for advanced models to perform any multimodal task, e.g., understand, recognize, retrieve, or generate optimally. Researchers have proposed diverse methods to address these tasks. The different variants of transformer-based architectures performed extraordinarily on multiple modalities. This survey presents the comprehensive literature on the evolution and enhancement of deep learning multimodal architectures to deal with textual, visual and audio features for diverse cross-modal and modern multimodal tasks. This study summarizes the (i) recent task-specific deep learning methodologies, (ii) the pretraining types and multimodal pretraining objectives, (iii) from state-of-the-art pretrained multimodal approaches to unifying architectures, and (iv) multimodal task categories and possible future improvements that can be devised for better multimodal learning. Moreover, we prepare a dataset section for new researchers that covers most of the benchmarks for pretraining and finetuning. Finally, major challenges, gaps, and potential research topics are explored. A constantly-updated paperlist related to our survey is maintained at https://github.com/marslanm/multimodality-representation-learning.
Submitted 1 March, 2024; v1 submitted 1 February, 2023;
originally announced February 2023.
-
INFACT: An Online Human Evaluation Framework for Conversational Recommendation
Authors:
Ahtsham Manzoor,
Dietmar Jannach
Abstract:
Conversational recommender systems (CRS) are interactive agents that support their users in recommendation-related goals through multi-turn conversations. Generally, a CRS can be evaluated in various dimensions. Today's CRS mainly rely on offline (computational) measures to assess the performance of their algorithms in comparison to different baselines. However, offline measures can have limitations, for example, when the metrics for comparing a newly generated response with a ground truth do not correlate with human perceptions, because various alternative generated responses might be suitable too in a given dialog situation. Current research on machine learning-based CRS models therefore acknowledges the importance of humans in the evaluation process, knowing that pure offline measures may not be sufficient in evaluating a highly interactive system like a CRS.
Submitted 7 September, 2022;
originally announced September 2022.
-
INSPIRED2: An Improved Dataset for Sociable Conversational Recommendation
Authors:
Ahtsham Manzoor,
Dietmar Jannach
Abstract:
Conversational recommender systems (CRS) that are able to interact with users in natural language often utilize recommendation dialogs which were previously collected with the help of paired humans, where one plays the role of a seeker and the other that of a recommender. These recommendation dialogs include items and entities that indicate the users' preferences. In order to precisely model the seekers' preferences and respond consistently, CRS typically rely on item and entity annotations. A recent example of such a dataset is INSPIRED, which consists of recommendation dialogs for sociable conversational recommendation, where items and entities were annotated using automatic keyword or pattern matching techniques. An analysis of this dataset unfortunately revealed that there is a substantial number of cases where items and entities were either wrongly annotated or annotations were missing altogether. This leads to the question of to what extent automatic techniques for annotations are effective. Moreover, it is important to study the impact of annotation quality on the overall effectiveness of a CRS in terms of the quality of the system's responses. To study these aspects, we manually fixed the annotations in INSPIRED. We then evaluated the performance of several benchmark CRS using both versions of the dataset. Our analyses suggest that the improved version of the dataset, i.e., INSPIRED2, helped increase the performance of several benchmark CRS, emphasizing the importance of data quality both for end-to-end learning and retrieval-based approaches to conversational recommendation. We release our improved dataset (INSPIRED2) publicly at https://github.com/ahtsham58/INSPIRED2.
Submitted 7 September, 2022; v1 submitted 8 August, 2022;
originally announced August 2022.
-
MLGOPerf: An ML Guided Inliner to Optimize Performance
Authors:
Amir H. Ashouri,
Mostafa Elhoushi,
Yuzhe Hua,
Xiang Wang,
Muhammad Asif Manzoor,
Bryan Chan,
Yaoqing Gao
Abstract:
For the past 25 years, we have witnessed an extensive application of Machine Learning to the Compiler space; the selection and the phase-ordering problem. However, limited works have been upstreamed into the state-of-the-art compilers, i.e., LLVM, to seamlessly integrate the former into the optimization pipeline of a compiler to be readily deployed by the user. MLGO was among the first of such projects and it only strives to reduce the code size of a binary with an ML-based Inliner using Reinforcement Learning.
This paper presents MLGOPerf; the first end-to-end framework capable of optimizing performance using LLVM's ML-Inliner. It employs a secondary ML model to generate rewards used for training a retargeted Reinforcement learning agent, previously used as the primary model by MLGO. It does so by predicting the post-inlining speedup of a function under analysis and it enables a fast training framework for the primary model which otherwise wouldn't be practical. The experimental results show MLGOPerf is able to gain up to 1.8% and 2.2% with respect to LLVM's optimization at O3 when trained for performance on SPEC CPU2006 and Cbench benchmarks, respectively. Furthermore, the proposed approach provides up to 26% increased opportunities to autotune code regions for our benchmarks which can be translated into an additional 3.7% speedup value.
Submitted 19 July, 2022; v1 submitted 18 July, 2022;
originally announced July 2022.
-
Towards Retrieval-based Conversational Recommendation
Authors:
Ahtsham Manzoor,
Dietmar Jannach
Abstract:
Conversational recommender systems have attracted immense attention recently. The most recent approaches rely on neural models trained on recorded dialogs between humans, implementing an end-to-end learning process. These systems are commonly designed to generate responses given the user's utterances in natural language. One main challenge is that these generated responses both have to be appropriate for the given dialog context and must be grammatically and semantically correct. An alternative to such generation-based approaches is to retrieve responses from pre-recorded dialog data and to adapt them if needed. Such retrieval-based approaches were successfully explored in the context of general conversational systems, but have received limited attention in recent years for CRS. In this work, we re-assess the potential of such approaches and design and evaluate a novel technique for response retrieval and ranking. A user study (N=90) revealed that the responses by our system were on average of higher quality than those of two recent generation-based systems. We furthermore found that the quality ranking of the two generation-based approaches is not aligned with the results from the literature, which points to open methodological questions. Overall, our research underlines that retrieval-based approaches should be considered an alternative or complement to language generation approaches.
Submitted 6 September, 2021;
originally announced September 2021.
-
Ruin Theory for User Association and Energy Optimization in Multi-access Edge Computing
Authors:
Do Hyeon Kim,
Aunas Manzoor,
Madyan Alsenwi,
Yan Kyaw Tun,
Walid Saad,
Choong Seon Hong
Abstract:
In this correspondence, a novel framework is proposed for analyzing data offloading in a multi-access edge computing system. Specifically, a two-phase algorithm is proposed, comprising 1) a user association phase and 2) a task offloading phase. In the first phase, a ruin theory-based approach is developed to obtain the user association considering the users' transmission reliability and resource utilization efficiency. Meanwhile, in the second phase, an optimization-based algorithm is used to optimize the data offloading process. In particular, ruin theory is used to manage the user association phase, and a ruin probability-based preference profile is considered to control the priority of proposing users. Here, ruin probability is derived from the surplus buffer space of each edge node at each time slot. Given the association results, an optimization problem is formulated to optimize the amount of offloaded data, aiming at minimizing the energy consumption of users. Simulation results show that the developed solutions guarantee system reliability and association efficiency under a tolerable value of surplus buffer size, and minimize the total energy consumption of all users.
Submitted 21 April, 2023; v1 submitted 2 July, 2021;
originally announced July 2021.
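As an illustrative aside to the abstract above, the following Python sketch estimates a finite-horizon ruin probability by Monte Carlo simulation of a discrete-time surplus process, here read as the surplus buffer space of one edge node per time slot. It is not the authors' model: the function name, the fixed per-slot inflow, and the caller-supplied demand sampler are all assumptions made for illustration.

```python
import random


def estimate_ruin_probability(initial_surplus, inflow_per_slot, demand_sampler,
                              horizon, trials=2000, seed=0):
    """Estimate P(surplus drops below zero within `horizon` slots).

    All parameter names are hypothetical. `demand_sampler(rng)` returns the
    random amount of surplus consumed in one time slot.
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    ruined = 0
    for _ in range(trials):
        surplus = initial_surplus
        for _ in range(horizon):
            # Surplus grows by a fixed inflow and shrinks by a random demand.
            surplus += inflow_per_slot - demand_sampler(rng)
            if surplus < 0:  # "ruin": the buffer is exhausted in this slot
                ruined += 1
                break
    return ruined / trials
```

A node with a lower estimated ruin probability could then be ranked higher in a preference profile, for example by calling `estimate_ruin_probability(5.0, 1.0, lambda rng: rng.expovariate(1.0), 50)` for each candidate node with its own parameters.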
-
Ruin Theory for Energy-Efficient Resource Allocation in UAV-assisted Cellular Networks
Authors:
Aunas Manzoor,
Kitae Kim,
Shashi Raj Pandey,
S. M. Ahsan Kazmi,
Nguyen H. Tran,
Walid Saad,
Choong Seon Hong
Abstract:
Unmanned aerial vehicles (UAVs) can provide an effective solution for improving the coverage, capacity, and the overall performance of terrestrial wireless cellular networks. In particular, UAV-assisted cellular networks can meet the stringent performance requirements of the fifth generation new radio (5G NR) applications. In this paper, the problem of energy-efficient resource allocation in UAV-assisted cellular networks is studied under the reliability and latency constraints of 5G NR applications. The framework of ruin theory is employed to allow solar-powered UAVs to capture the dynamics of harvested and consumed energies. First, the surplus power of every UAV is modeled, and then it is used to compute the probability of ruin of the UAVs. The probability of ruin denotes the vulnerability of a UAV to draining its power. Next, the probability of ruin is used for efficient user association with each UAV. Then, power allocation for 5G NR applications is performed to maximize the achievable network rate using the water-filling approach. Simulation results demonstrate that the proposed ruin-based scheme can enhance the flight duration by up to 61% and the number of served users in a UAV flight by up to 58%, compared to a baseline SINR-based scheme.
Submitted 1 June, 2020;
originally announced June 2020.
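For readers unfamiliar with the water-filling approach mentioned in the abstract above, the following Python sketch shows the classic textbook form: it maximizes the sum of log(1 + g_i p_i) over parallel channels subject to a total power budget. The paper's actual formulation, with its 5G NR reliability and latency constraints, may differ. The function name and the normalized channel gains are assumptions made for illustration.

```python
def water_filling(gains, total_power):
    """Classic water-filling over parallel channels (illustrative sketch).

    `gains` are positive channel gain-to-noise ratios (hypothetical inputs).
    Returns per-channel powers p_i = max(0, mu - 1/g_i) that sum to
    `total_power`, where mu is the "water level".
    """
    inverse = sorted(1.0 / g for g in gains)  # best channels first
    mu = 0.0
    # Try using the k best channels, shrinking k until the water level
    # lies above the inverse gain of the worst channel still in use.
    for k in range(len(inverse), 0, -1):
        mu = (total_power + sum(inverse[:k])) / k
        if mu > inverse[k - 1]:
            break
    return [max(0.0, mu - 1.0 / g) for g in gains]
```

With gains `[2.0, 1.0]` and a budget of `1.0`, both channels are active and the stronger one receives more power. With gains `[1.0, 0.1]`, the weak channel lies above the water level and receives nothing.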
-
A Survey on Conversational Recommender Systems
Authors:
Dietmar Jannach,
Ahtsham Manzoor,
Wanling Cai,
Li Chen
Abstract:
Recommender systems are software applications that help users to find items of interest in situations of information overload. Current research often assumes a one-shot interaction paradigm, where the users' preferences are estimated based on past observed behavior and where the presentation of a ranked list of suggestions is the main, one-directional form of user interaction. Conversational recommender systems (CRS) take a different approach and support a richer set of interactions. These interactions can, for example, help to improve the preference elicitation process or allow the user to ask questions about the recommendations and to give feedback. The interest in CRS has significantly increased in the past few years. This development is mainly due to the significant progress in the area of natural language processing, the emergence of new voice-controlled home assistants, and the increased use of chatbot technology. With this paper, we provide a detailed survey of existing approaches to conversational recommendation. We categorize these approaches in various dimensions, e.g., in terms of the supported user intents or the knowledge they use in the background. Moreover, we discuss technological approaches, review how CRS are evaluated, and finally identify a number of gaps that deserve more research in the future.
Submitted 31 May, 2021; v1 submitted 1 April, 2020;
originally announced April 2020.
-
Contract-based Scheduling of URLLC Packets in Incumbent EMBB Traffic
Authors:
Aunas Manzoor,
S. M. Ahsan Kazmi,
Shashi Raj Pandey,
Choong Seon Hong
Abstract:
Recently, the coexistence of ultra-reliable and low-latency communication (URLLC) and enhanced mobile broadband (eMBB) services on the same licensed spectrum has gained a lot of attention from both academia and industry. However, the coexistence of these services is not trivial due to the diverse multiple access protocols, contrasting frame distributions in the existing network, and the distinct quality of service requirements posed by these services. Therefore, such coexistence drives towards a challenging resource scheduling problem. To address this problem, in this paper, we first investigate the possibilities of scheduling URLLC packets in incumbent eMBB traffic. In this regard, we formulate an optimization problem for coexistence by dynamically adopting a superposition or puncturing scheme. In particular, the aim is to provide spectrum access to the URLLC users while reducing the intervention on incumbent eMBB users. Next, we apply the one-to-one matching game to find stable URLLC-eMBB pairs that can coexist on the same spectrum. Then, we apply the contract theory framework to design contracts for URLLC users to adopt the superposition scheme. Simulation results reveal that the proposed contract-based scheduling scheme achieves up to 63% of the eMBB rate for the "No URLLC" case compared to the "Puncturing" scheme.
Submitted 31 March, 2020; v1 submitted 24 March, 2020;
originally announced March 2020.
-
A Crowdsourcing Framework for On-Device Federated Learning
Authors:
Shashi Raj Pandey,
Nguyen H. Tran,
Mehdi Bennis,
Yan Kyaw Tun,
Aunas Manzoor,
Choong Seon Hong
Abstract:
Federated learning (FL) rests on the notion of training a global model in a decentralized manner. Under this setting, mobile devices perform computations on their local data before uploading the required updates to improve the global model. However, when the participating clients implement an uncoordinated computation strategy, the difficulty is to handle the communication efficiency (i.e., the number of communications per iteration) while exchanging the model parameters during aggregation. Therefore, a key challenge in FL is how users participate to build a high-quality global model with communication efficiency. We tackle this issue by formulating a utility maximization problem, and propose a novel crowdsourcing framework to leverage FL that considers the communication efficiency during parameter exchange. First, we show an incentive-based interaction between the crowdsourcing platform and the participating clients' independent strategies for training a global learning model, where each side maximizes its own benefit. We formulate a two-stage Stackelberg game to analyze such a scenario and find the game's equilibria. Second, we formalize an admission control scheme for participating clients to ensure a level of local accuracy. Simulation results demonstrate the efficacy of our proposed solution with up to 22% gain in the offered reward.
Submitted 2 February, 2020; v1 submitted 4 November, 2019;
originally announced November 2019.
-
Ruin Theory for Dynamic Spectrum Allocation in LTE-U Networks
Authors:
Aunas Manzoor,
Nguyen H. Tran,
Walid Saad,
S. M. Ahsan Kazmi,
Shashi Raj Pandey,
Choong Seon Hong
Abstract:
LTE in the unlicensed band (LTE-U) is a promising solution to overcome the scarcity of the wireless spectrum. However, to reap the benefits of LTE-U, it is essential to maintain its effective coexistence with WiFi systems. Such a coexistence, hence, constitutes a major challenge for LTE-U deployment. In this paper, the problem of unlicensed spectrum sharing between WiFi and LTE-U systems is studied. In particular, a fair time sharing model based on ruin theory is proposed to share redundant spectral resources from the unlicensed band with LTE-U without jeopardizing the performance of the WiFi system. Fairness between WiFi and LTE-U is maintained by applying the concept of the probability of ruin. In particular, the probability of ruin is used to perform efficient duty-cycle allocation in LTE-U, so as to provide fairness to the WiFi system and maintain a certain level of WiFi performance. Simulation results show that the proposed ruin-based algorithm provides better fairness to the WiFi system as compared to equal duty-cycle sharing between WiFi and LTE-U.
Submitted 10 December, 2018;
originally announced December 2018.
-
Enabling End-to-End Secure Connectivity for Low-Power IoT Devices with UAVs
Authors:
Archana Rajakaruna,
Ahsan Manzoor,
Pawani Porambage,
Madhusanka Liyanage,
Mika Ylianttila,
Andrei Gurtov
Abstract:
The proliferation of Internet of Things (IoT) technologies has strengthened the self-monitoring and autonomous characteristics of the sensor networks deployed in numerous application areas. Recent developments in edge computing paradigms have also enabled on-site processing and management capabilities in sensor networks. In this paper, we introduce a system model that enables end-to-end secure connectivity between low-power IoT devices and UAVs, which helps to manage the data processing tasks of heterogeneous wireless sensor networks. The performance of the proposed solution is analyzed using simulation results. Moreover, to demonstrate the practical usability of the proposed solution, a prototype implementation is presented using commercial off-the-shelf devices.
Submitted 14 March, 2019; v1 submitted 10 November, 2018;
originally announced November 2018.
-
Blockchain based Proxy Re-Encryption Scheme for Secure IoT Data Sharing
Authors:
Ahsan Manzoor,
Madhsanka Liyanage,
An Braeken,
Salil S. Kanhere,
Mika Ylianttila
Abstract:
Data is central to the Internet of Things (IoT) ecosystem. Most current IoT systems use centralized cloud-based data sharing systems, which will be difficult to scale up to meet the demands of future IoT systems. The involvement of such a third-party service provider also requires trust from both the sensor owner and the sensor data user. Moreover, fees need to be paid for their services. To tackle both the scalability and trust issues and to automate the payments, this paper presents a blockchain-based proxy re-encryption scheme. The system stores the IoT data in a distributed cloud after encryption. To share the collected IoT data, the system establishes runtime dynamic smart contracts between the sensor and the data user without the involvement of a trusted third party. It also uses a very efficient proxy re-encryption scheme, which ensures that the data is visible only to the owner and the person present in the smart contract. This novel combination of smart contracts with proxy re-encryption provides an efficient, fast and secure platform for storing, trading and managing sensor data. The proposed system is implemented in an Ethereum-based testbed to analyze the performance and the security properties.
Submitted 8 March, 2019; v1 submitted 6 November, 2018;
originally announced November 2018.
-
A 3D Game Theoretical Framework for the Evaluation of Unmanned Aircraft Systems Airspace Integration Concepts
Authors:
Negin Musavi,
Ayman Manzoor,
Yildiray Yildiz
Abstract:
Predicting the outcomes of integrating Unmanned Aerial Systems (UAS) into the National Airspace System (NAS) is a complex problem that must be addressed by simulation studies before allowing the routine access of UAS into the NAS. This paper focuses on providing a 3-dimensional (3D) simulation framework using a game theoretical methodology to evaluate integration concepts using scenarios where manned and unmanned air vehicles co-exist. In the proposed method, the interactive decision-making process of human pilots is incorporated into airspace models, which can fill the gap in the literature where the pilot behavior is generally assumed to be known a priori. The proposed human pilot behavior is modeled using the dynamic level-k reasoning concept and approximate reinforcement learning. The level-k reasoning concept is a notion in game theory and is based on the assumption that humans have various levels of decision making. In the conventional "static" approach, each agent makes assumptions about his or her opponents and chooses his or her actions accordingly. On the other hand, in dynamic level-k reasoning, agents can update their beliefs about their opponents and revise their level-k rule. In this study, Neural Fitted Q Iteration, which is an approximate reinforcement learning method, is used to model time-extended decisions of pilots with 3D maneuvers. An analysis of UAS integration is conducted using an example 3D scenario in the presence of manned aircraft and fully autonomous UAS equipped with sense and avoid algorithms.
Submitted 27 February, 2018; v1 submitted 20 February, 2018;
originally announced February 2018.
-
A Delay-Tolerant Payment Scheme Based on the Ethereum Blockchain
Authors:
Yining Hu,
Ahsan Manzoor,
Parinya Ekparinya,
Madhusanka Liyanage,
Kanchana Thilakarathna,
Guillaume Jourjon,
Aruna Seneviratne,
Mika E Ylianttila
Abstract:
Banking as an essential service can be hard to access in remote, rural regions where the network connectivity is intermittent. Although micro-banking has been made possible by SMS or USSD messages in some places, their security flaws and session-based nature prevent wider adoption. Global level cryptocurrencies enable low-cost, secure and pervasive money transferring among distributed peers, but are still limited in their ability to reach more people in remote communities.
We proposed to take advantage of the delay-tolerant nature of blockchains to deliver banking services to remote communities that only connect to the broader Internet intermittently. Using a base station that offers connectivity within the local area, regular transaction processing is solely handled by blockchain miners. The bank only joins to process currency exchange requests, reward miners and track user balances when the connection is available. By distributing the verification and storage tasks among peers, our system design saves on the overall deployment and operational costs without sacrificing reliability and trustworthiness. Through theoretical and empirical analysis, we provided insights into system design, tested its robustness against network disturbances, and demonstrated the feasibility of implementation on off-the-shelf computers and mobile devices.
Submitted 30 January, 2018;
originally announced January 2018.
-
Fast Memory-efficient Anomaly Detection in Streaming Heterogeneous Graphs
Authors:
Emaad A. Manzoor,
Sadegh Momeni,
Venkat N. Venkatakrishnan,
Leman Akoglu
Abstract:
Given a stream of heterogeneous graphs containing different types of nodes and edges, how can we spot anomalous ones in real-time while consuming bounded memory? This problem is motivated by and generalizes from its application in security to host-level advanced persistent threat (APT) detection. We propose StreamSpot, a clustering based anomaly detection approach that addresses challenges on two key fronts: (1) heterogeneity, and (2) streaming nature. We introduce a new similarity function for heterogeneous graphs that compares two graphs based on their relative frequency of local substructures, represented as short strings. This function lends itself to a vector representation of a graph, which is (a) fast to compute, and (b) amenable to a sketched version with bounded size that preserves similarity. StreamSpot exhibits the properties that a streaming application requires. It is (i) fully streaming, processing the stream one edge at a time as it arrives; (ii) memory-efficient, requiring constant space for the sketches and the clustering; (iii) fast, taking constant time to update the graph sketches and the cluster summaries, and able to process over 100K edges per second; and (iv) online, scoring and flagging anomalies in real time. Experiments on datasets containing simulated system-call flow graphs from normal browser activity and various attack scenarios (ground truth) show that our proposed StreamSpot is high-performance, achieving above 95% detection accuracy with small delay, as well as competitive time and memory usage.
Submitted 22 February, 2016; v1 submitted 15 February, 2016;
originally announced February 2016.
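The similarity idea in the abstract above, comparing graphs by the relative frequency of local substructures encoded as short strings, can be sketched in Python as follows. This is only an illustration under stated assumptions: the substructure strings ("shingles") are taken as already extracted, cosine similarity over their count vectors is an assumed choice of measure, and the bounded-size sketching step is omitted. The function name is hypothetical.

```python
from collections import Counter
from math import sqrt


def shingle_similarity(shingles_a, shingles_b):
    """Cosine similarity between shingle-frequency vectors of two graphs.

    `shingles_a` and `shingles_b` are lists of short strings, each standing
    for one local substructure of a graph (extraction is assumed done).
    """
    freq_a, freq_b = Counter(shingles_a), Counter(shingles_b)
    # Only shingles present in both graphs contribute to the dot product.
    dot = sum(freq_a[s] * freq_b[s] for s in freq_a.keys() & freq_b.keys())
    norm = (sqrt(sum(v * v for v in freq_a.values()))
            * sqrt(sum(v * v for v in freq_b.values())))
    return dot / norm if norm else 0.0
```

Two graphs with the same shingle frequencies score close to 1, and graphs sharing no shingles score 0, so a new graph that is dissimilar to every known cluster could be flagged as anomalous.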