-
Energy Efficiency Maximization for CR-NOMA based Smart Grid Communication Network
Authors:
Mubashar Sarfraz,
Sheraz Alam,
Sajjad A. Ghauri,
Asad Mahmood
Abstract:
Managing massive data flows effectively and resolving spectrum shortages are two challenges that Smart Grid Communication Networks (SGCN) must overcome. To address these problems, we provide a joint optimization approach that makes use of Cognitive Radio (CR) and Non-Orthogonal Multiple Access (NOMA) technologies. Our work focuses on using user pairing (UP) and power allocation (PA) techniques to maximize energy efficiency (EE) in SGCN, particularly within Neighbourhood Area Networks (NANs). We formulate a joint optimization problem that takes into account the real-world limitations of a CR-NOMA setting. This problem is NP-hard, nonlinear, and nonconvex. To manage its computational complexity, we use the Block Coordinate Descent (BCD) method, which breaks the problem into UP and PA subproblems. First, we propose the Zebra-Optimization User Pairing (ZOUP) algorithm to tackle the UP problem; it outperforms both Orthogonal Multiple Access (OMA) and non-optimized NOMA (UPWO) by 78.8% and 13.6%, respectively, at an SNR of 15 dB. Based on the ZOUP pairs, we then propose a PA approach, ZOUPPA, which significantly outperforms UPWO and ZOUP by 53.2% and 25.4%, respectively, at an SNR of 15 dB. A detailed analysis of key parameters, including SNR, power allocation constants, path loss exponents, user density, channel availability, and coverage radius, underscores the superiority of our approach. By facilitating the effective use of communication resources in SGCN, our research opens the door to more intelligent and energy-efficient grid systems. Our work tackles important issues in SGCN and lays the groundwork for future developments in smart grid communication technologies by combining modern optimization approaches with CR-NOMA.
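To make the EE objective concrete, here is a minimal sketch of one PA step for a single user pair, assuming the textbook two-user downlink NOMA model with successive interference cancellation (SIC). This is not ZOUP or ZOUPPA: the circuit power (0.1 W), the far-user rate floor `r_min`, and the grid search are hypothetical choices for illustration, and CR constraints such as interference limits toward primary users are not modelled. EE is taken as sum rate divided by total consumed power.

```python
import math

def noma_pair_rates(p_total, alpha, g_near, g_far, noise=1.0):
    """Rates (bit/s/Hz) for a two-user downlink NOMA pair (textbook model, assumed)."""
    p_far = alpha * p_total           # the weak (far) user gets the larger power share
    p_near = (1.0 - alpha) * p_total
    # Far user decodes its own signal, treating the near user's signal as interference.
    r_far = math.log2(1.0 + p_far * g_far / (p_near * g_far + noise))
    # Near user removes the far user's signal via SIC, then decodes interference-free.
    r_near = math.log2(1.0 + p_near * g_near / noise)
    return r_near, r_far

def energy_efficiency(rates, p_tx, p_circuit=0.1):
    """EE = sum rate / (transmit power + hypothetical circuit power)."""
    return sum(rates) / (p_tx + p_circuit)

def best_power_split(p_total, g_near, g_far, r_min, steps=100):
    """Grid search over alpha in (0.5, 1) for the highest EE that meets the far user's rate floor."""
    best = None
    for k in range(1, steps):
        alpha = 0.5 + 0.5 * k / steps
        rates = noma_pair_rates(p_total, alpha, g_near, g_far)
        if rates[1] < r_min:
            continue  # far user's QoS constraint violated
        ee = energy_efficiency(rates, p_total)
        if best is None or ee > best[1]:
            best = (alpha, ee)
    return best  # (alpha, EE), or None if no split is feasible

result = best_power_split(10.0, 1.0, 0.1, 0.5)
```

With the total transmit power fixed, EE in this model simply tracks the sum rate, which grows as more power goes to the strong user; the rate floor is what makes the split non-trivial (here the search settles at alpha = 0.59). In a BCD scheme, a step like this would alternate with a pairing step until the objective stops improving.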
Submitted 5 May, 2025;
originally announced May 2025.
-
Referring Atomic Video Action Recognition
Authors:
Kunyu Peng,
Jia Fu,
Kailun Yang,
Di Wen,
Yufan Chen,
Ruiping Liu,
Junwei Zheng,
Jiaming Zhang,
M. Saquib Sarfraz,
Rainer Stiefelhagen,
Alina Roitberg
Abstract:
We introduce a new task called Referring Atomic Video Action Recognition (RAVAR), aimed at identifying the atomic actions of a particular person based on a textual description and the video data of this person. This task differs from traditional action recognition and localization, where predictions are delivered for all present individuals. In contrast, we focus on recognizing the correct atomic action of a specific individual, guided by text. To explore this task, we present the RefAVA dataset, containing 36,630 instances with manually annotated textual descriptions of the individuals. To establish a strong initial benchmark, we implement and validate baselines from various domains, e.g., atomic action localization, video question answering, and text-video retrieval. Since these existing methods underperform on RAVAR, we introduce RefAtomNet, a novel cross-stream attention-driven method specialized for the unique challenges of RAVAR: the need to interpret a textual referring expression for the targeted individual, use this reference to guide spatial localization, and predict the atomic actions of the referred person. The key ingredients are: (1) a multi-stream architecture that connects video, text, and a new location-semantic stream, and (2) cross-stream agent attention fusion and agent token fusion, which amplify the most relevant information across these streams and consistently surpass standard attention-based fusion on RAVAR. Extensive experiments demonstrate the effectiveness of RefAtomNet and its building blocks for recognizing the action of the described individual. The dataset and code will be made publicly available at https://github.com/KPeng9510/RAVAR.
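The core idea of letting text guide which person is attended to can be illustrated with plain scaled dot-product cross-attention, where text tokens act as queries over per-person visual tokens. This is a generic sketch, not RefAtomNet's agent attention or agent token fusion; the toy vectors and the "one token per person" layout are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query returns a weighted mix of the values."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)])
    return outputs

# Hypothetical toy data: one text token and one visual token per person.
text_query = [[4.0, 0.0]]
person_keys = [[4.0, 0.0], [0.0, 4.0]]   # person 0 matches the description
person_values = [[1.0, 0.0], [0.0, 1.0]]
fused = cross_attention(text_query, person_keys, person_values)
```

Because the text query aligns with person 0's key, almost all of the attention weight (and hence the fused feature) comes from that person.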
Submitted 10 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
Fourier Prompt Tuning for Modality-Incomplete Scene Segmentation
Authors:
Ruiping Liu,
Jiaming Zhang,
Kunyu Peng,
Yufan Chen,
Ke Cao,
Junwei Zheng,
M. Saquib Sarfraz,
Kailun Yang,
Rainer Stiefelhagen
Abstract:
Integrating information from multiple modalities enhances the robustness of scene perception systems in autonomous vehicles, providing a more comprehensive and reliable sensory framework. However, modality incompleteness in multi-modal segmentation remains under-explored. In this work, we establish a task called Modality-Incomplete Scene Segmentation (MISS), which encompasses both system-level modality absence and sensor-level modality errors. To avoid over-reliance on a predominant modality in multi-modal fusion, we introduce a Missing-aware Modal Switch (MMS) strategy to proactively manage missing modalities during training. Utilizing bit-level batch-wise sampling enhances the model's performance in both complete and incomplete testing scenarios. Furthermore, we introduce the Fourier Prompt Tuning (FPT) method to incorporate representative spectral information into a limited number of learnable prompts that maintain robustness against all MISS scenarios. FPT achieves effects akin to fine-tuning with far fewer tunable parameters (1.1%). Extensive experiments demonstrate the efficacy of our proposed approach, showing an improvement of 5.84% mIoU over the prior state-of-the-art parameter-efficient methods in the missing-modality setting. The source code is publicly available at https://github.com/RuipingL/MISS.
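One plausible reading of "bit-level batch-wise sampling" is to draw a single random bit pattern per batch, where bit i decides whether modality i is kept. The sketch below is only that assumption. It is not the released MMS code, and the modality names in the comment are hypothetical. Excluding the all-zero pattern guarantees that at least one modality survives in every batch.

```python
import random

def sample_modality_mask(num_modalities, rng):
    """Draw one keep/drop mask for a whole batch from a random bit pattern (assumed scheme)."""
    # randint is inclusive on both ends; starting at 1 excludes "all modalities missing".
    code = rng.randint(1, 2 ** num_modalities - 1)
    return [bool((code >> i) & 1) for i in range(num_modalities)]

def apply_mask(modality_features, mask):
    """Zero out every dropped modality (one feature list per modality)."""
    return [feats if keep else [0.0] * len(feats)
            for feats, keep in zip(modality_features, mask)]

rng = random.Random(0)
batch = [[0.2, 0.7], [1.5, -0.3], [0.9, 0.1]]  # e.g. RGB, depth, LiDAR features (hypothetical)
mask = sample_modality_mask(len(batch), rng)
masked_batch = apply_mask(batch, mask)
```

Sampling the whole pattern at once makes every non-empty combination of modalities equally likely, so the model sees complete inputs as well as each kind of missing-modality input during training.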
Submitted 10 April, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
Navigating Open Set Scenarios for Skeleton-based Action Recognition
Authors:
Kunyu Peng,
Cheng Yin,
Junwei Zheng,
Ruiping Liu,
David Schneider,
Jiaming Zhang,
Kailun Yang,
M. Saquib Sarfraz,
Rainer Stiefelhagen,
Alina Roitberg
Abstract:
In real-world scenarios, human actions often fall outside the distribution of training data, making it crucial for models to recognize known actions and reject unknown ones. However, using pure skeleton data in such open-set conditions poses challenges due to the lack of visual background cues and the distinct sparse structure of body pose sequences. In this paper, we tackle the unexplored Open-Set Skeleton-based Action Recognition (OS-SAR) task and formalize the benchmark on three skeleton-based datasets. We assess the performance of seven established open-set approaches on our task and identify their limits and critical generalization issues when dealing with skeleton information. To address these challenges, we propose a distance-based cross-modality ensemble method that leverages the cross-modal alignment of skeleton joints, bones, and velocities to achieve superior open-set recognition performance. We refer to the key idea as CrossMax, an approach that utilizes a novel cross-modality mean max discrepancy suppression mechanism to align latent spaces during training and a cross-modality distance-based logits refinement method during testing. CrossMax outperforms existing approaches and consistently yields state-of-the-art results across all datasets and backbones. The benchmark, code, and models will be released at https://github.com/KPeng9510/OS-SAR.
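A distance-based cross-modality ensemble can be pictured as follows. Each modality (joints, bones, velocities) yields an embedding, the distances to per-class prototypes are averaged across modalities, and the sample is rejected as unknown when even the closest class is too far away. This is a simplified sketch under those assumptions, not CrossMax's logits refinement. The prototypes, class names, and threshold are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def open_set_predict(embeddings, prototypes, threshold):
    """embeddings: {modality: vector}; prototypes: {modality: {class: vector}}."""
    classes = list(next(iter(prototypes.values())).keys())
    # Ensemble: average each class's distance over all modality streams.
    avg_dist = {
        c: sum(euclidean(embeddings[m], prototypes[m][c]) for m in embeddings) / len(embeddings)
        for c in classes
    }
    best = min(avg_dist, key=avg_dist.get)
    return best if avg_dist[best] <= threshold else "unknown"

# Hypothetical 2-D prototypes for two known actions in two modality streams.
prototypes = {
    "joint": {"wave": [0.0, 0.0], "jump": [10.0, 10.0]},
    "bone": {"wave": [0.0, 0.0], "jump": [10.0, 10.0]},
}
known = open_set_predict({"joint": [0.1, 0.0], "bone": [0.0, 0.2]}, prototypes, threshold=1.0)
novel = open_set_predict({"joint": [5.0, -20.0], "bone": [5.0, -20.0]}, prototypes, threshold=1.0)
```

The first sample lies next to the "wave" prototypes in both streams and is accepted; the second is far from every known class and is rejected as unknown.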
Submitted 11 December, 2023;
originally announced December 2023.
-
Exploring Few-Shot Adaptation for Activity Recognition on Diverse Domains
Authors:
Kunyu Peng,
Di Wen,
David Schneider,
Jiaming Zhang,
Kailun Yang,
M. Saquib Sarfraz,
Rainer Stiefelhagen,
Alina Roitberg
Abstract:
Domain adaptation is essential for activity recognition to ensure accurate and robust performance across diverse environments, sensor types, and data sources. Unsupervised domain adaptation methods have been extensively studied, yet they require large-scale unlabeled data from the target domain. In this work, we focus on Few-Shot Domain Adaptation for Activity Recognition (FSDA-AR), which leverages a very small amount of labeled target videos to achieve effective adaptation. This approach is appealing for applications because it needs only a few, or even one, labeled example per class in the target domain, which makes it ideal for recognizing rare but critical activities. However, existing FSDA-AR works mostly focus on domain adaptation for sports videos, where domain diversity is limited. We propose a new FSDA-AR benchmark using five established datasets that targets adaptation across more diverse and challenging domains. Our results demonstrate that FSDA-AR performs comparably to unsupervised domain adaptation with significantly fewer labeled target domain samples. We further propose a novel approach, RelaMiX, to better leverage the few labeled target domain samples as knowledge guidance. RelaMiX encompasses a temporal relational attention network with relation dropout, alongside a cross-domain information alignment mechanism. Furthermore, it integrates a mechanism for mixing features within a latent space by using the few-shot target domain samples. The proposed RelaMiX solution achieves state-of-the-art performance on all datasets within the FSDA-AR benchmark. To encourage future research on few-shot domain adaptation for activity recognition, our code will be publicly available at https://github.com/KPeng9510/RelaMiX.
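Mixing features in a latent space with the few labeled target samples can be illustrated with mixup-style interpolation between a source feature and a target feature, with the labels mixed by the same weight. This is a generic mixup sketch. The Beta(0.4, 0.4) weight and the toy features are assumptions, and the exact mixing rule inside RelaMiX may differ.

```python
import random

def mix(a, b, lam):
    """Element-wise convex combination lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]

def mix_source_target(src_feat, src_label, tgt_feat, tgt_label, rng, alpha=0.4):
    # Mixing weight drawn from Beta(alpha, alpha), as in standard mixup (assumed here).
    lam = rng.betavariate(alpha, alpha)
    return mix(src_feat, tgt_feat, lam), mix(src_label, tgt_label, lam), lam

rng = random.Random(1)
# Hypothetical latent features with one-hot labels for two activity classes.
feat, label, lam = mix_source_target([1.0, 1.0], [1.0, 0.0], [3.0, 3.0], [0.0, 1.0], rng)
```

Because the labels use the same weight as the features, the mixed sample stays consistent: a feature that is 70% source-like carries a label that is 70% the source class.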
Submitted 27 April, 2024; v1 submitted 15 May, 2023;
originally announced May 2023.
-
Towards Activated Muscle Group Estimation in the Wild
Authors:
Kunyu Peng,
David Schneider,
Alina Roitberg,
Kailun Yang,
Jiaming Zhang,
Chen Deng,
Kaiyu Zhang,
M. Saquib Sarfraz,
Rainer Stiefelhagen
Abstract:
In this paper, we tackle the new task of video-based Activated Muscle Group Estimation (AMGE), which aims to identify active muscle regions during physical activity in the wild. To this end, we provide the MuscleMap dataset, featuring >15K video clips with 135 different activities and 20 labeled muscle groups. This dataset opens up multiple video-based applications in sports and rehabilitation medicine under flexible environment constraints. The proposed MuscleMap dataset is constructed from YouTube videos, specifically targeting High-Intensity Interval Training (HIIT) physical exercise in the wild. To make the AMGE model applicable in real-life situations, it is crucial to ensure that the model generalizes well to numerous types of physical activities that are not present during training and that involve new combinations of activated muscles. To achieve this, our benchmark also covers an evaluation setting where the model is exposed to activity types excluded from the training set. Our experiments reveal that the generalizability of existing architectures adapted for the AMGE task remains a challenge. Therefore, we also propose a new approach, TransM3E, which employs a multi-modality feature fusion mechanism between a video transformer model and a skeleton-based graph convolution model, with novel cross-modal knowledge distillation executed on multi-classification tokens. The proposed method surpasses all popular video classification models when dealing with both previously seen and new types of physical activities. The database and code can be found at https://github.com/KPeng9510/MuscleMap.
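Because several muscle groups can be active at once, AMGE is a multi-label problem. Cross-modal knowledge distillation can therefore be sketched as a per-group binary KL divergence between temperature-softened sigmoid outputs of a teacher stream and a student stream. This is a generic sketch applied to logits. It is not TransM3E's distillation on multi-classification tokens, and the temperature and toy logits are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def multilabel_distillation_loss(student_logits, teacher_logits, temperature=2.0, eps=1e-7):
    """Mean binary KL(teacher || student) over labels, on temperature-softened sigmoids."""
    total = 0.0
    for s, t in zip(student_logits, teacher_logits):
        # Clamp so log() never sees 0 when a sigmoid saturates.
        p_t = min(max(sigmoid(t / temperature), eps), 1.0 - eps)
        p_s = min(max(sigmoid(s / temperature), eps), 1.0 - eps)
        total += p_t * math.log(p_t / p_s) + (1.0 - p_t) * math.log((1.0 - p_t) / (1.0 - p_s))
    return total / len(student_logits)

# Hypothetical logits for three muscle groups from two modality streams.
teacher = [3.0, -2.0, 0.5]   # e.g. video-transformer stream
student = [1.0, -1.0, 0.0]   # e.g. skeleton-GCN stream
loss = multilabel_distillation_loss(student, teacher)
```

The loss is zero when the student reproduces the teacher's per-group activation probabilities and grows as they diverge. Treating each muscle group as its own binary distribution avoids forcing the groups to compete in a single softmax.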
Submitted 5 August, 2024; v1 submitted 1 March, 2023;
originally announced March 2023.
-
Content and Colour Distillation for Learning Image Translations with the Spatial Profile Loss
Authors:
M. Saquib Sarfraz,
Constantin Seibold,
Haroon Khalid,
Rainer Stiefelhagen
Abstract:
Generative adversarial networks have emerged as a de facto standard for image translation problems. To successfully drive such models, one has to rely on additional networks, e.g., discriminators and/or perceptual networks. Training these networks with pixel-based losses alone is generally not sufficient to learn the target distribution. In this paper, we propose a novel method of computing the loss directly between the source and target images that enables proper distillation of shape/content and colour/style. We show that this is useful in typical image-to-image translations, allowing us to successfully drive the generator without relying on additional networks. We demonstrate this on many difficult image translation problems, such as image-to-image domain mapping, single image super-resolution, and photorealistic makeup transfer. Our extensive evaluation shows the effectiveness of the proposed formulation and its ability to synthesize realistic images. [Code release: https://github.com/ssarfraz/SPL]
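A loss computed directly between two images, in the spirit of a "spatial profile", can be sketched by comparing corresponding rows and corresponding columns with cosine similarity. This is a simplified, single-channel illustration under that assumption. It is not the released SPL implementation, and the paper's colour/style terms are not modelled. The toy images are hypothetical.

```python
import math

def _cosine(a, b, eps=1e-8):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)

def profile_loss(x, y):
    """x, y: single-channel images of the same shape, given as lists of rows."""
    row_sim = sum(_cosine(rx, ry) for rx, ry in zip(x, y)) / len(x)
    cols_x, cols_y = list(zip(*x)), list(zip(*y))
    col_sim = sum(_cosine(cx, cy) for cx, cy in zip(cols_x, cols_y)) / len(cols_x)
    # 0 when every row and column profile points the same way; at most 4.
    return (1.0 - row_sim) + (1.0 - col_sim)

img = [[1.0, 2.0], [3.0, 4.0]]
flipped = [[4.0, 3.0], [2.0, 1.0]]
same = profile_loss(img, img)
different = profile_loss(img, flipped)
```

Because it compares normalized profiles, this term reacts to changes in spatial structure rather than to a uniform rescaling of intensities, so in practice it would need a separate term for colour.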
Submitted 1 August, 2019;
originally announced August 2019.