-
A federated large language model for long-term time series forecasting
Authors:
Raed Abdel-Sater,
A. Ben Hamza
Abstract:
Long-term time series forecasting in centralized environments poses unique challenges regarding data privacy, communication overhead, and scalability. To address these challenges, we propose FedTime, a federated large language model (LLM) tailored for long-range time series prediction. Specifically, we introduce a federated pre-trained LLM with fine-tuning and alignment strategies. Prior to the learning process, we employ K-means clustering to partition edge devices or clients into distinct clusters, thereby facilitating more focused model training. We also incorporate channel independence and patching to better preserve local semantic information, ensuring that important contextual details are retained while minimizing the risk of information loss. We demonstrate the effectiveness of our FedTime model through extensive experiments on various real-world forecasting benchmarks, showcasing substantial improvements over recent approaches. In addition, we demonstrate the efficiency of FedTime in streamlining resource usage, resulting in reduced communication overhead.
Submitted 29 July, 2024;
originally announced July 2024.
-
Flexible graph convolutional network for 3D human pose estimation
Authors:
Abu Taib Mohammed Shahjahan,
A. Ben Hamza
Abstract:
Although graph convolutional networks exhibit promising performance in 3D human pose estimation, their reliance on one-hop neighbors limits their ability to capture high-order dependencies among body joints, crucial for mitigating uncertainty arising from occlusion or depth ambiguity. To tackle this limitation, we introduce Flex-GCN, a flexible graph convolutional network designed to learn graph representations that capture broader global information and dependencies. At its core is the flexible graph convolution, which aggregates features from both immediate and second-order neighbors of each node, while maintaining the same time and memory complexity as the standard convolution. Our network architecture comprises residual blocks of flexible graph convolutional layers, as well as a global response normalization layer for global feature aggregation, normalization and calibration. Quantitative and qualitative results demonstrate the effectiveness of our model, achieving competitive performance on benchmark datasets.
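The idea of aggregating over both immediate and second-order neighbors can be sketched as follows. This is a minimal illustration, not the Flex-GCN layer itself: the row normalization with self-loops, the mixing weight `beta`, and the helper names are all assumptions made for the example, and learnable weight matrices are omitted.

```python
def row_normalize(adj):
    """Add self-loops and scale each row to sum to 1 (an assumed normalization)."""
    n = len(adj)
    out = []
    for i in range(n):
        row = [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        total = sum(row)
        out.append([v / total for v in row])
    return out


def matmul(a, b):
    """Dense matrix product on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def two_hop_aggregate(adj, feats, beta=0.5):
    """Combine one-hop and two-hop aggregated features (hypothetical helper).

    The two-hop term applies the normalized adjacency twice to the features,
    so the squared adjacency matrix is never formed explicitly.
    """
    a_hat = row_normalize(adj)
    one_hop = matmul(a_hat, feats)
    two_hop = matmul(a_hat, one_hop)
    return [[h1 + beta * h2 for h1, h2 in zip(r1, r2)]
            for r1, r2 in zip(one_hop, two_hop)]
```

On a three-joint chain 0-1-2 with a signal only on joint 0, the one-hop term leaves joint 2 at zero, while the two-hop term lets it receive information from joint 0.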
Submitted 26 July, 2024;
originally announced July 2024.
-
PEEKABOO: Hiding parts of an image for unsupervised object localization
Authors:
Hasib Zunair,
A. Ben Hamza
Abstract:
Localizing objects in an unsupervised manner poses significant challenges due to the absence of key visual information such as the appearance, type and number of objects, as well as the lack of labeled object classes typically available in supervised settings. While recent approaches to unsupervised object localization have demonstrated significant progress by leveraging self-supervised visual representations, they often require computationally intensive training processes, resulting in high resource demands in terms of computation, learnable parameters, and data. They also lack explicit modeling of visual context, potentially limiting their accuracy in object localization. To tackle these challenges, we propose a single-stage learning framework, dubbed PEEKABOO, for unsupervised object localization by learning context-based representations at both the pixel- and shape-level of the localized objects through image masking. The key idea is to selectively hide parts of an image and leverage the remaining image information to infer the location of objects without explicit supervision. The experimental results, both quantitative and qualitative, across various benchmark datasets, demonstrate the simplicity, effectiveness and competitive performance of our approach compared to state-of-the-art methods in both single object discovery and unsupervised salient object detection tasks. Code and pre-trained models are available at: https://github.com/hasibzunair/peekaboo
Submitted 24 July, 2024;
originally announced July 2024.
-
Multi-hop graph transformer network for 3D human pose estimation
Authors:
Zaedul Islam,
A. Ben Hamza
Abstract:
Accurate 3D human pose estimation is a challenging task due to occlusion and depth ambiguity. In this paper, we introduce a multi-hop graph transformer network designed for 2D-to-3D human pose estimation in videos by leveraging the strengths of multi-head self-attention and multi-hop graph convolutional networks with disentangled neighborhoods to capture spatio-temporal dependencies and handle long-range interactions. The proposed network architecture consists of a graph attention block composed of stacked layers of multi-head self-attention and graph convolution with learnable adjacency matrix, and a multi-hop graph convolutional block comprised of multi-hop convolutional and dilated convolutional layers. The combination of multi-head self-attention and multi-hop graph convolutional layers enables the model to capture both local and global dependencies, while the integration of dilated convolutional layers enhances the model's ability to handle spatial details required for accurate localization of the human body joints. Extensive experiments demonstrate the effectiveness and generalization ability of our model, achieving competitive performance on benchmark datasets.
Submitted 5 May, 2024;
originally announced May 2024.
-
RSUD20K: A Dataset for Road Scene Understanding In Autonomous Driving
Authors:
Hasib Zunair,
Shakib Khan,
A. Ben Hamza
Abstract:
Road scene understanding is crucial in autonomous driving, enabling machines to perceive the visual environment. However, recent object detectors tailored for learning on datasets collected from certain geographical locations struggle to generalize across different locations. In this paper, we present RSUD20K, a new dataset for road scene understanding, comprising over 20K high-resolution images from the driving perspective on Bangladesh roads and 130K bounding box annotations for 13 objects. This challenging dataset encompasses diverse road scenes, narrow streets and highways, featuring objects from different viewpoints and scenes from crowded environments with densely cluttered objects and various weather conditions. Our work significantly improves upon previous efforts, providing detailed annotations and increased object complexity. We thoroughly examine the dataset, benchmarking various state-of-the-art object detectors and exploring large vision models as image annotators.
Submitted 9 February, 2024; v1 submitted 14 January, 2024;
originally announced January 2024.
-
Adaptive spectral graph wavelets for collaborative filtering
Authors:
Osama Alshareet,
A. Ben Hamza
Abstract:
Collaborative filtering is a popular approach in recommender systems, whose objective is to provide personalized item suggestions to potential users based on their purchase or browsing history. However, personalized recommendations require a considerable amount of behavioral data on users, which is usually unavailable for new users, giving rise to the cold-start problem. To help alleviate this challenging problem, we introduce a spectral graph wavelet collaborative filtering framework for implicit feedback data, where users, items and their interactions are represented as a bipartite graph. Specifically, we first propose an adaptive transfer function by leveraging a power transform with the goal of stabilizing the variance of graph frequencies in the spectral domain. Then, we design a deep recommendation model for efficient learning of low-dimensional embeddings of users and items using spectral graph wavelets in an end-to-end fashion. In addition to capturing the graph's local and global structures, our approach yields localization of graph signals in both spatial and spectral domains, and hence not only learns discriminative representations of users and items, but also promotes the recommendation quality. The effectiveness of our proposed model is demonstrated through extensive experiments on real-world benchmark datasets, achieving better recommendation performance compared with strong baseline methods.
Submitted 5 December, 2023;
originally announced December 2023.
-
Classification of developmental and brain disorders via graph convolutional aggregation
Authors:
Ibrahim Salim,
A. Ben Hamza
Abstract:
While graph convolution-based methods have become the de facto standard for graph representation learning, their applications to disease prediction tasks remain quite limited, particularly in the classification of neurodevelopmental and neurodegenerative brain disorders. In this paper, we introduce an aggregator normalization graph convolutional network by leveraging aggregation in graph sampling, as well as skip connections and identity mapping. The proposed model learns discriminative graph node representations by incorporating both imaging and non-imaging features into the graph nodes and edges, respectively, with the aim of augmenting predictive capabilities and providing a holistic perspective on the underlying mechanisms of brain disorders. Skip connections enable the direct flow of information from the input features to later layers of the network, while identity mapping helps maintain the structural information of the graph during feature learning. We benchmark our model against several recent baseline methods on two large datasets, Autism Brain Imaging Data Exchange (ABIDE) and Alzheimer's Disease Neuroimaging Initiative (ADNI), for the prediction of autism spectrum disorder and Alzheimer's disease, respectively. Experimental results demonstrate the competitive performance of our approach in comparison with recent baselines in terms of several evaluation metrics, achieving relative improvements of 50% and 13.56% in classification accuracy over graph convolutional networks on ABIDE and ADNI, respectively.
Submitted 16 November, 2023; v1 submitted 13 November, 2023;
originally announced November 2023.
-
Learning to recognize occluded and small objects with partial inputs
Authors:
Hasib Zunair,
A. Ben Hamza
Abstract:
Recognizing multiple objects in an image is challenging due to occlusions, and becomes even more so when the objects are small. While promising, existing multi-label image recognition models do not explicitly learn context-based representations, and hence struggle to correctly recognize small and occluded objects. Intuitively, recognizing occluded objects requires knowledge of partial input, and hence context. Motivated by this intuition, we propose Masked Supervised Learning (MSL), a single-stage, model-agnostic learning paradigm for multi-label image recognition. The key idea is to learn context-based representations using a masked branch and to model label co-occurrence using label consistency. Experimental results demonstrate the simplicity, applicability and more importantly the competitive performance of MSL against previous state-of-the-art methods on standard multi-label image recognition benchmarks. In addition, we show that MSL is robust to random masking and demonstrate its effectiveness in recognizing non-masked objects. Code and pretrained models are available on GitHub.
Submitted 27 October, 2023;
originally announced October 2023.
-
Spatio-temporal MLP-graph network for 3D human pose estimation
Authors:
Tanvir Hassan,
A. Ben Hamza
Abstract:
Graph convolutional networks and their variants have shown significant promise in 3D human pose estimation. Despite their success, most of these methods only consider spatial correlations between body joints and do not take into account temporal correlations, thereby limiting their ability to capture relationships in the presence of occlusions and inherent ambiguity. To address this potential weakness, we propose a spatio-temporal network architecture composed of a joint-mixing multi-layer perceptron block that facilitates communication among different joints and a graph weighted Jacobi network block that enables communication among various feature channels. The major novelty of our approach lies in a new weighted Jacobi feature propagation rule obtained through graph filtering with implicit fairing. We leverage temporal information from the 2D pose sequences, and integrate weight modulation into the model to enable untangling of the feature transformations of distinct nodes. We also employ adjacency modulation with the aim of learning meaningful correlations beyond defined linkages between body joints by altering the graph topology through a learnable modulation matrix. Extensive experiments on two benchmark datasets demonstrate the effectiveness of our model, outperforming recent state-of-the-art methods for 3D human pose estimation.
Submitted 29 August, 2023;
originally announced August 2023.
-
A Graph Encoder-Decoder Network for Unsupervised Anomaly Detection
Authors:
Mahsa Mesgaran,
A. Ben Hamza
Abstract:
A key component of many graph neural networks (GNNs) is the pooling operation, which seeks to reduce the size of a graph while preserving important structural information. However, most existing graph pooling strategies rely on an assignment matrix obtained by employing a GNN layer, which is characterized by trainable parameters, often leading to significant computational complexity and a lack of interpretability in the pooling process. In this paper, we propose an unsupervised graph encoder-decoder model to detect abnormal nodes from graphs by learning an anomaly scoring function to rank nodes based on their degree of abnormality. In the encoding stage, we design a novel pooling mechanism, named LCPool, which leverages locality-constrained linear coding for feature encoding to find a cluster assignment matrix by solving a least-squares optimization problem with a locality regularization term. By enforcing locality constraints during the coding process, LCPool is free from learnable parameters, can efficiently handle large graphs, and can effectively generate a coarser graph representation while retaining the most significant structural characteristics of the graph. In the decoding stage, we propose an unpooling operation, called LCUnpool, to reconstruct both the structure and nodal features of the original graph. We conduct empirical evaluations of our method on six benchmark datasets using several evaluation metrics, and the results demonstrate its superiority over state-of-the-art anomaly detection approaches.
Submitted 15 October, 2023; v1 submitted 15 August, 2023;
originally announced August 2023.
-
Iterative Graph Filtering Network for 3D Human Pose Estimation
Authors:
Zaedul Islam,
A. Ben Hamza
Abstract:
Graph convolutional networks (GCNs) have proven to be an effective approach for 3D human pose estimation. By naturally modeling the skeleton structure of the human body as a graph, GCNs are able to capture the spatial relationships between joints and learn an efficient representation of the underlying pose. However, most GCN-based methods use a shared weight matrix, making it challenging to accurately capture the different and complex relationships between joints. In this paper, we introduce an iterative graph filtering framework for 3D human pose estimation, which aims to predict the 3D joint positions given a set of 2D joint locations in images. Our approach builds upon the idea of iteratively solving graph filtering with Laplacian regularization via the Gauss-Seidel iterative method. Motivated by this iterative solution, we design a Gauss-Seidel network (GS-Net) architecture, which makes use of weight and adjacency modulation, skip connection, and a pure convolutional block with layer normalization. Adjacency modulation facilitates the learning of edges that go beyond the inherent connections of body joints, resulting in an adjusted graph structure that reflects the human skeleton, while skip connections help maintain crucial information from the input layer's initial features as the network depth increases. We evaluate our proposed model on two standard benchmark datasets, and compare it with a comprehensive set of strong baseline methods for 3D human pose estimation. Our experimental results demonstrate that our approach outperforms the baseline methods on both datasets, achieving state-of-the-art performance. Furthermore, we conduct ablation studies to analyze the contributions of different components of our model architecture and show that the skip connection and adjacency modulation help improve the model performance.
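The underlying iteration can be sketched in a few lines. Graph filtering with Laplacian regularization amounts to solving the linear system (I + αL)H = X, and a Gauss-Seidel sweep updates one node at a time using the most recent values of its neighbors. The sketch below assumes the unnormalized Laplacian L = D - A and an adjacency matrix with zero diagonal; the function name and the parameters `alpha` and `sweeps` are illustrative, and the GS-Net layers themselves (weight and adjacency modulation, skip connection) are not modeled.

```python
def gauss_seidel_filter(adj, x, alpha=1.0, sweeps=50):
    """Approximately solve (I + alpha * L) H = X with L = D - A (hypothetical helper).

    adj: symmetric adjacency matrix with zero diagonal (nested lists).
    x:   node features, one row per node.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    h = [list(row) for row in x]  # start from the input signal
    for _ in range(sweeps):
        for i in range(n):
            for c in range(len(x[0])):
                # Neighbor sum uses values already updated in this sweep.
                neigh = sum(adj[i][j] * h[j][c] for j in range(n) if j != i)
                h[i][c] = (x[i][c] + alpha * neigh) / (1.0 + alpha * deg[i])
    return h
```

Because I + αL is strictly diagonally dominant for α > 0, the sweeps converge, and the result is a smoothed version of the input signal over the graph.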
Submitted 7 August, 2023; v1 submitted 29 July, 2023;
originally announced July 2023.
-
Regular Splitting Graph Network for 3D Human Pose Estimation
Authors:
Tanvir Hassan,
A. Ben Hamza
Abstract:
In human pose estimation methods based on graph convolutional architectures, the human skeleton is usually modeled as an undirected graph whose nodes are body joints and edges are connections between neighboring joints. However, most of these methods tend to focus on learning relationships between body joints of the skeleton using first-order neighbors, ignoring higher-order neighbors and hence limiting their ability to exploit relationships between distant joints. In this paper, we introduce a higher-order regular splitting graph network (RS-Net) for 2D-to-3D human pose estimation using matrix splitting in conjunction with weight and adjacency modulation. The core idea is to capture long-range dependencies between body joints using multi-hop neighborhoods and also to learn different modulation vectors for different body joints as well as a modulation matrix added to the adjacency matrix associated with the skeleton. This learnable modulation matrix helps adjust the graph structure by adding extra graph edges in an effort to learn additional connections between body joints. Instead of using a shared weight matrix for all neighboring body joints, the proposed RS-Net model applies weight unsharing before aggregating the feature vectors associated with the joints in order to capture the different relations between them. Experiments and ablation studies performed on two benchmark datasets demonstrate the effectiveness of our model, achieving superior performance over recent state-of-the-art methods for 3D human pose estimation.
Submitted 9 May, 2023;
originally announced May 2023.
-
Masked Supervised Learning for Semantic Segmentation
Authors:
Hasib Zunair,
A. Ben Hamza
Abstract:
Self-attention is of vital importance in semantic segmentation as it enables modeling of long-range context, which translates into improved performance. We argue that it is equally important to model short-range context, especially to tackle cases where not only the regions of interest are small and ambiguous, but also when there exists an imbalance between the semantic classes. To this end, we propose Masked Supervised Learning (MaskSup), an effective single-stage learning paradigm that models both short- and long-range context, capturing the contextual relationships between pixels via random masking. Experimental results demonstrate the competitive performance of MaskSup against strong baselines in both binary and multi-class segmentation tasks on three standard benchmark datasets, particularly at handling ambiguous regions and retaining better segmentation of minority classes with no added inference cost. In addition to segmenting target regions even when large portions of the input are masked, MaskSup is also generic and can be easily integrated into a variety of semantic segmentation methods. We also show that the proposed method is computationally efficient, improving performance by 10% on the mean intersection-over-union (mIoU) while requiring $3\times$ fewer learnable parameters.
Submitted 8 November, 2022; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Fill in Fabrics: Body-Aware Self-Supervised Inpainting for Image-Based Virtual Try-On
Authors:
H. Zunair,
Y. Gobeil,
S. Mercier,
A. Ben Hamza
Abstract:
Previous virtual try-on methods usually focus on aligning a clothing item with a person, limiting their ability to exploit the complex pose, shape and skin color of the person, as well as the overall structure of the clothing, which is vital to photo-realistic virtual try-on. To address this potential weakness, we propose a fill in fabrics (FIFA) model, a self-supervised conditional generative adversarial network-based framework comprised of a Fabricator and a unified virtual try-on pipeline with a Segmenter, Warper and Fuser. The Fabricator aims to reconstruct the clothing image when provided with a masked clothing image as input, and learns the overall structure of the clothing by filling in fabrics. A virtual try-on pipeline is then trained by transferring the learned representations from the Fabricator to Warper in an effort to warp and refine the target clothing. We also propose to use a multi-scale structural constraint to enforce global context at multiple scales while warping the target clothing to better fit the pose and shape of the person. Extensive experiments demonstrate that our FIFA model achieves state-of-the-art results on the standard VITON dataset for virtual try-on of clothing items, and is shown to be effective at handling complex poses and retaining the texture and embroidery of the clothing.
Submitted 8 November, 2022; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Higher-Order Implicit Fairing Networks for 3D Human Pose Estimation
Authors:
Jianning Quan,
A. Ben Hamza
Abstract:
Estimating a 3D human pose has proven to be a challenging task, primarily because of the complexity of the human body joints, occlusions, and variability in lighting conditions. In this paper, we introduce a higher-order graph convolutional framework with initial residual connections for 2D-to-3D pose estimation. Using multi-hop neighborhoods for node feature aggregation, our model is able to capture the long-range dependencies between body joints. Moreover, our approach leverages residual connections, which are integrated by design in our network architecture, ensuring that the learned feature representations retain important information from the initial features of the input layer as the network depth increases. Experiments and ablation studies conducted on two standard benchmarks demonstrate the effectiveness of our model, achieving superior performance over strong baseline methods for 3D human pose estimation.
Submitted 1 November, 2021;
originally announced November 2021.
-
STAR: Noisy Semi-Supervised Transfer Learning for Visual Classification
Authors:
Hasib Zunair,
Yan Gobeil,
Samuel Mercier,
A. Ben Hamza
Abstract:
Semi-supervised learning (SSL) has proven to be effective at leveraging large-scale unlabeled data to mitigate the dependency on labeled data in order to learn better models for visual recognition and classification tasks. However, recent SSL methods rely on unlabeled image data at a scale of billions to work well. This becomes infeasible for tasks with relatively less unlabeled data in terms of runtime, memory and data acquisition. To address this issue, we propose noisy semi-supervised transfer learning, an efficient SSL approach that integrates transfer learning and self-training with noisy student into a single framework, which is tailored for tasks that can leverage unlabeled image data on a scale of thousands. We evaluate our method on both binary and multi-class classification tasks, where the objective is to identify whether an image displays people practicing sports or the type of sport, as well as to identify the pose from a pool of popular yoga poses. Extensive experiments and ablation studies demonstrate that by leveraging unlabeled data, our proposed framework significantly improves visual classification, especially in multi-class classification settings compared to state-of-the-art methods. Moreover, incorporating transfer learning not only improves classification performance, but also requires 6x less compute time and 5x less memory. We also show that our method boosts robustness of visual classification models, even without specifically optimizing for adversarial robustness.
Submitted 18 August, 2021;
originally announced August 2021.
-
Sharp U-Net: Depthwise Convolutional Network for Biomedical Image Segmentation
Authors:
Hasib Zunair,
A. Ben Hamza
Abstract:
The U-Net architecture, built upon the fully convolutional network, has proven to be effective in biomedical image segmentation. However, U-Net applies skip connections to merge semantically different low- and high-level convolutional features, resulting in not only blurred feature maps, but also over- and under-segmented target regions. To address these limitations, we propose a simple, yet effective end-to-end depthwise encoder-decoder fully convolutional network architecture, called Sharp U-Net, for binary and multi-class biomedical image segmentation. The key rationale of Sharp U-Net is that instead of applying a plain skip connection, a depthwise convolution of the encoder feature map with a sharpening kernel filter is employed prior to merging the encoder and decoder features, thereby producing a sharpened intermediate feature map of the same size as the encoder map. Using this sharpening filter layer, we are able to not only fuse semantically less dissimilar features, but also to smooth out artifacts throughout the network layers during the early stages of training. Our extensive experiments on six datasets show that the proposed Sharp U-Net model consistently outperforms or matches the recent state-of-the-art baselines in both binary and multi-class segmentation tasks, while adding no extra learnable parameters. Furthermore, Sharp U-Net outperforms baselines that have more than three times the number of learnable parameters.
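The sharpened skip connection can be illustrated with a small sketch. The 3x3 kernel below is a common Laplacian-based sharpening kernel chosen for illustration; the abstract does not specify the kernel values, so treat it, the zero padding, and the function names as assumptions. The kernel is fixed rather than learned, and it is applied to each channel independently (depthwise), so the output has the same size as the encoder map.

```python
# Assumed sharpening kernel (a standard choice, not taken from the paper).
SHARPEN = [[0, -1, 0],
           [-1, 5, -1],
           [0, -1, 0]]


def sharpen_channel(channel, kernel=SHARPEN):
    """Filter one 2D channel with a 3x3 kernel, zero-padded to keep its size."""
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:  # outside pixels count as zero
                        acc += kernel[di + 1][dj + 1] * channel[y][x]
            out[i][j] = acc
    return out


def sharp_skip(encoder_map):
    """Depthwise step: apply the same fixed kernel to every channel separately."""
    return [sharpen_channel(ch) for ch in encoder_map]
```

In a full network, the output of `sharp_skip` would be merged with the decoder features in place of the plain skip connection. In a deep learning framework the same step is usually written as a grouped convolution with frozen weights.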
Submitted 26 July, 2021;
originally announced July 2021.
-
Synthetic COVID-19 Chest X-ray Dataset for Computer-Aided Diagnosis
Authors:
Hasib Zunair,
A. Ben Hamza
Abstract:
We introduce a new dataset called Synthetic COVID-19 Chest X-ray Dataset for training machine learning models. The dataset consists of 21,295 synthetic COVID-19 chest X-ray images to be used for computer-aided diagnosis. These images, generated via an unsupervised domain adaptation approach, are of high quality. We find that the synthetic images not only improve the performance of various deep learning architectures when used as additional training data under heavy imbalance conditions, but also detect the target class with high confidence. We also find that comparable performance can be achieved when training only on synthetic images. Further, salient features of the synthetic COVID-19 images indicate that the distribution is significantly different from non-COVID-19 classes, enabling a proper decision boundary. We hope the availability of such high-fidelity chest X-ray images of COVID-19 will encourage advances in the development of diagnostic and/or management tools.
Submitted 17 June, 2021;
originally announced June 2021.
-
Ridge Regression Neural Network for Pediatric Bone Age Assessment
Authors:
Ibrahim Salim,
A. Ben Hamza
Abstract:
Bone age is an important measure for assessing the skeletal and biological maturity of children. Delayed or increased bone age is a serious concern for pediatricians, and needs to be accurately assessed in a bid to determine whether bone maturity is occurring at a rate consistent with chronological age. In this paper, we introduce a unified deep learning framework for bone age assessment using instance segmentation and ridge regression. The proposed approach consists of two integrated stages. In the first stage, we employ an image annotation and segmentation model to annotate and segment the hand from the radiographic image, followed by background removal. In the second stage, we design a regression neural network architecture composed of a pre-trained convolutional neural network for learning salient features from the segmented pediatric hand radiographs and a ridge regression output layer for predicting the bone age. Experimental evaluation on a dataset of hand radiographs demonstrates the competitive performance of our approach in comparison with existing deep learning based methods for bone age assessment.
Submitted 15 April, 2021;
originally announced April 2021.
-
WISE: A Computer System Performance Index Scoring Framework
Authors:
Lorenzo Luciano,
Imre Kiss,
Peter William Beardshear,
Esther Kadosh,
A. Ben Hamza
Abstract:
The performance levels of a computing machine running a given workload configuration are crucial for both users and providers of computing resources. Knowing how well a computing machine is running with a given workload configuration is critical to making proper computing resource allocation decisions. In this paper, we introduce a novel framework for deriving computing machine and computing resource performance indicators for a given workload configuration. We propose a workload/machine index score (WISE) framework for computing a fitness score for a workload/machine combination. The WISE score indicates how well a computing machine is running with a specific workload configuration by addressing the issue of whether resources are being stressed or sitting idle wasting precious resources. In addition to encompassing any number of computing resources, the WISE score is determined by considering how far from target levels the machine resources are operating at without maxing out. Experimental results demonstrate the efficacy of the proposed WISE framework on two distinct workload configurations.
Submitted 14 December, 2020;
originally announced December 2020.
-
A Federated Learning Approach to Anomaly Detection in Smart Buildings
Authors:
Raed Abdel Sater,
A. Ben Hamza
Abstract:
Internet of Things (IoT) sensors in smart buildings are becoming increasingly ubiquitous, making buildings more livable, energy efficient, and sustainable. These devices sense the environment and generate multivariate temporal data of paramount importance for detecting anomalies and improving the prediction of energy usage in smart buildings. However, detecting these anomalies in centralized systems is often plagued by a huge delay in response time. To overcome this issue, we formulate the anomaly detection problem in a federated learning setting by leveraging the multi-task learning paradigm, which aims at solving multiple tasks simultaneously while taking advantage of the similarities and differences across tasks. We propose a novel privacy-by-design federated learning model using a stacked long short-term memory (LSTM) model, and we demonstrate that it converges more than twice as fast during training as the centralized LSTM. The effectiveness of our federated learning approach is demonstrated on three real-world datasets generated by the IoT production system at General Electric Current smart building, achieving state-of-the-art performance compared to baseline methods in both classification and regression tasks. Our experimental results demonstrate the effectiveness of the proposed framework in reducing the overall training cost without compromising the prediction performance.
Submitted 23 June, 2021; v1 submitted 20 October, 2020;
originally announced October 2020.
-
Anisotropic Graph Convolutional Network for Semi-supervised Learning
Authors:
Mahsa Mesgaran,
A. Ben Hamza
Abstract:
Graph convolutional networks learn effective node embeddings that have proven to be useful in achieving high-accuracy prediction results in semi-supervised learning tasks, such as node classification. However, these networks suffer from over-smoothing and the shrinking effect of the graph, due in large part to the fact that they diffuse features across the edges of the graph using a linear Laplacian flow. This limitation is especially problematic for the task of node classification, where the goal is to predict the label associated with a graph node. To address this issue, we propose an anisotropic graph convolutional network for semi-supervised node classification by introducing a nonlinear function that captures informative features from nodes, while preventing over-smoothing. The proposed framework is largely motivated by the good performance of anisotropic diffusion in image and geometry processing, and learns nonlinear representations based on local graph structure and node features. The effectiveness of our approach is demonstrated on three citation networks and two image datasets, achieving better or comparable classification accuracy results compared to the standard baseline methods.
Submitted 20 October, 2020;
originally announced October 2020.
-
Graph Fairing Convolutional Networks for Anomaly Detection
Authors:
Mahsa Mesgaran,
A. Ben Hamza
Abstract:
Graph convolution is a fundamental building block for many deep neural networks on graph-structured data. In this paper, we introduce a simple, yet very effective graph convolutional network with skip connections for semi-supervised anomaly detection. The proposed layerwise propagation rule of our model is theoretically motivated by the concept of implicit fairing in geometry processing, and comprises a graph convolution module for aggregating information from immediate node neighbors and a skip connection module for combining layer-wise neighborhood representations. This propagation rule is derived from the iterative solution of the implicit fairing equation via the Jacobi method. In addition to capturing information from distant graph nodes through skip connections between the network's layers, our approach exploits both the graph structure and node features for learning discriminative node representations. These skip connections are integrated by design in our proposed network architecture. The effectiveness of our model is demonstrated through extensive experiments on five benchmark datasets, achieving better or comparable anomaly detection results against strong baseline methods. We also demonstrate through an ablation study that skip connections help improve the model performance.
Submitted 15 October, 2023; v1 submitted 20 October, 2020;
originally announced October 2020.
-
Synthesis of COVID-19 Chest X-rays using Unpaired Image-to-Image Translation
Authors:
Hasib Zunair,
A. Ben Hamza
Abstract:
Motivated by the lack of publicly available datasets of chest radiographs of positive patients with Coronavirus disease 2019 (COVID-19), we build the first-of-its-kind open dataset of synthetic COVID-19 chest X-ray images of high fidelity using an unsupervised domain adaptation approach by leveraging class conditioning and adversarial training. Our contributions are twofold. First, we show considerable performance improvements on COVID-19 detection using various deep learning architectures when employing synthetic images as an additional training set. Second, we show how our image synthesis method can serve as a data anonymization tool by achieving comparable detection performance when trained only on synthetic data. In addition, the proposed data generation framework offers a viable solution to COVID-19 detection in particular, and to medical image classification tasks in general. Our publicly available benchmark dataset consists of 21,295 synthetic COVID-19 chest X-ray images. The insights gleaned from this dataset can be used for preventive actions in the fight against the COVID-19 pandemic.
Submitted 20 October, 2020;
originally announced October 2020.
-
Melanoma Detection using Adversarial Training and Deep Transfer Learning
Authors:
Hasib Zunair,
A. Ben Hamza
Abstract:
Skin lesion datasets consist predominantly of normal samples with only a small percentage of abnormal ones, giving rise to the class imbalance problem. Also, skin lesion images are largely similar in overall appearance owing to the low inter-class variability. In this paper, we propose a two-stage framework for automatic classification of skin lesion images using adversarial training and transfer learning toward melanoma detection. In the first stage, we leverage the inter-class variation of the data distribution for the task of conditional image synthesis by learning the inter-class mapping and synthesizing under-represented class samples from the over-represented ones using unpaired image-to-image translation. In the second stage, we train a deep convolutional neural network for skin lesion classification using the original training set combined with the newly synthesized under-represented class samples. The training of this classifier is carried out by minimizing the focal loss function, which assists the model in learning from hard examples, while down-weighting the easy ones. Experiments conducted on a dermatology image benchmark demonstrate the superiority of our proposed approach over several standard baseline methods, achieving significant performance improvements. Interestingly, we show through feature visualization and analysis that our method leads to context-based lesion assessment that can reach an expert dermatologist level.
Submitted 28 July, 2020; v1 submitted 14 April, 2020;
originally announced April 2020.
-
Shape retrieval of non-rigid 3d human models
Authors:
David Pickup,
Xianfang Sun,
Paul L Rosin,
Ralph R Martin,
Z Cheng,
Zhouhui Lian,
Masaki Aono,
A Ben Hamza,
A Bronstein,
M Bronstein,
S Bu,
Umberto Castellani,
S Cheng,
Valeria Garro,
Andrea Giachetti,
Afzal Godil,
Luca Isaia,
J Han,
Henry Johan,
L Lai,
Bo Li,
C Li,
Haisheng Li,
Roee Litman,
X Liu,
et al. (6 additional authors not shown)
Abstract:
3D models of humans are commonly used within computer graphics and vision, and so the ability to distinguish between body shapes is an important shape retrieval problem. We extend our recent paper which provided a benchmark for testing non-rigid 3D shape retrieval algorithms on 3D human models. This benchmark provided a far stricter challenge than previous shape benchmarks. We have added 145 new models for use as a separate training set, in order to standardise the training data used and provide a fairer comparison. We have also included experiments with the FAUST dataset of human scans. All participants of the previous benchmark study have taken part in the new tests reported here, many providing updated results using the new data. In addition, further participants have also taken part, and we provide extra analysis of the retrieval results. A total of 25 different shape retrieval methods are compared.
Submitted 1 March, 2020;
originally announced March 2020.
-
Global spectral graph wavelet signature for surface analysis of carpal bones
Authors:
Majid Masoumi,
A. Ben Hamza
Abstract:
In this paper, we present a spectral graph wavelet approach for shape analysis of the carpal bones of the human wrist. We apply a metric called the global spectral graph wavelet (GSGW) signature to represent the cortical surface of a carpal bone based on the eigensystem of the Laplace-Beltrami operator. Furthermore, we propose a heuristic and efficient way of aggregating local descriptors of a carpal bone surface into a global descriptor. The resulting global descriptor is not only isometry-invariant, but also much more efficient and requires less memory storage. We perform experiments on the shapes of the carpal bones of ten women and ten men from a publicly available database. Experimental results show that the proposed GSGW signature outperforms the recently proposed GPS embedding approach for comparing shapes of the carpal bones across populations.
Submitted 4 September, 2017;
originally announced September 2017.
-
Shape Classification using Spectral Graph Wavelets
Authors:
Majid Masoumi,
A. Ben Hamza
Abstract:
Spectral shape descriptors have been used extensively in a broad spectrum of geometry processing applications ranging from shape retrieval and segmentation to classification. In this paper, we propose a spectral graph wavelet approach for 3D shape classification using the bag-of-features paradigm. In an effort to capture both the local and global geometry of a 3D shape, we present a three-step feature description framework. First, local descriptors are extracted via the spectral graph wavelet transform having the Mexican hat wavelet as a generating kernel. Second, mid-level features are obtained by embedding local descriptors into the visual vocabulary space using the soft-assignment coding step of the bag-of-features model. Third, a global descriptor is constructed by aggregating mid-level features weighted by a geodesic exponential kernel, resulting in a matrix representation that describes the frequency of appearance of nearby codewords in the vocabulary. Experimental results on two standard 3D shape benchmarks demonstrate the effectiveness of the proposed classification approach in comparison with state-of-the-art methods.
Submitted 11 May, 2017;
originally announced May 2017.
-
A Multicomponent Approach to Nonrigid Registration of Diffusion Tensor Images
Authors:
Mohammed Khader,
A. Ben Hamza
Abstract:
We propose a nonrigid registration approach for diffusion tensor images using a multicomponent information-theoretic measure. Explicit orientation optimization is enabled by incorporating tensor reorientation, which is necessary for warping diffusion tensor images. Experimental results on diffusion tensor images indicate the feasibility of the proposed approach and a much better registration accuracy in the presence of geometric distortion than the affine registration method based on mutual information.
Submitted 14 April, 2015; v1 submitted 7 April, 2015;
originally announced April 2015.
-
Spectral Graph Theoretic Analysis of Tsallis Entropy-based Dissimilarity Measure
Authors:
A. Ben Hamza
Abstract:
In this paper we introduce a nonextensive quantum information theoretic measure which may be defined between any arbitrary number of density matrices, and we analyze its fundamental properties in the spectral graph-theoretic framework. Unlike other entropic measures, the proposed quantum divergence is symmetric, matrix-convex, theoretically upper-bounded, and has the advantage of being generalizable to any arbitrary number of density matrices, with a possibility of assigning weights to these densities.
Submitted 14 April, 2015; v1 submitted 7 April, 2015;
originally announced April 2015.