
Ampere: Communication-Efficient and High-Accuracy Split Federated Learning

Authors: Zihan Zhang, Leon Wong, Blesson Varghese

Abstract: A Federated Learning (FL) system collaboratively trains neural networks across devices and a server but is limited by significant on-device computation costs. Split Federated Learning (SFL) systems mitigate this by offloading a block of layers of the network from the device to a server. However, in doing so, they introduce large communication overheads due to frequent exchanges of intermediate activations and gradients between devices and the server and reduce model accuracy for non-IID data. We propose Ampere, a novel collaborative training system that simultaneously minimizes on-device computation and device-server communication while improving model accuracy. Unlike SFL, which uses a global loss by iterative end-to-end training, Ampere develops unidirectional inter-block training to sequentially train the device and server block with a local loss, eliminating the transfer of gradients. A lightweight auxiliary network generation method decouples training between the device and server, reducing frequent intermediate exchanges to a single transfer, which significantly reduces the communication overhead. Ampere mitigates the impact of data heterogeneity by consolidating activations generated by the trained device block to train the server block, in contrast to SFL, which trains on device-specific, non-IID activations.
Extensive experiments on multiple CNNs and transformers show that, compared to state-of-the-art SFL baseline systems, Ampere (i) improves model accuracy by up to 13.26% while reducing training time by up to 94.6%, (ii) reduces device-server communication overhead by up to 99.1% and on-device computation by up to 93.13%, and (iii) reduces standard deviation of accuracy by 53.39% for various non-IID degrees, highlighting superior performance when faced with heterogeneous data.

Submitted 8 July, 2025; originally announced July 2025.

arXiv:2504.13850 [pdf, other]

Resource Utilization Optimized Federated Learning

Authors: Zihan Zhang, Leon Wong, Blesson Varghese

Abstract: Federated learning (FL) systems facilitate distributed machine learning across a server and multiple devices. However, FL systems have low resource utilization, limiting their practical use in the real world. This inefficiency primarily arises from two types of idle time: (i) task dependency between the server and devices, and (ii) stragglers among heterogeneous devices. This paper introduces FedOptima, a resource-optimized FL system designed to simultaneously minimize both types of idle time; existing systems do not eliminate or reduce both at the same time. FedOptima offloads the training of certain layers of a neural network from a device to the server using three innovations. First, devices operate independently of each other using asynchronous aggregation to eliminate straggler effects, and independently of the server by utilizing auxiliary networks to minimize idle time caused by task dependency. Second, the server performs centralized training using a task scheduler that ensures balanced contributions from all devices, improving model accuracy. Third, an efficient memory management mechanism on the server increases scalability of the number of participating devices. Four state-of-the-art offloading-based and asynchronous FL methods are chosen as baselines.
Experimental results show that compared to the best results of the baselines on convolutional neural networks and transformers on multiple lab-based testbeds, FedOptima (i) achieves higher or comparable accuracy, (ii) accelerates training by 1.9x to 21.8x, (iii) reduces server and device idle time by up to 93.9% and 81.8%, respectively, and (iv) increases throughput by 1.1x to 2.0x.

Submitted 10 March, 2025; originally announced April 2025.

arXiv:2504.06323 [pdf, ps, other]

doi 10.1016/j.future.2025.108056

Mosaic: Composite Projection Pruning for Resource-efficient LLMs

Authors: Bailey J. Eccles, Leon Wong, Blesson Varghese

Abstract: Extensive compute and memory requirements limit the deployment of large language models (LLMs) on any hardware. Compression methods, such as pruning, can reduce model size, which in turn reduces resource requirements. State-of-the-art pruning is based on coarse-grained methods. They are time-consuming and inherently remove critical model parameters, adversely impacting the quality of the pruned model. This paper introduces projection pruning, a novel fine-grained method for pruning LLMs. In addition, LLM projection pruning is enhanced by a new approach we refer to as composite projection pruning - the synergistic combination of unstructured pruning that retains accuracy and structured pruning that reduces model size. We develop Mosaic, a novel system to create and deploy pruned LLMs using composite projection pruning. Mosaic is evaluated using a range of performance and quality metrics on multiple hardware platforms, LLMs, and datasets. Mosaic is 7.19x faster in producing models than existing approaches. Mosaic models achieve up to 84.2% lower perplexity and 31.4% higher accuracy than models obtained from coarse-grained pruning. Up to 67% faster inference and 68% lower GPU memory use are noted for Mosaic models. Mosaic is available for public use from https://github.com/blessonvar/Mosaic

Submitted 12 August, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

arXiv:2504.00726 [pdf, other]

EMO: Edge Model Overlays to Scale Model Size in Federated Learning

Authors: Di Wu, Weibo He, Wanglei Feng, Zhenyu Wen, Bin Qian, Blesson Varghese

Abstract: Federated Learning (FL) trains machine learning models on edge devices with distributed data. However, the computational and memory limitations of these devices restrict the training of large models using FL. Split Federated Learning (SFL) addresses this challenge by distributing the model across the device and server, but it introduces a tightly coupled data flow, leading to computational bottlenecks and high communication costs. We propose EMO as a solution to enable the training of large models in FL while mitigating the challenges of SFL. EMO introduces Edge Model Overlay(s) between the device and server, enabling the creation of a larger ensemble model without modifying the FL workflow. The key innovation in EMO is Augmented Federated Learning (AFL), which builds an ensemble model by connecting the original (smaller) FL model with model(s) trained in the overlay(s) to facilitate horizontal or vertical scaling. This is accomplished through three key modules: a hierarchical activation replay cache to decouple AFL from FL, a convergence-aware communication controller to optimize communication overhead, and an ensemble inference module. Evaluations on a real-world prototype show that EMO improves accuracy by up to 17.77% compared to FL, and reduces communication costs by up to 7.17x and decreases training time by up to 6.9x compared to SFL.

Submitted 1 April, 2025; originally announced April 2025.

Comments: Poster accepted at IEEE ICDCS 2025

arXiv:2502.15790 [pdf, ps, other]

Signal Collapse in One-Shot Pruning: When Sparse Models Fail to Distinguish Neural Representations

Authors: Dhananjay Saikumar, Blesson Varghese

Abstract: Neural network pruning is essential for reducing model complexity to enable deployment on resource-constrained hardware. While performance loss of pruned networks is often attributed to the removal of critical parameters, we identify signal collapse, a reduction in activation variance across layers, as the root cause. Existing one-shot pruning methods focus on weight selection strategies and rely on computationally expensive second-order approximations. In contrast, we demonstrate that mitigating signal collapse, rather than optimizing weight selection, is key to improving accuracy of pruned networks. We propose REFLOW, which addresses signal collapse without updating trainable weights, revealing high-quality sparse sub-networks within the original parameter space. REFLOW enables magnitude pruning to achieve state-of-the-art performance, restoring ResNeXt101 accuracy from under 4.1% to 78.9% on ImageNet with only 20% of the weights retained, surpassing state-of-the-art approaches.

Submitted 18 February, 2025; originally announced February 2025.

arXiv:2411.05983 [pdf, ps, other]

Longitudinal Ensemble Integration for sequential classification with multimodal data

Authors: Aviad Susman, Rupak Krishnamurthy, Yan Chak Li, Mohammad Olaimat, Serdar Bozdag, Bino Varghese, Nasim Sheikh-Bahaei, Gaurav Pandey

Abstract: Effectively modeling multimodal longitudinal data is a pressing need in various application areas, especially biomedicine. Despite this, few approaches exist in the literature for this problem, with most not adequately taking into account the multimodality of the data. In this study, we developed multiple configurations of a novel multimodal and longitudinal learning framework, Longitudinal Ensemble Integration (LEI), for sequential classification. We evaluated LEI's performance, and compared it against existing approaches, for the early detection of dementia, which is among the most studied multimodal sequential classification tasks. LEI outperformed these approaches due to its use of intermediate base predictions arising from the individual data modalities, which enabled their better integration over time. LEI's design also enabled the identification of features that were consistently important across time for the effective prediction of dementia-related diagnoses. Overall, our work demonstrates the potential of LEI for sequential classification from longitudinal multimodal data.

Submitted 8 July, 2025; v1 submitted 8 November, 2024; originally announced November 2024.

Comments: Accepted to IEEE ICDH 2025. This is the author's accepted manuscript (AAM). The final version will appear in the IEEE ICDH 2025 proceedings on IEEE Xplore

arXiv:2409.00807 [pdf, other]

Diffusion based multi-domain neuroimaging harmonization method with preservation of anatomical details

Authors: Haoyu Lan, Bino A. Varghese, Nasim Sheikh-Bahaei, Farshid Sepehrband, Arthur W Toga, Jeiran Choupan

Abstract: Multi-center neuroimaging studies face technical variability due to batch differences across sites, which potentially hinders data aggregation and impacts study reliability. Recent efforts in neuroimaging harmonization have aimed to minimize these technical gaps and reduce technical variability across batches. While Generative Adversarial Networks (GANs) have been a prominent method for addressing image harmonization tasks, GAN-harmonized images suffer from artifacts or anatomical distortions. Given the advancements of the denoising diffusion probabilistic model, which produces high-fidelity images, we have assessed the efficacy of the diffusion model for neuroimaging harmonization. We have demonstrated the diffusion model's superior capability in harmonizing images from multiple domains, while GAN-based methods are limited to harmonizing images between two domains per model. Our experiments highlight that the learned domain invariant anatomical condition reinforces the model to accurately preserve the anatomical details while differentiating batch differences at each diffusion step. Our proposed method has been tested on two public neuroimaging datasets, ADNI1 and ABIDE II, yielding harmonization results with consistent anatomy preservation and superior FID score compared to the GAN-based methods.
We have conducted multiple analyses including extensive quantitative and qualitative evaluations against the baseline models, an ablation study showcasing the benefits of the learned conditions, and improvements in the consistency of perivascular spaces (PVS) segmentation through harmonization.

Submitted 1 September, 2024; originally announced September 2024.

arXiv:2404.16877 [pdf, other]

doi 10.1109/CCGrid59990.2024.00044

Rapid Deployment of DNNs for Edge Computing via Structured Pruning at Initialization

Authors: Bailey J. Eccles, Leon Wong, Blesson Varghese

Abstract: Edge machine learning (ML) enables localized processing of data on devices and is underpinned by deep neural networks (DNNs). However, DNNs cannot be easily run on devices due to their substantial computing, memory and energy requirements for delivering performance that is comparable to cloud-based ML. Therefore, model compression techniques, such as pruning, have been considered. Existing pruning methods are problematic for edge ML since they: (1) Create compressed models that have limited runtime performance benefits (using unstructured pruning) or compromise the final model accuracy (using structured pruning), and (2) Require substantial compute resources and time for identifying a suitable compressed DNN model (using neural architecture search). In this paper, we explore a new avenue, referred to as Pruning-at-Initialization (PaI), using structured pruning to mitigate the above problems. We develop Reconvene, a system for rapidly generating pruned models suited for edge deployments using structured PaI. Reconvene systematically identifies and prunes DNN convolution layers that are least sensitive to structured pruning. Reconvene rapidly creates pruned DNNs within seconds that are up to 16.21x smaller and 2x faster while maintaining the same accuracy as an unstructured PaI counterpart.

Submitted 22 April, 2024; originally announced April 2024.

Comments: The 24th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing

arXiv:2404.03687 [pdf, other]

DRIVE: Dual Gradient-Based Rapid Iterative Pruning

Authors: Dhananjay Saikumar, Blesson Varghese

Abstract: Modern deep neural networks (DNNs) consist of millions of parameters, necessitating high-performance computing during training and inference. Pruning is one solution that significantly reduces the space and time complexities of DNNs. Traditional pruning methods that are applied post-training focus on streamlining inference, but there are recent efforts to leverage sparsity early on by pruning before training. Pruning methods, such as iterative magnitude-based pruning (IMP), achieve up to a 90% parameter reduction while retaining accuracy comparable to the original model. However, this leads to impractical runtime as it relies on multiple train-prune-reset cycles to identify and eliminate redundant parameters. In contrast, training-agnostic early pruning methods, such as SNIP and SynFlow, offer fast pruning but fall short of the accuracy achieved by IMP at high sparsities. To bridge this gap, we present Dual Gradient-Based Rapid Iterative Pruning (DRIVE), which leverages dense training for initial epochs to counteract the randomness inherent at the initialization. Subsequently, it employs a unique dual gradient-based metric for parameter ranking. It has been experimentally demonstrated for VGG and ResNet architectures on CIFAR-10/100 and Tiny ImageNet, and ResNet on ImageNet that DRIVE consistently has superior performance over other training-agnostic early pruning methods in accuracy. Notably, DRIVE is 43$\times$ to 869$\times$ faster than IMP for pruning.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2403.14212 [pdf, other]

doi 10.1063/5.0208517

CMOS-compatible photonic integrated circuits on thin-film ScAlN

Authors: Sihao Wang, Veerendra Dhyani, Sakthi Sanjeev Mohanraj, Xiaodong Shi, Binni Varghese, Wing Wai Chung, Ding Huang, Zhi Shiuh Lim, Qibin Zeng, Huajun Liu, Xianshu Luo, Victor Leong, Nanxi Li, Di Zhu

Abstract: Scandium aluminum nitride (ScAlN) has recently emerged as an attractive material for integrated photonics due to its favorable nonlinear optical properties and compatibility with CMOS fabrication. Despite the promising and versatile material properties, it is still an outstanding challenge to realize low-loss photonic circuits on thin-film ScAlN-on-insulator wafers. Here, we present a systematic study on the material quality of sputtered thin-film ScAlN produced in a CMOS-compatible 200 mm line, and an optimized fabrication process to yield 400 nm thick, fully etched waveguides. With surface polishing and annealing, we achieve micro-ring resonators with an intrinsic quality factor as high as $1.47\times 10^5$, corresponding to a propagation loss of 2.4 dB/cm. These results serve as a critical step towards developing future large-scale, low-loss photonic integrated circuits based on ScAlN.

Submitted 11 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Journal ref: APL Photon. 9, 066109 (2024)

arXiv:2402.14139 [pdf, other]

NeuroFlux: Memory-Efficient CNN Training Using Adaptive Local Learning

Authors: Dhananjay Saikumar, Blesson Varghese

Abstract: Efficient on-device Convolutional Neural Network (CNN) training in resource-constrained mobile and edge environments is an open challenge. Backpropagation is the standard approach adopted, but it is GPU memory intensive due to its strong inter-layer dependencies that demand intermediate activations across the entire CNN model to be retained in GPU memory. This necessitates smaller batch sizes to make training possible within the available GPU memory budget, but in turn, results in substantially high and impractical training time. We introduce NeuroFlux, a novel CNN training system tailored for memory-constrained scenarios. We develop two novel opportunities: firstly, adaptive auxiliary networks that employ a variable number of filters to reduce GPU memory usage, and secondly, block-specific adaptive batch sizes, which not only cater to the GPU memory constraints but also accelerate the training process. NeuroFlux segments a CNN into blocks based on GPU memory usage and further attaches an auxiliary network to each layer in these blocks. This disrupts the typical layer dependencies under a new training paradigm - $\textit{`adaptive local learning'}$. Moreover, NeuroFlux adeptly caches intermediate activations, eliminating redundant forward passes over previously trained blocks, further accelerating the training process.
The results are twofold when compared to Backpropagation: on various hardware platforms, NeuroFlux demonstrates training speed-ups of 2.3$\times$ to 6.1$\times$ under stringent GPU memory budgets, and NeuroFlux generates streamlined models that have 10.9$\times$ to 29.4$\times$ fewer parameters.

Submitted 4 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: Accepted to EuroSys 2024

arXiv:2312.09626 [pdf]

Exploring Gender Disparities in Bumble's Match Recommendations

Authors: Ritvik Aryan Kalra, Pratham Gupta, Ben Varghese, Nimmi Rangaswamy

Abstract: We study bias and discrimination in the context of Bumble, an online dating platform in India. Drawing on research in AI fairness and inclusion studies, we analyze algorithmic bias and the propensity of algorithms to reproduce it. We conducted an experiment to identify and address the presence of bias in the matching algorithms Bumble pushes to its users in the form of profiles for potential dates in the real world. Dating apps like Bumble utilize algorithms that learn from user data to make recommendations. Even if the algorithm does not have intentions or consciousness, it is a system created and maintained by humans. We attribute moral agency of such systems to be compositely derived from algorithmic mediations, the design and utilization of these platforms. Developers, designers, and operators of dating platforms thus have a moral obligation to mitigate biases in the algorithms to create inclusive platforms that affirm diverse social identities.

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2309.06973 [pdf, ps, other]

doi 10.1016/j.future.2023.09.025

DNNShifter: An Efficient DNN Pruning System for Edge Computing

Authors: Bailey J. Eccles, Philip Rodgers, Peter Kilpatrick, Ivor Spence, Blesson Varghese

Abstract: Deep neural networks (DNNs) underpin many machine learning applications. Production quality DNN models achieve high inference accuracy by training millions of DNN parameters which has a significant resource footprint. This presents a challenge for resources operating at the extreme edge of the network, such as mobile and embedded devices that have limited computational and memory resources. To address this, models are pruned to create lightweight, more suitable variants for these devices. Existing pruning methods are unable to provide similar quality models compared to their unpruned counterparts without significant time costs and overheads or are limited to offline use cases. Our work rapidly derives suitable model variants while maintaining the accuracy of the original model. The model variants can be swapped quickly when system and network conditions change to match workload demand. This paper presents DNNShifter, an end-to-end DNN training, spatial pruning, and model switching system that addresses the challenges mentioned above. At the heart of DNNShifter is a novel methodology that prunes sparse models using structured pruning. The pruned model variants generated by DNNShifter are smaller in size and thus faster than dense and sparse model predecessors, making them suitable for inference at the edge while retaining near similar accuracy as of the original dense model. DNNShifter generates a portfolio of model variants that can be swiftly interchanged depending on operational conditions.
DNNShifter produces pruned model variants up to 93x faster than conventional training methods. Compared to sparse models, the pruned model variants are up to 5.14x smaller and have a 1.67x inference latency speedup, with no compromise to sparse model accuracy. In addition, DNNShifter has up to 11.9x lower overhead for switching models and up to 3.8x lower memory utilisation than existing approaches.

Submitted 13 September, 2023; originally announced September 2023.

Comments: 14 pages, 7 figures, 5 tables

MSC Class: 68T07 ACM Class: I.2.1

arXiv:2304.08675 [pdf, other]

Superconductivity in single crystals of a quasi-one dimensional infinite chain cuprate Sr$_x$Ca$_{1-x}$CuO$_2$ at 90 K

Authors: Neeraj K. Rajak, Dumpala Tirumalarao, Gourav Vaid, Sharath Kumar C, S. Athira, Govindarajan Prakash, Ashna Babu, Trupti Gaikwad, Shamili Chandradas, Alex P. Andrews, Aneesh A., Babu Varghese, Manoj Raama Varma, Arumugam Thamizhavel, S. Ramakrishnan, D. Jaiswal-Nagar

Abstract: Although there is no complete theory of high temperature superconductivity, the importance of CuO$_2$ planes in cuprate superconductors is confirmed from both theory and experiments. Strong Coulomb repulsion between electrons on the CuO$_2$ plane makes the resultant electron system highly correlated and a difficult problem to solve since exact solutions of many-body Hamiltonian in two dimensions do not exist. If, however, superconductivity can arise in structures having chains rather than planes and having a high critical temperature, then the high temperature superconductivity problem could become more tractable since exact solutions in one dimension do exist. In this paper, we report the observation of bulk superconductivity in single crystals of a cuprate Sr$_x$Ca$_{1-x}$CuO$_2$ at very high critical temperature, T$_c$, of $\sim$ 90 K whose structure reveals the presence of infinite double chains of Cu-O-Cu-O instead of CuO$_2$ planes, thus ensuring quasi-one dimensional superconductivity. Bulk superconducting behaviour was observed in \textit{dc} magnetisation, \textit{ac} susceptibility as well as resistance measurements. The observation of bulk superconductivity in Sr$_x$Ca$_{1-x}$CuO$_2$ having chains of Cu-O-Cu-O rather than planes of CuO$_2$ at a high T$_c$ of 90 K is expected to profoundly impact our understanding of high temperature superconductivity.

Submitted 17 April, 2023; originally announced April 2023.

Comments: 15 pages, 4 figures

arXiv:2304.05495 [pdf, other]

EcoFed: Efficient Communication for DNN Partitioning-based Federated Learning

Authors: Di Wu, Rehmat Ullah, Philip Rodgers, Peter Kilpatrick, Ivor Spence, Blesson Varghese

Abstract: Efficiently running federated learning (FL) on resource-constrained devices is challenging since they are required to train computationally intensive deep neural networks (DNN) independently. DNN partitioning-based FL (DPFL) has been proposed as one mechanism to accelerate training where the layers of a DNN (or computation) are offloaded from the device to the server. However, this creates significant communication overheads since the intermediate activation and gradient need to be transferred between the device and the server during training. While current research reduces the communication introduced by DNN partitioning using local loss-based methods, we demonstrate that these methods are ineffective in improving the overall efficiency (communication overhead and training speed) of a DPFL system. This is because they suffer from accuracy degradation and ignore the communication costs incurred when transferring the activation from the device to the server. This article proposes EcoFed - a communication efficient framework for DPFL systems. EcoFed eliminates the transmission of the gradient by developing pre-trained initialization of the DNN model on the device for the first time. This reduces the accuracy degradation seen in local loss-based methods. In addition, EcoFed proposes a novel replay buffer mechanism and implements a quantization-based compression technique to reduce the transmission of the activation.
It is experimentally demonstrated that EcoFed can reduce the communication cost by up to 133x and accelerate training by up to 21x when compared to classic FL. Compared to vanilla DPFL, EcoFed achieves a 16x communication reduction and 2.86x training time speed-up. EcoFed is available from https://github.com/blessonvar/EcoFed.

Submitted 3 January, 2024; v1 submitted 11 April, 2023; originally announced April 2023.

arXiv:2302.12803 [pdf, other]

PiPar: Pipeline Parallelism for Collaborative Machine Learning

Authors: Zihan Zhang, Philip Rodgers, Peter Kilpatrick, Ivor Spence, Blesson Varghese

Abstract: Collaborative machine learning (CML) techniques, such as federated learning, have been proposed to train deep learning models across multiple mobile devices and a server. CML techniques are privacy-preserving as a local model that is trained on each device instead of the raw data from the device is shared with the server. However, CML training is inefficient due to low resource utilization. We identify idling resources on the server and devices due to sequential computation and communication as the principal cause of low resource utilization. A novel framework PiPar that leverages pipeline parallelism for CML techniques is developed to substantially improve resource utilization. A new training pipeline is designed to parallelize the computations on different hardware resources and communication on different bandwidth resources, thereby accelerating the training process in CML. A low overhead automated parameter selection method is proposed to optimize the pipeline, maximizing the utilization of available resources. The experimental results confirm the validity of the underlying approach of PiPar and highlight that when compared to federated learning: (i) the idle time of the server can be reduced by up to 64.1x, and (ii) the overall training time can be accelerated by up to 34.6x under varying network conditions for a collection of six small and large popular deep neural networks and four datasets without sacrificing accuracy. It is also experimentally demonstrated that PiPar achieves performance benefits when incorporating differential privacy methods and operating in environments with heterogeneous devices and changing bandwidths.

Submitted 25 June, 2024; v1 submitted 1 December, 2022; originally announced February 2023.

arXiv:2212.04645 [pdf, other]

doi 10.1016/j.iot.2022.100674

AI-based Fog and Edge Computing: A Systematic Review, Taxonomy and Future Directions

Authors: Sundas Iftikhar, Sukhpal Singh Gill, Chenghao Song, Minxian Xu, Mohammad Sadegh Aslanpour, Adel N. Toosi, Junhui Du, Huaming Wu, Shreya Ghosh, Deepraj Chowdhury, Muhammed Golec, Mohit Kumar, Ahmed M. Abdelmoniem, Felix Cuadrado, Blesson Varghese, Omer Rana, Schahram Dustdar, Steve Uhlig

Abstract: Resource management in computing is a very challenging problem that involves making sequential decisions. Resource limitations, resource heterogeneity, the dynamic and diverse nature of workloads, and the unpredictability of fog/edge computing environments have made resource management even more challenging in the fog landscape. Recently, Artificial Intelligence (AI) and Machine Learning (ML) based solutions have been adopted to solve this problem. AI/ML methods with the capability to make sequential decisions, like reinforcement learning, seem most promising for these types of problems. But these algorithms come with their own challenges such as high variance, explainability, and online training. The continuously changing fog/edge environment dynamics require solutions that learn online, adapting to the changing computing environment. In this paper, we used standard review methodology to conduct this Systematic Literature Review (SLR) to analyze the role of AI/ML algorithms and the challenges in the applicability of these algorithms for resource management in fog/edge computing environments. Further, various machine learning, deep learning and reinforcement learning techniques for edge AI management have been discussed. Furthermore, we have presented the background and current status of AI/ML-based Fog/Edge Computing. Moreover, a taxonomy of AI/ML-based resource management techniques for fog/edge computing has been proposed, and the existing techniques have been compared based on the proposed taxonomy. Finally, open challenges and promising future research directions have been identified and discussed in the area of AI/ML-based fog/edge computing.

Submitted 8 December, 2022; originally announced December 2022.

Comments: 49 pages, 15 figures, 10 tables

Journal ref: Preprint for Publication in Elsevier IoT Journal 2022

arXiv:2210.16083 [pdf, other]

doi 10.1109/WACV56688.2023.00634

ROMA: Run-Time Object Detection To Maximize Real-Time Accuracy

Authors: JunKyu Lee, Blesson Varghese, Hans Vandierendonck

Abstract: This paper analyzes the effects of dynamically varying video contents and detection latency on the real-time detection accuracy of a detector and proposes a new run-time accuracy variation model, ROMA, based on the findings from the analysis. ROMA is designed to select an optimal detector out of a set of detectors in real time without label information to maximize real-time object detection accuracy. ROMA utilizing four YOLOv4 detectors on an NVIDIA Jetson Nano shows real-time accuracy improvements by 4 to 37% for a scenario of dynamically varying video contents and detection latency consisting of MOT17Det and MOT20Det datasets, compared to individual YOLOv4 detectors and two state-of-the-art runtime techniques.

Submitted 28 October, 2022; originally announced October 2022.

Comments: Accepted at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023

arXiv:2209.02052 [pdf, other]

RX-ADS: Interpretable Anomaly Detection using Adversarial ML for Electric Vehicle CAN data

Authors: Chathurika S. Wickramasinghe, Daniel L. Marino, Harindra S. Mavikumbure, Victor Cobilean, Timothy D. Pennington, Benny J. Varghese, Craig Rieger, Milos Manic

Abstract: Recent years have brought considerable advancements in Electric Vehicles (EVs) and associated infrastructures/communications. Intrusion Detection Systems (IDS) are widely deployed for anomaly detection in such critical infrastructures. This paper presents an Interpretable Anomaly Detection System (RX-ADS) for intrusion detection in CAN protocol communication in EVs. Contributions include: 1) a window-based feature extraction method; 2) a deep Autoencoder based anomaly detection method; and 3) an adversarial machine learning based explanation generation methodology. The presented approach was tested on two benchmark CAN datasets: OTIDS and Car Hacking. The anomaly detection performance of RX-ADS was compared against the state-of-the-art approaches on these datasets: HIDS and GIDS. The RX-ADS approach presented performance comparable to the HIDS approach (OTIDS dataset) and has outperformed HIDS and GIDS approaches (Car Hacking dataset). Further, the proposed approach was able to generate explanations for detected abnormal behaviors arising from various intrusions. These explanations were later validated by information used by domain experts to detect anomalies. Other advantages of RX-ADS include: 1) the method can be trained on unlabeled data; 2) explanations help experts in understanding anomalies and root cause analysis, and also help with AI model debugging and diagnostics, ultimately improving user trust in AI systems.

Submitted 5 September, 2022; originally announced September 2022.

arXiv:2208.08764 [pdf, other]

FedComm: Understanding Communication Protocols for Edge-based Federated Learning

Authors: Gary Cleland, Di Wu, Rehmat Ullah, Blesson Varghese

Abstract: Federated learning (FL) trains machine learning (ML) models on devices using locally generated data and exchanges models without transferring raw data to a distant server. This exchange incurs a communication overhead and impacts the performance of FL training. There is limited understanding of how communication protocols specifically contribute to the performance of FL. Such an understanding is essential for selecting the right communication protocol when designing an FL system. This paper presents FedComm, a benchmarking methodology to quantify the impact of optimized application layer protocols, namely Message Queue Telemetry Transport (MQTT), Advanced Message Queuing Protocol (AMQP), and ZeroMQ Message Transport Protocol (ZMTP), and non-optimized application layer protocols, namely TCP and UDP, on the performance of FL. FedComm measures the overall performance of FL in terms of communication time and accuracy under varying computational and network stress and packet loss rates. Experiments on a lab-based testbed demonstrate that TCP outperforms UDP as a non-optimized application layer protocol with higher accuracy and shorter communication times for 4G and Wi-Fi networks. Optimized application layer protocols such as AMQP, MQTT, and ZMTP outperformed non-optimized application layer protocols in most network conditions, resulting in a 2.5x reduction in communication time compared to TCP while maintaining accuracy. The experimental results enable us to highlight a number of open research issues for further investigation. FedComm is available for download from https://github.com/qub-blesson/FedComm.

Submitted 18 August, 2022; originally announced August 2022.

arXiv:2206.05267 [pdf, other]

CONTINUER: Maintaining Distributed DNN Services During Edge Failures

Authors: Ayesha Abdul Majeed, Peter Kilpatrick, Ivor Spence, Blesson Varghese

Abstract: Partitioning and deploying Deep Neural Networks (DNNs) across edge nodes may be used to meet performance objectives of applications. However, the failure of a single node may result in cascading failures that will adversely impact the delivery of the service and will result in failure to meet specific objectives. The impact of these failures needs to be minimised at runtime. Three techniques are explored in this paper, namely repartitioning, early-exit and skip-connection. When an edge node fails, the repartitioning technique will repartition and redeploy the DNN thus avoiding the failed nodes. The early-exit technique makes provision for a request to exit (early) before the failed node. The skip connection technique dynamically routes the request by skipping the failed nodes. This paper will leverage trade-offs in accuracy, end-to-end latency and downtime for selecting the best technique given user-defined objectives (accuracy, latency and downtime thresholds) when an edge node fails. To this end, CONTINUER is developed. Two key activities of the framework are estimating the accuracy and latency when using the techniques for distributed DNNs and selecting the best technique. It is demonstrated on a lab-based experimental testbed that CONTINUER estimates accuracy and latency when using the techniques with no more than an average error of 0.28% and 13.06%, respectively and selects the suitable technique with a low overhead of no more than 16.82 milliseconds and an accuracy of up to 99.86%.

Submitted 25 April, 2022; originally announced June 2022.

Comments: 10 pages

arXiv:2112.00616 [pdf, other]

doi 10.1145/3523230.3523235

Roadmap for Edge AI: A Dagstuhl Perspective

Authors: Aaron Yi Ding, Ella Peltonen, Tobias Meuser, Atakan Aral, Christian Becker, Schahram Dustdar, Thomas Hiessl, Dieter Kranzlmuller, Madhusanka Liyanage, Setareh Magshudi, Nitinder Mohan, Joerg Ott, Jan S. Rellermeyer, Stefan Schulte, Henning Schulzrinne, Gurkan Solmaz, Sasu Tarkoma, Blesson Varghese, Lars Wolf

Abstract: Based on the collective input of Dagstuhl Seminar (21342), this paper presents a comprehensive discussion on AI methods and capabilities in the context of edge computing, referred to as Edge AI. In a nutshell, we envision Edge AI to provide adaptation for data-driven applications, enhance network and radio access, and allow the creation, optimization, and deployment of distributed AI/ML pipelines with given quality of experience, trust, security and privacy targets. The Edge AI community investigates novel ML methods for the edge computing environment, spanning multiple sub-fields of computer science, engineering and ICT. The goal is to share an envisioned roadmap that can bring together key actors and enablers to further advance the domain of Edge AI.

Submitted 27 November, 2021; originally announced December 2021.

Comments: for ACM SIGCOMM CCR

Report number: 2112.00616 ACM Class: I.2.11

Journal ref: ACM SIGCOMM Computer Communication Review; Vol. 52, No. 1; 2022

arXiv:2111.05190 [pdf, other]

QUDOS: Quorum-Based Cloud-Edge Distributed DNNs for Security Enhanced Industry 4.0

Authors: Kevin Wallis, Christoph Reich, Blesson Varghese, Christian Schindelhauer

Abstract: Distributed machine learning algorithms that employ Deep Neural Networks (DNNs) are widely used in Industry 4.0 applications, such as smart manufacturing. The layers of a DNN can be mapped onto different nodes located in the cloud, edge and shop floor for preserving privacy. The quality of the data that is fed into and processed through the DNN is of utmost importance for critical tasks, such as inspection and quality control. Distributed Data Validation Networks (DDVNs) are used to validate the quality of the data. However, they are prone to single points of failure when an attack occurs. This paper proposes QUDOS, an approach that enhances the security of a distributed DNN that is supported by DDVNs using quorums. The proposed approach allows individual nodes that are corrupted due to an attack to be detected or excluded when the DNN produces an output. Metrics such as corruption factor and success probability of an attack are considered for evaluating the security aspects of DNNs. A simulation study demonstrates that if the number of corrupted nodes is less than a given threshold for decision-making in a quorum, the QUDOS approach always prevents attacks. Furthermore, the study shows that increasing the size of the quorum has a better impact on security than increasing the number of layers. One merit of QUDOS is that it enhances the security of DNNs without requiring any modifications to the algorithm and can therefore be applied to other classes of problems.

Submitted 9 November, 2021; originally announced November 2021.

arXiv:2111.01516 [pdf, other]

FedFly: Towards Migration in Edge-based Distributed Federated Learning

Authors: Rehmat Ullah, Di Wu, Paul Harvey, Peter Kilpatrick, Ivor Spence, Blesson Varghese

Abstract: Federated learning (FL) is a privacy-preserving distributed machine learning technique that trains models while keeping all the original data generated on devices locally. Since devices may be resource constrained, offloading can be used to improve FL performance by transferring computational workload from devices to edge servers. However, due to mobility, devices participating in FL may leave the network during training and need to connect to a different edge server. This is challenging because the offloaded computations from the edge server need to be migrated. In line with this assertion, we present FedFly, which is, to the best of our knowledge, the first work to migrate a deep neural network (DNN) when devices move between edge servers during FL training. Our empirical results on the CIFAR10 dataset, with both balanced and imbalanced data distribution, support our claims that FedFly can reduce training time by up to 33% when a device moves after 50% of the training is completed, and by up to 45% when 90% of the training is completed, when compared to the state-of-the-art offloading approach in FL. FedFly has negligible overhead of up to two seconds and does not compromise accuracy. Finally, we highlight a number of open research issues for further investigation. FedFly can be downloaded from https://github.com/qub-blesson/FedFly.

Submitted 14 July, 2022; v1 submitted 2 November, 2021; originally announced November 2021.

Comments: 7 pages, 6 figures

arXiv:2107.04271 [pdf, other]

FedAdapt: Adaptive Offloading for IoT Devices in Federated Learning

Authors: Di Wu, Rehmat Ullah, Paul Harvey, Peter Kilpatrick, Ivor Spence, Blesson Varghese

Abstract: Applying Federated Learning (FL) on Internet-of-Things devices is necessitated by the large volumes of data they produce and growing concerns of data privacy. However, there are three challenges that need to be addressed to make FL efficient: (i) execution on devices with limited computational capabilities, (ii) accounting for stragglers due to computational heterogeneity of devices, and (iii) adaptation to the changing network bandwidths. This paper presents FedAdapt, an adaptive offloading FL framework to mitigate the aforementioned challenges. FedAdapt accelerates local training in computationally constrained devices by leveraging layer offloading of deep neural networks (DNNs) to servers. Further, FedAdapt adopts reinforcement learning based optimization and clustering to adaptively identify which layers of the DNN should be offloaded for each individual device on to a server to tackle the challenges of computational heterogeneity and changing network bandwidth. Experimental studies are carried out on a lab-based testbed and it is demonstrated that by offloading a DNN from the device to the server FedAdapt reduces the training time of a typical IoT device by over half compared to classic FL. The training time of extreme stragglers and the overall training time can be reduced by up to 57%. Furthermore, with changing network bandwidth, FedAdapt is demonstrated to reduce the training time by up to 40% when compared to classic FL, without sacrificing accuracy.

Submitted 18 May, 2022; v1 submitted 9 July, 2021; originally announced July 2021.

Comments: 13 pages

arXiv:2106.15689 [pdf, other]

NEUKONFIG: Reducing Edge Service Downtime When Repartitioning DNNs

Authors: Ayesha Abdul Majeed, Peter Kilpatrick, Ivor Spence, Blesson Varghese

Abstract: Deep Neural Networks (DNNs) may be partitioned across the edge and the cloud to improve the performance efficiency of inference. DNN partitions are determined based on operational conditions such as network speed. When operational conditions change DNNs will need to be repartitioned to maintain the overall performance. However, repartitioning using existing approaches, such as Pause and Resume, will incur a service downtime on the edge. This paper presents the NEUKONFIG framework that identifies the service downtime incurred when repartitioning DNNs and proposes approaches for reducing edge service downtime. The proposed approaches are based on 'Dynamic Switching' in which, when the network speed changes and given an existing edge-cloud pipeline, a new edge-cloud pipeline is initialised with new DNN partitions. Incoming inference requests are switched to the new pipeline for processing data. Two dynamic switching scenarios are considered: when a second edge-cloud pipeline is always running and when a second pipeline is only initialised when the network speed changes. Experimental studies are carried out on a lab-based testbed to demonstrate that Dynamic Switching reduces the downtime by at least an order of magnitude when compared to a baseline using Pause and Resume that has a downtime of 6 seconds. A trade-off in the edge service downtime and memory required is noted. The Dynamic Switching approach that requires the same amount of memory as the baseline reduces the edge service downtime to 0.6 seconds and to less than 1 millisecond in the best case when twice the amount of memory as the baseline is available.

Submitted 29 June, 2021; originally announced June 2021.

Comments: 10 pages

arXiv:2106.12224 [pdf, other]

Revisiting the Arguments for Edge Computing Research

Authors: Blesson Varghese, Eyal de Lara, Aaron Ding, Cheol-Ho Hong, Flavio Bonomi, Schahram Dustdar, Paul Harvey, Peter Hewkin, Weisong Shi, Mark Thiele, Peter Willis

Abstract: This article argues that low latency, high bandwidth, device proliferation, sustainable digital infrastructure, and data privacy and sovereignty continue to motivate the need for edge computing research even though its initial concepts were formulated more than a decade ago.

Submitted 23 June, 2021; originally announced June 2021.

arXiv:2105.08668 [pdf, other]

doi 10.1109/ICFEC51620.2021.00015

TOD: Transprecise Object Detection to Maximise Real-Time Accuracy on the Edge

Authors: JunKyu Lee, Blesson Varghese, Roger Woods, Hans Vandierendonck

Abstract: Real-time video analytics on the edge is challenging as the computationally constrained resources typically cannot analyse video streams at full fidelity and frame rate, which results in loss of accuracy. This paper proposes a Transprecise Object Detector (TOD) which maximises the real-time object detection accuracy on an edge device by selecting an appropriate Deep Neural Network (DNN) on the fly with negligible computational overhead. TOD makes two key contributions over the state of the art: (1) TOD leverages characteristics of the video stream such as object size and speed of movement to identify networks with high prediction accuracy for the current frames; (2) it selects the best-performing network based on projected accuracy and computational demand using an effective and low-overhead decision mechanism. Experimental evaluation on a Jetson Nano demonstrates that TOD improves the average object detection precision by 34.7% over the YOLOv4-tiny-288 model on the MOT17Det dataset. In the MOT17-05 test dataset, TOD utilises only 45.1% of GPU resource and 62.7% of the GPU board power without losing accuracy, compared to the YOLOv4-416 model. We expect that TOD will maximise the application of edge devices to real-time object detection, since TOD maximises real-time object detection accuracy given edge devices according to dynamic input features without increasing inference latency in practice.

Submitted 18 May, 2021; originally announced May 2021.

arXiv:2105.02019 [pdf, other]

ScissionLite: Accelerating Distributed Deep Neural Networks Using Transfer Layer

Authors: Hyunho Ahn, Munkyu Lee, Cheol-Ho Hong, Blesson Varghese

Abstract: Industrial Internet of Things (IIoT) applications can benefit from leveraging edge computing. For example, applications underpinned by deep neural network (DNN) models can be sliced and distributed across the IIoT device and the edge of the network for improving the overall performance of inference and for enhancing privacy of the input data, such as industrial product images. However, low network performance between IIoT devices and the edge is often a bottleneck. In this study, we develop ScissionLite, a holistic framework for accelerating distributed DNN inference using the Transfer Layer (TL). The TL is a traffic-aware layer inserted at the optimal slicing point of a DNN model in order to decrease the outbound network traffic without a significant accuracy drop. For the TL, we implement a new lightweight down/upsampling network for performance-limited IIoT devices. In ScissionLite, we develop ScissionTL, the Preprocessor, and the Offloader for end-to-end activities for deploying DNN slices with the TL. They decide the optimal slicing point of the DNN, prepare pre-trained DNN slices including the TL, and execute the DNN slices on an IIoT device and the edge. Employing the TL for the sliced DNN models has a negligible overhead. ScissionLite improves the inference latency by up to 16 and 2.8 times when compared to execution on the local device and an existing state-of-the-art model slicing approach respectively.

Submitted 5 May, 2021; originally announced May 2021.

Comments: 10 pages

arXiv:2103.04930 [pdf, other]

AVEC: Accelerator Virtualization in Cloud-Edge Computing for Deep Learning Libraries

Authors: Jason Kennedy, Blesson Varghese, Carlos Reaño

Abstract: Edge computing offers the distinct advantage of harnessing compute capabilities on resources located at the edge of the network to run workloads of relatively weak user devices. This is achieved by offloading computationally intensive workloads, such as deep learning, from user devices to the edge. Using the edge reduces the overall communication latency of applications as workloads can be processed closer to where data is generated on user devices rather than sending them to geographically distant clouds. Specialised hardware accelerators, such as Graphics Processing Units (GPUs), available in the cloud-edge network can enhance the performance of computationally intensive workloads that are offloaded from devices on to the edge. The underlying approach required to facilitate this is virtualization of GPUs. This paper therefore sets out to investigate the potential of GPU accelerator virtualization to improve the performance of deep learning workloads in a cloud-edge environment. The AVEC accelerator virtualization framework is proposed that incurs minimum overheads and requires no source-code modification of the workload. AVEC intercepts local calls to a GPU on a device and forwards them to an edge resource seamlessly. The feasibility of AVEC is demonstrated on a real-world application, namely OpenPose using the Caffe deep learning library. It is observed that on a lab-based experimental test-bed AVEC delivers up to 7.48x speedup despite communication overheads incurred due to data transfers.

Submitted 8 March, 2021; originally announced March 2021.

Comments: 8 pages, 13 figures

arXiv:2009.03035 [pdf, other]

doi 10.1038/s42005-021-00651-y

Global strain-induced scalar potential in graphene devices

Authors: Lujun Wang, Andreas Baumgartner, Péter Makk, Simon Zihlmann, Blesson S. Varghese, David I. Indolese, Kenji Watanabe, Takashi Taniguchi, Christian Schönenberger

Abstract: By mechanically distorting a crystal lattice it is possible to engineer the electronic and optical properties of a material. In graphene, one of the major effects of such a distortion is an energy shift of the Dirac point, often described as a scalar potential. We demonstrate how such a scalar potential can be generated systematically over an entire electronic device and how the resulting changes in the graphene work function can be detected in transport experiments. Combined with Raman spectroscopy, we obtain a characteristic scalar potential consistent with recent theoretical estimates. This direct evidence for a scalar potential on a macroscopic scale due to deterministically generated strain in graphene paves the way for engineering the optical and electronic properties of graphene and similar materials by using external strain.

Submitted 7 September, 2020; originally announced September 2020.

Journal ref: Comm. Phys. 4, 147 (2021)

arXiv:2008.03523 [pdf, other]

Scission: Performance-driven and Context-aware Cloud-Edge Distribution of Deep Neural Networks

Authors: Luke Lockhart, Paul Harvey, Pierre Imai, Peter Willis, Blesson Varghese

Abstract: Partitioning and distributing deep neural networks (DNNs) across end-devices, edge resources and the cloud has a potential twofold advantage: preserving privacy of the input data, and reducing the ingress bandwidth demand beyond the edge. However, for a given DNN, identifying the optimal partition configuration for distributing the DNN that maximizes performance is a significant challenge. This is because the combination of potential target hardware resources that maximizes performance and the sequence of layers of the DNN that should be distributed across the target resources needs to be determined, while accounting for user-defined objectives/constraints for partitioning. This paper presents Scission, a tool for automated benchmarking of DNNs on a given set of target device, edge and cloud resources for determining optimal partitions that maximize DNN performance. The decision-making approach is context-aware by capitalizing on hardware capabilities of the target resources, their locality, the characteristics of DNN layers, and the network condition. Experimental studies are carried out on 18 DNNs. The decisions made by Scission cannot be manually made by a human given the complexity and the number of dimensions affecting the search space. The benchmarking overheads of Scission allow for responding to operational changes periodically rather than in real-time. Scission is available for public download at https://github.com/qub-blesson/Scission.

Submitted 16 December, 2020; v1 submitted 8 August, 2020; originally announced August 2020.

Comments: Accepted to IEEE/ACM UCC 2020

arXiv:2008.01814 [pdf, other]

A Case For Adaptive Deep Neural Networks in Edge Computing

Authors: Francis McNamee, Schahram Dustadar, Peter Kilpatrick, Weisong Shi, Ivor Spence, Blesson Varghese

Abstract: Edge computing offers an additional layer of compute infrastructure closer to the data source before raw data from privacy-sensitive and performance-critical applications is transferred to a cloud data center. Deep Neural Networks (DNNs) are one class of applications that are reported to benefit from collaboratively computing between the edge and the cloud. A DNN is partitioned such that specific layers of the DNN are deployed onto the edge and the cloud to meet performance and privacy objectives. However, there is limited understanding of: (a) whether and how evolving operational conditions (increased CPU and memory utilization at the edge or reduced data transfer rates between the edge and the cloud) affect the performance of already deployed DNNs, and (b) whether a new partition configuration is required to maximize performance. A DNN that adapts to changing operational conditions is referred to as an 'adaptive DNN'. This paper investigates whether there is a case for adaptive DNNs in edge computing by considering four questions: (i) Are DNNs sensitive to operational conditions? (ii) How sensitive are DNNs to operational conditions? (iii) Do individual or a combination of operational conditions equally affect DNNs? (iv) Is DNN partitioning sensitive to hardware architectures on the cloud/edge? The exploration is carried out in the context of 8 pre-trained DNN models and the results presented are from analyzing nearly 8 million data points.
The results highlight that network conditions affect DNN performance more than CPU or memory related operational conditions. Repartitioning is noted to provide a performance gain in a number of cases, but a specific trend was not noted in relation to its correlation to the underlying hardware architecture. Nonetheless, the need for adaptive DNNs is confirmed.

Submitted 16 December, 2020; v1 submitted 4 August, 2020; originally announced August 2020.

arXiv:2006.12761 [pdf, other]

Benchmarking features from different radiomics toolkits / toolboxes using Image Biomarkers Standardization Initiative

Authors: Mingxi Lei, Bino Varghese, Darryl Hwang, Steven Cen, Xiaomeng Lei, Afshin Azadikhah, Bhushan Desai, Assad Oberai, Vinay Duddalwar

Abstract: There is no consensus regarding the radiomic feature terminology, the underlying mathematics, or their implementation. This creates a scenario where features extracted using different toolboxes could not be used to build or validate the same model leading to a non-generalization of radiomic results. In this study, the image biomarker standardization initiative (IBSI) established phantom and benchmark values were used to compare the variation of the radiomic features while using 6 publicly available software programs and 1 in-house radiomics pipeline. All IBSI-standardized features (11 classes, 173 in total) were extracted. The relative differences between the extracted feature values from the different software and the IBSI benchmark values were calculated to measure the inter-software agreement. To better understand the variations, features are further grouped into 3 categories according to their properties: 1) morphology, 2) statistic/histogram and 3) texture features. While a good agreement was observed for a majority of radiomics features across the various programs, relatively poor agreement was observed for morphology features. Significant differences were also found in programs that use different gray level discretization approaches. Since these programs do not include all IBSI features, the level of quantitative assessment for each category was analyzed using Venn and the UpSet diagrams and also quantified using two ad hoc metrics.
Morphology features earn the lowest scores for both metrics, indicating that morphological features are not consistently evaluated among software programs. We conclude that radiomic features calculated using different software programs may not be identical and reliable. Further studies are needed to standardize the workflow of radiomic feature extraction.

Submitted 23 June, 2020; originally announced June 2020.

Comments: 21 pages, 8 figures

arXiv:2006.00342 [pdf, other]

WattsApp: Power-Aware Container Scheduling

Authors: Hemant Mehta, Paul Harvey, Omer Rana, Rajkumar Buyya, Blesson Varghese

Abstract: Containers are becoming a popular workload deployment mechanism in modern distributed systems. However, there are limited software-based methods (hardware-based methods are expensive requiring hardware level changes) for obtaining the power consumed by containers for facilitating power-aware container scheduling, an essential activity for efficient management of distributed systems. This paper presents WattsApp, a tool underpinned by a six-step software-based method for power-aware container scheduling to minimize power cap violations on a server. The proposed method relies on a neural network-based power estimation model and a power capped container scheduling technique. Experimental studies are pursued in a lab-based environment on 10 benchmarks deployed on Intel and ARM processors. The results highlight that the power estimation model has negligible overheads for data collection - nearly 90% of all data samples can be estimated with less than a 10% error, and the Mean Absolute Percentage Error (MAPE) is less than 6%. The power-aware scheduling of WattsApp is more effective than Intel's Running Average Power Limit (RAPL) based power capping for both single and multiple containers as it does not degrade the performance of all containers running on the server. The results confirm the feasibility of WattsApp.

Submitted 30 May, 2020; originally announced June 2020.

arXiv:2004.11725 [pdf, other]

A Survey on Edge Performance Benchmarking

Authors: Blesson Varghese, Nan Wang, David Bermbach, Cheol-Ho Hong, Eyal de Lara, Weisong Shi, Christopher Stewart

Abstract: Edge computing is the next Internet frontier that will leverage computing resources located near users, sensors, and data stores to provide more responsive services. Therefore, it is envisioned that a large-scale, geographically dispersed, and resource-rich distributed system will emerge and play a key role in the future Internet. However, given the loosely coupled nature of such complex systems, their operational conditions are expected to change significantly over time. In this context, the performance characteristics of such systems will need to be captured rapidly, which is referred to as performance benchmarking, for application deployment, resource orchestration, and adaptive decision-making. Edge performance benchmarking is a nascent research avenue that has started gaining momentum over the past five years. This article first reviews articles published over the past three decades to trace the history of performance benchmarking from tightly coupled to loosely coupled systems. It then systematically classifies previous research to identify the system under test, techniques analyzed, and benchmark runtime in edge performance benchmarking.

Submitted 16 December, 2020; v1 submitted 24 April, 2020; originally announced April 2020.

Comments: Accepted by ACM Computing Surveys, 16 December 2020

arXiv:2003.08305 [pdf, other]

Cross Architectural Power Modelling

Authors: Kai Chen, Peter Kilpatrick, Dimitrios S. Nikolopoulos, Blesson Varghese

Abstract: Existing power modelling research focuses on the model rather than the process for developing models. An automated power modelling process that can be deployed on different processors for developing power models with high accuracy is developed. For this, (i) an automated hardware performance counter selection method that selects counters best correlated to power on both ARM and Intel processors, (ii) a noise filter based on clustering that can reduce the mean error in power models, and (iii) a two stage power model that surmounts challenges in using existing power models across multiple architectures are proposed and developed. The key results are: (i) the automated hardware performance counter selection method achieves comparable selection to the manual method reported in the literature, (ii) the noise filter reduces the mean error in power models by up to 55%, and (iii) the two stage power model can predict dynamic power with less than 8% error on both ARM and Intel processors, which is an improvement over classic models.

Submitted 17 March, 2020; originally announced March 2020.

Comments: 10 pages; IEEE/ACM CCGrid 2020. arXiv admin note: text overlap with arXiv:1710.10325

arXiv:2002.05531 [pdf, other]

Modelling Fog Offloading Performance

Authors: Ayesha Abdul Majeed, Peter Kilpatrick, Ivor Spence, Blesson Varghese

Abstract: Fog computing has emerged as a computing paradigm aimed at addressing the issues of latency, bandwidth and privacy when mobile devices are communicating with remote cloud services. The concept is to offload compute services closer to the data. However many challenges exist in the realisation of this approach. During offloading, (part of) the application underpinned by the services may be unavailable, which the user will experience as down time. This paper describes work aimed at building models to allow prediction of such down time based on metrics (operational data) of the underlying and surrounding infrastructure. Such prediction would be invaluable in the context of automated Fog offloading and adaptive decision making in Fog orchestration. Models that cater for four container-based stateless and stateful offload techniques, namely Save and Load, Export and Import, Push and Pull and Live Migration, are built using four (linear and non-linear) regression techniques. Experimental results comprising over 42 million data points from multiple lab-based Fog infrastructure are presented. The results highlight that reasonably accurate predictions (measured by the coefficient of determination for regression models, mean absolute percentage error, and mean absolute error) may be obtained when considering 25 metrics relevant to the infrastructure.

Submitted 12 February, 2020; originally announced February 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:1909.04945

arXiv:2001.09228 [pdf, other]

Context-aware Distribution of Fog Applications Using Deep Reinforcement Learning

Authors: Nan Wang, Blesson Varghese

Abstract: Fog computing is an emerging paradigm that aims to meet the increasing computation demands arising from the billions of devices connected to the Internet. Offloading services of an application from the Cloud to the edge of the network can improve the overall Quality-of-Service (QoS) of the application since it can process data closer to user devices. Diverse Fog nodes ranging from Wi-Fi routers to mini-clouds with varying resource capabilities make it challenging to determine which services of an application need to be offloaded. In this paper, a context-aware mechanism for distributing applications across the Cloud and the Fog is proposed. The mechanism dynamically generates (re)deployment plans for the application to maximise the performance efficiency of the application by taking the QoS and running costs into account. The mechanism relies on deep Q-networks to generate a distribution plan without prior knowledge of the available resources on the Fog node, the network condition and the application. The feasibility of the proposed context-aware distribution mechanism is demonstrated on two use-cases, namely a face detection application and a location-based mobile game. The benefits are increased utility of dynamic distribution in both use cases, when compared to a static distribution approach used in existing research.

Submitted 24 January, 2020; originally announced January 2020.

arXiv:2001.09070 [pdf, other]

Priority-based Fair Scheduling in Edge Computing

Authors: Arkadiusz Madej, Nan Wang, Nikolaos Athanasopoulos, Rajiv Ranjan, Blesson Varghese

Abstract: Scheduling is important in Edge computing. In contrast to the Cloud, Edge resources are hardware limited and cannot support workload-driven infrastructure scaling. Hence, resource allocation and scheduling for the Edge requires a fresh perspective. Existing Edge scheduling research assumes availability of all needed resources whenever a job request is made. This paper challenges that assumption, since not all job requests from a Cloud server can be scheduled on an Edge node. Thus, guaranteeing fairness among the clients (Cloud servers offloading jobs) while accounting for priorities of the jobs becomes a critical task. This paper presents four scheduling techniques: the first is a naive first come first serve strategy, and the other three are proposed strategies, namely client fair, priority fair, and a hybrid that accounts for the fairness of both clients and job priorities. An evaluation on a target platform under three different scenarios, namely equal, random, and Gaussian job distributions is presented. The experimental studies highlight the low overheads and the distribution of scheduled jobs on the Edge node when compared to the naive strategy. The results confirm the superior performance of the hybrid strategy and showcase the feasibility of fair schedulers for Edge computing.

Submitted 24 January, 2020; originally announced January 2020.

Comments: 10 pages; accepted to IEEE Int. Conf. on Fog and Edge Computing (ICFEC), 2020

arXiv:2001.05790 [pdf, ps, other]

doi 10.1093/mnras/staa231

Unveiling Vela -- Variability of Interstellar Lines in the Direction of the Vela Supernova Remnant III. Na D and Ca II K

Authors: N. Kameswara Rao, David L. Lambert, Arumalla B. S Reddy, Ranjan Gupta, S. Muneer, Baba Varghese, Harinder P. Singh

Abstract: High-resolution optical spectra were obtained in 2017-2019 with The Southern African Large Telescope of fifteen stars in the direction of the Vela supernova remnant. Interstellar Ca ii H and K and Na i D lines are discussed in this paper. In particular, the line profiles are compared with profiles at a comparable spectral resolution obtained in 1993-1996 by Cha & Sembach. Ten of the lines of sight show changes to one or more of the components in that line of sight. Changes include small changes (1-2 km/s) in radial velocity and/or increases/decreases in equivalent width over the two decades between the periods of observation. Changes are more obvious in the Ca K line than in the Na D lines. These changes are attributed to gas disturbed by interactions between the supernova ejecta and the surrounding interstellar medium. A representative timescale may be 20-50 years. Small-scale variations in line profiles across the face of the remnant suggest, as previously remarked, that a linear scale for interactions is a small fraction of the 40 pc size of the present remnant.

Submitted 16 January, 2020; originally announced January 2020.

Comments: 24 pages, 18 figures, Accepted for publication in Monthly Notices of the Royal Astronomical Society Main Journal

arXiv:1909.04945 [pdf, other]

Performance Estimation of Container-Based Cloud-to-Fog Offloading

Authors: Ayesha Abdul Majeed, Peter Kilpatrick, Ivor Spence, Blesson Varghese

Abstract: Fog computing offloads latency critical application services running on the Cloud in close proximity to end-user devices onto resources located at the edge of the network. The research in this paper is motivated towards characterising and estimating the time taken to offload a service using containers, which is investigated in the context of the 'Save and Load' container migration technique. To this end, the research addresses questions such as whether fog offloading can be accurately modelled and which system and network related parameters influence offloading. These are addressed by exploring a catalogue of 21 different metrics both at the system and process levels that is used as input to four estimation techniques using collective model and individual models to predict the time taken for offloading. The study is pursued by collecting over 1.1 million data points and the preliminary results indicate that offloading can be modelled accurately.

Submitted 11 September, 2019; originally announced September 2019.

arXiv:1907.10890 [pdf, other]

DeFog: Fog Computing Benchmarks

Authors: Jonathan McChesney, Nan Wang, Ashish Tanwer, Eyal de Lara, Blesson Varghese

Abstract: Fog computing envisions that deploying services of an application across resources in the cloud and those located at the edge of the network may improve the overall performance of the application when compared to running the application on the cloud. However, there are currently no benchmarks that can directly compare the performance of the application across the cloud-only, edge-only and cloud-edge deployment platform to obtain any insight on performance improvement. This paper proposes DeFog, a first Fog benchmarking suite to: (i) alleviate the burden of Fog benchmarking by using a standard methodology, and (ii) facilitate the understanding of the target platform by collecting a catalogue of relevant metrics for a set of benchmarks. The current portfolio of DeFog benchmarks comprises six relevant applications conducive to using the edge. Experimental studies are carried out on multiple target platforms to demonstrate the use of DeFog for collecting metrics related to application latencies (communication and computation), for understanding the impact of stress and concurrent users on application latencies, and for understanding the performance of deploying different combinations of services of an application across the cloud and edge. DeFog is available for public download (https://github.com/qub-blesson/DeFog).

Submitted 25 July, 2019; originally announced July 2019.

Comments: Accepted to the ACM/IEEE Symposium on Edge Computing, 2019, Washington DC, USA

arXiv:1902.03656 [pdf, other]

Cloud Futurology

Authors: Blesson Varghese, Philipp Leitner, Suprio Ray, Kyle Chard, Adam Barker, Yehia Elkhatib, Herry Herry, Cheol-Ho Hong, Jeremy Singer, Fung Po Tso, Eiko Yoneki, Mohamed-Faten Zhani

Abstract: The Cloud has become integral to most Internet-based applications and user gadgets. This article provides a brief history of the Cloud and presents a researcher's view of the prospects for innovating at the infrastructure, middleware, and application and delivery levels of the already crowded Cloud computing stack.

Submitted 10 February, 2019; originally announced February 2019.

Comments: Accepted to IEEE Computer, 2019

arXiv:1812.01344 [pdf]

doi 10.1109/MCC.2018.064181115

Realizing Edge Marketplaces: Challenges and Opportunities

Authors: Blesson Varghese, Massimo Villari, Omer Rana, Philip James, Tejal Shal, Maria Fazio, Rajiv Ranjan

Abstract: The edge of the network has the potential to host services for supporting a variety of user applications, ranging in complexity from data preprocessing, image and video rendering, and interactive gaming, to embedded systems in autonomous cars and built environments. However, the computational and data resources over which such services are hosted, and the actors that interact with these services, have an intermittent availability and access profile, introducing significant risk for user applications that must rely on them. This article investigates the development of an edge marketplace, which is able to support multiple providers for offering services at the network edge, and to enable demand supply for influencing the operation of such a marketplace. Resilience, cost, and quality of service and experience will subsequently enable such a marketplace to adapt its services over time. This article also describes how distributed-ledger technologies (such as blockchains) provide a promising approach to support the operation of such a marketplace and regulate its behavior (such as the GDPR in Europe) and operation. Two application scenarios provide context for the discussion of how such a marketplace would function and be utilized in practice.

Submitted 4 December, 2018; originally announced December 2018.

Comments: Published in IEEE Cloud Computing, Volume 5, Issue 6, 2018, pp. 9-20

Journal ref: B. Varghese et al., "Realizing Edge Marketplaces: Challenges and Opportunities," in IEEE Cloud Computing, vol. 5, no. 6, pp. 9-20, Nov./Dec. 2018

arXiv:1810.06046 [pdf, other]

Accelerator Virtualization in Fog Computing: Moving From the Cloud to the Edge

Authors: Blesson Varghese, Carlos Reano, Federico Silla

Abstract: Hardware accelerators are available on the Cloud for enhanced analytics. Next generation Clouds aim to bring enhanced analytics using accelerators closer to user devices at the edge of the network for improving Quality-of-Service by minimizing end-to-end latencies and response times. The collective computing model that utilizes resources at the Cloud-Edge continuum in a multi-tier hierarchy comprising the Cloud, the Edge and user devices is referred to as Fog computing. This article identifies challenges and opportunities in making accelerators accessible at the Edge. A holistic view of the Fog architecture is key to pursuing meaningful research in this area.

Submitted 14 October, 2018; originally announced October 2018.

Comments: IEEE Cloud Computing magazine

arXiv:1810.04608 [pdf, other]

DYVERSE: DYnamic VERtical Scaling in Multi-tenant Edge Environments

Authors: Nan Wang, Michail Matthaiou, Dimitrios S. Nikolopoulos, Blesson Varghese

Abstract: Multi-tenancy in resource-constrained environments is a key challenge in Edge computing. In this paper, we develop 'DYVERSE: DYnamic VERtical Scaling in Edge' environments, which is the first light-weight and dynamic vertical scaling mechanism for managing resources allocated to applications for facilitating multi-tenancy in Edge environments. To enable dynamic vertical scaling, one static and three dynamic priority management approaches that are workload-aware, community-aware and system-aware, respectively, are proposed. This research advocates that dynamic vertical scaling and priority management approaches reduce Service Level Objective (SLO) violation rates. An online-game and a face detection workload in a Cloud-Edge test-bed are used to validate the research. The merit of DYVERSE is that there is only a sub-second overhead per Edge server when 32 Edge servers are deployed on a single Edge node. When compared to executing applications on the Edge servers without dynamic vertical scaling, static priorities and dynamic priorities reduce SLO violation rates of requests by up to 4% and 12% for the online game, respectively, and in both cases 6% for the face detection workload. Moreover, for both workloads, the system-aware dynamic vertical scaling method effectively reduces the latency of non-violated requests, when compared to other methods.

Submitted 21 February, 2020; v1 submitted 19 September, 2018; originally announced October 2018.

arXiv:1810.00305 [pdf, other]

doi 10.1145/3326066

Resource Management in Fog/Edge Computing: A Survey

Authors: Cheol-Ho Hong, Blesson Varghese

Abstract: Contrary to using distant and centralized cloud data center resources, employing decentralized resources at the edge of a network for processing data closer to user devices, such as smartphones and tablets, is an upcoming computing paradigm, referred to as fog/edge computing. Fog/edge resources are typically resource-constrained, heterogeneous, and dynamic compared to the cloud, thereby making resource management an important challenge that needs to be addressed. This article reviews publications as early as 1991, with 85% of the publications between 2013-2018, to identify and classify the architectures, infrastructure, and underlying algorithms for managing resources in fog/edge computing.

Submitted 29 September, 2018; originally announced October 2018.

Comments: 22 pages

Journal ref: ACM Computing Surveys (CSUR) 52.5 (2019) 1-37

arXiv:1803.05255 [pdf]

Addressing the Challenges in Federating Edge Resources

Authors: Cihat Baktir, Cagatay Sonmez, Cem Ersoy, Atay Ozgovde, Blesson Varghese

Abstract: This book chapter considers how Edge deployments can be brought to bear in a global context by federating them across multiple geographic regions to create a global Edge-based fabric that decentralizes data center computation. This is currently impractical, not only because of technical challenges, but also because it is shrouded by social, legal and geopolitical issues. In this chapter, we discuss two key challenges, networking and management, in federating Edge deployments. Additionally, we consider resource and modeling challenges that will need to be addressed for a federated Edge.

Submitted 14 March, 2018; originally announced March 2018.

Comments: Book Chapter accepted to the Fog and Edge Computing: Principles and Paradigms; Editors Buyya, Srirama

arXiv:1712.04495 [pdf, other]

Intra-node Memory Safe GPU Co-Scheduling

Authors: Carlos Reano, Federico Silla, Dimitrios S. Nikolopoulos, Blesson Varghese

Abstract: GPUs in High-Performance Computing systems remain under-utilised due to the unavailability of schedulers that can safely schedule multiple applications to share the same GPU. The research reported in this paper is motivated to improve the utilisation of GPUs by proposing a framework, which we refer to as schedGPU, to facilitate intra-node GPU co-scheduling such that a GPU can be safely shared among multiple applications by taking memory constraints into account. Two approaches, namely a client-server and a shared memory approach, are explored. The shared memory approach is more suitable due to its lower overheads when compared to the client-server approach. Four policies are proposed in schedGPU to handle applications that are waiting to access the GPU, two of which account for priorities. The feasibility of schedGPU is validated on three real-world applications. The key observation is that a performance gain is achieved. For single applications, a gain of over 10 times, as measured by GPU utilisation and GPU memory utilisation, is obtained. For workloads comprising multiple applications, a speed-up of up to 5x in the total execution time is noted. Moreover, the average GPU utilisation and average GPU memory utilisation are increased by 5 and 12 times, respectively.

Submitted 12 December, 2017; originally announced December 2017.

Comments: Accepted on 12 Dec 2017, IEEE Transactions on Parallel and Distributed Systems

Showing 1–50 of 96 results for author: Varghese, B