-
LVS-Net: A Lightweight Vessels Segmentation Network for Retinal Image Analysis
Authors:
Mehwish Mehmood,
Shahzaib Iqbal,
Tariq Mahmood Khan,
Ivor Spence,
Muhammad Fahim
Abstract:
The analysis of retinal images for the diagnosis of various diseases is one of the emerging areas of research. Recently, the research direction has been inclined towards investigating several changes in retinal blood vessels in subjects with many neurological disorders, including dementia. This research focuses on detecting diseases early by improving the performance of models for segmentation of…
▽ More
The analysis of retinal images for the diagnosis of various diseases is one of the emerging areas of research. Recently, the research direction has been inclined towards investigating several changes in retinal blood vessels in subjects with many neurological disorders, including dementia. This research focuses on detecting diseases early by improving the performance of models for segmentation of retinal vessels with fewer parameters, which reduces computational costs and supports faster processing. This paper presents a novel lightweight encoder-decoder model that segments retinal vessels to improve the efficiency of disease detection. It incorporates multi-scale convolutional blocks in the encoder to accurately identify vessels of various sizes and thicknesses. The bottleneck of the model integrates the Focal Modulation Attention and Spatial Feature Refinement Blocks to refine and enhance essential features for efficient segmentation. The decoder upsamples features and integrates them with the corresponding feature in the encoder using skip connections and the spatial feature refinement block at every upsampling stage to enhance feature representation at various scales. The estimated computation complexity of our proposed model is around 29.60 GFLOP with 0.71 million parameters and 2.74 MB of memory size, and it is evaluated using public datasets, that is, DRIVE, CHASE\_DB, and STARE. It outperforms existing models with dice scores of 86.44\%, 84.22\%, and 87.88\%, respectively.
△ Less
Submitted 8 December, 2024;
originally announced December 2024.
-
DNNShifter: An Efficient DNN Pruning System for Edge Computing
Authors:
Bailey J. Eccles,
Philip Rodgers,
Peter Kilpatrick,
Ivor Spence,
Blesson Varghese
Abstract:
Deep neural networks (DNNs) underpin many machine learning applications. Production quality DNN models achieve high inference accuracy by training millions of DNN parameters which has a significant resource footprint. This presents a challenge for resources operating at the extreme edge of the network, such as mobile and embedded devices that have limited computational and memory resources. To add…
▽ More
Deep neural networks (DNNs) underpin many machine learning applications. Production quality DNN models achieve high inference accuracy by training millions of DNN parameters which has a significant resource footprint. This presents a challenge for resources operating at the extreme edge of the network, such as mobile and embedded devices that have limited computational and memory resources. To address this, models are pruned to create lightweight, more suitable variants for these devices. Existing pruning methods are unable to provide similar quality models compared to their unpruned counterparts without significant time costs and overheads or are limited to offline use cases. Our work rapidly derives suitable model variants while maintaining the accuracy of the original model. The model variants can be swapped quickly when system and network conditions change to match workload demand. This paper presents DNNShifter, an end-to-end DNN training, spatial pruning, and model switching system that addresses the challenges mentioned above. At the heart of DNNShifter is a novel methodology that prunes sparse models using structured pruning. The pruned model variants generated by DNNShifter are smaller in size and thus faster than dense and sparse model predecessors, making them suitable for inference at the edge while retaining near similar accuracy as of the original dense model. DNNShifter generates a portfolio of model variants that can be swiftly interchanged depending on operational conditions. DNNShifter produces pruned model variants up to 93x faster than conventional training methods. Compared to sparse models, the pruned model variants are up to 5.14x smaller and have a 1.67x inference latency speedup, with no compromise to sparse model accuracy. In addition, DNNShifter has up to 11.9x lower overhead for switching models and up to 3.8x lower memory utilisation than existing approaches.
△ Less
Submitted 13 September, 2023;
originally announced September 2023.
-
EcoFed: Efficient Communication for DNN Partitioning-based Federated Learning
Authors:
Di Wu,
Rehmat Ullah,
Philip Rodgers,
Peter Kilpatrick,
Ivor Spence,
Blesson Varghese
Abstract:
Efficiently running federated learning (FL) on resource-constrained devices is challenging since they are required to train computationally intensive deep neural networks (DNN) independently. DNN partitioning-based FL (DPFL) has been proposed as one mechanism to accelerate training where the layers of a DNN (or computation) are offloaded from the device to the server. However, this creates signifi…
▽ More
Efficiently running federated learning (FL) on resource-constrained devices is challenging since they are required to train computationally intensive deep neural networks (DNN) independently. DNN partitioning-based FL (DPFL) has been proposed as one mechanism to accelerate training where the layers of a DNN (or computation) are offloaded from the device to the server. However, this creates significant communication overheads since the intermediate activation and gradient need to be transferred between the device and the server during training. While current research reduces the communication introduced by DNN partitioning using local loss-based methods, we demonstrate that these methods are ineffective in improving the overall efficiency (communication overhead and training speed) of a DPFL system. This is because they suffer from accuracy degradation and ignore the communication costs incurred when transferring the activation from the device to the server. This article proposes EcoFed - a communication efficient framework for DPFL systems. EcoFed eliminates the transmission of the gradient by developing pre-trained initialization of the DNN model on the device for the first time. This reduces the accuracy degradation seen in local loss-based methods. In addition, EcoFed proposes a novel replay buffer mechanism and implements a quantization-based compression technique to reduce the transmission of the activation. It is experimentally demonstrated that EcoFed can reduce the communication cost by up to 133x and accelerate training by up to 21x when compared to classic FL. Compared to vanilla DPFL, EcoFed achieves a 16x communication reduction and 2.86x training time speed-up. EcoFed is available from https://github.com/blessonvar/EcoFed.
△ Less
Submitted 3 January, 2024; v1 submitted 11 April, 2023;
originally announced April 2023.
-
PiPar: Pipeline Parallelism for Collaborative Machine Learning
Authors:
Zihan Zhang,
Philip Rodgers,
Peter Kilpatrick,
Ivor Spence,
Blesson Varghese
Abstract:
Collaborative machine learning (CML) techniques, such as federated learning, have been proposed to train deep learning models across multiple mobile devices and a server. CML techniques are privacy-preserving as a local model that is trained on each device instead of the raw data from the device is shared with the server. However, CML training is inefficient due to low resource utilization. We ide…
▽ More
Collaborative machine learning (CML) techniques, such as federated learning, have been proposed to train deep learning models across multiple mobile devices and a server. CML techniques are privacy-preserving as a local model that is trained on each device instead of the raw data from the device is shared with the server. However, CML training is inefficient due to low resource utilization. We identify idling resources on the server and devices due to sequential computation and communication as the principal cause of low resource utilization. A novel framework PiPar that leverages pipeline parallelism for CML techniques is developed to substantially improve resource utilization. A new training pipeline is designed to parallelize the computations on different hardware resources and communication on different bandwidth resources, thereby accelerating the training process in CML. A low overhead automated parameter selection method is proposed to optimize the pipeline, maximizing the utilization of available resources. The experimental results confirm the validity of the underlying approach of PiPar and highlight that when compared to federated learning: (i) the idle time of the server can be reduced by up to 64.1x, and (ii) the overall training time can be accelerated by up to 34.6x under varying network conditions for a collection of six small and large popular deep neural networks and four datasets without sacrificing accuracy. It is also experimentally demonstrated that PiPar achieves performance benefits when incorporating differential privacy methods and operating in environments with heterogeneous devices and changing bandwidths.
△ Less
Submitted 25 June, 2024; v1 submitted 1 December, 2022;
originally announced February 2023.
-
CONTINUER: Maintaining Distributed DNN Services During Edge Failures
Authors:
Ayesha Abdul Majeed,
Peter Kilpatrick,
Ivor Spence,
Blesson Varghese
Abstract:
Partitioning and deploying Deep Neural Networks (DNNs) across edge nodes may be used to meet performance objectives of applications. However, the failure of a single node may result in cascading failures that will adversely impact the delivery of the service and will result in failure to meet specific objectives. The impact of these failures needs to be minimised at runtime. Three techniques are e…
▽ More
Partitioning and deploying Deep Neural Networks (DNNs) across edge nodes may be used to meet performance objectives of applications. However, the failure of a single node may result in cascading failures that will adversely impact the delivery of the service and will result in failure to meet specific objectives. The impact of these failures needs to be minimised at runtime. Three techniques are explored in this paper, namely repartitioning, early-exit and skip-connection. When an edge node fails, the repartitioning technique will repartition and redeploy the DNN thus avoiding the failed nodes. The early-exit technique makes provision for a request to exit (early) before the failed node. The skip connection technique dynamically routes the request by skipping the failed nodes. This paper will leverage trade-offs in accuracy, end-to-end latency and downtime for selecting the best technique given user-defined objectives (accuracy, latency and downtime thresholds) when an edge node fails. To this end, CONTINUER is developed. Two key activities of the framework are estimating the accuracy and latency when using the techniques for distributed DNNs and selecting the best technique. It is demonstrated on a lab-based experimental testbed that CONTINUER estimates accuracy and latency when using the techniques with no more than an average error of 0.28% and 13.06%, respectively and selects the suitable technique with a low overhead of no more than 16.82 milliseconds and an accuracy of up to 99.86%.
△ Less
Submitted 25 April, 2022;
originally announced June 2022.
-
FedFly: Towards Migration in Edge-based Distributed Federated Learning
Authors:
Rehmat Ullah,
Di Wu,
Paul Harvey,
Peter Kilpatrick,
Ivor Spence,
Blesson Varghese
Abstract:
Federated learning (FL) is a privacy-preserving distributed machine learning technique that trains models while keeping all the original data generated on devices locally. Since devices may be resource constrained, offloading can be used to improve FL performance by transferring computational workload from devices to edge servers. However, due to mobility, devices participating in FL may leave the…
▽ More
Federated learning (FL) is a privacy-preserving distributed machine learning technique that trains models while keeping all the original data generated on devices locally. Since devices may be resource constrained, offloading can be used to improve FL performance by transferring computational workload from devices to edge servers. However, due to mobility, devices participating in FL may leave the network during training and need to connect to a different edge server. This is challenging because the offloaded computations from edge server need to be migrated. In line with this assertion, we present FedFly, which is, to the best of our knowledge, the first work to migrate a deep neural network (DNN) when devices move between edge servers during FL training. Our empirical results on the CIFAR10 dataset, with both balanced and imbalanced data distribution, support our claims that FedFly can reduce training time by up to 33% when a device moves after 50% of the training is completed, and by up to 45% when 90% of the training is completed when compared to state-of-the-art offloading approach in FL. FedFly has negligible overhead of up to two seconds and does not compromise accuracy. Finally, we highlight a number of open research issues for further investigation. FedFly can be downloaded from https://github.com/qub-blesson/FedFly.
△ Less
Submitted 14 July, 2022; v1 submitted 2 November, 2021;
originally announced November 2021.
-
FedAdapt: Adaptive Offloading for IoT Devices in Federated Learning
Authors:
Di Wu,
Rehmat Ullah,
Paul Harvey,
Peter Kilpatrick,
Ivor Spence,
Blesson Varghese
Abstract:
Applying Federated Learning (FL) on Internet-of-Things devices is necessitated by the large volumes of data they produce and growing concerns of data privacy. However, there are three challenges that need to be addressed to make FL efficient: (i) execution on devices with limited computational capabilities, (ii) accounting for stragglers due to computational heterogeneity of devices, and (iii) ada…
▽ More
Applying Federated Learning (FL) on Internet-of-Things devices is necessitated by the large volumes of data they produce and growing concerns of data privacy. However, there are three challenges that need to be addressed to make FL efficient: (i) execution on devices with limited computational capabilities, (ii) accounting for stragglers due to computational heterogeneity of devices, and (iii) adaptation to the changing network bandwidths. This paper presents FedAdapt, an adaptive offloading FL framework to mitigate the aforementioned challenges. FedAdapt accelerates local training in computationally constrained devices by leveraging layer offloading of deep neural networks (DNNs) to servers. Further, FedAdapt adopts reinforcement learning based optimization and clustering to adaptively identify which layers of the DNN should be offloaded for each individual device on to a server to tackle the challenges of computational heterogeneity and changing network bandwidth. Experimental studies are carried out on a lab-based testbed and it is demonstrated that by offloading a DNN from the device to the server FedAdapt reduces the training time of a typical IoT device by over half compared to classic FL. The training time of extreme stragglers and the overall training time can be reduced by up to 57%. Furthermore, with changing network bandwidth, FedAdapt is demonstrated to reduce the training time by up to 40% when compared to classic FL, without sacrificing accuracy.
△ Less
Submitted 18 May, 2022; v1 submitted 9 July, 2021;
originally announced July 2021.
-
NEUKONFIG: Reducing Edge Service Downtime When Repartitioning DNNs
Authors:
Ayesha Abdul Majeed,
Peter Kilpatrick,
Ivor Spence,
Blesson Varghese
Abstract:
Deep Neural Networks (DNNs) may be partitioned across the edge and the cloud to improve the performance efficiency of inference. DNN partitions are determined based on operational conditions such as network speed. When operational conditions change DNNs will need to be repartitioned to maintain the overall performance. However, repartitioning using existing approaches, such as Pause and Resume, wi…
▽ More
Deep Neural Networks (DNNs) may be partitioned across the edge and the cloud to improve the performance efficiency of inference. DNN partitions are determined based on operational conditions such as network speed. When operational conditions change DNNs will need to be repartitioned to maintain the overall performance. However, repartitioning using existing approaches, such as Pause and Resume, will incur a service downtime on the edge. This paper presents the NEUKONFIG framework that identifies the service downtime incurred when repartitioning DNNs and proposes approaches for reducing edge service downtime. The proposed approaches are based on 'Dynamic Switching' in which, when the network speed changes and given an existing edge-cloud pipeline, a new edge-cloud pipeline is initialised with new DNN partitions. Incoming inference requests are switched to the new pipeline for processing data. Two dynamic switching scenarios are considered: when a second edge-cloud pipeline is always running and when a second pipeline is only initialised when the network speed changes. Experimental studies are carried out on a lab-based testbed to demonstrate that Dynamic Switching reduces the downtime by at least an order of magnitude when compared to a baseline using Pause and Resume that has a downtime of 6 seconds. A trade-off in the edge service downtime and memory required is noted. The Dynamic Switching approach that requires the same amount of memory as the baseline reduces the edge service downtime to 0.6 seconds and to less than 1 millisecond in the best case when twice the amount of memory as the baseline is available.
△ Less
Submitted 29 June, 2021;
originally announced June 2021.
-
A Case For Adaptive Deep Neural Networks in Edge Computing
Authors:
Francis McNamee,
Schahram Dustadar,
Peter Kilpatrick,
Weisong Shi,
Ivor Spence,
Blesson Varghese
Abstract:
Edge computing offers an additional layer of compute infrastructure closer to the data source before raw data from privacy-sensitive and performance-critical applications is transferred to a cloud data center. Deep Neural Networks (DNNs) are one class of applications that are reported to benefit from collaboratively computing between the edge and the cloud. A DNN is partitioned such that specific…
▽ More
Edge computing offers an additional layer of compute infrastructure closer to the data source before raw data from privacy-sensitive and performance-critical applications is transferred to a cloud data center. Deep Neural Networks (DNNs) are one class of applications that are reported to benefit from collaboratively computing between the edge and the cloud. A DNN is partitioned such that specific layers of the DNN are deployed onto the edge and the cloud to meet performance and privacy objectives. However, there is limited understanding of: (a) whether and how evolving operational conditions (increased CPU and memory utilization at the edge or reduced data transfer rates between the edge and the cloud) affect the performance of already deployed DNNs, and (b) whether a new partition configuration is required to maximize performance. A DNN that adapts to changing operational conditions is referred to as an 'adaptive DNN'. This paper investigates whether there is a case for adaptive DNNs in edge computing by considering three questions: (i) Are DNNs sensitive to operational conditions? (ii) How sensitive are DNNs to operational conditions? (iii) Do individual or a combination of operational conditions equally affect DNNs? (iv) Is DNN partitioning sensitive to hardware architectures on the cloud/edge? The exploration is carried out in the context of 8 pre-trained DNN models and the results presented are from analyzing nearly 8 million data points. The results highlight that network conditions affects DNN performance more than CPU or memory related operational conditions. Repartitioning is noted to provide a performance gain in a number of cases, but a specific trend was not noted in relation to its correlation to the underlying hardware architecture. Nonetheless, the need for adaptive DNNs is confirmed.
△ Less
Submitted 16 December, 2020; v1 submitted 4 August, 2020;
originally announced August 2020.
-
Modelling Fog Offloading Performance
Authors:
Ayesha Abdul Majeed,
Peter Kilpatrick,
Ivor Spence,
Blesson Varghese
Abstract:
Fog computing has emerged as a computing paradigm aimed at addressing the issues of latency, bandwidth and privacy when mobile devices are communicating with remote cloud services. The concept is to offload compute services closer to the data. However many challenges exist in the realisation of this approach. During offloading, (part of) the application underpinned by the services may be unavailab…
▽ More
Fog computing has emerged as a computing paradigm aimed at addressing the issues of latency, bandwidth and privacy when mobile devices are communicating with remote cloud services. The concept is to offload compute services closer to the data. However many challenges exist in the realisation of this approach. During offloading, (part of) the application underpinned by the services may be unavailable, which the user will experience as down time. This paper describes work aimed at building models to allow prediction of such down time based on metrics (operational data) of the underlying and surrounding infrastructure. Such prediction would be invaluable in the context of automated Fog offloading and adaptive decision making in Fog orchestration. Models that cater for four container-based stateless and stateful offload techniques, namely Save and Load, Export and Import, Push and Pull and Live Migration, are built using four (linear and non-linear) regression techniques. Experimental results comprising over 42 million data points from multiple lab-based Fog infrastructure are presented. The results highlight that reasonably accurate predictions (measured by the coefficient of determination for regression models, mean absolute percentage error, and mean absolute error) may be obtained when considering 25 metrics relevant to the infrastructure.
△ Less
Submitted 12 February, 2020;
originally announced February 2020.
-
Performance Estimation of Container-Based Cloud-to-Fog Offloading
Authors:
Ayesha Abdul Majeed,
Peter Kilpatrick,
Ivor Spence,
Blesson Varghese
Abstract:
Fog computing offloads latency critical application services running on the Cloud in close proximity to end-user devices onto resources located at the edge of the network. The research in this paper is motivated towards characterising and estimating the time taken to offload a service using containers, which is investigated in the context of the `Save and Load' container migration technique. To th…
▽ More
Fog computing offloads latency critical application services running on the Cloud in close proximity to end-user devices onto resources located at the edge of the network. The research in this paper is motivated towards characterising and estimating the time taken to offload a service using containers, which is investigated in the context of the `Save and Load' container migration technique. To this end, the research addresses questions such as whether fog offloading can be accurately modelled and which system and network related parameters influence offloading. These are addressed by exploring a catalogue of 21 different metrics both at the system and process levels that is used as input to four estimation techniques using collective model and individual models to predict the time taken for offloading. The study is pursued by collecting over 1.1 million data points and the preliminary results indicate that offloading can be modelled accurately.
△ Less
Submitted 11 September, 2019;
originally announced September 2019.
-
Iso-Quality of Service: Fairly Ranking Servers for Real-Time Data Analytics
Authors:
Giorgis Georgakoudis,
Charles J. Gillan,
Ahmed Sayed,
Ivor Spence,
Richard Faloon,
Dimitrios S. Nikolopoulos
Abstract:
We present a mathematically rigorous Quality-of-Service (QoS) metric which relates the achievable quality of service metric (QoS) for a real-time analytics service to the server energy cost of offering the service. Using a new iso-QoS evaluation methodology, we scale server resources to meet QoS targets and directly rank the servers in terms of their energy-efficiency and by extension cost of owne…
▽ More
We present a mathematically rigorous Quality-of-Service (QoS) metric which relates the achievable quality of service metric (QoS) for a real-time analytics service to the server energy cost of offering the service. Using a new iso-QoS evaluation methodology, we scale server resources to meet QoS targets and directly rank the servers in terms of their energy-efficiency and by extension cost of ownership. Our metric and method are platform-independent and enable fair comparison of datacenter compute servers with significant architectural diversity, including micro-servers. We deploy our metric and methodology to compare three servers running financial option pricing workloads on real-life market data. We find that server ranking is sensitive to data inputs and desired QoS level and that although scale-out micro-servers can be up to two times more energy-efficient than conventional heavyweight servers for the same target QoS, they are still six times less energy efficient than high-performance computational accelerators.
△ Less
Submitted 14 January, 2015;
originally announced January 2015.
-
Methods and Metrics for Fair Server Assessment under Real-Time Financial Workloads
Authors:
Giorgis Georgakoudis,
Charles J. Gillan,
Ahmed Sayed,
Ivor Spence,
Richard Faloon,
Dimitrios S. Nikolopoulos
Abstract:
Energy efficiency has been a daunting challenge for datacenters. The financial industry operates some of the largest datacenters in the world. With increasing energy costs and the financial services sector growth, emerging financial analytics workloads may incur extremely high operational costs, to meet their latency targets. Microservers have recently emerged as an alternative to high-end servers…
▽ More
Energy efficiency has been a daunting challenge for datacenters. The financial industry operates some of the largest datacenters in the world. With increasing energy costs and the financial services sector growth, emerging financial analytics workloads may incur extremely high operational costs, to meet their latency targets. Microservers have recently emerged as an alternative to high-end servers, promising scalable performance and low energy consumption in datacenters via scale-out. Unfortunately, stark differences in architectural features, form factor and design considerations make a fair comparison between servers and microservers exceptionally challenging. In this paper we present a rigorous methodology and new metrics for fair comparison of server and microserver platforms. We deploy our methodology and metrics to compare a microserver with ARM cores against two servers with x86 cores, running the same real-time financial analytics workload. We define workload-specific but platform-independent performance metrics for platform comparison, targeting both datacenter operators and end users. Our methodology establishes that a server based the Xeon Phi processor delivers the highest performance and energy-efficiency. However, by scaling out energy-efficient microservers, we achieve competitive or better energy-efficiency than a power-equivalent server with two Sandy Bridge sockets despite the microserver's slower cores. Using a new iso-QoS (iso-Quality of Service) metric, we find that the ARM microserver scales enough to meet market throughput demand, i.e. a 100% QoS in terms of timely option pricing, with as little as 55% of the energy consumed by the Sandy Bridge server.
△ Less
Submitted 30 December, 2014;
originally announced January 2015.