Search | arXiv e-print repository

arXiv:2504.20387 [pdf, other]

DEER: Deep Runahead for Instruction Prefetching on Modern Mobile Workloads

Authors: Parmida Vahdatniya, Julian Humecki, Henry Kao, Tony Li, Ali Sedaghati, Fang Su, Ruoyu Zhou, Alex Bi, Reza Azimi, Maziar Goudarzi

Abstract: Mobile workloads incur heavy frontend stalls due to increasingly large code footprints as well as long repeat cycles. Existing instruction-prefetching techniques suffer from low coverage, poor timeliness, or high cost. We provide a SW/HW co-designed I-prefetcher; DEER uses profile analysis to extract metadata that allows the hardware to prefetch the most likely future instruction cachelines, hundreds of instructions earlier. This profile analysis skips over loops and recursions to go deeper into the future, and uses a return-address stack on the hardware side to allow prefetch on the return-path from large call-stacks. The produced metadata table is put in DRAM, pointed to by an in-hardware register; the high depth of the lookahead allows the metadata to be preloaded in time, and thus nearly no on-chip metadata storage is needed. Gem5 evaluation on real-world modern mobile workloads shows up to 45% reduction in L2 instruction-miss rate (19.6% on average), resulting in up to 8% speedup (4.7% on average). These gains are up to 4X larger than full-hardware record-and-replay prefetchers, while needing two orders of magnitude smaller on-chip storage.

Submitted 28 April, 2025; originally announced April 2025.

Comments: 13 pages

arXiv:2411.13121 [pdf, other]

ReinFog: A DRL Empowered Framework for Resource Management in Edge and Cloud Computing Environments

Authors: Zhiyu Wang, Mohammad Goudarzi, Rajkumar Buyya

Abstract: The growing IoT landscape requires effective server deployment strategies to meet demands including real-time processing and energy efficiency. This is complicated by heterogeneous, dynamic applications and servers. To address these challenges, we propose ReinFog, a modular distributed software empowered with Deep Reinforcement Learning (DRL) for adaptive resource management across edge/fog and cloud environments. ReinFog enables the practical development/deployment of various centralized and distributed DRL techniques for resource management in edge/fog and cloud computing environments. It also supports integrating native and library-based DRL techniques for diverse IoT application scheduling objectives. Additionally, ReinFog allows for customizing deployment configurations for different DRL techniques, including the number and placement of DRL Learners and DRL Workers in large-scale distributed systems. Besides, we propose a novel Memetic Algorithm for DRL Component (e.g., DRL Learners and DRL Workers) Placement in ReinFog named MADCP, which combines the strengths of Genetic Algorithm, Firefly Algorithm, and Particle Swarm Optimization. Experiments reveal that the DRL mechanisms developed within ReinFog significantly enhance the implementation of both centralized and distributed DRL techniques. These advancements have resulted in notable improvements in IoT application performance, reducing response time by 45%, energy consumption by 39%, and weighted cost by 37%, while maintaining minimal scheduling overhead. Additionally, ReinFog exhibits remarkable scalability, with a rise in DRL Workers from 1 to 30 causing only a 0.3-second increase in startup time and around 2 MB more RAM per Worker. The proposed MADCP for DRL component placement further accelerates the convergence rate of DRL techniques by up to 38%.

Submitted 20 November, 2024; originally announced November 2024.

arXiv:2410.14348 [pdf, other]

TF-DDRL: A Transformer-enhanced Distributed DRL Technique for Scheduling IoT Applications in Edge and Cloud Computing Environments

Authors: Zhiyu Wang, Mohammad Goudarzi, Rajkumar Buyya

Abstract: With the continuous increase of IoT applications, their effective scheduling in edge and cloud computing has become a critical challenge. The inherent dynamism and stochastic characteristics of edge and cloud computing, along with IoT applications, necessitate solutions that are highly adaptive. Currently, several centralized Deep Reinforcement Learning (DRL) techniques are adapted to address the scheduling problem. However, they require a large amount of experience and training time to reach a suitable solution. Moreover, many IoT applications contain multiple interdependent tasks, imposing additional constraints on the scheduling problem. To overcome these challenges, we propose a Transformer-enhanced Distributed DRL scheduling technique, called TF-DDRL, to adaptively schedule heterogeneous IoT applications. This technique follows the Actor-Critic architecture, scales efficiently to multiple distributed servers, and employs an off-policy correction method to stabilize the training process. In addition, Prioritized Experience Replay (PER) and Transformer techniques are introduced to reduce exploration costs and capture long-term dependencies for faster convergence. Extensive results of practical experiments show that TF-DDRL, compared to its counterparts, significantly reduces response time, energy consumption, monetary cost, and weighted cost by up to 60%, 51%, 56%, and 58%, respectively.

Submitted 31 October, 2024; v1 submitted 18 October, 2024; originally announced October 2024.

arXiv:2408.12081 [pdf, other]

Towards Threat Modelling of IoT Context-Sharing Platforms

Authors: Mohammad Goudarzi, Arash Shaghaghi, Simon Finn, Burkhard Stiller, Sanjay Jha

Abstract: The Internet of Things (IoT) involves complex, interconnected systems and devices that depend on context-sharing platforms for interoperability and information exchange. These platforms are, therefore, critical components of real-world IoT deployments, making their security essential to ensure the resilience and reliability of these 'systems of systems'. In this paper, we take the first steps toward systematically and comprehensively addressing the security of IoT context-sharing platforms. We propose a framework for threat modelling and security analysis of a generic IoT context-sharing solution, employing the MITRE ATT&CK framework. Through an evaluation of various industry-funded projects and academic research, we identify significant security challenges in the design of IoT context-sharing platforms. Our threat modelling provides an in-depth analysis of the techniques and sub-techniques adversaries may use to exploit these systems, offering valuable insights for future research aimed at developing resilient solutions. Additionally, we have developed an open-source threat analysis tool that incorporates our detailed threat modelling, which can be used to evaluate and enhance the security of existing context-sharing platforms.

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2407.14116 [pdf, other]

AuditNet: A Conversational AI-based Security Assistant [DEMO]

Authors: Shohreh Deldari, Mohammad Goudarzi, Aditya Joshi, Arash Shaghaghi, Simon Finn, Flora D. Salim, Sanjay Jha

Abstract: In the age of information overload, professionals across various fields face the challenge of navigating vast amounts of documentation and ever-evolving standards. Ensuring compliance with standards, regulations, and contractual obligations is a critical yet complex task across various professional fields. We propose a versatile conversational AI assistant framework designed to facilitate compliance checking on the go, in diverse domains, including but not limited to network infrastructure, legal contracts, educational standards, environmental regulations, and government policies. By leveraging retrieval-augmented generation using large language models, our framework automates the review, indexing, and retrieval of relevant, context-aware information, streamlining the process of verifying adherence to established guidelines and requirements. This AI assistant not only reduces the manual effort involved in compliance checks but also enhances accuracy and efficiency, supporting professionals in maintaining high standards of practice and ensuring regulatory compliance in their respective fields. We propose and demonstrate AuditNet, the first conversational AI security assistant designed to assist IoT network security experts by providing instant access to security standards, policies, and regulations.

Submitted 19 July, 2024; originally announced July 2024.

arXiv:2407.05290 [pdf, ps, other]

Lack of Systematic Approach to Security of IoT Context Sharing Platforms

Authors: Mohammad Goudarzi, Arash Shaghaghi, Simon Finn, Sanjay Jha

Abstract: IoT context-sharing platforms are an essential component of today's interconnected IoT deployments, with their security affecting the entire deployment and the critical infrastructure adopting IoT. We report on the lack of a systematic approach to the security of IoT context-sharing platforms and propose the need for a methodological and systematic alternative to evaluate the existing solutions and develop 'secure-by-design' solutions. We have identified the key components of a generic IoT context-sharing platform and propose using MITRE ATT&CK for threat modelling of such platforms.

Submitted 7 July, 2024; originally announced July 2024.

Comments: Accepted to 21st Annual International Conference on Privacy, Security, and Trust (PST2024)

arXiv:2310.09003 [pdf, other]

μ-DDRL: A QoS-Aware Distributed Deep Reinforcement Learning Technique for Service Offloading in Fog computing Environments

Authors: Mohammad Goudarzi, Maria A. Rodriguez, Majid Sarvi, Rajkumar Buyya

Abstract: Fog and Edge computing extend cloud services to the proximity of end users, allowing many Internet of Things (IoT) use cases, particularly latency-critical applications. Smart devices, such as traffic and surveillance cameras, often do not have sufficient resources to process computation-intensive and latency-critical services. Hence, the constituent parts of services can be offloaded to nearby Edge/Fog resources for processing and storage. However, making offloading decisions for complex services in highly stochastic and dynamic environments is an important, yet difficult task. Recently, Deep Reinforcement Learning (DRL) has been used in many complex service offloading problems; however, existing techniques are most suitable for centralized environments, and their convergence to the best-suitable solutions is slow. In addition, constituent parts of services often have predefined data dependencies and quality of service constraints, which further intensify the complexity of service offloading. To solve these issues, we propose a distributed DRL technique following the actor-critic architecture based on Asynchronous Proximal Policy Optimization (APPO) to achieve efficient and diverse distributed experience trajectory generation. Also, we employ PPO clipping and V-trace techniques for off-policy correction for faster convergence to the most suitable service offloading solutions. The results obtained demonstrate that our technique converges quickly, offers high scalability and adaptability, and outperforms its counterparts by improving the execution time of heterogeneous services.

Submitted 13 October, 2023; originally announced October 2023.

arXiv:2309.07407 [pdf, other]

Deep Reinforcement Learning-based Scheduling for Optimizing System Load and Response Time in Edge and Fog Computing Environments

Authors: Zhiyu Wang, Mohammad Goudarzi, Mingming Gong, Rajkumar Buyya

Abstract: Edge/fog computing, as a distributed computing paradigm, satisfies the low-latency requirements of an ever-increasing number of IoT applications and has become the mainstream computing paradigm behind IoT applications. However, because a large number of IoT applications require execution on the edge/fog resources, the servers may be overloaded. Hence, it may disrupt the edge/fog servers and also negatively affect IoT applications' response time. Moreover, many IoT applications are composed of dependent components incurring extra constraints for their execution. Besides, edge/fog computing environments and IoT applications are inherently dynamic and stochastic. Thus, efficient and adaptive scheduling of IoT applications in heterogeneous edge/fog computing environments is of paramount importance. However, limited computational resources on edge/fog servers impose an extra burden for applying optimal but computationally demanding techniques. To overcome these challenges, we propose a Deep Reinforcement Learning-based IoT application Scheduling algorithm, called DRLIS, to adaptively and efficiently optimize the response time of heterogeneous IoT applications and balance the load of the edge/fog servers. We implemented DRLIS as a practical scheduler in the FogBus2 function-as-a-service framework for creating an edge-fog-cloud integrated serverless computing environment. Results obtained from extensive experiments show that DRLIS significantly reduces the execution cost of IoT applications by up to 55%, 37%, and 50% in terms of load balancing, response time, and weighted cost, respectively, compared with metaheuristic algorithms and other reinforcement learning techniques.

Submitted 22 October, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

arXiv:2308.02834 [pdf, other]

FLight: A Lightweight Federated Learning Framework in Edge and Fog Computing

Authors: Wuji Zhu, Mohammad Goudarzi, Rajkumar Buyya

Abstract: The number of Internet of Things (IoT) applications, especially latency-sensitive ones, has increased significantly. So, Cloud computing, as one of the main enablers of the IoT that offers centralized services, cannot solely satisfy the requirements of IoT applications. Edge/Fog computing, as a distributed computing paradigm, processes and stores IoT data at the edge of the network, offering low latency, reduced network traffic, and higher bandwidth. The Edge/Fog resources are often less powerful compared to Cloud, and IoT data is dispersed among many geo-distributed servers. Hence, Federated Learning (FL), which is a machine learning approach that enables multiple distributed servers to collaborate on building models without exchanging the raw data, is well-suited to Edge/Fog computing environments, where data privacy is of paramount importance. Besides, to manage different FL tasks on Edge/Fog computing environments, a lightweight resource management framework is required to manage different incoming FL tasks without incurring significant overhead on the system. Accordingly, in this paper, we propose a lightweight FL framework, called FLight, to be deployed on a diverse range of devices, ranging from resource-limited Edge/Fog devices to powerful Cloud servers. FLight is implemented based on the FogBus2 framework, which is a containerized distributed resource management framework. Moreover, FLight integrates both synchronous and asynchronous models of FL. Besides, we propose a lightweight heuristic-based worker selection algorithm to select a suitable set of available workers to participate in the training step to obtain higher training time efficiency. The obtained results demonstrate the efficiency of FLight.

Submitted 5 August, 2023; originally announced August 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2211.07238

arXiv:2305.08317 [pdf]

By-Software Branch Prediction in Loops

Authors: Maziar Goudarzi, Reza Azimi, Julian Humecki, Faizaan Rehman, Richard Zhang, Chirag Sethi, Tanishq Bomman, Yuqi Yang

Abstract: Load-Dependent Branches (LDB) often do not exhibit regular patterns in their local or global history and thus are inherently hard to predict correctly by conventional branch predictors. We propose a software-to-hardware branch pre-resolution mechanism that allows software to pass branch outcomes to the processor frontend ahead of fetching the branch instruction. A compiler pass identifies the instruction chain leading to the branch (the branch backslice) and generates the pre-execute code that produces the branch outcomes ahead of the frontend observing them. The loop structure helps to unambiguously map the branch outcomes to their corresponding dynamic instances of the branch instruction. Our approach also allows for covering the loop iteration space selectively, with arbitrarily complex patterns. Our method for pre-execution enables important optimizations such as unrolling and vectorization, in order to substantially reduce the pre-execution overhead. Experimental results on select workloads from SPEC CPU 2017 and graph analytics workloads show up to 95% reduction of MPKI (21% on average), up to 39% speedup (7% on average), and up to 3x improvement on IPC (23% on average) compared to a core with TAGE-SC-L-64KB branch predictor.

Submitted 10 June, 2023; v1 submitted 14 May, 2023; originally announced May 2023.

ACM Class: B.1.4.b; B.8; C.0.2; C.1.1.b; C.1.5.a; D.3.4.b; D.4.8.b

arXiv:2303.06896 [pdf]

Quality of Service (QoS)-driven Edge Computing and Smart Hospitals: A Vision, Architectural Elements, and Future Directions

Authors: Rajkumar Buyya, Satish N. Srirama, Redowan Mahmud, Mohammad Goudarzi, Leila Ismail, Vassilis Kostakos

Abstract: The Internet of Things (IoT) paradigm is drastically changing our world by making everyday objects an integral part of the Internet. This transformation is increasingly being adopted in the healthcare sector, where Smart Hospitals are now relying on IoT technologies to track staff, patients, devices, and equipment, both within a hospital and beyond. This paradigm opens the door to new innovations for creating novel types of interactions among objects, services, and people in smarter ways to enhance the quality of patient services and the efficient utilisation of resources. However, the realisation of real-time IoT applications in healthcare and, ultimately, the development of Smart Hospitals are constrained by their current Cloud-based computing environment. Edge computing emerged as a new computing model that harnesses edge-based resources alongside Clouds for real-time IoT applications. It helps to capitalise on the potential economic impact of the IoT paradigm of $11 trillion per year, with a trillion IoT devices deployed by 2025 to sense, manage and monitor the hospital systems in real-time. This vision paper proposes new algorithms and software systems to tackle important challenges in Edge computing-enabled Smart Hospitals, including how to manage and execute diverse real-time IoT applications and how to meet their diverse and strict Quality of Service (QoS) requirements in hospital settings. The vision we outline can help tackle timely challenges that hospitals increasingly face.

Submitted 13 March, 2023; originally announced March 2023.

Comments: 14 pages, 6 figures

ACM Class: C.1.4

arXiv:2301.01758 [pdf, other]

An Ensemble Mobile-Cloud Computing Method for Affordable and Accurate Glucometer Readout

Authors: Navidreza Asadi, Maziar Goudarzi

Abstract: Despite essential efforts towards advanced wireless medical devices for regular monitoring of blood properties, many such devices are not available or not affordable for everyone in many countries. Alternatively, using ordinary devices, patients have to log data manually into a mobile health-monitoring app. This causes several issues: (1) clients reportedly tend to enter unrealistic data; (2) typing values several times a day is bothersome and causes clients to leave the mobile app. Thus, there is a strong need to use now-ubiquitous smartphones, reducing error by capturing images from the screen of medical devices and extracting useful information automatically. Nevertheless, there are a few challenges in its development: (1) data scarcity has led to impractical methods with very low accuracy: to our knowledge, only small datasets are available in this case; (2) accuracy-availability tradeoff: one can execute a less accurate algorithm on a mobile phone to maintain higher availability, or alternatively deploy a more accurate and more compute-intensive algorithm on the cloud, however, at the cost of lower availability in poor/no connectivity situations. We present an ensemble learning algorithm, a mobile-cloud computing service architecture, and a simple compression technique to achieve higher availability and faster response time while providing higher accuracy by integrating cloud- and mobile-side predictions. Additionally, we propose an algorithm to generate synthetic training data which facilitates utilizing deep learning models to improve accuracy. Our proposed method achieves three main objectives: (1) 92.1% and 97.7% accuracy on two different datasets, improving previous methods by 40%, (2) reducing required bandwidth by 45x with 1% drop in accuracy, and (3) providing better availability compared to mobile-only, cloud-only, split computing, and early exit service models.

Submitted 4 January, 2023; originally announced January 2023.

Comments: 12 pages, 12 figures, 8 tables

arXiv:2211.06982 [pdf, ps, other]

FullPack: Full Vector Utilization for Sub-Byte Quantized Inference on General Purpose CPUs

Authors: Hossein Katebi, Navidreza Asadi, Maziar Goudarzi

Abstract: Although prior art has demonstrated negligible accuracy drop in sub-byte quantization -- where weights and/or activations are represented by less than 8 bits -- popular SIMD instructions of CPUs do not natively support these datatypes. While recent methods, such as ULPPACK, are already using sub-byte quantization on general-purpose CPUs with vector units, they leave out several empty bits between the sub-byte values in memory and in vector registers to avoid overflow to the neighbours during the operations. This results in memory footprint and bandwidth-usage inefficiencies and suboptimal performance. In this paper, we present memory layouts for storing, and mechanisms for processing sub-byte (4-, 2-, or 1-bit) models that utilize all the bits in the memory as well as in the vector registers for the actual data. We provide compute kernels for the proposed layout for the GEMV (GEneral Matrix-Vector multiplication) operations between weights and activations of different datatypes (e.g., 8-bit activations and 4-bit weights). For evaluation, we extended the TFLite package and added our methods to it, then ran the models on the cycle-accurate gem5 simulator to compare detailed memory and CPU cycles of each method. We compare against nine other methods that are actively used in production including GEMLOWP, Ruy, XNNPack, and ULPPACK. Furthermore, we explore the effect of different input and output sizes of deep learning layers on the performance of our proposed method. Experimental results show 0.96-2.1x speedup for small sizes and 1.2-6.7x speedup for mid to large sizes. Applying our proposal to a real-world speech recognition model, Mozilla DeepSpeech, we proved that our method achieves 1.56-2.11x end-to-end speedup compared to the state-of-the-art, depending on the bit-width employed.

Submitted 20 November, 2022; v1 submitted 13 November, 2022; originally announced November 2022.

arXiv:2210.08376 [pdf, other]

doi 10.1109/JIOT.2023.3285877

Variant Parallelism: Lightweight Deep Convolutional Models for Distributed Inference on IoT Devices

Authors: Navidreza Asadi, Maziar Goudarzi

Abstract: Two major techniques are commonly used to meet real-time inference limitations when distributing models across resource-constrained IoT devices: (1) model parallelism (MP) and (2) class parallelism (CP). In MP, transmitting bulky intermediate data (orders of magnitude larger than input) between devices imposes huge communication overhead. Although CP solves this problem, it has limitations on the number of sub-models. In addition, both solutions are fault intolerant, an issue when deployed on edge devices. We propose variant parallelism (VP), an ensemble-based deep learning distribution method where different variants of a main model are generated and can be deployed on separate machines. We design a family of lighter models around the original model, and train them simultaneously to improve accuracy over single models. Our experimental results on six common mid-sized object recognition datasets demonstrate that our models can have 5.8-7.1x fewer parameters, 4.3-31x fewer multiply-accumulations (MACs), and 2.5-13.2x less response time on atomic inputs compared to MobileNetV2 while achieving comparable or higher accuracy. Our technique easily generates several variants of the base architecture. Each variant returns only 2k outputs (1 <= k <= #classes/2), representing Top-k classes, instead of tons of floating point values required in MP. Since each variant provides a full-class prediction, our approach maintains higher availability compared with MP and CP in the presence of failure.

Submitted 11 June, 2023; v1 submitted 15 October, 2022; originally announced October 2022.

Comments: 8 pages, 6 figures, 7 tables

arXiv:2204.12580 [pdf, other]

Scheduling IoT Applications in Edge and Fog Computing Environments: A Taxonomy and Future Directions

Authors: Mohammad Goudarzi, Marimuthu Palaniswami, Rajkumar Buyya

Abstract: Fog computing, as a distributed paradigm, offers cloud-like services at the edge of the network with low latency and high-access bandwidth to support a diverse range of IoT application scenarios. To fully utilize the potential of this computing paradigm, scalable, adaptive, and accurate scheduling mechanisms and algorithms are required to efficiently capture the dynamics and requirements of users, IoT applications, environmental properties, and optimization targets. This paper presents a taxonomy of recent literature on scheduling IoT applications in Fog computing. Based on our new classification schemes, current works in the literature are analyzed, research gaps of each category are identified, and respective future directions are described.

Submitted 26 April, 2022; originally announced April 2022.

Comments: ACM Computing Surveys (CSUR): Revised

arXiv:2203.05161 [pdf, other]

Container Orchestration in Edge and Fog Computing Environments for Real-Time IoT Applications

Authors: Zhiyu Wang, Mohammad Goudarzi, Jagannath Aryal, Rajkumar Buyya

Abstract: Resource management is the principal factor to fully utilize the potential of Edge/Fog computing to execute real-time and critical IoT applications. Although some resource management frameworks exist, the majority are not designed based on distributed containerized components. Hence, they are not suitable for highly distributed and heterogeneous computing environments. Containerized resource management frameworks such as FogBus2 enable efficient distribution of the framework's components alongside IoT applications' components. However, the management, deployment, health-check, and scalability of a large number of containers are challenging issues. To orchestrate a multitude of containers, several orchestration tools have been developed. However, many of these orchestration tools are heavyweight and have a high overhead, especially for resource-limited Edge/Fog nodes. Thus, for hybrid computing environments, consisting of heterogeneous Edge/Fog and/or Cloud nodes, lightweight container orchestration tools are required to support both resource-limited nodes at the Edge/Fog and resource-rich nodes at the Cloud. In this paper, we propose a feasible approach to build a hybrid and lightweight cluster based on K3s for the FogBus2 framework, which offers containerized resource management. This work addresses the challenge of creating lightweight computing clusters in hybrid computing environments. It also proposes three design patterns for the deployment of the FogBus2 framework in hybrid environments, including 1) Host Network, 2) Proxy Server, and 3) Environment Variable. The performance evaluation shows that the proposed approach improves the response time of real-time IoT applications by up to 29% with acceptable and low overhead.

Submitted 10 March, 2022; originally announced March 2022.

Comments: 20 pages, 10 figures

arXiv:2110.12415 [pdf, other]

A Distributed Deep Reinforcement Learning Technique for Application Placement in Edge and Fog Computing Environments

Authors: Mohammad Goudarzi, Marimuthu Palaniswami, Rajkumar Buyya

Abstract: Fog/Edge computing is a novel computing paradigm supporting resource-constrained Internet of Things (IoT) devices by the placement of their tasks on the edge and/or cloud servers. Recently, several Deep Reinforcement Learning (DRL)-based placement techniques have been proposed in fog/edge computing environments, which are only suitable for centralized setups. The training of well-performing DRL agents requires manifold training data while obtaining training data is costly. Hence, these centralized DRL-based techniques lack generalizability and quick adaptability, thus failing to efficiently tackle application placement problems. Moreover, many IoT applications are modeled as Directed Acyclic Graphs (DAGs) with diverse topologies. Satisfying dependencies of DAG-based IoT applications incurs additional constraints and increases the complexity of placement problems. To overcome these challenges, we propose an actor-critic-based distributed application placement technique, working based on the IMPortance weighted Actor-Learner Architectures (IMPALA). IMPALA is known for efficient distributed experience trajectory generation that significantly reduces the exploration costs of agents. Besides, it uses an adaptive off-policy correction method for faster convergence to optimal solutions. Our technique uses recurrent layers to capture temporal behaviors of input data and a replay buffer to improve the sample efficiency. The performance results, obtained from simulation and testbed experiments, demonstrate that our technique significantly improves the execution cost of IoT applications by up to 30% compared to its counterparts.

Submitted 24 October, 2021; originally announced October 2021.

Comments: This paper was accepted in IEEE Transactions on Mobile Computing (TMC) on 23 October 2021

arXiv:2109.05636 [pdf, other]

IFogSim2: An Extended iFogSim Simulator for Mobility, Clustering, and Microservice Management in Edge and Fog Computing Environments

Authors: Redowan Mahmud, Samodha Pallewatta, Mohammad Goudarzi, Rajkumar Buyya

Abstract: Internet of Things (IoT) has already proven to be the building block for next-generation Cyber-Physical Systems (CPSs). The considerable amount of data generated by the IoT devices needs latency-sensitive processing, which is not feasible by deploying the respective applications in remote Cloud datacentres. Edge/Fog computing, a promising extension of Cloud at the IoT-proximate network, can meet such requirements for smart CPSs. However, the structural and operational differences of Edge/Fog infrastructure resist employing Cloud-based service regulations directly to these environments. As a result, many research works have been recently conducted, focusing on efficient application and resource management in Edge/Fog computing environments. Scalable Edge/Fog infrastructure is a must to validate these policies, which is also challenging to accommodate in the real world due to high cost and implementation time. Considering simulation as a key to this constraint, various software has been developed that can imitate the physical behaviour of Edge/Fog computing environments. Nevertheless, the existing simulators often fail to support advanced service management features because of their monolithic architecture, lack of actual dataset, and limited scope for a periodic update. To overcome these issues, we have developed multiple simulation models for service migration, dynamic distributed cluster formation, and microservice orchestration for Edge/Fog computing in this work and integrated them with the existing iFogSim simulation toolkit for launching it as iFogSim2. The performance of iFogSim2 and its built-in policies is evaluated using three use case scenarios and compared with the contemporary simulators and benchmark policies under different settings. Results indicate that the proposed solution outperforms others in service management time, network usage, RAM consumption, and simulation time.

Submitted 15 September, 2021; v1 submitted 12 September, 2021; originally announced September 2021.

Comments: The source code of the iFogSim2 simulator is accessible from: https://github.com/Cloudslab/iFogSim

arXiv:2108.02328 [pdf, other]

A Distributed Application Placement and Migration Management Techniques for Edge and Fog Computing Environments

Authors: Mohammad Goudarzi, Marimuthu Palaniswami, Rajkumar Buyya

Abstract: The Fog/Edge computing model allows harnessing of resources in the proximity of the Internet of Things (IoT) devices to support various types of real-time IoT applications. However, due to the mobility of users and a wide range of IoT applications with different requirements, it is a challenging issue to satisfy these applications' requirements. The execution of IoT applications exclusively on one fog/edge server may not always be feasible due to limited resources, while execution of IoT applications on different servers needs further collaboration among servers. Also, considering user mobility, some modules of each IoT application may require migration to other servers for execution, leading to service interruption and extra execution costs. In this article, we propose a new weighted cost model for hierarchical fog computing environments, in terms of the response time of IoT applications and energy consumption of IoT devices, to minimize the cost of running IoT applications and potential migrations. Besides, a distributed clustering technique is proposed to enable the collaborative execution of tasks, emitted from application modules, among servers. Also, we propose an application placement technique to minimize the overall cost of executing IoT applications on multiple servers in a distributed manner. Furthermore, a distributed migration management technique is proposed for the potential migration of applications' modules to other remote servers as the users move along their path. Besides, failure recovery methods are embedded in the clustering, application placement, and migration management techniques to recover from unpredicted failures. The performance results show that our technique significantly improves on its counterparts in terms of placement deployment time, average execution cost of tasks, total number of migrations, total number of interrupted tasks, and cumulative migration cost.

Submitted 4 August, 2021; originally announced August 2021.

Comments: Accepted as keynote paper in: 16th CONFERENCE ON COMPUTER SCIENCE AND INTELLIGENCE SYSTEMS FedCSIS 2021

arXiv:2108.00591 [pdf, other]

Resource Management in Edge and Fog Computing using FogBus2 Framework

Authors: Mohammad Goudarzi, Qifan Deng, Rajkumar Buyya

Abstract: Edge/Fog computing is a novel computing paradigm that provides resource-limited Internet of Things (IoT) devices with scalable computing and storage resources. Compared to cloud computing, edge/fog servers have fewer resources, but they can be accessed with higher bandwidth and less communication latency. Thus, integrating edge/fog and cloud infrastructures can support the execution of diverse latency-sensitive and computation-intensive IoT applications. Although some frameworks attempt to provide such integration, there are still several challenges to be addressed, such as dynamic scheduling of different IoT applications, scalability mechanisms, multi-platform support, and supporting different interaction models. FogBus2, as a new Python-based framework, offers a lightweight and distributed container-based framework to overcome these challenges. In this chapter, we highlight key features of the FogBus2 framework alongside describing its main components. Besides, we provide a step-by-step guideline to set up an integrated computing environment, containing multiple cloud service providers (Hybrid-cloud) and edge devices, which is a prerequisite for any IoT application scenario. To achieve this, a low-overhead communication network among all computing resources is initiated by the provided scripts and configuration files. Next, we provide instructions and corresponding code snippets to install and run the main framework and its integrated applications. Finally, we demonstrate how to implement and integrate several new IoT applications and custom scheduling and scalability policies with the FogBus2 framework.

Submitted 1 August, 2021; originally announced August 2021.

Comments: Software Availability: The source code of the FogBus2 framework and newly implemented IoT applications and scheduling policies are accessible from the CLOUDS Laboratory GitHub webpage: https://github.com/Cloudslab/FogBus2

arXiv:2104.07714 [pdf]

doi 10.22042/isecure.2020.226400.535

Providing a hybrid cryptography algorithm for lightweight authentication protocol in RFID with urban traffic usage case

Authors: V. Chegeni, H. Haj Seyyed javadi, M. R Moazami Goudarzi, A. Rezakhani

Abstract: Today, the Internet of Things (IoT) is one of the emerging technologies that enable the connection and transfer of information through communication networks. The main idea of the IoT is the widespread presence of objects such as mobile devices, sensors, and RFID. With the increase in traffic volume in urban areas, the existing intelligent urban traffic management system based on IoT can be vital. Therefore, this paper focuses on security in urban traffic based on RFID, and RFID tags are the subject of our scheme. We present a mutual authentication protocol that leads to privacy based on hybrid cryptography. Also, an authentication process with RFID tags is proposed that can be read at high speed. The protocol attempts to reduce computational complexity. At the same time, the proposed method can withstand attacks such as spoofing of tag and reader, tag tracking, and replay attacks.

Submitted 15 April, 2021; originally announced April 2021.

Comments: 10 pages, 2 figures

arXiv:2003.13675 [pdf, other]

On Coordination of Smart Grid and Cooperative Cloud Providers

Authors: Monireh Mohebbi Moghaddam, Mohammad Hossein Manshaei, Mehdi Naderi Soorki, Walid Saad, Maziar Goudarzi, Dusit Niyato

Abstract: Cooperative cloud providers in the form of cloud federations can potentially reduce their energy costs by exploiting electricity price fluctuations across different locations. In this environment, on the one hand, the electricity price has a significant influence on the federations formed, and, thus, on the profit earned by the cloud providers, and on the other hand, the cloud cooperation has an inevitable impact on the performance of the smart grid. In this regard, the interaction between independent cloud providers and the smart grid is modeled as a two-stage Stackelberg game interleaved with a coalitional game in this paper. In this game, in the first stage, the smart grid, as the leader, chooses a proper electricity pricing mechanism to maximize its own profit. In the second stage, cloud providers cooperatively manage their workload to minimize their electricity costs. Given the dynamics of cloud providers in the federation formation process, an optimization model based on a constrained Markov decision process (CMDP) has been used by the smart grid to achieve the optimal policy. Numerical results show that the proposed solution yields around 28% and 29% profit improvement on average for the smart grid and the cloud providers, respectively, compared to the noncooperative scheme.

Submitted 30 March, 2020; originally announced March 2020.

arXiv:2003.02820 [pdf]

Workload Scheduling on heterogeneous Mobile Edge Cloud in 5G networks to Minimize SLA Violation

Authors: Mostafa Hadadian Nejad Yousefi, Amirmasoud Ghiassi, Boshra Sadat Hashemi, Maziar Goudarzi

Abstract: Smart devices have become an indispensable part of our lives and gain increasing applicability in almost every area. Latency-aware applications such as Augmented Reality (AR), autonomous driving, and online gaming demand more resources such as network bandwidth and computational capabilities. Since the traditional mobile networks cannot fulfill the required bandwidth and latency, Mobile Edge Cloud (MEC) emerged to provide cloud computing capabilities in the proximity of users on 5G networks. In this paper, we consider a heterogeneous MEC network with numerous mobile users that send their tasks to MEC servers. Each task has a maximum acceptable response time. Non-uniform distribution of users turns some MEC servers into hotspots that cannot accept more tasks. A solution is to relocate the tasks among MEC servers, called Workload Migration. We formulate this problem of task scheduling as a mixed-integer non-linear optimization problem to minimize the number of Service Level Agreement (SLA) violations. Since solving this optimization problem has high computational complexity, we introduce a greedy algorithm called MESA, Migration Enabled Scheduling Algorithm, which reaches a near-optimal solution quickly. Our experiments show that, in terms of SLA violations, MESA is only 8% and 11% away from the optimal choice on average and in the worst case, respectively. Moreover, the migration-enabled solution can reduce SLA violations by about 30% compared to assigning tasks to MEC servers without migration.

Submitted 21 March, 2020; v1 submitted 5 March, 2020; originally announced March 2020.

Comments: 12 pages, 8 figures, 4 tables; contact: hadadian AT ce DOT sharif DOT edu

arXiv:2001.10308 [pdf, other]

doi 10.1186/s40537-023-00771-y

A Scheduling Algorithm to Maximize Storm Throughput in Heterogeneous Cluster

Authors: Hamid Nasiri, Saeed Nasehi, Arman Divband, Maziar Goudarzi

Abstract: In the most popular distributed stream processing frameworks (DSPFs), programs are modeled as a directed acyclic graph. This model allows a DSPF to benefit from the parallelism power of distributed clusters. However, choosing the proper number of vertices for each operator and finding an appropriate mapping between these vertices and processing resources have a determinative effect on overall throughput and resource utilization, while the simplicity of current DSPFs' schedulers leads these frameworks to perform poorly on large-scale clusters. In this paper, we present the design and implementation of a heterogeneity-aware scheduling algorithm that finds the proper number of the vertices of an application graph and maps them to the most suitable cluster node. We scale up the application graph over a given cluster gradually, by increasing the topology input rate and taking new instances from bottlenecked vertices. Our experimental results on Storm Micro-Benchmark show that 1) the prediction model estimates CPU utilization with 92% accuracy; 2) compared to the default scheduler of Storm, our scheduler provides 7% to 44% throughput enhancement; and 3) the proposed method can find the solution within 4% (worst case) of the optimal scheduler, which obtains the best scheduling scenario using an exhaustive search on the problem design space.

Submitted 28 January, 2020; originally announced January 2020.

Journal ref: J Big Data 10, 103 (2023)

arXiv:1709.00411 [pdf, other]

doi 10.1109/ISPDC.2017.26

On Reliability-Aware Server Consolidation in Cloud Datacenters

Authors: Amir Varasteh, Farzad Tashtarian, Maziar Goudarzi

Abstract: In the past few years, datacenter (DC) energy consumption has become an important issue in the technology world. Server consolidation using virtualization and virtual machine (VM) live migration allows cloud DCs to improve resource utilization and hence energy efficiency. In order to save energy, consolidation techniques try to turn off the idle servers, while because of workload fluctuations, these offline servers should be turned on to support the increased resource demands. These repeated on-off cycles could affect the hardware reliability and wear-and-tear of servers and, as a result, increase the maintenance and replacement costs. In this paper, we propose a holistic mathematical model for reliability-aware server consolidation with the objective of minimizing total DC costs including energy and reliability costs. In fact, we try to minimize the number of active PMs and racks in a reliability-aware manner. We formulate the problem as a Mixed Integer Linear Programming (MILP) model, which is NP-complete. Finally, we evaluate the performance of our approach in different scenarios using extensive numerical MATLAB simulations.

Submitted 1 September, 2017; originally announced September 2017.

Comments: International Symposium on Parallel and Distributed Computing (ISPDC), Innsbruck, Austria, 2017

arXiv:1302.2227 [pdf]

Virtual Machine Consolidation for Datacenter Energy Improvement

Authors: Sina Esfandiarpoor, Ali Pahlavan, Maziar Goudarzi

Abstract: Rapid growth and proliferation of cloud computing services around the world has increased the necessity and significance of improving the energy efficiency of cloud implementations. Virtual machines (VMs) comprise the backend of most, if not all, cloud computing services. Several VMs are often consolidated on a physical machine to better utilize its resources. We take into account the cooling and network structure of the datacenter hosting the physical machines when consolidating the VMs so that fewer racks and routers are employed, without compromising the service-level agreements, so that unused routing and cooling equipment can be turned off to reduce energy consumption. Our experimental results on four benchmarks show that our technique improves energy consumption of servers, network equipment, and cooling systems by 2.5%, 18.8%, and 28.2% respectively, resulting in a total of 14.7% improvement on average in the entire datacenter.

Submitted 9 February, 2013; originally announced February 2013.

Comments: This is a draft version. The final version will be published

Showing 1–26 of 26 results for author: Goudarzi, M