-
Designing, Developing, and Validating Network Intelligence for Scaling in Service-Based Architectures based on Deep Reinforcement Learning
Authors:
Paola Soto,
Miguel Camelo,
Danny De Vleeschauwer,
Yorick De Bock,
Nina Slamnik-Kriještorac,
Chia-Yu Chang,
Natalia Gaviria,
Erik Mannens,
Juan F. Botero,
Steven Latré
Abstract:
Automating network processes without human intervention is crucial for the complex Sixth Generation (6G) environment. Thus, 6G networks must advance beyond basic automation, relying on Artificial Intelligence (AI) and Machine Learning (ML) for self-optimizing and autonomous operation. This requires zero-touch management and orchestration, the integration of Network Intelligence (NI) into the network architecture, and the efficient lifecycle management of intelligent functions. Despite its potential, integrating NI poses challenges in model development and application. To tackle those issues, this paper presents a novel methodology to manage the complete lifecycle of Reinforcement Learning (RL) applications in networking, thereby enhancing existing Machine Learning Operations (MLOps) frameworks to accommodate RL-specific tasks. We focus on scaling computing resources in service-based architectures, modeling the problem as a Markov Decision Process (MDP). Two RL algorithms, guided by distinct Reward Functions (RFns), are proposed to autonomously determine the number of service replicas in dynamic environments. Our proposed methodology is anchored on a dual approach: firstly, it evaluates the training performance of these algorithms under varying RFns, and secondly, it validates their performance after being trained to discern the practical applicability in real-world settings. We show that, despite significant progress, the development stage of RL techniques for networking applications, particularly in scaling scenarios, still leaves room for significant improvements. This study underscores the importance of ongoing research and development to enhance the practicality and resilience of RL techniques in real-world networking environments.
Submitted 16 October, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
An Adaptable and Agnostic Flow Scheduling Approach for Data Center Networks
Authors:
Sergio Armando Gutiérrez,
Juan Felipe Botero,
John Willian Branch
Abstract:
Cloud applications have reshaped the model of services and infrastructure of the Internet. Search engines, social networks, content delivery, and retail and e-commerce sites belong to this group of applications. An important element in the architecture of the data centers where these applications run is the communication infrastructure, commonly known as data center networks (DCNs). A critical challenge DCNs have to address is processing the traffic of cloud applications, which, due to its properties, differs fundamentally from the traffic of other Internet applications. To improve the responsiveness and throughput of applications, DCNs should be able to prioritize short flows (a few KB) over long flows (several MB). However, given the variations in time and space that the traffic presents, information about flow sizes is not available in advance to plan the flow scheduling. In this paper, we present a mechanism called Adaptable Workload-Agnostic Flow Scheduling (AWAFS). It is an adaptable approach that can adjust, in an agnostic way, the scheduling configuration of DCN forwarding devices. This agnostic adjustment contributes to reducing the Flow Completion Time (FCT) of short flows, which represent around 85% of the traffic handled by cloud applications. Our simulation-based evaluation results show that AWAFS can reduce the average FCT of short flows by between 16.9% and 45.2% compared to the best existing agnostic non-adaptable solution, without inducing starvation of long flows. Indeed, it can provide improvements as high as 39% for long flows. Additionally, AWAFS can improve the FCT of short flows in scenarios with highly heterogeneous network traffic, with a reduction of up to 5% in the average FCT and 15% in the tail FCT.
Submitted 1 March, 2022;
originally announced March 2022.
-
Watching Smartly from the Bottom: Intrusion Detection revamped through Programmable Networks and Artificial Intelligence
Authors:
Sergio Armando Gutiérrez,
John Willian Branch,
Luciano Paschoal Gaspary,
Juan Felipe Botero
Abstract:
The advent of Programmable Data Planes represents an outstanding evolution and complete revolution of the Software-Defined Networking paradigm. The capacity to define the entire behavior of forwarding devices by controlling the packet parsing procedures and executing custom operations enables offloading functionalities traditionally performed at the control plane. A recent research line has explored the possibility of offloading even part of Artificial Intelligence algorithms, and more specifically Machine Learning ones, to the data plane to increase their accuracy and responsiveness (by having more detailed visibility of the traffic). This introduces a significant opportunity for evolution in the critical field of Intrusion Detection. However, offloading functionalities to the data plane is not a straightforward task. In this paper, we discuss how Programmable Data Planes might complement different stages of an Intrusion Detection System based on Machine Learning. We present two use cases that demonstrate the feasibility of this approach and highlight aspects that must be considered when addressing the challenge of deploying solutions that leverage data-plane functionalities.
Submitted 1 June, 2021;
originally announced June 2021.
-
ORACLE: Collaboration of Data and Control Planes to Detect DDoS Attacks
Authors:
Sebastián Gómez Macías,
Luciano Paschoal Gaspary,
Juan Felipe Botero
Abstract:
The possibility of programming the control and data planes, enabled by the Software-Defined Networking (SDN) paradigm, represents fertile ground on which novel operation and management mechanisms can be fully explored. This work focuses on Distributed Denial of Service (DDoS) attack detection based on machine learning techniques. To carry out the detection, this paper proposes ORACLE: cOllaboRation of dAta and Control pLanEs to detect DDoS attacks, an architecture that promotes the coordination of control and data planes to detect network attacks. As its first contribution, this architecture delegates to the data plane the extraction and processing of traffic information collected per flow. This eases the calculation and classification of the feature set used in attack detection, as the required flow information is already processed when it arrives at the control plane. As its second contribution, this architecture overcomes the limitations on calculating some features that cannot be implemented in a traditional OpenFlow-based environment. In the evaluation of ORACLE, we obtained up to 96% accuracy in the testing phase, using a K-Nearest Neighbor model.
Submitted 22 September, 2020;
originally announced September 2020.