Showing 1–2 of 2 results for author: Halawa, M S

Search v0.5.6 released 2020-02-24

arXiv:2312.06546 [pdf]

cs.DC cs.AI

doi 10.3390/s20154111

Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers

Authors: Mohamed S. Halawa, Rebeca P. Díaz-Redondo, Ana Fernández-Vilas

Abstract: Performance analysis is an essential task in High-Performance Computing (HPC) systems and it is applied for different purposes such as anomaly detection, optimal resource allocation, and budget planning. HPC monitoring tasks generate a huge number of Key Performance Indicators (KPIs) to supervise the status of the jobs running in these systems. KPIs give data about CPU usage, memory usage, network (interface) traffic, or other sensors that monitor the hardware. Analyzing this data, it is possible to obtain insightful information about running jobs, such as their characteristics, performance, and failures. The main contribution in this paper is to identify which metric/s (KPIs) is/are the most appropriate to identify/classify different types of jobs according to their behavior in the HPC system. With this aim, we have applied different clustering techniques (partition and hierarchical clustering algorithms) using a real dataset from the Galician Computation Center (CESGA). We have concluded that (i) those metrics (KPIs) related to the Network (interface) traffic monitoring provide the best cohesion and separation to cluster HPC jobs, and (ii) hierarchical clustering algorithms are the most suitable for this task. Our approach was validated using a different real dataset from the same HPC center.

Submitted 11 December, 2023; originally announced December 2023.

Comments: 22 pages, 6 figures, journal

Journal ref: Sensors, 2020, vol. 20, no. 15, p. 4111

arXiv:2312.06534 [pdf]

cs.AI

doi 10.1109/ACCESS.2021.3057427

KPIs-Based Clustering and Visualization of HPC jobs: a Feature Reduction Approach

Authors: Mohamed Soliman Halawa, Rebeca P. Díaz-Redondo, Ana Fernández-Vilas

Abstract: High-Performance Computing (HPC) systems need to be constantly monitored to ensure their stability. The monitoring systems collect a tremendous amount of data about different parameters or Key Performance Indicators (KPIs), such as resource usage, IO waiting time, etc. A proper analysis of this data, usually stored as time series, can provide insight in choosing the right management strategies as well as the early detection of issues. In this paper, we introduce a methodology to cluster HPC jobs according to their KPI indicators. Our approach reduces the inherent high dimensionality of the collected data by applying two techniques to the time series: literature-based and variance-based feature extraction. We also define a procedure to visualize the obtained clusters by combining the two previous approaches and the Principal Component Analysis (PCA). Finally, we have validated our contributions on a real data set to conclude that those KPIs related to CPU usage provide the best cohesion and separation for clustering analysis and the good results of our visualization methodology.

Submitted 11 December, 2023; originally announced December 2023.

Comments: 23 pages, 11 figures

Journal ref: IEEE Access, 2021, vol. 9, pp. 25522-25543
