-
Benchmark-based Study of CPU/GPU Power-Related Features through JAX and TensorFlow
Authors:
Roblex Nana Tchakoute,
Claude Tadonki,
Petr Dokladal,
Youssef Mesri
Abstract:
Power management has become a crucial focus in modern computing, as energy is increasingly recognized as a critical resource. This has increased the importance of all topics related to energy-aware computing. This paper presents an experimental study of three prevalent power management techniques: power limitation, frequency limitation, and ACPI/P-State governor modes (OS states related to power consumption). Through a benchmark approach with a set of six computing kernels, we investigate the power/performance trade-off with various hardware units and software frameworks (mainly TensorFlow and JAX). Our experimental results show that frequency limitation is the most effective technique to improve the Energy-Delay Product (EDP), which is the product of energy and running time. We also observe that running at the highest frequency compared to a reduced one could lead to a reduction of factor $\frac{1}{10}$ in EDP. Another noticeable fact is that frequency management shows consistent behavior across different CPUs, whereas opposite effects sometimes occur between TensorFlow (TF) and JAX with the same power management settings.
Submitted 6 May, 2025;
originally announced May 2025.
-
Energy Concerns with HPC Systems and Applications
Authors:
Roblex Nana,
Claude Tadonki,
Petr Dokladal,
Youssef Mesri
Abstract:
For various reasons, including those related to climate change, energy has become a critical concern in all relevant activities and technical designs. For the specific case of computer activities, the problem is exacerbated by the emergence and pervasiveness of so-called intelligent devices. From the application side, we point out the special topic of Artificial Intelligence, which clearly needs efficient computing support in order to succeed in its purpose of being a ubiquitous assistant. There are mainly two contexts where energy is one of the top-priority concerns: embedded computing and supercomputing. For the former, power consumption is critical because the amount of energy that is available to the devices is limited. For the latter, the heat dissipated is a serious source of failure and the financial cost related to energy is likely to be a significant part of the maintenance budget. On a single computer, the problem is commonly considered through the electrical power consumption. In this paper, written in the form of a survey, we depict the landscape of energy concerns in computer activities, from both the hardware and the software standpoints.
Submitted 31 August, 2023;
originally announced September 2023.
-
High Performance Optimization at the Door of the Exascale
Authors:
Claude Tadonki
Abstract:
quest for processing speed potential. In fact, we always get a fraction of the technically available computing power (the so-called theoretical peak), and the gap is likely to go hand in hand with the hardware complexity of the target system. Among the key aspects of this complexity, we have: the heterogeneity of the computing units, the memory hierarchy and partitioning, including the non-uniform memory access (NUMA) configuration, and the interconnect for data exchanges among the computing nodes. Scientific investigations and cutting-edge technical activities should ideally scale up with respect to sustained performance. The special case of quantitative approaches for solving (large-scale) problems deserves a special focus. Indeed, most common real-life problems, even when considering the artificial intelligence paradigm, rely on optimization techniques for the main kernels of algorithmic solutions. Mathematical programming and pure combinatorial methods are not easy to implement efficiently on large-scale supercomputers because of irregular control flow, complex memory access patterns, heterogeneous kernels, and numerical issues, to name a few. We describe and examine our thoughts from the standpoint of large-scale supercomputers.
Submitted 22 June, 2021;
originally announced June 2021.
-
Conceptual and Technical Challenges for High Performance Computing
Authors:
Claude Tadonki
Abstract:
High Performance Computing (HPC) aims at providing reasonably fast computing solutions to scientific and real-life problems. The advent of multicore architectures is a notable event in HPC history, because it has brought the underlying parallel programming concept into common consideration. At a larger scale, there is a keen interest in building or hosting frontline supercomputers; the Top500 ranking is a nice illustration of this (implicit) race. Supercomputers, as well as ordinary computers, have fallen in price for years while gaining processing power. Clearly, what commonly springs to mind when it comes to HPC is computer capability. However, when going deeper into the topic, especially on large-scale problems, it appears that processing speed by itself is no longer sufficient. Indeed, the real concern of HPC users is the time-to-output. Thus, we need to study each important aspect in the critical path between inputs and outputs. The first step is clearly the method, which is a conjunction of modelling with specific considerations (hypotheses, simplifications, constraints, to name a few) and a corresponding algorithm, which could be numerical and/or non-numerical. Then comes the topic of programming, which should yield a skillful mapping of the algorithm onto HPC machines. Based on multicore processors, probably enhanced with acceleration units, the current generation of supercomputers is rated to deliver an increasing peak performance, the Exascale era being the current horizon. However, getting a high fraction of the available peak performance is more and more difficult. The design of an efficient code that scales well on a supercomputer is a non-trivial task. The present note will discuss the aforementioned points, interleaved with commented contributions from the literature and our personal views.
Submitted 6 October, 2020;
originally announced October 2020.
-
OpenMP Parallelization of Dynamic Programming and Greedy Algorithms
Authors:
Claude Tadonki
Abstract:
Multicore has emerged as a typical architecture model since its advent and now stands as a standard. The trend is to increase the number of cores and improve the performance of the memory system. Providing an efficient multicore implementation for an important algorithmic kernel is a genuine contribution. From a methodology standpoint, this should be done at the level of the underlying paradigm, if any. In this paper, we study the cases of dynamic programming and greedy algorithms, which are two major algorithmic paradigms. We exclusively consider directive-based loop parallelization through OpenMP and investigate the pre-transformations necessary to reach a regular parallel form. We evaluate our methodology with a selection of well-known combinatorial optimization problems on an Intel Broadwell processor. Key points for scalability are discussed before and after experimental results. Our immediate perspective is to extend our study to the manycore case, with a special focus on NUMA configurations.
Submitted 20 January, 2020;
originally announced January 2020.
-
Basic Parallel and Distributed Computing Curriculum
Authors:
Claude Tadonki
Abstract:
With the advent of multi-core processors and their fast expansion, it is quite clear that parallel computing is now a genuine requirement in Computer Science and Engineering (and related) curricula. In addition to the pervasiveness of parallel computing devices, we should take into account the fact that there is a lot of existing software implemented in sequential mode, which thus needs to be adapted for parallel execution. Therefore, programmers are required to be able to design parallel programs and also to have some skills in moving from a given sequential code to the corresponding parallel code. In this paper, we present a basic educational scenario on how to give a consistent and efficient background in parallel computing to ordinary computer scientists and engineers.
Submitted 12 February, 2018;
originally announced February 2018.
-
HPC Curriculum and Associated Ressources in the Academic Context
Authors:
Claude Tadonki
Abstract:
Hardware support for high-performance computing (HPC) has so far been subject to significant advances. The pervasiveness of HPC systems, mainly made up of parallel computing units, makes it crucial to spread and vivify effective HPC curricula. Besides didactic considerations, it appears very important to implement HPC hardware infrastructures that will serve for practice, and also for scientific and industrial requests. The latter ensures a valuable connection with surrounding cutting-edge research activities in other topics (life sciences, physics, data mining, applied mathematics, finance, quantitative economy, engineering sciences, to name a few), and also with industrial entities and service providers through their requests related to HPC means and expertise. This aspect is very important as it makes an HPC center a social actor, while bringing real-life scenarios into the academic context. The current paper describes the major steps and objectives for a consistent HPC curriculum, with specific analyses of particular contexts; suggests how to technically set up operational HPC infrastructures; and discusses the connection with end-users, all from both effective and prospective standpoints.
Submitted 4 February, 2018;
originally announced February 2018.
-
Automated Code Generation for Lattice Quantum Chromodynamics and beyond
Authors:
Denis Barthou,
Olivier Brand-Foissac,
Romain Dolbeau,
Gilbert Grosdidier,
Christina Eisenbeis,
Michael Kruse,
Olivier Pene,
Konstantin Petrov,
Claude Tadonki
Abstract:
We present here our ongoing work on a Domain Specific Language which aims to simplify Monte-Carlo simulations and measurements in the domain of Lattice Quantum Chromodynamics. The tool-chain, called Qiral, is used to produce high-performance OpenMP C code from LaTeX sources. We discuss conceptual issues and details of implementation and optimization. We also compare the performance of the generated code with that of well-established simulation software.
Submitted 9 January, 2014;
originally announced January 2014.
-
QIRAL: A High Level Language for Lattice QCD Code Generation
Authors:
Denis Barthou,
Gilbert Grosdidier,
Michael Kruse,
Olivier Pène,
Claude Tadonki
Abstract:
Quantum chromodynamics (QCD) is the theory of subnuclear physics, aiming at modeling the strong nuclear force, which is responsible for the interactions of nuclear particles. Lattice QCD (LQCD) is the corresponding discrete formulation, widely used for simulations. The computational demand for LQCD is tremendous. It has played a role in the history of supercomputers, and has also helped define their future. Designing efficient LQCD codes that scale well on large (probably hybrid) supercomputers requires expressing many levels of parallelism, and then exploring different algorithmic solutions. While algorithmic exploration is the key to efficient parallel codes, the process is hampered by the necessary coding effort. We present in this paper a domain-specific language, QIRAL, for a high-level expression of parallel algorithms in LQCD. Parallelism is expressed through the mathematical structure of the sparse matrices defining the problem. We show that from these expressions and from algorithmic and preconditioning formulations, a parallel code can be automatically generated. This separates algorithms and mathematical formulations for LQCD (that belong to the field of physics) from the effective orchestration of parallelism, mainly related to compilation and optimization for parallel architectures.
Submitted 16 August, 2012;
originally announced August 2012.
-
Parallel Chip Firing Game associated with n-cube orientations
Authors:
René Ndoundam,
Maurice Tchuente,
Claude Tadonki
Abstract:
We study the cycles generated by the chip firing game associated with n-cube orientations. We show the existence of the cycles generated by parallel evolutions of even lengths from 2 to $2^n$ on $H_n$ (n >= 1), and of odd lengths different from 3 and ranging from 1 to $2^{n-1}-1$ on $H_n$ (n >= 4).
Submitted 1 July, 2010;
originally announced July 2010.