-
Resilience-by-Design in 6G Networks: Literature Review and Novel Enabling Concepts
Authors:
Ladan Khaloopour,
Yanpeng Su,
Florian Raskob,
Tobias Meuser,
Roland Bless,
Leon Janzen,
Kamyar Abedi,
Marko Andjelkovic,
Hekma Chaari,
Pousali Chakraborty,
Michael Kreutzer,
Matthias Hollick,
Thorsten Strufe,
Norman Franchi,
Vahid Jamali
Abstract:
The sixth generation (6G) mobile communication networks are expected to intelligently integrate into various aspects of modern digital society, including smart cities, homes, health-care, transportation, and factories. While offering a multitude of services, it is likely that societies become increasingly reliant on 6G infrastructure. Any disruption to these digital services, whether due to human or technical failures, natural disasters, or terrorism, would significantly impact citizens' daily lives. Hence, 6G networks need not only to provide high-performance services but also to be resilient in maintaining essential services in the face of potentially unknown challenges. This paper provides a general review of the state of the art on resilient systems, definitions, concepts, and approaches. Moreover, it introduces a comprehensive concept, i.e., resilience-by-design (RBD), in three different levels for designing resilient 6G communication networks, summarizing our initial studies within the German Open6GHub project. First, we outline the general RBD enabling principles and discuss their related sub-categories. Next, adopting an interdisciplinary approach, we propose to embed these principles across all 6G layers/perspectives including electronics, physical channel, network components and functions, networks, services, and cross-layer and cross-infrastructure considerations and discuss their challenges. We further elaborate the RBD principles and their realizations along with several 6G use-cases. The paper is concluded by presenting a comprehensive list of open problems for future research on 6G resilience.
Submitted 23 September, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
Using Text Embeddings for Deductive Qualitative Research at Scale in Physics Education
Authors:
Tor Ole B. Odden,
Halvor Tyseng,
Jonas Timmann Mjaaland,
Markus Fleten Kreutzer,
Anders Malthe-Sørenssen
Abstract:
We propose a technique for performing deductive qualitative data analysis at scale on text-based data. Using a natural language processing technique known as text embeddings, we create vector-based representations of texts in a high-dimensional meaning space within which it is possible to quantify differences as vector distances. To apply the technique, we build off prior work that used topic modeling via Latent Dirichlet Allocation to thematically analyze 18 years of the Physics Education Research Conference proceedings literature. We first extend this analysis through 2023. Next, we create embeddings of all texts and, using representative articles from the 10 topics found by the LDA analysis, define centroids in the meaning space. We calculate the distances between every article and centroid and use the inverted, scaled distances between these centroids and articles to create an alternate topic model. We benchmark this model against the LDA model results and show that this embeddings model recovers most of the trends from that analysis. Finally, to illustrate the versatility of the method we define 8 new topic centroids derived from a review of the physics education research literature by Docktor and Mestre (2014) and re-analyze the literature using these researcher-defined topics. Based on these analyses, we critically discuss the features, uses, and limitations of this method and argue that it holds promise for flexible deductive qualitative analysis of a wide variety of text-based data that avoids many of the drawbacks inherent to prior NLP methods.
Submitted 28 February, 2024;
originally announced February 2024.
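The centroid-distance step described in the abstract above can be illustrated with a minimal sketch. It assumes Euclidean distance and a simple "invert, then normalize to sum to 1" scaling; the abstract does not state which distance metric or scaling the authors use, so both are assumptions, and the function names (`centroid`, `topic_weights`) and the toy 2-D vectors are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def euclidean(a, b):
    """Euclidean distance (assumed metric; the abstract does not specify one)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def topic_weights(embedding, centroids, eps=1e-12):
    """Invert the distance to each centroid and scale the results to sum to 1."""
    inverted = [1.0 / (euclidean(embedding, c) + eps) for c in centroids]
    total = sum(inverted)
    return [w / total for w in inverted]

# Toy 2-D "embeddings" of representative articles for two topics.
# Real text embeddings have hundreds or thousands of dimensions.
representatives = {
    "topic_a": [[0.0, 0.0], [0.0, 2.0]],
    "topic_b": [[4.0, 0.0], [4.0, 2.0]],
}
centroids = [centroid(vecs) for vecs in representatives.values()]

# An article closer to topic_a's centroid receives a larger topic_a weight.
weights = topic_weights([1.0, 1.0], centroids)
```

Here the article sits at distance 1 from the first centroid and distance 3 from the second, so the first topic receives roughly three times the weight of the second.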
-
MLComp: A Methodology for Machine Learning-based Performance Estimation and Adaptive Selection of Pareto-Optimal Compiler Optimization Sequences
Authors:
Alessio Colucci,
Dávid Juhász,
Martin Mosbeck,
Alberto Marchisio,
Semeen Rehman,
Manfred Kreutzer,
Guenther Nadbath,
Axel Jantsch,
Muhammad Shafique
Abstract:
Embedded systems have proliferated in various consumer and industrial applications with the evolution of Cyber-Physical Systems and the Internet of Things. These systems are subjected to stringent constraints so that embedded software must be optimized for multiple objectives simultaneously, namely reduced energy consumption, execution time, and code size. Compilers offer optimization phases to improve these metrics. However, proper selection and ordering of them depends on multiple factors and typically requires expert knowledge. State-of-the-art optimizers facilitate different platforms and applications case by case, and they are limited by optimizing one metric at a time, as well as requiring a time-consuming adaptation for different targets through dynamic profiling.
To address these problems, we propose the novel MLComp methodology, in which optimization phases are sequenced by a Reinforcement Learning-based policy. Training of the policy is supported by Machine Learning-based analytical models for quick performance estimation, thereby drastically reducing the time spent for dynamic profiling. In our framework, different Machine Learning models are automatically tested to choose the best-fitting one. The trained Performance Estimator model is leveraged to efficiently devise Reinforcement Learning-based multi-objective policies for creating quasi-optimal phase sequences.
Compared to state-of-the-art estimation models, our Performance Estimator model achieves lower relative error (<2%) with up to 50x faster training time over multiple platforms and application domains. Our Phase Selection Policy improves execution time and energy consumption of a given code by up to 12% and 6%, respectively. The Performance Estimator and the Phase Selection Policy can be trained efficiently for any target platform and application domain.
Submitted 11 December, 2020; v1 submitted 9 December, 2020;
originally announced December 2020.
-
Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects
Authors:
Andreas Alvermann,
Achim Basermann,
Hans-Joachim Bungartz,
Christian Carbogno,
Dominik Ernst,
Holger Fehske,
Yasunori Futamura,
Martin Galgon,
Georg Hager,
Sarah Huber,
Thomas Huckle,
Akihiro Ida,
Akira Imakura,
Masatoshi Kawai,
Simone Köcher,
Moritz Kreutzer,
Pavel Kus,
Bruno Lang,
Hermann Lederer,
Valeriy Manin,
Andreas Marek,
Kengo Nakajima,
Lydia Nemec,
Karsten Reuter,
Michael Rippl, et al. (8 additional authors not shown)
Abstract:
We first briefly report on the status and recent achievements of the ELPA-AEO (Eigenvalue Solvers for Petaflop Applications - Algorithmic Extensions and Optimizations) and ESSEX II (Equipping Sparse Solvers for Exascale) projects. In both collaborative efforts, scientists from the application areas, mathematicians, and computer scientists work together to develop and make available efficient, highly parallel methods for the solution of eigenvalue problems. Then we focus on a topic addressed in both projects, the use of mixed precision computations to enhance efficiency. We give a more detailed description of our approaches for benefiting from either lower or higher precision in three selected contexts and of the results thus obtained.
Submitted 4 June, 2018;
originally announced June 2018.
-
Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs
Authors:
Moritz Kreutzer,
Georg Hager,
Dominik Ernst,
Holger Fehske,
Alan R. Bishop,
Gerhard Wellein
Abstract:
Chebyshev filter diagonalization is well established in quantum chemistry and quantum physics to compute bulks of eigenvalues of large sparse matrices. Choosing a block vector implementation, we investigate optimization opportunities on the new class of high-performance compute devices featuring both high-bandwidth and low-bandwidth memory. We focus on the transparent access to the full address space supported by both architectures under consideration: Intel Xeon Phi "Knights Landing" and Nvidia "Pascal."
We propose two optimizations: (1) Subspace blocking is applied for improved performance and data access efficiency. We also show that it allows transparently handling problems much larger than the high-bandwidth memory without significant performance penalties. (2) Pipelining of communication and computation phases of successive subspaces is implemented to hide communication costs without extra memory traffic.
As an application scenario we use filter diagonalization studies on topological insulator materials. Performance numbers on up to 512 nodes of the OakForest-PACS and Piz Daint supercomputers are presented, achieving beyond 100 Tflop/s for computing 100 inner eigenvalues of sparse matrices of dimension one billion.
Submitted 6 March, 2018;
originally announced March 2018.
-
CRAFT: A library for easier application-level Checkpoint/Restart and Automatic Fault Tolerance
Authors:
Faisal Shahzad,
Jonas Thies,
Moritz Kreutzer,
Thomas Zeiser,
Georg Hager,
Gerhard Wellein
Abstract:
In order to efficiently use the future generations of supercomputers, fault tolerance and power consumption are two of the prime challenges anticipated by the High Performance Computing (HPC) community. Checkpoint/Restart (CR) has been and still is the most widely used technique to deal with hard failures. Application-level CR is the most effective CR technique in terms of overhead efficiency but it takes a lot of implementation effort. This work presents the implementation of our C++ based library CRAFT (Checkpoint-Restart and Automatic Fault Tolerance), which serves two purposes. First, it provides an extendable library that significantly eases the implementation of application-level checkpointing. The most basic and frequently used checkpoint data types are already part of CRAFT and can be directly used out of the box. The library can be easily extended to add more data types. As a means of overhead reduction, the library offers a built-in asynchronous checkpointing mechanism and also supports the Scalable Checkpoint/Restart (SCR) library for node-level checkpointing. Second, CRAFT provides an easier interface for User-Level Failure Mitigation (ULFM) based dynamic process recovery, which significantly reduces the complexity and effort of failure detection and communication recovery mechanisms. By utilizing both functionalities together, applications can write application-level checkpoints and recover dynamically from process failures with very limited programming effort. This work presents the design and use of our library in detail. The associated overheads are thoroughly analyzed using several benchmarks.
Submitted 7 August, 2017;
originally announced August 2017.
-
High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations
Authors:
Andreas Pieper,
Moritz Kreutzer,
Andreas Alvermann,
Martin Galgon,
Holger Fehske,
Georg Hager,
Bruno Lang,
Gerhard Wellein
Abstract:
We study Chebyshev filter diagonalization as a tool for the computation of many interior eigenvalues of very large sparse symmetric matrices. In this technique the subspace projection onto the target space of wanted eigenvectors is approximated with filter polynomials obtained from Chebyshev expansions of window functions. After the discussion of the conceptual foundations of Chebyshev filter diagonalization we analyze the impact of the choice of the damping kernel, search space size, and filter polynomial degree on the computational accuracy and effort, before we describe the necessary steps towards a parallel high-performance implementation. Because Chebyshev filter diagonalization avoids the need for matrix inversion it can deal with matrices and problem sizes that are presently not accessible with rational function methods based on direct or iterative linear solvers. To demonstrate the potential of Chebyshev filter diagonalization for large-scale problems of this kind we include as an example the computation of the $10^2$ innermost eigenpairs of a topological insulator matrix with dimension $10^9$ derived from quantum physics applications.
Submitted 10 May, 2016; v1 submitted 16 October, 2015;
originally announced October 2015.
-
GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems
Authors:
Moritz Kreutzer,
Jonas Thies,
Melven Röhrig-Zöllner,
Andreas Pieper,
Faisal Shahzad,
Martin Galgon,
Achim Basermann,
Holger Fehske,
Georg Hager,
Gerhard Wellein
Abstract:
While many of the architectural details of future exascale-class high performance computer systems are still a matter of intense research, there appears to be a general consensus that they will be strongly heterogeneous, featuring "standard" as well as "accelerated" resources. Today, such resources are available as multicore processors, graphics processing units (GPUs), and other accelerators such as the Intel Xeon Phi. Any software infrastructure that claims usefulness for such environments must be able to meet their inherent challenges: massive multi-level parallelism, topology, asynchronicity, and abstraction. The "General, Hybrid, and Optimized Sparse Toolkit" (GHOST) is a collection of building blocks that targets algorithms dealing with sparse matrix representations on current and future large-scale systems. It implements the "MPI+X" paradigm, has a pure C interface, and provides hybrid-parallel numerical kernels, intelligent resource management, and truly heterogeneous parallelism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. We describe the details of its design with respect to the challenges posed by modern heterogeneous supercomputers and recent algorithmic developments. Implementation details which are indispensable for achieving high efficiency are pointed out and their necessity is justified by performance measurements or predictions based on performance models. The library code and several applications are available as open source. We also provide instructions on how to make use of GHOST in existing software packages, together with a case study which demonstrates the applicability and performance of GHOST as a component within a larger software stack.
Submitted 15 February, 2016; v1 submitted 29 July, 2015;
originally announced July 2015.
-
Building a fault tolerant application using the GASPI communication layer
Authors:
Faisal Shahzad,
Moritz Kreutzer,
Thomas Zeiser,
Rui Machado,
Andreas Pieper,
Georg Hager,
Gerhard Wellein
Abstract:
It is commonly agreed that highly parallel software on Exascale computers will suffer from many more runtime failures due to the decreasing trend in the mean time to failures (MTTF). Therefore, it is not surprising that a lot of research is going on in the area of fault tolerance and fault mitigation. Applications should survive a failure and/or be able to recover with minimal cost. MPI is not yet very mature in handling failures; the User-Level Failure Mitigation (ULFM) proposal, currently the most promising approach, is still in its prototype phase. In our work we use GASPI, which is a relatively new communication library based on the PGAS model. It provides the missing features to allow the design of fault-tolerant applications. Instead of introducing algorithm-based fault tolerance in its true sense, we demonstrate how we can build on (existing) clever checkpointing and extend applications to integrate a low-cost fault detection mechanism and, if necessary, recover the application on the fly. The aspects of process management, the restoration of groups, and the recovery mechanism are presented in detail. We use a sparse matrix-vector multiplication based application to perform the analysis of the overhead introduced by such modifications. Our fault detection mechanism causes no overhead in failure-free cases, whereas in case of failure(s), the failure detection and recovery cost is of reasonably acceptable order and shows good scalability.
Submitted 18 May, 2015;
originally announced May 2015.
-
Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems
Authors:
Moritz Kreutzer,
Georg Hager,
Gerhard Wellein,
Andreas Pieper,
Andreas Alvermann,
Holger Fehske
Abstract:
The Kernel Polynomial Method (KPM) is a well-established scheme in quantum physics and quantum chemistry to determine the eigenvalue density and spectral properties of large sparse matrices. In this work we demonstrate the high optimization potential and feasibility of peta-scale heterogeneous CPU-GPU implementations of the KPM. At the node level we show that it is possible to decouple the sparse matrix problem posed by KPM from main memory bandwidth both on CPU and GPU. To alleviate the effects of scattered data access we combine loosely coupled outer iterations with tightly coupled block sparse matrix multiple vector operations, which enables pure data streaming. All optimizations are guided by a performance analysis and modelling process that indicates how the computational bottlenecks change with each optimization step. Finally we use the optimized node-level KPM with a hybrid-parallel framework to perform large scale heterogeneous electronic structure calculations for novel topological materials on a petascale-class Cray XC30 system.
Submitted 29 July, 2015; v1 submitted 20 October, 2014;
originally announced October 2014.
-
Droplets on Inclined Plates: Local and Global Hysteresis of Pinned Capillary Surfaces
Authors:
Michiel Musterd,
Volkert van Steijn,
Chris R. Kleijn,
Michiel T. Kreutzer
Abstract:
Local contact line pinning prevents droplets from rearranging to minimal global energy, and models for droplets without pinning cannot predict their shape. We show that experiments are much better described by a theory, developed herein, that does account for the constrained contact line motion, using droplets on tilted plates as an example. We map out their shapes in suitable phase spaces. For 2D droplets, the critical point of maximum tilt depends on the hysteresis range and Bond number. In 3D, it also depends on the initial width, highlighting the importance of the deposition history.
Submitted 27 July, 2014;
originally announced July 2014.
-
Droplets on a Tilted Plate
Authors:
Michiel Musterd,
Volkert van Steijn,
Chris R. Kleijn,
Michiel T. Kreutzer
Abstract:
In this short paper we present a fluid dynamics video of the deformation of droplets when tilted on a motorized stage. This is a submission to the 2013 Gallery of Fluid Motion which is part of the 66th annual meeting of APS-DFD. The video shows how differently placed droplets on the same surface will show a universal behaviour when tilted back and forth.
Submitted 11 October, 2013;
originally announced October 2013.
-
A unified sparse matrix data format for efficient general sparse matrix-vector multiply on modern processors with wide SIMD units
Authors:
Moritz Kreutzer,
Georg Hager,
Gerhard Wellein,
Holger Fehske,
Alan R. Bishop
Abstract:
Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instruction multiple data (SIMD) units in current multi- and many-core processors should be used most efficiently if there is no structure in the sparsity pattern of the matrix. We suggest SELL-C-sigma, a variant of Sliced ELLPACK, as a SIMD-friendly data format which combines long-standing ideas from General Purpose Graphics Processing Units (GPGPUs) and vector computer programming. We discuss the advantages of SELL-C-sigma compared to established formats like Compressed Row Storage (CRS) and ELLPACK and show its suitability on a variety of hardware platforms (Intel Sandy Bridge, Intel Xeon Phi and Nvidia Tesla K20) for a wide range of test matrices from different application areas. Using appropriate performance models we develop deep insight into the data transfer properties of the SELL-C-sigma spMVM kernel. SELL-C-sigma comes with two tuning parameters whose performance impact across the range of test matrices is studied and for which reasonable choices are proposed. This leads to a hardware-independent ("catch-all") sparse matrix format, which achieves very high efficiency for all test matrices across all hardware platforms.
Submitted 5 March, 2014; v1 submitted 23 July, 2013;
originally announced July 2013.
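To make the two tuning parameters mentioned in the abstract above more concrete, here is a minimal, unoptimized sketch of a SELL-C-sigma-style layout and the matching spMVM loop. It follows the commonly described form of the format: rows are sorted by length inside windows of `sigma` rows, grouped into chunks of `C` rows, padded to the longest row of each chunk, and stored column-major within the chunk. The abstract itself does not spell out these details, so treat this as an assumption-laden illustration rather than the authors' implementation; all function and variable names are hypothetical.

```python
def build_sell_c_sigma(rows, C, sigma):
    """rows: list of rows, each a list of (column, value) pairs.
    Returns (perm, chunk_ptr, chunk_len, cols, vals)."""
    n = len(rows)
    # Sort rows by descending length, but only inside windows of sigma rows.
    perm = []
    for start in range(0, n, sigma):
        window = list(range(start, min(start + sigma, n)))
        window.sort(key=lambda r: len(rows[r]), reverse=True)
        perm.extend(window)
    chunk_ptr, chunk_len, cols, vals = [], [], [], []
    for start in range(0, n, C):
        chunk = perm[start:start + C]
        width = max(len(rows[r]) for r in chunk)  # longest row in this chunk
        chunk_ptr.append(len(vals))
        chunk_len.append(width)
        # Column-major inside the chunk: entry j of all C rows is contiguous,
        # which is what makes the layout SIMD-friendly.
        for j in range(width):
            for lane in range(C):
                if lane < len(chunk) and j < len(rows[chunk[lane]]):
                    c, v = rows[chunk[lane]][j]
                else:
                    c, v = 0, 0.0  # zero padding
                cols.append(c)
                vals.append(v)
    return perm, chunk_ptr, chunk_len, cols, vals

def spmv(fmt, C, x, n):
    """Compute y = A x from the chunked layout, undoing the row permutation."""
    perm, chunk_ptr, chunk_len, cols, vals = fmt
    y = [0.0] * n
    for k, (ptr, width) in enumerate(zip(chunk_ptr, chunk_len)):
        for lane in range(C):
            pos = k * C + lane
            if pos >= n:
                break
            acc = 0.0
            for j in range(width):
                idx = ptr + j * C + lane
                acc += vals[idx] * x[cols[idx]]
            y[perm[pos]] = acc
    return y

# 4x4 example matrix with 7 nonzeros and uneven row lengths.
rows = [
    [(0, 1.0)],
    [(0, 2.0), (1, 3.0), (3, 4.0)],
    [(2, 5.0)],
    [(1, 6.0), (3, 7.0)],
]
x = [1.0, 2.0, 3.0, 4.0]
sorted_fmt = build_sell_c_sigma(rows, C=2, sigma=4)
unsorted_fmt = build_sell_c_sigma(rows, C=2, sigma=1)
y = spmv(sorted_fmt, 2, x, len(rows))
```

With `sigma=1` (no sorting) the example stores 10 entries for 7 nonzeros; with `sigma=4` the long rows share a chunk and only 8 entries are stored, which illustrates why the sorting scope affects padding overhead.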
-
Sparse matrix-vector multiplication on GPGPU clusters: A new storage format and a scalable implementation
Authors:
Moritz Kreutzer,
Georg Hager,
Gerhard Wellein,
Holger Fehske,
Achim Basermann,
Alan R. Bishop
Abstract:
Sparse matrix-vector multiplication (spMVM) is the dominant operation in many sparse solvers. We investigate performance properties of spMVM with matrices of various sparsity patterns on the nVidia "Fermi" class of GPGPUs. A new "padded jagged diagonals storage" (pJDS) format is proposed which may substantially reduce the memory overhead intrinsic to the widespread ELLPACK-R scheme. In our test scenarios the pJDS format cuts the overall spMVM memory footprint on the GPGPU by up to 70%, and achieves 95% to 130% of the ELLPACK-R performance. Using a suitable performance model we identify performance bottlenecks on the node level that invalidate some types of matrix structures for efficient multi-GPGPU parallelization. For appropriate sparsity patterns we extend previous work on distributed-memory parallel spMVM to demonstrate a scalable hybrid MPI-GPGPU code, achieving efficient overlap of communication and computation.
Submitted 29 February, 2012; v1 submitted 23 December, 2011;
originally announced December 2011.