-
Customizing Pareto Simulated Annealing for Multi-objective Optimization of Control Cabinet Layout
Authors:
Sabri Pllana,
Suejb Memeti,
Joanna Kolodziej
Abstract:
Determining the optimal location of control cabinet components requires the exploration of a large configuration space. For real-world control cabinets, it is impractical to evaluate all possible cabinet configurations. Therefore, we need methods for intelligent exploration of the cabinet configuration space that can find a near-optimal configuration without evaluating all possible configurations. In this paper, we describe an approach for multi-objective optimization of control cabinet layout that is based on Pareto Simulated Annealing. The optimization aims at minimizing the total wire length used for interconnection of components and the heat convection within the cabinet. We simulate heat convection to study the warm air flow within the control cabinet and determine the optimal position of components that generate heat during operation. We evaluate and demonstrate the effectiveness of our approach empirically for various control cabinet sizes and usage scenarios.
Submitted 4 June, 2019;
originally announced June 2019.
-
Performance Modelling of Deep Learning on Intel Many Integrated Core Architectures
Authors:
Andre Viebke,
Sabri Pllana,
Suejb Memeti,
Joanna Kolodziej
Abstract:
Many complex problems, such as natural language processing or visual object detection, are solved using deep learning. However, efficient training of complex deep convolutional neural networks for large data sets is computationally demanding and requires parallel computing resources. In this paper, we present two parameterized performance models for estimating the execution time of training convolutional neural networks on the Intel many integrated core architecture. While the first performance model makes minimal use of measurement techniques for parameter value estimation, in the second model we estimate more parameters based on measurements. We evaluate the prediction accuracy of the performance models in the context of training three different convolutional neural network architectures on the Intel Xeon Phi. The achieved average performance prediction accuracy is about 15% for the first model and 11% for the second model.
Submitted 4 June, 2019;
originally announced June 2019.
-
Using Meta-heuristics and Machine Learning for Software Optimization of Parallel Computing Systems: A Systematic Literature Review
Authors:
Suejb Memeti,
Sabri Pllana,
Alecio Binotto,
Joanna Kolodziej,
Ivona Brandic
Abstract:
While modern parallel computing systems offer high performance, utilizing these powerful computing resources to the highest possible extent demands advanced knowledge of various hardware architectures and parallel programming models. Furthermore, optimized software execution on parallel computing systems demands consideration of many parameters at compile-time and run-time. Determining the optimal set of parameters in a given execution context is a complex task, and to address this issue researchers have proposed different approaches that use heuristic search or machine learning. In this paper, we undertake a systematic literature review to aggregate, analyze, and classify the existing software optimization methods for parallel computing systems. We review approaches that use machine learning or meta-heuristics for software optimization at compile-time and run-time. Additionally, we discuss challenges and future research directions. The results of this study may help to better understand the state-of-the-art techniques that use machine learning and meta-heuristics to deal with the complexity of software optimization for parallel computing systems. Furthermore, the study may aid in understanding the limitations of existing approaches and in identifying areas for improvement.
Submitted 2 May, 2018; v1 submitted 29 January, 2018;
originally announced January 2018.
-
Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: programming productivity, performance, and energy consumption
Authors:
Suejb Memeti,
Lu Li,
Sabri Pllana,
Joanna Kolodziej,
Christoph Kessler
Abstract:
Many modern parallel computing systems are heterogeneous at their node level. Such nodes may comprise general-purpose CPUs and accelerators (such as a GPU or an Intel Xeon Phi) that provide high performance with suitable energy-consumption characteristics. However, exploiting the available performance of heterogeneous architectures may be challenging. There are various parallel programming frameworks (such as OpenMP, OpenCL, OpenACC, and CUDA), and selecting the one that is suitable for a target context is not straightforward.
In this paper, we study empirically the characteristics of OpenMP, OpenACC, OpenCL, and CUDA with respect to programming productivity, performance, and energy. To evaluate the programming productivity, we use our homegrown tool CodeStat, which enables us to determine the percentage of code lines that were required to parallelize the code using a specific framework. We use our tool x-MeterPU to evaluate the energy consumption and the performance. Experiments are conducted using the industry-standard SPEC benchmark suite and the Rodinia benchmark suite for accelerated computing on heterogeneous systems that combine Intel Xeon E5 Processors with a GPU accelerator or an Intel Xeon Phi co-processor.
Submitted 18 April, 2017;
originally announced April 2017.