Search | arXiv e-print repository

Motivating the use and learning of Bash through programming contests (Motivando el uso y aprendizaje de Bash a través de concursos de programación)

Authors: Luis Costero, Jorge Villarrubia, Francisco D. Igual

Abstract: Command line learning and Bash usage are fundamental skills in systems administration, software development, and data science environments. However, their teaching has been neglected in many curricula, despite their relevance in the professional field. To address this gap, we developed an interactive competition that encourages students to improve their Bash skills through practical and competitive challenges. This gamified approach seeks to motivate autonomous learning and reinforce command line proficiency in a dynamic context. The results have been promising: of the 26 participating students, 85% considered the activity useful to improve their knowledge, and 71% expressed the need to delve deeper into Bash for their academic and professional future. These findings suggest that such initiatives may be an effective strategy to foster Bash learning in academic settings.

Submitted 30 May, 2025; originally announced June 2025.

Comments: in Spanish language. XXXV Jornadas de Paralelismo (JP2025). Jornadas Sarteco

arXiv:2503.01035 [pdf, other]

doi 10.1007/s11227-024-06605-9

Balanced segmentation of CNNs for multi-TPU inference

Authors: Jorge Villarrubia, Luis Costero, Francisco D. Igual, Katzalin Olcoz

Abstract: In this paper, we propose different alternatives for convolutional neural network (CNN) segmentation, addressing inference processes on computing architectures composed of multiple Edge TPUs. Specifically, we compare the inference performance for a number of state-of-the-art CNN models, taking as references the inference times on one TPU and a compiler-based pipelined inference implementation as provided by Google's Edge TPU compiler. Starting from a profile-based segmentation strategy, we provide further refinements to balance the workload across multiple TPUs, leveraging their cooperative computing power, reducing work imbalance and alleviating the memory access bottleneck due to the limited amount of on-chip memory per TPU. The observed performance results compared with a single TPU yield superlinear speedups and accelerations up to 2.60x compared with the segmentation offered by the compiler targeting multiple TPUs.

Submitted 2 March, 2025; originally announced March 2025.

Comments: Accepted for publication at The Journal of Supercomputing. The final published version is available in: https://doi.org/10.1007/s11227-024-06605-9

ACM Class: C.1.3; B.8

Journal ref: The Journal of Supercomputing, 2025
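
The workload-balancing idea in the abstract above (splitting a CNN into contiguous segments so that no TPU in the pipeline is overloaded) can be illustrated with a small sketch. This is not the authors' algorithm: it is a generic minimum-bottleneck partition by dynamic programming, and the function name `balanced_segments` and the per-layer costs are hypothetical values chosen for illustration.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def balanced_segments(costs: Sequence[float], k: int) -> Tuple[float, List[Tuple[int, int]]]:
    """Split layers into contiguous segments, minimizing the costliest one.

    costs: hypothetical profiled cost per layer (e.g. milliseconds).
    k:     number of devices; capped at the number of layers.
    Returns (bottleneck cost, list of half-open (start, end) layer ranges).
    """
    n = len(costs)
    if n == 0 or k < 1:
        raise ValueError("need at least one layer and one device")
    k = min(k, n)
    prefix = [0.0] + list(accumulate(costs))  # prefix[i] = cost of layers [0, i)
    inf = float("inf")
    # best[j][i]: smallest bottleneck when the first i layers form j segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for s in range(j - 1, i):  # last segment covers layers [s, i)
                candidate = max(best[j - 1][s], prefix[i] - prefix[s])
                if candidate < best[j][i]:
                    best[j][i] = candidate
                    cut[j][i] = s
    # Walk the recorded cut points backwards to recover the segments.
    bounds: List[Tuple[int, int]] = []
    end = n
    for j in range(k, 0, -1):
        start = cut[j][end]
        bounds.append((start, end))
        end = start
    bounds.reverse()
    return best[k][n], bounds


if __name__ == "__main__":
    # Made-up per-layer costs for a six-layer model on three devices.
    bottleneck, bounds = balanced_segments([4, 2, 3, 7, 1, 5], 3)
    print(bottleneck, bounds)
```

In a pipelined setup the slowest stage bounds throughput, which is why the sketch minimizes the maximum segment cost rather than, say, the variance. A real segmentation would also have to account for on-chip memory limits and transfer costs between devices, which this sketch ignores.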

arXiv:2503.01025 [pdf, other]

doi 10.1109/PDP59025.2023.00020

Improving inference time in multi-TPU systems with profiled model segmentation

Authors: Jorge Villarrubia, Luis Costero, Francisco D. Igual, Katzalin Olcoz

Abstract: In this paper, we systematically evaluate the inference performance of the Edge TPU by Google for neural networks with different characteristics. Specifically, we determine that, given the limited amount of on-chip memory on the Edge TPU, accesses to external (host) memory rapidly become an important performance bottleneck. We demonstrate how multiple devices can be jointly used to alleviate the bottleneck introduced by accessing the host memory. We propose a solution combining model segmentation and pipelining on up to four TPUs, with remarkable performance improvements that range from $6\times$ for neural networks with convolutional layers to $46\times$ for fully connected layers, compared with single-TPU setups.

Submitted 2 March, 2025; originally announced March 2025.

Comments: Accepted for publication at the 2023 Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP). The final published version is available in IEEE Xplore: https://doi.org/10.1109/PDP59025.2023.00020

ACM Class: C.1.3; B.8

Journal ref: 2023 Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)

arXiv:2402.06319 [pdf, other]

doi 10.1109/HPCS.2017.67

Energy efficiency optimization of task-parallel codes on asymmetric architectures

Authors: Luis Costero, Francisco D. Igual, Katzalin Olcoz, Francisco Tirado

Abstract: We present a family of policies that, integrated within a runtime task scheduler (Nanox), pursue the goal of improving the energy efficiency of task-parallel executions with no intervention from the programmer. The proposed policies tackle the problem by modifying the core operating frequency via DVFS mechanisms, or by enabling/disabling the mapping of tasks to specific cores at selected execution points, depending on the internal status of the scheduler. Experimental results on an asymmetric SoC (Exynos 5422) and for a specific operation (Cholesky factorization) reveal gains up to 29% in terms of energy efficiency and considerable reductions in average power.

Submitted 9 February, 2024; originally announced February 2024.

arXiv:2402.04938 [pdf]

doi 10.1016/j.entcom.2016.08.002

An approach to automated videogame beta testing

Authors: Jennifer Hernández-Bécares, Luis Costero, Pedro Pablo Gómez-Martín

Abstract: Videogames developed in the 1970s and 1980s were modest programs created in a couple of months by a single person, who played the roles of designer, artist and programmer. Since then, videogames have evolved to become a multi-million dollar industry. Today, AAA game development involves hundreds of people working together over several years. Management and engineering requirements have changed at the same pace. Although many of the processes have been adapted over time, this is not quite true for quality assurance tasks, which are still done mainly manually by human beta testers due to the specific peculiarities of videogames. This paper presents an approach to automate this beta testing.

Submitted 7 February, 2024; originally announced February 2024.

Journal ref: Entertainment Computing, Elsevier. 18. pp 79 to 92. (2017)

arXiv:2402.04891 [pdf, other]

doi 10.1007/s11227-019-03117-9

Leveraging knowledge-as-a-service (KaaS) for QoS-aware resource management in multi-user video transcoding

Authors: Luis Costero, Francisco D. Igual, Katzalin Olcoz, Francisco Tirado

Abstract: The coexistence of parallel applications in shared computing nodes, each one featuring different Quality of Service (QoS) requirements, poses new challenges to improve resource occupation while keeping acceptable rates in terms of QoS. As more application-specific and system-wide metrics are included as QoS dimensions, or under situations in which resource-usage limits are strict, building and serving the most appropriate set of actions (application control knobs and system resource assignment) to concurrent applications in an automatic and optimal fashion becomes mandatory. In this paper, we propose strategies to build and serve this type of knowledge to concurrent applications by leveraging Reinforcement Learning techniques. Taking multi-user video transcoding as a driving example, our experimental results reveal an excellent adaptation of resource and knob management to heterogeneous QoS requests, and increases in the number of concurrently served users up to 1.24x compared with alternative approaches considering homogeneous QoS requests.

Submitted 7 February, 2024; originally announced February 2024.

Journal ref: Journal of Supercomputing 76, pp. 9388 to 9403 (2020)

arXiv:2402.04090 [pdf]

doi 10.1002/cta.2552

Acceleration and energy consumption optimization in cascading classifiers for face detection on low-cost ARM big.LITTLE asymmetric architectures

Authors: Alberto Corpas, Luis Costero, Guillermo Botella, Francisco D. Igual, Carlos García, Manuel Rodríguez

Abstract: This paper proposes a mechanism to accelerate and optimize the energy consumption of face detection software based on Haar-like cascading classifiers, taking advantage of the features of low-cost Asymmetric Multicore Processors (AMPs) with limited power budget. A modelling and task scheduling/allocation approach is proposed to make efficient use of the existing features on big.LITTLE ARM processors, including: (I) source-code adaptation for parallel computing, which enables code acceleration by applying the OmpSs programming model, a task-based programming model that handles data-dependencies between tasks in a transparent fashion; (II) different OmpSs task allocation policies which take into account the processor asymmetry and can dynamically set processing resources in a more efficient way based on their particular features. The proposed mechanism can be efficiently applied to take advantage of the processing elements existing on low-cost and low-energy multi-core embedded devices executing object detection algorithms based on cascading classifiers. Although these classifiers yield the best results for detection algorithms in the field of computer vision, their high computational requirements prevent them from being used on these devices under real-time requirements. Finally, we compare the energy efficiency of a heterogeneous architecture based on asymmetric multicore processors with a suitable task scheduling, with that of a homogeneous symmetric architecture.

Submitted 6 February, 2024; originally announced February 2024.

Journal ref: International Journal of Circuit Theory and Applications. 2018. 46, pp 1756 to 1776

arXiv:2309.16333 [pdf, other]

CloudProphet: A Machine Learning-Based Performance Prediction for Public Clouds

Authors: Darong Huang, Luis Costero, Ali Pahlevan, Marina Zapater, David Atienza

Abstract: Computing servers have played a key role in developing and processing emerging compute-intensive applications in recent years. Consolidating multiple virtual machines (VMs) inside one server to run various applications introduces severe competition for limited resources among VMs. Many techniques such as VM scheduling and resource provisioning are proposed to maximize the cost-efficiency of the computing servers while alleviating the performance interference between VMs. However, these management techniques require accurate performance prediction of the application running inside the VM, which is challenging to get in the public cloud due to the black-box nature of the VMs. From this perspective, this paper proposes a novel machine learning-based performance prediction approach for applications running in the cloud. To achieve high-accuracy predictions for black-box VMs, the proposed method first identifies the running application inside the virtual machine. It then selects highly-correlated runtime metrics as the input of the machine learning approach to accurately predict the performance level of the cloud application. Experimental results with state-of-the-art cloud benchmarks demonstrate that our proposed method outperforms the existing prediction methods by more than 2x in terms of worst prediction error. In addition, we successfully tackle the challenge in performance prediction for applications with variable workloads by introducing the performance degradation index, which other comparison methods fail to consider. The workflow versatility of the proposed approach has been verified with different modern servers and VM configurations.

Submitted 28 September, 2023; originally announced September 2023.

Comments: 15 pages, 11 figures, submitted to IEEE Transactions on Sustainable Computing

arXiv:1509.02058 [pdf, other]

Revisiting Conventional Task Schedulers to Exploit Asymmetry in ARM big.LITTLE Architectures for Dense Linear Algebra

Authors: Luis Costero, Francisco D. Igual, Katzalin Olcoz, Enrique S. Quintana-Ortí

Abstract: Dealing with asymmetry in the architecture opens a plethora of questions from the perspective of scheduling task-parallel applications, and there exist early attempts to address this problem via ad-hoc strategies embedded into a runtime framework. In this paper we take a different path, which consists in addressing the complexity of the problem at the library level, via a few asymmetry-aware fundamental kernels, hiding the architecture heterogeneity from the task scheduler. For the specific domain of dense linear algebra, we show that this is not only possible but delivers much higher performance than a naive approach based on an asymmetry-oblivious scheduler. Furthermore, this solution also outperforms an ad-hoc asymmetry-aware scheduler furnished with sophisticated scheduling techniques.

Submitted 7 September, 2015; originally announced September 2015.

Showing 1–9 of 9 results for author: Costero, L