Showing 1–20 of 20 results for author: Casas, M

  1. arXiv:2507.03522

    cs.AR cs.LG

    A Flexible Instruction Set Architecture for Efficient GEMMs

    Authors: Alexandre de Limas Santana, Adrià Armejach, Francesc Martinez, Erich Focht, Marc Casas

    Abstract: GEneral Matrix Multiplications (GEMMs) are recurrent in high-performance computing and deep learning workloads. Typically, high-end CPUs accelerate GEMM workloads with Single-Instruction Multiple Data (SIMD) or vector Instruction Set Architectures (ISAs). Since these ISAs face significant issues when running GEMM workloads, particularly when dealing with small, tall, or skinny matrices, matrix ISA…

    Submitted 4 July, 2025; originally announced July 2025.

    ACM Class: C.1.0

  2. arXiv:2412.13235

    cs.AI cs.DM

    Logic-Constrained Shortest Paths for Flight Planning

    Authors: Ricardo Euler, Pedro Maristany de las Casas, Ralf Borndörfer

    Abstract: The logic-constrained shortest path problem (LCSPP) combines a one-to-one shortest path problem with satisfiability constraints imposed on the routing graph. This setting arises in flight planning, where air traffic control (ATC) authorities are enforcing a set of traffic flow restrictions (TFRs) on aircraft routes in order to increase safety and throughput. We propose a new branch and bound-based…

    Submitted 11 June, 2025; v1 submitted 17 December, 2024; originally announced December 2024.

  3. arXiv:2406.02579

    cs.MS cs.AI cs.AR cs.LG math.NA

    An Open-Source Framework for Efficient Numerically-Tailored Computations

    Authors: Louis Ledoux, Marc Casas

    Abstract: We present a versatile open-source framework designed to facilitate efficient, numerically-tailored Matrix-Matrix Multiplications (MMMs). The framework offers two primary contributions: first, a fine-tuned, automated pipeline for arithmetic datapath generation, enabling highly customizable systolic MMM kernels; second, seamless integration of the generated kernels into user code, irrespective of t…

    Submitted 29 May, 2024; originally announced June 2024.

    Comments: 6 pages, open-source

    Journal ref: International Conference on Field Programmable Logic and Applications 2023

  4. A Two Level Neural Approach Combining Off-Chip Prediction with Adaptive Prefetch Filtering

    Authors: Alexandre Valentin Jamet, Georgios Vavouliotis, Daniel A. Jiménez, Lluc Alvarez, Marc Casas

    Abstract: To alleviate the performance and energy overheads of contemporary applications with large data footprints, we propose the Two Level Perceptron (TLP) predictor, a neural mechanism that effectively combines predicting whether an access will be off-chip with adaptive prefetch filtering at the first-level data cache (L1D). TLP is composed of two connected microarchitectural perceptron predictors, name…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: To appear in 30th International Symposium on High-Performance Computer Architecture (HPCA), 2024

  5. arXiv:2309.10377

    cs.DS cs.DM

    K-Shortest Simple Paths Using Biobjective Path Search

    Authors: Pedro Maristany de las Casas, Antonio Sedeño-Noda, Ralf Borndörfer, Max Huneshagen

    Abstract: In this paper we introduce a new algorithm for the $k$-Shortest Simple Paths problem with an asymptotic running time matching the state of the art from the literature. It is based on a black-box algorithm due to Roditty and Zwick (2012) that solves at most $2k$ instances of the Second Shortest Simple Path problem without specifying how this is done. We fill this gap us…

    Submitted 19 September, 2023; originally announced September 2023.

    MSC Class: 90C99 ACM Class: G.4; G.2.2

  6. arXiv:2309.07158

    cs.LG cs.AR cs.PF

    Compressed Real Numbers for AI: a case-study using a RISC-V CPU

    Authors: Federico Rossi, Marco Cococcioni, Roger Ferrer Ibàñez, Jesùs Labarta, Filippo Mantovani, Marc Casas, Emanuele Ruffaldi, Sergio Saponara

    Abstract: As recently demonstrated, Deep Neural Networks (DNN), usually trained using single precision IEEE 754 floating point numbers (binary32), can also work using lower precision. Therefore, 16-bit and 8-bit compressed formats have attracted considerable attention. In this paper, we focused on two families of formats that have already achieved interesting results in compressing binary32 numbers in machin…

    Submitted 11 September, 2023; originally announced September 2023.

  7. Labeling Methods for Partially Ordered Paths

    Authors: Ricardo Euler, Pedro Maristany de las Casas

    Abstract: The landscape of applications and subroutines relying on shortest path computations continues to grow steadily. This growth is driven by the undeniable success of shortest path algorithms in theory and practice. It also introduces new challenges as the models and assessing the optimality of paths become more complicated. Hence, multiple recent publications in the field adapt existing labeling meth…

    Submitted 12 August, 2024; v1 submitted 19 July, 2023; originally announced July 2023.

    Journal ref: European Journal of Operational Research, Volume 318, Issue 1, 1 October 2024, Pages 19-30

  8. arXiv:2306.16203

    cs.DM

    New Dynamic Programming Algorithm for the Multiobjective Minimum Spanning Tree Problem

    Authors: Pedro Maristany de las Casas, Antonio Sedeño-Noda, Ralf Borndörfer

    Abstract: The Multiobjective Minimum Spanning Tree (MO-MST) problem is a variant of the Minimum Spanning Tree problem, in which the costs associated with every edge of the input graph are vectors. In this paper, we design a new dynamic programming MO-MST algorithm. Dynamic programming for a MO-MST instance leads to the definition of an instance of the One-to-One Multiobjective Shortest Path (MOSP) problem a…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: 35 pages; 30 pages without appendix. 4 Tables, 13 Figures

    MSC Class: 90C29 ACM Class: G.2.2

  9. arXiv:2305.18328

    cs.AR

    Open-Source GEMM Hardware Kernels Generator: Toward Numerically-Tailored Computations

    Authors: Louis Ledoux, Marc Casas

    Abstract: Many scientific computing problems can be reduced to Matrix-Matrix Multiplications (MMM), making the General Matrix Multiply (GEMM) kernels in the Basic Linear Algebra Subprograms (BLAS) of interest to the high-performance computing community. However, these workloads have a wide range of numerical requirements. Ill-conditioned linear systems require high-precision arithmetic to ensure correct and…

    Submitted 23 May, 2023; originally announced May 2023.

  10. arXiv:2305.06696

    cs.AR cs.DS

    Characterizing the impact of last-level cache replacement policies on big-data workloads

    Authors: Alexandre Valentin Jamet, Lluc Alvarez, Marc Casas

    Abstract: In recent years, graph-processing has become an essential class of workloads with applications in a rapidly growing number of fields. Graph-processing typically uses large input sets, often in multi-gigabyte scale, and data-dependent graph traversal methods exhibiting irregular memory access patterns. Recent work demonstrates that, due to the highly irregular memory access patterns of data-depende…

    Submitted 11 May, 2023; originally announced May 2023.

    Comments: Extended abstract submitted to the 10th BSC Doctoral Symposium

  11. arXiv:2303.02471

    cs.DC

    Optimization of SpGEMM with Risc-V vector instructions

    Authors: Valentin Le Fèvre, Marc Casas

    Abstract: The Sparse GEneral Matrix-Matrix multiplication (SpGEMM) $C = A \times B$ is a fundamental routine extensively used in domains like machine learning or graph analytics. Despite its relevance, the efficient execution of SpGEMM on vector architectures is a relatively unexplored topic. The most recent algorithm to run SpGEMM on these architectures is based on the SParse Accumulator (SPA) approach, an…

    Submitted 2 June, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

  12. arXiv:2211.08272

    cs.LG cs.AI

    Low-Thrust Orbital Transfer using Dynamics-Agnostic Reinforcement Learning

    Authors: Carlos M. Casas, Belen Carro, Antonio Sanchez-Esguevillas

    Abstract: Low-thrust trajectory design and in-flight control remain two of the most challenging topics for new-generation satellite operations. Most of the solutions currently implemented are based on reference trajectories and lead to sub-optimal fuel usage. Other solutions are based on simple guidance laws that need to be updated periodically, increasing the cost of operations. Whereas some optimization s…

    Submitted 6 October, 2022; originally announced November 2022.

  13. arXiv:2202.09288

    cs.DC

    Optimization of the Sparse Multi-Threaded Cholesky Factorization for A64FX

    Authors: Valentin Le Fèvre, Tetsuzo Usui, Marc Casas

    Abstract: Sparse linear algebra routines are fundamental building blocks of a large variety of scientific applications. Direct solvers, which are methods for solving linear systems via the factorization of matrices into products of triangular matrices, are commonly used in many contexts. The Cholesky factorization is the fastest direct method for symmetric positive definite matrices. This paper presents…

    Submitted 18 February, 2022; originally announced February 2022.

  14. arXiv:2110.10978

    cs.DM

    Targeted Multiobjective Dijkstra Algorithm

    Authors: Pedro Maristany de las Casas, Luitgard Kraus, Antonio Sedeño-Noda, Ralf Borndörfer

    Abstract: In this paper, we introduce the Targeted Multiobjective Dijkstra Algorithm (T-MDA), a label setting algorithm for the One-to-One Multiobjective Shortest Path (MOSP) Problem. The T-MDA is based on the recently published Multiobjective Dijkstra Algorithm (MDA) and equips it with A*-like techniques. The resulting speedup is comparable to the speedup that the original A* algorithm achieves for Dijkstr…

    Submitted 17 December, 2021; v1 submitted 21 October, 2021; originally announced October 2021.

    Comments: 20 pages, 58 figures, 10 tables

    MSC Class: 90C29; 90C35; 68W99 ACM Class: G.2.2

  15. arXiv:2009.08698

    cs.NE cs.LG

    Generating Efficient DNN-Ensembles with Evolutionary Computation

    Authors: Marc Ortiz, Florian Scheidegger, Marc Casas, Cristiano Malossi, Eduard Ayguadé

    Abstract: In this work, we leverage ensemble learning as a tool for the creation of faster, smaller, and more accurate deep learning models. We demonstrate that we can jointly optimize for accuracy, inference time, and the number of parameters by combining DNN classifiers. To achieve this, we combine multiple ensemble strategies: bagging, boosting, and an ordered chain of classifiers. To reduce the number o…

    Submitted 3 May, 2021; v1 submitted 18 September, 2020; originally announced September 2020.

    Comments: 8 pages

  16. arXiv:2004.02297

    cs.DC

    Reducing Data Motion to Accelerate the Training of Deep Neural Networks

    Authors: Sicong Zhuang, Cristiano Malossi, Marc Casas

    Abstract: This paper reduces the cost of DNNs training by decreasing the amount of data movement across heterogeneous architectures composed of several GPUs and multicore CPU devices. In particular, this paper proposes an algorithm to dynamically adapt the data representation format of network weights during training. This algorithm drives a compression procedure that reduces data size before sending them o…

    Submitted 5 April, 2020; originally announced April 2020.

  17. Memory Vulnerability: A Case for Delaying Error Reporting

    Authors: Luc Jaulmes, Miquel Moretó, Mateo Valero, Marc Casas

    Abstract: To face future reliability challenges, it is necessary to quantify the risk of error in any part of a computing system. To this end, the Architectural Vulnerability Factor (AVF) has long been used for chips. However, this metric is used for offline characterisation, which is inappropriate for memory. We survey the literature and formalise one of the metrics used, the Memory Vulnerability Factor,…

    Submitted 15 October, 2018; originally announced October 2018.

  18. arXiv:1804.05267

    cs.LG cs.NE stat.ML

    Low-Precision Floating-Point Schemes for Neural Network Training

    Authors: Marc Ortiz, Adrián Cristal, Eduard Ayguadé, Marc Casas

    Abstract: The use of low-precision fixed-point arithmetic along with stochastic rounding has been proposed as a promising alternative to the commonly used 32-bit floating point arithmetic to enhance neural network training in terms of performance and energy efficiency. In the first part of this paper, the behaviour of the 12-bit fixed-point arithmetic when training a convolutional neural network w…

    Submitted 14 April, 2018; originally announced April 2018.

    Comments: 16 pages, 9 figures and 4 tables

    ACM Class: I.2.6; I.5

  19. arXiv:1707.08951

    cs.CV

    Handwritten character recognition using some (anti)-diagonal structural features

    Authors: José Manuel Casas, Nick Inassaridze, Manuel Ladra, Susana Ladra

    Abstract: In this paper, we present a methodology for off-line handwritten character recognition. The proposed methodology relies on a new feature extraction technique based on structural characteristics, histograms and profiles. As a novelty, we propose the extraction of eight new histograms and four profiles from the $32\times 32$ matrices that represent the characters, creating 256-dimension feature vector…

    Submitted 14 February, 2018; v1 submitted 27 July, 2017; originally announced July 2017.

    Comments: Revised version with a number of improvements and updated references, 9 pages

  20. arXiv:1501.02282

    cs.DC

    Adaptive and application dependent runtime guided hardware prefetcher reconfiguration on the IBM POWER7

    Authors: David Prat, Cristobal Ortega, Marc Casas, Miquel Moretó, Mateo Valero

    Abstract: Hardware data prefetcher engines have been extensively used to reduce the impact of memory latency. However, microprocessors' hardware prefetcher engines do not include any automatic hardware control able to dynamically tune their operation. This missing architectural feature causes systems to operate with prefetchers in a fixed configuration, which in many cases harms performance and energy consu…

    Submitted 9 January, 2015; originally announced January 2015.

    Comments: Part of ADAPT Workshop proceedings, 2015 (arXiv:1412.2347)

    Report number: ADAPT/2015/07