Search | arXiv e-print repository

Performances in solving the Bethe-Salpeter equation with the Yambo code

Authors: Petru Milev, Blanca Mellado-Pinto, Muralidhar Nalabothula, Ali Esquembre Kucukalic, Fernando Alvarruiz, Enrique Ramos, Ludger Wirtz, Jose E. Roman, Davide Sangalli

Abstract: In this work, we analyze the performances of two different strategies in solving the structured eigenvalue problem deriving from the Bethe-Salpeter equation (BSE) in condensed matter physics. The first strategy employs direct diagonalization, while the second is based on an iterative solver. The BSE matrix is constructed with the Yambo code, and the two strategies are implemented by interfacing Ya… ▽ More In this work, we analyze the performances of two different strategies in solving the structured eigenvalue problem deriving from the Bethe-Salpeter equation (BSE) in condensed matter physics. The first strategy employs direct diagonalization, while the second is based on an iterative solver. The BSE matrix is constructed with the Yambo code, and the two strategies are implemented by interfacing Yambo with the ScaLAPACK and ELPA libraries for direct diagonalization, and with the SLEPc library for the iterative approach. We consider both the hermitian (Tamm-Dancoff approximation) and pseudo-hermitian forms, addressing dense matrices of three different sizes. A description of the implementation is also provided, with details for the pseudo-hermitian case. Timing and memory utilization are analyzed on both CPU and GPU clusters. The CPU simulations are performed on a local cluster in Rome, while the GPU simulations are performed on the Leonardo HPC cluster of CINECA. Our results demonstrate that it is now feasible to handle dense BSE matrices of the order 10$^5$. △ Less

Submitted 14 April, 2025; originally announced April 2025.

Comments: Submitted to Euro-Par 2025 conference

MSC Class: 81-04 ACM Class: A.0

arXiv:2210.11327 [pdf, other]

Improving Data Quality with Training Dynamics of Gradient Boosting Decision Trees

Authors: Moacir Antonelli Ponti, Lucas de Angelis Oliveira, Mathias Esteban, Valentina Garcia, Juan Martín Román, Luis Argerich

Abstract: Real world datasets contain incorrectly labeled instances that hamper the performance of the model and, in particular, the ability to generalize out of distribution. Also, each example might have different contribution towards learning. This motivates studies to better understanding of the role of data instances with respect to their contribution in good metrics in models. In this paper we propose… ▽ More Real world datasets contain incorrectly labeled instances that hamper the performance of the model and, in particular, the ability to generalize out of distribution. Also, each example might have different contribution towards learning. This motivates studies to better understanding of the role of data instances with respect to their contribution in good metrics in models. In this paper we propose a method based on metrics computed from training dynamics of Gradient Boosting Decision Trees (GBDTs) to assess the behavior of each training example. We focus on datasets containing mostly tabular or structured data, for which the use of Decision Trees ensembles are still the state-of-the-art in terms of performance. Our methods achieved the best results overall when compared with confident learning, direct heuristics and a robust boosting algorithm. We show results on detecting noisy labels in order clean datasets, improving models' metrics in synthetic and real public datasets, as well as on a industry case in which we deployed a model based on the proposed solution. △ Less

Submitted 22 February, 2024; v1 submitted 20 October, 2022; originally announced October 2022.

arXiv:2206.03768 [pdf, ps, other]

Thick-restarted joint Lanczos bidiagonalization for the GSVD

Authors: Fernando Alvarruiz, Carmen Campos, Jose E. Roman

Abstract: The computation of the partial generalized singular value decomposition (GSVD) of large-scale matrix pairs can be approached by means of iterative methods based on expanding subspaces, particularly Krylov subspaces. We consider the joint Lanczos bidiagonalization method, and analyze the feasibility of adapting the thick restart technique that is being used successfully in the context of other line… ▽ More The computation of the partial generalized singular value decomposition (GSVD) of large-scale matrix pairs can be approached by means of iterative methods based on expanding subspaces, particularly Krylov subspaces. We consider the joint Lanczos bidiagonalization method, and analyze the feasibility of adapting the thick restart technique that is being used successfully in the context of other linear algebra problems. Numerical experiments illustrate the effectiveness of the proposed method. We also compare the new method with an alternative solution via equivalent eigenvalue problems, considering accuracy as well as computational performance. The analysis is done using a parallel implementation in the SLEPc library. △ Less

Submitted 12 May, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

arXiv:2012.00506 [pdf, other]

A Parallel Direct Eigensolver for Sequences of Hermitian Eigenvalue Problems with No Tridiagonalization

Authors: Shengguo Li, Xinzhe Wu, Jose E. Roman, Ziyang Yuan, Ruibo Wang, Lizhi Cheng

Abstract: In this paper, a Parallel Direct Eigensolver for Sequences of Hermitian Eigenvalue Problems with no tridiagonalization is proposed, denoted by \texttt{PDESHEP}, and it combines direct methods with iterative methods. \texttt{PDESHEP} first reduces a Hermitian matrix to its banded form, then applies a spectrum slicing algorithm to the banded matrix, and finally computes the eigenvectors of the origi… ▽ More In this paper, a Parallel Direct Eigensolver for Sequences of Hermitian Eigenvalue Problems with no tridiagonalization is proposed, denoted by \texttt{PDESHEP}, and it combines direct methods with iterative methods. \texttt{PDESHEP} first reduces a Hermitian matrix to its banded form, then applies a spectrum slicing algorithm to the banded matrix, and finally computes the eigenvectors of the original matrix via backtransform. Therefore, compared with conventional direct eigensolvers, \texttt{PDESHEP} avoids tridiagonalization, which consists of many memory-bounded operations. In this work, the iterative method in \texttt{PDESHEP} is based on the contour integral method implemented in FEAST. The combination of direct methods with iterative methods for banded matrices requires some efficient data redistribution algorithms both from 2D to 1D and from 1D to 2D data structures. Hence, some two-step data redistribution algorithms are proposed, which can be $10\times$ faster than ScaLAPACK routine \texttt{PXGEMR2D}. For the symmetric self-consistent field (SCF) eigenvalue problems, \texttt{PDESHEP} can be on average $1.25\times$ faster than the state-of-the-art direct solver in ELPA when using $4096$ processes. Numerical results are obtained for dense Hermitian matrices from real applications and large real sparse matrices from the SuiteSparse collection. △ Less

Submitted 18 March, 2022; v1 submitted 1 December, 2020; originally announced December 2020.

Comments: 19 pages and 9 figures

arXiv:2008.01990 [pdf, other]

doi 10.1109/TPDS.2020.3019471

A parallel structured divide-and-conquer algorithm for symmetric tridiagonal eigenvalue problems

Authors: Xia Liao, Shengguo Li, Yutong Lu, Jose E. Roman

Abstract: In this paper, a parallel structured divide-and-conquer (PSDC) eigensolver is proposed for symmetric tridiagonal matrices based on ScaLAPACK and a parallel structured matrix multiplication algorithm, called PSMMA. Computing the eigenvectors via matrix-matrix multiplications is the most computationally expensive part of the divide-and-conquer algorithm, and one of the matrices involved in such mult… ▽ More In this paper, a parallel structured divide-and-conquer (PSDC) eigensolver is proposed for symmetric tridiagonal matrices based on ScaLAPACK and a parallel structured matrix multiplication algorithm, called PSMMA. Computing the eigenvectors via matrix-matrix multiplications is the most computationally expensive part of the divide-and-conquer algorithm, and one of the matrices involved in such multiplications is a rank-structured Cauchy-like matrix. By exploiting this particular property, PSMMA constructs the local matrices by using generators of Cauchy-like matrices without any communication, and further reduces the computation costs by using a structured low-rank approximation algorithm. Thus, both the communication and computation costs are reduced. Experimental results show that both PSMMA and PSDC are highly scalable and scale to 4096 processes at least. PSDC has better scalability than PHDC that was proposed in [J. Comput. Appl. Math. 344 (2018) 512--520] and only scaled to 300 processes for the same matrices. Comparing with \texttt{PDSTEDC} in ScaLAPACK, PSDC is always faster and achieves $1.4$x--$1.6$x speedup for some matrices with few deflations. PSDC is also comparable with ELPA, with PSDC being faster than ELPA when using few processes and a little slower when using many processes. △ Less

Submitted 28 November, 2020; v1 submitted 5 August, 2020; originally announced August 2020.

Comments: 17 pages, 9 figures

Journal ref: IEEE Transactions on Parallel and Distributed Systems 32 (2021) 367-378

arXiv:1910.11712 [pdf, ps, other]

doi 10.1145/3447544

NEP: a module for the parallel solution of nonlinear eigenvalue problems in SLEPc

Authors: Carmen Campos, Jose E. Roman

Abstract: SLEPc is a parallel library for the solution of various types of large-scale eigenvalue problems. In the last years we have been developing a module within SLEPc, called NEP, that is intended for solving nonlinear eigenvalue problems. These problems can be defined by means of a matrix-valued function that depends nonlinearly on a single scalar parameter. We do not consider the particular case of p… ▽ More SLEPc is a parallel library for the solution of various types of large-scale eigenvalue problems. In the last years we have been developing a module within SLEPc, called NEP, that is intended for solving nonlinear eigenvalue problems. These problems can be defined by means of a matrix-valued function that depends nonlinearly on a single scalar parameter. We do not consider the particular case of polynomial eigenvalue problems (which are implemented in a different module in SLEPc) and focus here on rational eigenvalue problems and other general nonlinear eigenproblems involving square roots or any other nonlinear function. The paper discusses how the NEP module has been designed to fit the needs of applications and provides a description of the available solvers, including some implementation details such as parallelization. Several test problems coming from real applications are used to evaluate the performance and reliability of the solvers. △ Less

Submitted 15 January, 2021; v1 submitted 25 October, 2019; originally announced October 2019.

Journal ref: ACM Trans. Math. Software, 47(3), Article 23, 2021

arXiv:1704.01144 [pdf, other]

doi 10.1016/j.jocs.2017.03.008

Design and Analysis of a Task-based Parallelization over a Runtime System of an Explicit Finite-Volume CFD Code with Adaptive Time Stepping

Authors: Jean Marie Couteyen Carpaye, Jean Roman, Pierre Brenner

Abstract: FLUSEPA (Registered trademark in France No. 134009261) is an advanced simulation tool which performs a large panel of aerodynamic studies. It is the unstructured finite-volume solver developed by Airbus Safran Launchers company to calculate compressible, multidimensional, unsteady, viscous and reactive flows around bodies in relative motion. The time integration in FLUSEPA is done using an explici… ▽ More FLUSEPA (Registered trademark in France No. 134009261) is an advanced simulation tool which performs a large panel of aerodynamic studies. It is the unstructured finite-volume solver developed by Airbus Safran Launchers company to calculate compressible, multidimensional, unsteady, viscous and reactive flows around bodies in relative motion. The time integration in FLUSEPA is done using an explicit temporal adaptive method. The current production version of the code is based on MPI and OpenMP. This implementation leads to important synchronizations that must be reduced. To tackle this problem, we present the study of a task-based parallelization of the aerodynamic solver of FLUSEPA using the runtime system StarPU and combining up to three levels of parallelism. We validate our solution by the simulation (using a finite-volume mesh with 80 million cells) of a take-off blast wave propagation for Ariane 5 launcher. △ Less

Submitted 6 April, 2017; v1 submitted 29 March, 2017; originally announced April 2017.

Comments: Accepted manuscript of a paper in Journal of Computational Science

arXiv:1602.02886 [pdf, ps, other]

Optilization of the gyroaverage operator based on hermite interpolation

Authors: F Rozar, C Steiner, G Latu, M Mehrenberger, V Grandgirard, Julien Bigot, T Cartier-Michaud, Jean Roman

Abstract: Gyrokinetic modeling is appropriate for describing Tokamak plasma turbulence, and the gyroaverage operator is a cornerstone of this approach. In a gyrokinetic code, the gyroaveraging scheme needs to be accurate enough to avoid spoiling the data but also requires a low computation cost because it is applied often on the main unknown, the 5D guiding-center distribution function, and on the 3D electr… ▽ More Gyrokinetic modeling is appropriate for describing Tokamak plasma turbulence, and the gyroaverage operator is a cornerstone of this approach. In a gyrokinetic code, the gyroaveraging scheme needs to be accurate enough to avoid spoiling the data but also requires a low computation cost because it is applied often on the main unknown, the 5D guiding-center distribution function, and on the 3D electric potentials. In the present paper, we improve a gyroaverage scheme based on Hermite interpolation used in the Gysela code. This initial implementation represents a too large fraction of the total execution time. The gyroaverage operator has been reformulated and is now expressed as a matrix-vector product and a cache-friendly algorithm has been setup. Different techniques have been investigated to quicken the computations by more than a factor two. Description of the algorithms is given, together with an analysis of the achieved performance. △ Less

Submitted 9 February, 2016; originally announced February 2016.

arXiv:1206.0042 [pdf, other]

Language Acquisition in Computers

Authors: Megan Belzner, Sean Colin-Ellerin, Jorge H. Roman

Abstract: This project explores the nature of language acquisition in computers, guided by techniques similar to those used in children. While existing natural language processing methods are limited in scope and understanding, our system aims to gain an understanding of language from first principles and hence minimal initial input. The first portion of our system was implemented in Java and is focused on… ▽ More This project explores the nature of language acquisition in computers, guided by techniques similar to those used in children. While existing natural language processing methods are limited in scope and understanding, our system aims to gain an understanding of language from first principles and hence minimal initial input. The first portion of our system was implemented in Java and is focused on understanding the morphology of language using bigrams. We use frequency distributions and differences between them to define and distinguish languages. English and French texts were analyzed to determine a difference threshold of 55 before the texts are considered to be in different languages, and this threshold was verified using Spanish texts. The second portion of our system focuses on gaining an understanding of the syntax of a language using a recursive method. The program uses one of two possible methods to analyze given sentences based on either sentence patterns or surrounding words. Both methods have been implemented in C++. The program is able to understand the structure of simple sentences and learn new words. In addition, we have provided some suggestions regarding future work and potential extensions of the existing program. △ Less

Submitted 31 May, 2012; originally announced June 2012.

Comments: 39 pages, 10 figures and 6 tables

ACM Class: I.2.6; I.2.7

Showing 1–9 of 9 results for author: Roman, J