Showing 1–31 of 31 results for author: Lillo, R

Searching in archive stat.
  1. arXiv:2503.23979  [pdf, other]

    stat.ML cs.LG

    The more the merrier: logical and multistage processors in credit scoring

    Authors: Arturo Pérez-Peralta, Sandra Benítez-Peña, Rosa E. Lillo

    Abstract: Machine Learning algorithms are ubiquitous in key decision-making contexts such as organizational justice or healthcare, which has spawned a great demand for fairness in these procedures. In this paper we focus on the application of fair ML in finance, more concretely on the use of fairness techniques on credit scoring. This paper makes two contributions. On the one hand, it addresses the existent…

    Submitted 31 March, 2025; originally announced March 2025.

    Comments: 34 pages, 14 figures

    MSC Class: 68T05; 91D30; 68T37

  2. arXiv:2503.07854  [pdf, other]

    stat.ME stat.AP

    Health Prognostics in Multi-sensor Systems Based on Multivariate Functional Data Analysis

    Authors: Cevahir Yildirim, Alba M. Franco-Pereira, Rosa E. Lillo

    Abstract: Recent developments in big data analysis, machine learning, Industry 4.0, and IoT applications have enabled the monitoring and processing of multi-sensor data collected from systems, allowing for the prediction of the "Remaining Useful Life" (RUL) of system components. Particularly in the aviation industry, Prognostic Health Management (PHM) has become one of the most important practices for ensur…

    Submitted 10 March, 2025; originally announced March 2025.

  3. arXiv:2406.19213  [pdf, other]

    stat.ME stat.AP

    Comparing Lasso and Adaptive Lasso in High-Dimensional Data: A Genetic Survival Analysis in Triple-Negative Breast Cancer

    Authors: Pilar González-Barquero, Rosa E. Lillo, Álvaro Méndez-Civieta

    Abstract: This study aims to evaluate the performance of Cox regression with lasso penalty and adaptive lasso penalty in high-dimensional settings. Variable selection methods are necessary in this context to reduce dimensionality and make the problem feasible. Several weight calculation procedures for adaptive lasso are proposed to determine if they offer an improvement over lasso, as adaptive lasso address…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 39 pages, 2 figures, 8 tables

  4. arXiv:2406.01588  [pdf, other]

    cs.LG cs.AI stat.ML

    nn2poly: An R Package for Converting Neural Networks into Interpretable Polynomials

    Authors: Pablo Morala, Jenny Alexandra Cifuentes, Rosa E. Lillo, Iñaki Ucar

    Abstract: The nn2poly package provides the implementation in R of the NN2Poly method to explain and interpret feed-forward neural networks by means of polynomial representations that predict in an equivalent manner as the original network. Through the obtained polynomial coefficients, the effect and importance of each variable and their interactions on the output can be represented. This capability of captur…

    Submitted 3 June, 2024; originally announced June 2024.

  5. A bivariate two-state Markov modulated Poisson process for failure modelling

    Authors: Yoel G. Yera, Rosa E. Lillo, Bo F. Nielsen, Pepa Ramírez-Cobo, Fabrizio Ruggeri

    Abstract: Motivated by a real failure dataset in a two-dimensional context, this paper presents an extension of the Markov modulated Poisson process (MMPP) to two dimensions. The one-dimensional MMPP has been proposed for the modeling of dependent and non-exponential inter-failure times (in contexts such as queuing, risk or reliability, among others). The novel two-dimensional MMPP allows for dependence between…

    Submitted 26 January, 2024; originally announced January 2024.

    Journal ref: Reliability Engineering and System Safety 208(2021) 107318

  6. Fitting procedure for the two-state Batch Markov modulated Poisson process

    Authors: Yoel G. Yera, Rosa E. Lillo, Pepa Ramírez-Cobo

    Abstract: The Batch Markov Modulated Poisson Process (BMMPP) is a subclass of the versatile Batch Markovian Arrival process (BMAP) which has been proposed for the modeling of dependent events occurring in batches (such as group arrivals, failures or risk events). This paper focuses on exploring the possibilities of the BMMPP for the modeling of real phenomena involving point processes with group arrivals. The fi…

    Submitted 25 January, 2024; originally announced January 2024.

    Journal ref: European Journal of Operational Research (2019)

  7. arXiv:2401.14553  [pdf, ps, other]

    q-fin.RM stat.AP

    Analysis of an aggregate loss model in a Markov renewal regime

    Authors: Pepa Ramírez-Cobo, Emilio Carrizosa, Rosa Elvira Lillo

    Abstract: In this article we consider an aggregate loss model with dependent losses. The losses occurrence process is governed by a two-state Markovian arrival process (MAP2), a Markov renewal process that allows for (1) correlated inter-losses times, (2) non-exponentially distributed inter-losses times, and (3) overdispersed losses counts. Some quantities of interest to measure persistence in the lo…

    Submitted 4 February, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Journal ref: Applied Mathematics and Computation (2021)

  8. Clustering multivariate functional data using the epigraph and hypograph indices: a case study on Madrid air quality

    Authors: Belén Pulido, Alba M. Franco-Pereira, Rosa E. Lillo

    Abstract: With the rapid growth of data generation, advancements in functional data analysis (FDA) have become essential, especially for approaches that handle multiple variables at the same time. This paper introduces a novel formulation of the epigraph and hypograph indices, along with their generalized expressions, specifically designed for multivariate functional data (MFD). These new definitions accoun…

    Submitted 19 November, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

  9. arXiv:2207.12803  [pdf, other]

    stat.ME stat.AP stat.CO

    Multivariate Functional Outlier Detection using the FastMUOD Indices

    Authors: Oluwasegun Taiwo Ojo, Antonio Fernández Anta, Marc G. Genton, Rosa E. Lillo

    Abstract: We present definitions and properties of the fast massive unsupervised outlier detection (FastMUOD) indices, used for outlier detection (OD) in functional data. FastMUOD detects outliers by computing, for each curve, an amplitude, magnitude and shape index meant to target the corresponding types of outliers. Some methods adapting FastMUOD to outlier detection in multivariate functional data are th…

    Submitted 26 July, 2022; originally announced July 2022.

  10. NN2Poly: A polynomial representation for deep feed-forward artificial neural networks

    Authors: Pablo Morala, Jenny Alexandra Cifuentes, Rosa E. Lillo, Iñaki Ucar

    Abstract: Interpretability of neural networks and their underlying theoretical behavior remain an open field of study even after the great success of their practical applications, particularly with the emergence of deep learning. In this work, NN2Poly is proposed: a theoretical approach to obtain an explicit polynomial model that provides an accurate representation of an already trained fully-connected feed…

    Submitted 25 September, 2023; v1 submitted 21 December, 2021; originally announced December 2021.

    Journal ref: IEEE Transactions on Neural Networks and Learning Systems (2023, Early Access)

  11. arXiv:2111.00472  [pdf, other]

    stat.CO

    Asgl: A Python Package for Penalized Linear and Quantile Regression

    Authors: Álvaro Méndez Civieta, M. Carmen Aguilera-Morillo, Rosa E. Lillo

    Abstract: Asgl is a Python package that solves penalized linear regression and quantile regression models for simultaneous variable selection and prediction, for both high and low dimensional frameworks. It makes it very easy to set up and solve different types of lasso-based penalizations among which the asgl (adaptive sparse group lasso, that gives name to the package) is remarked. This package is built on to…

    Submitted 31 October, 2021; originally announced November 2021.

    Comments: 31 pages, 1 figure, 1 table

  12. arXiv:2110.07998  [pdf, other]

    stat.ME stat.CO

    Fast Partial Quantile Regression

    Authors: Alvaro Mendez Civieta, M. Carmen Aguilera-Morillo, Rosa E. Lillo

    Abstract: Partial least squares (PLS) is a dimensionality reduction technique used as an alternative to ordinary least squares (OLS) in situations where the data is colinear or high dimensional. Both PLS and OLS provide mean based estimates, which are extremely sensitive to the presence of outliers or heavy tailed distributions. In contrast, quantile regression is an alternative to OLS that computes robust…

    Submitted 15 October, 2021; originally announced October 2021.

    Comments: 22 pages, 5 figures and 5 tables

    MSC Class: 62-08; 62Hxx; 62Jxx

    ACM Class: G.3

  13. arXiv:2108.03284  [pdf, other]

    physics.soc-ph cs.DC stat.CO

    Estimating Active Cases of COVID-19

    Authors: Javier Álvarez, Carlos Baquero, Elisa Cabana, Jaya Prakash Champati, Antonio Fernández Anta, Davide Frey, Augusto García-Agúndez, Chryssis Georgiou, Mathieu Goessens, Harold Hernández, Rosa Lillo, Raquel Menezes, Raúl Moreno, Nicolas Nicolaou, Oluwasegun Ojo, Antonio Ortega, Jesús Rufino, Efstathios Stavrakis, Govind Jeevan, Christin Glorioso

    Abstract: Having accurate and timely data on confirmed active COVID-19 cases is challenging, since it depends on testing capacity and the availability of an appropriate infrastructure to perform tests and aggregate their results. In this paper, we propose methods to estimate the number of active cases of COVID-19 from the official data (of confirmed cases and fatalities) and from survey data. We show that t…

    Submitted 6 August, 2021; originally announced August 2021.

    Comments: Presented at the 2nd KDD Workshop on Data-driven Humanitarian Mapping: Harnessing Human-Machine Intelligence for High-Stake Public Policy and Resiliency Planning, August 15, 2021

  14. Functional clustering via multivariate clustering

    Authors: Belén Pulido, Alba María Franco-Pereira, Rosa Elvira Lillo

    Abstract: Clustering techniques applied to multivariate data are a very useful tool in Statistics and have been fully studied in the literature. Nevertheless, these clustering methodologies are less well known when dealing with functional data. Our proposal consists of introducing a clustering procedure for functional data using the very well known techniques for clustering multivariate data. The idea is to…

    Submitted 31 July, 2021; originally announced August 2021.

  15. arXiv:2105.05213  [pdf, other]

    stat.CO stat.ME

    Outlier Detection for Functional Data with R Package fdaoutlier

    Authors: Oluwasegun Ojo, Rosa E. Lillo, Antonio Fernández Anta

    Abstract: Outlier detection is one of the standard exploratory analysis tasks in functional data analysis. We present the R package fdaoutlier which contains implementations of some of the latest techniques for detecting functional outliers. The package makes it easy to detect different types of outliers (magnitude, shape, and amplitude) in functional data, and some of the implemented methods can be applied…

    Submitted 14 October, 2021; v1 submitted 11 May, 2021; originally announced May 2021.

  16. Towards a mathematical framework to inform Neural Network modelling via Polynomial Regression

    Authors: Pablo Morala, Jenny Alexandra Cifuentes, Rosa E. Lillo, Iñaki Ucar

    Abstract: Even when neural networks are widely used in a large number of applications, they are still considered as black boxes and present some difficulties for dimensioning or evaluating their prediction error. This has led to an increasing interest in the overlapping area between neural networks and more traditional statistical methods, which can help overcome those problems. In this article, a mathemati…

    Submitted 7 February, 2021; originally announced February 2021.

    Comments: 39 pages, 15 figures

    Journal ref: Neural Networks 142 (2021), 57-72

  17. arXiv:2005.12783  [pdf, other]

    cs.DC cs.CY stat.AP

    CoronaSurveys: Using Surveys with Indirect Reporting to Estimate the Incidence and Evolution of Epidemics

    Authors: Oluwasegun Ojo, Augusto García-Agundez, Benjamin Girault, Harold Hernández, Elisa Cabana, Amanda García-García, Payman Arabshahi, Carlos Baquero, Paolo Casari, Ednaldo José Ferreira, Davide Frey, Chryssis Georgiou, Mathieu Goessens, Anna Ishchenko, Ernesto Jiménez, Oleksiy Kebkal, Rosa Lillo, Raquel Menezes, Nicolas Nicolaou, Antonio Ortega, Paul Patras, Julian C Roberts, Efstathios Stavrakis, Yuichi Tanaka, Antonio Fernández Anta

    Abstract: The world is suffering from a pandemic called COVID-19, caused by the SARS-CoV-2 virus. National governments have problems evaluating the reach of the epidemic, due to having limited resources and tests at their disposal. This problem is especially acute in low and middle-income countries (LMICs). Hence, any simple, cheap and flexible means of evaluating the incidence and evolution of the epidemic…

    Submitted 26 June, 2020; v1 submitted 24 May, 2020; originally announced May 2020.

    Comments: Presented at The KDD Workshop on Humanitarian Mapping, San Diego, California USA, August 24, 2020

  18. Detecting and Classifying Outliers in Big Functional Data

    Authors: Oluwasegun Taiwo Ojo, Antonio Fernández Anta, Rosa E. Lillo, Carlo Sguera

    Abstract: We propose two new outlier detection methods, for identifying and classifying different types of outliers in (big) functional data sets. The proposed methods are based on an existing method called Massive Unsupervised Outlier Detection (MUOD). MUOD detects and classifies outliers by computing for each curve, three indices, all based on the concept of linear regression and correlation, which measur…

    Submitted 14 October, 2021; v1 submitted 16 December, 2019; originally announced December 2019.

    MSC Class: 62R10 (Functional data analysis)

  19. arXiv:1911.01081  [pdf, other]

    stat.ME stat.AP

    Quantile regression: a penalization approach

    Authors: Álvaro Méndez Civieta, M. Carmen Aguilera-Morillo, Rosa E. Lillo

    Abstract: Sparse group LASSO (SGL) is a penalization technique used in regression problems where the covariates have a natural grouped structure and provides solutions that are both between and within group sparse. In this paper the SGL is introduced to the quantile regression (QR) framework, and a more flexible version, the adaptive sparse group LASSO (ASGL), is proposed. This proposal adds weights to the…

    Submitted 4 November, 2019; originally announced November 2019.

    Comments: 9 figures, 5 tables

  20. Robust regression based on shrinkage estimators

    Authors: Elisa Cabana, Rosa E. Lillo, Henry Laniado

    Abstract: A robust estimator is proposed for the parameters that characterize the linear regression problem. It is based on the notion of shrinkages, often used in Finance and previously studied for outlier detection in multivariate data. A thorough simulation study is conducted to investigate: the efficiency with normal and heavy-tailed errors, the robustness under contamination, the computational times, t…

    Submitted 8 May, 2019; originally announced May 2019.

  21. Multivariate outlier detection based on a robust Mahalanobis distance with shrinkage estimators

    Authors: Elisa Cabana, Rosa E. Lillo, Henry Laniado

    Abstract: A collection of robust Mahalanobis distances for multivariate outlier detection is proposed, based on the notion of shrinkage. Robust intensity and scaling factors are optimally estimated to define the shrinkage. Some properties are investigated, such as affine equivariance and breakdown value. The performance of the proposal is illustrated through the comparison to other techniques from the liter…

    Submitted 4 April, 2019; originally announced April 2019.

    Journal ref: Stat Papers (2019)

  22. arXiv:1610.08386  [pdf, other]

    stat.AP math.ST

    On the estimation of extreme directional multivariate quantiles

    Authors: Raúl Torres, Elena Di Bernardino, Henry Laniado, Rosa E. Lillo

    Abstract: In multivariate extreme value theory (MEVT), the focus is on analysis outside of the observable sampling zone, which implies that the region of interest is associated to high risk levels. This work provides tools to include directional notions into the MEVT, giving the opportunity to characterize the recently introduced directional multivariate quantiles (DMQ) at high levels. Then, an out-sample e…

    Submitted 4 December, 2018; v1 submitted 26 October, 2016; originally announced October 2016.

  23. arXiv:1607.05042  [pdf, ps, other]

    stat.ME

    An empirical comparison of global and local functional depths

    Authors: Carlo Sguera, Rosa E. Lillo

    Abstract: A functional data depth provides a center-outward ordering criterion which allows the definition of measures such as median, trimmed means, central regions or ranks in a functional framework. A functional data depth can be global or local. With global depths, the degree of centrality of a curve $x$ depends equally on the rest of the sample observations, while with local depths, the contribution of…

    Submitted 5 July, 2018; v1 submitted 18 July, 2016; originally announced July 2016.

  24. Directional Multivariate Extremes in Environmental Phenomena

    Authors: Raúl Torres, Carlo De Michele, Henry Laniado, Rosa E. Lillo

    Abstract: Several environmental phenomena can be described by different correlated variables that must be considered jointly in order to be more representative of the nature of these phenomena. For such events, identification of extremes is inappropriate if it is based on marginal analysis. Extremes have usually been linked to the notion of quantile, which is an important tool to analyze risk in the univari…

    Submitted 10 June, 2016; v1 submitted 6 June, 2016; originally announced June 2016.

    Comments: Article with supplementary material in the appendix

    Journal ref: Environmetrics, Volume 28, Issue 2 March 2017 e2428

  25. arXiv:1507.01835  [pdf, ps, other]

    stat.ME

    Homogeneity test for functional data

    Authors: Ramón Flores, Rosa Lillo, Juan Romo

    Abstract: In the context of functional data analysis, we propose new two sample tests for homogeneity. Based on some well-known depth measures, we construct four different statistics in order to measure distance between the two samples. A simulation study is performed to check the efficiency of the tests when confronted with shape and magnitude perturbation. Finally, we apply these tools to measure the homo…

    Submitted 7 July, 2015; originally announced July 2015.

  26. A Directional Multivariate Value at Risk

    Authors: Raúl Torres, Rosa E. Lillo, Henry Laniado

    Abstract: In economics, insurance and finance, value at risk (VaR) is a widely used measure of the risk of loss on a specific portfolio of financial assets. For a given portfolio, time horizon, and probability $α$, the $100α\%$ VaR is defined as a threshold loss value, such that the probability that the loss on the portfolio over the given time horizon exceeds this value is $α$. That is to say, it is a quan…

    Submitted 3 February, 2015; originally announced February 2015.

    Comments: 30 pages, 9 figures

    Journal ref: Insurance: Mathematics and Economics, Volume 65, November 2015, Pages 111-123

  27. Functional outlier detection by a local depth with application to NOx levels

    Authors: Carlo Sguera, Pedro Galeano, Rosa Lillo

    Abstract: This paper proposes methods to detect outliers in functional data sets and the task of identifying atypical curves is carried out using the recently proposed kernelized functional spatial depth (KFSD). KFSD is a local depth that can be used to order the curves of a sample from the most to the least central, and since outliers are usually among the least central curves, we present a probabilistic r…

    Submitted 16 June, 2015; v1 submitted 8 January, 2015; originally announced January 2015.

    Comments: in Stochastic Environmental Research and Risk Assessment, 2015

  28. arXiv:1409.1816  [pdf, ps, other]

    stat.ME

    Extremality measures and a rank test for functional data

    Authors: A. M. Franco-Pereira, R. E. Lillo, J. Romo

    Abstract: The statistical analysis of functional data is a growing need in many research areas. In particular, a robust methodology is important to study curves, which are the output of experiments in applied statistics. In this paper we study some new definitions which reflect the "extremality" of a curve with respect to a collection of functions, and provide natural orderings for sample curves. Their fini…

    Submitted 4 September, 2014; originally announced September 2014.

    Comments: 20 pages, 11 figures

  29. Spatial Depth-Based Classification for Functional Data

    Authors: Carlo Sguera, Pedro Galeano, Rosa Lillo

    Abstract: We enlarge the number of available functional depths by introducing the kernelized functional spatial depth (KFSD). KFSD is a local-oriented and kernel-based version of the recently proposed functional spatial depth (FSD) that may be useful for studying functional samples that require an analysis at a local level. In addition, we consider supervised functional classification problems, focusing on…

    Submitted 20 March, 2014; v1 submitted 13 May, 2013; originally announced May 2013.

    Journal ref: TEST, December 2014, Volume 23, Issue 4, pp 725-750

  30. arXiv:1304.4786  [pdf, other]

    math.ST stat.CO stat.ME stat.ML

    The Mahalanobis distance for functional data with applications to classification

    Authors: Esdras Joseph, Pedro Galeano, Rosa E. Lillo

    Abstract: This paper presents a general notion of Mahalanobis distance for functional data that extends the classical multivariate concept to situations where the observed data are points belonging to curves generated by a stochastic process. More precisely, a new semi-distance for functional observations that generalize the usual Mahalanobis distance for multivariate datasets is introduced. For that, the d…

    Submitted 17 April, 2013; originally announced April 2013.

  31. Bayesian inference for double Pareto lognormal queues

    Authors: Pepa Ramirez-Cobo, Rosa E. Lillo, Simon Wilson, Michael P. Wiper

    Abstract: In this article we describe a method for carrying out Bayesian estimation for the double Pareto lognormal (dPlN) distribution which has been proposed as a model for heavy-tailed phenomena. We apply our approach to estimate the $\mathit{dPlN}/M/1$ and $M/\mathit{dPlN}/1$ queueing systems. These systems cannot be analyzed using standard techniques due to the fact that the dPlN distribution does not…

    Submitted 15 November, 2010; originally announced November 2010.

    Comments: Published at http://dx.doi.org/10.1214/10-AOAS336 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS336

    Journal ref: Annals of Applied Statistics 2010, Vol. 4, No. 3, 1533-1557