Search | arXiv e-print repository

Sequential Filtering Techniques for Simultaneous Tracking and Parameter Estimation

Authors: Yannick Sztamfater Garcia, Joaquin Miguez, Manuel Sanjurjo-Rivo

Abstract: The number of resident space objects (RSOs) is rising at an alarming rate. Mega-constellations and breakup events are proliferating in most orbital regimes, and safe navigation is becoming increasingly problematic. It is important to be able to track RSOs accurately and at an affordable computational cost. Orbital dynamics are highly nonlinear, and current operational methods assume Gaussian representations of the objects' states and employ linearizations which cease to hold during observation-free propagation. Monte Carlo-based filters can approximate the a posteriori probability distribution of the states more accurately by providing support in the portion of the state space which overlaps the most with the processed observations. Moreover, dynamical models are not able to capture the full extent of realistic forces experienced in the near-Earth space environment, and hence fully deterministic propagation methods may fail to achieve the desired accuracy. By modeling orbital dynamics as a stochastic system and solving it using stochastic numerical integrators, we are able to simultaneously estimate the scale of the process noise incurred by the assumed uncertainty in the system, and robustly track the state of the spacecraft. In order to find an adequate balance between accuracy and computational cost, we propose three algorithms which are capable of tracking a space object and estimating the magnitude of the system's uncertainty.
The proposed filters are successfully applied to a LEO scenario, demonstrating the ability to accurately track a spacecraft state and estimate the scale of the uncertainty online, in various simulation setups.

Submitted 10 April, 2025; originally announced April 2025.

Comments: 28 pages, 9 figures. Submitted to the Journal of Astronautical Sciences on 26 March, 2025
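The abstract models orbital dynamics as a stochastic system solved with stochastic numerical integrators. As a minimal illustration of that idea only, the sketch below applies the Euler–Maruyama scheme to a hypothetical scalar SDE dx = f(x) dt + sigma dW. The drift `f`, the constant noise scale `sigma`, and the function name are assumptions; the paper's orbital models, integrators, and filters are not reproduced here.

```python
import math
import random


def euler_maruyama(drift, sigma, x0, dt, n_steps, rng=None):
    """Integrate the hypothetical scalar SDE dx = drift(x) dt + sigma dW.

    Uses the Euler-Maruyama scheme with a constant (additive) noise scale
    `sigma`; sigma = 0 reduces it to the explicit Euler method.
    Returns the list of states [x0, x1, ..., x_n_steps].
    """
    if rng is None:
        rng = random.Random(0)
    xs = [x0]
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + drift(x) * dt + sigma * dw
        xs.append(x)
    return xs


# Deterministic check: with sigma = 0 and drift -x, each step multiplies by 0.9.
path = euler_maruyama(lambda x: -x, 0.0, 1.0, 0.1, 10)
```

In a filtering setting, `sigma` plays the role of the process-noise scale that the abstract says is estimated online; here it is simply fixed by the caller.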

arXiv:2403.02432

On the impact of measure pre-conditionings on general parametric ML models and transfer learning via domain adaptation

Authors: Joaquín Sánchez García

Abstract: We study a new technique for understanding convergence of learning agents under small modifications of data. We show that such convergence can be understood via an analogue of Fatou's lemma which yields gamma-convergence. We show its relevance and applications in general machine learning tasks and domain adaptation transfer learning.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2312.17553

A Fully Automated Pipeline Using Swin Transformers for Deep Learning-Based Blood Segmentation on Head CT Scans After Aneurysmal Subarachnoid Hemorrhage

Authors: Sergio Garcia Garcia, Santiago Cepeda, Ignacio Arrese, Rosario Sarabia

Abstract: Background: Accurate volumetric assessment of spontaneous subarachnoid hemorrhage (SAH), which may be relevant for its clinical and prognostic implications, is a labor-intensive task with current manual and semiautomatic methods. In the present research, we sought to develop and validate an artificial intelligence-driven, fully automated blood segmentation tool for SAH patients via noncontrast computed tomography (NCCT) scans employing a transformer-based Swin UNETR architecture. Methods: We retrospectively analyzed NCCT scans from patients with confirmed aneurysmal subarachnoid hemorrhage (aSAH) utilizing the Swin UNETR for segmentation. The performance of the proposed method was evaluated against manually segmented ground truth data using metrics such as Dice score, intersection over union (IoU), the volumetric similarity index (VSI), the symmetric average surface distance (SASD), and sensitivity and specificity. A validation cohort from an external institution was included to test the generalizability of the model. Results: The model demonstrated high accuracy with robust performance metrics across the internal and external validation cohorts. Notably, it achieved high Dice coefficient (0.873), IoU (0.810), VSI (0.840), sensitivity (0.821) and specificity (0.996) values and a low SASD (1.866), suggesting proficiency in segmenting blood in SAH patients. The model's efficiency was reflected in its processing speed, indicating potential for real-time applications.
Conclusions: Our Swin UNETR-based model offers significant advances in the automated segmentation of blood after aSAH on NCCT images. Despite the computational intensity, the model operates effectively on standard hardware with a user-friendly interface, facilitating broader clinical adoption. Further validation across diverse datasets is warranted to confirm its clinical reliability.

Submitted 29 December, 2023; originally announced December 2023.
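Two of the overlap metrics named in the abstract, Dice and IoU, can be computed from binary masks as in the minimal sketch below. Masks are assumed to be flattened 0/1 voxel lists, the function name is hypothetical, and the choice to score two empty masks as 1.0 is an assumption rather than the paper's convention.

```python
def dice_and_iou(pred, truth):
    """Dice coefficient and intersection over union (IoU) for binary masks.

    `pred` and `truth` are equal-length sequences of 0/1 voxel labels.
    Dice = 2|A n B| / (|A| + |B|),  IoU = |A n B| / |A u B|.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    size_p = sum(1 for p in pred if p)
    size_t = sum(1 for t in truth if t)
    union = size_p + size_t - inter
    if union == 0:
        return 1.0, 1.0  # both masks empty: treated as perfect agreement (assumption)
    return 2 * inter / (size_p + size_t), inter / union


dice, iou = dice_and_iou([1, 1, 1, 0, 0], [0, 1, 1, 1, 0])
```

For a single mask pair the two metrics are linked by Dice = 2·IoU / (1 + IoU); this identity need not hold exactly for values averaged over many patients.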

arXiv:2003.02601

Fuzzy k-Nearest Neighbors with monotonicity constraints: Moving towards the robustness of monotonic noise

Authors: Sergio González, Salvador García, Sheng-Tun Li, Robert John, Francisco Herrera

Abstract: This paper proposes a new model based on Fuzzy k-Nearest Neighbors for classification with monotonic constraints, Monotonic Fuzzy k-NN (MonFkNN). Real-life data-sets often do not comply with monotonic constraints due to class noise. MonFkNN incorporates a new calculation of fuzzy memberships, which increases robustness against monotonic noise without the need for relabeling. Our proposal has been designed to be adaptable to the different needs of the problem being tackled. In several experimental studies, we show significant improvements in accuracy while matching the best degree of monotonicity obtained by comparable methods. We also show that MonFkNN empirically achieves improved performance compared with Monotonic k-NN in the presence of large amounts of class noise.

Submitted 5 March, 2020; originally announced March 2020.

Comments: Accepted in Neurocomputing
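The abstract refers to class noise that makes data sets violate monotonic constraints. As a minimal diagnostic sketch, assuming numeric feature vectors and ordinal (comparable) labels, the code below counts inconsistent pairs: an instance dominated componentwise by another that nevertheless carries a higher label. It is not MonFkNN or its fuzzy-membership calculation, and the function names are hypothetical.

```python
from itertools import combinations


def dominates(a, b):
    """True if a <= b in every feature, i.e. a is dominated by b."""
    return all(x <= y for x, y in zip(a, b))


def monotonic_violations(X, y):
    """Count pairs (i, j) that break a monotonic constraint.

    A pair is a violation when x_i <= x_j componentwise but y_i > y_j,
    or the mirrored case. Each unordered pair is counted at most once.
    """
    count = 0
    for i, j in combinations(range(len(X)), 2):
        if dominates(X[i], X[j]) and y[i] > y[j]:
            count += 1
        elif dominates(X[j], X[i]) and y[j] > y[i]:
            count += 1
    return count


X = [(1, 1), (2, 2), (3, 3)]
noisy_labels = [0, 2, 1]  # (2, 2) <= (3, 3) but label 2 > 1: one violation
```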

arXiv:2002.09227

doi 10.1016/j.swevo.2020.100665

Recent Trends in the Use of Statistical Tests for Comparing Swarm and Evolutionary Computing Algorithms: Practical Guidelines and a Critical Review

Authors: J. Carrasco, S. García, M. M. Rueda, S. Das, F. Herrera

Abstract: A key aspect of the design of evolutionary and swarm intelligence algorithms is studying their performance. Statistical comparisons are also a crucial part which allows for reliable conclusions to be drawn. In the present paper we gather and examine the approaches taken from different perspectives to summarise the assumptions made by these statistical tests, the conclusions reached and the steps followed to perform them correctly. In this paper, we conduct a survey on the current trends of the proposals of statistical analyses for the comparison of algorithms of computational intelligence and include a description of the statistical background of these tests. We illustrate the use of the most common tests in the context of the Competition on single-objective real parameter optimisation of the IEEE Congress on Evolutionary Computation (CEC) 2017 and describe the main advantages and drawbacks of the use of each kind of test and put forward some recommendations concerning their use.

Submitted 21 February, 2020; originally announced February 2020.

Comments: 52 pages, 10 figures, 19 tables

Journal ref: SWEVO, Volume 54, May 2020, 100665
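As one concrete instance of the nonparametric comparisons the review discusses, the sketch below computes average ranks and the Friedman chi-square statistic for N problems by k algorithms. It assumes lower scores are better (as with error values in minimization benchmarks) and gives tied scores their average rank; the function names are hypothetical and the code is not taken from the paper.

```python
def average_ranks(row):
    """Rank the k scores of one problem: rank 1 = lowest error; ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        k = i
        while k + 1 < len(order) and row[order[k + 1]] == row[order[i]]:
            k += 1
        avg = (i + k) / 2 + 1  # mean of the 1-based positions i+1 .. k+1
        for m in range(i, k + 1):
            ranks[order[m]] = avg
        i = k + 1
    return ranks


def friedman_statistic(scores):
    """Friedman chi-square for a table of N problems (rows) x k algorithms (columns).

    chi2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4), where R_j is the
    mean rank of algorithm j over the N problems.
    """
    n, k = len(scores), len(scores[0])
    rank_rows = [average_ranks(row) for row in scores]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
    return mean_ranks, chi2


# Four problems on which three algorithms always finish in the same order.
errors = [[0.1, 0.2, 0.3]] * 4
```

The statistic is usually compared against a chi-square distribution with k − 1 degrees of freedom; with perfectly consistent rankings, as in the usage data, it reaches its maximum N(k − 1).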

arXiv:2001.05759

Smart Data driven Decision Trees Ensemble Methodology for Imbalanced Big Data

Authors: Diego García-Gil, Salvador García, Ning Xiong, Francisco Herrera

Abstract: Differences in data size per class, also known as imbalanced data distribution, have become a common problem affecting data quality. Big Data scenarios pose a new challenge to traditional imbalanced classification algorithms, since they are not prepared to work with such amounts of data. Split data strategies and the lack of data in the minority class due to the use of the MapReduce paradigm have posed new challenges for tackling the imbalance between classes in Big Data scenarios. Ensembles have been shown to successfully address imbalanced data problems. Smart Data refers to data of sufficient quality to achieve high-performance models. The combination of ensembles and Smart Data, achieved through Big Data preprocessing, should be a great synergy. In this paper, we propose a novel Smart Data driven Decision Trees Ensemble methodology for addressing the imbalanced classification problem in Big Data domains, namely SD_DeTE methodology. This methodology is based on the learning of different decision trees using distributed quality data for the ensemble process. This quality data is achieved by fusing Random Discretization, Principal Components Analysis and clustering-based Random Oversampling for obtaining different Smart Data versions of the original data. Experiments carried out in 21 binary adapted datasets have shown that our methodology outperforms Random Forest.

Submitted 3 September, 2021; v1 submitted 16 January, 2020; originally announced January 2020.
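The methodology above includes a clustering-based variant of random oversampling. For orientation only, the sketch below shows plain (non-clustered, single-machine) random oversampling, which duplicates randomly chosen rows of each smaller class until every class matches the majority count. It is not the SD_DeTE procedure, and the function name and `seed` parameter are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Plain random oversampling: duplicate random rows of each smaller class
    until every class has as many rows as the majority class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(members))  # copy an existing minority row
            y_out.append(label)
    return X_out, y_out


X = [[0], [1], [2], [3], [10]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)  # class 1 is repeated until it has 4 rows
```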

arXiv:1812.05944

A Tutorial on Distance Metric Learning: Mathematical Foundations, Algorithms, Experimental Analysis, Prospects and Challenges (with Appendices on Mathematical Background and Detailed Algorithms Explanation)

Authors: Juan Luis Suárez-Díaz, Salvador García, Francisco Herrera

Abstract: Distance metric learning is a branch of machine learning that aims to learn distances from the data, which enhances the performance of similarity-based algorithms. This tutorial provides a theoretical background and foundations on this topic and a comprehensive experimental analysis of the best-known algorithms. We start by describing the distance metric learning problem and its main mathematical foundations, divided into three main blocks: convex analysis, matrix analysis and information theory. Then, we will describe a representative set of the most popular distance metric learning methods used in classification. All the algorithms studied in this paper will be evaluated with exhaustive testing in order to analyze their capabilities in standard classification problems, particularly considering dimensionality reduction and kernelization. The results, verified by Bayesian statistical tests, highlight a set of outstanding algorithms. Finally, we will discuss several potential future prospects and challenges in this field. This tutorial will serve as a starting point in the domain of distance metric learning from both a theoretical and practical perspective.

Submitted 19 August, 2020; v1 submitted 14 December, 2018; originally announced December 2018.

Comments: 36 pages with appendices
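A common way to parameterize a learned distance is the Mahalanobis form d_M(x, y) = sqrt((x − y)^T M (x − y)) with M = L^T L. The minimal sketch below, with hypothetical function names and a hand-picked matrix `L` standing in for whatever a learning algorithm would produce, shows why this form is convenient: the distance is just the Euclidean distance after mapping points through `L`, so M is positive semidefinite by construction. No specific algorithm from the tutorial is implemented.

```python
import math


def apply_linear_map(L, x):
    """Return L @ x for a matrix L given as a list of rows."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]


def learned_distance(L, x, y):
    """d_M(x, y) = sqrt((x - y)^T M (x - y)) with M = L^T L.

    Computed as the Euclidean distance between L x and L y. If L has fewer
    rows than columns, it also projects the data to a lower dimension.
    """
    diff = apply_linear_map(L, [a - b for a, b in zip(x, y)])
    return math.sqrt(sum(d * d for d in diff))


identity = [[1, 0], [0, 1]]       # M = I recovers the Euclidean distance
projection = [[1, 1]]             # 1 x 2 map: distance in a 1-D projection
scaling = [[2, 0], [0, 0.5]]      # stretches feature 0, shrinks feature 1
```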

arXiv:1811.12044

doi 10.1007/s13748-018-00167-7

A snapshot on nonstandard supervised learning problems: taxonomy, relationships and methods

Authors: David Charte, Francisco Charte, Salvador García, Francisco Herrera

Abstract: Machine learning is a field which studies how machines can alter and adapt their behavior, improving their actions according to the information they are given. This field is subdivided into multiple areas, among which the best known are supervised learning (e.g. classification and regression) and unsupervised learning (e.g. clustering and association rules). Within supervised learning, most studies and research are focused on well-known standard tasks, such as binary classification, multiclass classification and regression with one dependent variable. However, there are many other lesser-known problems. These are what we generically call nonstandard supervised learning problems. The literature about them is much sparser, and each study is directed to a specific task. Therefore, the definitions, relations and applications of these kinds of learners are hard to find. The goal of this paper is to provide the reader with a broad view of the distinct variations of nonstandard supervised problems. A comprehensive taxonomy summarizing their traits is proposed. A review of the common approaches followed to accomplish them and their main applications is provided as well.

Submitted 29 November, 2018; originally announced November 2018.

MSC Class: 68T05; 68T10

Journal ref: Charte, D., Charte, F., García, S. et al. Prog Artif Intell (2018)

arXiv:1810.09733

OCAPIS: R package for Ordinal Classification And Preprocessing In Scala

Authors: M. Cristina Heredia-Gómez, Salvador García, Pedro Antonio Gutiérrez, Francisco Herrera

Abstract: Ordinal data are those where a natural order exists between the labels. The classification and preprocessing of this type of data is attracting more and more interest in the area of machine learning, due to its presence in many common problems. Traditionally, ordinal classification problems have been approached as nominal problems. However, that implies not taking into account their natural order constraints. In this paper, an innovative R package named ocapis (Ordinal Classification and Preprocessing In Scala) is introduced. Implemented mainly in Scala and available through GitHub, this library includes four learners and two preprocessing algorithms for ordinal and monotonic data. The main features of the package and examples of installation and use are explained throughout this manuscript.

Submitted 17 March, 2019; v1 submitted 23 October, 2018; originally announced October 2018.

Comments: 16 pages

arXiv:1810.06021

DPASF: A Flink Library for Streaming Data preprocessing

Authors: Alejandro Alcalde-Barros, Diego García-Gil, Salvador García, Francisco Herrera

Abstract: Data preprocessing techniques are devoted to correcting or alleviating errors in data. Discretization and feature selection are two of the most widely used data preprocessing techniques. Although we can find many proposals for static Big Data preprocessing, there is little research devoted to the continuous Big Data problem. Apache Flink is a recent and novel Big Data framework, following the MapReduce paradigm, focused on distributed stream and batch data processing. In this paper we propose a data stream library for Big Data preprocessing, named DPASF, under Apache Flink. We have implemented six of the most popular data preprocessing algorithms, three for discretization and the rest for feature selection. The algorithms have been tested using two Big Data datasets. Experimental results show that preprocessing can not only reduce the size of the data, but also maintain or even improve the original accuracy in a short time. DPASF contains useful algorithms when dealing with Big Data streams. The preprocessing algorithms included in the library are able to tackle big datasets efficiently and to correct imperfections in the data.

Submitted 14 October, 2018; originally announced October 2018.

Comments: 19 pages
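To make "discretization" concrete, the sketch below shows the simplest static discretizer, equal-width binning, which replaces each continuous value with the index of one of `n_bins` intervals of equal width over the observed range. This is a batch illustration under that assumption only; it is not one of the three streaming discretizers implemented in DPASF, and the function name is hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant feature: a single bin
    width = (hi - lo) / n_bins
    # min(..., n_bins - 1) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


bins = equal_width_bins(list(range(11)), 5)  # interval width 2 over [0, 10]
```

A streaming discretizer cannot know the final minimum and maximum in advance, which is one reason dedicated stream algorithms exist.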

arXiv:1804.05774

BELIEF: A distance-based redundancy-proof feature selection method for Big Data

Authors: Sergio Ramírez-Gallego, Salvador García, Ning Xiong, Francisco Herrera

Abstract: With the advent of the Big Data era, data reduction methods are in high demand given their ability to simplify huge data and ease complex learning processes. Concretely, algorithms that are able to filter relevant dimensions from a set of millions are of huge importance. Although effective, these techniques suffer from the "scalability" curse as well. In this work, we propose a distributed feature weighting algorithm, which is able to rank millions of features in parallel using large samples. This method, inspired by the well-known RELIEF algorithm, introduces a novel redundancy elimination measure that provides similar schemes to those based on entropy at a much lower cost. It also allows smooth scale-up when more instances are demanded in feature estimations. Empirical tests performed on our method show its estimation ability in manifold huge sets (both in number of features and instances), as well as its simplified runtime cost (especially at the redundancy detection step).

Submitted 16 April, 2018; originally announced April 2018.

Comments: 30 pages, 6 figures
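BELIEF is described as inspired by RELIEF. For reference, the sketch below implements the basic two-class RELIEF weight update as it is commonly described: for each sampled instance, features that differ from the nearest miss gain weight and features that differ from the nearest hit lose weight. It assumes features scaled to [0, 1], Manhattan distance for neighbor search, and every instance used once as the sample; BELIEF's redundancy measure and distributed implementation are not reproduced, and the function name is hypothetical.

```python
def relief_weights(X, y):
    """Basic RELIEF feature weights for a two-class problem.

    Each instance is used once as the sampled instance (m = n), so the
    result is deterministic. Features are assumed scaled to [0, 1].
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d

    def dist(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))  # Manhattan distance

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue
        h = min(hits, key=lambda j: dist(X[i], X[j]))    # nearest hit
        m = min(misses, key=lambda j: dist(X[i], X[j]))  # nearest miss
        for f in range(d):
            w[f] += (abs(X[i][f] - X[m][f]) - abs(X[i][f] - X[h][f])) / n
    return w


# Feature 0 separates the classes; feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)  # approximately [0.8, -0.5]
```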

arXiv:1501.04222

JavaNPST: Nonparametric Statistical Tests in Java

Authors: J. Derrac, S. García, F. Herrera

Abstract: Nonparametric statistical tests are useful procedures that can be applied in a wide range of situations, such as testing randomness or goodness of fit, one-sample, two-sample and multiple-sample analysis, association between bivariate samples or count data analysis. Their use is often preferred to parametric tests due to the fact that they require less restrictive assumptions about the population sampled. In this work, JavaNPST, an open source Java library implementing 40 nonparametric statistical tests, is presented. It can be helpful for programmers and practitioners interested in performing nonparametric statistical analyses, providing a quick and easy way of running these tests directly within any Java code. Some examples of use are also shown, highlighting some of the more remarkable capabilities of the library.

Submitted 17 January, 2015; originally announced January 2015.

Comments: 19 pages, 1 figure. Statistical Software Library for JAVA
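As an example of the kind of test such a library provides, the sketch below implements the exact two-sided sign test for paired samples, one of the simplest nonparametric tests. It is written in Python for illustration and is not JavaNPST's API; the function name and return tuple are hypothetical.

```python
from math import comb


def sign_test(x, y):
    """Exact two-sided sign test for paired samples x and y.

    Zero differences are discarded. Under H0 the number of positive
    differences follows Binomial(n, 0.5).
    Returns (n_positive, n_used, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    pos = sum(1 for d in diffs if d > 0)
    k = min(pos, n - pos)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k)
    return pos, n, min(1.0, 2 * tail)


# Six pairs, all differences positive: p = 2 * (1/2)^6 = 0.03125.
result = sign_test([5, 6, 7, 8, 9, 10], [1, 2, 3, 4, 5, 6])
```

The test only uses the signs of the differences, which is why it needs no assumption about the shape of the population distribution.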

arXiv:1106.5834

doi 10.1214/13-AOAS638

A method for generating realistic correlation matrices

Authors: Johanna Hardin, Stephan Ramon Garcia, David Golan

Abstract: Simulating sample correlation matrices is important in many areas of statistics. Approaches such as generating Gaussian data and finding their sample correlation matrix or generating random uniform $[-1,1]$ deviates as pairwise correlations both have drawbacks. We develop an algorithm for adding noise, in a highly controlled manner, to general correlation matrices. In many instances, our method yields results which are superior to those obtained by simply simulating Gaussian data. Moreover, we demonstrate how our general algorithm can be tailored to a number of different correlation models. Using our results with a few different applications, we show that simulating correlation matrices can help assess statistical methodology.

Submitted 6 December, 2013; v1 submitted 28 June, 2011; originally announced June 2011.

Comments: Published at http://dx.doi.org/10.1214/13-AOAS638 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS638

Journal ref: Annals of Applied Statistics 2013, Vol. 7, No. 3, 1733-1762
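The abstract contrasts its method with two baselines. The sketch below implements the first baseline (simulate Gaussian data and take its sample correlation matrix) and a Cholesky-based check that exposes one well-known weakness of the second: filling off-diagonal entries with arbitrary values in [-1, 1] need not give a positive semidefinite, and hence valid, correlation matrix. It is not the authors' noise-adding algorithm; the function names are hypothetical, and the check tests strict positive definiteness, which is slightly stricter than validity.

```python
import math
import random


def sample_correlation(n_samples, n_vars, seed=0):
    """Baseline: sample correlation matrix of simulated standard Gaussian data."""
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_samples)]
    means = [sum(row[j] for row in data) / n_samples for j in range(n_vars)]
    cov = [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in data)
            for b in range(n_vars)] for a in range(n_vars)]
    return [[cov[a][b] / math.sqrt(cov[a][a] * cov[b][b]) for b in range(n_vars)]
            for a in range(n_vars)]


def is_positive_definite(M):
    """Attempt a Cholesky factorization; failure means M is not positive definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


# Entries all lie in [-1, 1], yet (1, -1, 1) is an eigenvector with eigenvalue -0.8.
bad = [[1.0, 0.9, -0.9], [0.9, 1.0, 0.9], [-0.9, 0.9, 1.0]]
```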

Showing 1–13 of 13 results for author: García, S