-
Sequential Filtering Techniques for Simultaneous Tracking and Parameter Estimation
Authors:
Yannick Sztamfater Garcia,
Joaquin Miguez,
Manuel Sanjurjo-Rivo
Abstract:
The number of resident space objects (RSOs) is rising at an alarming rate. Mega-constellations and breakup events are proliferating in most orbital regimes, and safe navigation is becoming increasingly problematic. It is important to be able to track RSOs accurately and at an affordable computational cost. Orbital dynamics are highly nonlinear, and current operational methods assume Gaussian representations of the objects' states and employ linearizations that cease to hold during observation-free propagation. Monte Carlo-based filters can approximate the a posteriori probability distribution of the states more accurately by providing support in the portion of the state space that overlaps most with the processed observations. Moreover, dynamical models cannot capture the full extent of the realistic forces experienced in the near-Earth space environment, and hence fully deterministic propagation methods may fail to achieve the desired accuracy. By modeling orbital dynamics as a stochastic system and solving it with stochastic numerical integrators, we are able to simultaneously estimate the scale of the process noise incurred by the assumed uncertainty in the system and robustly track the state of the spacecraft. To find an adequate balance between accuracy and computational cost, we propose three algorithms capable of tracking a space object and estimating the magnitude of the system's uncertainty. The proposed filters are successfully applied to a LEO scenario in various simulation setups, demonstrating the ability to accurately track a spacecraft state and estimate the scale of the uncertainty online.
Submitted 10 April, 2025;
originally announced April 2025.
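The abstract describes tracking a state and estimating the process-noise scale at the same time with Monte Carlo filters, but it does not spell out the three proposed algorithms. As a hedged, generic illustration only, the sketch below runs a bootstrap particle filter on a hypothetical 1-D random-walk model in which every particle also carries a log noise scale that is jittered slightly at each step (a common "augmented state" trick). The model, the function names `joint_particle_filter` and `gaussian_logpdf`, and the `jitter` parameter are assumptions for illustration; this is not the authors' orbital model or their stochastic integrators.

```python
import math
import random


def gaussian_logpdf(value, mean, std):
    """Log-density of N(mean, std^2) evaluated at value."""
    z = (value - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def joint_particle_filter(observations, obs_std, n_particles=1000, jitter=0.02, seed=0):
    """Hypothetical model: x_t = x_{t-1} + sigma * w_t, y_t = x_t + obs_std * v_t.

    Each particle carries (state, log_sigma); returns a list of
    (posterior-mean state, posterior-mean sigma) pairs, one per time step.
    """
    rng = random.Random(seed)
    # Initial states near the first observation; log_sigma drawn from a broad prior.
    particles = [
        (observations[0] + rng.gauss(0.0, obs_std),
         rng.uniform(math.log(0.01), math.log(10.0)))
        for _ in range(n_particles)
    ]
    estimates = []
    for y in observations:
        # Propagate: jitter log_sigma slightly, then take one random-walk step.
        moved = []
        for x, log_s in particles:
            log_s += rng.gauss(0.0, jitter)
            x += math.exp(log_s) * rng.gauss(0.0, 1.0)
            moved.append((x, log_s))
        # Weight by the Gaussian observation likelihood, normalised via log-sum-exp.
        log_w = [gaussian_logpdf(y, x, obs_std) for x, _ in moved]
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        x_hat = sum(wi * x for wi, (x, _) in zip(w, moved))
        sigma_hat = sum(wi * math.exp(ls) for wi, (_, ls) in zip(w, moved))
        estimates.append((x_hat, sigma_hat))
        # Multinomial resampling concentrates particles on plausible (state, sigma) pairs.
        particles = rng.choices(moved, weights=w, k=n_particles)
    return estimates
```

Jittering the parameter keeps diversity in the sigma dimension after resampling; without it, the noise-scale particles would collapse onto a few values. The cost is a small artificial inflation of parameter uncertainty, which `jitter` controls.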
-
On the impact of measure pre-conditionings on general parametric ML models and transfer learning via domain adaptation
Authors:
Joaquín Sánchez García
Abstract:
We study a new technique for understanding convergence of learning agents under small modifications of data. We show that such convergence can be understood via an analogue of Fatou's lemma which yields gamma-convergence. We show its relevance and applications in general machine learning tasks and domain adaptation transfer learning.
Submitted 4 March, 2024;
originally announced March 2024.
-
A Fully Automated Pipeline Using Swin Transformers for Deep Learning-Based Blood Segmentation on Head CT Scans After Aneurysmal Subarachnoid Hemorrhage
Authors:
Sergio Garcia Garcia,
Santiago Cepeda,
Ignacio Arrese,
Rosario Sarabia
Abstract:
Background: Accurate volumetric assessment of spontaneous subarachnoid hemorrhage (SAH), which may be relevant for its clinical and prognostic implications, is a labor-intensive task with current manual and semiautomatic methods. In the present research, we sought to develop and validate an artificial intelligence-driven, fully automated blood segmentation tool for SAH patients via noncontrast computed tomography (NCCT) scans employing a transformer-based Swin UNETR architecture. Methods: We retrospectively analyzed NCCT scans from patients with confirmed aneurysmal subarachnoid hemorrhage (aSAH) using the Swin UNETR for segmentation. The performance of the proposed method was evaluated against manually segmented ground truth data using metrics such as Dice score, intersection over union (IoU), the volumetric similarity index (VSI), the symmetric average surface distance (SASD), and sensitivity and specificity. A validation cohort from an external institution was included to test the generalizability of the model. Results: The model demonstrated high accuracy with robust performance metrics across the internal and external validation cohorts. Notably, it achieved high Dice coefficient (0.873), IoU (0.810), VSI (0.840), sensitivity (0.821) and specificity (0.996) values and a low SASD (1.866), suggesting proficiency in segmenting blood in SAH patients. The model's efficiency was reflected in its processing speed, indicating potential for real-time applications. Conclusions: Our Swin UNETR-based model offers significant advances in the automated segmentation of blood after aSAH on NCCT images. Despite its computational intensity, the model operates effectively on standard hardware with a user-friendly interface, facilitating broader clinical adoption. Further validation across diverse datasets is warranted to confirm its clinical reliability.
Submitted 29 December, 2023;
originally announced December 2023.
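Most of the overlap metrics listed in the abstract can be computed from the voxel-wise confusion counts of a predicted mask against the ground-truth mask. The sketch below uses the standard definitions (Dice = 2TP/(2TP+FP+FN), IoU = TP/(TP+FP+FN), VSI = 1 - |FN-FP|/(2TP+FP+FN), sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)) on flattened binary masks. It is an illustrative sketch, not the authors' evaluation code; the function names are hypothetical, the masks are assumed non-empty, and SASD is omitted because it requires surface extraction and voxel spacing.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two flattened binary masks of equal length."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif t and not p:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def overlap_metrics(pred, truth):
    """Standard overlap metrics; assumes both classes occur so no denominator is zero."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "iou": tp / (tp + fp + fn),
        # Volumetric similarity compares segmented volumes only, not their overlap.
        "vsi": 1 - abs(fn - fp) / (2 * tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

Note that VSI can be perfect (1.0) even when Dice is not, because equal numbers of false positives and false negatives leave the total volume unchanged. This is why the metrics are usually reported together.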
-
Fuzzy k-Nearest Neighbors with monotonicity constraints: Moving towards the robustness of monotonic noise
Authors:
Sergio González,
Salvador García,
Sheng-Tun Li,
Robert John,
Francisco Herrera
Abstract:
This paper proposes a new model based on Fuzzy k-Nearest Neighbors for classification with monotonic constraints, Monotonic Fuzzy k-NN (MonFkNN). Real-life data-sets often do not comply with monotonic constraints due to class noise. MonFkNN incorporates a new calculation of fuzzy memberships, which increases robustness against monotonic noise without the need for relabeling. Our proposal has been designed to be adaptable to the different needs of the problem being tackled. In several experimental studies, we show significant improvements in accuracy while matching the best degree of monotonicity obtained by comparable methods. We also show that MonFkNN empirically achieves improved performance compared with Monotonic k-NN in the presence of large amounts of class noise.
Submitted 5 March, 2020;
originally announced March 2020.
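MonFkNN builds on Fuzzy k-Nearest Neighbors, but the abstract does not give its new membership calculation. For orientation, the sketch below shows only a plain fuzzy k-NN baseline in the style of Keller et al.: training points have crisp memberships in their own labels, and each neighbour's vote is weighted by inverse distance raised to 2/(m-1), with fuzzifier m. It does not enforce monotonicity constraints or reproduce MonFkNN; the function name and parameter defaults are assumptions.

```python
import math
from collections import defaultdict


def fuzzy_knn(train_X, train_y, x, k=3, m=2.0, eps=1e-12):
    """Plain fuzzy k-NN (baseline, not MonFkNN).

    Returns (predicted label, {label: membership}) for query point x.
    """
    # The k nearest training points by Euclidean distance.
    neighbours = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    scores = defaultdict(float)
    total = 0.0
    for xi, yi in neighbours:
        # Inverse-distance weight; eps avoids division by zero for exact matches.
        w = 1.0 / (math.dist(xi, x) ** (2.0 / (m - 1.0)) + eps)
        scores[yi] += w  # crisp training membership: full weight to the neighbour's label
        total += w
    memberships = {label: s / total for label, s in scores.items()}
    return max(memberships, key=memberships.get), memberships
```

Because the output is a membership vector rather than a single vote, a method like MonFkNN can change how those memberships are computed without having to relabel the training data.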
-
Recent Trends in the Use of Statistical Tests for Comparing Swarm and Evolutionary Computing Algorithms: Practical Guidelines and a Critical Review
Authors:
J. Carrasco,
S. García,
M. M. Rueda,
S. Das,
F. Herrera
Abstract:
A key aspect of the design of evolutionary and swarm intelligence algorithms is the study of their performance, and statistical comparisons are a crucial part of that study because they allow reliable conclusions to be drawn. In the present paper we gather and examine the approaches taken from different perspectives, summarising the assumptions made by these statistical tests, the conclusions reached and the steps followed to perform them correctly. We survey current trends in the statistical analyses proposed for comparing computational intelligence algorithms and describe the statistical background of these tests. We illustrate the use of the most common tests in the context of the Competition on single-objective real parameter optimisation of the IEEE Congress on Evolutionary Computation (CEC) 2017, describe the main advantages and drawbacks of each kind of test, and put forward some recommendations concerning their use.
Submitted 21 February, 2020;
originally announced February 2020.
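The Friedman test is one of the most common nonparametric tests in this kind of comparison. As a minimal sketch, assuming a results table with one row per benchmark problem and one column per algorithm, where lower values are better, the code ranks the algorithms on each problem (averaging tied ranks) and computes chi2_F = 12N/(k(k+1)) * (sum_j R_j^2 - k(k+1)^2/4), where R_j is the mean rank of algorithm j. The function names are hypothetical. The p-value, obtained from a chi-squared distribution with k-1 degrees of freedom, is left to a statistics library.

```python
def average_ranks(row):
    """Rank values in one row (1 = best/smallest); tied values share their mean rank."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend the block while the next value ties with the current one.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for p in range(i, j + 1):
            ranks[order[p]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(results):
    """results[i][j] = error of algorithm j on problem i; returns (mean ranks, chi2_F)."""
    n, k = len(results), len(results[0])
    rank_rows = [average_ranks(r) for r in results]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(R * R for R in mean_ranks) - k * (k + 1) ** 2 / 4)
    return mean_ranks, chi2
```

For example, if one algorithm wins on every problem and the order of the others never changes, the statistic reaches its maximum N(k-1). A significant result is usually followed by post-hoc pairwise comparisons.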
-
Smart Data driven Decision Trees Ensemble Methodology for Imbalanced Big Data
Authors:
Diego García-Gil,
Salvador García,
Ning Xiong,
Francisco Herrera
Abstract:
Differences in data size per class, also known as imbalanced data distribution, have become a common problem affecting data quality. Big Data scenarios pose a new challenge to traditional imbalanced classification algorithms, since they are not prepared to work with such amounts of data. Split data strategies and the lack of data in the minority class due to the use of the MapReduce paradigm have posed new challenges for tackling the imbalance between classes in Big Data scenarios. Ensembles have been shown to successfully address imbalanced data problems. Smart Data refers to data of sufficient quality to achieve high-performance models. The combination of ensembles and Smart Data, achieved through Big Data preprocessing, should therefore be a strong synergy. In this paper, we propose a novel Smart Data driven Decision Trees Ensemble methodology for addressing the imbalanced classification problem in Big Data domains, namely the SD_DeTE methodology. This methodology is based on learning different decision trees using distributed quality data for the ensemble process. These quality data are produced by combining Random Discretization, Principal Components Analysis and clustering-based Random Oversampling to obtain different Smart Data versions of the original data. Experiments carried out on 21 datasets adapted to binary classification have shown that our methodology outperforms Random Forest.
Submitted 3 September, 2021; v1 submitted 16 January, 2020;
originally announced January 2020.
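One ingredient named in the abstract is clustering-based Random Oversampling. The sketch below covers only the plain, single-machine random oversampling step, in which randomly chosen samples of each smaller class are duplicated until every class matches the majority count. It does not include the clustering step, the distributed MapReduce execution, or the other SD_DeTE components, and the function name is hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate random samples of each smaller class up to the majority class size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Pool of existing samples for this class; duplicates are drawn with replacement.
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out
```

Duplicating samples balances class counts but adds no new information, which is one reason the methodology pairs oversampling with clustering and other preprocessing.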
-
A Tutorial on Distance Metric Learning: Mathematical Foundations, Algorithms, Experimental Analysis, Prospects and Challenges (with Appendices on Mathematical Background and Detailed Algorithms Explanation)
Authors:
Juan Luis Suárez-Díaz,
Salvador García,
Francisco Herrera
Abstract:
Distance metric learning is a branch of machine learning that aims to learn distances from the data in order to enhance the performance of similarity-based algorithms. This tutorial provides a theoretical background and foundations on this topic and a comprehensive experimental analysis of the best-known algorithms. We start by describing the distance metric learning problem and its main mathematical foundations, divided into three main blocks: convex analysis, matrix analysis and information theory. Then, we describe a representative set of the most popular distance metric learning methods used in classification. All the algorithms studied in this paper are evaluated with exhaustive testing in order to analyze their capabilities in standard classification problems, particularly considering dimensionality reduction and kernelization. The results, verified by Bayesian statistical tests, highlight a set of outstanding algorithms. Finally, we discuss several potential future prospects and challenges in this field. This tutorial will serve as a starting point in the domain of distance metric learning from both a theoretical and practical perspective.
Submitted 19 August, 2020; v1 submitted 14 December, 2018;
originally announced December 2018.
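Many distance metric learning methods learn a Mahalanobis-type distance, either as a positive semidefinite matrix M or as a linear map L with M = L^T L. In that case, d(x, y) = sqrt((x-y)^T M (x-y)) = ||L(x-y)||. The sketch below, with hypothetical helper names, checks that the two forms agree. It also shows how a non-square L (fewer rows than features) performs the dimensionality reduction mentioned in the abstract. It does not learn L; any specific algorithm from the tutorial would supply it.

```python
import math


def metric_matrix(L):
    """Return M = L^T L for a (possibly non-square) linear map L given as a list of rows."""
    d = len(L[0])
    return [[sum(L[r][i] * L[r][j] for r in range(len(L))) for j in range(d)] for i in range(d)]


def mahalanobis_from_L(L, x, y):
    """Distance ||L (x - y)||: project the difference, then take the Euclidean norm."""
    diff = [a - b for a, b in zip(x, y)]
    proj = [sum(lij * dj for lij, dj in zip(row, diff)) for row in L]
    return math.sqrt(sum(p * p for p in proj))


def mahalanobis_from_M(M, x, y):
    """Distance sqrt((x - y)^T M (x - y)) for a positive semidefinite M."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    q = sum(diff[i] * M[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(q)
```

With L equal to the identity, both forms reduce to the Euclidean distance. A single-row L such as [[1.0, 0.0]] measures distance along one learned direction only.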
-
A snapshot on nonstandard supervised learning problems: taxonomy, relationships and methods
Authors:
David Charte,
Francisco Charte,
Salvador García,
Francisco Herrera
Abstract:
Machine learning is a field which studies how machines can alter and adapt their behavior, improving their actions according to the information they are given. This field is subdivided into multiple areas, among which the best known are supervised learning (e.g. classification and regression) and unsupervised learning (e.g. clustering and association rules).
Within supervised learning, most studies and research focus on well-known standard tasks, such as binary classification, multiclass classification and regression with one dependent variable. However, there are many other, lesser-known problems. These are what we generically call nonstandard supervised learning problems. The literature about them is much sparser, and each study is directed to a specific task. Therefore, the definitions, relations and applications of this kind of learner are hard to find.
The goal of this paper is to provide the reader with a broad view of the distinct variations of nonstandard supervised problems. A comprehensive taxonomy summarizing their traits is proposed. A review of the common approaches followed to accomplish them and their main applications is provided as well.
Submitted 29 November, 2018;
originally announced November 2018.
-
OCAPIS: R package for Ordinal Classification And Preprocessing In Scala
Authors:
M. Cristina Heredia-Gómez,
Salvador García,
Pedro Antonio Gutiérrez,
Francisco Herrera
Abstract:
Ordinal data are those where a natural order exists between the labels. The classification and pre-processing of this type of data is attracting more and more interest in the area of machine learning, due to its presence in many common problems. Traditionally, ordinal classification problems have been approached as nominal problems. However, that implies not taking into account their natural order constraints. In this paper, an innovative R package named ocapis (Ordinal Classification and Preprocessing In Scala) is introduced. Implemented mainly in Scala and available through GitHub, this library includes four learners and two pre-processing algorithms for ordinal and monotonic data. The main features of the package and examples of installation and use are explained throughout this manuscript.
Submitted 17 March, 2019; v1 submitted 23 October, 2018;
originally announced October 2018.
-
DPASF: A Flink Library for Streaming Data preprocessing
Authors:
Alejandro Alcalde-Barros,
Diego García-Gil,
Salvador García,
Francisco Herrera
Abstract:
Data preprocessing techniques are devoted to correcting or alleviating errors in data. Discretization and feature selection are two of the most widely used data preprocessing techniques. Although we can find many proposals for static Big Data preprocessing, there is little research devoted to the continuous Big Data problem. Apache Flink is a recent Big Data framework, following the MapReduce paradigm, focused on distributed stream and batch data processing. In this paper we propose a data stream library for Big Data preprocessing, named DPASF, under Apache Flink. We have implemented six of the most popular data preprocessing algorithms, three for discretization and the rest for feature selection. The algorithms have been tested using two Big Data datasets. Experimental results show that preprocessing can not only reduce the size of the data but also maintain or even improve the original accuracy in a short time. DPASF contains useful algorithms for dealing with Big Data streams. The preprocessing algorithms included in the library are able to tackle large datasets efficiently and to correct imperfections in the data.
Submitted 14 October, 2018;
originally announced October 2018.
-
BELIEF: A distance-based redundancy-proof feature selection method for Big Data
Authors:
Sergio Ramírez-Gallego,
Salvador García,
Ning Xiong,
Francisco Herrera
Abstract:
With the advent of the Big Data era, data reduction methods are in high demand given their ability to simplify huge datasets and ease complex learning processes. Concretely, algorithms that are able to filter relevant dimensions from a set of millions are of huge importance. Although effective, these techniques suffer from the "scalability" curse as well.
In this work, we propose a distributed feature weighting algorithm, which is able to rank millions of features in parallel using large samples. This method, inspired by the well-known RELIEF algorithm, introduces a novel redundancy elimination measure that provides schemes similar to those based on entropy at a much lower cost. It also allows smooth scale-up when more instances are demanded in feature estimations. Empirical tests performed on our method show its estimation ability on various huge datasets (large both in number of features and in number of instances), as well as its reduced runtime cost (especially at the redundancy detection step).
Submitted 16 April, 2018;
originally announced April 2018.
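BELIEF is described as inspired by RELIEF. For reference, the sketch below shows only the classic single-machine RELIEF weighting for two classes. For each sampled instance, the nearest same-class neighbour (hit) and nearest other-class neighbour (miss) are found, and each feature weight moves up by the range-normalised difference to the miss and down by the difference to the hit. BELIEF's redundancy measure and its distributed execution are not reproduced. The function name is hypothetical, and each class is assumed to have at least two instances.

```python
import random


def relief(X, y, n_iter=None, seed=0):
    """Classic RELIEF feature weights for a two-class problem (not BELIEF)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Normalise per-feature differences by the feature's range (constant features use 1).
    span = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0 for f in range(d)]

    def diff(f, a, b):
        return abs(a[f] - b[f]) / span[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(d))

    m = n_iter or n
    w = [0.0] * d
    for _ in range(m):
        i = rng.randrange(n)
        r = X[i]
        hits = [X[j] for j in range(n) if j != i and y[j] == y[i]]
        misses = [X[j] for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda h: dist(r, h))
        miss = min(misses, key=lambda q: dist(r, q))
        # Relevant features separate the classes (large miss diff) but not neighbours (small hit diff).
        for f in range(d):
            w[f] += (diff(f, r, miss) - diff(f, r, hit)) / m
    return w
```

The nearest-neighbour searches make each iteration linear in the number of instances, which is the scalability bottleneck that a distributed design like BELIEF aims to address.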
-
JavaNPST: Nonparametric Statistical Tests in Java
Authors:
J. Derrac,
S. García,
F. Herrera
Abstract:
Nonparametric statistical tests are useful procedures that can be applied in a wide range of situations, such as testing randomness or goodness of fit, one-sample, two-sample and multiple-sample analysis, association between bivariate samples or count data analysis. They are often preferred to parametric tests because they require less restrictive assumptions about the sampled population.
In this work, JavaNPST, an open source Java library implementing 40 nonparametric statistical tests, is presented. It can be helpful for programmers and practitioners interested in performing nonparametric statistical analyses, providing a quick and easy way of running these tests directly within any Java code. Some examples of use are also shown, highlighting some of the more remarkable capabilities of the library.
Submitted 17 January, 2015;
originally announced January 2015.
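JavaNPST is a Java library, and its API is not described in the abstract. To keep a single language across these examples, the sketch below is plain Python and does not mirror the library's interface. It implements one of the simplest nonparametric procedures, the exact two-sided sign test for paired samples. Ties are discarded, and under the null hypothesis the number of positive differences follows Binomial(n, 0.5). The function name is hypothetical.

```python
from math import comb


def sign_test(x, y):
    """Exact two-sided sign test for paired samples x, y.

    Returns (number of positive differences, number of non-tied pairs, p-value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop tied pairs
    n = len(diffs)
    positives = sum(1 for d in diffs if d > 0)
    k = min(positives, n - positives)
    # Two-sided p-value: double the lower binomial tail, capped at 1.
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return positives, n, min(1.0, p)
```

The test uses only the signs of the differences, so it makes no assumption about their distribution. This is the kind of weaker assumption the abstract cites as the reason for preferring nonparametric tests.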
-
A method for generating realistic correlation matrices
Authors:
Johanna Hardin,
Stephan Ramon Garcia,
David Golan
Abstract:
Simulating sample correlation matrices is important in many areas of statistics. Approaches such as generating Gaussian data and finding their sample correlation matrix or generating random uniform $[-1,1]$ deviates as pairwise correlations both have drawbacks. We develop an algorithm for adding noise, in a highly controlled manner, to general correlation matrices. In many instances, our method yields results which are superior to those obtained by simply simulating Gaussian data. Moreover, we demonstrate how our general algorithm can be tailored to a number of different correlation models. Using our results with a few different applications, we show that simulating correlation matrices can help assess statistical methodology.
Submitted 6 December, 2013; v1 submitted 28 June, 2011;
originally announced June 2011.
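The abstract describes adding noise to a correlation matrix in a controlled way. One construction of this kind, given here as an assumption rather than as the paper's exact algorithm, draws unit vectors u_1, ..., u_n in R^dim and returns S + eps * (U^T U - I). The diagonal stays exactly 1 because each u_i has unit norm. The result stays positive semidefinite as long as 0 <= eps < lambda_min(S), because U^T U - I has no eigenvalue below -1. The function names and the `dim` parameter are hypothetical, and the caller must check the eps bound because the sketch does not compute eigenvalues.

```python
import math
import random


def random_unit_vectors(n, dim, rng):
    """n random unit vectors in R^dim (normalised Gaussian draws, so uniform on the sphere)."""
    vecs = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        vecs.append([c / norm for c in v])
    return vecs


def add_correlation_noise(S, eps, dim=5, seed=0):
    """Return S + eps * (U^T U - I); requires 0 <= eps < smallest eigenvalue of S."""
    rng = random.Random(seed)
    n = len(S)
    U = random_unit_vectors(n, dim, rng)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            gram = sum(a * b for a, b in zip(U[i], U[j]))  # u_i . u_j, equal to 1 when i == j
            out[i][j] = S[i][j] + eps * (gram - (1.0 if i == j else 0.0))
    return out
```

For the identity matrix, lambda_min is 1, so any eps below 1 is valid. Larger eps gives noisier, less structured correlations, and a smaller `dim` makes the noise vectors more strongly correlated with each other.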