-
Improving statistical learning methods via features selection without replacement sampling and random projection
Authors:
Sulaiman Khan,
Muhammad Ahmad,
Fida Ullah,
Carlos Aguilar Ibañez,
José Eduardo Valdez Rodriguez
Abstract:
Cancer is fundamentally a genetic disease characterized by genetic and epigenetic alterations that disrupt normal gene expression, leading to uncontrolled cell growth and metastasis. High-dimensional microarray datasets pose challenges for classification models due to the "small n, large p" problem, resulting in overfitting. This study makes three key contributions: (1) we propose a machine learning-based approach integrating the Feature Selection Without Replacement (FSWOR) technique and a projection method to improve classification accuracy; (2) we apply the Kendall statistical test to identify the most significant genes from the brain cancer microarray dataset (GSE50161), reducing the feature space from 54,675 to 20,890 genes; (3) we evaluate machine learning models using k-fold cross-validation, in which our model incorporates ensemble classifiers with LDA projection and Naïve Bayes, achieving a test score of 96% and outperforming existing methods by 9.09%. The results demonstrate the effectiveness of our approach in high-dimensional gene expression analysis, improving classification accuracy while mitigating overfitting. This study contributes to cancer biomarker discovery, offering a robust computational method for analyzing microarray data.
Submitted 28 May, 2025;
originally announced June 2025.
-
Enabling Dynamic and Intelligent Workflows for HPC, Data Analytics, and AI Convergence
Authors:
Jorge Ejarque,
Rosa M. Badia,
Loïc Albertin,
Giovanni Aloisio,
Enrico Baglione,
Yolanda Becerra,
Stefan Boschert,
Julian R. Berlin,
Alessandro D'Anca,
Donatello Elia,
François Exertier,
Sandro Fiore,
José Flich,
Arnau Folch,
Steven J Gibbons,
Nikolay Koldunov,
Francesc Lordan,
Stefano Lorito,
Finn Løvholt,
Jorge Macías,
Fabrizio Marozzo,
Alberto Michelini,
Marisol Monterrubio-Velasco,
Marta Pienkowska,
Josep de la Puente,
et al. (12 additional authors not shown)
Abstract:
The evolution of High-Performance Computing (HPC) platforms enables the design and execution of progressively larger and more complex workflow applications on these systems. The complexity comes not only from the number of elements that compose the workflows but also from the type of computations they perform. While traditional HPC workflows target simulations and modelling of physical phenomena, current needs additionally require data analytics (DA) and artificial intelligence (AI) tasks. However, the development of these workflows is hampered by the lack of proper programming models and environments that support the integration of HPC, DA, and AI, as well as the lack of tools to easily deploy and execute the workflows on HPC systems. To progress in this direction, this paper presents use cases where complex workflows are required and investigates the main issues to be addressed for the HPC/DA/AI convergence. Based on this study, the paper identifies the challenges a new workflow platform must meet to manage complex workflows. Finally, it proposes a development approach for such a workflow platform, addressing these challenges in two directions: first, by defining a software stack that provides the functionalities to manage these complex workflows; and second, by proposing the HPC Workflow as a Service (HPCWaaS) paradigm, which leverages the software stack to facilitate the reusability of complex workflows in federated HPC infrastructures. Proposals presented in this work are subject to study and development as part of the EuroHPC eFlows4HPC project.
Submitted 13 May, 2022; v1 submitted 20 April, 2022;
originally announced April 2022.
-
Validating Intelligent Power and Energy Systems - A Discussion of Educational Needs
Authors:
Panos Kotsampopoulos,
Nikos Hatziargyriou,
Thomas I. Strasser,
Cyndi Moyo,
Sebastian Rohjans,
Cornelius Steinbrink,
Sebastian Lehnhoff,
Peter Palensky,
Arjen A. van der Meer,
Daniel Esteban Morales Bondy,
Kai Heussen,
Mihai Calin,
Ata Khavari,
Maria Sosnina,
J. Emilio Rodriguez,
Graeme M. Burt
Abstract:
Traditional power systems education and training is confronted with the demand for coping with the rising complexity of energy systems, such as the integration of renewable and distributed generation, communication, control and information technology. A broad understanding of these topics by current and future researchers and engineers is becoming increasingly necessary. This paper identifies educational and training needs that address the higher complexity of intelligent energy systems. Education needs and requirements are discussed, such as the development of systems-oriented skills and cross-disciplinary learning. Education and training possibilities and the necessary tools are described, focusing on classroom as well as laboratory-based learning methods. In this context, experiences of using notebooks, co-simulation approaches, hardware-in-the-loop methods and remote lab experiments are discussed.
Submitted 6 October, 2017;
originally announced October 2017.