-
Problem-oriented AutoML in Clustering
Authors:
Matheus Camilo da Silva,
Gabriel Marques Tavares,
Eric Medvet,
Sylvio Barbon Junior
Abstract:
The Problem-oriented AutoML in Clustering (PoAC) framework introduces a novel, flexible approach to automating clustering tasks by addressing the shortcomings of traditional AutoML solutions. Conventional methods often rely on predefined internal Clustering Validity Indexes (CVIs) and static meta-features, limiting their adaptability and effectiveness across diverse clustering tasks. In contrast, PoAC establishes a dynamic connection between the clustering problem, CVIs, and meta-features, allowing users to customize these components based on the specific context and goals of their task. At its core, PoAC employs a surrogate model trained on a large meta-knowledge base of previous clustering datasets and solutions, enabling it to infer the quality of new clustering pipelines and synthesize optimal solutions for unseen datasets. Unlike many AutoML frameworks that are constrained by fixed evaluation metrics and algorithm sets, PoAC is algorithm-agnostic, adapting seamlessly to different clustering problems without requiring additional data or retraining. Experimental results demonstrate that PoAC not only outperforms state-of-the-art frameworks on a variety of datasets but also excels in specific tasks such as data visualization, and highlight its ability to dynamically adjust pipeline configurations based on dataset complexity.
Submitted 24 September, 2024;
originally announced September 2024.
-
A Machine Learning Early Warning System: Multicenter Validation in Brazilian Hospitals
Authors:
Jhonatan Kobylarz,
Henrique D. P. dos Santos,
Felipe Barletta,
Mateus Cichelero da Silva,
Renata Vieira,
Hugo M. P. Morales,
Cristian da Costa Rocha
Abstract:
Early recognition of clinical deterioration is one of the main steps for reducing inpatient morbidity and mortality. The challenge of identifying clinical deterioration in hospitals lies in the intense daily routines of healthcare practitioners, in the unconnected patient data stored in Electronic Health Records (EHRs), and in the use of low-accuracy scores. Since hospital wards are given less attention than the Intensive Care Unit (ICU), we hypothesized that connecting a platform to a stream of EHR data would drastically improve awareness of dangerous situations and could thus assist the healthcare team. With the application of machine learning, the system is capable of considering a patient's entire history, and the use of high-performing predictive models enables an intelligent early warning system. In this work we used 121,089 medical encounters from six different hospitals and 7,540,389 data points, and we compared popular ward protocols with six different scalable machine learning methods (three classic machine learning models, logistic and probabilistic-based, and three gradient boosted models). The results showed an advantage in AUC (Area Under the Receiver Operating Characteristic Curve) of 25 percentage points for the best machine learning model compared to the current state-of-the-art protocols. This is shown by the generalization of the algorithm with leave-one-group-out validation (AUC of 0.949) and its robustness through cross-validation (AUC of 0.961). We also perform experiments comparing several window sizes to justify the use of five patient timestamps. A sample dataset, experiments, and code are available for replicability purposes.
Submitted 9 June, 2020;
originally announced June 2020.
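As a rough illustration of the leave-one-group-out validation mentioned in the abstract above, the sketch below holds out every encounter from one group at a time and trains on the rest. It assumes the groups are hospitals, which the abstract suggests but does not state outright. The function name `leave_one_group_out` and the hospital labels are hypothetical, and this is not the authors' code.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) once per distinct group.

    `groups` holds one label per sample (assumed here to be a hospital ID).
    """
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)

    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        # Every sample that does not belong to the held-out group is used for training.
        train_idx = sorted(
            i for label, idxs in by_group.items() if label != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical hospital labels for six encounters (illustrative only).
hospital_of_encounter = ["A", "A", "B", "C", "B", "C"]
splits = list(leave_one_group_out(hospital_of_encounter))

for held_out, train_idx, test_idx in splits:
    print(f"held out {held_out}: train={train_idx} test={test_idx}")
```

Because no encounter from the held-out group is seen during training, the score on that group estimates how well a model transfers to an unseen site, whereas ordinary cross-validation mixes all groups in every fold.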
-
Estimation of classrooms occupancy using a multi-layer perceptron
Authors:
Eugénio Rodrigues,
Luísa Dias Pereira,
Adélio Rodrigues Gaspar,
Álvaro Gomes,
Manuel Carlos Gameiro da Silva
Abstract:
This paper presents a multi-layer perceptron model for estimating the number of occupants in classrooms from sensed indoor environmental data: relative humidity, air temperature, and carbon dioxide concentration. The modelling datasets were collected from two classrooms in the Secondary School of Pombal, Portugal. The number of occupants and occupation periods were obtained from class attendance reports. However, post-class occupancy was unknown, and the developed model is used to reconstruct classroom occupancy by filling in the unreported periods. Different model structures and combinations of environmental variables were tested. The most accurate model took as input a vector of 10 variables, formed from five averaged time intervals of relative humidity and carbon dioxide concentration. The model presented a mean square error of 1.99, a coefficient of determination of 0.96 with a significance of p-value < 0.001, and a mean absolute error of 1 occupant. These results show promising estimation capabilities in uncertain indoor environment conditions.
Submitted 7 February, 2017;
originally announced February 2017.
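The three error measures reported in the abstract above (mean square error, mean absolute error, and coefficient of determination) can be computed as in the minimal sketch below. The function name `regression_metrics` and the occupant counts are made up for illustration; they are not the paper's data, and the significance test is not reproduced here.

```python
def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R^2) for paired observed and estimated values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    # R^2 = 1 - (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    return mse, mae, r2


# Hypothetical observed vs. estimated occupant counts (illustrative only).
observed = [0, 10, 20, 30]
estimated = [1, 9, 22, 30]
mse, mae, r2 = regression_metrics(observed, estimated)
print(f"MSE={mse:.2f} MAE={mae:.2f} R2={r2:.3f}")
```

MSE penalizes large misses more heavily than MAE, which is why the two can differ noticeably even when the typical error is about one occupant.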
-
GerAPlanO - A new building design tool: design generation, thermal assessment and performance optimization
Authors:
Eugénio Rodrigues,
Ana Rita Amaral,
Adélio Rodrigues Gaspar,
Álvaro Gomes,
Manuel Carlos Gameiro da Silva,
Carlos Henggeler Antunes
Abstract:
Building practitioners (architects, engineers, energy managers) are showing a growing interest in the design of more energy-efficient and livable buildings. The best way to predict how a building will behave regarding energy consumption and thermal comfort is to use a dynamic simulation tool. However, using this kind of tool in daily practice is difficult due to the heuristic and exploratory nature of the architectural design process. To deal with this difficulty, the University of Coimbra and three companies have been working on the development of a prototype design-aiding tool, specifically devoted to the space planning phase of building design, under the project GerAPlanO (Automatic Generation of Architecture Floor plans with Energy Optimization). This project aims to combine the capabilities of design generation techniques, thermal assessment programs, and design optimization methods to provide assistance to decision makers. This paper presents the overall concept, as well as the current status of development of this tool.
Submitted 24 March, 2015;
originally announced March 2015.