-
FedTilt: Towards Multi-Level Fairness-Preserving and Robust Federated Learning
Authors:
Binghui Zhang,
Luis Mares De La Cruz,
Binghui Wang
Abstract:
Federated Learning (FL) is an emerging decentralized learning paradigm that can partly address the privacy concern that cannot be handled by traditional centralized and distributed learning. Further, to make FL practical, it is also necessary to consider constraints such as fairness and robustness. However, existing robust FL methods often produce unfair models, and existing fair FL methods only c…
▽ More
Federated Learning (FL) is an emerging decentralized learning paradigm that can partly address the privacy concern that cannot be handled by traditional centralized and distributed learning. Further, to make FL practical, it is also necessary to consider constraints such as fairness and robustness. However, existing robust FL methods often produce unfair models, and existing fair FL methods only consider one-level (client) fairness and are not robust to persistent outliers (i.e., injected outliers into each training round) that are common in real-world FL settings. We propose \texttt{FedTilt}, a novel FL that can preserve multi-level fairness and be robust to outliers. In particular, we consider two common levels of fairness, i.e., \emph{client fairness} -- uniformity of performance across clients, and \emph{client data fairness} -- uniformity of performance across different classes of data within a client. \texttt{FedTilt} is inspired by the recently proposed tilted empirical risk minimization, which introduces tilt hyperparameters that can be flexibly tuned. Theoretically, we show how tuning tilt values can achieve the two-level fairness and mitigate the persistent outliers, and derive the convergence condition of \texttt{FedTilt} as well. Empirically, our evaluation results on a suite of realistic federated datasets in diverse settings show the effectiveness and flexibility of the \texttt{FedTilt} framework and the superiority to the state-of-the-arts.
△ Less
Submitted 15 March, 2025;
originally announced March 2025.
-
Evaluating Robotic Approach Techniques for the Insertion of a Straight Instrument into a Vitreoretinal Surgery Trocar
Authors:
Ross Henry,
Martin Huber,
Anestis Mablekos-Alexiou,
Carlo Seneci,
Mohamed Abdelaziz,
Hans Natalius,
Lyndon da Cruz,
Christos Bergeles
Abstract:
Advances in vitreoretinal robotic surgery enable precise techniques for gene therapies. This study evaluates three robotic approaches using the 7-DoF robotic arm for docking a micro-precise tool to a trocar: fully co-manipulated, hybrid co-manipulated/teleoperated, and hybrid with camera assistance. The fully co-manipulated approach was the fastest but had a 42% success rate. Hybrid methods showed…
▽ More
Advances in vitreoretinal robotic surgery enable precise techniques for gene therapies. This study evaluates three robotic approaches using the 7-DoF robotic arm for docking a micro-precise tool to a trocar: fully co-manipulated, hybrid co-manipulated/teleoperated, and hybrid with camera assistance. The fully co-manipulated approach was the fastest but had a 42% success rate. Hybrid methods showed higher success rates (91.6% and 100%) and completed tasks within 2 minutes. NASA Task Load Index (TLX) assessments indicated lower physical demand and effort for hybrid approaches.
△ Less
Submitted 13 January, 2025;
originally announced January 2025.
-
Unveiling the Energy Vampires: A Methodology for Debugging Software Energy Consumption
Authors:
Enrique Barba Roque,
Luis Cruz,
Thomas Durieux
Abstract:
Energy consumption in software systems is becoming increasingly important, especially in large-scale deployments. However, debugging energy-related issues remains challenging due to the lack of specialized tools. This paper presents an energy debugging methodology for identifying and isolating energy consumption hotspots in software systems. We demonstrate the methodology's effectiveness through a…
▽ More
Energy consumption in software systems is becoming increasingly important, especially in large-scale deployments. However, debugging energy-related issues remains challenging due to the lack of specialized tools. This paper presents an energy debugging methodology for identifying and isolating energy consumption hotspots in software systems. We demonstrate the methodology's effectiveness through a case study of Redis, a popular in-memory database. Our analysis reveals significant energy consumption differences between Alpine and Ubuntu distributions, with Alpine consuming up to 20.2% more power in certain operations. We trace this difference to the implementation of the memcpy function in different C standard libraries (musl vs. glibc). By isolating and benchmarking memcpy, we confirm it as the primary cause of the energy discrepancy. Our findings highlight the importance of considering energy efficiency in software dependencies and demonstrate the capability to assist developers in identifying and addressing energy-related issues. This work contributes to the growing field of sustainable software engineering by providing a systematic approach to energy debugging and using it to unveil unexpected energy behaviors in Alpine.
△ Less
Submitted 13 December, 2024;
originally announced December 2024.
-
An incremental algorithm based on multichannel non-negative matrix partial co-factorization for ambient denoising in auscultation
Authors:
Juan De La Torre Cruz,
Francisco Jesus Canadas Quesada,
Damian Martinez-Munoz,
Nicolas Ruiz Reyes,
Sebastian Garcia Galan,
Julio Jose Carabias Orti
Abstract:
The aim of this study is to implement a method to remove ambient noise in biomedical sounds captured in auscultation. We propose an incremental approach based on multichannel non-negative matrix partial co-factorization (NMPCF) for ambient denoising focusing on high noisy environment with a Signal-to-Noise Ratio (SNR) <= -5 dB. The first contribution applies NMPCF assuming that ambient noise can b…
▽ More
The aim of this study is to implement a method to remove ambient noise in biomedical sounds captured in auscultation. We propose an incremental approach based on multichannel non-negative matrix partial co-factorization (NMPCF) for ambient denoising focusing on high noisy environment with a Signal-to-Noise Ratio (SNR) <= -5 dB. The first contribution applies NMPCF assuming that ambient noise can be modelled as repetitive sound events simultaneously found in two single-channel inputs captured by means of different recording devices. The second contribution proposes an incremental algorithm, based on the previous multichannel NMPCF, that refines the estimated biomedical spectrogram throughout a set of incremental stages by eliminating most of the ambient noise that was not removed in the previous stage at the expense of preserving most of the biomedical spectral content. The ambient denoising performance of the proposed method, compared to some of the most relevant state-of-the-art methods, has been evaluated using a set of recordings composed of biomedical sounds mixed with ambient noise that typically surrounds a medical consultation room to simulate high noisy environments with a SNR from -20 dB to -5 dB. Experimental results report that: (i) the performance drop suffered by the proposed method is lower compared to MSS and NLMS; (ii) unlike what happens with MSS and NLMS, the proposed method shows a stable trend of the average SDR and SIR results regardless of the type of ambient noise and the SNR level evaluated; and (iii) a remarkable advantage is the high robustness of the estimated biomedical sounds when the two single-channel inputs suffer from a delay between them.
△ Less
Submitted 1 November, 2024;
originally announced November 2024.
-
GINTRIP: Interpretable Temporal Graph Regression using Information bottleneck and Prototype-based method
Authors:
Ali Royat,
Seyed Mohamad Moghadas,
Lesley De Cruz,
Adrian Munteanu
Abstract:
Deep neural networks (DNNs) have demonstrated remarkable performance across various domains, yet their application to temporal graph regression tasks faces significant challenges regarding interpretability. This critical issue, rooted in the inherent complexity of both DNNs and underlying spatio-temporal patterns in the graph, calls for innovative solutions. While interpretability concerns in Grap…
▽ More
Deep neural networks (DNNs) have demonstrated remarkable performance across various domains, yet their application to temporal graph regression tasks faces significant challenges regarding interpretability. This critical issue, rooted in the inherent complexity of both DNNs and underlying spatio-temporal patterns in the graph, calls for innovative solutions. While interpretability concerns in Graph Neural Networks (GNNs) mirror those of DNNs, to the best of our knowledge, no notable work has addressed the interpretability of temporal GNNs using a combination of Information Bottleneck (IB) principles and prototype-based methods. Our research introduces a novel approach that uniquely integrates these techniques to enhance the interpretability of temporal graph regression models. The key contributions of our work are threefold: We introduce the \underline{G}raph \underline{IN}terpretability in \underline{T}emporal \underline{R}egression task using \underline{I}nformation bottleneck and \underline{P}rototype (GINTRIP) framework, the first combined application of IB and prototype-based methods for interpretable temporal graph tasks. We derive a novel theoretical bound on mutual information (MI), extending the applicability of IB principles to graph regression tasks. We incorporate an unsupervised auxiliary classification head, fostering multi-task learning and diverse concept representation, which enhances the model bottleneck's interpretability. Our model is evaluated on real-world traffic datasets, outperforming existing methods in both forecasting accuracy and interpretability-related metrics.
△ Less
Submitted 17 September, 2024;
originally announced September 2024.
-
Understanding Feedback Mechanisms in Machine Learning Jupyter Notebooks
Authors:
Arumoy Shome,
Luis Cruz,
Diomidis Spinellis,
Arie van Deursen
Abstract:
The machine learning development lifecycle is characterized by iterative and exploratory processes that rely on feedback mechanisms to ensure data and model integrity. Despite the critical role of feedback in machine learning engineering, no prior research has been conducted to identify and understand these mechanisms. To address this knowledge gap, we mine 297.8 thousand Jupyter notebooks and ana…
▽ More
The machine learning development lifecycle is characterized by iterative and exploratory processes that rely on feedback mechanisms to ensure data and model integrity. Despite the critical role of feedback in machine learning engineering, no prior research has been conducted to identify and understand these mechanisms. To address this knowledge gap, we mine 297.8 thousand Jupyter notebooks and analyse 2.3 million code cells. We identify three key feedback mechanisms -- assertions, print statements and last cell statements -- and further categorize them into implicit and explicit forms of feedback. Our findings reveal extensive use of implicit feedback for critical design decisions and the relatively limited adoption of explicit feedback mechanisms. By conducting detailed case studies with selected feedback instances, we uncover the potential for automated validation of critical assumptions in ML workflows using assertions. Finally, this study underscores the need for improved documentation, and provides practical recommendations on how existing feedback mechanisms in the ML development workflow can be effectively used to mitigate technical debt and enhance reproducibility.
△ Less
Submitted 31 July, 2024;
originally announced August 2024.
-
Innovating for Tomorrow: The Convergence of SE and Green AI
Authors:
Luís Cruz,
Xavier Franch Gutierrez,
Silverio Martínez-Fernández
Abstract:
The latest advancements in machine learning, specifically in foundation models, are revolutionizing the frontiers of existing software engineering (SE) processes. This is a bi-directional phenomona, where 1) software systems are now challenged to provide AI-enabled features to their users, and 2) AI is used to automate tasks within the software development lifecycle. In an era where sustainability…
▽ More
The latest advancements in machine learning, specifically in foundation models, are revolutionizing the frontiers of existing software engineering (SE) processes. This is a bi-directional phenomona, where 1) software systems are now challenged to provide AI-enabled features to their users, and 2) AI is used to automate tasks within the software development lifecycle. In an era where sustainability is a pressing societal concern, our community needs to adopt a long-term plan enabling a conscious transformation that aligns with environmental sustainability values. In this paper, we reflect on the impact of adopting environmentally friendly practices to create AI-enabled software systems and make considerations on the environmental impact of using foundation models for software development.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Green AI in Action: Strategic Model Selection for Ensembles in Production
Authors:
Nienke Nijkamp,
June Sallou,
Niels van der Heijden,
Luís Cruz
Abstract:
Integrating Artificial Intelligence (AI) into software systems has significantly enhanced their capabilities while escalating energy demands. Ensemble learning, combining predictions from multiple models to form a single prediction, intensifies this problem due to cumulative energy consumption. This paper presents a novel approach to model selection that addresses the challenge of balancing the ac…
▽ More
Integrating Artificial Intelligence (AI) into software systems has significantly enhanced their capabilities while escalating energy demands. Ensemble learning, combining predictions from multiple models to form a single prediction, intensifies this problem due to cumulative energy consumption. This paper presents a novel approach to model selection that addresses the challenge of balancing the accuracy of AI models with their energy consumption in a live AI ensemble system. We explore how reducing the number of models or improving the efficiency of model usage within an ensemble during inference can reduce energy demands without substantially sacrificing accuracy. This study introduces and evaluates two model selection strategies, Static and Dynamic, for optimizing ensemble learning systems performance while minimizing energy usage. Our results demonstrate that the Static strategy improves the F1 score beyond the baseline, reducing average energy usage from 100\% from the full ensemble to 6\2%. The Dynamic strategy further enhances F1 scores, using on average 76\% compared to 100% of the full ensemble. Moreover, we propose an approach that balances accuracy with resource consumption, significantly reducing energy usage without substantially impacting accuracy. This method decreased the average energy usage of the Static strategy from approximately 62\% to 14\%, and for the Dynamic strategy, from around 76\% to 57\%. Our field study of Green AI using an operational AI system developed by a large professional services provider shows the practical applicability of adopting energy-conscious model selection strategies in live production environments.
△ Less
Submitted 21 May, 2024;
originally announced May 2024.
-
DIRESA, a distance-preserving nonlinear dimension reduction technique based on regularized autoencoders
Authors:
Geert De Paepe,
Lesley De Cruz
Abstract:
In meteorology, finding similar weather patterns or analogs in historical datasets can be useful for data assimilation, forecasting, and postprocessing. In climate science, analogs in historical and climate projection data are used for attribution and impact studies. However, most of the time, those large weather and climate datasets are nearline. This means that they must be downloaded, which tak…
▽ More
In meteorology, finding similar weather patterns or analogs in historical datasets can be useful for data assimilation, forecasting, and postprocessing. In climate science, analogs in historical and climate projection data are used for attribution and impact studies. However, most of the time, those large weather and climate datasets are nearline. This means that they must be downloaded, which takes a lot of bandwidth and disk space, before the computationally expensive search can be executed. We propose a dimension reduction technique based on autoencoder (AE) neural networks to compress the datasets and perform the search in an interpretable, compressed latent space. A distance-regularized Siamese twin autoencoder (DIRESA) architecture is designed to preserve distance in latent space while capturing the nonlinearities in the datasets. Using conceptual climate models of different complexities, we show that the latent components thus obtained provide physical insight into the dominant modes of variability in the system. Compressing datasets with DIRESA reduces the online storage and keeps the latent components uncorrelated, while the distance (ordering) preservation and reconstruction fidelity robustly outperform Principal Component Analysis (PCA) and other dimension reduction techniques such as UMAP or variational autoencoders.
△ Less
Submitted 27 April, 2025; v1 submitted 28 April, 2024;
originally announced April 2024.
-
McUDI: Model-Centric Unsupervised Degradation Indicator for Failure Prediction AIOps Solutions
Authors:
Lorena Poenaru-Olaru,
Luis Cruz,
Jan Rellermeyer,
Arie van Deursen
Abstract:
Due to the continuous change in operational data, AIOps solutions suffer from performance degradation over time. Although periodic retraining is the state-of-the-art technique to preserve the failure prediction AIOps models' performance over time, this technique requires a considerable amount of labeled data to retrain. In AIOps obtaining label data is expensive since it requires the availability…
▽ More
Due to the continuous change in operational data, AIOps solutions suffer from performance degradation over time. Although periodic retraining is the state-of-the-art technique to preserve the failure prediction AIOps models' performance over time, this technique requires a considerable amount of labeled data to retrain. In AIOps obtaining label data is expensive since it requires the availability of domain experts to intensively annotate it. In this paper, we present McUDI, a model-centric unsupervised degradation indicator that is capable of detecting the exact moment the AIOps model requires retraining as a result of changes in data. We further show how employing McUDI in the maintenance pipeline of AIOps solutions can reduce the number of samples that require annotations with 30k for job failure prediction and 260k for disk failure prediction while achieving similar performance with periodic retraining.
△ Less
Submitted 25 January, 2024;
originally announced January 2024.
-
Data vs. Model Machine Learning Fairness Testing: An Empirical Study
Authors:
Arumoy Shome,
Luis Cruz,
Arie van Deursen
Abstract:
Although several fairness definitions and bias mitigation techniques exist in the literature, all existing solutions evaluate fairness of Machine Learning (ML) systems after the training stage. In this paper, we take the first steps towards evaluating a more holistic approach by testing for fairness both before and after model training. We evaluate the effectiveness of the proposed approach and po…
▽ More
Although several fairness definitions and bias mitigation techniques exist in the literature, all existing solutions evaluate fairness of Machine Learning (ML) systems after the training stage. In this paper, we take the first steps towards evaluating a more holistic approach by testing for fairness both before and after model training. We evaluate the effectiveness of the proposed approach and position it within the ML development lifecycle, using an empirical analysis of the relationship between model dependent and independent fairness metrics. The study uses 2 fairness metrics, 4 ML algorithms, 5 real-world datasets and 1600 fairness evaluation cycles. We find a linear relationship between data and model fairness metrics when the distribution and the size of the training data changes. Our results indicate that testing for fairness prior to training can be a ``cheap'' and effective means of catching a biased data collection process early; detecting data drifts in production systems and minimising execution of full training cycles thus reducing development time and costs.
△ Less
Submitted 15 January, 2024;
originally announced January 2024.
-
Towards Automatic Translation of Machine Learning Visual Insights to Analytical Assertions
Authors:
Arumoy Shome,
Luis Cruz,
Arie van Deursen
Abstract:
We present our vision for developing an automated tool capable of translating visual properties observed in Machine Learning (ML) visualisations into Python assertions. The tool aims to streamline the process of manually verifying these visualisations in the ML development cycle, which is critical as real-world data and assumptions often change post-deployment. In a prior study, we mined $54,070$…
▽ More
We present our vision for developing an automated tool capable of translating visual properties observed in Machine Learning (ML) visualisations into Python assertions. The tool aims to streamline the process of manually verifying these visualisations in the ML development cycle, which is critical as real-world data and assumptions often change post-deployment. In a prior study, we mined $54,070$ Jupyter notebooks from Github and created a catalogue of $269$ semantically related visualisation-assertion (VA) pairs. Building on this catalogue, we propose to build a taxonomy that organises the VA pairs based on ML verification tasks. The input feature space comprises of a rich source of information mined from the Jupyter notebooks -- visualisations, Python source code, and associated markdown text. The effectiveness of various AI models, including traditional NLP4Code models and modern Large Language Models, will be compared using established machine translation metrics and evaluated through a qualitative study with human participants. The paper also plans to address the challenge of extending the existing VA pair dataset with additional pairs from Kaggle and to compare the tool's effectiveness with commercial generative AI models like ChatGPT. This research not only contributes to the field of ML system validation but also explores novel ways to leverage AI for automating and enhancing software engineering practices in ML.
△ Less
Submitted 15 January, 2024;
originally announced January 2024.
-
Energy Patterns for Web: An Exploratory Study
Authors:
Pooja Rani,
Jonas Zellweger,
Veronika Kousadianos,
Luis Cruz,
Timo Kehrer,
Alberto Bacchelli
Abstract:
As the energy footprint generated by software is increasing at an alarming rate, understanding how to develop energy-efficient applications has become a necessity. Previous work has introduced catalogs of coding practices, also known as energy patterns. These patterns are yet limited to Mobile or third-party libraries. In this study, we focus on the Web domain--a main source of energy consumption.…
▽ More
As the energy footprint generated by software is increasing at an alarming rate, understanding how to develop energy-efficient applications has become a necessity. Previous work has introduced catalogs of coding practices, also known as energy patterns. These patterns are yet limited to Mobile or third-party libraries. In this study, we focus on the Web domain--a main source of energy consumption. First, we investigated whether and how Mobile energy patterns could be ported to this domain and found that 20 patterns could be ported. Then, we interviewed six expert web developers from different companies to challenge the ported patterns. Most developers expressed concerns for antipatterns, specifically with functional antipatterns, and were able to formulate guidelines to locate these patterns in the source code. Finally, to quantify the effect of Web energy patterns on energy consumption, we set up an automated pipeline to evaluate two ported patterns: 'Dynamic Retry Delay' (DRD) and 'Open Only When Necessary' (OOWN). With this, we found no evidence that the DRD pattern consumes less energy than its antipattern, while the opposite is true for OOWN. Data and Material: https://doi.org/10.5281/zenodo.8404487
△ Less
Submitted 12 January, 2024;
originally announced January 2024.
-
EnergiBridge: Empowering Software Sustainability through Cross-Platform Energy Measurement
Authors:
June Sallou,
Luís Cruz,
Thomas Durieux
Abstract:
In the continually evolving realm of software engineering, the need to address software energy consumption has gained increasing prominence. However, the absence of a platform-independent tool that facilitates straightforward energy measurements remains a notable gap. This paper presents EnergiBridge, a cross-platform measurement utility that provides support for Linux, Windows, and MacOS, as well…
▽ More
In the continually evolving realm of software engineering, the need to address software energy consumption has gained increasing prominence. However, the absence of a platform-independent tool that facilitates straightforward energy measurements remains a notable gap. This paper presents EnergiBridge, a cross-platform measurement utility that provides support for Linux, Windows, and MacOS, as well as Intel, AMD, and Apple ARM CPU architectures. In essence, EnergiBridge serves as a bridge between energy-conscious software engineering and the diverse software environments in which it operates. It encourages a broader community to make informed decisions, minimize energy consumption, and reduce the environmental impact of software systems.
By simplifying software energy measurements, EnergiBridge offers a valuable resource to make green software development more lightweight, education more inclusive, and research more reproducible. Through the evaluation, we highlight EnergiBridge's ability to gather energy data across diverse platforms and hardware configurations.
EnergiBridge is publicly available on GitHub: https://github.com/tdurieux/EnergiBridge, and a demonstration video can be viewed at: https://youtu.be/-gPJurKFraE.
△ Less
Submitted 21 December, 2023;
originally announced December 2023.
-
INRISCO: INcident monitoRing In Smart COmmunities
Authors:
Mónica Aguilar Igartua,
Florina Almenares,
Rebeca P. Díaz Redondo,
Manuela I. Martín,
Jordi Forné,
Celeste Campo,
Ana Fernández,
Luis J. de la Cruz,
Carlos García-Rubio,
Andrés Marínn,
Ahmad Mohamad Mezher,
Daniel Díaz,
Héctor Cerezo,
David Rebollo-Monedero,
Patricia Arias,
Francisco Rico
Abstract:
Major advances in information and communication technologies (ICTs) make citizens to be considered as sensors in motion. Carrying their mobile devices, moving in their connected vehicles or actively participating in social networks, citizens provide a wealth of information that, after properly processing, can support numerous applications for the benefit of the community. In the context of smart c…
▽ More
Major advances in information and communication technologies (ICTs) make citizens to be considered as sensors in motion. Carrying their mobile devices, moving in their connected vehicles or actively participating in social networks, citizens provide a wealth of information that, after properly processing, can support numerous applications for the benefit of the community. In the context of smart communities, the INRISCO proposal intends for (i) the early detection of abnormal situations in cities (i.e., incidents), (ii) the analysis of whether, according to their impact, those incidents are really adverse for the community; and (iii) the automatic actuation by dissemination of appropriate information to citizens and authorities. Thus, INRISCO will identify and report on incidents in traffic (jam, accident) or public infrastructure (e.g., works, street cut), the occurrence of specific events that affect other citizens life (e.g., demonstrations, concerts), or environmental problems (e.g., pollution, bad weather). It is of particular interest to this proposal the identification of incidents with a social and economic impact, which affects the quality of life of citizens.
△ Less
Submitted 12 December, 2023;
originally announced December 2023.
-
Is Your Anomaly Detector Ready for Change? Adapting AIOps Solutions to the Real World
Authors:
Lorena Poenaru-Olaru,
Natalia Karpova,
Luis Cruz,
Jan Rellermeyer,
Arie van Deursen
Abstract:
Anomaly detection techniques are essential in automating the monitoring of IT systems and operations. These techniques imply that machine learning algorithms are trained on operational data corresponding to a specific period of time and that they are continuously evaluated on newly emerging data. Operational data is constantly changing over time, which affects the performance of deployed anomaly d…
▽ More
Anomaly detection techniques are essential in automating the monitoring of IT systems and operations. These techniques imply that machine learning algorithms are trained on operational data corresponding to a specific period of time and that they are continuously evaluated on newly emerging data. Operational data is constantly changing over time, which affects the performance of deployed anomaly detection models. Therefore, continuous model maintenance is required to preserve the performance of anomaly detectors over time. In this work, we analyze two different anomaly detection model maintenance techniques in terms of the model update frequency, namely blind model retraining and informed model retraining. We further investigate the effects of updating the model by retraining it on all the available data (full-history approach) and only the newest data (sliding window approach). Moreover, we investigate whether a data change monitoring tool is capable of determining when the anomaly detection model needs to be updated through retraining.
△ Less
Submitted 11 April, 2024; v1 submitted 17 November, 2023;
originally announced November 2023.
-
The Two Faces of AI in Green Mobile Computing: A Literature Review
Authors:
Wander Siemers,
June Sallou,
Luís Cruz
Abstract:
Artificial intelligence is bringing ever new functionalities to the realm of mobile devices that are now considered essential (e.g., camera and voice assistants, recommender systems). Yet, operating artificial intelligence takes up a substantial amount of energy. However, artificial intelligence is also being used to enable more energy-efficient solutions for mobile systems. Hence, artificial inte…
▽ More
Artificial intelligence is bringing ever new functionalities to the realm of mobile devices that are now considered essential (e.g., camera and voice assistants, recommender systems). Yet, operating artificial intelligence takes up a substantial amount of energy. However, artificial intelligence is also being used to enable more energy-efficient solutions for mobile systems. Hence, artificial intelligence has two faces in that regard, it is both a key enabler of desired (efficient) mobile functionalities and a major power draw on these devices, playing a part in both the solution and the problem. In this paper, we present a review of the literature of the past decade on the usage of artificial intelligence within the realm of green mobile computing. From the analysis of 34 papers, we highlight the emerging patterns and map the field into 13 main topics that are summarized in details.
Our results showcase that the field is slowly increasing in the past years, more specifically, since 2019. Regarding the double impact AI has on the mobile energy consumption, the energy consumption of AI-based mobile systems is under-studied in comparison to the usage of AI for energy-efficient mobile computing, and we argue for more exploratory studies in that direction. We observe that although most studies are framed as solution papers (94%), the large majority do not make those solutions publicly available to the community. Moreover, we also show that most contributions are purely academic (28 out of 34 papers) and that we need to promote the involvement of the mobile software industry in this field.
△ Less
Submitted 21 July, 2023;
originally announced August 2023.
-
Batching for Green AI -- An Exploratory Study on Inference
Authors:
Tim Yarally,
Luís Cruz,
Daniel Feitosa,
June Sallou,
Arie van Deursen
Abstract:
The batch size is an essential parameter to tune during the development of new neural networks. Amongst other quality indicators, it has a large degree of influence on the model's accuracy, generalisability, training times and parallelisability. This fact is generally known and commonly studied. However, during the application phase of a deep learning model, when the model is utilised by an end-us…
▽ More
The batch size is an essential parameter to tune during the development of new neural networks. Amongst other quality indicators, it has a large degree of influence on the model's accuracy, generalisability, training times and parallelisability. This fact is generally known and commonly studied. However, during the application phase of a deep learning model, when the model is utilised by an end-user for inference, we find that there is a disregard for the potential benefits of introducing a batch size. In this study, we examine the effect of input batching on the energy consumption and response times of five fully-trained neural networks for computer vision that were considered state-of-the-art at the time of their publication. The results suggest that batching has a significant effect on both of these metrics. Furthermore, we present a timeline of the energy efficiency and accuracy of neural networks over the past decade. We find that in general, energy consumption rises at a much steeper pace than accuracy and question the necessity of this evolution. Additionally, we highlight one particular network, ShuffleNetV2(2018), that achieved a competitive performance for its time while maintaining a much lower energy consumption. Nevertheless, we highlight that the results are model dependent.
△ Less
Submitted 21 July, 2023;
originally announced July 2023.
-
Towards Automated Semantic Segmentation in Mammography Images
Authors:
Cesar A. Sierra-Franco,
Jan Hurtado,
Victor de A. Thomaz,
Leonardo C. da Cruz,
Santiago V. Silva,
Alberto B. Raposo
Abstract:
Mammography images are widely used to detect non-palpable breast lesions or nodules, preventing cancer and providing the opportunity to plan interventions when necessary. The identification of some structures of interest is essential to make a diagnosis and evaluate image adequacy. Thus, computer-aided detection systems can be helpful in assisting medical interpretation by automatically segmenting…
▽ More
Mammography images are widely used to detect non-palpable breast lesions or nodules, preventing cancer and providing the opportunity to plan interventions when necessary. The identification of some structures of interest is essential to make a diagnosis and evaluate image adequacy. Thus, computer-aided detection systems can be helpful in assisting medical interpretation by automatically segmenting these landmark structures. In this paper, we propose a deep learning-based framework for the segmentation of the nipple, the pectoral muscle, the fibroglandular tissue, and the fatty tissue on standard-view mammography images. We introduce a large private segmentation dataset and extensive experiments considering different deep-learning model architectures. Our experiments demonstrate accurate segmentation performance on variate and challenging cases, showing that this framework can be integrated into clinical practice.
△ Less
Submitted 18 July, 2023;
originally announced July 2023.
-
How to use model architecture and training environment to estimate the energy consumption of DL training
Authors:
Santiago del Rey,
Silverio Martínez-Fernández,
Luís Cruz,
Xavier Franch
Abstract:
To raise awareness of the huge impact Deep Learning (DL) has on the environment, several works have tried to estimate the energy consumption and carbon footprint of DL-based systems across their life cycle. However, the estimations for energy consumption in the training stage usually rely on assumptions that have not been thoroughly tested. This study aims to move past these assumptions by leverag…
▽ More
To raise awareness of the huge impact Deep Learning (DL) has on the environment, several works have tried to estimate the energy consumption and carbon footprint of DL-based systems across their life cycle. However, the estimations for energy consumption in the training stage usually rely on assumptions that have not been thoroughly tested. This study aims to move past these assumptions by leveraging the relationship between energy consumption and two relevant design decisions in DL training; model architecture, and training environment. To investigate these relationships, we collect multiple metrics related to energy efficiency and model correctness during the models' training. Then, we outline the trade-offs between the measured energy consumption and the models' correctness regarding model architecture, and their relationship with the training environment. Finally, we study the training's power consumption behavior and propose four new energy estimation methods. Our results show that selecting the proper model architecture and training environment can reduce energy consumption dramatically (up to 80.72%) at the cost of negligible decreases in correctness. Also, we find evidence that GPUs should scale with the models' computational complexity for better energy efficiency. Furthermore, we prove that current energy estimation methods are unreliable and propose alternatives 2x more precise.
△ Less
Submitted 21 November, 2024; v1 submitted 7 July, 2023;
originally announced July 2023.
-
Green Runner: A tool for efficient model selection from model repositories
Authors:
Jai Kannan,
Scott Barnett,
Anj Simmons,
Taylan Selvi,
Luis Cruz
Abstract:
Deep learning models have become essential in software engineering, enabling intelligent features like image captioning and document generation. However, their popularity raises concerns about environmental impact and inefficient model selection. This paper introduces GreenRunnerGPT, a novel tool for efficiently selecting deep learning models based on specific use cases. It employs a large languag…
▽ More
Deep learning models have become essential in software engineering, enabling intelligent features like image captioning and document generation. However, their popularity raises concerns about environmental impact and inefficient model selection. This paper introduces GreenRunnerGPT, a novel tool for efficiently selecting deep learning models based on specific use cases. It employs a large language model to suggest weights for quality indicators, optimizing resource utilization. The tool utilizes a multi-armed bandit framework to evaluate models against target datasets, considering tradeoffs. We demonstrate that GreenRunnerGPT is able to identify a model suited to a target use case without wasteful computations that would occur under a brute-force approach to model selection.
△ Less
Submitted 26 May, 2023;
originally announced May 2023.
-
Towards Understanding Machine Learning Testing in Practise
Authors:
Arumoy Shome,
Luis Cruz,
Arie van Deursen
Abstract:
Visualisations drive all aspects of the Machine Learning (ML) Development Cycle but remain a vastly untapped resource by the research community. ML testing is a highly interactive and cognitive process which demands a human-in-the-loop approach. Besides writing tests for the code base, bulk of the evaluation requires application of domain expertise to generate and interpret visualisations. To gain…
▽ More
Visualisations drive all aspects of the Machine Learning (ML) Development Cycle but remain a vastly untapped resource by the research community. ML testing is a highly interactive and cognitive process which demands a human-in-the-loop approach. Besides writing tests for the code base, bulk of the evaluation requires application of domain expertise to generate and interpret visualisations. To gain a deeper insight into the process of testing ML systems, we propose to study visualisations of ML pipelines by mining Jupyter notebooks. We propose a two prong approach in conducting the analysis. First, gather general insights and trends using a qualitative study of a smaller sample of notebooks. And then use the knowledge gained from the qualitative study to design an empirical study using a larger sample of notebooks. Computational notebooks provide a rich source of information in three formats -- text, code and images. We hope to utilise existing work in image analysis and Natural Language Processing for text and code, to analyse the information present in notebooks. We hope to gain a new perspective into program comprehension and debugging in the context of ML testing.
△ Less
Submitted 22 May, 2023; v1 submitted 8 May, 2023;
originally announced May 2023.
-
Uncovering Energy-Efficient Practices in Deep Learning Training: Preliminary Steps Towards Green AI
Authors:
Tim Yarally,
Luís Cruz,
Daniel Feitosa,
June Sallou,
Arie van Deursen
Abstract:
Modern AI practices all strive towards the same goal: better results. In the context of deep learning, the term "results" often refers to the achieved accuracy on a competitive problem set. In this paper, we adopt an idea from the emerging field of Green AI to consider energy consumption as a metric of equal importance to accuracy and to reduce any irrelevant tasks or energy usage. We examine the…
▽ More
Modern AI practices all strive towards the same goal: better results. In the context of deep learning, the term "results" often refers to the achieved accuracy on a competitive problem set. In this paper, we adopt an idea from the emerging field of Green AI to consider energy consumption as a metric of equal importance to accuracy and to reduce any irrelevant tasks or energy usage. We examine the training stage of the deep learning pipeline from a sustainability perspective, through the study of hyperparameter tuning strategies and the model complexity, two factors vastly impacting the overall pipeline's energy consumption. First, we investigate the effectiveness of grid search, random search and Bayesian optimisation during hyperparameter tuning, and we find that Bayesian optimisation significantly dominates the other strategies. Furthermore, we analyse the architecture of convolutional neural networks with the energy consumption of three prominent layer types: convolutional, linear and ReLU layers. The results show that convolutional layers are the most computationally expensive by a strong margin. Additionally, we observe diminishing returns in accuracy for more energy-hungry models. The overall energy consumption of training can be halved by reducing the network complexity. In conclusion, we highlight innovative and promising energy-efficient practices for training deep learning models. To expand the application of Green AI, we advocate for a shift in the design of deep learning models, by considering the trade-off between energy efficiency and accuracy.
△ Less
Submitted 24 March, 2023;
originally announced March 2023.
-
A Systematic Review of Green AI
Authors:
Roberto Verdecchia,
June Sallou,
Luís Cruz
Abstract:
With the ever-growing adoption of AI-based systems, the carbon footprint of AI is no longer negligible. AI researchers and practitioners are therefore urged to hold themselves accountable for the carbon emissions of the AI models they design and use. This led in recent years to the appearance of researches tackling AI environmental sustainability, a field referred to as Green AI. Despite the rapid…
▽ More
With the ever-growing adoption of AI-based systems, the carbon footprint of AI is no longer negligible. AI researchers and practitioners are therefore urged to hold themselves accountable for the carbon emissions of the AI models they design and use. This led in recent years to the appearance of researches tackling AI environmental sustainability, a field referred to as Green AI. Despite the rapid growth of interest in the topic, a comprehensive overview of Green AI research is to date still missing. To address this gap, in this paper, we present a systematic review of the Green AI literature. From the analysis of 98 primary studies, different patterns emerge. The topic experienced a considerable growth from 2020 onward. Most studies consider monitoring AI model footprint, tuning hyperparameters to improve model sustainability, or benchmarking models. A mix of position papers, observational studies, and solution papers are present. Most papers focus on the training phase, are algorithm-agnostic or study neural networks, and use image data. Laboratory experiments are the most common research strategy. Reported Green AI energy savings go up to 115%, with savings over 50% being rather common. Industrial parties are involved in Green AI studies, albeit most target academic readers. Green AI tool provisioning is scarce. As a conclusion, the Green AI research field results to have reached a considerable level of maturity. Therefore, from this review emerges that the time is suitable to adopt other Green AI research strategies, and port the numerous promising academic results to industrial practice.
△ Less
Submitted 5 May, 2023; v1 submitted 26 January, 2023;
originally announced January 2023.
-
Multi-Layer Personalized Federated Learning for Mitigating Biases in Student Predictive Analytics
Authors:
Yun-Wei Chu,
Seyyedali Hosseinalipour,
Elizabeth Tenorio,
Laura Cruz,
Kerrie Douglas,
Andrew Lan,
Christopher Brinton
Abstract:
Conventional methods for student modeling, which involve predicting grades based on measured activities, struggle to provide accurate results for minority/underrepresented student groups due to data availability biases. In this paper, we propose a Multi-Layer Personalized Federated Learning (MLPFL) methodology that optimizes inference accuracy over different layers of student grouping criteria, su…
▽ More
Conventional methods for student modeling, which involve predicting grades based on measured activities, struggle to provide accurate results for minority/underrepresented student groups due to data availability biases. In this paper, we propose a Multi-Layer Personalized Federated Learning (MLPFL) methodology that optimizes inference accuracy over different layers of student grouping criteria, such as by course and by demographic subgroups within each course. In our approach, personalized models for individual student subgroups are derived from a global model, which is trained in a distributed fashion via meta-gradient updates that account for subgroup heterogeneity while preserving modeling commonalities that exist across the full dataset. The evaluation of the proposed methodology considers case studies of two popular downstream student modeling tasks, knowledge tracing and outcome prediction, which leverage multiple modalities of student behavior (e.g., visits to lecture videos and participation on forums) in model training. Experiments on three real-world online course datasets show significant improvements achieved by our approach over existing student modeling benchmarks, as evidenced by an increased average prediction quality and decreased variance across different student subgroups. Visual analysis of the resulting students' knowledge state embeddings confirm that our personalization methodology extracts activity patterns clustered into different student subgroups, consistent with the performance enhancements we obtain over the baselines.
△ Less
Submitted 28 May, 2024; v1 submitted 5 December, 2022;
originally announced December 2022.
-
Are Concept Drift Detectors Reliable Alarming Systems? -- A Comparative Study
Authors:
Lorena Poenaru-Olaru,
Luis Cruz,
Arie van Deursen,
Jan S. Rellermeyer
Abstract:
As machine learning models increasingly replace traditional business logic in the production system, their lifecycle management is becoming a significant concern. Once deployed into production, the machine learning models are constantly evaluated on new streaming data. Given the continuous data flow, shifting data, also known as concept drift, is ubiquitous in such settings. Concept drift usually…
▽ More
As machine learning models increasingly replace traditional business logic in the production system, their lifecycle management is becoming a significant concern. Once deployed into production, the machine learning models are constantly evaluated on new streaming data. Given the continuous data flow, shifting data, also known as concept drift, is ubiquitous in such settings. Concept drift usually impacts the performance of machine learning models, thus, identifying the moment when concept drift occurs is required. Concept drift is identified through concept drift detectors. In this work, we assess the reliability of concept drift detectors to identify drift in time by exploring how late are they reporting drifts and how many false alarms are they signaling. We compare the performance of the most popular drift detectors belonging to two different concept drift detector groups, error rate-based detectors and data distribution-based detectors. We assess their performance on both synthetic and real-world data. In the case of synthetic data, we investigate the performance of detectors to identify two types of concept drift, abrupt and gradual. Our findings aim to help practitioners understand which drift detector should be employed in different situations and, to achieve this, we share a list of the most important observations made throughout this study, which can serve as guidelines for practical usage. Furthermore, based on our empirical results, we analyze the suitability of each concept drift detection group to be used as alarming system.
△ Less
Submitted 23 November, 2022;
originally announced November 2022.
-
Oflib: Facilitating Operations with and on Optical Flow Fields in Python
Authors:
Claudio Ravasio,
Lyndon Da Cruz,
Christos Bergeles
Abstract:
We present a robust theoretical framework for the characterisation and manipulation of optical flow, i.e 2D vector fields, in the context of their use in motion estimation algorithms and beyond. The definition of two frames of reference guides the mathematical derivation of flow field application, inversion, evaluation, and composition operations. This structured approach is then used as the found…
▽ More
We present a robust theoretical framework for the characterisation and manipulation of optical flow, i.e 2D vector fields, in the context of their use in motion estimation algorithms and beyond. The definition of two frames of reference guides the mathematical derivation of flow field application, inversion, evaluation, and composition operations. This structured approach is then used as the foundation for an implementation in Python 3, with the fully differentiable PyTorch version oflibpytorch supporting back-propagation as required for deep learning. We verify the flow composition method empirically and provide a working example for its application to optical flow ground truth in synthetic training data creation. All code is publicly available.
△ Less
Submitted 14 October, 2022; v1 submitted 11 October, 2022;
originally announced October 2022.
-
Mitigating Biases in Student Performance Prediction via Attention-Based Personalized Federated Learning
Authors:
Yun-Wei Chu,
Seyyedali Hosseinalipour,
Elizabeth Tenorio,
Laura Cruz,
Kerrie Douglas,
Andrew Lan,
Christopher Brinton
Abstract:
Traditional learning-based approaches to student modeling generalize poorly to underrepresented student groups due to biases in data availability. In this paper, we propose a methodology for predicting student performance from their online learning activities that optimizes inference accuracy over different demographic groups such as race and gender. Building upon recent foundations in federated l…
▽ More
Traditional learning-based approaches to student modeling generalize poorly to underrepresented student groups due to biases in data availability. In this paper, we propose a methodology for predicting student performance from their online learning activities that optimizes inference accuracy over different demographic groups such as race and gender. Building upon recent foundations in federated learning, in our approach, personalized models for individual student subgroups are derived from a global model aggregated across all student models via meta-gradient updates that account for subgroup heterogeneity. To learn better representations of student activity, we augment our approach with a self-supervised behavioral pretraining methodology that leverages multiple modalities of student behavior (e.g., visits to lecture videos and participation on forums), and include a neural network attention mechanism in the model aggregation stage. Through experiments on three real-world datasets from online courses, we demonstrate that our approach obtains substantial improvements over existing student modeling baselines in predicting student learning outcomes for all subgroups. Visual analysis of the resulting student embeddings confirm that our personalization methodology indeed identifies different activity patterns within different subgroups, consistent with its stronger inference ability compared with the baselines.
△ Less
Submitted 1 August, 2022;
originally announced August 2022.
-
MLSmellHound: A Context-Aware Code Analysis Tool
Authors:
Jai Kannan,
Scott Barnett,
Luís Cruz,
Anj Simmons,
Akash Agarwal
Abstract:
Meeting the rise of industry demand to incorporate machine learning (ML) components into software systems requires interdisciplinary teams contributing to a shared code base. To maintain consistency, reduce defects and ensure maintainability, developers use code analysis tools to aid them in identifying defects and maintaining standards. With the inclusion of machine learning, tools must account f…
▽ More
Meeting the rise of industry demand to incorporate machine learning (ML) components into software systems requires interdisciplinary teams contributing to a shared code base. To maintain consistency, reduce defects and ensure maintainability, developers use code analysis tools to aid them in identifying defects and maintaining standards. With the inclusion of machine learning, tools must account for the cultural differences within the teams which manifests as multiple programming languages, and conflicting definitions and objectives. Existing tools fail to identify these cultural differences and are geared towards software engineering which reduces their adoption in ML projects. In our approach we attempt to resolve this problem by exploring the use of context which includes i) purpose of the source code, ii) technical domain, iii) problem domain, iv) team norms, v) operational environment, and vi) development lifecycle stage to provide contextualised error reporting for code analysis. To demonstrate our approach, we adapt Pylint as an example and apply a set of contextual transformations to the linting results based on the domain of individual project files under analysis. This allows for contextualised and meaningful error reporting for the end-user.
△ Less
Submitted 8 May, 2022;
originally announced May 2022.
-
Data-Centric Green AI: An Exploratory Empirical Study
Authors:
Roberto Verdecchia,
Luís Cruz,
June Sallou,
Michelle Lin,
James Wickenden,
Estelle Hotellier
Abstract:
With the growing availability of large-scale datasets, and the popularization of affordable storage and computational capabilities, the energy consumed by AI is becoming a growing concern. To address this issue, in recent years, studies have focused on demonstrating how AI energy efficiency can be improved by tuning the model training strategy. Nevertheless, how modifications applied to datasets c…
▽ More
With the growing availability of large-scale datasets, and the popularization of affordable storage and computational capabilities, the energy consumed by AI is becoming a growing concern. To address this issue, in recent years, studies have focused on demonstrating how AI energy efficiency can be improved by tuning the model training strategy. Nevertheless, how modifications applied to datasets can impact the energy consumption of AI is still an open question. To fill this gap, in this exploratory study, we evaluate if data-centric approaches can be utilized to improve AI energy efficiency. To achieve our goal, we conduct an empirical experiment, executed by considering 6 different AI algorithms, a dataset comprising 5,574 data points, and two dataset modifications (number of data points and number of features). Our results show evidence that, by exclusively conducting modifications on datasets, energy consumption can be drastically reduced (up to 92.16%), often at the cost of a negligible or even absent accuracy decline. As additional introductory results, we demonstrate how, by exclusively changing the algorithm used, energy savings up to two orders of magnitude can be achieved. In conclusion, this exploratory investigation empirically demonstrates the importance of applying data-centric techniques to improve AI energy efficiency. Our results call for a research agenda that focuses on data-centric techniques, to further enable and democratize Green AI.
△ Less
Submitted 7 April, 2022; v1 submitted 6 April, 2022;
originally announced April 2022.
-
Code Smells for Machine Learning Applications
Authors:
Haiyin Zhang,
Luís Cruz,
Arie van Deursen
Abstract:
The popularity of machine learning has wildly expanded in recent years. Machine learning techniques have been heatedly studied in academia and applied in the industry to create business value. However, there is a lack of guidelines for code quality in machine learning applications. In particular, code smells have rarely been studied in this domain. Although machine learning code is usually integra…
▽ More
The popularity of machine learning has wildly expanded in recent years. Machine learning techniques have been heatedly studied in academia and applied in the industry to create business value. However, there is a lack of guidelines for code quality in machine learning applications. In particular, code smells have rarely been studied in this domain. Although machine learning code is usually integrated as a small part of an overarching system, it usually plays an important role in its core functionality. Hence ensuring code quality is quintessential to avoid issues in the long run. This paper proposes and identifies a list of 22 machine learning-specific code smells collected from various sources, including papers, grey literature, GitHub commits, and Stack Overflow posts. We pinpoint each smell with a description of its context, potential issues in the long run, and proposed solutions. In addition, we link them to their respective pipeline stage and the evidence from both academic and grey literature. The code smell catalog helps data scientists and developers produce and maintain high-quality machine learning application code.
△ Less
Submitted 30 March, 2022; v1 submitted 25 March, 2022;
originally announced March 2022.
-
Multi-scale and Cross-scale Contrastive Learning for Semantic Segmentation
Authors:
Theodoros Pissas,
Claudio S. Ravasio,
Lyndon Da Cruz,
Christos Bergeles
Abstract:
This work considers supervised contrastive learning for semantic segmentation. We apply contrastive learning to enhance the discriminative power of the multi-scale features extracted by semantic segmentation networks. Our key methodological insight is to leverage samples from the feature spaces emanating from multiple stages of a model's encoder itself requiring neither data augmentation nor onlin…
▽ More
This work considers supervised contrastive learning for semantic segmentation. We apply contrastive learning to enhance the discriminative power of the multi-scale features extracted by semantic segmentation networks. Our key methodological insight is to leverage samples from the feature spaces emanating from multiple stages of a model's encoder itself requiring neither data augmentation nor online memory banks to obtain a diverse set of samples. To allow for such an extension we introduce an efficient and effective sampling process, that enables applying contrastive losses over the encoder's features at multiple scales. Furthermore, by first mapping the encoder's multi-scale representations to a common feature space, we instantiate a novel form of supervised local-global constraint by introducing cross-scale contrastive learning linking high-resolution local features to low-resolution global features. Combined, our multi-scale and cross-scale contrastive losses boost performance of various models (DeepLabV3, HRNet, OCRNet, UPerNet) with both CNN and Transformer backbones, when evaluated on 4 diverse datasets from natural (Cityscapes, PascalContext, ADE20K) but also surgical (CaDIS) domains. Our code is available at https://github.com/RViMLab/MS_CS_ContrSeg. datasets from natural (Cityscapes, PascalContext, ADE20K) but also surgical (CaDIS) domains.
△ Less
Submitted 19 July, 2022; v1 submitted 24 March, 2022;
originally announced March 2022.
-
Data Smells in Public Datasets
Authors:
Arumoy Shome,
Luis Cruz,
Arie van Deursen
Abstract:
The adoption of Artificial Intelligence (AI) in high-stakes domains such as healthcare, wildlife preservation, autonomous driving and criminal justice system calls for a data-centric approach to AI. Data scientists spend the majority of their time studying and wrangling the data, yet tools to aid them with data analysis are lacking. This study identifies the recurrent data quality issues in public…
▽ More
The adoption of Artificial Intelligence (AI) in high-stakes domains such as healthcare, wildlife preservation, autonomous driving and criminal justice system calls for a data-centric approach to AI. Data scientists spend the majority of their time studying and wrangling the data, yet tools to aid them with data analysis are lacking. This study identifies the recurrent data quality issues in public datasets. Analogous to code smells, we introduce a novel catalogue of data smells that can be used to indicate early signs of problems or technical debt in machine learning systems. To understand the prevalence of data quality issues in datasets, we analyse 25 public datasets and identify 14 data smells.
△ Less
Submitted 25 March, 2022; v1 submitted 15 March, 2022;
originally announced March 2022.
-
"Project smells" -- Experiences in Analysing the Software Quality of ML Projects with mllint
Authors:
Bart van Oort,
Luís Cruz,
Babak Loni,
Arie van Deursen
Abstract:
Machine Learning (ML) projects incur novel challenges in their development and productionisation over traditional software applications, though established principles and best practices in ensuring the project's software quality still apply. While using static analysis to catch code smells has been shown to improve software quality attributes, it is only a small piece of the software quality puzzl…
▽ More
Machine Learning (ML) projects incur novel challenges in their development and productionisation over traditional software applications, though established principles and best practices in ensuring the project's software quality still apply. While using static analysis to catch code smells has been shown to improve software quality attributes, it is only a small piece of the software quality puzzle, especially in the case of ML projects given their additional challenges and lower degree of Software Engineering (SE) experience in the data scientists that develop them. We introduce the novel concept of project smells which consider deficits in project management as a more holistic perspective on software quality in ML projects. An open-source static analysis tool mllint was also implemented to help detect and mitigate these. Our research evaluates this novel concept of project smells in the industrial context of ING, a global bank and large software- and data-intensive organisation. We also investigate the perceived importance of these project smells for proof-of-concept versus production-ready ML projects, as well as the perceived obstructions and benefits to using static analysis tools such as mllint. Our findings indicate a need for context-aware static analysis tools, that fit the needs of the project at its current stage of development, while requiring minimal configuration effort from the user.
△ Less
Submitted 20 January, 2022;
originally announced January 2022.
-
Click-Based Student Performance Prediction: A Clustering Guided Meta-Learning Approach
Authors:
Yun-Wei Chu,
Elizabeth Tenorio,
Laura Cruz,
Kerrie Douglas,
Andrew S. Lan,
Christopher G. Brinton
Abstract:
We study the problem of predicting student knowledge acquisition in online courses from clickstream behavior. Motivated by the proliferation of eLearning lecture delivery, we specifically focus on student in-video activity in lectures videos, which consist of content and in-video quizzes. Our methodology for predicting in-video quiz performance is based on three key ideas we develop. First, we mod…
▽ More
We study the problem of predicting student knowledge acquisition in online courses from clickstream behavior. Motivated by the proliferation of eLearning lecture delivery, we specifically focus on student in-video activity in lectures videos, which consist of content and in-video quizzes. Our methodology for predicting in-video quiz performance is based on three key ideas we develop. First, we model students' clicking behavior via time-series learning architectures operating on raw event data, rather than defining hand-crafted features as in existing approaches that may lose important information embedded within the click sequences. Second, we develop a self-supervised clickstream pre-training to learn informative representations of clickstream events that can initialize the prediction model effectively. Third, we propose a clustering guided meta-learning-based training that optimizes the prediction model to exploit clusters of frequent patterns in student clickstream sequences. Through experiments on three real-world datasets, we demonstrate that our method obtains substantial improvements over two baseline models in predicting students' in-video quiz performance. Further, we validate the importance of the pre-training and meta-learning components of our framework through ablation studies. Finally, we show how our methodology reveals insights on video-watching behavior associated with knowledge acquisition for useful learning analytics.
△ Less
Submitted 15 November, 2021; v1 submitted 28 October, 2021;
originally announced November 2021.
-
2020 CATARACTS Semantic Segmentation Challenge
Authors:
Imanol Luengo,
Maria Grammatikopoulou,
Rahim Mohammadi,
Chris Walsh,
Chinedu Innocent Nwoye,
Deepak Alapatt,
Nicolas Padoy,
Zhen-Liang Ni,
Chen-Chen Fan,
Gui-Bin Bian,
Zeng-Guang Hou,
Heonjin Ha,
Jiacheng Wang,
Haojie Wang,
Dong Guo,
Lu Wang,
Guotai Wang,
Mobarakol Islam,
Bharat Giddwani,
Ren Hongliang,
Theodoros Pissas,
Claudio Ravasio,
Martin Huber,
Jeremy Birch,
Joan M. Nunez Do Rio
, et al. (15 additional authors not shown)
Abstract:
Surgical scene segmentation is essential for anatomy and instrument localization which can be further used to assess tissue-instrument interactions during a surgical procedure. In 2017, the Challenge on Automatic Tool Annotation for cataRACT Surgery (CATARACTS) released 50 cataract surgery videos accompanied by instrument usage annotations. These annotations included frame-level instrument presenc…
▽ More
Surgical scene segmentation is essential for anatomy and instrument localization which can be further used to assess tissue-instrument interactions during a surgical procedure. In 2017, the Challenge on Automatic Tool Annotation for cataRACT Surgery (CATARACTS) released 50 cataract surgery videos accompanied by instrument usage annotations. These annotations included frame-level instrument presence information. In 2020, we released pixel-wise semantic annotations for anatomy and instruments for 4670 images sampled from 25 videos of the CATARACTS training set. The 2020 CATARACTS Semantic Segmentation Challenge, which was a sub-challenge of the 2020 MICCAI Endoscopic Vision (EndoVis) Challenge, presented three sub-tasks to assess participating solutions on anatomical structure and instrument segmentation. Their performance was assessed on a hidden test set of 531 images from 10 videos of the CATARACTS test set.
△ Less
Submitted 24 February, 2022; v1 submitted 21 October, 2021;
originally announced October 2021.
-
Effective semantic segmentation in Cataract Surgery: What matters most?
Authors:
Theodoros Pissas,
Claudio Ravasio,
Lyndon Da Cruz,
Christos Bergeles
Abstract:
Our work proposes neural network design choices that set the state-of-the-art on a challenging public benchmark on cataract surgery, CaDIS. Our methodology achieves strong performance across three semantic segmentation tasks with increasingly granular surgical tool class sets by effectively handling class imbalance, an inherent challenge in any surgical video. We consider and evaluate two conceptu…
▽ More
Our work proposes neural network design choices that set the state-of-the-art on a challenging public benchmark on cataract surgery, CaDIS. Our methodology achieves strong performance across three semantic segmentation tasks with increasingly granular surgical tool class sets by effectively handling class imbalance, an inherent challenge in any surgical video. We consider and evaluate two conceptually simple data oversampling methods as well as different loss functions. We show significant performance gains across network architectures and tasks especially on the rarest tool classes, thereby presenting an approach for achieving high performance when imbalanced granular datasets are considered. Our code and trained models are available at https://github.com/RViMLab/MICCAI2021_Cataract_semantic_segmentation and qualitative results on unseen surgical video can be found at https://youtu.be/twVIPUj1WZM.
△ Less
Submitted 13 August, 2021;
originally announced August 2021.
-
Green Software Lab: Towards an Engineering Discipline for Green Software
Authors:
Rui Abreu,
Marco Couto,
Luís Cruz,
Jácome Cunha,
João Paulo Fernandes,
Rui Pereira,
Alexandre Perez,
João Saraiva
Abstract:
This report describes the research goals and results of the Green Software Lab (GSL) research project. This was a project funded by Fundação para a Ciência e a Tecnologia (FCT) -- the Portuguese research foundation -- under reference POCI-01-0145-FEDER-016718, that ran from January 2016 till July 2020.
This report includes the complete document reporting the results achieved during the project e…
▽ More
This report describes the research goals and results of the Green Software Lab (GSL) research project. This was a project funded by Fundação para a Ciência e a Tecnologia (FCT) -- the Portuguese research foundation -- under reference POCI-01-0145-FEDER-016718, that ran from January 2016 till July 2020.
This report includes the complete document reporting the results achieved during the project execution, which was submitted to FCT for evaluation on July 2020. It describes the goals of the project, and the different research tasks presenting the deliverables of each of them. It also presents the management and result dissemination work performed during the project's execution. The document includes also a self assessment of the achieved results, and a complete list of scientific publications describing the contributions of the project. Finally, this document includes the FCT evaluation report.
△ Less
Submitted 6 August, 2021;
originally announced August 2021.
-
Fixing Vulnerabilities Potentially Hinders Maintainability
Authors:
Sofia Reis,
Rui Abreu,
Luis Cruz
Abstract:
Security is a requirement of utmost importance to produce high-quality software. However, there is still a considerable amount of vulnerabilities being discovered and fixed almost weekly. We hypothesize that developers affect the maintainability of their codebases when patching vulnerabilities. This paper evaluates the impact of patches to improve security on the maintainability of open-source sof…
▽ More
Security is a requirement of utmost importance to produce high-quality software. However, there is still a considerable amount of vulnerabilities being discovered and fixed almost weekly. We hypothesize that developers affect the maintainability of their codebases when patching vulnerabilities. This paper evaluates the impact of patches to improve security on the maintainability of open-source software. Maintainability is measured based on the Better Code Hub's model of 10 guidelines on a dataset, including 1300 security-related commits. Results show evidence of a trade-off between security and maintainability for 41.90% of the cases, i.e., developers may hinder software maintainability. Our analysis shows that 38.29% of patches increased software complexity and 37.87% of patches increased the percentage of LOCs per unit. The implications of our study are that changes to codebases while patching vulnerabilities need to be performed with extra care; tools for patch risk assessment should be integrated into the CI/CD pipeline; computer science curricula needs to be updated; and, more secure programming languages are necessary.
△ Less
Submitted 12 September, 2021; v1 submitted 6 June, 2021;
originally announced June 2021.
-
Systematic Mapping Study on the Machine Learning Lifecycle
Authors:
Yuanhao Xie,
Luís Cruz,
Petra Heck,
Jan S. Rellermeyer
Abstract:
The development of artificial intelligence (AI) has made various industries eager to explore the benefits of AI. There is an increasing amount of research surrounding AI, most of which is centred on the development of new AI algorithms and techniques. However, the advent of AI is bringing an increasing set of practical problems related to AI model lifecycle management that need to be investigated.…
▽ More
The development of artificial intelligence (AI) has made various industries eager to explore the benefits of AI. There is an increasing amount of research surrounding AI, most of which is centred on the development of new AI algorithms and techniques. However, the advent of AI is bringing an increasing set of practical problems related to AI model lifecycle management that need to be investigated. We address this gap by conducting a systematic mapping study on the lifecycle of AI model. Through quantitative research, we provide an overview of the field, identify research opportunities, and provide suggestions for future research. Our study yields 405 publications published from 2005 to 2020, mapped in 5 different main research topics, and 31 sub-topics. We observe that only a minority of publications focus on data management and model production problems, and that more studies should address the AI lifecycle from a holistic perspective.
△ Less
Submitted 11 March, 2021;
originally announced March 2021.
-
The Prevalence of Code Smells in Machine Learning projects
Authors:
Bart van Oort,
Luís Cruz,
Maurício Aniche,
Arie van Deursen
Abstract:
Artificial Intelligence (AI) and Machine Learning (ML) are pervasive in the current computer science landscape. Yet, there still exists a lack of software engineering experience and best practices in this field. One such best practice, static code analysis, can be used to find code smells, i.e., (potential) defects in the source code, refactoring opportunities, and violations of common coding stan…
▽ More
Artificial Intelligence (AI) and Machine Learning (ML) are pervasive in the current computer science landscape. Yet, there still exists a lack of software engineering experience and best practices in this field. One such best practice, static code analysis, can be used to find code smells, i.e., (potential) defects in the source code, refactoring opportunities, and violations of common coding standards. Our research set out to discover the most prevalent code smells in ML projects. We gathered a dataset of 74 open-source ML projects, installed their dependencies and ran Pylint on them. This resulted in a top 20 of all detected code smells, per category. Manual analysis of these smells mainly showed that code duplication is widespread and that the PEP8 convention for identifier naming style may not always be applicable to ML code due to its resemblance with mathematical notation. More interestingly, however, we found several major obstructions to the maintainability and reproducibility of ML projects, primarily related to the dependency management of Python projects. We also found that Pylint cannot reliably check for correct usage of imported dependencies, including prominent ML libraries such as PyTorch.
△ Less
Submitted 6 March, 2021;
originally announced March 2021.
-
Building a SDN Enterprise WLAN Based on Virtual APs
Authors:
Luis Sequeira,
Juan Luis de la Cruz,
Jose Ruiz-Mas,
Jose Saldana,
Julian Fernandez-Navajas,
Jose Almodovar
Abstract:
In this letter the development and testing of an open enterprise Wi-Fi solution based on virtual APs, managed by a central WLAN controller is presented. It allows seamless handovers between APs in different channels, maintaining the QoS of real-time services. The potential scalability issues associated to the beacon generation and channel assignment have been addressed. A battery of tests has been…
▽ More
In this letter the development and testing of an open enterprise Wi-Fi solution based on virtual APs, managed by a central WLAN controller is presented. It allows seamless handovers between APs in different channels, maintaining the QoS of real-time services. The potential scalability issues associated to the beacon generation and channel assignment have been addressed. A battery of tests has been run in a real environment, and the results are reported in terms of packet loss and delay.
△ Less
Submitted 26 October, 2020;
originally announced October 2020.
-
AI Lifecycle Models Need To Be Revised. An Exploratory Study in Fintech
Authors:
Mark Haakman,
Luís Cruz,
Hennie Huijgens,
Arie van Deursen
Abstract:
Tech-leading organizations are embracing the forthcoming artificial intelligence revolution. Intelligent systems are replacing and cooperating with traditional software components. Thus, the same development processes and standards in software engineering ought to be complied in artificial intelligence systems. This study aims to understand the processes by which artificial intelligence-based syst…
▽ More
Tech-leading organizations are embracing the forthcoming artificial intelligence revolution. Intelligent systems are replacing and cooperating with traditional software components. Thus, the same development processes and standards in software engineering ought to be complied in artificial intelligence systems. This study aims to understand the processes by which artificial intelligence-based systems are developed and how state-of-the-art lifecycle models fit the current needs of the industry. We conducted an exploratory case study at ING, a global bank with a strong European base. We interviewed 17 people with different roles and from different departments within the organization. We have found that the following stages have been overlooked by previous lifecycle models: data collection, feasibility study, documentation, model monitoring, and model risk assessment. Our work shows that the real challenges of applying Machine Learning go much beyond sophisticated learning algorithms - more focus is needed on the entire lifecycle. In particular, regardless of the existing development tools for Machine Learning, we observe that they are still not meeting the particularities of this field.
△ Less
Submitted 2 June, 2021; v1 submitted 3 October, 2020;
originally announced October 2020.
-
Multi-objective dynamic programming with limited precision
Authors:
L. Mandow,
J. L. Pérez de la Cruz,
N. Pozas
Abstract:
This paper addresses the problem of approximating the set of all solutions for Multi-objective Markov Decision Processes. We show that in the vast majority of interesting cases, the number of solutions is exponential or even infinite. In order to overcome this difficulty we propose to approximate the set of all solutions by means of a limited precision approach based on White's multi-objective val…
▽ More
This paper addresses the problem of approximating the set of all solutions for Multi-objective Markov Decision Processes. We show that in the vast majority of interesting cases, the number of solutions is exponential or even infinite. In order to overcome this difficulty we propose to approximate the set of all solutions by means of a limited precision approach based on White's multi-objective value-iteration dynamic programming algorithm. We prove that the number of calculated solutions is tractable and show experimentally that the solutions obtained are a good approximation of the true Pareto front.
△ Less
Submitted 17 September, 2020;
originally announced September 2020.
-
Can a Wi-Fi WLAN Support a First Person Shooter?
Authors:
Jose Saldana,
Juan Luis de la Cruz,
Luis Sequeira,
Julian Fernandez-Navajas,
Jose Ruiz-Mas
Abstract:
In corporate and commercial environments, the deployment of a set of coordinated Wi-Fi APs is becoming a common solution to provide Internet coverage to moving users. In these scenarios, real-time services as online games can also be present. This paper presents a set of experiments developed in a test scenario where an end device moves between different APs while generating game traffic. A WLAN s…
▽ More
In corporate and commercial environments, the deployment of a set of coordinated Wi-Fi APs is becoming a common solution to provide Internet coverage to moving users. In these scenarios, real-time services as online games can also be present. This paper presents a set of experiments developed in a test scenario where an end device moves between different APs while generating game traffic. A WLAN solution based on virtual APs is used, in order to make the handoffs transparent for Layer 3. The results show that it is possible to maintain an acceptable level of subjective quality during the handoff. At the same time, it is set clear that the fact of having a gamer in an AP could be taken into account by radio resource management algorithms, in order to provide a better quality.
△ Less
Submitted 30 July, 2020;
originally announced July 2020.
-
QUALINET White Paper on Definitions of Immersive Media Experience (IMEx)
Authors:
Andrew Perkis,
Christian Timmerer,
Sabina Baraković,
Jasmina Baraković Husić,
Søren Bech,
Sebastian Bosse,
Jean Botev,
Kjell Brunnström,
Luis Cruz,
Katrien De Moor,
Andrea de Polo Saibanti,
Wouter Durnez,
Sebastian Egger-Lampl,
Ulrich Engelke,
Tiago H. Falk,
Jesús Gutiérrez,
Asim Hameed,
Andrew Hines,
Tanja Kojic,
Dragan Kukolj,
Eirini Liotou,
Dragorad Milovanovic,
Sebastian Möller,
Niall Murray,
Babak Naderi
, et al. (19 additional authors not shown)
Abstract:
With the coming of age of virtual/augmented reality and interactive media, numerous definitions, frameworks, and models of immersion have emerged across different fields ranging from computer graphics to literary works. Immersion is oftentimes used interchangeably with presence as both concepts are closely related. However, there are noticeable interdisciplinary differences regarding definitions,…
▽ More
With the coming of age of virtual/augmented reality and interactive media, numerous definitions, frameworks, and models of immersion have emerged across different fields ranging from computer graphics to literary works. Immersion is oftentimes used interchangeably with presence as both concepts are closely related. However, there are noticeable interdisciplinary differences regarding definitions, scope, and constituents that are required to be addressed so that a coherent understanding of the concepts can be achieved. Such consensus is vital for paving the directionality of the future of immersive media experiences (IMEx) and all related matters. The aim of this white paper is to provide a survey of definitions of immersion and presence which leads to a definition of immersive media experience (IMEx). The Quality of Experience (QoE) for immersive media is described by establishing a relationship between the concepts of QoE and IMEx followed by application areas of immersive media experience. Influencing factors on immersive media experience are elaborated as well as the assessment of immersive media experience. Finally, standardization activities related to IMEx are highlighted and the white paper is concluded with an outlook related to future developments.
△ Less
Submitted 24 November, 2020; v1 submitted 10 June, 2020;
originally announced July 2020.
-
On the Energy Footprint of Mobile Testing Frameworks
Authors:
Luís Cruz,
Rui Abreu
Abstract:
High energy consumption is a challenging issue that an ever increasing number of mobile applications face today. However, energy consumption is being tested in an ad hoc way, despite being an important non-functional requirement of an application. Such limitation becomes particularly disconcerting during software testing: on the one hand, developers do not really know how to measure energy; on the…
▽ More
High energy consumption is a challenging issue that an ever increasing number of mobile applications face today. However, energy consumption is being tested in an ad hoc way, despite being an important non-functional requirement of an application. Such limitation becomes particularly disconcerting during software testing: on the one hand, developers do not really know how to measure energy; on the other hand, there is no knowledge as to what is the energy overhead imposed by the testing framework. In this paper, as we evaluate eight popular mobile UI automation frameworks, we have discovered that there are automation frameworks that increase energy consumption up to roughly 2200%. While limited in the interactions one can do, Espresso is the most energy efficient framework. However, depending on the needs of the tester, Appium, Monkeyrunner, or UIAutomator are good alternatives. In practice, results show that deciding which is the most suitable framework is vital. We provide a decision tree to help developers make an educated decision on which framework suits best their testing needs.
△ Less
Submitted 19 October, 2019;
originally announced October 2019.
-
Do Energy-oriented Changes Hinder Maintainability?
Authors:
Luis Cruz,
Rui Abreu,
John Grundy,
Li Li,
Xin Xia
Abstract:
Energy efficiency is a crucial quality requirement for mobile applications. However, improving energy efficiency is far from trivial as developers lack the knowledge and tools to aid in this activity. In this paper we study the impact of changes to improve energy efficiency on the maintainability of Android applications. Using a dataset containing 539 energy efficiency-oriented commits, we measure…
▽ More
Energy efficiency is a crucial quality requirement for mobile applications. However, improving energy efficiency is far from trivial as developers lack the knowledge and tools to aid in this activity. In this paper we study the impact of changes to improve energy efficiency on the maintainability of Android applications. Using a dataset containing 539 energy efficiency-oriented commits, we measure maintainability -- as computed by the Software Improvement Group's web-based source code analysis service Better Code Hub (BCH) -- before and after energy efficiency-related code changes. Results show that in general improving energy efficiency comes with a significant decrease in maintainability. This is particularly evident in code changes to accommodate the Power Save Mode and Wakelock Addition energy patterns. In addition, we perform manual analysis to assess how real examples of energy-oriented changes affect maintainability. Our results help mobile app developers to 1) avoid common maintainability issues when improving the energy efficiency of their apps; and 2) adopt development processes to build maintainable and energy-efficient code. We also support researchers by identifying challenges in mobile app development that still need to be addressed.
△ Less
Submitted 22 August, 2019;
originally announced August 2019.
-
An Analysis of 35+ Million Jobs of Travis CI
Authors:
Thomas Durieux,
Rui Abreu,
Martin Monperrus,
Tegawendé F. Bissyandé,
Luís Cruz
Abstract:
Travis CI handles automatically thousands of builds every day to, amongst other things, provide valuable feedback to thousands of open-source developers. In this paper, we investigate Travis CI to firstly understand who is using it, and when they start to use it. Secondly, we investigate how the developers use Travis CI and finally, how frequently the developers change the Travis CI configurations…
▽ More
Travis CI handles automatically thousands of builds every day to, amongst other things, provide valuable feedback to thousands of open-source developers. In this paper, we investigate Travis CI to firstly understand who is using it, and when they start to use it. Secondly, we investigate how the developers use Travis CI and finally, how frequently the developers change the Travis CI configurations. We observed during our analysis that the main users of Travis CI are corporate users such as Microsoft. And the programming languages used in Travis CI by those users do not follow the same popularity trend than on GitHub, for example, Python is the most popular language on Travis CI, but it is only the third one on GitHub. We also observe that Travis CI is set up on average seven days after the creation of the repository and the jobs are still mainly used (60%) to run tests. And finally, we observe that 7.34% of the commits modify the Travis CI configuration. We share the biggest benchmark of Travis CI jobs (to our knowledge): it contains 35,793,144 jobs from 272,917 different GitHub projects.
△ Less
Submitted 28 September, 2019; v1 submitted 20 April, 2019;
originally announced April 2019.
-
Jointly Pre-training with Supervised, Autoencoder, and Value Losses for Deep Reinforcement Learning
Authors:
Gabriel V. de la Cruz Jr.,
Yunshu Du,
Matthew E. Taylor
Abstract:
Deep Reinforcement Learning (DRL) algorithms are known to be data inefficient. One reason is that a DRL agent learns both the feature and the policy tabula rasa. Integrating prior knowledge into DRL algorithms is one way to improve learning efficiency since it helps to build helpful representations. In this work, we consider incorporating human knowledge to accelerate the asynchronous advantage ac…
▽ More
Deep Reinforcement Learning (DRL) algorithms are known to be data inefficient. One reason is that a DRL agent learns both the feature and the policy tabula rasa. Integrating prior knowledge into DRL algorithms is one way to improve learning efficiency since it helps to build helpful representations. In this work, we consider incorporating human knowledge to accelerate the asynchronous advantage actor-critic (A3C) algorithm by pre-training a small amount of non-expert human demonstrations. We leverage the supervised autoencoder framework and propose a novel pre-training strategy that jointly trains a weighted supervised classification loss, an unsupervised reconstruction loss, and an expected return loss. The resulting pre-trained model learns more useful features compared to independently training in supervised or unsupervised fashion. Our pre-training method drastically improved the learning performance of the A3C agent in Atari games of Pong and MsPacman, exceeding the performance of the state-of-the-art algorithms at a much smaller number of game interactions. Our method is light-weight and easy to implement in a single machine. For reproducibility, our code is available at github.com/gabrieledcjr/DeepRL/tree/A3C-ALA2019
△ Less
Submitted 3 April, 2019;
originally announced April 2019.