-
Synthetic ECG Generation for Data Augmentation and Transfer Learning in Arrhythmia Classification
Authors:
José Fernando Núñez,
Jamie Arjona,
Javier Béjar
Abstract:
Deep learning models need a sufficient amount of data to find the hidden patterns in it. The purpose of generative modeling is to learn the data distribution, which allows us to sample more data and augment the original dataset. Physiological data, and electrocardiogram (ECG) data in particular, is sensitive and expensive to collect, so we can exploit generative models to enlarge existing datasets and improve downstream tasks, in our case, classification of heart rhythm.
In this work, we explore the usefulness of synthetic data generated with different deep generative models, namely DiffWave, Time-Diffusion and Time-VQVAE, in order to obtain better classification results on two open-source multivariate ECG datasets. We also investigate the effects of transfer learning by fine-tuning a synthetically pre-trained model with progressively increasing proportions of real data. We conclude that although the synthetic samples resemble the real ones, the classification improvement from simply augmenting the real dataset is barely noticeable on individual datasets; when both datasets are merged, however, all classifier metrics improve when synthetic samples are used as augmented data. In the fine-tuning results, the Time-VQVAE generative model proved superior to the others, but not powerful enough to achieve results close to those of a classifier trained with real data only. In addition, methods and metrics for measuring the closeness between synthetic and real data were explored as a by-product of the main research questions of this study.
Submitted 27 November, 2024;
originally announced November 2024.
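The two data regimes described in this abstract (augmenting real data with synthetic samples, and fine-tuning with increasing proportions of real data) can be sketched as simple dataset-assembly helpers. This is only an illustration under assumptions: the function names, the `ratio` and `fractions` parameters, and the random sampling strategy are hypothetical and are not taken from the paper.

```python
import random

def augment(real, synthetic, ratio, seed=0):
    """Return the real samples plus about ratio * len(real) randomly
    chosen synthetic samples (hypothetical augmentation scheme)."""
    rng = random.Random(seed)
    k = min(len(synthetic), round(ratio * len(real)))
    return list(real) + rng.sample(list(synthetic), k)

def real_fractions(real, fractions, seed=0):
    """Yield (fraction, subset) pairs of real data for fine-tuning.

    The data is shuffled once, so each larger subset contains the
    smaller ones (an assumption made here for comparability)."""
    rng = random.Random(seed)
    shuffled = list(real)
    rng.shuffle(shuffled)
    for f in fractions:
        yield f, shuffled[: round(f * len(shuffled))]
```

A classifier pre-trained on synthetic samples would then be fine-tuned once per subset produced by `real_fractions`, and the resulting metrics compared against a model trained on real data only.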
-
Agile, User-Centered Design and Quality in Software Processes for Mobile Application Development Teaching
Authors:
Manuel Ignacio Castillo López,
Ana Libia Eslava Cervantes,
Gustavo de la Cruz Martínez,
Jorge Luis Ortega Arjona
Abstract:
Agile methods in undergraduate courses have been explored in an effort to close the gap between industry and professional profiles. We have structured an Android application development course based on a tailored, user-centered Agile process for the development of educational digital tools. This process is based on Scrum and Extreme Programming in combination with User Experience (UX) approaches. The course is executed in two phases: the first half of the semester presents theory on Agile and mobile application development, and the second half is run as a workshop in which students develop for an actual client. Introducing UX and user-centered design, and exploiting the close relationship with stakeholders expected from Agile processes, allows for the development of different quality features. Since 2019, two of the projects have been extended, and one project has been developed with the described process and course alumni. Students and stakeholders have found value in the generated products and process.
Submitted 25 September, 2023;
originally announced November 2023.
-
Healthy Twitter discussions? Time will tell
Authors:
Dmitry Gnatyshak,
Dario Garcia-Gasulla,
Sergio Alvarez-Napagao,
Jamie Arjona,
Tommaso Venturini
Abstract:
Studying misinformation and how to deal with unhealthy behaviours within online discussions has recently become an important field of research within social studies. With the rapid development of social media, and the increasing amount of available information and sources, rigorous manual analysis of such discourses has become unfeasible. Many approaches tackle the issue by studying the semantic and syntactic properties of discussions following a supervised approach, for example using natural language processing on a dataset labeled for abusive, fake or bot-generated content. Solutions based on the existence of a ground truth are limited to those domains in which a ground truth is available. However, within the context of misinformation, it may be difficult or even impossible to assign labels to instances. In this context, we consider the use of temporal dynamic patterns as an indicator of discussion health. Working in a domain for which ground truth was unavailable at the time (early COVID-19 pandemic discussions), we explore the characterization of discussions based on the volume and time of contributions. First we explore the types of discussions in an unsupervised manner, and then characterize these types using the concept of ephemerality, which we formalize. Finally, we discuss the potential use of our ephemerality definition for labeling online discourses based on how desirable, healthy and constructive they are.
Submitted 12 May, 2022; v1 submitted 21 March, 2022;
originally announced March 2022.
-
Benchmarking the computing resources at the Instituto de Astrofísica de Canarias
Authors:
Nicola Caon,
Antonio Dorta,
Juan Carlos Trelles Arjona
Abstract:
The aim of this study is the characterization of the computing resources used by researchers at the "Instituto de Astrofísica de Canarias" (IAC). Since there is a huge demand for computing time and we use tools such as HTCondor to implement High Throughput Computing (HTC) across all available PCs, it is essential for us to assess quantitatively, using objective parameters, the performance of our computing nodes. To achieve that, we have run a set of benchmark tests on a number of different desktop and laptop PC models among those used in our institution. In particular, we ran the "Polyhedron Fortran Benchmarks" suite using three different compilers: the GNU Fortran Compiler, the Intel Fortran Compiler and the PGI Fortran Compiler; execution times are then normalized to the reference values published by Polyhedron. The same tests were run multiple times on the same PC, and on 3 to 5 PCs of the same model (whenever possible), to check the repeatability and consistency of the results. We found that, in general, execution times for a given PC model are consistent within an uncertainty of about 10%, and show a gain in CPU speed of a factor of about 3 between the oldest PCs used at the IAC (7-8 years old) and the newest ones.
Submitted 16 February, 2017;
originally announced February 2017.
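The normalization and consistency check described in this abstract can be illustrated with a minimal sketch. The function names and the choice of relative standard deviation as the spread measure are assumptions for illustration; the abstract does not state how the roughly 10% uncertainty was computed.

```python
from statistics import mean, pstdev

def normalize(times, reference):
    """Divide each benchmark's execution time by its reference time.

    Both arguments map a benchmark name to seconds. A value below 1
    means the machine was faster than the reference."""
    return {name: times[name] / reference[name] for name in times}

def relative_spread(values):
    """Population standard deviation divided by the mean.

    Used here as one possible measure of run-to-run consistency."""
    return pstdev(values) / mean(values)

def speed_gain(old_time, new_time):
    """Speed-up factor of a newer machine over an older one."""
    return old_time / new_time
```

For example, repeated normalized times for the same benchmark on PCs of one model could be passed to `relative_spread` and compared against a tolerance such as 0.10.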
-
A Measurement-based Analysis of the Energy Consumption of Data Center Servers
Authors:
Jordi Arjona,
Angelos Chatzipapas,
Antonio Fernandez Anta,
Vincenzo Mancuso
Abstract:
Energy consumption is a growing issue in data centers, impacting their economic viability and their public image. In this work we empirically characterize the power and energy consumed by different types of servers. In particular, in order to understand the behavior of their energy and power consumption, we perform measurements on different servers. On each of them, we exhaustively measure the power consumed by the CPU, the disk, and the network interface under different configurations, identifying the optimal operational levels. One interesting conclusion of our study is that the curve that defines the minimal CPU power as a function of the load is neither linear nor purely convex, as has been previously assumed. Moreover, we find that the efficiency of the various server components can be maximized by tuning the CPU frequency and the number of active cores as a function of the system and network load, while the block size of I/O operations should always be maximized by applications. We also show how to estimate the energy consumed by an application as a function of some simple parameters, such as the CPU load and the disk and network activity. We validate the proposed approach by accurately estimating the energy of a map-reduce computation on a Hadoop platform.
Submitted 4 February, 2014;
originally announced February 2014.
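As a rough illustration of estimating application energy from CPU load and disk and network activity, the sketch below interpolates a measured CPU power curve and adds per-byte I/O costs. The model structure, the function names, and the per-byte cost parameters are all assumptions and do not reproduce the paper's model. Piecewise-linear interpolation over measured points is used because the abstract reports that the power curve is neither linear nor purely convex.

```python
from bisect import bisect_right

def interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be sorted ascending.
    Values outside the measured range are clamped to the end points."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def estimate_energy(samples, load_points, power_points,
                    disk_j_per_byte, net_j_per_byte):
    """Estimate energy in joules for a run described by samples.

    Each sample is (duration_s, cpu_load, disk_bytes, net_bytes).
    load_points and power_points are hypothetical measured
    (CPU load, watts) pairs for the server in question."""
    total = 0.0
    for duration, load, disk_bytes, net_bytes in samples:
        total += interp(load, load_points, power_points) * duration
        total += disk_j_per_byte * disk_bytes + net_j_per_byte * net_bytes
    return total
```

The curve and the per-byte costs would have to come from measurements on the specific server, since the abstract stresses that behavior differs across server types.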