Search | arXiv e-print repository

A New Family of Thread to Core Allocation Policies for an SMT ARM Processor

Authors: Marta Navarro, Josué Feliu, Salvador Petit, María E. Gómez, Julio Sahuquillo

Abstract: Modern high-performance servers commonly integrate Simultaneous Multithreading (SMT) processors, which efficiently boosts throughput over single-threaded cores. Optimizing performance in SMT processors faces challenges due to the inter-application interference within each SMT core. To mitigate the interference, thread-to-core (T2C) allocation policies play a pivotal role. State-of-the-art T2C poli… ▽ More Modern high-performance servers commonly integrate Simultaneous Multithreading (SMT) processors, which efficiently boosts throughput over single-threaded cores. Optimizing performance in SMT processors faces challenges due to the inter-application interference within each SMT core. To mitigate the interference, thread-to-core (T2C) allocation policies play a pivotal role. State-of-the-art T2C policies work in two steps: i) building a per-application performance stack using performance counters and ii) building performance prediction models to identify the best pairs of applications to run on each core. This paper explores distinct ways to build the performance stack in ARM processors and introduces the Instructions and Stalls Cycles (ISC) stack, a novel approach to overcome ARM PMU limitations. The ISC stacks are used as inputs for a performance prediction model to estimate the applications' performance considering the inter-application interference. The accuracy of the prediction model (second step) depends on the accuracy of the performance stack (first step); thus, the higher the accuracy of the performance stack, the higher the potential performance gains obtained by the T2C allocation policy. This paper presents SYNPA as a family of T2C allocation policies. Experimental results show that $SYNPA4$, the best-performing SYNPA variant, outperforms turnaround time by 38\% over Linux, which represents 3$\times$ the gains achieved by the state-of-the-art policies for ARM processors. Furthermore, the multiple discussions and refinements presented throughout this paper can be applied to other SMT processors from distinct vendors and are aimed at helping performance analysts build performance stacks for accurate performance estimates in real processors. △ Less

Submitted 1 July, 2025; originally announced July 2025.

Comments: 13 pages

arXiv:2505.24712 [pdf, ps, other]

HESEIA: A community-based dataset for evaluating social biases in large language models, co-designed in real school settings in Latin America

Authors: Guido Ivetta, Marcos J. Gomez, Sofía Martinelli, Pietro Palombini, M. Emilia Echeveste, Nair Carolina Mazzeo, Beatriz Busaniche, Luciana Benotti

Abstract: Most resources for evaluating social biases in Large Language Models are developed without co-design from the communities affected by these biases, and rarely involve participatory approaches. We introduce HESEIA, a dataset of 46,499 sentences created in a professional development course. The course involved 370 high-school teachers and 5,370 students from 189 Latin-American schools. Unlike existi… ▽ More Most resources for evaluating social biases in Large Language Models are developed without co-design from the communities affected by these biases, and rarely involve participatory approaches. We introduce HESEIA, a dataset of 46,499 sentences created in a professional development course. The course involved 370 high-school teachers and 5,370 students from 189 Latin-American schools. Unlike existing benchmarks, HESEIA captures intersectional biases across multiple demographic axes and school subjects. It reflects local contexts through the lived experience and pedagogical expertise of educators. Teachers used minimal pairs to create sentences that express stereotypes relevant to their school subjects and communities. We show the dataset diversity in term of demographic axes represented and also in terms of the knowledge areas included. We demonstrate that the dataset contains more stereotypes unrecognized by current LLMs than previous datasets. HESEIA is available to support bias assessments grounded in educational communities. △ Less

Submitted 30 May, 2025; originally announced May 2025.

arXiv:2504.20275 [pdf, other]

Smart Water Security with AI and Blockchain-Enhanced Digital Twins

Authors: Mohammadhossein Homaei, Victor Gonzalez Morales, Oscar Mogollon Gutierrez, Ruben Molano Gomez, Andres Caro

Abstract: Water distribution systems in rural areas face serious challenges such as a lack of real-time monitoring, vulnerability to cyberattacks, and unreliable data handling. This paper presents an integrated framework that combines LoRaWAN-based data acquisition, a machine learning-driven Intrusion Detection System (IDS), and a blockchain-enabled Digital Twin (BC-DT) platform for secure and transparent w… ▽ More Water distribution systems in rural areas face serious challenges such as a lack of real-time monitoring, vulnerability to cyberattacks, and unreliable data handling. This paper presents an integrated framework that combines LoRaWAN-based data acquisition, a machine learning-driven Intrusion Detection System (IDS), and a blockchain-enabled Digital Twin (BC-DT) platform for secure and transparent water management. The IDS filters anomalous or spoofed data using a Long Short-Term Memory (LSTM) Autoencoder and Isolation Forest before validated data is logged via smart contracts on a private Ethereum blockchain using Proof of Authority (PoA) consensus. The verified data feeds into a real-time DT model supporting leak detection, consumption forecasting, and predictive maintenance. Experimental results demonstrate that the system achieves over 80 transactions per second (TPS) with under 2 seconds of latency while remaining cost-effective and scalable for up to 1,000 smart meters. This work demonstrates a practical and secure architecture for decentralized water infrastructure in under-connected rural environments. △ Less

Submitted 28 April, 2025; originally announced April 2025.

Comments: 8 Pages, 9 Figures

arXiv:2503.05268 [pdf, ps, other]

ZOGRASCOPE: A New Benchmark for Semantic Parsing over Property Graphs

Authors: Francesco Cazzaro, Justin Kleindienst, Sofia Marquez Gomez, Ariadna Quattoni

Abstract: In recent years, the need for natural language interfaces to knowledge graphs has become increasingly important since they enable easy and efficient access to the information contained in them. In particular, property graphs (PGs) have seen increased adoption as a means of representing complex structured information. Despite their growing popularity in industry, PGs remain relatively underrepresen… ▽ More In recent years, the need for natural language interfaces to knowledge graphs has become increasingly important since they enable easy and efficient access to the information contained in them. In particular, property graphs (PGs) have seen increased adoption as a means of representing complex structured information. Despite their growing popularity in industry, PGs remain relatively underrepresented in semantic parsing research with a lack of resources for evaluation. To address this gap, we introduce ZOGRASCOPE, a benchmark designed specifically for PGs and queries written in Cypher. Our benchmark includes a diverse set of manually annotated queries of varying complexity and is organized into three partitions: iid, compositional and length. We complement this paper with a set of experiments that test the performance of different LLMs in a variety of learning settings. △ Less

Submitted 30 May, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

arXiv:2410.13043 [pdf, other]

UniCoN: Universal Conditional Networks for Multi-Age Embryonic Cartilage Segmentation with Sparsely Annotated Data

Authors: Nishchal Sapkota, Yejia Zhang, Zihao Zhao, Maria Gomez, Yuhan Hsi, Jordan A. Wilson, Kazuhiko Kawasaki, Greg Holmes, Meng Wu, Ethylin Wang Jabs, Joan T. Richtsmeier, Susan M. Motch Perrine, Danny Z. Chen

Abstract: Osteochondrodysplasia, affecting 2-3% of newborns globally, is a group of bone and cartilage disorders that often result in head malformations, contributing to childhood morbidity and reduced quality of life. Current research on this disease using mouse models faces challenges since it involves accurately segmenting the developing cartilage in 3D micro-CT images of embryonic mice. Tackling this se… ▽ More Osteochondrodysplasia, affecting 2-3% of newborns globally, is a group of bone and cartilage disorders that often result in head malformations, contributing to childhood morbidity and reduced quality of life. Current research on this disease using mouse models faces challenges since it involves accurately segmenting the developing cartilage in 3D micro-CT images of embryonic mice. Tackling this segmentation task with deep learning (DL) methods is laborious due to the big burden of manual image annotation, expensive due to the high acquisition costs of 3D micro-CT images, and difficult due to embryonic cartilage's complex and rapidly changing shapes. While DL approaches have been proposed to automate cartilage segmentation, most such models have limited accuracy and generalizability, especially across data from different embryonic age groups. To address these limitations, we propose novel DL methods that can be adopted by any DL architectures -- including CNNs, Transformers, or hybrid models -- which effectively leverage age and spatial information to enhance model performance. Specifically, we propose two new mechanisms, one conditioned on discrete age categories and the other on continuous image crop locations, to enable an accurate representation of cartilage shape changes across ages and local shape details throughout the cranial region. Extensive experiments on multi-age cartilage segmentation datasets show significant and consistent performance improvements when integrating our conditional modules into popular DL segmentation architectures. On average, we achieve a 1.7% Dice score increase with minimal computational overhead and a 7.5% improvement on unseen data. These results highlight the potential of our approach for developing robust, universal models capable of handling diverse datasets with limited annotated data, a key challenge in DL-based medical image analysis. △ Less

Submitted 16 October, 2024; originally announced October 2024.

arXiv:2408.10361 [pdf, other]

ASASVIcomtech: The Vicomtech-UGR Speech Deepfake Detection and SASV Systems for the ASVspoof5 Challenge

Authors: Juan M. Martín-Doñas, Eros Roselló, Angel M. Gomez, Aitor Álvarez, Iván López-Espejo, Antonio M. Peinado

Abstract: This paper presents the work carried out by the ASASVIcomtech team, made up of researchers from Vicomtech and University of Granada, for the ASVspoof5 Challenge. The team has participated in both Track 1 (speech deepfake detection) and Track 2 (spoofing-aware speaker verification). This work started with an analysis of the challenge available data, which was regarded as an essential step to avoid… ▽ More This paper presents the work carried out by the ASASVIcomtech team, made up of researchers from Vicomtech and University of Granada, for the ASVspoof5 Challenge. The team has participated in both Track 1 (speech deepfake detection) and Track 2 (spoofing-aware speaker verification). This work started with an analysis of the challenge available data, which was regarded as an essential step to avoid later potential biases of the trained models, and whose main conclusions are presented here. With respect to the proposed approaches, a closed-condition system employing a deep complex convolutional recurrent architecture was developed for Track 1, although, unfortunately, no noteworthy results were achieved. On the other hand, different possibilities of open-condition systems, based on leveraging self-supervised models, augmented training data from previous challenges, and novel vocoders, were explored for both tracks, finally achieving very competitive results with an ensemble system. △ Less

Submitted 19 August, 2024; originally announced August 2024.

Comments: This paper was accepted at ASVspoof Workshop 2024

arXiv:2406.09474 [pdf, other]

doi 10.1029/2024GL110960

Lightning-Fast Convective Outlooks: Predicting Severe Convective Environments with Global AI-based Weather Models

Authors: Monika Feldmann, Tom Beucler, Milton Gomez, Olivia Martius

Abstract: Severe convective storms are among the most dangerous weather phenomena and accurate forecasts mitigate their impacts. The recently released suite of AI-based weather models produces medium-range forecasts within seconds, with a skill similar to state-of-the-art operational forecasts for variables on single levels. However, predicting severe thunderstorm environments requires accurate combinations… ▽ More Severe convective storms are among the most dangerous weather phenomena and accurate forecasts mitigate their impacts. The recently released suite of AI-based weather models produces medium-range forecasts within seconds, with a skill similar to state-of-the-art operational forecasts for variables on single levels. However, predicting severe thunderstorm environments requires accurate combinations of dynamic and thermodynamic variables and the vertical structure of the atmosphere. Advancing the assessment of AI-models towards process-based evaluations lays the foundation for hazard-driven applications. We assess the forecast skill of three top-performing AI-models for convective parameters at lead-times of up to 10 days against reanalysis and ECMWF's operational numerical weather prediction model IFS. In a case study and seasonal analyses, we see the best performance by GraphCast and Pangu-Weather: these models match or even exceed the performance of IFS for instability and shear. This opens opportunities for fast and inexpensive predictions of severe weather environments. △ Less

Submitted 9 September, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: Main body: 17 pages, 4 Figures. SI: 11 pages, 3 Figures. Under review at Geophysical Research Letters

arXiv:2401.04631 [pdf, other]

Deep Reinforcement Multi-agent Learning framework for Information Gathering with Local Gaussian Processes for Water Monitoring

Authors: Samuel Yanes Luis, Dmitriy Shutin, Juan Marchal Gómez, Daniel Gutiérrez Reina, Sergio Toral Marín

Abstract: The conservation of hydrological resources involves continuously monitoring their contamination. A multi-agent system composed of autonomous surface vehicles is proposed in this paper to efficiently monitor the water quality. To achieve a safe control of the fleet, the fleet policy should be able to act based on measurements and to the the fleet state. It is proposed to use Local Gaussian Processe… ▽ More The conservation of hydrological resources involves continuously monitoring their contamination. A multi-agent system composed of autonomous surface vehicles is proposed in this paper to efficiently monitor the water quality. To achieve a safe control of the fleet, the fleet policy should be able to act based on measurements and to the the fleet state. It is proposed to use Local Gaussian Processes and Deep Reinforcement Learning to jointly obtain effective monitoring policies. Local Gaussian processes, unlike classical global Gaussian processes, can accurately model the information in a dissimilar spatial correlation which captures more accurately the water quality information. A Deep convolutional policy is proposed, that bases the decisions on the observation on the mean and variance of this model, by means of an information gain reward. Using a Double Deep Q-Learning algorithm, agents are trained to minimize the estimation error in a safe manner thanks to a Consensus-based heuristic. Simulation results indicate an improvement of up to 24% in terms of the mean absolute error with the proposed models. Also, training results with 1-3 agents indicate that our proposed approach returns 20% and 24% smaller average estimation errors for, respectively, monitoring water quality variables and monitoring algae blooms, as compared to state-of-the-art approaches △ Less

Submitted 9 January, 2024; originally announced January 2024.

arXiv:2401.03736 [pdf, other]

Lessons Learned: Reproducibility, Replicability, and When to Stop

Authors: Milton S. Gomez, Tom Beucler

Abstract: While extensive guidance exists for ensuring the reproducibility of one's own study, there is little discussion regarding the reproduction and replication of external studies within one's own research. To initiate this discussion, drawing lessons from our experience reproducing an operational product for predicting tropical cyclogenesis, we present a two-dimensional framework to offer guidance on… ▽ More While extensive guidance exists for ensuring the reproducibility of one's own study, there is little discussion regarding the reproduction and replication of external studies within one's own research. To initiate this discussion, drawing lessons from our experience reproducing an operational product for predicting tropical cyclogenesis, we present a two-dimensional framework to offer guidance on reproduction and replication. Our framework, representing model fitting on one axis and its use in inference on the other, builds upon three key aspects: the dataset, the metrics, and the model itself. By assessing the trajectories of our studies on this 2D plane, we can better inform the claims made using our research. Additionally, we use this framework to contextualize the utility of benchmark datasets in the atmospheric sciences. Our two-dimensional framework provides a tool for researchers, especially early career researchers, to incorporate prior work in their own research and to inform the claims they can make in this context. △ Less

Submitted 9 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

Comments: Main Text: 6 pages with 2 Figures Supplementary Text: 7 Pages with 3 figures and 3 tables Submitted to AMS AIES Lessons Learned (https://journals.ametsoc.org/view/journals/aies/aies-overview.xml)

arXiv:2401.00060 [pdf, ps, other]

doi 10.33232/001c.124452

Assessing your Observatory's Impact: Best Practices in Establishing and Maintaining Observatory Bibliographies

Authors: Observatory Bibliographers Collaboration, Raffaele D'Abrusco, Monique Gomez, Uta Grothkopf, Sharon Hunt, Ruth Kneale, Mika Konuma, Jenny Novacescu, Luisa Rebull, Elena Scire, Erin Scott, Richard Shaw, Donna Thompson, Lance Utley, Christopher Wilkinson, Sherry Winkelman

Abstract: Observatories need to measure and evaluate the scientific output and overall impact of their facilities. An observatory bibliography consists of the papers published using that observatory's data, typically gathered by searching the major journals for relevant keywords. Recently, the volume of literature and methods by which the publications pool is evaluated has increased. Efficient and standardi… ▽ More Observatories need to measure and evaluate the scientific output and overall impact of their facilities. An observatory bibliography consists of the papers published using that observatory's data, typically gathered by searching the major journals for relevant keywords. Recently, the volume of literature and methods by which the publications pool is evaluated has increased. Efficient and standardized procedures are necessary to assign meaningful metadata; enable user-friendly retrieval; and provide the opportunity to derive reports, statistics, and visualizations to impart a deeper understanding of the research output. In 2021, a group of observatory bibliographers from around the world convened online to continue the discussions presented in Lagerstrom (2015). We worked to extract general guidelines from our experiences, techniques, and lessons learnt. The paper explores the development, application, and current status of telescope bibliographies and future trends. This paper briefly describes the methodologies employed in constructing databases, along with the various bibliometric techniques used to analyze and interpret them. We explain reasons for non-standardization and why it is essential for each observatory to identify metadata and metrics that are meaningful for them; caution the (over-)use of comparisons among facilities that are, ultimately, not comparable through bibliometrics; and highlight the benefits of telescope bibliographies, both for researchers within the astronomical community and for stakeholders beyond the specific observatories. There is tremendous diversity in the ways bibliographers track publications and maintain databases, due to parameters such as resources, type of observatory, historical practices, and reporting requirements to funders and outside agencies. However, there are also common sets of Best Practices. △ Less

Submitted 4 October, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

Comments: 19 pages, 3 appendices

arXiv:2311.14057 [pdf, other]

doi 10.1007/978-3-031-40725-3_27

Assessing the Impact of Noise on Quantum Neural Networks: An Experimental Analysis

Authors: Erik B. Terres Escudero, Danel Arias Alamo, Oier Mentxaka Gómez, Pablo García Bringas

Abstract: In the race towards quantum computing, the potential benefits of quantum neural networks (QNNs) have become increasingly apparent. However, Noisy Intermediate-Scale Quantum (NISQ) processors are prone to errors, which poses a significant challenge for the execution of complex algorithms or quantum machine learning. To ensure the quality and security of QNNs, it is crucial to explore the impact of… ▽ More In the race towards quantum computing, the potential benefits of quantum neural networks (QNNs) have become increasingly apparent. However, Noisy Intermediate-Scale Quantum (NISQ) processors are prone to errors, which poses a significant challenge for the execution of complex algorithms or quantum machine learning. To ensure the quality and security of QNNs, it is crucial to explore the impact of noise on their performance. This paper provides a comprehensive analysis of the impact of noise on QNNs, examining the Mottonen state preparation algorithm under various noise models and studying the degradation of quantum states as they pass through multiple layers of QNNs. Additionally, the paper evaluates the effect of noise on the performance of pre-trained QNNs and highlights the challenges posed by noise models in quantum computing. The findings of this study have significant implications for the development of quantum software, emphasizing the importance of prioritizing stability and noise-correction measures when developing QNNs to ensure reliable and trustworthy results. This paper contributes to the growing body of literature on quantum computing and quantum machine learning, providing new insights into the impact of noise on QNNs and paving the way towards the development of more robust and efficient quantum algorithms. △ Less

Submitted 23 November, 2023; originally announced November 2023.

Journal ref: Hybrid Artificial Intelligent Systems. HAIS 2023. Lecture Notes in Computer Science(), vol 14001. Springer, Cham

arXiv:2311.05021 [pdf, other]

doi 10.1016/j.compmedimag.2024.102390

Leveraging a realistic synthetic database to learn Shape-from-Shading for estimating the colon depth in colonoscopy images

Authors: Josué Ruano, Martín Gómez, Eduardo Romero, Antoine Manzanera

Abstract: Colonoscopy is the choice procedure to diagnose colon and rectum cancer, from early detection of small precancerous lesions (polyps), to confirmation of malign masses. However, the high variability of the organ appearance and the complex shape of both the colon wall and structures of interest make this exploration difficult. Learned visuospatial and perceptual abilities mitigate technical limitati… ▽ More Colonoscopy is the choice procedure to diagnose colon and rectum cancer, from early detection of small precancerous lesions (polyps), to confirmation of malign masses. However, the high variability of the organ appearance and the complex shape of both the colon wall and structures of interest make this exploration difficult. Learned visuospatial and perceptual abilities mitigate technical limitations in clinical practice by proper estimation of the intestinal depth. This work introduces a novel methodology to estimate colon depth maps in single frames from monocular colonoscopy videos. The generated depth map is inferred from the shading variation of the colon wall with respect to the light source, as learned from a realistic synthetic database. Briefly, a classic convolutional neural network architecture is trained from scratch to estimate the depth map, improving sharp depth estimations in haustral folds and polyps by a custom loss function that minimizes the estimation error in edges and curvatures. The network was trained by a custom synthetic colonoscopy database herein constructed and released, composed of 248,400 frames (47 videos), with depth annotations at the level of pixels. This collection comprehends 5 subsets of videos with progressively higher levels of visual complexity. Evaluation of the depth estimation with the synthetic database reached a threshold accuracy of 95.65%, and a mean-RMSE of 0.451 cm, while a qualitative assessment with a real database showed consistent depth estimations, visually evaluated by the expert gastroenterologist coauthoring this paper. Finally, the method achieved competitive performance with respect to another state-of-the-art method using a public synthetic database and comparable results in a set of images with other five state-of-the-art methods. △ Less

Submitted 8 November, 2023; originally announced November 2023.

Journal ref: Ruano, J., Gomez, M., Romero, E., & Manzanera, A. (2024). Leveraging a realistic synthetic database to learn Shape-from-Shading for estimating the colon depth in colonoscopy images. Computerized Medical Imaging and Graphics, 102390

arXiv:2310.12786 [pdf, other]

SYNPA: SMT Performance Analysis and Allocation of Threads to Cores in ARM Processors

Authors: Marta Navarro, Josué Feliu, Salvador Petit, María E. Gómez, Julio Sahuquillo

Abstract: Simultaneous multithreading processors improve throughput over single-threaded processors thanks to sharing internal core resources among instructions from distinct threads. However, resource sharing introduces inter-thread interference within the core, which has a negative impact on individual application performance and can significantly increase the turnaround time of multi-program workloads. T… ▽ More Simultaneous multithreading processors improve throughput over single-threaded processors thanks to sharing internal core resources among instructions from distinct threads. However, resource sharing introduces inter-thread interference within the core, which has a negative impact on individual application performance and can significantly increase the turnaround time of multi-program workloads. The severity of the interference effects depends on the competing co-runners sharing the core. Thus, it can be mitigated by applying a thread-to-core allocation policy that smartly selects applications to be run in the same core to minimize their interference. This paper presents SYNPA, a simple approach that dynamically allocates threads to cores in an SMT processor based on their run-time dynamic behavior. The approach uses a regression model to select synergistic pairs to mitigate intra-core interference. The main novelty of SYNPA is that it uses just three variables collected from the performance counters available in current ARM processors at the dispatch stage. Experimental results show that SYNPA outperforms the default Linux scheduler by around 36%, on average, in terms of turnaround time in 8-application workloads combining frontend bound and backend bound benchmarks. △ Less

Submitted 19 October, 2023; originally announced October 2023.

Comments: 11 pages, 9 figures

arXiv:2310.02744 [pdf, other]

SALSA: Semantically-Aware Latent Space Autoencoder

Authors: Kathryn E. Kirchoff, Travis Maxfield, Alexander Tropsha, Shawn M. Gomez

Abstract: In deep learning for drug discovery, chemical data are often represented as simplified molecular-input line-entry system (SMILES) sequences which allow for straightforward implementation of natural language processing methodologies, one being the sequence-to-sequence autoencoder. However, we observe that training an autoencoder solely on SMILES is insufficient to learn molecular representations th… ▽ More In deep learning for drug discovery, chemical data are often represented as simplified molecular-input line-entry system (SMILES) sequences which allow for straightforward implementation of natural language processing methodologies, one being the sequence-to-sequence autoencoder. However, we observe that training an autoencoder solely on SMILES is insufficient to learn molecular representations that are semantically meaningful, where semantics are defined by the structural (graph-to-graph) similarities between molecules. We demonstrate by example that autoencoders may map structurally similar molecules to distant codes, resulting in an incoherent latent space that does not respect the structural similarities between molecules. To address this shortcoming we propose Semantically-Aware Latent Space Autoencoder (SALSA), a transformer-autoencoder modified with a contrastive task, tailored specifically to learn graph-to-graph similarity between molecules. Formally, the contrastive objective is to map structurally similar molecules (separated by a single graph edit) to nearby codes in the latent space. To accomplish this, we generate a novel dataset comprised of sets of structurally similar molecules and opt for a supervised contrastive loss that is able to incorporate full sets of positive samples. We compare SALSA to its ablated counterparts, and show empirically that the composed training objective (reconstruction and contrastive task) leads to a higher quality latent space that is more 1) structurally-aware, 2) semantically continuous, and 3) property-aware. △ Less

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2308.15710 [pdf, ps, other]

Speech Wikimedia: A 77 Language Multilingual Speech Dataset

Authors: Rafael Mosquera Gómez, Julián Eusse, Juan Ciro, Daniel Galvez, Ryan Hileman, Kurt Bollacker, David Kanter

Abstract: The Speech Wikimedia Dataset is a publicly available compilation of audio with transcriptions extracted from Wikimedia Commons. It includes 1780 hours (195 GB) of CC-BY-SA licensed transcribed speech from a diverse set of scenarios and speakers, in 77 different languages. Each audio file has one or more transcriptions in different languages, making this dataset suitable for training speech recogni… ▽ More The Speech Wikimedia Dataset is a publicly available compilation of audio with transcriptions extracted from Wikimedia Commons. It includes 1780 hours (195 GB) of CC-BY-SA licensed transcribed speech from a diverse set of scenarios and speakers, in 77 different languages. Each audio file has one or more transcriptions in different languages, making this dataset suitable for training speech recognition, speech translation, and machine translation models. △ Less

Submitted 29 August, 2023; originally announced August 2023.

Comments: Data-Centric Machine Learning Workshop at the International Machine Learning Conference 2023 (ICML)

arXiv:2304.05294 [pdf, other]

Selecting Robust Features for Machine Learning Applications using Multidata Causal Discovery

Authors: Saranya Ganesh S., Tom Beucler, Frederick Iat-Hin Tam, Milton S. Gomez, Jakob Runge, Andreas Gerhardus

Abstract: Robust feature selection is vital for creating reliable and interpretable Machine Learning (ML) models. When designing statistical prediction models in cases where domain knowledge is limited and underlying interactions are unknown, choosing the optimal set of features is often difficult. To mitigate this issue, we introduce a Multidata (M) causal feature selection approach that simultaneously pro… ▽ More Robust feature selection is vital for creating reliable and interpretable Machine Learning (ML) models. When designing statistical prediction models in cases where domain knowledge is limited and underlying interactions are unknown, choosing the optimal set of features is often difficult. To mitigate this issue, we introduce a Multidata (M) causal feature selection approach that simultaneously processes an ensemble of time series datasets and produces a single set of causal drivers. This approach uses the causal discovery algorithms PC1 or PCMCI that are implemented in the Tigramite Python package. These algorithms utilize conditional independence tests to infer parts of the causal graph. Our causal feature selection approach filters out causally-spurious links before passing the remaining causal features as inputs to ML models (Multiple linear regression, Random Forest) that predict the targets. We apply our framework to the statistical intensity prediction of Western Pacific Tropical Cyclones (TC), for which it is often difficult to accurately choose drivers and their dimensionality reduction (time lags, vertical levels, and area-averaging). Using more stringent significance thresholds in the conditional independence tests helps eliminate spurious causal relationships, thus helping the ML model generalize better to unseen TC cases. M-PC1 with a reduced number of features outperforms M-PCMCI, non-causal ML, and other feature selection methods (lagged correlation, random), even slightly outperforming feature selection based on eXplainable Artificial Intelligence. The optimal causal drivers obtained from our causal feature selection help improve our understanding of underlying relationships and suggest new potential drivers of TC intensification. △ Less

Submitted 30 June, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

Comments: 11 pages, 4 figures, 1 table, Supporting Information, Accepted for an oral presentation at the Climate Informatics 2023

arXiv:2304.03571 [pdf, other]

$β$-Variational autoencoders and transformers for reduced-order modelling of fluid flows

Authors: Alberto Solera-Rico, Carlos Sanmiguel Vila, M. A. Gómez, Yuning Wang, Abdulrahman Almashjary, Scott T. M. Dawson, Ricardo Vinuesa

Abstract: Variational autoencoder (VAE) architectures have the potential to develop reduced-order models (ROMs) for chaotic fluid flows. We propose a method for learning compact and near-orthogonal ROMs using a combination of a $β$-VAE and a transformer, tested on numerical data from a two-dimensional viscous flow in both periodic and chaotic regimes. The $β$-VAE is trained to learn a compact latent represe… ▽ More Variational autoencoder (VAE) architectures have the potential to develop reduced-order models (ROMs) for chaotic fluid flows. We propose a method for learning compact and near-orthogonal ROMs using a combination of a $β$-VAE and a transformer, tested on numerical data from a two-dimensional viscous flow in both periodic and chaotic regimes. The $β$-VAE is trained to learn a compact latent representation of the flow velocity, and the transformer is trained to predict the temporal dynamics in latent space. Using the $β$-VAE to learn disentangled representations in latent-space, we obtain a more interpretable flow model with features that resemble those observed in the proper orthogonal decomposition, but with a more efficient representation. Using Poincaré maps, the results show that our method can capture the underlying dynamics of the flow outperforming other prediction models. The proposed method has potential applications in other fields such as weather forecasting, structural dynamics or biomedical engineering. △ Less

Submitted 15 November, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

arXiv:2210.05858 [pdf, ps, other]

Abstract homogeneity of interval-valued functions

Authors: Ana Shirley Monteiro, Regivan Santiago, Radko Mesiar, Marisol Gomez, Martin Papco, Mikel Ferrero-Jaurrieta, Humberto Bustince

Abstract: In this paper we develop the idea of abstract homogeneity in the context of interval-valued (IV) functions endowed with admissible orders and investigate some of its properties. In this paper we develop the idea of abstract homogeneity in the context of interval-valued (IV) functions endowed with admissible orders and investigate some of its properties. △ Less

Submitted 27 March, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

Comments: 2 pages

arXiv:2207.07369 [pdf, other]

A Systematic Literature Review of Game-based Assessment Studies: Trends and Challenges

Authors: Manuel J. Gomez, José A. Ruipérez-Valiente, Félix J. García Clemente

Abstract: Technology has become an essential part of our everyday life, and its use in educational environments keeps growing. In addition, games are one of the most popular activities across cultures and ages, and there is ample evidence that supports the benefits of using games for assessment. This field is commonly known as game-based assessment (GBA), which refers to the use of games to assess learners'… ▽ More Technology has become an essential part of our everyday life, and its use in educational environments keeps growing. In addition, games are one of the most popular activities across cultures and ages, and there is ample evidence that supports the benefits of using games for assessment. This field is commonly known as game-based assessment (GBA), which refers to the use of games to assess learners' competencies, skills, or knowledge. This paper analyzes the current status of the GBA field by performing the first systematic literature review on empirical GBA studies. It is based on 65 research papers that used digital GBAs to determine: (1) the context where the study has been applied; (2) the primary purpose; (3) the domain of the game used; (4) game/tool availability; (5) the size of the data sample; (6) the computational methods and algorithms applied; (7) the targeted stakeholders of the study; and (8) what limitations and challenges are reported by authors. Based on the categories established and our analysis, the findings suggest that GBAs are mainly used in K-16 education and for assessment purposes, and that most GBAs focus on assessing STEM content, and cognitive and soft skills. Furthermore, the current limitations indicate that future GBA research would benefit from the use of bigger data samples and more specialized algorithms. Based on our results, we discuss current trends in the field and open challenges (including replication and validation problems), providing recommendations for the future research agenda of the GBA field. △ Less

Submitted 2 December, 2022; v1 submitted 15 July, 2022; originally announced July 2022.

Comments: 25 pages, 12 figures, 1 table

arXiv:2206.10536 [pdf, other]

HealNet -- Self-Supervised Acute Wound Heal-Stage Classification

Authors: Héctor Carrión, Mohammad Jafari, Hsin-Ya Yang, Roslyn Rivkah Isseroff, Marco Rolandi, Marcella Gomez, Narges Norouzi

Abstract: Identifying, tracking, and predicting wound heal-stage progression is a fundamental task towards proper diagnosis, effective treatment, facilitating healing, and reducing pain. Traditionally, a medical expert might observe a wound to determine the current healing state and recommend treatment. However, sourcing experts who can produce such a diagnosis solely from visual indicators can be difficult… ▽ More Identifying, tracking, and predicting wound heal-stage progression is a fundamental task towards proper diagnosis, effective treatment, facilitating healing, and reducing pain. Traditionally, a medical expert might observe a wound to determine the current healing state and recommend treatment. However, sourcing experts who can produce such a diagnosis solely from visual indicators can be difficult, time-consuming and expensive. In addition, lesions may take several weeks to undergo the healing process, demanding resources to monitor and diagnose continually. Automating this task can be challenging; datasets that follow wound progression from onset to maturation are small, rare, and often collected without computer vision in mind. To tackle these challenges, we introduce a self-supervised learning scheme composed of (a) learning embeddings of wound's temporal dynamics, (b) clustering for automatic stage discovery, and (c) fine-tuned classification. The proposed self-supervised and flexible learning framework is biologically inspired and trained on a small dataset with zero human labeling. The HealNet framework achieved high pre-text and downstream classification accuracy; when evaluated on held-out test data, HealNet achieved 97.7% pre-text accuracy and 90.62% heal-stage classification accuracy. △ Less

Submitted 23 June, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

arXiv:2206.07786 [pdf, other]

doi 10.1080/24725854.2022.2157912

Federated Data Analytics: A Study on Linear Models

Authors: Xubo Yue, Raed Al Kontar, Ana María Estrada Gómez

Abstract: As edge devices become increasingly powerful, data analytics are gradually moving from a centralized to a decentralized regime where edge compute resources are exploited to process more of the data locally. This regime of analytics is coined as federated data analytics (FDA). In spite of the recent success stories of FDA, most literature focuses exclusively on deep neural networks. In this work, w… ▽ More As edge devices become increasingly powerful, data analytics are gradually moving from a centralized to a decentralized regime where edge compute resources are exploited to process more of the data locally. This regime of analytics is coined as federated data analytics (FDA). In spite of the recent success stories of FDA, most literature focuses exclusively on deep neural networks. In this work, we take a step back to develop an FDA treatment for one of the most fundamental statistical models: linear regression. Our treatment is built upon hierarchical modeling that allows borrowing strength across multiple groups. To this end, we propose two federated hierarchical model structures that provide a shared representation across devices to facilitate information sharing. Notably, our proposed frameworks are capable of providing uncertainty quantification, variable selection, hypothesis testing and fast adaptation to new unseen data. We validate our methods on a range of real-life applications including condition monitoring for aircraft engines. The results show that our FDA treatment for linear models can serve as a competing benchmark model for future development of federated algorithms. △ Less

Submitted 15 June, 2022; originally announced June 2022.

Journal ref: IISE Transactions, 2023

arXiv:2205.02731 [pdf]

doi 10.1080/02640414.2022.2096771

Characterizing player's playing styles based on Player Vectors for each playing position in the Chinese Football Super League

Authors: Yuesen Li, Shouxin Zong, Yanfei Shen, Zhiqiang Pu, Miguel-Ángel Gómez, Yixiong Cui

Abstract: Characterizing playing style is important for football clubs on scouting, monitoring and match preparation. Previous studies considered a player's style as a combination of technical performances, failing to consider the spatial information. Therefore, this study aimed to characterize the playing styles of each playing position in the Chinese Football Super League (CSL) matches, integrating a rece… ▽ More Characterizing playing style is important for football clubs on scouting, monitoring and match preparation. Previous studies considered a player's style as a combination of technical performances, failing to consider the spatial information. Therefore, this study aimed to characterize the playing styles of each playing position in the Chinese Football Super League (CSL) matches, integrating a recently adopted Player Vectors framework. Data of 960 matches from 2016-2019 CSL were used. Match ratings, and ten types of match events with the corresponding coordinates for all the lineup players whose on-pitch time exceeded 45 minutes were extracted. Players were first clustered into 8 positions. A player vector was constructed for each player in each match based on the Player Vectors using Nonnegative Matrix Factorization (NMF). Another NMF process was run on the player vectors to extract different types of playing styles. The resulting player vectors discovered 18 different playing styles in the CSL. Six performance indicators of each style were investigated to observe their contributions. In general, the playing styles of forwards and midfielders are in line with football performance evolution trends, while the styles of defenders should be reconsidered. Multifunctional playing styles were also found in high rated CSL players. △ Less

Submitted 7 July, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

Comments: 40 pages, 5 figures, already published on Journal of Sports Sciences

ACM Class: I.2.1

arXiv:2201.06967 [pdf, other]

Large Scale Analysis of Open MOOC Reviews to Support Learners' Course Selection

Authors: Manuel J. Gomez, Mario Calderón, Victor Sánchez, Félix J. García Clemente, José A. Ruipérez-Valiente

Abstract: The recent pandemic has changed the way we see education. It is not surprising that children and college students are not the only ones using online education. Millions of adults have signed up for online classes and courses during last years, and MOOC providers, such as Coursera or edX, are reporting millions of new users signing up in their platforms. However, students do face some challenges wh… ▽ More The recent pandemic has changed the way we see education. It is not surprising that children and college students are not the only ones using online education. Millions of adults have signed up for online classes and courses during last years, and MOOC providers, such as Coursera or edX, are reporting millions of new users signing up in their platforms. However, students do face some challenges when choosing courses. Though online review systems are standard among many verticals, no standardized or fully decentralized review systems exist in the MOOC ecosystem. In this vein, we believe that there is an opportunity to leverage available open MOOC reviews in order to build simpler and more transparent reviewing systems, allowing users to really identify the best courses out there. Specifically, in our research we analyze 2.4 million reviews (which is the largest MOOC reviews dataset used until now) from five different platforms in order to determine the following: (1) if the numeric ratings provide discriminant information to learners, (2) if NLP-driven sentiment analysis on textual reviews could provide valuable information to learners, (3) if we can leverage NLP-driven topic finding techniques to infer themes that could be important for learners, and (4) if we can use these models to effectively characterize MOOCs based on the open reviews. Results show that numeric ratings are clearly biased (63\% of them are 5-star ratings), and the topic modeling reveals some interesting topics related with course advertisements, the real applicability, or the difficulty of the different courses. We expect our study to shed some light on the area and promote a more transparent approach in online education reviews, which are becoming more and more popular as we enter the post-pandemic era. △ Less

Submitted 11 January, 2022; originally announced January 2022.

Comments: 36 pages, 8 figures

arXiv:2106.06008 [pdf, other]

doi 10.1109/TGCN.2019.2960061

On the Bound of Energy Consumption in Cellular IoT Networks

Authors: Bassel Al Homssi, Akram Al-Hourani, Sathyanarayanan Chandrasekharan, Karina Mabell Gomez, Sithamparanathan Kandeepan

Abstract: Billions of sensors are expected to be connected to the Internet through the emerging Internet of Things (IoT) technologies. Many of these sensors will primarily be connected using wireless technologies powered using batteries as their sole energy source which makes it paramount to optimize their energy consumption. In this paper, we provide an analytic framework of the energy-consumption profile… ▽ More Billions of sensors are expected to be connected to the Internet through the emerging Internet of Things (IoT) technologies. Many of these sensors will primarily be connected using wireless technologies powered using batteries as their sole energy source which makes it paramount to optimize their energy consumption. In this paper, we provide an analytic framework of the energy-consumption profile and its lower bound for an IoT end device formulated based on Shannon capacity. We extend the study to model the average energy-consumption performance based on the random geometric distribution of IoT gateways by utilizing tools from stochastic geometry and real measurements of interference in the ISM-band. Experimental data, interference measurements and Monte-Carlo simulations are presented to validate the plausibility of the proposed analytic framework, where results demonstrate that the current network infrastructures performance is bounded between two extreme geometric models. This study considers interference seen by a gateway regardless of its source. △ Less

Submitted 10 June, 2021; originally announced June 2021.

arXiv:2106.04423 [pdf, other]

PANACEA cough sound-based diagnosis of COVID-19 for the DiCOVA 2021 Challenge

Authors: Madhu R. Kamble, Jose A. Gonzalez-Lopez, Teresa Grau, Juan M. Espin, Lorenzo Cascioli, Yiqing Huang, Alejandro Gomez-Alanis, Jose Patino, Roberto Font, Antonio M. Peinado, Angel M. Gomez, Nicholas Evans, Maria A. Zuluaga, Massimiliano Todisco

Abstract: The COVID-19 pandemic has led to the saturation of public health services worldwide. In this scenario, the early diagnosis of SARS-Cov-2 infections can help to stop or slow the spread of the virus and to manage the demand upon health services. This is especially important when resources are also being stretched by heightened demand linked to other seasonal diseases, such as the flu. In this contex… ▽ More The COVID-19 pandemic has led to the saturation of public health services worldwide. In this scenario, the early diagnosis of SARS-Cov-2 infections can help to stop or slow the spread of the virus and to manage the demand upon health services. This is especially important when resources are also being stretched by heightened demand linked to other seasonal diseases, such as the flu. In this context, the organisers of the DiCOVA 2021 challenge have collected a database with the aim of diagnosing COVID-19 through the use of coughing audio samples. This work presents the details of the automatic system for COVID-19 detection from cough recordings presented by team PANACEA. This team consists of researchers from two European academic institutions and one company: EURECOM (France), University of Granada (Spain), and Biometric Vox S.L. (Spain). We developed several systems based on established signal processing and machine learning methods. Our best system employs a Teager energy operator cepstral coefficients (TECCs) based frontend and Light gradient boosting machine (LightGBM) backend. The AUC obtained by this system on the test set is 76.31% which corresponds to a 10% improvement over the official baseline. △ Less

Submitted 7 June, 2021; originally announced June 2021.

Comments: Accepted in INTERSPEECH 2021

arXiv:2105.01147 [pdf, other]

Sketches image analysis: Web image search engine usingLSH index and DNN InceptionV3

Authors: Alessio Schiavo, Filippo Minutella, Mattia Daole, Marsha Gomez Gomez

Abstract: The adoption of an appropriate approximate similarity search method is an essential prereq-uisite for developing a fast and efficient CBIR system, especially when dealing with large amount ofdata. In this study we implement a web image search engine on top of a Locality Sensitive Hashing(LSH) Index to allow fast similarity search on deep features. Specifically, we exploit transfer learningfor deep… ▽ More The adoption of an appropriate approximate similarity search method is an essential prereq-uisite for developing a fast and efficient CBIR system, especially when dealing with large amount ofdata. In this study we implement a web image search engine on top of a Locality Sensitive Hashing(LSH) Index to allow fast similarity search on deep features. Specifically, we exploit transfer learningfor deep features extraction from images. Firstly, we adopt InceptionV3 pretrained on ImageNet asfeatures extractor, secondly, we try out several CNNs built on top of InceptionV3 as convolutionalbase fine-tuned on our dataset. In both of the previous cases we index the features extracted within ourLSH index implementation so as to compare the retrieval performances with and without fine-tuning.In our approach we try out two different LSH implementations: the first one working with real numberfeature vectors and the second one with the binary transposed version of those vectors. Interestingly,we obtain the best performances when using the binary LSH, reaching almost the same result, in termsof mean average precision, obtained by performing sequential scan of the features, thus avoiding thebias introduced by the LSH index. Lastly, we carry out a performance analysis class by class in terms ofrecall againstmAPhighlighting, as expected, a strong positive correlation between the two. △ Less

Submitted 3 May, 2021; originally announced May 2021.

arXiv:2010.14908 [pdf, other]

doi 10.1109/JIOT.2020.2974680

Collective Awareness for Abnormality Detection in Connected Autonomous Vehicles

Authors: Divya Thekke Kanapram, Fabio Patrone, Pablo Marin-Plaza, Mario Marchese, Eliane L. Bodanese, Lucio Marcenaro, David Martín Gómez, Carlo Regazzoni

Abstract: The advancements in connected and autonomous vehicles in these times demand the availability of tools providing the agents with the capability to be aware and predict their own states and context dynamics. This article presents a novel approach to develop an initial level of collective awareness in a network of intelligent agents. A specific collective self awareness functionality is considered, n… ▽ More The advancements in connected and autonomous vehicles in these times demand the availability of tools providing the agents with the capability to be aware and predict their own states and context dynamics. This article presents a novel approach to develop an initial level of collective awareness in a network of intelligent agents. A specific collective self awareness functionality is considered, namely, agent centered detection of abnormal situations present in the environment around any agent in the network. Moreover, the agent should be capable of analyzing how such abnormalities can influence the future actions of each agent. Data driven dynamic Bayesian network (DBN) models learned from time series of sensory data recorded during the realization of tasks (agent network experiences) are here used for abnormality detection and prediction. A set of DBNs, each related to an agent, is used to allow the agents in the network to each synchronously aware possible abnormalities occurring when available models are used on a new instance of the task for which DBNs have been learned. A growing neural gas (GNG) algorithm is used to learn the node variables and conditional probabilities linking nodes in the DBN models; a Markov jump particle filter (MJPF) is employed for state estimation and abnormality detection in each agent using learned DBNs as filter parameters. Performance metrics are discussed to asses the algorithms reliability and accuracy. The impact is also evaluated by the communication channel used by the network to share the data sensed in a distributed way by each agent of the network. The IEEE 802.11p protocol standard has been considered for communication among agents. Real data sets are also used acquired by autonomous vehicles performing different tasks in a controlled environment. △ Less

Submitted 28 October, 2020; originally announced October 2020.

Comments: IEEE Internet of Things Journal

arXiv:2010.05031 [pdf, other]

Understanding Cloud Workloads Performance in a Production like Environment

Authors: Lucia Pons, Josué Feliu, José Puche, Chaoyi Huang, Salvador Petit, Julio Pons, María E. Gómez, Julio Sahuquillo

Abstract: Understanding inter-VM interference is of paramount importance to provide a sound knowledge and understand where performance degradation comes from in the current public cloud. With this aim, this paper devises a workload taxonomy that classifies applications according to how the major system resources affect their performance (e.g., tail latency) as a function of the level of load (e.g., QPS). Af… ▽ More Understanding inter-VM interference is of paramount importance to provide a sound knowledge and understand where performance degradation comes from in the current public cloud. With this aim, this paper devises a workload taxonomy that classifies applications according to how the major system resources affect their performance (e.g., tail latency) as a function of the level of load (e.g., QPS). After that, we present three main studies addressing three major concerns to improve the cloud performance: impact of the level of load on performance, impact of hyper-threading on performance, and impact of limiting the major system resources (e.g., last level cache) on performance. In all these studies we identified important findings that we hope help cloud providers improve their system utilization. △ Less

Submitted 10 October, 2020; originally announced October 2020.

Comments: 16 pages, 17 figures. Submitted to Journal of Parallel and Distributed Computing

arXiv:2009.02110 [pdf, other]

doi 10.1109/ACCESS.2020.3026579

Silent Speech Interfaces for Speech Restoration: A Review

Authors: Jose A. Gonzalez-Lopez, Alejandro Gomez-Alanis, Juan M. Martín-Doñas, José L. Pérez-Córdoba, Angel M. Gomez

Abstract: This review summarises the status of silent speech interface (SSI) research. SSIs rely on non-acoustic biosignals generated by the human body during speech production to enable communication whenever normal verbal communication is not possible or not desirable. In this review, we focus on the first case and present latest SSI research aimed at providing new alternative and augmentative communicati… ▽ More This review summarises the status of silent speech interface (SSI) research. SSIs rely on non-acoustic biosignals generated by the human body during speech production to enable communication whenever normal verbal communication is not possible or not desirable. In this review, we focus on the first case and present latest SSI research aimed at providing new alternative and augmentative communication methods for persons with severe speech disorders. SSIs can employ a variety of biosignals to enable silent communication, such as electrophysiological recordings of neural activity, electromyographic (EMG) recordings of vocal tract movements or the direct tracking of articulator movements using imaging techniques. Depending on the disorder, some sensing techniques may be better suited than others to capture speech-related information. For instance, EMG and imaging techniques are well suited for laryngectomised patients, whose vocal tract remains almost intact but are unable to speak after the removal of the vocal folds, but fail for severely paralysed individuals. From the biosignals, SSIs decode the intended message, using automatic speech recognition or speech synthesis algorithms. Despite considerable advances in recent years, most present-day SSIs have only been validated in laboratory settings for healthy users. Thus, as discussed in this paper, a number of challenges remain to be addressed in future research before SSIs can be promoted to real-world applications. If these issues can be addressed successfully, future SSIs will improve the lives of persons with severe speech impairments by restoring their communication capabilities. △ Less

Submitted 27 September, 2020; v1 submitted 4 September, 2020; originally announced September 2020.

arXiv:2005.01600 [pdf, other]

doi 10.1063/5.0010781

Quantification of MagLIF morphology using the Mallat Scattering Transformation

Authors: Michael E. Glinsky, Thomas W. Moore, William E. Lewis, Matthew R. Weis, Christopher A. Jennings, David J. Ampleford, Patrick F. Knapp, Eric C. Harding, Matthew R. Gomez, Adam J. Harvey-Thompson

Abstract: The morphology of the stagnated plasma resulting from Magnetized Liner Inertial Fusion (MagLIF) is measured by imaging the self-emission x-rays coming from the multi-keV plasma. Equivalent diagnostic response can be generated by integrated radiation-magnetohydrodynamic (rad-MHD) simulations from programs such as HYDRA and GORGON. There have been only limited quantitative ways to compare the image… ▽ More The morphology of the stagnated plasma resulting from Magnetized Liner Inertial Fusion (MagLIF) is measured by imaging the self-emission x-rays coming from the multi-keV plasma. Equivalent diagnostic response can be generated by integrated radiation-magnetohydrodynamic (rad-MHD) simulations from programs such as HYDRA and GORGON. There have been only limited quantitative ways to compare the image morphology, that is the texture, of simulations and experiments. We have developed a metric of image morphology based on the Mallat Scattering Transformation (MST), a transformation that has proved to be effective at distinguishing textures, sounds, and written characters. This metric is designed, demonstrated, and refined by classifying ensembles (i.e., classes) of synthetic stagnation images, and by regressing an ensemble of synthetic stagnation images to the morphology (i.e., model) parameters used to generate the synthetic images. We use this metric to quantitatively compare simulations to experimental images, experimental images to each other, and to estimate the morphological parameters of the experimental images with uncertainty. This coordinate space has proved very adept at doing a sophisticated relative background subtraction in the MST space. This was needed to compare the experimental self emission images to the rad-MHD simulation images. △ Less

Submitted 15 October, 2020; v1 submitted 13 April, 2020; originally announced May 2020.

Comments: 19 pages, 18 figures, 3 tables, 4 animations, accepted for publication in Physics of Plasmas; arXiv admin note: substantial text overlap with arXiv:1911.02359

Report number: Sandia National Laboratories Report: SAND2020-10785 J and SAND2020-11336 O

Journal ref: Phys. Plasmas 27, 112703 (2020)

arXiv:1911.07929 [pdf]

doi 10.30534/ijatcse/2019/116852019

A Smartphone-Based Skin Disease Classification Using MobileNet CNN

Authors: Jessica Velasco, Cherry Pascion, Jean Wilmar Alberio, Jonathan Apuang, John Stephen Cruz, Mark Angelo Gomez, Benjamin Jr. Molina, Lyndon Tuala, August Thio-ac, Romeo Jr. Jorda

Abstract: The MobileNet model was used by applying transfer learning on the 7 skin diseases to create a skin disease classification system on Android application. The proponents gathered a total of 3,406 images and it is considered as imbalanced dataset because of the unequal number of images on its classes. Using different sampling method and preprocessing of input data was explored to further improved the… ▽ More The MobileNet model was used by applying transfer learning on the 7 skin diseases to create a skin disease classification system on Android application. The proponents gathered a total of 3,406 images and it is considered as imbalanced dataset because of the unequal number of images on its classes. Using different sampling method and preprocessing of input data was explored to further improved the accuracy of the MobileNet. Using under-sampling method and the default preprocessing of input data achieved an 84.28% accuracy. While, using imbalanced dataset and default preprocessing of input data achieved a 93.6% accuracy. Then, researchers explored oversampling the dataset and the model attained a 91.8% accuracy. Lastly, by using oversampling technique and data augmentation on preprocessing the input data provide a 94.4% accuracy and this model was deployed on the developed Android application. △ Less

Submitted 13 November, 2019; originally announced November 2019.

Journal ref: International Journal of Advanced Trends in Computer Science and Engineering (2019) 2632-2637

arXiv:1807.00518 [pdf, other]

doi 10.1109/MS.2017.46

App Store 2.0: From Crowd Information to Actionable Feedback in Mobile Ecosystems

Authors: María Gómez, Bram Adams, Walid Maalej, Martin Monperrus, Romain Rouvoy

Abstract: Given the increasing competition in mobile app ecosystems, improving the experience of users has become a major goal for app vendors. This article introduces a visionary app store, called APP STORE 2.0, which exploits crowdsourced information about apps, devices and users to increase the overall quality of the delivered mobile apps. We sketch a blueprint architecture of the envisioned app stores a… ▽ More Given the increasing competition in mobile app ecosystems, improving the experience of users has become a major goal for app vendors. This article introduces a visionary app store, called APP STORE 2.0, which exploits crowdsourced information about apps, devices and users to increase the overall quality of the delivered mobile apps. We sketch a blueprint architecture of the envisioned app stores and discuss the different kinds of actionable feedbacks that app stores can generate using crowdsourced information. △ Less

Submitted 2 July, 2018; originally announced July 2018.

Journal ref: IEEE Software, Institute of Electrical and Electronics Engineers, 2017, 34, pp.81-89

arXiv:1807.00082 [pdf, other]

Amanuensis: The Programmer's Apprentice

Authors: Thomas Dean, Maurice Chiang, Marcus Gomez, Nate Gruver, Yousef Hindy, Michelle Lam, Peter Lu, Sophia Sanchez, Rohun Saxena, Michael Smith, Lucy Wang, Catherine Wong

Abstract: This document provides an overview of the material covered in a course taught at Stanford in the spring quarter of 2018. The course draws upon insight from cognitive and systems neuroscience to implement hybrid connectionist and symbolic reasoning systems that leverage and extend the state of the art in machine learning by integrating human and machine intelligence. As a concrete example we focus… ▽ More This document provides an overview of the material covered in a course taught at Stanford in the spring quarter of 2018. The course draws upon insight from cognitive and systems neuroscience to implement hybrid connectionist and symbolic reasoning systems that leverage and extend the state of the art in machine learning by integrating human and machine intelligence. As a concrete example we focus on digital assistants that learn from continuous dialog with an expert software engineer while providing initial value as powerful analytical, computational and mathematical savants. Over time these savants learn cognitive strategies (domain-relevant problem solving skills) and develop intuitions (heuristics and the experience necessary for applying them) by learning from their expert associates. By doing so these savants elevate their innate analytical skills allowing them to partner on an equal footing as versatile collaborators - effectively serving as cognitive extensions and digital prostheses, thereby amplifying and emulating their human partner's conceptually-flexible thinking patterns and enabling improved access to and control over powerful computing resources. △ Less

Submitted 8 November, 2018; v1 submitted 29 June, 2018; originally announced July 2018.

arXiv:1611.01017 [pdf, ps, other]

On Computing the Dollo-1 phylogeny in polynomial time

Authors: Paola Bonizzoni, Gianluca Della Vedova, Mauricio Soto Gomez, Gabriella Trucco

Abstract: The Dollo model for reconstructing evolutionary trees from binary characters has been proposed as a generalization of the infinite sites model, also known as the Perfect Phylogeny. In particular, the Dollo model is considered more realistic than the Perfect Phylogeny for inferring the evolution of tumor mutations. In the case of binary matrices, the Dollo-$k$ model requires an evolutionary tree in… ▽ More The Dollo model for reconstructing evolutionary trees from binary characters has been proposed as a generalization of the infinite sites model, also known as the Perfect Phylogeny. In particular, the Dollo model is considered more realistic than the Perfect Phylogeny for inferring the evolution of tumor mutations. In the case of binary matrices, the Dollo-$k$ model requires an evolutionary tree in which each character, corresponding to a column in the input matrix, may change from $0$ to $1$ at most once, and from $1$ to $0$ at most $k$ times throughout the entire tree. Given a binary matrix, the problem of deciding whether there exists a Dollo-$k$ tree compatible with the matrix is NP-complete for any fixed $k \geq 2$, while computing a Dollo-$0$ tree corresponds to the Perfect Phylogeny decision problem, which admits a simple linear-time algorithm. The Dollo-$1$ tree problem corresponds to the Persistent Phylogeny problem, whose computational complexity, albeit under an equivalent formulation, was posed as an open question 20 years ago. We solve this problem by presenting a polynomial-time algorithm for the Persistent Phylogeny problem. Our solution relies on efficiently solving a specific class of binary matrices, represented as bipartite graphs called \emph{skeleton graphs}, or simply skeletons. In these graphs, characters are \emph{maximal}, that is their corresponding sets of species are not related by inclusion. △ Less

Submitted 16 June, 2025; v1 submitted 3 November, 2016; originally announced November 2016.

Showing 1–34 of 34 results for author: Gómez, M