-
A New Family of Thread to Core Allocation Policies for an SMT ARM Processor
Authors:
Marta Navarro,
Josué Feliu,
Salvador Petit,
María E. Gómez,
Julio Sahuquillo
Abstract:
Modern high-performance servers commonly integrate Simultaneous Multithreading (SMT) processors, which efficiently boost throughput over single-threaded cores. Optimizing performance in SMT processors faces challenges due to the inter-application interference within each SMT core. To mitigate the interference, thread-to-core (T2C) allocation policies play a pivotal role. State-of-the-art T2C policies work in two steps: i) building a per-application performance stack using performance counters and ii) building performance prediction models to identify the best pairs of applications to run on each core.
This paper explores distinct ways to build the performance stack in ARM processors and introduces the Instructions and Stalls Cycles (ISC) stack, a novel approach to overcome ARM PMU limitations. The ISC stacks are used as inputs for a performance prediction model to estimate the applications' performance considering the inter-application interference. The accuracy of the prediction model (second step) depends on the accuracy of the performance stack (first step); thus, the higher the accuracy of the performance stack, the higher the potential performance gains obtained by the T2C allocation policy.
This paper presents SYNPA as a family of T2C allocation policies. Experimental results show that $SYNPA4$, the best-performing SYNPA variant, improves turnaround time by 38\% over Linux, which represents 3$\times$ the gains achieved by the state-of-the-art policies for ARM processors. Furthermore, the multiple discussions and refinements presented throughout this paper can be applied to other SMT processors from distinct vendors and are aimed at helping performance analysts build performance stacks for accurate performance estimates in real processors.
Submitted 1 July, 2025;
originally announced July 2025.
-
HESEIA: A community-based dataset for evaluating social biases in large language models, co-designed in real school settings in Latin America
Authors:
Guido Ivetta,
Marcos J. Gomez,
Sofía Martinelli,
Pietro Palombini,
M. Emilia Echeveste,
Nair Carolina Mazzeo,
Beatriz Busaniche,
Luciana Benotti
Abstract:
Most resources for evaluating social biases in Large Language Models are developed without co-design from the communities affected by these biases, and rarely involve participatory approaches. We introduce HESEIA, a dataset of 46,499 sentences created in a professional development course. The course involved 370 high-school teachers and 5,370 students from 189 Latin-American schools. Unlike existing benchmarks, HESEIA captures intersectional biases across multiple demographic axes and school subjects. It reflects local contexts through the lived experience and pedagogical expertise of educators. Teachers used minimal pairs to create sentences that express stereotypes relevant to their school subjects and communities. We show the dataset diversity in terms of the demographic axes represented and also in terms of the knowledge areas included. We demonstrate that the dataset contains more stereotypes unrecognized by current LLMs than previous datasets. HESEIA is available to support bias assessments grounded in educational communities.
Submitted 30 May, 2025;
originally announced May 2025.
-
Smart Water Security with AI and Blockchain-Enhanced Digital Twins
Authors:
Mohammadhossein Homaei,
Victor Gonzalez Morales,
Oscar Mogollon Gutierrez,
Ruben Molano Gomez,
Andres Caro
Abstract:
Water distribution systems in rural areas face serious challenges such as a lack of real-time monitoring, vulnerability to cyberattacks, and unreliable data handling. This paper presents an integrated framework that combines LoRaWAN-based data acquisition, a machine learning-driven Intrusion Detection System (IDS), and a blockchain-enabled Digital Twin (BC-DT) platform for secure and transparent water management. The IDS filters anomalous or spoofed data using a Long Short-Term Memory (LSTM) Autoencoder and Isolation Forest before validated data is logged via smart contracts on a private Ethereum blockchain using Proof of Authority (PoA) consensus. The verified data feeds into a real-time DT model supporting leak detection, consumption forecasting, and predictive maintenance. Experimental results demonstrate that the system achieves over 80 transactions per second (TPS) with under 2 seconds of latency while remaining cost-effective and scalable for up to 1,000 smart meters. This work demonstrates a practical and secure architecture for decentralized water infrastructure in under-connected rural environments.
Submitted 28 April, 2025;
originally announced April 2025.
-
ZOGRASCOPE: A New Benchmark for Semantic Parsing over Property Graphs
Authors:
Francesco Cazzaro,
Justin Kleindienst,
Sofia Marquez Gomez,
Ariadna Quattoni
Abstract:
In recent years, the need for natural language interfaces to knowledge graphs has become increasingly important since they enable easy and efficient access to the information contained in them. In particular, property graphs (PGs) have seen increased adoption as a means of representing complex structured information. Despite their growing popularity in industry, PGs remain relatively underrepresented in semantic parsing research with a lack of resources for evaluation. To address this gap, we introduce ZOGRASCOPE, a benchmark designed specifically for PGs and queries written in Cypher. Our benchmark includes a diverse set of manually annotated queries of varying complexity and is organized into three partitions: iid, compositional and length. We complement this paper with a set of experiments that test the performance of different LLMs in a variety of learning settings.
Submitted 30 May, 2025; v1 submitted 7 March, 2025;
originally announced March 2025.
-
UniCoN: Universal Conditional Networks for Multi-Age Embryonic Cartilage Segmentation with Sparsely Annotated Data
Authors:
Nishchal Sapkota,
Yejia Zhang,
Zihao Zhao,
Maria Gomez,
Yuhan Hsi,
Jordan A. Wilson,
Kazuhiko Kawasaki,
Greg Holmes,
Meng Wu,
Ethylin Wang Jabs,
Joan T. Richtsmeier,
Susan M. Motch Perrine,
Danny Z. Chen
Abstract:
Osteochondrodysplasia, affecting 2-3% of newborns globally, is a group of bone and cartilage disorders that often result in head malformations, contributing to childhood morbidity and reduced quality of life. Current research on this disease using mouse models faces challenges since it involves accurately segmenting the developing cartilage in 3D micro-CT images of embryonic mice. Tackling this segmentation task with deep learning (DL) methods is laborious due to the heavy burden of manual image annotation, expensive due to the high acquisition costs of 3D micro-CT images, and difficult due to embryonic cartilage's complex and rapidly changing shapes. While DL approaches have been proposed to automate cartilage segmentation, most such models have limited accuracy and generalizability, especially across data from different embryonic age groups. To address these limitations, we propose novel DL methods that can be adopted by any DL architecture -- including CNNs, Transformers, or hybrid models -- and that effectively leverage age and spatial information to enhance model performance. Specifically, we propose two new mechanisms, one conditioned on discrete age categories and the other on continuous image crop locations, to enable an accurate representation of cartilage shape changes across ages and local shape details throughout the cranial region. Extensive experiments on multi-age cartilage segmentation datasets show significant and consistent performance improvements when integrating our conditional modules into popular DL segmentation architectures. On average, we achieve a 1.7% Dice score increase with minimal computational overhead and a 7.5% improvement on unseen data. These results highlight the potential of our approach for developing robust, universal models capable of handling diverse datasets with limited annotated data, a key challenge in DL-based medical image analysis.
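For readers unfamiliar with the metric, the Dice score mentioned in this abstract measures the overlap between a predicted segmentation mask and a reference mask as 2|A∩B| / (|A| + |B|). The following is a minimal sketch, assuming binary masks flattened to sequences of 0/1; the function name `dice_score` and the empty-mask convention are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice overlap between two binary masks given as flat 0/1 sequences.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    # Voxels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total
```

A score of 1.0 means the masks coincide and 0.0 means they share no foreground voxel, so a 1.7% increase corresponds to a small but measurable gain in overlap.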
Submitted 16 October, 2024;
originally announced October 2024.
-
ASASVIcomtech: The Vicomtech-UGR Speech Deepfake Detection and SASV Systems for the ASVspoof5 Challenge
Authors:
Juan M. Martín-Doñas,
Eros Roselló,
Angel M. Gomez,
Aitor Álvarez,
Iván López-Espejo,
Antonio M. Peinado
Abstract:
This paper presents the work carried out by the ASASVIcomtech team, made up of researchers from Vicomtech and the University of Granada, for the ASVspoof5 Challenge. The team has participated in both Track 1 (speech deepfake detection) and Track 2 (spoofing-aware speaker verification). This work started with an analysis of the available challenge data, which was regarded as an essential step to avoid potential biases in the trained models, and whose main conclusions are presented here. With respect to the proposed approaches, a closed-condition system employing a deep complex convolutional recurrent architecture was developed for Track 1, although, unfortunately, no noteworthy results were achieved. On the other hand, different possibilities of open-condition systems, based on leveraging self-supervised models, augmented training data from previous challenges, and novel vocoders, were explored for both tracks, finally achieving very competitive results with an ensemble system.
Submitted 19 August, 2024;
originally announced August 2024.
-
Lightning-Fast Convective Outlooks: Predicting Severe Convective Environments with Global AI-based Weather Models
Authors:
Monika Feldmann,
Tom Beucler,
Milton Gomez,
Olivia Martius
Abstract:
Severe convective storms are among the most dangerous weather phenomena and accurate forecasts mitigate their impacts. The recently released suite of AI-based weather models produces medium-range forecasts within seconds, with a skill similar to state-of-the-art operational forecasts for variables on single levels. However, predicting severe thunderstorm environments requires accurate combinations of dynamic and thermodynamic variables and the vertical structure of the atmosphere. Advancing the assessment of AI-models towards process-based evaluations lays the foundation for hazard-driven applications. We assess the forecast skill of three top-performing AI-models for convective parameters at lead-times of up to 10 days against reanalysis and ECMWF's operational numerical weather prediction model IFS. In a case study and seasonal analyses, we see the best performance by GraphCast and Pangu-Weather: these models match or even exceed the performance of IFS for instability and shear. This opens opportunities for fast and inexpensive predictions of severe weather environments.
Submitted 9 September, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Deep Reinforcement Multi-agent Learning framework for Information Gathering with Local Gaussian Processes for Water Monitoring
Authors:
Samuel Yanes Luis,
Dmitriy Shutin,
Juan Marchal Gómez,
Daniel Gutiérrez Reina,
Sergio Toral Marín
Abstract:
The conservation of hydrological resources involves continuously monitoring their contamination. A multi-agent system composed of autonomous surface vehicles is proposed in this paper to efficiently monitor the water quality. To achieve safe control of the fleet, the fleet policy should be able to act based on both the measurements and the fleet state. It is proposed to use Local Gaussian Processes and Deep Reinforcement Learning to jointly obtain effective monitoring policies. Local Gaussian processes, unlike classical global Gaussian processes, can accurately model information with dissimilar spatial correlations, which captures the water quality information more accurately. A deep convolutional policy is proposed that bases its decisions on observations of the mean and variance of this model, by means of an information gain reward. Using a Double Deep Q-Learning algorithm, agents are trained to minimize the estimation error in a safe manner thanks to a Consensus-based heuristic. Simulation results indicate an improvement of up to 24% in terms of the mean absolute error with the proposed models. Also, training results with 1-3 agents indicate that our proposed approach returns 20% and 24% smaller average estimation errors for, respectively, monitoring water quality variables and monitoring algae blooms, as compared to state-of-the-art approaches.
Submitted 9 January, 2024;
originally announced January 2024.
-
Lessons Learned: Reproducibility, Replicability, and When to Stop
Authors:
Milton S. Gomez,
Tom Beucler
Abstract:
While extensive guidance exists for ensuring the reproducibility of one's own study, there is little discussion regarding the reproduction and replication of external studies within one's own research. To initiate this discussion, drawing lessons from our experience reproducing an operational product for predicting tropical cyclogenesis, we present a two-dimensional framework to offer guidance on reproduction and replication. Our framework, representing model fitting on one axis and its use in inference on the other, builds upon three key aspects: the dataset, the metrics, and the model itself. By assessing the trajectories of our studies on this 2D plane, we can better inform the claims made using our research. Additionally, we use this framework to contextualize the utility of benchmark datasets in the atmospheric sciences. Our two-dimensional framework provides a tool for researchers, especially early career researchers, to incorporate prior work in their own research and to inform the claims they can make in this context.
Submitted 9 January, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
Assessing your Observatory's Impact: Best Practices in Establishing and Maintaining Observatory Bibliographies
Authors:
Observatory Bibliographers Collaboration,
Raffaele D'Abrusco,
Monique Gomez,
Uta Grothkopf,
Sharon Hunt,
Ruth Kneale,
Mika Konuma,
Jenny Novacescu,
Luisa Rebull,
Elena Scire,
Erin Scott,
Richard Shaw,
Donna Thompson,
Lance Utley,
Christopher Wilkinson,
Sherry Winkelman
Abstract:
Observatories need to measure and evaluate the scientific output and overall impact of their facilities. An observatory bibliography consists of the papers published using that observatory's data, typically gathered by searching the major journals for relevant keywords. Recently, the volume of literature and the methods by which the publications pool is evaluated have increased. Efficient and standardized procedures are necessary to assign meaningful metadata; enable user-friendly retrieval; and provide the opportunity to derive reports, statistics, and visualizations to impart a deeper understanding of the research output. In 2021, a group of observatory bibliographers from around the world convened online to continue the discussions presented in Lagerstrom (2015). We worked to extract general guidelines from our experiences, techniques, and lessons learnt. The paper explores the development, application, and current status of telescope bibliographies and future trends. This paper briefly describes the methodologies employed in constructing databases, along with the various bibliometric techniques used to analyze and interpret them. We explain reasons for non-standardization and why it is essential for each observatory to identify metadata and metrics that are meaningful for them; caution against the (over-)use of comparisons among facilities that are, ultimately, not comparable through bibliometrics; and highlight the benefits of telescope bibliographies, both for researchers within the astronomical community and for stakeholders beyond the specific observatories. There is tremendous diversity in the ways bibliographers track publications and maintain databases, due to parameters such as resources, type of observatory, historical practices, and reporting requirements to funders and outside agencies. However, there are also common sets of Best Practices.
Submitted 4 October, 2024; v1 submitted 29 December, 2023;
originally announced January 2024.
-
Assessing the Impact of Noise on Quantum Neural Networks: An Experimental Analysis
Authors:
Erik B. Terres Escudero,
Danel Arias Alamo,
Oier Mentxaka Gómez,
Pablo García Bringas
Abstract:
In the race towards quantum computing, the potential benefits of quantum neural networks (QNNs) have become increasingly apparent. However, Noisy Intermediate-Scale Quantum (NISQ) processors are prone to errors, which poses a significant challenge for the execution of complex algorithms or quantum machine learning. To ensure the quality and security of QNNs, it is crucial to explore the impact of noise on their performance. This paper provides a comprehensive analysis of the impact of noise on QNNs, examining the Mottonen state preparation algorithm under various noise models and studying the degradation of quantum states as they pass through multiple layers of QNNs. Additionally, the paper evaluates the effect of noise on the performance of pre-trained QNNs and highlights the challenges posed by noise models in quantum computing. The findings of this study have significant implications for the development of quantum software, emphasizing the importance of prioritizing stability and noise-correction measures when developing QNNs to ensure reliable and trustworthy results. This paper contributes to the growing body of literature on quantum computing and quantum machine learning, providing new insights into the impact of noise on QNNs and paving the way towards the development of more robust and efficient quantum algorithms.
Submitted 23 November, 2023;
originally announced November 2023.
-
Leveraging a realistic synthetic database to learn Shape-from-Shading for estimating the colon depth in colonoscopy images
Authors:
Josué Ruano,
Martín Gómez,
Eduardo Romero,
Antoine Manzanera
Abstract:
Colonoscopy is the procedure of choice to diagnose colon and rectum cancer, from early detection of small precancerous lesions (polyps) to confirmation of malignant masses. However, the high variability of the organ appearance and the complex shape of both the colon wall and structures of interest make this exploration difficult. Learned visuospatial and perceptual abilities mitigate technical limitations in clinical practice by proper estimation of the intestinal depth. This work introduces a novel methodology to estimate colon depth maps in single frames from monocular colonoscopy videos. The generated depth map is inferred from the shading variation of the colon wall with respect to the light source, as learned from a realistic synthetic database. Briefly, a classic convolutional neural network architecture is trained from scratch to estimate the depth map, improving sharp depth estimations in haustral folds and polyps by a custom loss function that minimizes the estimation error in edges and curvatures. The network was trained on a custom synthetic colonoscopy database herein constructed and released, composed of 248,400 frames (47 videos), with depth annotations at the level of pixels. This collection comprises 5 subsets of videos with progressively higher levels of visual complexity. Evaluation of the depth estimation with the synthetic database reached a threshold accuracy of 95.65% and a mean-RMSE of 0.451 cm, while a qualitative assessment with a real database showed consistent depth estimations, visually evaluated by the expert gastroenterologist coauthoring this paper. Finally, the method achieved competitive performance with respect to another state-of-the-art method using a public synthetic database and comparable results in a set of images with five other state-of-the-art methods.
Submitted 8 November, 2023;
originally announced November 2023.
-
SYNPA: SMT Performance Analysis and Allocation of Threads to Cores in ARM Processors
Authors:
Marta Navarro,
Josué Feliu,
Salvador Petit,
María E. Gómez,
Julio Sahuquillo
Abstract:
Simultaneous multithreading processors improve throughput over single-threaded processors thanks to sharing internal core resources among instructions from distinct threads. However, resource sharing introduces inter-thread interference within the core, which has a negative impact on individual application performance and can significantly increase the turnaround time of multi-program workloads. The severity of the interference effects depends on the competing co-runners sharing the core. Thus, it can be mitigated by applying a thread-to-core allocation policy that smartly selects applications to be run in the same core to minimize their interference.
This paper presents SYNPA, a simple approach that dynamically allocates threads to cores in an SMT processor based on their run-time dynamic behavior. The approach uses a regression model to select synergistic pairs to mitigate intra-core interference. The main novelty of SYNPA is that it uses just three variables collected from the performance counters available in current ARM processors at the dispatch stage. Experimental results show that SYNPA outperforms the default Linux scheduler by around 36%, on average, in terms of turnaround time in 8-application workloads combining frontend bound and backend bound benchmarks.
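As an illustration of the pair-selection step described in this abstract, the sketch below assumes that some prediction model has already produced an interference cost for every candidate pair of applications, and then picks the pairing with the lowest total cost by exhaustive search. The cost values, the application names, and the function `best_pairing` are hypothetical, and exhaustive search is a stand-in chosen for clarity; the abstract does not state how SYNPA performs the selection.

```python
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[str, str]


def best_pairing(
    apps: List[str], cost: Dict[FrozenSet[str], float]
) -> Tuple[float, List[Pair]]:
    """Return (total cost, pairs) for the cheapest way to pair up all apps.

    `cost` maps an unordered pair of application names to a predicted
    interference cost (hypothetical values; lower is better).
    """
    if len(apps) % 2 != 0:
        raise ValueError("need an even number of applications")
    if not apps:
        return 0.0, []
    first, rest = apps[0], apps[1:]
    best_total: float = float("inf")
    best_pairs: List[Pair] = []
    # Try every partner for the first app, then solve the rest recursively.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        sub_total, sub_pairs = best_pairing(remaining, cost)
        total = cost[frozenset((first, partner))] + sub_total
        if total < best_total:
            best_total, best_pairs = total, [(first, partner)] + sub_pairs
    return best_total, best_pairs


# Hypothetical predicted costs for four applications on two SMT cores.
predicted = {
    frozenset(("a", "b")): 3.0,
    frozenset(("a", "c")): 1.0,
    frozenset(("a", "d")): 2.0,
    frozenset(("b", "c")): 2.0,
    frozenset(("b", "d")): 1.0,
    frozenset(("c", "d")): 3.0,
}
total, pairs = best_pairing(["a", "b", "c", "d"], predicted)
```

For the 8-application workloads mentioned in the abstract, this search visits 7 × 5 × 3 = 105 candidate pairings, which is small enough to enumerate; larger workloads would need a proper minimum-weight matching algorithm.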
Submitted 19 October, 2023;
originally announced October 2023.
-
SALSA: Semantically-Aware Latent Space Autoencoder
Authors:
Kathryn E. Kirchoff,
Travis Maxfield,
Alexander Tropsha,
Shawn M. Gomez
Abstract:
In deep learning for drug discovery, chemical data are often represented as simplified molecular-input line-entry system (SMILES) sequences which allow for straightforward implementation of natural language processing methodologies, one being the sequence-to-sequence autoencoder. However, we observe that training an autoencoder solely on SMILES is insufficient to learn molecular representations that are semantically meaningful, where semantics are defined by the structural (graph-to-graph) similarities between molecules. We demonstrate by example that autoencoders may map structurally similar molecules to distant codes, resulting in an incoherent latent space that does not respect the structural similarities between molecules. To address this shortcoming we propose Semantically-Aware Latent Space Autoencoder (SALSA), a transformer-autoencoder modified with a contrastive task, tailored specifically to learn graph-to-graph similarity between molecules. Formally, the contrastive objective is to map structurally similar molecules (separated by a single graph edit) to nearby codes in the latent space. To accomplish this, we generate a novel dataset comprised of sets of structurally similar molecules and opt for a supervised contrastive loss that is able to incorporate full sets of positive samples. We compare SALSA to its ablated counterparts, and show empirically that the composed training objective (reconstruction and contrastive task) leads to a higher quality latent space that is more 1) structurally-aware, 2) semantically continuous, and 3) property-aware.
Submitted 4 October, 2023;
originally announced October 2023.
-
Speech Wikimedia: A 77 Language Multilingual Speech Dataset
Authors:
Rafael Mosquera Gómez,
Julián Eusse,
Juan Ciro,
Daniel Galvez,
Ryan Hileman,
Kurt Bollacker,
David Kanter
Abstract:
The Speech Wikimedia Dataset is a publicly available compilation of audio with transcriptions extracted from Wikimedia Commons. It includes 1780 hours (195 GB) of CC-BY-SA licensed transcribed speech from a diverse set of scenarios and speakers, in 77 different languages. Each audio file has one or more transcriptions in different languages, making this dataset suitable for training speech recognition, speech translation, and machine translation models.
Submitted 29 August, 2023;
originally announced August 2023.
-
Selecting Robust Features for Machine Learning Applications using Multidata Causal Discovery
Authors:
Saranya Ganesh S.,
Tom Beucler,
Frederick Iat-Hin Tam,
Milton S. Gomez,
Jakob Runge,
Andreas Gerhardus
Abstract:
Robust feature selection is vital for creating reliable and interpretable Machine Learning (ML) models. When designing statistical prediction models in cases where domain knowledge is limited and underlying interactions are unknown, choosing the optimal set of features is often difficult. To mitigate this issue, we introduce a Multidata (M) causal feature selection approach that simultaneously processes an ensemble of time series datasets and produces a single set of causal drivers. This approach uses the causal discovery algorithms PC1 or PCMCI that are implemented in the Tigramite Python package. These algorithms utilize conditional independence tests to infer parts of the causal graph. Our causal feature selection approach filters out causally-spurious links before passing the remaining causal features as inputs to ML models (Multiple linear regression, Random Forest) that predict the targets. We apply our framework to the statistical intensity prediction of Western Pacific Tropical Cyclones (TC), for which it is often difficult to accurately choose drivers and their dimensionality reduction (time lags, vertical levels, and area-averaging). Using more stringent significance thresholds in the conditional independence tests helps eliminate spurious causal relationships, thus helping the ML model generalize better to unseen TC cases. M-PC1 with a reduced number of features outperforms M-PCMCI, non-causal ML, and other feature selection methods (lagged correlation, random), even slightly outperforming feature selection based on eXplainable Artificial Intelligence. The optimal causal drivers obtained from our causal feature selection help improve our understanding of underlying relationships and suggest new potential drivers of TC intensification.
Submitted 30 June, 2023; v1 submitted 11 April, 2023;
originally announced April 2023.
-
$β$-Variational autoencoders and transformers for reduced-order modelling of fluid flows
Authors:
Alberto Solera-Rico,
Carlos Sanmiguel Vila,
M. A. Gómez,
Yuning Wang,
Abdulrahman Almashjary,
Scott T. M. Dawson,
Ricardo Vinuesa
Abstract:
Variational autoencoder (VAE) architectures have the potential to develop reduced-order models (ROMs) for chaotic fluid flows. We propose a method for learning compact and near-orthogonal ROMs using a combination of a $β$-VAE and a transformer, tested on numerical data from a two-dimensional viscous flow in both periodic and chaotic regimes. The $β$-VAE is trained to learn a compact latent representation of the flow velocity, and the transformer is trained to predict the temporal dynamics in latent space. Using the $β$-VAE to learn disentangled representations in latent-space, we obtain a more interpretable flow model with features that resemble those observed in the proper orthogonal decomposition, but with a more efficient representation. Using Poincaré maps, the results show that our method can capture the underlying dynamics of the flow outperforming other prediction models. The proposed method has potential applications in other fields such as weather forecasting, structural dynamics or biomedical engineering.
Submitted 15 November, 2023; v1 submitted 7 April, 2023;
originally announced April 2023.
-
Abstract homogeneity of interval-valued functions
Authors:
Ana Shirley Monteiro,
Regivan Santiago,
Radko Mesiar,
Marisol Gomez,
Martin Papco,
Mikel Ferrero-Jaurrieta,
Humberto Bustince
Abstract:
In this paper we develop the idea of abstract homogeneity in the context of interval-valued (IV) functions endowed with admissible orders and investigate some of its properties.
Submitted 27 March, 2023; v1 submitted 11 October, 2022;
originally announced October 2022.
-
A Systematic Literature Review of Game-based Assessment Studies: Trends and Challenges
Authors:
Manuel J. Gomez,
José A. Ruipérez-Valiente,
Félix J. García Clemente
Abstract:
Technology has become an essential part of our everyday life, and its use in educational environments keeps growing. In addition, games are one of the most popular activities across cultures and ages, and there is ample evidence that supports the benefits of using games for assessment. This field is commonly known as game-based assessment (GBA), which refers to the use of games to assess learners' competencies, skills, or knowledge. This paper analyzes the current status of the GBA field by performing the first systematic literature review on empirical GBA studies. It is based on 65 research papers that used digital GBAs to determine: (1) the context where the study has been applied; (2) the primary purpose; (3) the domain of the game used; (4) game/tool availability; (5) the size of the data sample; (6) the computational methods and algorithms applied; (7) the targeted stakeholders of the study; and (8) what limitations and challenges are reported by authors. Based on the categories established and our analysis, the findings suggest that GBAs are mainly used in K-16 education and for assessment purposes, and that most GBAs focus on assessing STEM content, and cognitive and soft skills. Furthermore, the current limitations indicate that future GBA research would benefit from the use of bigger data samples and more specialized algorithms. Based on our results, we discuss current trends in the field and open challenges (including replication and validation problems), providing recommendations for the future research agenda of the GBA field.
Submitted 2 December, 2022; v1 submitted 15 July, 2022;
originally announced July 2022.
-
HealNet -- Self-Supervised Acute Wound Heal-Stage Classification
Authors:
Héctor Carrión,
Mohammad Jafari,
Hsin-Ya Yang,
Roslyn Rivkah Isseroff,
Marco Rolandi,
Marcella Gomez,
Narges Norouzi
Abstract:
Identifying, tracking, and predicting wound heal-stage progression is a fundamental task towards proper diagnosis, effective treatment, facilitating healing, and reducing pain. Traditionally, a medical expert might observe a wound to determine the current healing state and recommend treatment. However, sourcing experts who can produce such a diagnosis solely from visual indicators can be difficult, time-consuming and expensive. In addition, lesions may take several weeks to undergo the healing process, demanding resources to monitor and diagnose continually. Automating this task can be challenging; datasets that follow wound progression from onset to maturation are small, rare, and often collected without computer vision in mind. To tackle these challenges, we introduce a self-supervised learning scheme composed of (a) learning embeddings of wound's temporal dynamics, (b) clustering for automatic stage discovery, and (c) fine-tuned classification. The proposed self-supervised and flexible learning framework is biologically inspired and trained on a small dataset with zero human labeling. The HealNet framework achieved high pre-text and downstream classification accuracy; when evaluated on held-out test data, HealNet achieved 97.7% pre-text accuracy and 90.62% heal-stage classification accuracy.
Submitted 23 June, 2022; v1 submitted 21 June, 2022;
originally announced June 2022.
-
Federated Data Analytics: A Study on Linear Models
Authors:
Xubo Yue,
Raed Al Kontar,
Ana María Estrada Gómez
Abstract:
As edge devices become increasingly powerful, data analytics are gradually moving from a centralized to a decentralized regime where edge compute resources are exploited to process more of the data locally. This regime of analytics is coined as federated data analytics (FDA). In spite of the recent success stories of FDA, most literature focuses exclusively on deep neural networks. In this work, we take a step back to develop an FDA treatment for one of the most fundamental statistical models: linear regression. Our treatment is built upon hierarchical modeling that allows borrowing strength across multiple groups. To this end, we propose two federated hierarchical model structures that provide a shared representation across devices to facilitate information sharing. Notably, our proposed frameworks are capable of providing uncertainty quantification, variable selection, hypothesis testing and fast adaptation to new unseen data. We validate our methods on a range of real-life applications including condition monitoring for aircraft engines. The results show that our FDA treatment for linear models can serve as a competing benchmark model for future development of federated algorithms.
Submitted 15 June, 2022;
originally announced June 2022.
-
Characterizing player's playing styles based on Player Vectors for each playing position in the Chinese Football Super League
Authors:
Yuesen Li,
Shouxin Zong,
Yanfei Shen,
Zhiqiang Pu,
Miguel-Ángel Gómez,
Yixiong Cui
Abstract:
Characterizing playing style is important for football clubs on scouting, monitoring and match preparation. Previous studies considered a player's style as a combination of technical performances, failing to consider the spatial information. Therefore, this study aimed to characterize the playing styles of each playing position in the Chinese Football Super League (CSL) matches, integrating a recently adopted Player Vectors framework. Data of 960 matches from 2016-2019 CSL were used. Match ratings, and ten types of match events with the corresponding coordinates for all the lineup players whose on-pitch time exceeded 45 minutes were extracted. Players were first clustered into 8 positions. A player vector was constructed for each player in each match based on the Player Vectors using Nonnegative Matrix Factorization (NMF). Another NMF process was run on the player vectors to extract different types of playing styles. The resulting player vectors discovered 18 different playing styles in the CSL. Six performance indicators of each style were investigated to observe their contributions. In general, the playing styles of forwards and midfielders are in line with football performance evolution trends, while the styles of defenders should be reconsidered. Multifunctional playing styles were also found in high rated CSL players.
Submitted 7 July, 2022; v1 submitted 5 May, 2022;
originally announced May 2022.
-
Large Scale Analysis of Open MOOC Reviews to Support Learners' Course Selection
Authors:
Manuel J. Gomez,
Mario Calderón,
Victor Sánchez,
Félix J. García Clemente,
José A. Ruipérez-Valiente
Abstract:
The recent pandemic has changed the way we see education. It is not surprising that children and college students are not the only ones using online education. Millions of adults have signed up for online classes and courses in recent years, and MOOC providers, such as Coursera or edX, are reporting millions of new users signing up on their platforms. However, students do face some challenges when choosing courses. Though online review systems are standard among many verticals, no standardized or fully decentralized review systems exist in the MOOC ecosystem. In this vein, we believe that there is an opportunity to leverage available open MOOC reviews in order to build simpler and more transparent reviewing systems, allowing users to really identify the best courses out there. Specifically, in our research we analyze 2.4 million reviews (which is the largest MOOC reviews dataset used until now) from five different platforms in order to determine the following: (1) if the numeric ratings provide discriminant information to learners, (2) if NLP-driven sentiment analysis on textual reviews could provide valuable information to learners, (3) if we can leverage NLP-driven topic finding techniques to infer themes that could be important for learners, and (4) if we can use these models to effectively characterize MOOCs based on the open reviews. Results show that numeric ratings are clearly biased (63% of them are 5-star ratings), and the topic modeling reveals some interesting topics related to course advertisements, the real applicability, or the difficulty of the different courses. We expect our study to shed some light on the area and promote a more transparent approach in online education reviews, which are becoming more and more popular as we enter the post-pandemic era.
Submitted 11 January, 2022;
originally announced January 2022.
-
On the Bound of Energy Consumption in Cellular IoT Networks
Authors:
Bassel Al Homssi,
Akram Al-Hourani,
Sathyanarayanan Chandrasekharan,
Karina Mabell Gomez,
Sithamparanathan Kandeepan
Abstract:
Billions of sensors are expected to be connected to the Internet through the emerging Internet of Things (IoT) technologies. Many of these sensors will primarily be connected using wireless technologies powered using batteries as their sole energy source which makes it paramount to optimize their energy consumption. In this paper, we provide an analytic framework of the energy-consumption profile and its lower bound for an IoT end device formulated based on Shannon capacity. We extend the study to model the average energy-consumption performance based on the random geometric distribution of IoT gateways by utilizing tools from stochastic geometry and real measurements of interference in the ISM-band. Experimental data, interference measurements and Monte-Carlo simulations are presented to validate the plausibility of the proposed analytic framework, where results demonstrate that the current network infrastructures performance is bounded between two extreme geometric models. This study considers interference seen by a gateway regardless of its source.
Submitted 10 June, 2021;
originally announced June 2021.
-
PANACEA cough sound-based diagnosis of COVID-19 for the DiCOVA 2021 Challenge
Authors:
Madhu R. Kamble,
Jose A. Gonzalez-Lopez,
Teresa Grau,
Juan M. Espin,
Lorenzo Cascioli,
Yiqing Huang,
Alejandro Gomez-Alanis,
Jose Patino,
Roberto Font,
Antonio M. Peinado,
Angel M. Gomez,
Nicholas Evans,
Maria A. Zuluaga,
Massimiliano Todisco
Abstract:
The COVID-19 pandemic has led to the saturation of public health services worldwide. In this scenario, the early diagnosis of SARS-CoV-2 infections can help to stop or slow the spread of the virus and to manage the demand upon health services. This is especially important when resources are also being stretched by heightened demand linked to other seasonal diseases, such as the flu. In this context, the organisers of the DiCOVA 2021 challenge have collected a database with the aim of diagnosing COVID-19 through the use of coughing audio samples. This work presents the details of the automatic system for COVID-19 detection from cough recordings presented by team PANACEA. This team consists of researchers from two European academic institutions and one company: EURECOM (France), University of Granada (Spain), and Biometric Vox S.L. (Spain). We developed several systems based on established signal processing and machine learning methods. Our best system employs a Teager energy operator cepstral coefficients (TECCs) based frontend and a Light gradient boosting machine (LightGBM) backend. The AUC obtained by this system on the test set is 76.31%, which corresponds to a 10% improvement over the official baseline.
Submitted 7 June, 2021;
originally announced June 2021.
-
Sketches image analysis: Web image search engine using LSH index and DNN InceptionV3
Authors:
Alessio Schiavo,
Filippo Minutella,
Mattia Daole,
Marsha Gomez Gomez
Abstract:
The adoption of an appropriate approximate similarity search method is an essential prerequisite for developing a fast and efficient CBIR system, especially when dealing with large amounts of data. In this study we implement a web image search engine on top of a Locality Sensitive Hashing (LSH) Index to allow fast similarity search on deep features. Specifically, we exploit transfer learning for deep feature extraction from images. Firstly, we adopt InceptionV3 pretrained on ImageNet as a feature extractor; secondly, we try out several CNNs built on top of InceptionV3 as a convolutional base fine-tuned on our dataset. In both cases we index the extracted features within our LSH index implementation so as to compare the retrieval performance with and without fine-tuning. In our approach we try out two different LSH implementations: the first one working with real-number feature vectors and the second one with the binary transposed version of those vectors. Interestingly, we obtain the best performance when using the binary LSH, reaching almost the same result, in terms of mean average precision, obtained by performing a sequential scan of the features, thus avoiding the bias introduced by the LSH index. Lastly, we carry out a class-by-class performance analysis in terms of recall against mAP, highlighting, as expected, a strong positive correlation between the two.
Submitted 3 May, 2021;
originally announced May 2021.
-
Collective Awareness for Abnormality Detection in Connected Autonomous Vehicles
Authors:
Divya Thekke Kanapram,
Fabio Patrone,
Pablo Marin-Plaza,
Mario Marchese,
Eliane L. Bodanese,
Lucio Marcenaro,
David Martín Gómez,
Carlo Regazzoni
Abstract:
The advancements in connected and autonomous vehicles demand tools that give agents the capability to be aware of and predict their own states and context dynamics. This article presents a novel approach to develop an initial level of collective awareness in a network of intelligent agents. A specific collective self-awareness functionality is considered, namely, agent-centered detection of abnormal situations present in the environment around any agent in the network. Moreover, the agent should be capable of analyzing how such abnormalities can influence the future actions of each agent. Data-driven dynamic Bayesian network (DBN) models learned from time series of sensory data recorded during the realization of tasks (agent network experiences) are used here for abnormality detection and prediction. A set of DBNs, each related to an agent, is used to allow the agents in the network to each become synchronously aware of possible abnormalities occurring when available models are used on a new instance of the task for which DBNs have been learned. A growing neural gas (GNG) algorithm is used to learn the node variables and conditional probabilities linking nodes in the DBN models; a Markov jump particle filter (MJPF) is employed for state estimation and abnormality detection in each agent using learned DBNs as filter parameters. Performance metrics are discussed to assess the algorithms' reliability and accuracy. The impact of the communication channel used by the network to share the data sensed in a distributed way by each agent is also evaluated. The IEEE 802.11p protocol standard has been considered for communication among agents. Real data sets acquired by autonomous vehicles performing different tasks in a controlled environment are also used.
Submitted 28 October, 2020;
originally announced October 2020.
-
Understanding Cloud Workloads Performance in a Production like Environment
Authors:
Lucia Pons,
Josué Feliu,
José Puche,
Chaoyi Huang,
Salvador Petit,
Julio Pons,
María E. Gómez,
Julio Sahuquillo
Abstract:
Understanding inter-VM interference is of paramount importance to provide a sound knowledge and understand where performance degradation comes from in the current public cloud. With this aim, this paper devises a workload taxonomy that classifies applications according to how the major system resources affect their performance (e.g., tail latency) as a function of the level of load (e.g., QPS). After that, we present three main studies addressing three major concerns to improve the cloud performance: impact of the level of load on performance, impact of hyper-threading on performance, and impact of limiting the major system resources (e.g., last level cache) on performance. In all these studies we identified important findings that we hope help cloud providers improve their system utilization.
Submitted 10 October, 2020;
originally announced October 2020.
-
Silent Speech Interfaces for Speech Restoration: A Review
Authors:
Jose A. Gonzalez-Lopez,
Alejandro Gomez-Alanis,
Juan M. Martín-Doñas,
José L. Pérez-Córdoba,
Angel M. Gomez
Abstract:
This review summarises the status of silent speech interface (SSI) research. SSIs rely on non-acoustic biosignals generated by the human body during speech production to enable communication whenever normal verbal communication is not possible or not desirable. In this review, we focus on the first case and present latest SSI research aimed at providing new alternative and augmentative communication methods for persons with severe speech disorders. SSIs can employ a variety of biosignals to enable silent communication, such as electrophysiological recordings of neural activity, electromyographic (EMG) recordings of vocal tract movements or the direct tracking of articulator movements using imaging techniques. Depending on the disorder, some sensing techniques may be better suited than others to capture speech-related information. For instance, EMG and imaging techniques are well suited for laryngectomised patients, whose vocal tract remains almost intact but are unable to speak after the removal of the vocal folds, but fail for severely paralysed individuals. From the biosignals, SSIs decode the intended message, using automatic speech recognition or speech synthesis algorithms. Despite considerable advances in recent years, most present-day SSIs have only been validated in laboratory settings for healthy users. Thus, as discussed in this paper, a number of challenges remain to be addressed in future research before SSIs can be promoted to real-world applications. If these issues can be addressed successfully, future SSIs will improve the lives of persons with severe speech impairments by restoring their communication capabilities.
Submitted 27 September, 2020; v1 submitted 4 September, 2020;
originally announced September 2020.
-
Quantification of MagLIF morphology using the Mallat Scattering Transformation
Authors:
Michael E. Glinsky,
Thomas W. Moore,
William E. Lewis,
Matthew R. Weis,
Christopher A. Jennings,
David J. Ampleford,
Patrick F. Knapp,
Eric C. Harding,
Matthew R. Gomez,
Adam J. Harvey-Thompson
Abstract:
The morphology of the stagnated plasma resulting from Magnetized Liner Inertial Fusion (MagLIF) is measured by imaging the self-emission x-rays coming from the multi-keV plasma. Equivalent diagnostic response can be generated by integrated radiation-magnetohydrodynamic (rad-MHD) simulations from programs such as HYDRA and GORGON. There have been only limited quantitative ways to compare the image morphology, that is the texture, of simulations and experiments. We have developed a metric of image morphology based on the Mallat Scattering Transformation (MST), a transformation that has proved to be effective at distinguishing textures, sounds, and written characters. This metric is designed, demonstrated, and refined by classifying ensembles (i.e., classes) of synthetic stagnation images, and by regressing an ensemble of synthetic stagnation images to the morphology (i.e., model) parameters used to generate the synthetic images. We use this metric to quantitatively compare simulations to experimental images, experimental images to each other, and to estimate the morphological parameters of the experimental images with uncertainty. This coordinate space has proved very adept at doing a sophisticated relative background subtraction in the MST space. This was needed to compare the experimental self emission images to the rad-MHD simulation images.
Submitted 15 October, 2020; v1 submitted 13 April, 2020;
originally announced May 2020.
-
A Smartphone-Based Skin Disease Classification Using MobileNet CNN
Authors:
Jessica Velasco,
Cherry Pascion,
Jean Wilmar Alberio,
Jonathan Apuang,
John Stephen Cruz,
Mark Angelo Gomez,
Benjamin Jr. Molina,
Lyndon Tuala,
August Thio-ac,
Romeo Jr. Jorda
Abstract:
The MobileNet model was used, applying transfer learning on 7 skin diseases, to create a skin disease classification system as an Android application. The proponents gathered a total of 3,406 images; the dataset is considered imbalanced because of the unequal number of images across its classes. Different sampling methods and preprocessing of the input data were explored to further improve the accuracy of MobileNet. Using the under-sampling method and the default preprocessing of input data achieved 84.28% accuracy, while using the imbalanced dataset and default preprocessing of input data achieved 93.6% accuracy. Then, the researchers explored oversampling the dataset, and the model attained 91.8% accuracy. Lastly, using the oversampling technique and data augmentation when preprocessing the input data provided 94.4% accuracy, and this model was deployed in the developed Android application.
Submitted 13 November, 2019;
originally announced November 2019.
-
App Store 2.0: From Crowd Information to Actionable Feedback in Mobile Ecosystems
Authors:
María Gómez,
Bram Adams,
Walid Maalej,
Martin Monperrus,
Romain Rouvoy
Abstract:
Given the increasing competition in mobile app ecosystems, improving the experience of users has become a major goal for app vendors. This article introduces a visionary app store, called APP STORE 2.0, which exploits crowdsourced information about apps, devices and users to increase the overall quality of the delivered mobile apps. We sketch a blueprint architecture of the envisioned app stores and discuss the different kinds of actionable feedbacks that app stores can generate using crowdsourced information.
Submitted 2 July, 2018;
originally announced July 2018.
-
Amanuensis: The Programmer's Apprentice
Authors:
Thomas Dean,
Maurice Chiang,
Marcus Gomez,
Nate Gruver,
Yousef Hindy,
Michelle Lam,
Peter Lu,
Sophia Sanchez,
Rohun Saxena,
Michael Smith,
Lucy Wang,
Catherine Wong
Abstract:
This document provides an overview of the material covered in a course taught at Stanford in the spring quarter of 2018. The course draws upon insight from cognitive and systems neuroscience to implement hybrid connectionist and symbolic reasoning systems that leverage and extend the state of the art in machine learning by integrating human and machine intelligence. As a concrete example we focus on digital assistants that learn from continuous dialog with an expert software engineer while providing initial value as powerful analytical, computational and mathematical savants. Over time these savants learn cognitive strategies (domain-relevant problem solving skills) and develop intuitions (heuristics and the experience necessary for applying them) by learning from their expert associates. By doing so these savants elevate their innate analytical skills allowing them to partner on an equal footing as versatile collaborators - effectively serving as cognitive extensions and digital prostheses, thereby amplifying and emulating their human partner's conceptually-flexible thinking patterns and enabling improved access to and control over powerful computing resources.
Submitted 8 November, 2018; v1 submitted 29 June, 2018;
originally announced July 2018.
-
On Computing the Dollo-1 phylogeny in polynomial time
Authors:
Paola Bonizzoni,
Gianluca Della Vedova,
Mauricio Soto Gomez,
Gabriella Trucco
Abstract:
The Dollo model for reconstructing evolutionary trees from binary characters has been proposed as a generalization of the infinite sites model, also known as the Perfect Phylogeny. In particular, the Dollo model is considered more realistic than the Perfect Phylogeny for inferring the evolution of tumor mutations. In the case of binary matrices, the Dollo-$k$ model requires an evolutionary tree in which each character, corresponding to a column in the input matrix, may change from $0$ to $1$ at most once, and from $1$ to $0$ at most $k$ times throughout the entire tree. Given a binary matrix, the problem of deciding whether there exists a Dollo-$k$ tree compatible with the matrix is NP-complete for any fixed $k \geq 2$, while computing a Dollo-$0$ tree corresponds to the Perfect Phylogeny decision problem, which admits a simple linear-time algorithm. The Dollo-$1$ tree problem corresponds to the Persistent Phylogeny problem, whose computational complexity, albeit under an equivalent formulation, was posed as an open question 20 years ago. We solve this problem by presenting a polynomial-time algorithm for the Persistent Phylogeny problem. Our solution relies on efficiently solving a specific class of binary matrices, represented as bipartite graphs called \emph{skeleton graphs}, or simply skeletons. In these graphs, characters are \emph{maximal}, that is, their corresponding sets of species are not related by inclusion.
Submitted 16 June, 2025; v1 submitted 3 November, 2016;
originally announced November 2016.