Search | arXiv e-print repository

Humanity's Last Exam

Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes, et al. (1084 additional authors not shown)

Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.

Submitted 19 April, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

Comments: 29 pages, 6 figures

arXiv:2410.00693 [pdf, ps, other]

Optimizing Photoplethysmography-Based Sleep Staging Models by Leveraging Temporal Context for Wearable Devices Applications

Authors: Joseph A. P. Quino, Diego A. C. Cardenas, Marcelo A. F. Toledo, Felipe M. Dias, Estela Ribeiro, Jose E. Krieger, Marco A. Gutierrez

Abstract: Accurate sleep stage classification is crucial for diagnosing sleep disorders and evaluating sleep quality. While polysomnography (PSG) remains the gold standard, photoplethysmography (PPG) is more practical due to its affordability and widespread use in wearable devices. However, state-of-the-art sleep staging methods often require prolonged continuous signal acquisition, making them impractical for wearable devices due to high energy consumption. Shorter signal acquisitions are more feasible but less accurate. Our work proposes an adapted sleep staging model based on top-performing state-of-the-art methods and evaluates its performance with different PPG segment sizes. We concatenate 30-second PPG segments over 15-minute intervals to leverage longer segment contexts. This approach achieved an accuracy of 0.75, a Cohen's Kappa of 0.60, an F1-Weighted score of 0.74, and an F1-Macro score of 0.60. Although reducing segment size decreased sensitivity for deep and REM stages, our strategy outperformed single 30-second window methods, particularly for these stages.
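The grouping of 30-second segments into 15-minute contexts described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the use of non-overlapping windows is an assumption, since the abstract does not say whether contexts overlap.

```python
def context_windows(epochs, epoch_s=30, context_s=15 * 60):
    """Group consecutive epochs into non-overlapping context windows.

    Assumption: windows do not overlap and a trailing partial window is dropped.
    """
    per_window = context_s // epoch_s  # 900 s / 30 s = 30 epochs per context
    last_start = len(epochs) - per_window
    return [epochs[i:i + per_window] for i in range(0, last_start + 1, per_window)]


def concatenate(window):
    """Flatten one context window into a single sample sequence."""
    return [sample for epoch in window for sample in epoch]


# Toy data: 65 epochs of 4 samples each (real PPG epochs would be much longer).
epochs = [[i] * 4 for i in range(65)]
windows = context_windows(epochs)
print(len(windows), len(windows[0]), len(concatenate(windows[0])))
```

With 65 toy epochs this yields two full contexts of 30 epochs each; the remaining 5 epochs are dropped under the stated assumption.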

Submitted 1 October, 2024; originally announced October 2024.

Comments: 11 pages, 5 figures, 1 table

arXiv:2404.16049 [pdf, other]

doi 10.1088/1361-6579/adcb86

Exploring the limitations of blood pressure estimation using the photoplethysmography signal

Authors: Felipe M. Dias, Diego A. C. Cardenas, Marcelo A. F. Toledo, Filipe A. C. Oliveira, Estela Ribeiro, Jose E. Krieger, Marco A. Gutierrez

Abstract: Hypertension, a leading contributor to cardiovascular morbidity, underscores the need for accurate and continuous blood pressure (BP) monitoring. Photoplethysmography (PPG) presents a promising approach to this end. However, the precision of BP estimates derived from PPG signals has been the subject of ongoing debate, necessitating a comprehensive evaluation of their effectiveness and constraints. We developed a calibration-based Siamese ResNet model for BP estimation, using a signal input paired with a reference BP reading. We compared the use of normalized PPG (N-PPG) against the normalized Invasive Arterial Blood Pressure (N-IABP) signals as input. The N-IABP signals do not directly present systolic and diastolic values but theoretically provide a more accurate BP measure than PPG signals, since they come from a direct pressure sensor inside the body. Our strategy establishes a critical benchmark for PPG performance, realistically calibrating expectations for PPG's BP estimation capabilities. We also compared the performance of our models using different signal-filtering conditions to evaluate the impact of filtering on the results. We evaluated our method using the AAMI and the BHS standards employing the VitalDB dataset. The N-IABP signals meet AAMI standards for both Systolic Blood Pressure (SBP) and Diastolic Blood Pressure (DBP), with errors of 1.29 ± 6.33 mmHg and 1.17 ± 5.78 mmHg for systolic and diastolic pressure, respectively, for the raw N-IABP signal. In contrast, N-PPG signals, in their best setup, performed worse than N-IABP, presenting 1.49 ± 11.82 mmHg and 0.89 ± 7.27 mmHg for systolic and diastolic pressure, respectively. Our findings highlight the potential and limitations of employing PPG for BP estimation, showing that these signals contain information correlated to BP but may not be sufficient for predicting it accurately.
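To make the AAMI comparison concrete, the sketch below checks the reported errors against the commonly cited AAMI accuracy limits (mean error within 5 mmHg, standard deviation within 8 mmHg). The thresholds are stated here as an assumption about the criterion the authors applied; the full standard also imposes cohort-size requirements that this check ignores. The function name is hypothetical.

```python
AAMI_MAX_MEAN_ERROR = 5.0  # mmHg, assumed limit on the absolute mean error
AAMI_MAX_STD = 8.0         # mmHg, assumed limit on the error standard deviation


def meets_aami(mean_error, std_error):
    """Return True if a (mean, std) error pair is within the assumed AAMI limits."""
    return abs(mean_error) <= AAMI_MAX_MEAN_ERROR and std_error <= AAMI_MAX_STD


# (mean error, standard deviation) in mmHg, taken from the abstract.
reported = {
    "N-IABP SBP": (1.29, 6.33),
    "N-IABP DBP": (1.17, 5.78),
    "N-PPG SBP": (1.49, 11.82),
    "N-PPG DBP": (0.89, 7.27),
}

for name, (mean_error, std_error) in reported.items():
    print(name, "passes" if meets_aami(mean_error, std_error) else "fails")
```

Under these limits the N-PPG systolic estimate fails only on the standard deviation (11.82 mmHg), which is consistent with the abstract's conclusion that PPG carries BP-related information but not enough for accurate prediction.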

Submitted 9 April, 2024; originally announced April 2024.

Comments: 17 pages, 7 figures, 3 tables

arXiv:2401.14926 [pdf, other]

Frictional contact of soft polymeric shells

Authors: Riad Sahli, Jeppe Mikkelsen, Mathias Satherstrom Boye, Marcelo A. Dias, Ramin Aghababaei

Abstract: The classical Hertzian contact model establishes a monotonic correlation between contact force and area. Here, we showed that the interplay between local friction and structural instability can deliberately lead to unconventional contact behavior when a soft elastic shell comes into contact with a flat surface. The deviation from Hertzian contact first arises from bending within the contact area, followed by the second transition induced by buckling, resulting in a notable decrease in the contact area despite increased contact force. Friction delays both transitions and introduces hysteresis during unloading. However, a high amount of friction suppresses both buckling and dissipation. Different contact regimes are discussed in terms of rolling and sliding mechanisms, providing insights for tailoring contact behaviors in soft shells.

Submitted 26 January, 2024; originally announced January 2024.

arXiv:2312.14321 [pdf, other]

A Novel ML-driven Test Case Selection Approach for Enhancing the Performance of Grammatical Evolution

Authors: Krishn Kumar Gupt, Meghana Kshirsagar, Douglas Mota Dias, Joseph P. Sullivan, Conor Ryan

Abstract: Computational cost in metaheuristics such as Evolutionary Algorithms (EAs) is often a major concern, particularly with their ability to scale. In data-based training, traditional EAs typically use a significant portion, if not all, of the dataset for model training and fitness evaluation in each generation. This makes EAs suffer from high computational costs incurred during the fitness evaluation of the population, particularly when working with large datasets. To mitigate this issue, we propose a Machine Learning (ML)-driven Distance-based Selection (DBS) algorithm that reduces the fitness evaluation time by optimizing test cases. We test our algorithm by applying it to 24 benchmark problems from Symbolic Regression (SR) and digital circuit domains and then using Grammatical Evolution (GE) to train models using the reduced dataset. We use GE to test DBS on SR and produce a system flexible enough to test it on digital circuit problems further. The quality of the solutions is tested and compared against the conventional training method to measure the coverage of training data selected using DBS, i.e., how well the subset matches the statistical properties of the entire dataset. Moreover, the effect of optimized training data on run time and the effective size of the evolved solutions is analyzed. Experimental and statistical evaluations of the results show our method empowered GE to yield superior or comparable solutions to the baseline (using the full datasets) with smaller sizes and demonstrates computational efficiency in terms of speed.

Submitted 21 December, 2023; originally announced December 2023.

arXiv:2311.15740 [pdf, other]

doi 10.1145/3606705

Optimization of Image Processing Algorithms for Character Recognition in Cultural Typewritten Documents

Authors: Mariana Dias, Carla Teixeira Lopes

Abstract: Linked Data is used in various fields as a new way of structuring and connecting data. Cultural heritage institutions have been using linked data to improve archival descriptions and facilitate the discovery of information. Most archival records have digital representations of physical artifacts in the form of scanned images that are non-machine-readable. Optical Character Recognition (OCR) recognizes text in images and translates it into machine-encoded text. This paper evaluates the impact of image processing methods and parameter tuning in OCR applied to typewritten cultural heritage documents. The approach uses a multi-objective problem formulation to minimize Levenshtein edit distance and maximize the number of words correctly identified with a non-dominated sorting genetic algorithm (NSGA-II) to tune the methods' parameters. Evaluation results show that parameterization by digital representation typology benefits the performance of image pre-processing algorithms in OCR. Furthermore, our findings suggest that employing image pre-processing algorithms in OCR might be more suitable for typologies where the text recognition task without pre-processing does not produce good results. In particular, Adaptive Thresholding, Bilateral Filter, and Opening are the best-performing algorithms for the theatre plays' covers, letters, and overall dataset, respectively, and should be applied before OCR to improve its performance.
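The two objectives named above can be illustrated with a small sketch. The Levenshtein distance is the standard dynamic-programming formulation; the word-matching rule (a multiset intersection of whitespace-separated tokens) is an assumption made for illustration, since the abstract does not define how correctly identified words are counted. The NSGA-II tuning loop itself is not shown, and the function names are hypothetical.

```python
from collections import Counter


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def ocr_objectives(ocr_text, ground_truth):
    """Return (edit distance to minimize, matched word count to maximize).

    Assumption: a word counts as correct if it appears in both texts,
    matched as multisets of whitespace-separated tokens.
    """
    matched = Counter(ocr_text.split()) & Counter(ground_truth.split())
    return levenshtein(ocr_text, ground_truth), sum(matched.values())


print(ocr_objectives("the o1d theatre", "the old theatre"))
```

A multi-objective optimizer such as NSGA-II would evaluate a pair like this for each candidate parameter setting and keep the non-dominated settings.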

Submitted 27 November, 2023; originally announced November 2023.

Comments: 25 pages, 4 figures

Journal ref: J. Comput. Cult. Herit. 16, 4, Article 77 (December 2023), 25 pages

arXiv:2308.12973 [pdf, other]

Proofs of valid categorical syllogisms in one diagrammatic and two symbolic axiomatic systems

Authors: Antonielly Garcia Rodrigues, Eduardo Mario Dias

Abstract: Gottfried Leibniz embarked on a research program to prove all the Aristotelian categorical syllogisms by diagrammatic and algebraic methods. He succeeded in proving them by means of Euler diagrams, but didn't produce a manuscript with their algebraic proofs. We demonstrate how key excerpts scattered across various drafts on logic by Leibniz contained sufficient ingredients to prove them by an algebraic method -- which we call the Leibniz-Cayley (LC) system -- without having to make use of the more expressive and complex machinery of first-order quantificational logic. In addition, we prove the classic categorical syllogisms again by a relational method -- which we call the McColl-Ladd (ML) system -- employing categorical relations studied by Hugh McColl and Christine Ladd. Finally, we show the connection of ML and LC with Boolean algebra, proving that ML is a consequence of LC, and that LC is a consequence of the Boolean lattice axioms, thus establishing Leibniz's historical priority over George Boole in characterizing and applying (a sufficient fragment of) Boolean algebra to effectively tackle categorical syllogistic.

Submitted 17 August, 2023; originally announced August 2023.

Comments: 66 pages, 9 figures (some of which include subfigures), 5 tables (one of which includes 2 subtables). A cut-down version of this article, which removes the discussion on diagrammatic logic with Euler diagrams, was submitted to the "History and Philosophy of Logic" journal with a different title

MSC Class: 03-02; 03-03; 03-01; 03G05; 01-02; 01A45; 01A55; 97E30 ACM Class: F.4.1; I.2.4; I.2.3

arXiv:2308.05759 [pdf, ps, other]

doi 10.5753/sbcas.2024.1872

A machine-learning sleep-wake classification model using a reduced number of features derived from photoplethysmography and activity signals

Authors: Douglas A. Almeida, Felipe M. Dias, Marcelo A. F. Toledo, Diego A. C. Cardenas, Filipe A. C. Oliveira, Estela Ribeiro, Jose E. Krieger, Marco A. Gutierrez

Abstract: Sleep is a crucial aspect of our overall health and well-being. It plays a vital role in regulating our mental and physical health, impacting everything from our mood, memory, and cognitive function to our physical resilience and immune system. The classification of sleep stages is a mandatory step to assess sleep quality, providing the metrics to estimate the quality of sleep and how well our body is functioning during this essential period of rest. Photoplethysmography (PPG) has been demonstrated to be an effective signal for sleep stage inference, meaning it can be used on its own or in combination with other signals to determine sleep stage. This information is valuable in identifying potential sleep issues and developing strategies to improve sleep quality and overall health. In this work, we present a machine learning sleep-wake classification model based on the eXtreme Gradient Boosting (XGBoost) algorithm and features extracted from PPG signal and activity counts. The performance of our method was comparable to current state-of-the-art methods with a Sensitivity of 91.15 $\pm$ 1.16%, Specificity of 53.66 $\pm$ 1.12%, F1-score of 83.88 $\pm$ 0.56%, and Kappa of 48.0 $\pm$ 0.86%. Our method offers a significant improvement over other approaches as it uses a reduced number of features, making it suitable for implementation in wearable devices that have limited computational power.

Submitted 7 August, 2023; originally announced August 2023.

Comments: 8 pages, 3 figures

arXiv:2308.01930 [pdf, other]

Machine Learning-Based Diabetes Detection Using Photoplethysmography Signal Features

Authors: Filipe A. C. Oliveira, Felipe M. Dias, Marcelo A. F. Toledo, Diego A. C. Cardenas, Douglas A. Almeida, Estela Ribeiro, Jose E. Krieger, Marco A. Gutierrez

Abstract: Diabetes is a prevalent chronic condition that compromises the health of millions of people worldwide. Minimally invasive methods are needed to prevent and control diabetes but most devices for measuring glucose levels are invasive and not amenable to continuous monitoring. Here, we present an alternative method to overcome these shortcomings based on non-invasive optical photoplethysmography (PPG) for detecting diabetes. We classify non-Diabetic and Diabetic patients using the PPG signal and metadata for training Logistic Regression (LR) and eXtreme Gradient Boosting (XGBoost) algorithms. We used PPG signals from a publicly available dataset. To prevent overfitting, we divided the data into five folds for cross-validation. By ensuring that patients in the training set are not in the testing set, the model's performance can be evaluated on unseen subjects' data, providing a more accurate assessment of its generalization. Our model achieved an F1-Score and AUC of $58.8\pm20.0\%$ and $79.2\pm15.0\%$ for LR and $51.7\pm16.5\%$ and $73.6\pm17.0\%$ for XGBoost, respectively. Feature analysis suggested that PPG morphological features contain diabetes-related information alongside metadata. Our findings are within the same range reported in the literature, indicating that machine learning methods are promising for developing remote, non-invasive, and continuous measurement devices for detecting and preventing diabetes.
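The patient-wise five-fold split described above can be sketched in a few lines. This is an illustrative, dependency-free version with a hypothetical function name; the round-robin assignment of patients to folds is an assumption, and the authors may have used a library routine or a different assignment.

```python
def patient_folds(patient_ids, n_folds=5):
    """Yield (train_indices, test_indices) so that no patient spans both sets.

    patient_ids[i] is the patient that record i belongs to.
    Assumption: patients are assigned to folds round-robin in sorted order.
    """
    unique_patients = sorted(set(patient_ids))
    fold_of = {pid: k % n_folds for k, pid in enumerate(unique_patients)}
    for fold in range(n_folds):
        test = [i for i, pid in enumerate(patient_ids) if fold_of[pid] == fold]
        train = [i for i, pid in enumerate(patient_ids) if fold_of[pid] != fold]
        yield train, test


# Toy data: 10 patients with 3 PPG records each.
ids = ["p%d" % (i // 3) for i in range(30)]
for train, test in patient_folds(ids):
    train_patients = {ids[i] for i in train}
    test_patients = {ids[i] for i in test}
    print(len(train), len(test), train_patients.isdisjoint(test_patients))
```

Splitting by patient rather than by record avoids leaking a subject's own signal characteristics into the test fold, which is the generalization concern the abstract raises.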

Submitted 2 August, 2023; originally announced August 2023.

Comments: 11 pages, 6 figures

arXiv:2307.08766 [pdf, other]

Quality Assessment of Photoplethysmography Signals For Cardiovascular Biomarkers Monitoring Using Wearable Devices

Authors: Felipe M. Dias, Marcelo A. F. Toledo, Diego A. C. Cardenas, Douglas A. Almeida, Filipe A. C. Oliveira, Estela Ribeiro, Jose E. Krieger, Marco A. Gutierrez

Abstract: Photoplethysmography (PPG) is a non-invasive technology that measures changes in blood volume in the microvascular bed of tissue. It is commonly used in medical devices such as pulse oximeters and wrist-worn heart rate monitors to monitor cardiovascular hemodynamics. PPG allows for the assessment of parameters (e.g., heart rate, pulse waveform, and peripheral perfusion) that can indicate conditions such as vasoconstriction or vasodilation, and provides information about microvascular blood flow, making it a valuable tool for monitoring cardiovascular health. However, PPG is subject to a number of sources of variation that can impact its accuracy and reliability, especially when using a wearable device for continuous monitoring, such as motion artifacts, skin pigmentation, and vasomotion. In this study, we extracted 27 statistical features from the PPG signal for training machine-learning models based on gradient boosting (XGBoost and CatBoost) and Random Forest (RF) algorithms to assess the quality of PPG signals that were labeled as good or poor quality. We used the PPG time series from a publicly available dataset and evaluated the algorithm's performance using Sensitivity (Se), Positive Predictive Value (PPV), and F1-score (F1) metrics. Our model achieved Se, PPV, and F1-score of 94.4, 95.6, and 95.0 for XGBoost, 94.7, 95.9, and 95.3 for CatBoost, and 93.7, 91.3 and 92.5 for RF, respectively. Our findings are comparable to state-of-the-art results reported in the literature but use a much simpler model, indicating that ML models are promising for developing remote, non-invasive, and continuous measurement devices.
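As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of Se and PPV, so it can be recomputed from the other two figures. The sketch below uses only the numbers in the abstract; the function names are hypothetical.

```python
def se_ppv(tp, fp, fn):
    """Sensitivity and positive predictive value from confusion-matrix counts."""
    return tp / (tp + fn), tp / (tp + fp)


def f1_from(se, ppv):
    """F1-score as the harmonic mean of sensitivity and PPV."""
    return 2 * se * ppv / (se + ppv)


# (Se, PPV, reported F1) in percent, taken from the abstract.
reported = {
    "XGBoost": (94.4, 95.6, 95.0),
    "CatBoost": (94.7, 95.9, 95.3),
    "RF": (93.7, 91.3, 92.5),
}

for model, (se, ppv, f1) in reported.items():
    print(model, round(f1_from(se, ppv), 1), f1)
```

For all three models the recomputed value rounds to the reported F1-score.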

Submitted 17 July, 2023; originally announced July 2023.

Comments: 9 pages

arXiv:2208.11594 [pdf, other]

Active Gaze Control for Foveal Scene Exploration

Authors: Alexandre M. F. Dias, Luís Simões, Plinio Moreno, Alexandre Bernardino

Abstract: Active perception and foveal vision are the foundations of the human visual system. While foveal vision reduces the amount of information to process during a gaze fixation, active perception will change the gaze direction to the most promising parts of the visual field. We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene, identifying the objects present in their surroundings with the least number of gaze shifts. Our approach is based on three key methods. First, we take an off-the-shelf deep object detector, pre-trained on a large dataset of regular images, and calibrate the classification outputs to the case of foveated images. Second, a body-centered semantic map, encoding the object classifications and corresponding uncertainties, is sequentially updated with the calibrated detections, considering several data fusion techniques. Third, the next best gaze fixation point is determined based on information-theoretic metrics that aim at minimizing the overall expected uncertainty of the semantic map. When compared to the random selection of next gaze shifts, the proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts and reduces to one third the number of required gaze shifts to attain similar performance.

Submitted 24 August, 2022; originally announced August 2022.

Comments: 6 pages, 8 figures, ICDL 2022 (International Conference on Development and Learning, formerly ICDL-EpiRob)

arXiv:2205.13760 [pdf, other]

Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval

Authors: Pascal Notin, Mafalda Dias, Jonathan Frazer, Javier Marchena-Hurtado, Aidan Gomez, Debora S. Marks, Yarin Gal

Abstract: The ability to accurately model the fitness landscape of protein sequences is critical to a wide range of applications, from quantifying the effects of human variants on disease likelihood, to predicting immune-escape mutations in viruses and designing novel biotherapeutic proteins. Deep generative models of protein sequences trained on multiple sequence alignments have been the most successful approaches so far to address these tasks. The performance of these methods is however contingent on the availability of sufficiently deep and diverse alignments for reliable training. Their potential scope is thus limited by the fact that many protein families are hard, if not impossible, to align. Large language models trained on massive quantities of non-aligned protein sequences from diverse families address these problems and show potential to eventually bridge the performance gap. We introduce Tranception, a novel transformer architecture leveraging autoregressive predictions and retrieval of homologous sequences at inference to achieve state-of-the-art fitness prediction performance. Given its markedly higher performance on multiple mutants, robustness to shallow alignments and ability to score indels, our approach offers significant gain of scope over existing approaches. To enable more rigorous model testing across a broader range of protein families, we develop ProteinGym -- an extensive set of multiplexed assays of variant effects, substantially increasing both the number and diversity of assays compared to existing benchmarks.

Submitted 27 May, 2022; originally announced May 2022.

Comments: ICML 2022

arXiv:2204.08891 [pdf, other]

Distributional Transform Based Information Reconciliation

Authors: Micael Andrade Dias, Francisco Marcos de Assis

Abstract: In this paper, we present an information reconciliation protocol designed for Continuous-Variable QKD using the Distributional Transform. By combining tools from copula and information theory, we present a method for extracting independent symmetric Bernoulli bits for Gaussian-modulated CVQKD protocols, which we called the Distributional Transform Expansion (DTE). We derived the expressions for the maximum reconciliation efficiency for both homodyne and heterodyne measurements, which, for the latter, is achievable with an efficiency greater than 0.9 at a signal-to-noise ratio lower than -3.6 dB.

Submitted 11 May, 2023; v1 submitted 19 April, 2022; originally announced April 2022.

Comments: 8 pages, 4 Figures

arXiv:2112.01243 [pdf, other]

Towards Continuous Compounding Effects and Agile Practices in Educational Experimentation

Authors: Luis M. Vaquero, Niall Twomey, Miguel Patricio Dias, Massimo Camplani, Robert Hardman

Abstract: Randomised control trials are currently the definitive gold standard approach for formal educational experiments. Although conclusions from these experiments are highly credible, their relatively slow experimentation rate, high expense and rigid framework can be seen to limit scope on: 1. $\textit{metrics}$: automation of the consistent rigorous computation of hundreds of metrics for every experiment; 2. $\textit{concurrency}$: fast automated releases of hundreds of concurrent experiments daily; and 3. $\textit{safeguards}$: safety net tests and ramping up/rolling back treatments quickly to minimise negative impact. This paper defines a framework for categorising different experimental processes, and places a particular emphasis on technology readiness. On the basis of our analysis, our thesis is that the next generation of education technology successes will be heralded by recognising the context of experiments and collectively embracing the full set of processes that are at hand: from rapid ideation and prototyping produced in small scale experiments on the one hand, to influencing recommendations of best teaching practices with large-scale and technology-enabled online A/B testing on the other. A key benefit of the latter is that the running costs tend towards zero (leading to 'free experimentation'). This offers low-risk opportunities to explore and drive value through well-planned lasting campaigns that iterate quickly at a large scale. Importantly, because these experimental platforms are so adaptable, the cumulative effect of the experimental campaign delivers compounding value exponentially over time even if each individual experiment delivers a small effect.

Submitted 17 November, 2021; originally announced December 2021.

arXiv:2004.06916 [pdf, other]

doi 10.1186/s13362-020-00098-w

Flattening the curves: on-off lock-down strategies for COVID-19 with an application to Brazil

Authors: L. Tarrataca, C. M. Dias, D. B. Haddad, E. F. Arruda

Abstract: The current COVID-19 pandemic is affecting different countries in different ways. The assortment of reporting techniques alongside other issues, such as underreporting and budgetary constraints, makes predicting the spread and lethality of the virus a challenging task. This work attempts to gain a better understanding of how COVID-19 will affect one of the least studied countries, namely Brazil. Currently, several Brazilian states are in a state of lock-down. However, there is political pressure for this type of measure to be lifted. This work considers the impact that such a termination would have on how the virus evolves locally. This was done by extending the SEIR model with an on / off strategy. Given the simplicity of SEIR we also attempted to gain more insight by developing a neural regressor. We chose to employ features that current clinical studies have pinpointed as having a connection to the lethality of COVID-19. We discuss how this data can be processed in order to obtain a robust assessment.
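One way to picture an SEIR model with an on / off lock-down is to switch the transmission rate between two values on a fixed schedule. The sketch below does this with simple Euler integration. Everything specific in it is an assumption for illustration: the function name, the parameter values, the periodic 30-days-on, 30-days-off schedule, and the population size are not taken from the paper, whose actual switching rule is not described in the abstract.

```python
def simulate_seir(days, beta_open=0.5, beta_lock=0.15, sigma=1 / 5.2,
                  gamma=1 / 10, on_days=30, off_days=30,
                  n=1_000_000, i0=10, dt=0.1):
    """Euler-integrate an SEIR model whose transmission rate follows a
    periodic on/off lock-down schedule (all parameter values are hypothetical)."""
    s, e, i, r = float(n - i0), 0.0, float(i0), 0.0
    steps_per_day = round(1 / dt)
    history = []
    for day in range(days):
        locked = (day % (on_days + off_days)) < on_days  # lock-down comes first
        beta = beta_lock if locked else beta_open
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infectious = sigma * e * dt       # E -> I
            new_recovered = gamma * i * dt        # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((day, locked, s, e, i, r))
    return history


history = simulate_seir(180)
peak_day, _, _, _, peak_infectious, _ = max(history, key=lambda row: row[4])
print(peak_day, round(peak_infectious))
```

Changing `on_days` and `off_days` shows how the timing of the infectious peak shifts under different schedules, which is the kind of question the abstract describes.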

Submitted 15 April, 2020; originally announced April 2020.

arXiv:2004.05958 [pdf, other]

Anomaly Detection in Trajectory Data with Normalizing Flows

Authors: Madson L. D. Dias, César Lincoln C. Mattos, Ticiana L. C. da Silva, José Antônio F. de Macedo, Wellington C. P. Silva

Abstract: The task of detecting anomalous data patterns is as important in practical applications as it is challenging. In the context of spatial data, recognition of unexpected trajectories brings additional difficulties, such as high dimensionality and varying pattern lengths. We aim to tackle such a problem from a probability density estimation point of view, since it provides an unsupervised procedure to identify out of distribution samples. More specifically, we pursue an approach based on normalizing flows, a recent framework that enables complex density estimation from data with neural networks. Our proposal computes exact model likelihood values, an important feature of normalizing flows, for each segment of the trajectory. Then, we aggregate the segments' likelihoods into a single coherent trajectory anomaly score. Such a strategy enables handling possibly large sequences with different lengths. We evaluate our methodology, named aggregated anomaly detection with normalizing flows (GRADINGS), using real world trajectory data and compare it with more traditional anomaly detection techniques. The promising results obtained in the performed computational experiments indicate the feasibility of GRADINGS, especially the variant that considers autoregressive normalizing flows.

Submitted 13 April, 2020; originally announced April 2020.

Comments: Accepted as a conference paper at 2020 International Joint Conference on Neural Networks (IJCNN 2020), part of 2020 IEEE World Congress on Computational Intelligence (IEEE WCCI 2020)

arXiv:1907.09795 [pdf, other]

Close Encounters of the Binary Kind: Signal Reconstruction Guarantees for Compressive Hadamard Sampling with Haar Wavelet Basis

Authors: Amirafshar Moshtaghpour, José M. Bioucas Dias, Laurent Jacques

Abstract: We investigate the problems of 1-D and 2-D signal recovery from subsampled Hadamard measurements using Haar wavelet sparsity prior. These problems are of interest in, e.g., computational imaging applications relying on optical multiplexing or single-pixel imaging. However, the realization of such modalities is often hindered by the coherence between the Hadamard and Haar bases. The variable and multilevel density sampling strategies solve this issue by adjusting the subsampling process to the local and multilevel coherence, respectively, between the two bases; hence enabling successful signal recovery. In this work, we compute an explicit sample-complexity bound for Hadamard-Haar systems as well as uniform and non-uniform recovery guarantees; a seemingly missing result in the related literature. We explore the faithfulness of the numerical simulations to the theoretical results and show in a practically relevant instance, e.g., single-pixel camera, that the target signal can be obtained from a few Hadamard measurements.

Submitted 23 July, 2019; originally announced July 2019.

Comments: 36 pages, 11 figures

arXiv:1907.05384 [pdf, other]

Visualização e animação de autómatos em Ocsigen Framework (Visualization and animation of automata in the Ocsigen Framework)

Authors: Rita Macedo, Artur Miguel Dias, António Ravara

Abstract: Formal Languages and Automata Theory are important foundational topics in Computer Science. Their rigorous and formal characteristics make learning them demanding. An important support for the assimilation of concepts is the possibility of interactively visualizing concrete examples of these computational models, which facilitates understanding them. The tools available are neither complete nor fully support the interactive aspect. This project aims at the development of an interactive web tool in Portuguese to help in an assisted and intuitive way to understand the concepts and algorithms in question, seeing them work step-by-step, through typical examples preloaded or built by the user (an original aspect of our platform). The tool should therefore enable the creation and editing of automata, as well as execute the relevant classical algorithms such as word acceptance, model conversions, etc. It is also intended to visualize not only the process of construction of the automaton, but also all the steps of applying the given algorithm. This tool uses the Ocsigen Framework because it provides the development of complete and interactive web tools written in OCaml, a functional language with a strong type checking system and therefore perfect for a web page without errors. Ocsigen was also chosen because it allows the creation of dynamic pages with a singular client-server system. This article presents the first phase of the development of the project. It is already possible to create automata, check the nature of their states and verify step-by-step (with undo) the acceptance of a word.

Submitted 11 July, 2019; originally announced July 2019.

Comments: Article in Portuguese, submitted to the national informatics conference INForum (http://inforum.org.pt/INForum2019)

arXiv:1906.06437 [pdf]

A Strategy for Expert Recommendation From Open Data Available on the Lattes Platform

Authors: Sérgio José de Sousa, Thiago Magela Rodrigues Dias, Adilson Luiz Pinto

Abstract: With the increasing volume of data and users of curriculum systems, the difficulty of finding specialists is increasing. This work proposes an open data extraction methodology of the Lattes Platform curricula, a treatment for this data and investigates a Recommendation Agent approach based on deep neural networks with autoencoder.

Submitted 14 June, 2019; originally announced June 2019.

Comments: 7 pages, in Portuguese, 3 figures

arXiv:1808.01766 [pdf]

On Optimizing Deep Convolutional Neural Networks by Evolutionary Computing

Authors: M. U. B. Dias, D. D. N. De Silva, S. Fernando

Abstract: Optimization for deep networks is currently a very active area of research. As neural networks become deeper, manually optimizing the network becomes harder. Mini-batch normalization, identification of effective receptive fields, momentum updates, introduction of residual blocks, learning rate adaptation, etc. have been proposed to speed up the rate of convergence in the manual training process while keeping a high accuracy level. However, finding the optimal topological structure for a given problem is becoming a challenging task that needs to be addressed immediately. Few researchers have attempted to optimize the network structure using evolutionary computing approaches. Among them, few have successfully evolved networks with reinforcement learning and long-short-term memory. Very few have applied evolutionary programming to deep convolutional neural networks. These attempts mainly evolved the network structure and then subsequently optimized the hyper-parameters of the network. However, a mechanism to evolve the deep network structure under the techniques currently being practiced in the manual process is still absent. Incorporating such techniques at the chromosome level of evolutionary computing can certainly take us to better topological deep structures. The paper concludes by identifying the gap between evolutionary based deep neural networks and deep neural networks. Further, it proposes some insights for optimizing deep neural networks using evolutionary computing techniques.

Submitted 6 August, 2018; originally announced August 2018.

arXiv:1607.03607 [pdf, other]

Cloud Empowered Self-Managing WSNs

Authors: Gabriel Martins Dias, Cintia Borges Margi, Filipe C. P. de Oliveira, Boris Bellalta

Abstract: Wireless Sensor Networks (WSNs) are composed of low powered and resource-constrained wireless sensor nodes that are not capable of performing high-complexity algorithms. Integrating these networks into the Internet of Things (IoT) facilitates their real-time optimization based on remote data visualization and analysis. This work describes the design and implementation of a scalable system architecture that integrates WSNs and cloud services to work autonomously in an IoT environment. The implementation relies on Software Defined Networking features to simplify the WSN management and exploits data analytics tools to execute a reinforcement learning algorithm that takes decisions based on the environment's evolution. It can automatically configure wireless sensor nodes to measure and transmit the temperature only at periods when the environment changes more often. Without any human intervention, the system could reduce the number of transmissions by nearly 85%, showing the potential of this mechanism to extend WSNs' lifetime without compromising the data quality. Besides attending to similar use cases, such a WSN autonomic management could promote a new business model to offer sensing tasks as a service, which is also introduced in this work.

Submitted 13 July, 2016; originally announced July 2016.

Comments: 12 pages, 4200 words, 4 figures, 2 tables, submitted to "IEEE Communications Magazine" special issue on the Internet of Things

ACM Class: C.1.3; C.2.4

arXiv:1607.03443 [pdf, other]

A Survey about Prediction-Based Data Reduction in Wireless Sensor Networks

Authors: Gabriel Martins Dias, Boris Bellalta, Simon Oechsner

Abstract: One of the main characteristics of Wireless Sensor Networks (WSNs) is the constrained energy resources of their wireless sensor nodes. Although this issue has been addressed in several works and got a lot of attention within the years, the most recent advances pointed out that the energy harvesting and wireless charging techniques may offer means to overcome such a limitation. Consequently, an issue that had been put in second place, now emerges: the low availability of spectrum resources. Because of it, the incorporation of the WSNs into the Internet of Things and the exponential growth of the latter may be hindered if no control over the data generation is taken. Alternatively, part of the sensed data can be predicted without triggering transmissions and congesting the wireless medium. In this work, we analyze and categorize existing prediction-based data reduction mechanisms that have been designed for WSNs. Our main contribution is a systematic procedure for selecting a scheme to make predictions in WSNs, based on WSNs' constraints, characteristics of prediction methods and monitored data. Finally, we conclude the paper with a discussion about future challenges and open research directions in the use of prediction methods to support the WSNs' growth.

Submitted 12 July, 2016; originally announced July 2016.

Comments: 37 pages, 6 figures, 3 tables. Submitted to ACM Computing Surveys

ACM Class: C.2.4; I.2; A.1

arXiv:1607.03408 [pdf, other]

doi 10.1109/WoWMoM.2013.6583430

Performance Optimization of WSNs using External Information

Authors: Gabriel Martins Dias

Abstract: The goal of this work is to describe a self-management system that correlates data sensed by different Wireless Sensor Networks (WSNs) and adjusts the number of active nodes in each network to provide an appropriate amount of measurements. The architecture considers the factors that make the external data relevant to the local network, such as the distance between covered areas, the relation between the types of sensed data and the reliability of the measurements. As a result, the operation of each network will be tuned to trade-off the accuracy of the measurements and the power consumption.

Submitted 12 July, 2016; originally announced July 2016.

Comments: Published in: IEEE 14th International Symposium and Workshops on a World of Wireless, Mobile and Multimedia Networks (WoWMoM), 2013 (copyright has been transferred to IEEE)

ACM Class: D.2.11; C.2.1

arXiv:1606.02193 [pdf, other]

Adapting Sampling Interval of Sensor Networks Using On-Line Reinforcement Learning

Authors: Gabriel Martins Dias, Maddalena Nurchis, Boris Bellalta

Abstract: Monitoring Wireless Sensor Networks (WSNs) are composed of sensor nodes that report temperature, relative humidity, and other environmental parameters. The time between two successive measurements is a critical parameter to set during the WSN configuration because it can impact the WSN's lifetime, the wireless medium contention and the quality of the reported data. As trends in monitored parameters can significantly vary between scenarios and within time, identifying a sampling interval suitable for several cases is also challenging. In this work, we propose a dynamic sampling rate adaptation scheme based on reinforcement learning, able to tune sensors' sampling interval on-the-fly, according to environmental conditions and application requirements. The primary goal is to set the sampling interval to the best value possible so as to avoid oversampling and save energy, while not missing environmental changes that can be relevant for the application. In simulations, our mechanism could reduce the total number of transmissions by up to 73% compared to a fixed strategy and, simultaneously, keep the average quality of information provided by the WSN. The inherent flexibility of the reinforcement learning algorithm facilitates its use in several scenarios, so as to exploit the broad scope of the Internet of Things.

Submitted 12 July, 2016; v1 submitted 7 June, 2016; originally announced June 2016.

Comments: 6 pages, 2 figures, submitted to the IEEE World Forum on Internet of Things 2016

ACM Class: C.2.4; I.2.1

arXiv:1605.09011 [pdf, other]

A Self-Managed Architecture for Sensor Networks Based on Real Time Data Analysis

Authors: Gabriel Martins Dias, Toni Adame, Boris Bellalta, Simon Oechsner

Abstract: Wireless sensor networks (WSNs) have been adopted as merely data producers for years. However, the data collected by WSNs can also be used to manage their operation and avoid unnecessary measurements that do not provide any new knowledge about the environment. The benefits are twofold because wireless sensor nodes may save their limited energy resources and also reduce the wireless medium occupancy. We present a self-managed platform that collects and stores data from sensor nodes, analyzes its contents and uses the built knowledge to adjust the operation of the entire network. The system architecture facilitates the incorporation of traditional WSNs into the Internet of Things by abstracting the lower communication layers and allowing decisions based on the data relevance. Finally, we demonstrate the platform optimizing a WSN's operation at runtime, based on different real-time data analysis.

Submitted 12 July, 2016; v1 submitted 29 May, 2016; originally announced May 2016.

Comments: 3 pages, 3 figures, demo proposal, accepted in the Future Technologies Conference IEEE 2016

ACM Class: D.2.11; H.4.3; C.2.1

arXiv:1604.01275 [pdf, other]

On the importance and feasibility of forecasting data in sensors

Authors: Gabriel Martins Dias, Boris Bellalta, Simon Oechsner

Abstract: The first generation of wireless sensor nodes have constrained energy resources and computational power, which discourages applications from processing any task other than measuring and transmitting towards a central server. However, nowadays, sensor networks tend to be incorporated into the Internet of Things and the hardware evolution may change the old strategy of avoiding data computation in the sensor nodes. In this paper, we show the importance of reducing the number of transmissions in sensor networks and present the use of forecasting methods as a way of doing it. Experiments using real sensor data show that state-of-the-art forecasting methods can be successfully implemented in the sensor nodes to keep the quality of their measurements and reduce up to 30% of their transmissions, lowering the channel utilization. We conclude that there is an old paradigm that is no longer the most beneficial, which is the strategy of always transmitting a measurement when it differs by more than a threshold from the last one transmitted. Adopting more complex forecasting methods in the sensor nodes is the alternative to significantly reduce the number of transmissions without compromising the quality of their measurements, and therefore support the exponential growth of the Internet of Things.

Submitted 5 April, 2016; originally announced April 2016.

Comments: 30 pages and 12 figures. This paper has been submitted to the Transactions on Mobile Computing journal

MSC Class: 62P30 ACM Class: C.2.4; C.2.1

arXiv:1509.08778 [pdf, other]

doi 10.1016/j.comcom.2017.08.002

The Impact of Dual Prediction Schemes on the Reduction of the Number of Transmissions in Sensor Networks

Authors: Gabriel Martins Dias, Boris Bellalta, Simon Oechsner

Abstract: Future Internet of Things (IoT) applications will require that billions of wireless devices transmit data to the cloud frequently. However, the wireless medium access is pointed out as a problem for the next generations of wireless networks; hence, the number of data transmissions in Wireless Sensor Networks (WSNs) can quickly become a bottleneck, disrupting the exponential growth in the number of interconnected devices, sensors, and amount of produced data. Therefore, keeping a low number of data transmissions is critical to incorporate new sensor nodes and measure a great variety of parameters in future generations of WSNs. Thanks to the high accuracy and low complexity of state-of-the-art forecasting algorithms, Dual Prediction Schemes (DPSs) are potential candidates to optimize the data transmissions in WSNs at the finest level because they allow sensor nodes to avoid unnecessary transmissions without affecting the quality of their measurements. In this work, we present a sensor network model that uses statistical theorems to describe the expected impact of DPSs and data aggregation in WSNs. We aim to provide a foundation for future works by characterizing the theoretical gains of processing data in sensors and conditioning its transmission to the predictions' accuracy. Our simulation results show that the number of transmissions can be reduced by almost 98% in the sensor nodes with the highest workload. We also detail the impact of predicting and aggregating transmissions according to the parameters that can be observed in common scenarios, such as sensor nodes' transmission ranges, the correlation between measurements of different sensors, and the period between two consecutive measurements in a sensor.

Submitted 30 August, 2017; v1 submitted 29 September, 2015; originally announced September 2015.

Comments: 30 pages, 8 figures

MSC Class: 62P30 ACM Class: C.2.4; C.2.1

Journal ref: Computer Communications 112C (2017) pp. 58-72

arXiv:1509.04207 [pdf, other]

doi 10.1145/2811237.2811299

DeltaImpactFinder: Assessing Semantic Merge Conflicts with Dependency Analysis

Authors: Martín Dias, Guillermo Polito, Damien Cassou, Stéphane Ducasse

Abstract: In software development, version control systems (VCS) provide branching and merging support tools. Such tools are popular among developers to concurrently change a code-base in separate lines and reconcile their changes automatically afterwards. However, two changes that are correct independently can introduce bugs when merged together. We call this kind of bug a semantic merge conflict. Change impact analysis (CIA) aims at estimating the effects of a change in a codebase. In this paper, we propose to detect semantic merge conflicts using CIA. On a merge, DELTAIMPACTFINDER analyzes and compares the impact of a change in its origin and destination branches. We call the difference between these two impacts the delta-impact. If the delta-impact is empty, then there is no indicator of a semantic merge conflict and the merge can continue automatically. Otherwise, the delta-impact contains the sources of possible conflicts.

Submitted 14 September, 2015; originally announced September 2015.

Comments: International Workshop on Smalltalk Technologies 2015, Jul 2015, Brescia, Italy

arXiv:1506.00925 [pdf, other]

Facial Expressions Tracking and Recognition: Database Protocols for Systems Validation and Evaluation

Authors: Catarina Runa Miranda, Pedro Mendes, Pedro Coelho, Xenxo Alvarez, João Freitas, Miguel Sales Dias, Verónica Costa Orvalho

Abstract: Each human face is unique. It has its own shape, topology, and distinguishing features. As such, developing and testing facial tracking systems are challenging tasks. The existing face recognition and tracking algorithms in Computer Vision mainly specify concrete situations according to particular goals and applications, requiring validation methodologies with data that fits their purposes. However, a database that covers all possible variations of external and factors does not exist, increasing researchers' work in acquiring their own data or compiling groups of databases. To address this shortcoming, we propose a methodology for facial data acquisition through definition of fundamental variables, such as subject characteristics, acquisition hardware, and performance parameters. Following this methodology, we also propose two protocols that allow the capturing of facial behaviors under uncontrolled and real-life situations. As validation, we executed both protocols, which led to the creation of two sample databases: FdMiee (Facial database with Multi input, expressions, and environments) and FACIA (Facial Multimodal database driven by emotional induced acting). Using different types of hardware, FdMiee captures facial information under environmental and facial behavior variations. FACIA is an extension of FdMiee introducing a pipeline to acquire additional facial behaviors and speech using an emotion-acting method. Therefore, this work eases the creation of adaptable databases according to an algorithm's requirements and applications, leading to simplified validation and testing processes.

Submitted 2 June, 2015; originally announced June 2015.

Comments: 10 pages, 6 images, Computers & Graphics

arXiv:1505.03662 [pdf, other]

Predicting Occupancy Trends in Barcelona's Bicycle Service Stations Using Open Data

Authors: Gabriel Martins Dias, Boris Bellalta, Simon Oechsner

Abstract: In 2008, the CEO of the company that manages and maintains the public bicycle service in Barcelona recognized that one may not expect to always find a place to leave the rented bike nearby their destination, similarly to the case when, driving a car, people may not find a parking lot. In this work, we make predictions about the statuses of the stations of the public bicycle service in Barcelona. We show that it is feasible to correctly predict nearly half of the times when the stations are either completely full of bikes or completely empty, up to 2 days before they actually happen. That is, users might avoid stations at times when they could not return a bicycle that they have rented before, or when they would not find a bike to rent. To achieve that, we apply the Random Forest algorithm to classify the status of the stations and improve the lifetime of the models using publicly available data, such as information about the weather forecast. Finally, we expect that the results of the predictions can be used to improve the quality of the service and make it more reliable for the users.

Submitted 6 August, 2015; v1 submitted 14 May, 2015; originally announced May 2015.

Comments: 7 pages, 7 figures, 1 table, accepted to SAI Intelligent Systems Conference 2015

MSC Class: 68-06 ACM Class: I.2.M

arXiv:1502.06757 [pdf, other]

Untangling Fine-Grained Code Changes

Authors: Martín Dias, Alberto Bacchelli, Georgios Gousios, Damien Cassou, Stéphane Ducasse

Abstract: After working for some time, developers commit their code changes to a version control system. When doing so, they often bundle unrelated changes (e.g., bug fix and refactoring) in a single commit, thus creating a so-called tangled commit. Sharing tangled commits is problematic because it makes review, reversion, and integration of these commits harder and historical analyses of the project less reliable. Researchers have worked at untangling existing commits, i.e., finding which part of a commit relates to which task. In this paper, we contribute to this line of work in two ways: (1) A publicly available dataset of untangled code changes, created with the help of two developers who accurately split their code changes into self contained tasks over a period of four months; (2) a novel approach, EpiceaUntangler, to help developers share untangled commits (aka. atomic commits) by using fine-grained code change information. EpiceaUntangler is based and tested on the publicly available dataset, and further evaluated by deploying it to 7 developers, who used it for 2 weeks. We recorded a median success rate of 91% and average one of 75%, in automatically creating clusters of untangled fine-grained code changes.

Submitted 24 February, 2015; originally announced February 2015.

arXiv:1409.1001 [pdf, ps, other]

Towards information-centric WSN simulations

Authors: Gabriel Martins Dias, Boris Bellalta, Simon Oechsner

Abstract: In pursuance of integrating Wireless Sensor Networks (WSNs) with other systems, the use of techniques from other fields, such as machine learning and information processing, is becoming more common. Therefore, we faced the problem of missing network simulations that are not only focused on the packet exchange between network elements, but also on the data that is transmitted between them. In other words, we needed a tool that evaluated the WSNs on how they evolve and react to the environmental changes. To illustrate the benefits of having such a perspective, we explain the kind of simulation problems that we solved in our last work. Moreover, we outline the next steps in the direction of creating an extension to support this approach.

Submitted 3 September, 2014; originally announced September 2014.

Comments: Published in: A. Förster, C. Sommer, T. Steinbach, M. Wählisch (Eds.), Proc. of 1st OMNeT++ Community Summit, Hamburg, Germany, September 2, 2014, arXiv:1409.0093, 2014

Report number: OMNET/2014/09

arXiv:1407.0981 [pdf, other]

doi 10.1007/978-3-319-23440-3_2

A Centralized Mechanism to Make Predictions Based on Data From Multiple WSNs

Authors: Gabriel Martins Dias, Simon Oechsner, Boris Bellalta

Abstract: In this work, we present a method that exploits a scenario with inter-Wireless Sensor Networks (WSNs) information exchange by making predictions and adapting the workload of a WSN according to their outcomes. We show the feasibility of an approach that intelligently utilizes information produced by other WSNs that may or may not belong to the same administrative domain. To illustrate how the predictions using data from external WSNs can be utilized, a specific use-case is considered, where the operation of a WSN measuring relative humidity is optimized using the data obtained from a WSN measuring temperature. Based on a dedicated performance score, the simulation results show that this new approach can find the optimal operating point associated with the trade-off between energy consumption and quality of measurements. Moreover, we outline the additional challenges that need to be overcome, and draw conclusions to guide the future work in this field.

Submitted 12 July, 2016; v1 submitted 3 July, 2014; originally announced July 2014.

Comments: 10 pages, simulation results and figures. Published in

ACM Class: D.2.11; C.2.1

Journal ref: Multiple Access Communications, Lecture Notes in Computer Science, Volume 9305, pp 19-32, 2015

arXiv:1309.4334 [pdf, other]

Representing Code History with Development Environment Events

Authors: Martin Dias, Damien Cassou, Stéphane Ducasse

Abstract: Modern development environments handle information about the intent of the programmer: for example, they use abstract syntax trees for providing high-level code manipulation such as refactorings; nevertheless, they do not keep track of this information in a way that would simplify code sharing and change understanding. In most Smalltalk systems, source code modifications are immediately registered in a transaction log often called a ChangeSet. Such a mechanism has proven reliable, but it has several limitations. In this paper we analyse such limitations and describe scenarios and requirements for tracking fine-grained code history with a semantic representation. We present Epicea, an early prototype implementation. We want to enrich code sharing with extra information from the IDE, which will help in understanding the intention of the changes and let a new generation of tools act accordingly.

Submitted 17 September, 2013; originally announced September 2013.

Journal ref: IWST-2013 - 5th International Workshop on Smalltalk Technologies (2013)

arXiv:1112.3783 [pdf, other]

L-FLAT: Logtalk Toolkit for Formal Languages and Automata Theory

Authors: Paulo Moura, Artur Miguel Dias

Abstract: We describe L-FLAT, a Logtalk Toolkit for teaching Formal Languages and Automata Theory. L-FLAT supports the definition of alphabets, the definition of orders over alphabet symbols, the partial definition of languages using unit tests, and the definition of mechanisms, which implement language generators or language recognizers. Supported mechanisms include predicates, regular expressions, finite automata, context-free grammars, Turing machines, and push-down automata. L-FLAT entities are implemented using the object-oriented features of Logtalk, providing a highly portable and easily extendable framework. The use of L-FLAT in educational environments is enhanced by supporting Mooshak, a web application that features automatic grading of submitted programs.

Submitted 16 December, 2011; originally announced December 2011.

Comments: Online Proceedings of the 11th International Colloquium on Implementation of Constraint LOgic Programming Systems (CICLOPS 2011), Lexington, KY, U.S.A., July 10, 2011

ACM Class: D.1.6; D.3

arXiv:0711.3605 [pdf]

Very strict selectional restrictions

Authors: Eric Laporte, Christian Leclère, Maria Carmelita P. Dias

Abstract: We discuss the characteristics and behaviour of two parallel classes of verbs in two Romance languages, French and Portuguese. Examples of these verbs are Port. abater [gado] and Fr. abattre [bétail], both meaning "slaughter [cattle]". In both languages, the definition of the class of verbs includes several features: - They have only one essential complement, which is a direct object. - The nominal distribution of the complement is very limited, i.e., few nouns can be selected as head nouns of the complement. However, this selection is not restricted to a single noun, as would be the case for verbal idioms such as Fr. monter la garde "mount guard". - We excluded from the class constructions which are reductions of more complex constructions, e.g. Port. afinar [instrumento] com "tune [instrument] with".

Submitted 22 November, 2007; originally announced November 2007.

Journal ref: Dans Proceedings - Very strict selectional restrictions. A Comparison between Portuguese and French, Itatiaia : Brésil (2006)

Showing 1–36 of 36 results for author: Dias, M