-
Can AI support student engagement in classroom activities in higher education?
Authors:
Neha Rani,
Sharan Majumder,
Ishan Bhardwaj,
Pedro Guillermo Feijoo Garcia
Abstract:
Lucrative career prospects and creative opportunities often attract students to enroll in computer science majors and pursue advanced studies in the field. Consequently, there has been a significant surge in enrollment in computer science courses, resulting in large class sizes that can range from hundreds to even thousands of students. A common challenge in such large classrooms is the lack of engagement between students and both the instructor and the learning material. However, with advancements in technology and improvements in large language models (LLMs), there is a considerable opportunity to utilize LLM-based AI models, such as conversational artificial intelligence (CAI), to enhance student engagement with learning content in large classes. To explore the potential of CAI to support engagement, especially with learning content, we designed an activity in a software engineering course (with a large class size) where students used CAI for an in-class activity. We conducted a within-subject investigation in a large classroom at a US university where we compared student engagement during an in-class activity that used a CAI tool vs. one without a CAI tool. The CAI tool we used was ChatGPT due to its widespread popularity and familiarity. Our results indicate that CAI (ChatGPT) has the potential to support engagement with learning content during in-class activities, especially in large class sizes. We further discuss the implications of our findings.
Submitted 22 June, 2025;
originally announced June 2025.
-
Exploring the Feasibility of AI-Assisted Spine MRI Protocol Optimization Using DICOM Image Metadata
Authors:
Alice Vian,
Diego Andre Eifer,
Mauricio Anes,
Guilherme Ribeiro Garcia,
Mariana Recamonde-Mendoza
Abstract:
Artificial intelligence (AI) is increasingly being utilized to optimize magnetic resonance imaging (MRI) protocols. Given that image details are critical for diagnostic accuracy, optimizing MRI acquisition protocols is essential for enhancing image quality. While medical physicists are responsible for this optimization, the variability in equipment usage and the wide range of MRI protocols in clinical settings pose significant challenges. This study aims to validate the application of AI in optimizing MRI protocols using dynamic data from clinical practice, specifically DICOM metadata. To achieve this, four MRI spine exam databases were created, with the target attribute being the binary classification of image quality (good or bad). Five AI models were trained to identify trends in acquisition parameters that influence image quality, grounded in MRI theory. These trends were analyzed using SHAP graphs. The models achieved F1 performance ranging from 77% to 93% for datasets containing 292 or more instances, with the observed trends aligning with MRI theory. The models effectively reflected the practical realities of clinical MRI settings, offering a valuable tool for medical physicists in quality control tasks. In conclusion, AI has demonstrated its potential to optimize MRI protocols, supporting medical physicists in improving image quality and enhancing the efficiency of quality control in clinical practice.
Submitted 4 February, 2025;
originally announced February 2025.
-
Text-to-SQL based on Large Language Models and Database Keyword Search
Authors:
Eduardo R. Nascimento,
Caio Viktor S. Avila,
Yenier T. Izquierdo,
Grettel M. García,
Lucas Feijó L. Andrade,
Michelle S. P. Facina,
Melissa Lemos,
Marco A. Casanova
Abstract:
Text-to-SQL prompt strategies based on Large Language Models (LLMs) achieve remarkable performance on well-known benchmarks. However, when applied to real-world databases, their performance is significantly less than for these benchmarks, especially for Natural Language (NL) questions requiring complex filters and joins to be processed. This paper then proposes a strategy to compile NL questions into SQL queries that incorporates a dynamic few-shot examples strategy and leverages the services provided by a database keyword search (KwS) platform. The paper details how the precision and recall of the schema-linking process are improved with the help of the examples provided and the keyword-matching service that the KwS platform offers. Then, it shows how the KwS platform can be used to synthesize a view that captures the joins required to process an input NL question and thereby simplify the SQL query compilation step. The paper includes experiments with a real-world relational database to assess the performance of the proposed strategy. The experiments suggest that the strategy achieves an accuracy on the real-world relational database that surpasses state-of-the-art approaches. The paper concludes by discussing the results obtained.
Submitted 23 January, 2025;
originally announced January 2025.
-
Transformer Vibration Forecasting for Advancing Rail Safety and Maintenance 4.0
Authors:
Darío C. Larese,
Almudena Bravo Cerrada,
Gabriel Dambrosio Tomei,
Alejandro Guerrero-López,
Pablo M. Olmos,
María Jesús Gómez García
Abstract:
Maintaining railway axles is critical to preventing severe accidents and financial losses. The railway industry is increasingly interested in advanced condition monitoring techniques to enhance safety and efficiency, moving beyond traditional periodic inspections toward Maintenance 4.0.
This study introduces a robust Deep Autoregressive solution that integrates seamlessly with existing systems to avert mechanical failures. Our approach simulates and predicts vibration signals under various conditions and fault scenarios, improving dataset robustness for more effective detection systems. These systems can alert maintenance needs, preventing accidents preemptively. We use experimental vibration signals from accelerometers on train axles.
Our primary contributions include a transformer model, ShaftFormer, designed for processing time series data, and an alternative model incorporating spectral methods and enhanced observation models. Simulating vibration signals under diverse conditions mitigates the high cost of obtaining experimental signals for all scenarios. Given the non-stationary nature of railway vibration signals, influenced by speed and load changes, our models address these complexities, offering a powerful tool for predictive maintenance in the rail industry.
Submitted 20 January, 2025;
originally announced January 2025.
-
Efficient Few-Shot Medical Image Analysis via Hierarchical Contrastive Vision-Language Learning
Authors:
Harrison Fuller,
Fernando Gabriela Garcia,
Victor Flores
Abstract:
Few-shot learning in medical image classification presents a significant challenge due to the limited availability of annotated data and the complex nature of medical imagery. In this work, we propose Adaptive Vision-Language Fine-tuning with Hierarchical Contrastive Alignment (HiCA), a novel framework that leverages the capabilities of Large Vision-Language Models (LVLMs) for medical image analysis. HiCA introduces a two-stage fine-tuning strategy, combining domain-specific pretraining and hierarchical contrastive learning to align visual and textual representations at multiple levels. We evaluate our approach on two benchmark datasets, Chest X-ray and Breast Ultrasound, achieving state-of-the-art performance in both few-shot and zero-shot settings. Further analyses demonstrate the robustness, generalizability, and interpretability of our method, with substantial improvements in performance compared to existing baselines. Our work highlights the potential of hierarchical contrastive strategies in adapting LVLMs to the unique challenges of medical imaging tasks.
Submitted 16 January, 2025;
originally announced January 2025.
-
A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences
Authors:
Gabriel Lino Garcia,
João Renato Ribeiro Manesco,
Pedro Henrique Paiola,
Lucas Miranda,
Maria Paola de Salvo,
João Paulo Papa
Abstract:
The rapid advancement of large language models (LLMs) has opened new boundaries in the extraction and synthesis of medical knowledge, particularly within evidence synthesis. This paper reviews the state-of-the-art applications of LLMs in the biomedical domain, exploring their effectiveness in automating complex tasks such as evidence synthesis and data extraction from a biomedical corpus of documents. While LLMs demonstrate remarkable potential, significant challenges remain, including issues related to hallucinations, contextual understanding, and the ability to generalize across diverse medical tasks. We highlight critical gaps in the current research literature, particularly the need for unified benchmarks to standardize evaluations and ensure reliability in real-world applications. In addition, we propose directions for future research, emphasizing the integration of state-of-the-art techniques such as retrieval-augmented generation (RAG) to enhance LLM performance in evidence synthesis. By addressing these challenges and utilizing the strengths of LLMs, we aim to improve access to medical literature and facilitate meaningful discoveries in healthcare.
Submitted 4 December, 2024;
originally announced December 2024.
-
Leveraging Large Language Models for Comparative Literature Summarization with Reflective Incremental Mechanisms
Authors:
Fernando Gabriela Garcia,
Spencer Burns,
Harrison Fuller
Abstract:
In this paper, we introduce ChatCite, a novel method leveraging large language models (LLMs) for generating comparative literature summaries. The ability to summarize research papers with a focus on key comparisons between studies is an essential task in academic research. Existing summarization models, while effective at generating concise summaries, fail to provide deep comparative insights. ChatCite addresses this limitation by incorporating a multi-step reasoning mechanism that extracts critical elements from papers, incrementally builds a comparative summary, and refines the output through a reflective memory process. We evaluate ChatCite on a custom dataset, CompLit-LongContext, consisting of 1000 research papers with annotated comparative summaries. Experimental results show that ChatCite outperforms several baseline methods, including GPT-4, BART, T5, and CoT, across various automatic evaluation metrics such as ROUGE and the newly proposed G-Score. Human evaluation further confirms that ChatCite generates more coherent, insightful, and fluent summaries compared to these baseline models. Our method provides a significant advancement in automatic literature review generation, offering researchers a powerful tool for efficiently comparing and synthesizing scientific research.
Submitted 2 December, 2024;
originally announced December 2024.
-
Adapting LLMs for the Medical Domain in Portuguese: A Study on Fine-Tuning and Model Evaluation
Authors:
Pedro Henrique Paiola,
Gabriel Lino Garcia,
João Renato Ribeiro Manesco,
Mateus Roder,
Douglas Rodrigues,
João Paulo Papa
Abstract:
This study evaluates the performance of large language models (LLMs) as medical agents in Portuguese, aiming to develop a reliable and relevant virtual assistant for healthcare professionals. The HealthCareMagic-100k-en and MedQuAD datasets, translated from English using GPT-3.5, were used to fine-tune the ChatBode-7B model using the PEFT-QLoRA method. The InternLM2 model, with initial training on medical data, presented the best overall performance, with high precision and adequacy in metrics such as accuracy, completeness and safety. However, DrBode models, derived from ChatBode, exhibited a phenomenon of catastrophic forgetting of acquired medical knowledge. Despite this, these models performed frequently or even better in aspects such as grammaticality and coherence. A significant challenge was low inter-rater agreement, highlighting the need for more robust assessment protocols. This work paves the way for future research, such as evaluating multilingual models specific to the medical field, improving the quality of training data, and developing more consistent evaluation methodologies for the medical field.
Submitted 30 September, 2024;
originally announced October 2024.
-
A Fly on the Wall -- Exploiting Acoustic Side-Channels in Differential Pressure Sensors
Authors:
Yonatan Gizachew Achamyeleh,
Mohamad Habib Fakih,
Gabriel Garcia,
Anomadarshi Barua,
Mohammad Al Faruque
Abstract:
Differential Pressure Sensors are widely deployed to monitor critical environments. However, our research unveils a previously overlooked vulnerability: their high sensitivity to pressure variations makes them susceptible to acoustic side-channel attacks. We demonstrate that the pressure-sensing diaphragms in DPS can inadvertently capture subtle air vibrations caused by speech, which propagate through the sensor's components and affect the pressure readings. Exploiting this discovery, we introduce BaroVox, a novel attack that reconstructs speech from DPS readings, effectively turning DPS into a "fly on the wall." We model the effect of sound on DPS, exploring the limits and challenges of acoustic leakage. To overcome these challenges, we propose two solutions: a signal-processing approach using a unique spectral subtraction method and a deep learning-based approach for keyword classification. Evaluations under various conditions demonstrate BaroVox's effectiveness, achieving a word error rate of 0.29 for manual recognition and 90.51% accuracy for automatic recognition. Our findings highlight the significant privacy implications of this vulnerability. We also discuss potential defense strategies to mitigate the risks posed by BaroVox.
Submitted 7 October, 2024; v1 submitted 26 September, 2024;
originally announced September 2024.
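Note on the metric: the word error rate cited in the abstract above is conventionally defined as the word-level edit distance between a reference transcript and a recognized transcript, divided by the number of reference words. The sketch below illustrates that standard definition only; it is not the authors' evaluation code, and the function name and example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("the" -> "a") and one deletion ("now").
print(word_error_rate("open the door now", "open a door"))
```

Under this definition a lower value is better, and the value can exceed 1.0 when the hypothesis contains many inserted words.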
-
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Authors:
Gonzalo Martin Garcia,
Karim Abou Zeid,
Christian Schmidt,
Daan de Geus,
Alexander Hermans,
Bastian Leibe
Abstract:
Recent work showed that large diffusion models can be reused as highly precise monocular depth estimators by casting depth estimation as an image-conditional image generation task. While the proposed model achieved state-of-the-art results, high computational demands due to multi-step inference limited its use in many scenarios. In this paper, we show that the perceived inefficiency was caused by a flaw in the inference pipeline that has so far gone unnoticed. The fixed model performs comparably to the best previously reported configuration while being more than 200$\times$ faster. To optimize for downstream task performance, we perform end-to-end fine-tuning on top of the single-step model with task-specific losses and get a deterministic model that outperforms all other diffusion-based depth and normal estimation models on common zero-shot benchmarks. We surprisingly find that this fine-tuning protocol also works directly on Stable Diffusion and achieves comparable performance to current state-of-the-art diffusion-based depth and normal estimation models, calling into question some of the conclusions drawn from prior works.
Submitted 19 March, 2025; v1 submitted 17 September, 2024;
originally announced September 2024.
-
Performance Comparison of ROS2 Middlewares for Multi-robot Mesh Networks in Planetary Exploration
Authors:
Loïck Pierre Chovet,
Gabriel Manuel Garcia,
Abhishek Bera,
Antoine Richard,
Kazuya Yoshida,
Miguel Angel Olivares-Mendez
Abstract:
Recent advancements in Multi-Robot Systems (MRS) and mesh network technologies pave the way for innovative approaches to explore extreme environments. The Artemis Accords, a series of international agreements, have further catalyzed this progress by fostering cooperation in space exploration, emphasizing the use of cutting-edge technologies. In parallel, the widespread adoption of the Robot Operating System 2 (ROS 2) by companies across various sectors underscores its robustness and versatility. This paper evaluates the performance of available ROS 2 MiddleWare (RMW), such as FastRTPS, CycloneDDS and Zenoh, over a mesh network with a dynamic topology. The final choice of RMW is the one that best fits the scenario: the exploration of an extreme extraterrestrial environment using an MRS. The conducted study in a real environment highlights Zenoh as a potential solution for future applications, showing a reduced delay, reachability, and CPU usage while being competitive on data overhead and RAM usage over a dynamic mesh topology.
Submitted 3 July, 2024;
originally announced July 2024.
-
Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task
Authors:
Gabriel Lino Garcia,
Pedro Henrique Paiola,
Luis Henrique Morelli,
Giovani Candido,
Arnaldo Cândido Júnior,
Danilo Samuel Jodas,
Luis C. S. Afonso,
Ivan Rizzo Guilherme,
Bruno Elias Penteado,
João Paulo Papa
Abstract:
Large Language Models (LLMs) are increasingly bringing advances to Natural Language Processing. However, low-resource languages, those lacking extensive prominence in datasets for various NLP tasks, or where existing datasets are not as substantial, such as Portuguese, already obtain several benefits from LLMs, but not to the same extent. LLMs trained on multilingual datasets normally struggle to respond to prompts in Portuguese satisfactorily, presenting, for example, code switching in their responses. This work proposes a fine-tuned LLaMA 2-based model for Portuguese prompts named Bode in two versions: 7B and 13B. We evaluate the performance of this model in classification tasks using the zero-shot approach with in-context learning, and compare it with other LLMs. Our main contribution is to bring an LLM with satisfactory results in the Portuguese language, as well as to provide a model that is free for research or commercial purposes.
Submitted 5 January, 2024;
originally announced January 2024.
-
A Fully Automated Pipeline Using Swin Transformers for Deep Learning-Based Blood Segmentation on Head CT Scans After Aneurysmal Subarachnoid Hemorrhage
Authors:
Sergio Garcia Garcia,
Santiago Cepeda,
Ignacio Arrese,
Rosario Sarabia
Abstract:
Background: Accurate volumetric assessment of spontaneous subarachnoid hemorrhage (SAH) is a labor-intensive task performed with current manual and semiautomatic methods that might be relevant for its clinical and prognostic implications. In the present research, we sought to develop and validate an artificial intelligence-driven, fully automated blood segmentation tool for SAH patients via noncontrast computed tomography (NCCT) scans employing a transformer-based Swin UNETR architecture. Methods: We retrospectively analyzed NCCT scans from patients with confirmed aneurysmal subarachnoid hemorrhage (aSAH) utilizing the Swin UNETR for segmentation. The performance of the proposed method was evaluated against manually segmented ground truth data using metrics such as Dice score, intersection over union (IoU), the volumetric similarity index (VSI), the symmetric average surface distance (SASD), and sensitivity and specificity. A validation cohort from an external institution was included to test the generalizability of the model. Results: The model demonstrated high accuracy with robust performance metrics across the internal and external validation cohorts. Notably, it achieved high Dice coefficient (0.873), IoU (0.810), VSI (0.840), sensitivity (0.821) and specificity (0.996) values and a low SASD (1.866), suggesting proficiency in segmenting blood in SAH patients. The model's efficiency was reflected in its processing speed, indicating potential for real-time applications. Conclusions: Our Swin UNETR-based model offers significant advances in the automated segmentation of blood after aSAH on NCCT images. Despite the computational intensity, the model operates effectively on standard hardware with a user-friendly interface, facilitating broader clinical adoption. Further validation across diverse datasets is warranted to confirm its clinical reliability.
Submitted 29 December, 2023;
originally announced December 2023.
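Note on the metrics: the Dice score and intersection over union (IoU) named in the abstract above are overlap measures between a predicted mask and a ground-truth mask. The sketch below shows their standard definitions on flattened binary masks; it is an illustration only, not the authors' pipeline, and the function name, the toy masks, and the empty-mask convention are assumptions.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two equal-length binary masks (flat sequences of 0/1)."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")

    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_size = sum(1 for p in pred if p)
    truth_size = sum(1 for t in truth if t)
    union = pred_size + truth_size - intersection

    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0, 1.0

    dice = 2 * intersection / (pred_size + truth_size)
    iou = intersection / union
    return dice, iou


# Toy example: the masks share one of three labeled voxels.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
print(dice, iou)
```

Both values lie in [0, 1], with 1 meaning perfect overlap. For a single pair of masks, Dice is never smaller than IoU.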
-
fMPI: Fast Novel View Synthesis in the Wild with Layered Scene Representations
Authors:
Jonas Kohler,
Nicolas Griffiths Sanchez,
Luca Cavalli,
Catherine Herold,
Albert Pumarola,
Alberto Garcia Garcia,
Ali Thabet
Abstract:
In this study, we propose two novel input processing paradigms for novel view synthesis (NVS) methods based on layered scene representations that significantly improve their runtime without compromising quality. Our approach identifies and mitigates the two most time-consuming aspects of traditional pipelines: building and processing the so-called plane sweep volume (PSV), which is a high-dimensional tensor of planar re-projections of the input camera views. In particular, we propose processing this tensor in parallel groups for improved compute efficiency as well as super-sampling adjacent input planes to generate a denser, and hence more accurate, scene representation. The proposed enhancements offer significant flexibility, allowing for a balance between performance and speed, thus making substantial steps toward real-time applications. Furthermore, they are very general in the sense that any PSV-based method can make use of them, including methods that employ multiplane images, multisphere images, and layered depth images. In a comprehensive set of experiments, we demonstrate that our proposed paradigms enable the design of an NVS method that achieves state-of-the-art results on public benchmarks while being up to $50\times$ faster than existing state-of-the-art methods. It also beats the current forerunner in terms of speed by over $3\times$, while achieving significantly better rendering quality.
Submitted 26 December, 2023;
originally announced December 2023.
-
Pushing the Limits of Quantum Computing for Simulating PFAS Chemistry
Authors:
Emil Dimitrov,
Goar Sanchez-Sanz,
James Nelson,
Lee O'Riordan,
Myles Doyle,
Sean Courtney,
Venkatesh Kannan,
Hassan Naseri,
Alberto Garcia Garcia,
James Tricker,
Marisa Faraggi,
Joshua Goings,
Luning Zhao
Abstract:
Accurate and scalable methods for computational quantum chemistry can accelerate research and development in many fields, ranging from drug discovery to advanced material design. Solving the electronic Schrödinger equation is the core problem of computational chemistry. However, the combinatorial complexity of this problem makes it intractable to find exact solutions, except for very small systems. The idea of quantum computing originated from this computational challenge in simulating quantum mechanics. We propose an end-to-end quantum chemistry pipeline based on the variational quantum eigensolver (VQE) algorithm and integrated with both HPC-based simulators and a trapped-ion quantum computer. Our platform orchestrates hundreds of simulation jobs on compute resources to efficiently complete a set of ab initio chemistry experiments with a wide range of parameterization. Per- and poly-fluoroalkyl substances (PFAS) are a large family of human-made chemicals that pose a major environmental and health issue globally. Our simulations include breaking a Carbon-Fluorine bond in trifluoroacetic acid (TFA), a common PFAS chemical. This is a common pathway towards destruction and removal of PFAS. Molecules are modeled on both a quantum simulator and a trapped-ion quantum computer, specifically IonQ Aria. Using basic error mitigation techniques, the 11-qubit TFA model (56 entangling gates) on IonQ Aria yields near-quantitative results with milli-Hartree accuracy. Our novel results show the current state and future projections for quantum computing in solving the electronic structure problem, push the boundaries for the VQE algorithm and quantum computers, and facilitate the development of quantum chemistry workflows.
Submitted 2 November, 2023;
originally announced November 2023.
-
If the Sources Could Talk: Evaluating Large Language Models for Research Assistance in History
Authors:
Giselle Gonzalez Garcia,
Christian Weilbach
Abstract:
The recent advent of powerful Large Language Models (LLMs) provides a new conversational form of inquiry into historical memory (or training data, in this case). We show that by augmenting such LLMs with vector embeddings from highly specialized academic sources, a conversational methodology can be made accessible to historians and other researchers in the Humanities. Concretely, we evaluate and demonstrate how LLMs can assist researchers while they examine customized corpora of different types of documents, including, but not limited to: (1) primary sources, (2) secondary sources written by experts, and (3) the combination of these two. Compared to established search interfaces for digital catalogues, such as metadata and full-text search, we evaluate the richer conversational style of LLMs on the performance of two main types of tasks: (1) question-answering, and (2) extraction and organization of data. We demonstrate that LLMs' semantic retrieval and reasoning abilities on problem-specific tasks can be applied to large textual archives that have not been part of their training data. Therefore, LLMs can be augmented with sources relevant to specific research projects, and can be queried privately by researchers.
Submitted 16 October, 2023;
originally announced October 2023.
-
Software Testing and Code Refactoring: A Survey with Practitioners
Authors:
Danilo Leandro Lima,
Ronnie de Souza Santos,
Guilherme Pires Garcia,
Sildemir S. da Silva,
Cesar Franca,
Luiz Fernando Capretz
Abstract:
Nowadays, software testing professionals are commonly required to develop coding skills to work on test automation. One essential skill required from those who code is the ability to implement code refactoring, a valued quality aspect of software development; however, software developers usually encounter obstacles in successfully applying this practice. In this scenario, the present study aims to explore how software testing professionals (e.g., software testers, test engineers, test analysts, and software QAs) deal with code refactoring to understand the benefits and limitations of this practice in the context of software testing. We followed the guidelines to conduct surveys in software engineering and applied three sampling techniques, namely convenience sampling, purposive sampling, and snowballing sampling, to collect data from testing professionals. We received answers from 80 individuals reporting their experience refactoring the code of automated tests. We concluded that in the context of software testing, refactoring offers several benefits, such as supporting the maintenance of automated tests and improving the performance of the testing team. However, practitioners might encounter barriers in effectively implementing this practice, in particular, the lack of interest from managers and leaders. Our study raises discussions on the importance of having testing professionals implement refactoring in the code of automated tests, allowing them to improve their coding abilities.
Submitted 2 October, 2023;
originally announced October 2023.
-
Deep learning-based interactive segmentation in remote sensing
Authors:
Zhe Wang,
Shoukun Sun,
Xiang Que,
Xiaogang Ma,
Carmen Galaz Garcia
Abstract:
Interactive segmentation, a computer vision technique where a user provides guidance to help an algorithm segment a feature of interest in an image, has achieved outstanding accuracy and efficient human-computer interaction. However, few studies have discussed its application to remote sensing imagery, where click-based interactive segmentation could greatly facilitate the analysis of complicated landscapes. This study aims to bridge the gap between click-based interactive segmentation and remote sensing image analysis by conducting a benchmark study on various click-based interactive segmentation models. We assessed the performance of five state-of-the-art interactive segmentation methods (Reviving Iterative Training with Mask Guidance for Interactive Segmentation (RITM), FocalClick, SimpleClick, Iterative Click Loss (ICL), and Segment Anything (SAM)) on two high-resolution aerial imagery datasets. The Cascade-Forward Refinement (CFR) approach, an innovative inference strategy for interactive segmentation, was also introduced to enhance the segmentation results without requiring manual efforts. We further integrated CFR into all models for comparison. The performance of these methods on various land cover types, different object sizes, and multiple band combinations in the datasets was evaluated. The SimpleClick-CFR model consistently outperformed the other methods in our experiments. Building upon these findings, we developed a dedicated online tool called SegMap for interactive segmentation of remote sensing data. SegMap incorporates a well-performing interactive model that is fine-tuned with remote sensing data. Unlike existing interactive segmentation tools, SegMap offers robust interactivity, modifiability, and adaptability to analyze remote sensing imagery.
Submitted 12 May, 2025; v1 submitted 25 August, 2023;
originally announced August 2023.
-
Learning efficient backprojections across cortical hierarchies in real time
Authors:
Kevin Max,
Laura Kriener,
Garibaldi Pineda García,
Thomas Nowotny,
Ismael Jaras,
Walter Senn,
Mihai A. Petrovici
Abstract:
Models of sensory processing and learning in the cortex need to efficiently assign credit to synapses in all areas. In deep learning, a known solution is error backpropagation, which however requires biologically implausible weight transport from feed-forward to feedback paths.
We introduce Phaseless Alignment Learning (PAL), a bio-plausible method to learn efficient feedback weights in layered cortical hierarchies. This is achieved by exploiting the noise naturally found in biophysical systems as an additional carrier of information. In our dynamical system, all weights are learned simultaneously with always-on plasticity and using only information locally available to the synapses. Our method is completely phase-free (no forward and backward passes or phased learning) and allows for efficient error propagation across multi-layer cortical hierarchies, while maintaining biologically plausible signal transport and learning.
Our method is applicable to a wide class of models and improves on previously known biologically plausible ways of credit assignment: compared to random synaptic feedback, it can solve complex tasks with fewer neurons and learn more useful latent representations. We demonstrate this on various classification tasks using a cortical microcircuit model with prospective coding.
Submitted 2 February, 2024; v1 submitted 20 December, 2022;
originally announced December 2022.
-
Detecting train driveshaft damages using accelerometer signals and Differential Convolutional Neural Networks
Authors:
Antía López Galdo,
Alejandro Guerrero-López,
Pablo M. Olmos,
María Jesús Gómez García
Abstract:
Railway axle maintenance is critical to avoid catastrophic failures. Nowadays, condition monitoring techniques are becoming more prominent in the industry to prevent enormous costs and damage to human lives. This paper proposes the development of a railway axle condition monitoring system based on advanced 2D-Convolutional Neural Network (CNN) architectures applied to time-frequency representations of vibration signals. For this purpose, several preprocessing steps and different types of Deep Learning (DL) and Machine Learning (ML) architectures are discussed to design an accurate classification system. The resultant system converts the railway axle vibration signals into time-frequency domain representations, i.e., spectrograms, and then trains a two-dimensional CNN to classify them according to their cracks. The results showed that the proposed approach outperforms several alternative methods tested. The CNN architecture has been tested on 3 different wheelset assemblies, achieving AUC scores of 0.93, 0.86, and 0.75, outperforming any other architecture and showing a high level of reliability when classifying 4 different levels of defects.
Submitted 15 November, 2022;
originally announced November 2022.
-
TensorAnalyzer: Identification of Urban Patterns in Big Cities using Non-Negative Tensor Factorization
Authors:
Jaqueline Silveira,
Germain García,
Afonso Paiva,
Marcelo Nery,
Sergio Adorno,
Luis Gustavo Nonato
Abstract:
Extracting relevant urban patterns from multiple data sources can be difficult using classical clustering algorithms, since the hyperparameters of the algorithms must be set up suitably and outliers must be dealt with. This difficulty should be addressed correctly to help urban planners in the decision-making process for the further development of a big city. For instance, experts' main interest in criminology is comprehending the relationship between crimes and the socio-economic characteristics at specific georeferenced locations. In addition, the classical clustering algorithms take little notice of the intricate spatial correlations in georeferenced data sources. This paper presents a new approach to detecting the most relevant urban patterns from multiple data sources based on tensor decomposition. The proposed approach's performance is compared with that of classical methods to validate the quality of the identified patterns. The results indicate that the approach can effectively identify functional patterns to characterize the data set for further analysis while achieving good clustering quality. Furthermore, we developed a generic framework named TensorAnalyzer, where the effectiveness and usefulness of the proposed methodology are tested by a set of experiments and a real-world case study showing the relationship between the crime events around schools and students' performance and other variables involved in the analysis.
Submitted 5 October, 2022;
originally announced October 2022.
-
Relict landslide detection using Deep-Learning architectures for image segmentation in rainforest areas: A new framework
Authors:
Guilherme P. B. Garcia,
Carlos H. Grohmann,
Lucas P. Soares,
Mateus Espadoto
Abstract:
Landslides are destructive and recurrent natural disasters on steep slopes and represent a risk to lives and properties. Knowledge of relict landslide locations is vital to understand their mechanisms, update inventory maps and improve risk assessment. However, relict landslide mapping is complex in tropical regions covered with rainforest vegetation. A new CNN framework is proposed for semi-automatic detection of relict landslides, which uses a dataset generated by a k-means clustering algorithm and has a pre-training step. The weights computed in the pre-training are used to fine-tune the CNN training process. A comparison between the proposed and the standard framework is performed using CBERS-04A WPM images. Three CNNs for semantic segmentation are used (Unet, FPN, Linknet) with two augmented datasets. A total of 42 combinations of CNNs are tested. Values of precision and recall were very similar between the combinations tested. Recall was higher than 75% for every combination, but precision values were usually smaller than 20%. False positive (FP) samples were identified as the cause of these low precision values. Predictions of the proposed framework were more accurate and correctly detected more landslides. This work demonstrates that there are limitations for detecting relict landslides in areas covered with rainforest, mainly related to similarities between the spectral response of pastures and deforested areas with Gleichenella sp. ferns, commonly used as an indicator of landslide scars.
Submitted 29 May, 2023; v1 submitted 4 August, 2022;
originally announced August 2022.
-
Virtual reality (VR) as a testing bench for consumer optical solutions: A machine learning approach (GBR) to visual comfort under simulated progressive addition lenses (PALS) distortions
Authors:
Miguel García García,
Yannick Sauer,
Tamara Watson,
Siegfried Wahl
Abstract:
For decades, manufacturers have attempted to reduce or eliminate the optical aberrations that appear on the progressive addition lens' surfaces during manufacturing. Despite every effort made, some of these distortions are inevitable given how lenses are fabricated: astigmatism appears on the surface and cannot be entirely removed, and non-uniform magnification becomes inherent to the power change across the lens. Some presbyopes may report some discomfort when wearing these lenses for the first time, and a subset of them might never adapt. Developing, prototyping, testing and bringing those lenses to market come at a cost, which is usually reflected in the retail price. This study aims to test the feasibility of virtual reality for testing customers' satisfaction with these lenses, even before they go into production. VR offers a controlled environment where different parameters affecting progressive lens comfort, such as distortions, image displacement or optical blurring, can be analysed separately. In this study, the focus was set on the distortions and image displacement, not taking blur into account. Behavioural changes (head and eye movements) were recorded using the built-in eye tracker. Participants were significantly more displeased in the presence of highly distorted lens simulations. In addition, a gradient boosting regressor was fitted to the data, so predictors of discomfort could be unveiled, and ratings could be predicted without performing additional measurements.
Submitted 14 July, 2022;
originally announced July 2022.
-
NeuralPassthrough: Learned Real-Time View Synthesis for VR
Authors:
Lei Xiao,
Salah Nouri,
Joel Hegland,
Alberto Garcia Garcia,
Douglas Lanman
Abstract:
Virtual reality (VR) headsets provide an immersive, stereoscopic visual experience, but at the cost of blocking users from directly observing their physical environment. Passthrough techniques are intended to address this limitation by leveraging outward-facing cameras to reconstruct the images that would otherwise be seen by the user without the headset. This is inherently a real-time view synthesis challenge, since passthrough cameras cannot be physically co-located with the eyes. Existing passthrough techniques suffer from distracting reconstruction artifacts, largely due to the lack of accurate depth information (especially for near-field and disoccluded objects), and also exhibit limited image quality (e.g., being low resolution and monochromatic). In this paper, we propose the first learned passthrough method and assess its performance using a custom VR headset that contains a stereo pair of RGB cameras. Through both simulations and experiments, we demonstrate that our learned passthrough method delivers superior image quality compared to state-of-the-art methods, while meeting strict VR requirements for real-time, perspective-correct stereoscopic view synthesis over a wide field of view for desktop-connected headsets.
Submitted 5 July, 2022;
originally announced July 2022.
-
Simulated redistricting plans for the analysis and evaluation of redistricting in the United States
Authors:
Cory McCartan,
Christopher T. Kenny,
Tyler Simko,
George Garcia III,
Kevin Wang,
Melissa Wu,
Shiro Kuriwaki,
Kosuke Imai
Abstract:
This article introduces the 50stateSimulations, a collection of simulated congressional districting plans and underlying code developed by the Algorithm-Assisted Redistricting Methodology (ALARM) Project. The 50stateSimulations allow for the evaluation of enacted and other congressional redistricting plans in the United States. While the use of redistricting simulation algorithms has become standard in academic research and court cases, any simulation analysis requires non-trivial efforts to combine multiple data sets, identify state-specific redistricting criteria, implement complex simulation algorithms, and summarize and visualize simulation outputs. We have developed a complete workflow that facilitates this entire process of simulation-based redistricting analysis for the congressional districts of all 50 states. The resulting 50stateSimulations include ensembles of simulated 2020 congressional redistricting plans and necessary replication data. We also provide the underlying code, which serves as a template for customized analyses. All data and code are free and publicly available. This article details the design, creation, and validation of the data.
Submitted 20 October, 2022; v1 submitted 21 June, 2022;
originally announced June 2022.
-
Representational Systems Theory: A Unified Approach to Encoding, Analysing and Transforming Representations
Authors:
Daniel Raggi,
Gem Stapleton,
Mateja Jamnik,
Aaron Stockdill,
Grecia Garcia Garcia,
Peter C-H. Cheng
Abstract:
The study of representations is of fundamental importance to any form of communication, and our ability to exploit them effectively is paramount. This article presents a novel theory -- Representational Systems Theory -- that is designed to abstractly encode a wide variety of representations from three core perspectives: syntax, entailment, and their properties. By introducing the concept of a construction space, we are able to encode each of these core components under a single, unifying paradigm. Using our Representational Systems Theory, it becomes possible to structurally transform representations in one system into representations in another. An intrinsic facet of our structural transformation technique is representation selection based on properties that representations possess, such as their relative cognitive effectiveness or structural complexity. A major theoretical barrier to providing general structural transformation techniques is a lack of terminating algorithms. Representational Systems Theory permits the derivation of partial transformations when no terminating algorithm can produce a full transformation. Since Representational Systems Theory provides a universal approach to encoding representational systems, a further key barrier is eliminated: the need to devise system-specific structural transformation algorithms, that are necessary when different systems adopt different formalisation approaches. Consequently, Representational Systems Theory is the first general framework that provides a unified approach to encoding representations, supports representation selection via structural transformations, and has the potential for widespread practical application.
Submitted 7 June, 2022;
originally announced June 2022.
-
Lagarto I-Una plataforma hardware/software de arquitectura de computadoras para la academia e investigación
Authors:
Cristobal Ramirez Lazo,
Cesar Alejandro Hernandez,
Carlos Rojas Morales,
Gustavo Mondragon Garcia,
Luis Alfonso Villa Vargas,
Marco Antonio Ramirez Salinas
Abstract:
The design of microprocessor computer architectures remains a fundamental course in Computer Science and Computer Engineering. The technology and organization inside microprocessors have changed quite fast in the last twenty years. That change has increased the information handled in class, complicating the teaching/learning process for students. Although there are tools, mainly simulators, available to exemplify abstract concepts during the course, these tools have not kept pace with the technology. The computer architecture group of the Centro de Investigación en Computación at the IPN Mexico is working on a project called Lagarto to create an open computing platform for research and education to simplify the understanding of fundamental concepts involved in computer architecture and operating systems. This paper introduces Lagarto, our soft-core-processor micro-architecture. It has a scalar pipeline structure, executes a full MIPS 32 R6 ISA [9] [10], and includes an MMU to support modern operating systems. The complete design has been described using Verilog HDL and is fully synthesizable in an FPGA. Additionally, this work shows different ways to use and test the microprocessor with codes written in either assembly language or C language. We show that the Lagarto project allows students to incorporate during the course not only the traditional model of visualizing theoretical knowledge in a practical exercise through simulators but also to integrate into the teaching process the RTL design to build the microprocessor architecture.
Submitted 26 February, 2022;
originally announced February 2022.
-
A self-training framework for glaucoma grading in OCT B-scans
Authors:
Gabriel García,
Adrián Colomer,
Rafael Verdú-Monedero,
José Dolz,
Valery Naranjo
Abstract:
In this paper, we present a self-training-based framework for glaucoma grading using OCT B-scans in the presence of domain shift. In particular, the proposed two-step learning methodology resorts to pseudo-labels generated during the first step to augment the training dataset on the target domain, which is then used to train the final target model. This allows transferring domain knowledge from the unlabeled data. Additionally, we propose a novel glaucoma-specific backbone which introduces residual and attention modules via skip-connections to refine the embedding features of the latent space. By doing this, our model is capable of improving on the state of the art from a quantitative and interpretability perspective. The reported results demonstrate that the proposed learning strategy can boost the performance of the model on the target dataset without incurring additional annotation steps, by using only labels from the source examples. Our model consistently outperforms the baseline by 1-3% across different metrics and bridges the gap with respect to training the model on the labeled target data.
Submitted 23 November, 2021;
originally announced November 2021.
-
On the Regularization of Autoencoders
Authors:
Harald Steck,
Dario Garcia Garcia
Abstract:
While much work has been devoted to understanding the implicit (and explicit) regularization of deep nonlinear networks in the supervised setting, this paper focuses on unsupervised learning, i.e., autoencoders are trained with the objective of reproducing the output from the input. We extend recent results [Jin et al. 2021] on unconstrained linear models and apply them to (1) nonlinear autoencoders and (2) constrained linear autoencoders, obtaining the following two results: first, we show that the unsupervised setting by itself induces strong additional regularization, i.e., a severe reduction in the model-capacity of the learned autoencoder: we derive that a deep nonlinear autoencoder cannot fit the training data more accurately than a linear autoencoder does if both models have the same dimensionality in their last hidden layer (and under a few additional assumptions). Our second contribution is concerned with the low-rank EDLAE model [Steck 2020], which is a linear autoencoder with a constraint on the diagonal of the learned low-rank parameter-matrix for improved generalization: we derive a closed-form approximation to the optimum of its non-convex training-objective, and empirically demonstrate that it is an accurate approximation across all model-ranks in our experiments on three well-known data sets.
Submitted 21 October, 2021;
originally announced October 2021.
-
A deep neural network for multi-species fish detection using multiple acoustic cameras
Authors:
Guglielmo Fernandez Garcia,
François Martignac,
Marie Nevoux,
Laurent Beaulaton,
Thomas Corpetti
Abstract:
Underwater acoustic cameras are high-potential devices for many applications in ecology, notably for fisheries management and monitoring. However, how to turn such data into high-value information without a time-consuming reading of the entire dataset by an operator is still a challenge. Moreover, the analysis of acoustic imaging, due to its low signal-to-noise ratio, is a perfect training ground for experimenting with new approaches, especially concerning Deep Learning techniques. We present a novel approach that takes advantage of both CNN (Convolutional Neural Network) and classical CV (Computer Vision) techniques and is able to detect a generic class "fish" in acoustic video streams. The pipeline pre-treats the acoustic images to extract 2 features, in order to localise the signals and improve the detection performance. To ensure the performance from an ecological point of view, we also propose a two-step validation, one to validate the results of the training runs and one to test the method on a real-world scenario. The YOLOv3-based model was trained with data of fish from multiple species recorded by the two common acoustic cameras, DIDSON and ARIS, including species of high ecological interest, such as Atlantic salmon or European eels. The model we developed provides satisfying results, detecting almost 80% of fish and minimizing the false positive rate; however, the model is much less efficient for eel detections on ARIS videos. The first CNN pipeline for fish monitoring exploiting video data from two models of acoustic cameras satisfies most of the required features. Many challenges are still present, such as the automation of fish species identification through a multiclass model. However, the results point to a new solution for dealing with complex data, such as sonar data, which can also be reapplied in other cases where the signal-to-noise ratio is a challenge.
Submitted 22 September, 2021;
originally announced September 2021.
-
Initial Test of "BabyRobot" Behaviour on a Teleoperated Toy Substitution: Improving the Motor Skills of Toddlers
Authors:
Eric Canas,
Alba M. G. Garcia,
Anais Garrell,
Cecilio Angulo
Abstract:
This article introduces "Baby Robot", a robot aiming to improve motor skills of babies and toddlers. The authors developed a car-like toy that moves autonomously using reinforcement learning and computer vision techniques. The robot's behaviour is to escape from a target baby that has been previously recognized, or at least detected, while avoiding obstacles, so that the safety of the baby is not compromised. A myriad of commercial toys with a similar mobility improvement purpose are on the market; however, none of them relies on intelligent autonomous movement, as they perform simple and repetitive trajectories at best. Two crawling toys -- one representing "Baby Robot" -- were tested in a real environment against regular toys in order to check how they improved the toddlers' mobility. These real-life experiments were conducted with our proposed robot in a kindergarten, where a group of children interacted with the toys. Significant improvement in the motion skills of participants was detected.
Submitted 16 March, 2023; v1 submitted 19 September, 2021;
originally announced September 2021.
-
A Novel Self-Learning Framework for Bladder Cancer Grading Using Histopathological Images
Authors:
Gabriel García,
Anna Esteve,
Adrián Colomer,
David Ramos,
Valery Naranjo
Abstract:
Recently, bladder cancer has increased significantly in terms of incidence and mortality. Currently, two subtypes are known based on tumour growth: non-muscle invasive (NMIBC) and muscle-invasive bladder cancer (MIBC). In this work, we focus on the MIBC subtype because it has the worst prognosis and can spread to adjacent organs. We present a self-learning framework to grade bladder cancer from histological images stained via immunohistochemical techniques. Specifically, we propose a novel Deep Convolutional Embedded Attention Clustering (DCEAC) which allows classifying histological patches into different severity levels of the disease, according to the patterns established in the literature. The proposed DCEAC model follows a two-step fully unsupervised learning methodology to discern between non-tumour, mild and infiltrative patterns from high-resolution samples of 512x512 pixels. Our system outperforms previous clustering-based methods by including a convolutional attention module, which allows refining the features of the latent space before the classification stage. The proposed network exceeds state-of-the-art approaches by 2-3% across different metrics, achieving a final average accuracy of 0.9034 in a multi-class scenario. Furthermore, the reported class activation maps show that our model is able to learn by itself the same patterns that clinicians consider relevant, without requiring prior annotation steps. This represents a breakthrough in muscle-invasive bladder cancer grading, bridging the gap with respect to training the model on labelled data.
Submitted 25 June, 2021;
originally announced June 2021.
-
Circumpapillary OCT-Focused Hybrid Learning for Glaucoma Grading Using Tailored Prototypical Neural Networks
Authors:
Gabriel García,
Rocío del Amor,
Adrián Colomer,
Rafael Verdú-Monedero,
Juan Morales-Sánchez,
Valery Naranjo
Abstract:
Glaucoma is one of the leading causes of blindness worldwide and Optical Coherence Tomography (OCT) is the quintessential imaging technique for its detection. Unlike most of the state-of-the-art studies focused on glaucoma detection, in this paper, we propose, for the first time, a novel framework for glaucoma grading using raw circumpapillary B-scans. In particular, we set out a new OCT-based hybrid network which combines hand-driven and deep learning algorithms. An OCT-specific descriptor is proposed to extract hand-crafted features related to the retinal nerve fibre layer (RNFL). In parallel, an innovative CNN is developed using skip-connections to include tailored residual and attention modules to refine the automatic features of the latent space. The proposed architecture is used as a backbone to conduct a novel few-shot learning based on static and dynamic prototypical networks. The k-shot paradigm is redefined giving rise to a supervised end-to-end system which provides substantial improvements discriminating between healthy, early and advanced glaucoma samples. The training and evaluation processes of the dynamic prototypical network are addressed from two fused databases acquired via Heidelberg Spectralis system. Validation and testing results reach a categorical accuracy of 0.9459 and 0.8788 for glaucoma grading, respectively. Besides, the high performance reported by the proposed model for glaucoma detection deserves a special mention. The findings from the class activation maps are directly in line with the clinicians' opinion since the heatmaps pointed out the RNFL as the most relevant structure for glaucoma diagnosis.
Submitted 25 June, 2021;
originally announced June 2021.
-
Efficient Deep Learning Architectures for Fast Identification of Bacterial Strains in Resource-Constrained Devices
Authors:
R. Gallardo García,
S. Jarquín Rodríguez,
B. Beltrán Martínez,
C. Hernández Gracidas,
R. Martínez Torres
Abstract:
This work presents twelve fine-tuned deep learning architectures to solve the bacterial classification problem over the Digital Image of Bacterial Species Dataset. The base architectures were mainly published as mobile or efficient solutions to the ImageNet challenge, and all experiments presented in this work consisted of making several modifications to the original designs, in order to make them able to solve the bacterial classification problem by using fine-tuning and transfer learning techniques. This work also proposes a novel data augmentation technique for this dataset, which is based on the idea of artificial zooming, strongly increasing the performance of every tested architecture, even doubling it in some cases. In order to get robust and complete evaluations, all experiments were performed with 10-fold cross-validation and evaluated with five different metrics: top-1 and top-5 accuracy, precision, recall, and F1 score. This paper presents a complete comparison of the twelve different architectures, cross-validated with the original and the augmented versions of the dataset; the results are also compared with several methods from the literature. Overall, eight of the eleven architectures surpassed a score of 0.95 in top-1 accuracy with our data augmentation method, with 0.9738 being the highest top-1 accuracy. The impact of the data augmentation technique is reported with relative improvement scores.
Submitted 11 June, 2021;
originally announced June 2021.
-
Prostate Gland Segmentation in Histology Images via Residual and Multi-Resolution U-Net
Authors:
Julio Silva-Rodríguez,
Elena Payá-Bosch,
Gabriel García,
Adrián Colomer,
Valery Naranjo
Abstract:
Prostate cancer is one of the most prevalent cancers worldwide. One of the key factors in reducing its mortality is based on early detection. The computer-aided diagnosis systems for this task are based on the glandular structural analysis in histology images. Hence, accurate gland detection and segmentation is crucial for a successful prediction. The methodological basis of this work is a prostate gland segmentation based on U-Net convolutional neural network architectures modified with residual and multi-resolution blocks, trained using data augmentation techniques. The residual configuration outperforms in the test subset the previous state-of-the-art approaches in an image-level comparison, reaching an average Dice Index of 0.77.
Submitted 21 May, 2021;
originally announced May 2021.
-
Implementation of Departmental and Periodical Examination Analyzer System
Authors:
Julius G. Garcia,
Connie C. Aunario
Abstract:
Administering examinations both in public and private academic institutions can be tedious and unmanageable. The multiplicity of problems affecting the conduct of departmental and periodical examinations can be greatly reduced by automating the examination process. The purpose of this action research is to provide an alternative technical solution for administering tests through the use of an Examination System. This software application can accommodate a large number of examinees for different subjects, implements a random questioning technique, and can generate item analysis and test results. The Departmental and Periodical Examination System was developed using the Visual Basic language. The software modules were tested using the functional testing method. Using the criteria and metrics of the ISO 9126 software quality model, the system was evaluated by a group of students, teachers, school administrators and information technology professionals and received an overall weighted mean of 4.56585 with an excellent descriptive rating. Therefore, the application software provides a solution that can overcome the major problems of test administration and post-examination issues and performs all the operations specified in the objectives.
Submitted 9 March, 2021;
originally announced March 2021.
-
Reinforcement Learning with Probabilistic Boolean Network Models of Smart Grid Devices
Authors:
Pedro J. Rivera Torres,
Carlos Gershenson García,
Samir Kanaan Izquierdo
Abstract:
The area of Smart Power Grids needs to constantly improve its efficiency and resilience, to provide high-quality electrical power in a resistant grid, managing faults and avoiding failures. Achieving this requires high component reliability, adequate maintenance, and a studied failure occurrence. Correct system operation involves those activities, and novel methodologies to detect, classify, and isolate faults and failures, model and simulate processes with predictive algorithms and analytics (using data analysis and asset condition to plan and perform activities). We showcase the application of a complex-adaptive, self-organizing modeling method, Probabilistic Boolean Networks (PBN), as a way towards the understanding of the dynamics of smart grid devices, and to model and characterize their behavior. This work demonstrates that PBNs are equivalent to the standard Reinforcement Learning Cycle, in which the agent/model has an interaction with its environment and receives feedback from it in the form of a reward signal. Different reward structures were created in order to characterize preferred behavior. This information can be used to guide the PBN to avoid fault conditions and failures.
Submitted 1 February, 2021;
originally announced February 2021.
-
Canvas Adoption Assessment and Acceptance of the Learning Management System on a Web-Based Platform
Authors:
Julius G. Garcia,
Mark Gil T. Gangan,
Marita N. Tolentino,
Marc Ligas,
Shirley D. Moraga,
Amelia A. Pasilan
Abstract:
The acquisition of non-proprietary and proprietary learning management systems has provided a richer learning experience to users and raised interest among education providers. This study aims to assess student adoption of Canvas as a new learning management system and its potential as a web-based platform in the e-learning programme of the University of the East. This study also assessed student readiness in using Canvas. A survey was administered to 214 students of the University of the East through snowball sampling. An Exploratory Factor Analysis was conducted to examine the validity of the model. A Confirmatory Factor Analysis was used to validate the Exploratory Factor Analysis results and analyse the correlation of the constructs. Structural Equation Modelling was conducted to analyse the relationships between the constructs, which were evaluated using fit indices. Adopted from the Technology Acceptance Model, the constructs perceived ease of use, perceived usefulness, and attitude were studied. The study reveals that students' perceived usefulness and attitude towards using Canvas in a web-based platform have direct and significant effects on their intention to use Canvas. The students' perceived ease of use has a significant effect on their perceived usefulness but has no significant effect on their attitude towards the use of Canvas. The students' technological maturity and prior experience in using a learning management system influenced their beliefs on the adaptation of similar technology. Exploring the potential benefits of Canvas and the factors affecting students' adoption amplifies access to quality education to fulfil educational directives. Furthermore, educational institutions should explore technological migration related to teaching and learning processes.
Submitted 26 May, 2021; v1 submitted 28 January, 2021;
originally announced January 2021.
-
Decision Support System for an Intelligent Operator of Utility Tunnel Boring Machines
Authors:
Gabriel Rodriguez Garcia,
Gabriel Michau,
Herbert H. Einstein,
Olga Fink
Abstract:
In tunnel construction projects, delays induce high costs. Thus, tunnel boring machine (TBM) operators aim for fast advance rates without compromising safety, a difficult mission in uncertain ground environments. Finding the optimal control parameters based on the TBM sensors' measurements remains an open research question with large practical relevance.
In this paper, we propose an intelligent decision support system developed in three steps. First, the performance of past projects is evaluated with an optimality score, taking into account the advance rate and the working pressure safety. Then, a deep learning model learns the mapping between the TBM measurements and this optimality score. Last, in real application, the model provides incremental recommendations to improve the optimality, taking into account the current setting and measurements of the TBM.
The proposed approach is evaluated on a real micro-tunnelling project and shows great promise for future projects.
Submitted 8 January, 2021; v1 submitted 7 January, 2021;
originally announced January 2021.
-
Beam Management in 5G: A Stochastic Geometry Analysis
Authors:
Sanket S. Kalamkar,
François Baccelli,
Fuad M. Abinader Jr.,
Andrea S. Marcano Fani,
Luis G. Uzeda Garcia
Abstract:
Beam management is central in the operation of beamformed wireless cellular systems such as 5G New Radio (NR) networks. Focusing the energy radiated to mobile terminals (MTs) by increasing the number of beams per cell increases signal power and decreases interference, and has hence the potential to bring major improvements on area spectral efficiency (ASE). This paper proposes a first system-level stochastic geometry model encompassing major aspects of the beam management problem: frequencies, antenna configurations, and propagation; physical layer, wireless links, and coding; network geometry, interference, and resource sharing; sensing, signaling, and mobility management. This model leads to a simple analytical expression for the effective rate that the typical user gets in this context. This in turn allows one to find the number of beams per cell and per MT that maximizes the effective ASE by offering the best tradeoff between beamforming gains and beam management operational overheads and costs, for a wide variety of 5G network scenarios including millimeter wave (mmWave) and sub-6 GHz. As part of the system-level analysis, we define and analyze several underlying new and fundamental performance metrics that are of independent interest. The numerical results discuss the effects of different systemic tradeoffs and performance optimizations of mmWave and sub-6 GHz 5G deployments.
Submitted 5 December, 2020;
originally announced December 2020.
-
Usability Dimensions and Behavioral Intention to Use Markdown to Moodle in Test Construction
Authors:
Julius G. Garcia,
Connie C. Aunario,
Go Frendi Gunawan
Abstract:
Creating a test with numerous items in Moodle can be tedious and less intuitive compared to the conventional method. This study aims to determine the performance of Markdown to Moodle in easing the test construction process and to explain the underlying factors of the behavioral intention to use the application. Markdown to Moodle is an application that allows users to type the bulk of test items directly into the browser and generates .doc, .md and .xml files stored in the local drive. The .xml file can be imported into the Moodle test bank. This lessens the time spent creating test items one at a time in Moodle. A training session and a survey were conducted among teachers with Moodle usage experience. Results from this study allowed the researchers to determine the usability of the application and the users' behavioral intention. This highlights the workflow continuity in test construction as a key factor in the usage and performance of the application.
Submitted 26 November, 2020;
originally announced December 2020.
-
Creating and Maintaining Filipino and Japanese Students Social Capital with Facebook
Authors:
Mayumi Kubota,
Julius G. Garcia
Abstract:
This study investigated perceptions and patterns of Facebook use among Filipino and Japanese undergraduate students and the relationship of these factors to creating and maintaining students' social capital, international posture, and willingness to communicate. The survey of undergraduate students was conducted online and 483 valid responses were obtained. Data revealed the characteristic uses of FB by Filipino and Japanese undergraduate students. An interrelation model among six factors, International Posture, WTC, Perception, Bridging, Bonding, and Utilization for Filipino students showed the importance of utilization or FB usage for bridging social capital and bonding social capital. For Japanese students, bonding social capital mediated between utilization or FB usage and bridging. Bridging social capital was established only through bonding social capital. Thus, unless Japanese students are close enough to their FB friends, they do not construct new relationships on FB that will influence Filipino students in the process of virtual internationalization in the future.
Submitted 26 November, 2020;
originally announced November 2020.
-
An Interactive Foreign Language Trainer Using Assessment and Feedback Modalities
Authors:
Rosalyn P. Reyes,
Evelyn C. Samson,
Julius G. Garcia
Abstract:
English has long been set as the universal language. Basically most, if not all countries in the world know how to speak English or at least try to use it in their everyday communications for the purpose of globalizing. This study is designed to help the students learn from one or all of the four most commonly used foreign languages in the field of Information Technology namely Korean, Mandarin Chinese, Japanese, and Spanish. Composed of a set of words, phrases, and sentences, the program is intended to quickly teach the students in the form of basic, intermediate, and advanced levels. This study has used the Agile model in system development. Functionality, reliability, usability, efficiency, and portability were also considered in determining the level of the acceptability of the system in terms of ISO 25010:2011. This interactive foreign language trainer is built to associate fun with learning, to remedy the lack of perseverance by some in learning a new language, and to make learning the users' favorite playtime activity. The study allows the user to interact with the program which provides support for their learning. Moreover, this study reveals that integrating feedback modalities in the training and assessment modules of the software strengthens and enhances the memory in learning the language.
Submitted 23 November, 2020;
originally announced November 2020.
-
Stochastic Geometry-Based Modeling and Analysis of Beam Management in 5G
Authors:
Sanket S. Kalamkar,
Fuad M. Abinader Jr.,
François Baccelli,
Andrea S. Marcano Fani,
Luis G. Uzeda Garcia
Abstract:
Beam management is central in the operation of dense 5G cellular networks. Focusing the energy radiated to mobile terminals (MTs) by increasing the number of beams per cell increases signal power and decreases interference, and has hence the potential to bring major improvements on area spectral efficiency (ASE). This benefit, however, comes with unavoidable overheads that increase with the number of beams and the MT speed. This paper proposes a first system-level stochastic geometry model encompassing major aspects of the beam management problem: frequencies, antennas, and propagation; physical layer, wireless links, and coding; network geometry, interference, and resource sharing; sensing, signaling, and mobility management. This model leads to a simple analytical expression for the effective ASE that the typical user gets in this context. This in turn allows one to find, for a wide variety of 5G network scenarios including millimeter wave (mmWave) and sub-6 GHz, the number of beams per cell that offers the best global trade-off between these benefits and costs. We finally provide numerical results that discuss the effects of different systemic trade-offs and performances of mmWave and sub-6 GHz 5G deployments.
Submitted 14 September, 2020; v1 submitted 8 June, 2020;
originally announced June 2020.
-
Glaucoma Detection From Raw Circumapillary OCT Images Using Fully Convolutional Neural Networks
Authors:
Gabriel García,
Rocío del Amor,
Adrián Colomer,
Valery Naranjo
Abstract:
Nowadays, glaucoma is the leading cause of blindness worldwide. We propose in this paper two different deep-learning-based approaches to address glaucoma detection just from raw circumpapillary OCT images. The first one is based on the development of convolutional neural networks (CNNs) trained from scratch. The second one lies in fine-tuning some of the most common state-of-the-art CNN architectures. The experiments were performed on a private database composed of 93 glaucomatous and 156 normal B-scans around the optic nerve head of the retina, which were diagnosed by expert ophthalmologists. The validation results show that fine-tuned CNNs outperform the networks trained from scratch when small databases are addressed. Additionally, the VGG family of networks reports the most promising results, with an area under the ROC curve of 0.96 and an accuracy of 0.92, during the prediction of the independent test set.
Submitted 29 May, 2020;
originally announced June 2020.
-
Temporal signals to images: Monitoring the condition of industrial assets with deep learning image processing algorithms
Authors:
Gabriel Rodriguez Garcia,
Gabriel Michau,
Mélanie Ducoffe,
Jayant Sen Gupta,
Olga Fink
Abstract:
The ability to detect anomalies in time series is considered highly valuable in numerous application domains. The sequential nature of time series objects is responsible for an additional feature complexity, ultimately requiring specialized approaches in order to solve the task. Essential characteristics of time series, situated outside the time domain, are often difficult to capture with state-of-the-art anomaly detection methods when no transformations have been applied to the time series. Inspired by the success of deep learning methods in computer vision, several studies have proposed transforming time series into image-like representations, used as inputs for deep learning models, and have led to very promising results in classification tasks. In this paper, we first review the signal to image encoding approaches found in the literature. Second, we propose modifications to some of their original formulations to make them more robust to the variability in large datasets. Third, we compare them on the basis of a common unsupervised task to demonstrate how the choice of the encoding can impact the results when used in the same deep learning architecture. We thus provide a comparison between six encoding algorithms with and without the proposed modifications. The selected encoding methods are Gramian Angular Field, Markov Transition Field, recurrence plot, grey scale encoding, spectrogram, and scalogram. We also compare the results achieved with the raw signal used as input for another deep learning model. We demonstrate that some encodings have a competitive advantage and might be worth considering within a deep learning framework. The comparison is performed on a dataset collected and released by Airbus SAS, containing highly complex vibration measurements from real helicopter flight tests. The different encodings provide competitive results for anomaly detection.
Submitted 26 February, 2021; v1 submitted 14 May, 2020;
originally announced May 2020.
-
Longitudinal Dynamics Model Identification of an Electric Car Based on Real Response Approximation
Authors:
Salvador Dominguez,
Gaëtan Garcia,
Arnaud Hamon,
Vincent Frémont
Abstract:
Obtaining a realistic and accurate model of the longitudinal dynamics is key for a good speed control of a self-driving car. It is also useful to simulate the longitudinal behavior of the vehicle with high fidelity. In this paper, a straightforward and generic method for obtaining the friction, braking and propulsion forces as a function of speed, throttle input and brake input is proposed. Experimental data is recorded during tests over the full speed range to estimate the forces, to which the corresponding curves are adjusted. A simple and direct balance of forces in the direction tangent to the ground is used to obtain an estimation of the real forces involved. Then a model composed of approximate spline curves that fit the results is proposed. Using splines to model the dynamic response has the advantage of being quick and accurate, avoiding the complexity of parameter identification and tuning of non-linear responses embedding the internal functionalities of the car, like ABS or regenerative brake. This methodology has been applied to LS2N's electric Renault Zoe but can be applied to any other electric car. As shown in the experimental section, a comparison between the estimated acceleration of the car using the model and the real one over a wide range of speeds along a trip of about $10\,\mathrm{km/h}$ reveals only $0.35\,\mathrm{m/s^2}$ of error standard deviation in a range of $\pm 2\,\mathrm{m/s^2}$, which is very encouraging.
Submitted 17 March, 2020;
originally announced March 2020.
-
An Experimental Evaluation of Robustness and Precision for Long-term LiDAR-based Localization in Highly Changing Environments
Authors:
Salvador Dominguez,
Gaëtan Garcia,
Vincent Frémont,
Arnaud Hamon
Abstract:
One of the hardest challenges to face in the development of a non-GPS-based localization system for autonomous vehicles is coping with changes in the environment. LiDAR-based systems typically try to match the last measurements obtained with a previously recorded map of the area. If the existing map is not updated over time, there is a good chance that the measurements will not match the environment well enough, causing the vehicle to lose track of its location. In this paper, we present and analyze experimental results regarding the robustness and precision of a map-matching based localization system over a certain period of time in the following three cases: (1) without any update of the initial map, (2) updating the map as the vehicle moves and (3) with map updates that take into account surrounding structures labeled as "fixed", which are treated differently. The environment of the tests is a busy parking area, which ensures drastic changes from one day to the next. The precision is obtained by comparing the positions computed using the map with the ones provided by a Real-Time Kinematic GPS system. The experimental results reveal a positioning error of about 6 cm, which remains stable even after 23 days when using fixed structures in the working area.
Submitted 17 March, 2020;
originally announced March 2020.
-
Learning CHARME models with neural networks
Authors:
José G. Gómez García,
Jalal Fadili,
Christophe Chesneau
Abstract:
In this paper, we consider a model called CHARME (Conditional Heteroscedastic Autoregressive Mixture of Experts), a class of generalized mixture of nonlinear nonparametric AR-ARCH time series. Under certain Lipschitz-type conditions on the autoregressive and volatility functions, we prove that this model is stationary, ergodic and $τ$-weakly dependent. These conditions are much weaker than those presented in the literature that treats this model. Moreover, this result forms the theoretical basis for deriving an asymptotic theory of the underlying (non)parametric estimation, which we present for this model. As an application, from the universal approximation property of neural networks (NN), we develop a learning theory for the NN-based autoregressive functions of the model, where the strong consistency and asymptotic normality of the considered estimator of the NN weights and biases are guaranteed under weak conditions.
Submitted 17 November, 2020; v1 submitted 8 February, 2020;
originally announced February 2020.
-
0-Step Capturability, Motion Decomposition and Global Feedback Control of the 3D Variable Height-Inverted Pendulum
Authors:
Gabriel Garcia,
Robert Griffin,
Jerry Pratt
Abstract:
One common method for stabilizing robots after a push is the Instantaneous Capture Point; however, this has the fundamental limitation of assuming constant height. Although there are several works on balancing bipedal robots that include height variations in 2D, the amount of literature on 3D models is limited. There are optimization methods using variable Center of Pressure (CoP) and reaction force to the ground, although they do not provide the physical region where a robot can step and require a precomputation for the analysis. This work provides the necessary and sufficient conditions to maintain balance of the 3D Variable Height Inverted Pendulum (VHIP) with both fixed and variable CoP. We also prove that the 3D VHIP with fixed CoP is the same as its 2D version, and we generalize controllers working on the 2D VHIP to the 3D VHIP. We also show the generalization of the Divergent Component of Motion to the 3D VHIP and we provide an alternative motion decomposition for analyzing height and CoP strategies independently. This allows us to generalize previous global feedback controllers developed for the 2D VHIP to the 3D VHIP with a variable CoP.
Submitted 12 December, 2019;
originally announced December 2019.