-
A Performance Analysis of You Only Look Once Models for Deployment on Constrained Computational Edge Devices in Drone Applications
Authors:
Lucas Rey,
Ana M. Bernardos,
Andrzej D. Dobrzycki,
David Carramiñana,
Luca Bergesio,
Juan A. Besada,
José Ramón Casar
Abstract:
Advancements in embedded systems and Artificial Intelligence (AI) have enhanced the capabilities of Unmanned Aircraft Vehicles (UAVs) in computer vision. However, the integration of AI techniques o-nboard drones is constrained by their processing capabilities. In this sense, this study evaluates the deployment of object detection models (YOLOv8n and YOLOv8s) on both resource-constrained edge devic…
▽ More
Advancements in embedded systems and Artificial Intelligence (AI) have enhanced the capabilities of Unmanned Aircraft Vehicles (UAVs) in computer vision. However, the integration of AI techniques o-nboard drones is constrained by their processing capabilities. In this sense, this study evaluates the deployment of object detection models (YOLOv8n and YOLOv8s) on both resource-constrained edge devices and cloud environments. The objective is to carry out a comparative performance analysis using a representative real-time UAV image processing pipeline. Specifically, the NVIDIA Jetson Orin Nano, Orin NX, and Raspberry Pi 5 (RPI5) devices have been tested to measure their detection accuracy, inference speed, and energy consumption, and the effects of post-training quantization (PTQ). The results show that YOLOv8n surpasses YOLOv8s in its inference speed, achieving 52 FPS on the Jetson Orin NX and 65 fps with INT8 quantization. Conversely, the RPI5 failed to satisfy the real-time processing needs in spite of its suitability for low-energy consumption applications. An analysis of both the cloud-based and edge-based end-to-end processing times showed that increased communication latencies hindered real-time applications, revealing trade-offs between edge (low latency) and cloud processing (quick processing). Overall, these findings contribute to providing recommendations and optimization strategies for the deployment of AI models on UAVs.
△ Less
Submitted 6 February, 2025;
originally announced February 2025.
-
Enhancing healthcare infrastructure resilience through agent-based simulation methods
Authors:
David Carramiñana,
Ana M. Bernardos,
Juan A. Besada,
José R. Casar
Abstract:
Critical infrastructures face demanding challenges due to natural and human-generated threats, such as pandemics, workforce shortages or cyber-attacks, which might severely compromise service quality. To improve system resilience, decision-makers would need intelligent tools for quick and efficient resource allocation. This article explores an agent-based simulation model that intends to capture a…
▽ More
Critical infrastructures face demanding challenges due to natural and human-generated threats, such as pandemics, workforce shortages or cyber-attacks, which might severely compromise service quality. To improve system resilience, decision-makers would need intelligent tools for quick and efficient resource allocation. This article explores an agent-based simulation model that intends to capture a part of the complexity of critical infrastructure systems, particularly considering the interdependencies of healthcare systems with information and telecommunication systems. Such a model enables to implement a simulation-based optimization approach in which the exposure of critical systems to risks is evaluated, while comparing the mitigation effects of multiple tactical and strategical decision alternatives to enhance their resilience. The proposed model is designed to be parameterizable, to enable adapting it to risk scenarios with different severity, and it facilitates the compilation of relevant performance indicators enabling monitoring at both agent level and system level. To validate the agent-based model, a literature-supported methodology has been used to perform cross-validation, sensitivity analysis and test the usefulness of the proposed model through a use case. The use case analyzes the impact of a concurrent pandemic and a cyber-attack on a hospital and compares different resiliency-enhancing countermeasures using contingency tables. Overall, the use case illustrates the feasibility and versatility of the proposed approach to enhance resiliency.
△ Less
Submitted 10 February, 2025;
originally announced February 2025.
-
Application of Deep Reinforcement Learning to UAV Swarming for Ground Surveillance
Authors:
Raúl Arranz,
David Carramiñana,
Gonzalo de Miguel,
Juan A. Besada,
Ana M. Bernardos
Abstract:
This paper summarizes in depth the state of the art of aerial swarms, covering both classical and new reinforcement-learning-based approaches for their management. Then, it proposes a hybrid AI system, integrating deep reinforcement learning in a multi-agent centralized swarm architecture. The proposed system is tailored to perform surveillance of a specific area, searching and tracking ground tar…
▽ More
This paper summarizes in depth the state of the art of aerial swarms, covering both classical and new reinforcement-learning-based approaches for their management. Then, it proposes a hybrid AI system, integrating deep reinforcement learning in a multi-agent centralized swarm architecture. The proposed system is tailored to perform surveillance of a specific area, searching and tracking ground targets, for security and law enforcement applications. The swarm is governed by a central swarm controller responsible for distributing different search and tracking tasks among the cooperating UAVs. Each UAV agent is then controlled by a collection of cooperative sub-agents, whose behaviors have been trained using different deep reinforcement learning models, tailored for the different task types proposed by the swarm controller. More specifically, proximal policy optimization (PPO) algorithms were used to train the agents' behavior. In addition, several metrics to assess the performance of the swarm in this application were defined. The results obtained through simulation show that our system searches the operation area effectively, acquires the targets in a reasonable time, and is capable of tracking them continuously and consistently.
△ Less
Submitted 15 January, 2025;
originally announced January 2025.
-
Enhancing Interaction with Augmented Reality through Mid-Air Haptic Feedback: Architecture Design and User Feedback
Authors:
Diego Vaquero-Melchor,
Ana M. Bernardos
Abstract:
The integration of haptics within Augmented Reality may help to deliver an enriched experience, while facilitating the performance of specific actions (e.g. repositioning or resizin ) that are still dependent on the user's skills. This paper gathers the description of a flexible architecture designed to deploy haptically-enabled AR applications. The haptic feedback may be generated through a varie…
▽ More
The integration of haptics within Augmented Reality may help to deliver an enriched experience, while facilitating the performance of specific actions (e.g. repositioning or resizin ) that are still dependent on the user's skills. This paper gathers the description of a flexible architecture designed to deploy haptically-enabled AR applications. The haptic feedback may be generated through a variety of devices (e.g., wearable, graspable, or mid-air ones), and the architecture facilitates handling the specificity of each. For this reason, it is discussed how to generate a haptic representation of a 3D digital object depending on the application and the target device. Additionally, it is included an analysis of practical, relevant issues that arise when setting up a system to work with specific devices like Head-Mounted Displays (e.g., HoloLens) and mid-air haptic devices (e.g., Ultrahaptics UHK), such as the alignment between the real world and the virtual one. The architecture applicability is demonstrated through the implementation of two applications: Form Inspector and Simon Game, built for HoloLens and iOS mobile phones for visualization and for UHK for mid-air haptics delivery. These applications have been used by nine users to explore the efficiency, meaningfulness, and usefulness of mid-air haptics for form perception, object resizing, and push interaction tasks. Results show that, although mobile interaction is preferred when this option is available, haptics turn out to be more meaningful in identifying shapes when compared to what users initially expect and in contributing to the execution of resizing tasks. Moreover, this preliminary user study reveals that users may be expecting a tailored interface metaphor, not necessarily inspired in natural interaction.
△ Less
Submitted 13 January, 2025;
originally announced January 2025.
-
Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis
Authors:
Andrzej D. Dobrzycki,
Ana M. Bernardos,
Luca Bergesio,
Andrzej Pomirski,
Daniel Sáez-Trigueros
Abstract:
Accurate human posture classification in images and videos is crucial for automated applications across various fields, including work safety, physical rehabilitation, sports training, or daily assisted living. Recently, multimodal learning methods, such as Contrastive Language-Image Pretraining (CLIP), have advanced significantly in jointly understanding images and text. This study aims to assess…
▽ More
Accurate human posture classification in images and videos is crucial for automated applications across various fields, including work safety, physical rehabilitation, sports training, or daily assisted living. Recently, multimodal learning methods, such as Contrastive Language-Image Pretraining (CLIP), have advanced significantly in jointly understanding images and text. This study aims to assess the effectiveness of CLIP in classifying human postures, focusing on its application in yoga. Despite the initial limitations of the zero-shot approach, applying transfer learning on 15,301 images (real and synthetic) with 82 classes has shown promising results. The article describes the full procedure for fine-tuning, including the choice for image description syntax, models and hyperparameters adjustment. The fine-tuned CLIP model, tested on 3826 images, achieves an accuracy of over 85%, surpassing the current state-of-the-art of previous works on the same dataset by approximately 6%, its training time being 3.5 times lower than what is needed to fine-tune a YOLOv8-based model. For more application-oriented scenarios, with smaller datasets of six postures each, containing 1301 and 401 training images, the fine-tuned models attain an accuracy of 98.8% and 99.1%, respectively. Furthermore, our experiments indicate that training with as few as 20 images per pose can yield around 90% accuracy in a six-class dataset. This study demonstrates that this multimodal technique can be effectively used for yoga pose classification, and possibly for human posture classification, in general. Additionally, CLIP inference time (around 7 ms) supports that the model can be integrated into automated systems for posture evaluation, e.g., for developing a real-time personal yoga assistant for performance assessment.
△ Less
Submitted 13 January, 2025;
originally announced January 2025.
-
Towards resilient cities: A hybrid simulation framework for risk mitigation through data driven decision making
Authors:
David Carraminana,
Ana M. Bernardos,
Juan A. Besada,
Jose R. Casar
Abstract:
Providing a comprehensive view of the city operation and offering useful metrics for decision making is a well known challenge for urban risk analysis systems. Existing systems are, in many cases, generalizations of previous domain specific tools and or methodologies that may not cover all urban interdependencies and makes it difficult to have homogeneous indicators. In order to overcome this limi…
▽ More
Providing a comprehensive view of the city operation and offering useful metrics for decision making is a well known challenge for urban risk analysis systems. Existing systems are, in many cases, generalizations of previous domain specific tools and or methodologies that may not cover all urban interdependencies and makes it difficult to have homogeneous indicators. In order to overcome this limitation while seeking for effective support to decision makers, this article introduces a novel hybrid simulation framework for risk mitigation. The framework is built on a proposed city concept that considers urban space as a Complex Adaptive System composed by interconnected Critical Infrastructures. In this concept, a Social System, which models daily patterns and social interactions of the citizens in the Urban Landscape, drives the CIs demand to configure the full city picture. The frameworks hybrid design integrates agent based and network based modeling by breaking down city agents into system dependent subagents, to enable both inter and intra system interaction simulation, respectively. A layered structure of indicators at different aggregation levels is also developed, to ensure that decisions are not only data driven but also explainable. Therefore, the proposed simulation framework can serve as a DSS tool that allows the quantitative analysis of the impact of threats at different levels. First, system level metrics can be used to get a broad view on the city resilience. Then, agent level metrics back those figures and provide better explainability. On implementation, the proposed framework enables component reusability (for eased coding), simulation federation (enabling the integration of existing system oriented simulators), discrete simulation in accelerated time (for rapid scenario simulation) and decision oriented visualization (for informed outputs).
△ Less
Submitted 8 January, 2025;
originally announced January 2025.
-
Hybrid Artificial Intelligence Strategies for Drone Navigation
Authors:
Rubén San-Segundo,
Lucía Angulo,
Manuel Gil-Martín,
David Carramiñana,
Ana M. Bernardos
Abstract:
Objective: This paper describes the development of hybrid artificial intelligence strategies for drone navigation. Methods: The navigation module combines a deep learning model with a rule-based engine depending on the agent state. The deep learning model has been trained using reinforcement learning. The rule-based engine uses expert knowledge to deal with specific situations. The navigation modu…
▽ More
Objective: This paper describes the development of hybrid artificial intelligence strategies for drone navigation. Methods: The navigation module combines a deep learning model with a rule-based engine depending on the agent state. The deep learning model has been trained using reinforcement learning. The rule-based engine uses expert knowledge to deal with specific situations. The navigation module incorporates several strategies to explain the drone decision based on its observation space, and different mechanisms for including human decisions in the navigation process. Finally, this paper proposes an evaluation methodology based on defining several scenarios and analyzing the performance of the different strategies according to metrics adapted to each scenario. Results: Two main navigation problems have been studied. For the first scenario (reaching known targets), it has been possible to obtain a 90% task completion rate, reducing significantly the number of collisions thanks to the rule-based engine. For the second scenario, it has been possible to reduce 20% of the time required to locate all the targets using the reinforcement learning model. Conclusions: Reinforcement learning is a very good strategy to learn policies for drone navigation, but in critical situations, it is necessary to complement it with a rule-based module to increase task success rate.
△ Less
Submitted 8 January, 2025;
originally announced January 2025.
-
Assessing the Acceptance of a Mid-Air Gesture Syntax for Smart Space Interaction: An Empirical Study
Authors:
Ana M. Bernardos,
Xian Wang,
Luca Bergesio,
Juan A. Besada,
José R. Casar
Abstract:
This article explores the use of a location-aware mid-air gesture-based command triplet syntax to interact with a smart space. The syntax, inspired by human language, is built as a vocative case with an imperative structure. In a sentence like 'Light, please switch on', the object being activated is invoked via making a gesture that mimics its initial letter/acronym (vocative, coincident with the…
▽ More
This article explores the use of a location-aware mid-air gesture-based command triplet syntax to interact with a smart space. The syntax, inspired by human language, is built as a vocative case with an imperative structure. In a sentence like 'Light, please switch on', the object being activated is invoked via making a gesture that mimics its initial letter/acronym (vocative, coincident with the sentence's elliptical subject). A geometrical or directional gesture then identifies the action (imperative verb) and may include an object feature or a second object with which to network (complement), which also represented by the initial or acronym letter. Technically, an interpreter relying on a trainable multidevice gesture recognition layer makes the pair/triplet syntax decoding possible. The recognition layer works on acceleration and position input signals from graspable (smartphone) and free-hand devices (smartwatch and external depth cameras), as well as a specific compiler. On a specific deployment at a Living Lab facility, the syntax has been instantiated via the use of a lexicon derived from English (with respect to the initial letters and acronyms). A within-subject analysis with twelve users has enabled the analysis of the syntax acceptance (in terms of usability, gesture agreement for actions over objects, and social acceptance) and technology preference of the gesture syntax within its three device implementations (graspable, wearable, and device-free ones). Participants express consensus regarding the simplicity of learning the syntax and its potential effectiveness in managing smart resources. Socially, participants favoured the Watch for outdoor activities and the Phone for home and work settings, underscoring the importance of social context in technology design. The Phone emerged as the preferred option for gesture recognition due to its efficiency and familiarity.
△ Less
Submitted 8 January, 2025;
originally announced January 2025.
-
SARA: A Microservice-Based Architecture for Cross-Platform Collaborative Augmented Reality
Authors:
Diego Vaquero-Melchor,
Ana M. Bernardos,
Luca Bergesio
Abstract:
Augmented Reality (AR) functionalities may be effectively leveraged in collaborative service scenarios (e.g., remote maintenance, on-site building, street gaming, etc.). Standard development cycles for collaborative AR require to code for each specific visualization platform and implement the necessary control mechanisms over the shared assets. This paper describes SARA, an architecture to support…
▽ More
Augmented Reality (AR) functionalities may be effectively leveraged in collaborative service scenarios (e.g., remote maintenance, on-site building, street gaming, etc.). Standard development cycles for collaborative AR require to code for each specific visualization platform and implement the necessary control mechanisms over the shared assets. This paper describes SARA, an architecture to support cross-platform collaborative Augmented Reality applications based on microservices. The architecture is designed to work over the concept of collaboration models (turn, layer, ownership,hierarchy-based and unconstrained examples) which regulate the interaction and permissions of each user over the AR assets. Thanks to the reusability of its components, during the development of an application, SARA enables focusing on the application logic while avoiding the implementation of the communication protocol, data model handling and orchestration between the different, possibly heterogeneous,devices involved in the collaboration (i.e., mobile or wearable AR devices using different operating systems). To describe how to build an application based on SARA, a prototype for HoloLens and iOS devices has been implemented. the prototype is a collaborative voxel-based game in which several players work real time together on a piece of land, adding or eliminating cubes in a collaborative manner to create buildings and landscapes. Turn-based and unconstrained collaboration models are applied to regulate the interaction, the development workflow for this case study shows how the architecture serves as a framework to support the deployment of collaborative AR services, enabling the reuse of collaboration model components, agnostically handling client technologies.
△ Less
Submitted 2 January, 2025;
originally announced January 2025.