-
Predictive Uncertainty for Runtime Assurance of a Real-Time Computer Vision-Based Landing System
Authors:
Romeo Valentin,
Sydney M. Katz,
Artur B. Carneiro,
Don Walker,
Mykel J. Kochenderfer
Abstract:
Recent advances in data-driven computer vision have enabled robust autonomous navigation capabilities for civil aviation, including automated landing and runway detection. However, ensuring that these systems meet the robustness and safety requirements for aviation applications remains a major challenge. In this work, we present a practical vision-based pipeline for aircraft pose estimation from runway images that represents a step toward the ability to certify these systems for use in safety-critical aviation applications. Our approach features three key innovations: (i) an efficient, flexible neural architecture based on a spatial Soft Argmax operator for probabilistic keypoint regression, supporting diverse vision backbones with real-time inference; (ii) a principled loss function producing calibrated predictive uncertainties, which are evaluated via sharpness and calibration metrics; and (iii) an adaptation of Residual-based Receiver Autonomous Integrity Monitoring (RAIM), enabling runtime detection and rejection of faulty model outputs. We implement and evaluate our pose estimation pipeline on a dataset of runway images. We show that our model outperforms baseline architectures in terms of accuracy while also producing well-calibrated uncertainty estimates with sub-pixel precision that can be used downstream for fault detection.
Submitted 13 August, 2025;
originally announced August 2025.
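As a rough illustration of the spatial Soft Argmax operator named in the abstract above, the sketch below turns a 2D grid of heatmap logits into an expected (x, y) keypoint location by taking a softmax over all pixels and averaging the pixel coordinates under it. The function name `spatial_soft_argmax` and the `temperature` parameter are assumptions made for illustration; the paper's actual architecture, loss function, and uncertainty outputs are not reproduced here.

```python
import math


def spatial_soft_argmax(heatmap, temperature=1.0):
    """Return the expected (x, y) pixel location under a softmax over the heatmap.

    `heatmap` is a list of rows of logits; x indexes columns, y indexes rows.
    `temperature` is a hypothetical knob: smaller values sharpen the softmax.
    """
    # Subtract the maximum logit before exponentiating for numerical stability.
    peak = max(max(row) for row in heatmap)
    weights = [[math.exp((v - peak) / temperature) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)

    # Expected column index (x) and row index (y) under the normalized weights.
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y
```

Because the result is a weighted average of pixel coordinates rather than the index of a single maximum, it can fall between pixels and is differentiable with respect to the logits.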
-
Advancing Image-Based Grapevine Variety Classification with a New Benchmark and Evaluation of Masked Autoencoders
Authors:
Gabriel A. Carneiro,
Thierry J. Aubry,
António Cunha,
Petia Radeva,
Joaquim Sousa
Abstract:
Grapevine varieties are essential for the economies of many wine-producing countries, influencing the production of wine, juice, and the consumption of fruits and leaves. Traditional identification methods, such as ampelography and molecular analysis, have limitations: ampelography depends on expert knowledge and is inherently subjective, while molecular methods are costly and time-intensive. To address these limitations, recent studies have applied deep learning (DL) models to classify grapevine varieties using image data. However, due to the small dataset sizes, these methods often depend on transfer learning from datasets from other domains, e.g., ImageNet1K (IN1K), which can lead to performance degradation due to domain shift and supervision collapse. In this context, self-supervised learning (SSL) methods can be a good tool to avoid this performance degradation, since they can learn directly from data, without external labels. This study presents an evaluation of Masked Autoencoders (MAEs) for identifying grapevine varieties based on field-acquired images. The main contributions of this study include two benchmarks comprising 43 grapevine varieties collected across different seasons, an analysis of MAE's application in the agricultural context, and a performance comparison of trained models across seasons. Our results show that a ViT-B/16 model pre-trained with MAE and the unlabeled dataset achieved an F1 score of 0.7956, outperforming all other models. Additionally, we observed that pre-trained models benefit from long pre-training and perform well under a low-data training regime, and that simple data augmentation methods are more effective than complex ones. The study also found that the mask ratio in MAE impacts performance only marginally.
Submitted 16 June, 2025;
originally announced June 2025.
-
Prot2Text-V2: Protein Function Prediction with Multimodal Contrastive Alignment
Authors:
Xiao Fei,
Michail Chatzianastasis,
Sarah Almeida Carneiro,
Hadi Abdine,
Lawrence P. Petalidis,
Michalis Vazirgiannis
Abstract:
Predicting protein function from sequence is a central challenge in computational biology. While existing methods rely heavily on structured ontologies or similarity-based techniques, they often lack the flexibility to express structure-free functional descriptions and novel biological functions. In this work, we introduce Prot2Text-V2, a novel multimodal sequence-to-text model that generates free-form natural language descriptions of protein function directly from amino acid sequences. Our method combines a protein language model as a sequence encoder (ESM-3B) and a decoder-only language model (LLaMA-3.1-8B-Instruct) through a lightweight nonlinear modality projector. A key innovation is our Hybrid Sequence-level Contrastive Alignment Learning (H-SCALE), which improves cross-modal learning by matching mean- and std-pooled protein embeddings with text representations via contrastive loss. After the alignment phase, we apply instruction-based fine-tuning using LoRA on the decoder to teach the model how to generate accurate protein function descriptions conditioned on the protein sequence. We train Prot2Text-V2 on about 250K curated entries from SwissProt and evaluate it under low-homology conditions, where test sequences have low similarity with training samples. Prot2Text-V2 consistently outperforms traditional and LLM-based baselines across various metrics.
Submitted 16 May, 2025;
originally announced May 2025.
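The abstract above describes matching mean- and std-pooled protein embeddings with text representations. A minimal sketch of the pooling step alone follows, assuming per-residue embeddings arrive as a list of equal-length float vectors. The helper name `mean_std_pool` and the choice of population standard deviation are assumptions for illustration; the contrastive loss and the rest of H-SCALE are not shown.

```python
import statistics


def mean_std_pool(token_embeddings):
    """Pool a sequence of per-token vectors into one fixed-size vector.

    The output concatenates the per-dimension mean with the per-dimension
    population standard deviation, so its length is twice the embedding size.
    """
    # Transpose so each tuple holds one embedding dimension across all tokens.
    dims = list(zip(*token_embeddings))
    means = [statistics.fmean(d) for d in dims]
    stds = [statistics.pstdev(d) for d in dims]
    return means + stds
```

Pooling over the sequence axis gives a vector whose size does not depend on protein length, which is what allows a sequence-level comparison against a single text embedding.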
-
Sign language recognition based on deep learning and low-cost handcrafted descriptors
Authors:
Alvaro Leandro Cavalcante Carneiro,
Denis Henrique Pinheiro Salvadeo,
Lucas de Brito Silva
Abstract:
In recent years, deep learning techniques have been used to develop sign language recognition systems, potentially serving as a communication tool for millions of hearing-impaired individuals worldwide. However, there are inherent challenges in creating such systems. Firstly, it is important to consider as many linguistic parameters as possible in gesture execution to avoid ambiguity between words. Moreover, to facilitate the real-world adoption of the created solution, it is essential to ensure that the chosen technology is realistic, avoiding expensive, intrusive, or low-mobility sensors, as well as very complex deep learning architectures that impose high computational requirements. Based on this, our work aims to propose an efficient sign language recognition system that utilizes low-cost sensors and techniques. To this end, an object detection model was trained specifically for detecting the interpreter's face and hands, ensuring focus on the most relevant regions of the image and generating inputs with higher semantic value for the classifier. Additionally, we introduced a novel approach to obtain features representing hand location and movement by leveraging spatial information derived from centroid positions of bounding boxes, thereby enhancing sign discrimination. The results demonstrate the efficiency of our handcrafted features, increasing accuracy by 7.96% on the AUTSL dataset, while adding fewer than 700 thousand parameters and incurring less than 10 milliseconds of additional inference time. These findings highlight the potential of our technique to strike a favorable balance between computational cost and accuracy, making it a promising approach for practical sign language recognition applications.
Submitted 13 August, 2024;
originally announced August 2024.
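The abstract above mentions features for hand location and movement derived from the centroids of detected bounding boxes. One hypothetical way to build such features is sketched below, assuming boxes are given as `(x_min, y_min, x_max, y_max)` tuples. The specific choices here (hand position relative to the face, frame-to-frame displacement) and the function names are assumptions, not the authors' exact descriptors.

```python
def centroid(box):
    """Return the center (x, y) of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def hand_features(face_box, hand_box, prev_hand_box=None):
    """Return (rel_x, rel_y, move_x, move_y) for one hand in one frame.

    rel_*  : hand centroid relative to the face centroid (location cue).
    move_* : displacement of the hand centroid since the previous frame
             (movement cue); zero when no previous box is available.
    """
    face_x, face_y = centroid(face_box)
    hand_x, hand_y = centroid(hand_box)
    relative = (hand_x - face_x, hand_y - face_y)
    if prev_hand_box is None:
        motion = (0.0, 0.0)
    else:
        prev_x, prev_y = centroid(prev_hand_box)
        motion = (hand_x - prev_x, hand_y - prev_y)
    return relative + motion
```

Features of this kind cost only a few arithmetic operations per frame, which is consistent with the low overhead the abstract reports.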
-
Clustering Dynamics for Improved Speed Prediction Deriving from Topographical GPS Registrations
Authors:
Sarah Almeida Carneiro,
Giovanni Chierchia,
Aurelie Pirayre,
Laurent Najman
Abstract:
A persistent challenge in the field of Intelligent Transportation Systems is to extract accurate traffic insights from geographic regions with scarce or no data coverage. To this end, we propose solutions for speed prediction using sparse GPS data points and their associated topographical and road design features. Our goal is to investigate whether we can use similarities in the terrain and infrastructure to train a machine learning model that can predict speed in regions where we lack transportation data. For this we create a Temporally Orientated Speed Dictionary Centered on Topographically Clustered Roads, which helps us to provide speed correlations to selected feature configurations. Our results show qualitative and quantitative improvement over new and standard regression methods. The presented framework provides a fresh perspective on devising strategies for missing data traffic analysis.
Submitted 12 February, 2024;
originally announced February 2024.
-
SWMLP: Shared Weight Multilayer Perceptron for Car Trajectory Speed Prediction using Road Topographical Features
Authors:
Sarah Almeida Carneiro,
Giovanni Chierchia,
Jean Charléty,
Aurélie Chataignon,
Laurent Najman
Abstract:
Although traffic data is collected on a massive scale, it is often only available for specific regions. One concern is that, although studies report good results on these data, the data from these regions may not be sufficiently representative to describe the traffic patterns in the rest of the world. To address this concern, we propose a speed prediction method that is independent of large historical speed data. To predict a vehicle's speed, we use the road topographical features of its trajectory to fit a Shared Weight Multilayer Perceptron learning model. Our results show significant improvement, both qualitative and quantitative, over standard regression analysis. Moreover, the proposed framework sheds new light on how to design new approaches for traffic analysis.
Submitted 2 October, 2023;
originally announced October 2023.
-
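BBChain's View of the Technological Context Underlying the Adoption of the Real Digital (original title: A visão da BBChain sobre o contexto tecnológico subjacente à adoção do Real Digital)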
Authors:
Marcio G B de Avellar,
Alexandre A S Junior,
André H G Lopes,
André L S Carneiro,
João A Pereira,
Davi C B D da Cunha
Abstract:
We explore confidential computing in the context of CBDCs using Microsoft's CCF framework as an example. By developing an experiment and comparing different approaches and performance and security metrics, we seek to evaluate the effectiveness of confidential computing to improve the privacy, security, and performance of CBDCs. Preliminary results suggest that confidential computing could be a promising solution to the technological challenges faced by CBDCs. Furthermore, by implementing confidential computing in DLTs such as Hyperledger Besu and utilizing frameworks such as CCF, we increase transaction confidentiality and privacy while maintaining the scalability and interoperability required for a global digital financial system. In conclusion, confidential computing can significantly bolster CBDC development, fostering a secure, private, and efficient financial future.
Submitted 10 April, 2023;
originally announced April 2023.
-
Wireless Connectivity of a Ground-and-Air Sensor Network
Authors:
Clara R. P. Baldansa,
Roberto C. G. Porto,
Bruno José Olivieri de Souza,
Vítor G. Andrezo Carneiro,
Markus Endler
Abstract:
This paper shows that, when considering outdoor scenarios and wireless communications using the IEEE 802.11 protocol with dipole antennas, the ground reflection is a significant propagation mechanism. This way, the Two-Ray model for this environment allows predicting, with some accuracy, the received signal power. This study is relevant for the application in the communication between overflying Unmanned Aerial Vehicles (UAVs) and ground sensors. In the proposed Wireless Sensor Network (WSN) scenario, the UAVs must receive information from the environment, which is collected by sensors positioned on the ground, and need to maintain connectivity between them and the base station, in order to maintain the quality of service, while moving through the environment.
Submitted 19 November, 2022;
originally announced November 2022.
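For readers unfamiliar with the Two-Ray model referred to in the abstract above, the sketch below computes received power as the coherent sum of a line-of-sight ray and a single ground-reflected ray over flat ground. It assumes unity antenna gains and a ground reflection coefficient of -1 by default; these defaults, the function name, and the parameter names are illustrative assumptions and are not taken from the paper.

```python
import cmath
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def two_ray_received_power(pt_watts, freq_hz, distance_m, h_tx_m, h_rx_m,
                           reflection_coeff=-1.0, gain_tx=1.0, gain_rx=1.0):
    """Received power (watts) from a direct ray plus one ground-reflected ray."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    # Path lengths: direct ray, and reflected ray via the image of the transmitter.
    d_los = math.hypot(distance_m, h_tx_m - h_rx_m)
    d_ref = math.hypot(distance_m, h_tx_m + h_rx_m)
    # Phase lag of the reflected ray caused by its longer path.
    phase_diff = 2.0 * math.pi * (d_ref - d_los) / wavelength
    # Coherent sum of the two rays, each attenuated by its own path length.
    field = 1.0 / d_los + reflection_coeff * cmath.exp(-1j * phase_diff) / d_ref
    return (pt_watts * gain_tx * gain_rx
            * (wavelength / (4.0 * math.pi)) ** 2 * abs(field) ** 2)
```

With `reflection_coeff=0.0` the expression reduces to free-space (Friis) propagation, and at long range with the default coefficient it approaches the familiar `h_tx**2 * h_rx**2 / distance**4` falloff.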
-
Portuguese Man-of-War Image Classification with Convolutional Neural Networks
Authors:
Alessandra Carneiro,
Lorena Nascimento,
Mauricio Noernberg,
Carmem Hara,
Aurora Pozo
Abstract:
Portuguese man-of-war (PMW) is a gelatinous organism with long tentacles capable of causing severe burns, thus leading to negative impacts on human activities, such as tourism and fishing. There is a lack of information about the spatio-temporal dynamics of this species. Therefore, the use of alternative methods for collecting data can contribute to its monitoring. Given the widespread use of social networks and the eye-catching look of PMW, Instagram posts can be a promising data source for monitoring. The first step in this approach is to identify posts that refer to PMW. This paper reports on the use of convolutional neural networks for PMW image classification, in order to automate the recognition of Instagram posts. We created a suitable dataset and trained three different neural networks: VGG-16, ResNet50, and InceptionV3, with and without pre-training on the ImageNet dataset. We analyzed their results using accuracy, precision, recall, and F1 score metrics. The pre-trained ResNet50 network presented the best results, obtaining 94% accuracy and 95% precision, recall, and F1 score. These results show that convolutional neural networks can be very effective for recognizing PMW images from Instagram.
Submitted 3 July, 2022;
originally announced July 2022.
-
An Efficient Contact Algorithm for Rigid/Deformable Interaction based on the Dual Mortar Method
Authors:
R. Pinto Carvalho,
A. M. Couto Carneiro,
F. M. Andrade Pires,
A. Popp
Abstract:
In a wide range of practical problems, such as forming operations and impact tests, assuming that one of the contacting bodies is rigid is an excellent approximation to the physical phenomenon. In this work, the well-established dual mortar method is adopted to enforce interface constraints in the finite deformation frictionless contact of rigid and deformable bodies. The efficiency of the nonlinear contact algorithm proposed here is based on two main contributions. Firstly, a variational formulation of the method using the so-called Petrov-Galerkin scheme is investigated, as it unlocks a significant simplification by removing the need to explicitly evaluate the dual basis functions. The corresponding first-order dual mortar interpolation is presented in detail. Particular focus is, then, placed on the extension for second-order interpolation by employing a piecewise linear interpolation scheme, which critically retains the geometrical information of the finite element mesh. Secondly, a new definition for the nodal orthonormal moving frame attached to each contact node is suggested. It reduces the geometrical coupling between the nodes and consequently decreases the stiffness matrix bandwidth. The proposed contributions decrease the computational complexity of dual mortar methods for rigid/deformable interaction, especially in the three-dimensional setting, while preserving accuracy and robustness.
Submitted 6 October, 2022; v1 submitted 4 January, 2022;
originally announced January 2022.
-
Efficient sign language recognition system and dataset creation method based on deep learning and image processing
Authors:
Alvaro Leandro Cavalcante Carneiro,
Lucas de Brito Silva,
Denis Henrique Pinheiro Salvadeo
Abstract:
New deep-learning architectures are created every year, achieving state-of-the-art results in image recognition and leading to the belief that, in a few years, complex tasks such as sign language translation will be considerably easier, serving as a communication tool for the hearing-impaired community. On the other hand, these algorithms still need a lot of data to be trained, and the dataset creation process is expensive, time-consuming, and slow. Therefore, this work aims to investigate techniques of digital image processing and machine learning that can be used to create a sign language dataset effectively. We discuss data acquisition choices, such as the frames-per-second rate used to capture or subsample the videos, the background type, preprocessing, and data augmentation, using convolutional neural networks and object detection to create an image classifier and comparing the results based on statistical tests. Different datasets were created to test the hypotheses, containing 14 words used daily and recorded by different smartphones in the RGB color system. We achieved an accuracy of 96.38% on the test set and 81.36% on the validation set containing more challenging conditions, showing that subsampling at 30 FPS is the best frame rate for training the classifier, that geometric transformations work better than intensity transformations, and that artificial background creation is not effective for model generalization. These trade-offs should be considered in future work as a cost-benefit guideline between computational cost and accuracy gain when creating a dataset and training a sign recognition model.
Submitted 1 April, 2021; v1 submitted 22 March, 2021;
originally announced March 2021.
-
Artificial intelligence for detection and quantification of rust and leaf miner in coffee crop
Authors:
Alvaro Leandro Cavalcante Carneiro,
Lucas de Brito Silva,
Marisa Silveira Almeida Renaud Faulin
Abstract:
Pest and disease control plays a key role in agriculture, since the damage caused by these agents is responsible for huge economic losses every year. Based on this assumption, we create an algorithm capable of detecting rust (Hemileia vastatrix) and leaf miner (Leucoptera coffeella) in coffee leaves (Coffea arabica) and of quantifying disease severity, using a mobile application as a high-level interface for the model inferences. We used different convolutional neural network architectures to create the object detector, along with the OpenCV library, k-means, and three quantification treatments: RGB, value, and the AFSoft software. We compared the three methods using analysis of variance. The results show an average precision of 81.5% in detection and no statistically significant difference between the treatments in quantifying severity on coffee leaves, suggesting that a computationally less costly method can be used. The application, together with the trained model, can detect the pest and disease over different image conditions and infection stages and also estimate the disease infection stage.
Submitted 1 April, 2021; v1 submitted 20 March, 2021;
originally announced March 2021.
-
DR$\vert$GRADUATE: uncertainty-aware deep learning-based diabetic retinopathy grading in eye fundus images
Authors:
Teresa Araújo,
Guilherme Aresta,
Luís Mendonça,
Susana Penas,
Carolina Maia,
Ângela Carneiro,
Ana Maria Mendonça,
Aurélio Campilho
Abstract:
Diabetic retinopathy (DR) grading is crucial in determining the adequate treatment and follow-up of patients, but the screening process can be tiresome and prone to errors. Deep learning approaches have shown promising performance as computer-aided diagnosis (CAD) systems, but their black-box behaviour hinders clinical application. We propose DR$\vert$GRADUATE, a novel deep learning-based DR grading CAD system that supports its decision by providing a medically interpretable explanation and an estimation of how uncertain that prediction is, allowing the ophthalmologist to measure how much that decision should be trusted. We designed DR$\vert$GRADUATE taking into account the ordinal nature of the DR grading problem. A novel Gaussian-sampling approach built upon a Multiple Instance Learning framework allows DR$\vert$GRADUATE to infer an image grade associated with an explanation map and a prediction uncertainty while being trained only with image-wise labels. DR$\vert$GRADUATE was trained on the Kaggle training set and evaluated across multiple datasets. In DR grading, a quadratic-weighted Cohen's kappa (QWK) between 0.71 and 0.84 was achieved in five different datasets. We show that high QWK values occur for images with low prediction uncertainty, thus indicating that this uncertainty is a valid measure of the predictions' quality. Further, poor-quality images are generally associated with higher uncertainties, showing that images not suitable for diagnosis indeed lead to less trustworthy predictions. Additionally, tests on unfamiliar medical image data types suggest that DR$\vert$GRADUATE allows outlier detection. The attention maps generally highlight regions of interest for diagnosis. These results show the great potential of DR$\vert$GRADUATE as a second-opinion system in DR severity grading.
Submitted 29 May, 2020; v1 submitted 25 October, 2019;
originally announced October 2019.
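The quadratic-weighted Cohen's kappa (QWK) reported in the abstract above penalizes disagreements by the squared distance between ordinal grades. A minimal sketch of the standard formula follows, assuming grades are integers in `0 .. num_classes - 1`. The function name is illustrative, and this is the generic metric rather than the authors' evaluation code.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Quadratic-weighted Cohen's kappa for integer ordinal labels."""
    n = len(y_true)
    # Confusion matrix: rows are true grades, columns are predicted grades.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(num_classes))
                 for j in range(num_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Quadratic penalty, normalized so the largest disagreement costs 1.
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            # Count expected under chance agreement, from the two marginals.
            expected = hist_true[i] * hist_pred[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when all labels fall in one class")
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so the reported range of 0.71 to 0.84 sits well above chance on this scale.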
-
Width Parameterizations for Knot-free Vertex Deletion on Digraphs
Authors:
Stéphane Bessy,
Marin Bougeret,
Alan D. A. Carneiro,
Fábio Protti,
Uéverton S. Souza
Abstract:
A knot in a directed graph $G$ is a strongly connected subgraph $Q$ of $G$ with at least two vertices, such that no vertex in $V(Q)$ is an in-neighbor of a vertex in $V(G)\setminus V(Q)$. Knots are important graph structures, because they characterize the existence of deadlocks in a classical distributed computation model, the so-called OR-model. Deadlock detection is correlated with the recognition of knot-free graphs, while deadlock resolution is closely related to the Knot-Free Vertex Deletion (KFVD) problem, which consists of determining whether an input graph $G$ has a subset $S \subseteq V(G)$ of size at most $k$ such that $G[V\setminus S]$ contains no knot. In this paper we focus on graph width measure parameterizations for KFVD. First, we show that: (i) KFVD parameterized by the size of the solution $k$ is W[1]-hard even when $p$, the length of a longest directed path of the input graph, as well as $κ$, its Kenny-width, are bounded by constants, and we remark that KFVD is para-NP-hard even considering many directed width measures as parameters, but in FPT when parameterized by clique-width; (ii) KFVD can be solved in time $2^{O(tw)}\times n$, but assuming ETH it cannot be solved in $2^{o(tw)}\times n^{O(1)}$, where $tw$ is the treewidth of the underlying undirected graph. Finally, since the size of a minimum directed feedback vertex set ($dfv$) is an upper bound for the size of a minimum knot-free vertex deletion set, we investigate parameterization by $dfv$ and we show that (iii) KFVD can be solved in FPT-time parameterized by either $dfv+κ$ or $dfv+p$; and it admits a Turing kernel by the distance to a DAG having a Hamiltonian path.
Submitted 3 October, 2019;
originally announced October 2019.
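The definition of a knot given in the abstract above can be checked directly on small graphs. Since no edge may leave a knot and the knot is strongly connected, every vertex in it reaches exactly the same set of vertices, namely the knot itself. The sketch below uses that observation with a plain reachability search; it is a simple quadratic-time illustration under the assumption that the digraph is an adjacency dictionary, not one of the parameterized algorithms from the paper, and the function names are hypothetical.

```python
def reachable(graph, start):
    """Return the set of vertices reachable from `start` (including itself)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)


def find_knots(graph):
    """Return the vertex sets of all knots in a digraph given as {u: [v, ...]}."""
    vertices = set(graph) | {v for outs in graph.values() for v in outs}
    reach = {v: reachable(graph, v) for v in vertices}
    knots = set()
    for v in vertices:
        r = reach[v]
        # A knot has at least two vertices, and all of them share one reach set:
        # that makes the set strongly connected with no edge leaving it.
        if len(r) >= 2 and all(reach[u] == r for u in r):
            knots.add(r)
    return knots
```

For example, in the graph `{1: [2], 2: [3], 3: [2]}` the vertices 2 and 3 form a knot, while adding an edge from 3 to a vertex with no outgoing edges would destroy it.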