-
Segment Anything for Satellite Imagery: A Strong Baseline and a Regional Dataset for Automatic Field Delineation
Authors:
Carmelo Scribano,
Elena Govi,
Paolo Bertellini,
Simone Parisi,
Giorgia Franchini,
Marko Bertogna
Abstract:
Accurate mapping of agricultural field boundaries is essential for the efficient operation of agriculture. Automatic extraction from high-resolution satellite imagery, supported by computer vision techniques, can avoid costly ground surveys. In this paper, we present a pipeline for field delineation based on the Segment Anything Model (SAM), introducing a fine-tuning strategy to adapt SAM to this task. In addition to using published datasets, we describe a method for acquiring a complementary regional dataset that covers areas beyond current sources. Extensive experiments assess segmentation accuracy and evaluate the generalization capabilities. Our approach provides a robust baseline for automated field delineation. The new regional dataset, known as ERAS, is now publicly available.
Submitted 23 June, 2025; v1 submitted 19 June, 2025;
originally announced June 2025.
-
Towards Safer Planetary Exploration: A Hybrid Architecture for Terrain Traversability Analysis in Mars Rovers
Authors:
Achille Chiuchiarelli,
Giacomo Franchini,
Francesco Messina,
Marcello Chiaberge
Abstract:
The field of autonomous navigation for unmanned ground vehicles (UGVs) is growing continuously, and increasing levels of autonomy have been reached in the last few years. However, the task becomes more challenging when the focus is on the exploration of planetary surfaces such as Mars. In those situations, UGVs are forced to navigate through unstable and rugged terrains which inevitably expose the vehicle to more hazards, accidents, and, in extreme cases, complete mission failure. The paper addresses the challenges of autonomous navigation for unmanned ground vehicles in planetary exploration, particularly on Mars, introducing a hybrid architecture for terrain traversability analysis that combines two approaches: appearance-based and geometry-based. The appearance-based method uses semantic segmentation via deep neural networks to classify different terrain types. This is further refined by pixel-level terrain roughness classification obtained from the same RGB image, assigning different costs based on the physical properties of the soil. The geometry-based method complements the appearance-based approach by evaluating the terrain's geometrical features, identifying hazards that may not be detectable by the appearance-based side. The outputs of both methods are combined into a comprehensive hybrid cost map. The proposed architecture was trained on synthetic datasets and developed as a ROS2 application to integrate into broader autonomous navigation systems for harsh environments. Simulations have been performed in Unity, showing the ability of the method to perform online traversability analysis.
Submitted 23 October, 2024;
originally announced October 2024.
-
Advancing lunar exploration through virtual reality simulations: a framework for future human missions
Authors:
Giacomo Franchini,
Brenno Tuberga,
Marcello Chiaberge
Abstract:
In an era marked by renewed interest in lunar exploration and the prospect of establishing a sustainable human presence on the Moon, innovative approaches supporting mission preparation and astronaut training are imperative. To this end, the advancements in Virtual Reality (VR) technology offer a promising avenue to simulate and optimize future human missions to the Moon. Through VR simulations, tests can be performed quickly with different environment parameters, and a human-centered perspective can be maintained throughout the experiments. This paper presents a comprehensive framework that harnesses VR simulations to replicate the challenges and opportunities of lunar exploration, aiming to enhance astronaut readiness and mission success. Multiple environments with physical and visual characteristics that reflect those found in interesting Moon regions have been modeled and integrated into simulations based on the Unity graphical engine. We exploit VR to allow the user to fully immerse in the simulations and interact with assets in the same way as in real contexts. Different scenarios have been replicated, from upcoming exploration missions where it is possible to deploy scientific payloads, collect samples, and traverse the surrounding environment, to long-term habitation in a futuristic lunar base, performing everyday activities. Moreover, our framework allows us to simulate human-robot collaboration and surveillance, directly displaying sensor readings and scheduled tasks of the autonomous agents that will be part of future hybrid missions, leveraging the ROS2-Unity bridge. Thus, the entire project can be summarized as a desire to define cornerstones for human-machine design and interaction, astronaut training, and learning of potential weak points in the context of future lunar missions, through targeted operations in a variety of contexts as close to reality as possible.
Submitted 22 October, 2024;
originally announced October 2024.
-
Majorization-Minimization for sparse SVMs
Authors:
Alessandro Benfenati,
Emilie Chouzenoux,
Giorgia Franchini,
Salla Latva-Aijo,
Dominik Narnhofer,
Jean-Christophe Pesquet,
Sebastian J. Scott,
Mahsa Yousefi
Abstract:
Several decades ago, Support Vector Machines (SVMs) were introduced for performing binary classification tasks, under a supervised framework. Nowadays, they often outperform other supervised methods and remain one of the most popular approaches in the machine learning arena. In this work, we investigate the training of SVMs through a smooth sparse-promoting-regularized squared hinge loss minimization. This choice paves the way to the application of quick training methods built on majorization-minimization approaches, benefiting from the Lipschitz differentiability of the loss function. Moreover, the proposed approach allows us to handle sparsity-preserving regularizers promoting the selection of the most significant features, thus enhancing the performance. Numerical tests and comparisons conducted on three different datasets demonstrate the good performance of the proposed methodology in terms of qualitative metrics (accuracy, precision, recall, and F1 score) as well as computational cost.
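As a minimal sketch of why Lipschitz differentiability matters here, the following pure-Python example applies a majorization-minimization step to the squared hinge loss alone. It is an illustration under stated assumptions, not the authors' algorithm: the sparsity-promoting regularizer is omitted, the Lipschitz constant is replaced by the looser bound 2·||X||_F², and all function names and the toy data are hypothetical.

```python
def squared_hinge_loss(w, X, y):
    """Sum over samples of max(0, 1 - y_i * <x_i, w>)^2."""
    total = 0.0
    for xi, yi in zip(X, y):
        margin = 1.0 - yi * sum(a * b for a, b in zip(xi, w))
        total += max(0.0, margin) ** 2
    return total


def squared_hinge_grad(w, X, y):
    """Gradient of the squared hinge loss with respect to w."""
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        margin = 1.0 - yi * sum(a * b for a, b in zip(xi, w))
        if margin > 0.0:  # only margin-violating samples contribute
            for j, a in enumerate(xi):
                grad[j] -= 2.0 * yi * margin * a
    return grad


def mm_train(X, y, iterations=50):
    """Repeatedly minimize a quadratic majorant of the loss (no regularizer)."""
    # Assumption: 2 * ||X||_F^2 upper-bounds the gradient's Lipschitz constant.
    lipschitz = 2.0 * sum(a * a for xi in X for a in xi)
    w = [0.0] * len(X[0])
    losses = [squared_hinge_loss(w, X, y)]
    for _ in range(iterations):
        g = squared_hinge_grad(w, X, y)
        # The minimizer of the quadratic majorant at w is w - g / L.
        w = [wj - gj / lipschitz for wj, gj in zip(w, g)]
        losses.append(squared_hinge_loss(w, X, y))
    return w, losses


# Toy, linearly separable data (hypothetical).
X_DEMO = [[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
Y_DEMO = [1, 1, -1, -1]
W_DEMO, LOSSES_DEMO = mm_train(X_DEMO, Y_DEMO)
```

Because each step minimizes an upper bound that touches the loss at the current iterate, the recorded losses never increase, which is the property that makes this family of methods attractive for fast training.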
Submitted 31 August, 2023;
originally announced August 2023.
-
Uncovering the Background-Induced bias in RGB based 6-DoF Object Pose Estimation
Authors:
Elena Govi,
Davide Sapienza,
Carmelo Scribano,
Tobia Poppi,
Giorgia Franchini,
Paola Ardòn,
Micaela Verucchi,
Marko Bertogna
Abstract:
In recent years, there has been a growing trend of using data-driven methods in industrial settings. These kinds of methods often process video images or parts; therefore, the integrity of such images is crucial. Sometimes datasets, e.g. consisting of images, can be sophisticated for various reasons. It becomes critical to understand how the manipulation of video and images can impact the effectiveness of a machine learning method. Our case study aims precisely to analyze the Linemod dataset, considered the state of the art in the 6D pose estimation context. That dataset presents images accompanied by ArUco markers; it is evident that such markers will not be available in real-world contexts. We analyze how the presence of the markers affects the pose estimation accuracy, and how this bias may be mitigated through data augmentation and other methods. Our work aims to show how the presence of these markers alters, in the testing phase, the effectiveness of the deep learning method used. In particular, we will demonstrate, through saliency maps, how the focus of the neural network is captured in part by these ArUco markers. Finally, a new dataset, obtained by applying geometric tools to Linemod, will be proposed in order to demonstrate our hypothesis and uncover the bias. Our results demonstrate the potential for bias in 6-DoF pose estimation networks, and suggest methods for reducing this bias when training with markers.
Submitted 17 April, 2023;
originally announced April 2023.
-
Explainable bilevel optimization: an application to the Helsinki deblur challenge
Authors:
Silvia Bonettini,
Giorgia Franchini,
Danilo Pezzi,
Marco Prato
Abstract:
In this paper we present a bilevel optimization scheme for the solution of a general image deblurring problem, in which a parametric variational-like approach is encapsulated within a machine learning scheme to provide a high quality reconstructed image with automatically learned parameters. The ingredients of the variational lower level and the machine learning upper one are specifically chosen for the Helsinki Deblur Challenge 2021, in which sequences of letters must be recovered from out-of-focus photographs with increasing levels of blur. Our proposed procedure for the reconstructed image consists of a fixed number of FISTA iterations applied to the minimization of an edge-preserving and binarization-enforcing regularized least-squares functional. The parameters defining the variational model and the optimization steps, which, unlike most deep learning approaches, all have a precise and interpretable meaning, are learned via either a similarity index or a support vector machine strategy. Numerical experiments on the test images provided by the challenge authors show significant gains with respect to a standard variational approach and performance comparable with that of some of the proposed deep learning based algorithms, which require the optimization of millions of parameters.
Submitted 18 October, 2022;
originally announced October 2022.
-
CERBERUS: Simple and Effective All-In-One Automotive Perception Model with Multi Task Learning
Authors:
Carmelo Scribano,
Giorgia Franchini,
Ignacio Sañudo Olmedo,
Marko Bertogna
Abstract:
Perceiving the surrounding environment is essential for enabling autonomous or assisted driving functionalities. Common tasks in this domain include detecting road users, as well as determining lane boundaries and classifying driving conditions. Over the last few years, a large variety of powerful Deep Learning models have been proposed to address individual tasks of camera-based automotive perception with astonishing performances. However, the limited capabilities of in-vehicle embedded computing platforms cannot cope with the computational effort required to run a heavy model for each individual task. In this work, we present CERBERUS (CEnteR Based End-to-end peRception Using a Single model), a lightweight model that leverages a multitask-learning approach to enable the execution of multiple perception tasks at the cost of a single inference. The code will be made publicly available at https://github.com/cscribano/CERBERUS
Submitted 3 October, 2022;
originally announced October 2022.
-
DCT-Former: Efficient Self-Attention with Discrete Cosine Transform
Authors:
Carmelo Scribano,
Giorgia Franchini,
Marco Prato,
Marko Bertogna
Abstract:
Since their introduction, Transformer architectures have emerged as the dominant architectures for both natural language processing and, more recently, computer vision applications. An intrinsic limitation of this family of "fully-attentive" architectures arises from the computation of the dot-product attention, which grows both in memory consumption and number of operations as $O(n^2)$ where $n$ stands for the input sequence length, thus limiting the applications that require modeling very long sequences. Several approaches have been proposed so far in the literature to mitigate this issue, with varying degrees of success. Our idea takes inspiration from the world of lossy data compression (such as the JPEG algorithm) to derive an approximation of the attention module by leveraging the properties of the Discrete Cosine Transform. An extensive section of experiments shows that our method takes up less memory for the same performance, while also drastically reducing inference time. This makes it particularly suitable in real-time contexts on embedded platforms. Moreover, we assume that the results of our research might serve as a starting point for a broader family of deep neural models with reduced memory footprint. The implementation will be made publicly available at https://github.com/cscribano/DCT-Former-Public
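The lossy-compression idea the abstract alludes to can be illustrated with a minimal sketch: transform a length-n sequence with an orthonormal DCT-II and keep only the first few coefficients. This is not the DCT-Former attention approximation itself, whose details are not given in the abstract; the function names, the test signal, and the choice of how many coefficients to keep are all hypothetical.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a real sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)))
    return out


def idct2(coeffs, n):
    """Inverse of dct2; coefficients beyond len(coeffs) are treated as zero."""
    out = []
    for t in range(n):
        value = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            value += scale * c * math.cos(math.pi * (t + 0.5) * k / n)
        out.append(value)
    return out


def sq_error(x, keep):
    """Squared reconstruction error when only `keep` coefficients are retained."""
    recon = idct2(dct2(x)[:keep], len(x))
    return sum((a - b) ** 2 for a, b in zip(x, recon))


# Hypothetical smooth test signal of length n = 64, compressed to 8 coefficients.
SIGNAL = [math.sin(2.0 * math.pi * t / 64.0) + 0.5 for t in range(64)]
COEFFS = dct2(SIGNAL)[:8]
```

Since the transform is orthonormal, retaining more coefficients can only reduce the squared reconstruction error. The scaling argument is then simple arithmetic: a dot-product score matrix over 64 positions has 64 × 64 = 4096 entries, whereas one over 8 retained coefficients would have 8 × 8 = 64.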
Submitted 15 March, 2023; v1 submitted 2 March, 2022;
originally announced March 2022.
-
All You Can Embed: Natural Language based Vehicle Retrieval with Spatio-Temporal Transformers
Authors:
Carmelo Scribano,
Davide Sapienza,
Giorgia Franchini,
Micaela Verucchi,
Marko Bertogna
Abstract:
Combining Natural Language with Vision represents a unique and interesting challenge in the domain of Artificial Intelligence. The AI City Challenge Track 5 for Natural Language-Based Vehicle Retrieval focuses on the problem of combining visual and textual information, applied to a smart-city use case. In this paper, we present All You Can Embed (AYCE), a modular solution to correlate single-vehicle tracking sequences with natural language. The main building blocks of the proposed architecture are (i) BERT to provide an embedding of the textual descriptions, and (ii) a convolutional backbone along with a Transformer model to embed the visual information. For the training of the retrieval model, a variation of the Triplet Margin Loss is proposed to learn a distance measure between the visual and language embeddings. The code is publicly available at https://github.com/cscribano/AYCE_2021.
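For orientation, the standard Triplet Margin Loss can be sketched as follows. This shows the textbook form with Euclidean distance, not the variation proposed in the paper, which the abstract does not specify; the function names and the toy two-dimensional embeddings are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


# Toy embeddings (hypothetical): a text query, its matching vehicle track,
# and a non-matching track.
TEXT_EMB = [0.0, 0.0]
MATCHING_TRACK = [0.0, 1.0]
OTHER_TRACK = [3.0, 4.0]
LOSS_GOOD = triplet_margin_loss(TEXT_EMB, MATCHING_TRACK, OTHER_TRACK)
LOSS_BAD = triplet_margin_loss(TEXT_EMB, OTHER_TRACK, MATCHING_TRACK)
```

The loss is zero when the matching pair is already closer than the non-matching pair by at least the margin, and positive otherwise, so minimizing it pulls matching text and visual embeddings together while pushing mismatched ones apart.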
Submitted 18 June, 2021;
originally announced June 2021.
-
Comparing the luminosity distance for gravitational waves and electromagnetic signals in a simple model of quadratic gravity
Authors:
G. Fanizza,
G. Franchini,
M. Gasperini,
L. Tedesco
Abstract:
We compute the modified friction coefficient controlling the propagation of tensor metric perturbations in the context of a generalized cosmological scenario based on a theory of gravity with quadratic curvature corrections. In such a context we discuss the differences between gravitational and electromagnetic luminosity distance, as well as the differences with the standard results based on the Einstein equations. We present numerical estimates of the modified luminosity distance on the cosmic redshift scale typical of Supernovae and standard sirens.
Submitted 13 November, 2020; v1 submitted 13 October, 2020;
originally announced October 2020.
-
Combining Weighted Total Variation and Deep Image Prior for natural and medical image restoration via ADMM
Authors:
Pasquale Cascarano,
Andrea Sebastiani,
Maria Colomba Comes,
Giorgia Franchini,
Federica Porta
Abstract:
In the last decades, unsupervised deep learning based methods have caught researchers' attention, since in many real applications, such as medical imaging, collecting a great amount of training examples is not always feasible. Moreover, the construction of a good training set is time-consuming and hard because the selected data have to be sufficiently representative for the task. In this paper, we focus on the Deep Image Prior (DIP) framework and we propose to combine it with a space-variant Total Variation regularizer with an automatic estimation of the local regularization parameters. Differently from other existing approaches, we solve the arising minimization problem via the flexible Alternating Direction Method of Multipliers (ADMM). Furthermore, we provide a specific implementation also for the standard isotropic Total Variation. The promising performance of the proposed approach, in terms of PSNR and SSIM values, is assessed through several experiments on simulated as well as real natural and medical corrupted images.
Submitted 24 March, 2021; v1 submitted 23 September, 2020;
originally announced September 2020.
-
Mise en abyme with artificial intelligence: how to predict the accuracy of NN, applied to hyper-parameter tuning
Authors:
Giorgia Franchini,
Mathilde Galinier,
Micaela Verucchi
Abstract:
In the context of deep learning, the costliest phase from a computational point of view is the full training of the learning algorithm. However, this process has to be repeated a significant number of times during the design of a new artificial neural network, leading therefore to extremely expensive operations. Here, we propose a low-cost strategy to predict the accuracy of the algorithm, based only on its initial behaviour. To do so, we train the network of interest up to convergence several times, modifying its characteristics at each training. The initial and final accuracies observed during this preliminary process are stored in a database. We then make use of both curve fitting and Support Vector Machine techniques, the latter being trained on the created database, to predict the accuracy of the network, given its accuracy on the first iterations of its learning. This approach can be of particular interest when the space of the characteristics of the network is notably large or when its full training is highly time-consuming. The results we obtained are promising and encouraged us to apply this strategy to a topical issue: hyper-parameter optimisation (HO). In particular, we focused on the HO of a convolutional neural network for the classification of the databases MNIST and CIFAR-10. By using our method of prediction, and an algorithm implemented by us for a probabilistic exploration of the hyper-parameter space, we were able to find the hyper-parameter settings corresponding to the optimal accuracies already known in literature, at quite a low cost.
Submitted 28 June, 2019;
originally announced July 2019.