Search | arXiv e-print repository

MinkOcc: Towards real-time label-efficient semantic occupancy prediction

Authors: Samuel Sze, Daniele De Martini, Lars Kunze

Abstract: Developing 3D semantic occupancy prediction models often relies on dense 3D annotations for supervised learning, a process that is both labor and resource-intensive, underscoring the need for label-efficient or even label-free approaches. To address this, we introduce MinkOcc, a multi-modal 3D semantic occupancy prediction framework for cameras and LiDARs that proposes a two-step semi-supervised t… ▽ More Developing 3D semantic occupancy prediction models often relies on dense 3D annotations for supervised learning, a process that is both labor and resource-intensive, underscoring the need for label-efficient or even label-free approaches. To address this, we introduce MinkOcc, a multi-modal 3D semantic occupancy prediction framework for cameras and LiDARs that proposes a two-step semi-supervised training procedure. Here, a small dataset of explicitly 3D annotations warm-starts the training process; then, the supervision is continued by simpler-to-annotate accumulated LiDAR sweeps and images -- semantically labelled through vision foundational models. MinkOcc effectively utilizes these sensor-rich supervisory cues and reduces reliance on manual labeling by 90\% while maintaining competitive accuracy. In addition, the proposed model incorporates information from LiDAR and camera data through early fusion and leverages sparse convolution networks for real-time prediction. With its efficiency in both supervision and computation, we aim to extend MinkOcc beyond curated datasets, enabling broader real-world deployment of 3D semantic occupancy prediction in autonomous driving. △ Less

Submitted 3 April, 2025; originally announced April 2025.

Comments: 8 pages

arXiv:2503.08661 [pdf, other]

Task-Oriented Co-Design of Communication, Computing, and Control for Edge-Enabled Industrial Cyber-Physical Systems

Authors: Yufeng Diao, Yichi Zhang, Daniele De Martini, Philip Guodong Zhao, Emma Liying Li

Abstract: This paper proposes a task-oriented co-design framework that integrates communication, computing, and control to address the key challenges of bandwidth limitations, noise interference, and latency in mission-critical industrial Cyber-Physical Systems (CPS). To improve communication efficiency and robustness, we design a task-oriented Joint Source-Channel Coding (JSCC) using Information Bottleneck… ▽ More This paper proposes a task-oriented co-design framework that integrates communication, computing, and control to address the key challenges of bandwidth limitations, noise interference, and latency in mission-critical industrial Cyber-Physical Systems (CPS). To improve communication efficiency and robustness, we design a task-oriented Joint Source-Channel Coding (JSCC) using Information Bottleneck (IB) to enhance data transmission efficiency by prioritizing task-specific information. To mitigate the perceived End-to-End (E2E) delays, we develop a Delay-Aware Trajectory-Guided Control Prediction (DTCP) strategy that integrates trajectory planning with control prediction, predicting commands based on E2E delay. Moreover, the DTCP is co-designed with task-oriented JSCC, focusing on transmitting task-specific information for timely and reliable autonomous driving. Experimental results in the CARLA simulator demonstrate that, under an E2E delay of 1 second (20 time slots), the proposed framework achieves a driving score of 48.12, which is 31.59 points higher than using Better Portable Graphics (BPG) while reducing bandwidth usage by 99.19%. △ Less

Submitted 11 March, 2025; originally announced March 2025.

Comments: This paper has been accepted for publication in IEEE Journal on Selected Areas in Communications (JSAC), with publication expected in 2025

arXiv:2503.03449 [pdf, other]

Tiny Lidars for Manipulator Self-Awareness: Sensor Characterization and Initial Localization Experiments

Authors: Giammarco Caroleo, Alessandro Albini, Daniele De Martini, Timothy D. Barfoot, Perla Maiolino

Abstract: For several tasks, ranging from manipulation to inspection, it is beneficial for robots to localize a target object in their surroundings. In this paper, we propose an approach that utilizes coarse point clouds obtained from miniaturized VL53L5CX Time-of-Flight (ToF) sensors (tiny lidars) to localize a target object in the robot's workspace. We first conduct an experimental campaign to calibrate t… ▽ More For several tasks, ranging from manipulation to inspection, it is beneficial for robots to localize a target object in their surroundings. In this paper, we propose an approach that utilizes coarse point clouds obtained from miniaturized VL53L5CX Time-of-Flight (ToF) sensors (tiny lidars) to localize a target object in the robot's workspace. We first conduct an experimental campaign to calibrate the dependency of sensor readings on relative range and orientation to targets. We then propose a probabilistic sensor model that is validated in an object pose estimation task using a Particle Filter (PF). The results show that the proposed sensor model improves the performance of the localization of the target object with respect to two baselines: one that assumes measurements are free from uncertainty and one in which the confidence is provided by the sensor datasheet. △ Less

Submitted 5 March, 2025; originally announced March 2025.

Comments: 7 pages, 6 figures, 3 tables, conference submission

arXiv:2502.19135 [pdf, other]

A Temporal Planning Framework for Multi-Agent Systems via LLM-Aided Knowledge Base Management

Authors: Enrico Saccon, Ahmet Tikna, Davide De Martini, Edoardo Lamon, Luigi Palopoli, Marco Roveri

Abstract: This paper presents a novel framework, called PLANTOR (PLanning with Natural language for Task-Oriented Robots), that integrates Large Language Models (LLMs) with Prolog-based knowledge management and planning for multi-robot tasks. The system employs a two-phase generation of a robot-oriented knowledge base, ensuring reusability and compositional reasoning, as well as a three-step planning proced… ▽ More This paper presents a novel framework, called PLANTOR (PLanning with Natural language for Task-Oriented Robots), that integrates Large Language Models (LLMs) with Prolog-based knowledge management and planning for multi-robot tasks. The system employs a two-phase generation of a robot-oriented knowledge base, ensuring reusability and compositional reasoning, as well as a three-step planning procedure that handles temporal dependencies, resource constraints, and parallel task execution via mixed-integer linear programming. The final plan is converted into a Behaviour Tree for direct use in ROS2. We tested the framework in multi-robot assembly tasks within a block world and an arch-building scenario. Results demonstrate that LLMs can produce accurate knowledge bases with modest human feedback, while Prolog guarantees formal correctness and explainability. This approach underscores the potential of LLM integration for advanced robotics tasks requiring flexible, scalable, and human-understandable planning. △ Less

Submitted 26 February, 2025; originally announced February 2025.

arXiv:2411.04006 [pdf, other]

Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval

Authors: Davide Buoso, Luke Robinson, Giuseppe Averta, Philip Torr, Tim Franzmeyer, Daniele De Martini

Abstract: This study explores the potential of off-the-shelf Vision-Language Models (VLMs) for high-level robot planning in the context of autonomous navigation. Indeed, while most of existing learning-based approaches for path planning require extensive task-specific training/fine-tuning, we demonstrate how such training can be avoided for most practical cases. To do this, we introduce Select2Plan (S2P), a… ▽ More This study explores the potential of off-the-shelf Vision-Language Models (VLMs) for high-level robot planning in the context of autonomous navigation. Indeed, while most of existing learning-based approaches for path planning require extensive task-specific training/fine-tuning, we demonstrate how such training can be avoided for most practical cases. To do this, we introduce Select2Plan (S2P), a novel training-free framework for high-level robot planning which completely eliminates the need for fine-tuning or specialised training. By leveraging structured Visual Question-Answering (VQA) and In-Context Learning (ICL), our approach drastically reduces the need for data collection, requiring a fraction of the task-specific data typically used by trained models, or even relying only on online data. Our method facilitates the effective use of a generally trained VLM in a flexible and cost-efficient way, and does not require additional sensing except for a simple monocular camera. We demonstrate its adaptability across various scene types, context sources, and sensing setups. We evaluate our approach in two distinct scenarios: traditional First-Person View (FPV) and infrastructure-driven Third-Person View (TPV) navigation, demonstrating the flexibility and simplicity of our method. Our technique significantly enhances the navigational capabilities of a baseline VLM of approximately 50% in TPV scenario, and is comparable to trained models in the FPV one, with as few as 20 demonstrations. △ Less

Submitted 6 November, 2024; originally announced November 2024.

arXiv:2410.13514 [pdf, other]

GraphSCENE: On-Demand Critical Scenario Generation for Autonomous Vehicles in Simulation

Authors: Efimia Panagiotaki, Georgi Pramatarov, Lars Kunze, Daniele De Martini

Abstract: Testing and validating Autonomous Vehicle (AV) performance in safety-critical and diverse scenarios is crucial before real-world deployment. However, manually creating such scenarios in simulation remains a significant and time-consuming challenge. This work introduces a novel method that generates dynamic temporal scene graphs corresponding to diverse traffic scenarios, on-demand, tailored to use… ▽ More Testing and validating Autonomous Vehicle (AV) performance in safety-critical and diverse scenarios is crucial before real-world deployment. However, manually creating such scenarios in simulation remains a significant and time-consuming challenge. This work introduces a novel method that generates dynamic temporal scene graphs corresponding to diverse traffic scenarios, on-demand, tailored to user-defined preferences, such as AV actions, sets of dynamic agents, and criticality levels. A temporal Graph Neural Network (GNN) model learns to predict relationships between ego-vehicle, agents, and static structures, guided by real-world spatiotemporal interaction patterns and constrained by an ontology that restricts predictions to semantically valid links. Our model consistently outperforms the baselines in accurately generating links corresponding to the requested scenarios. We render the predicted scenarios in simulation to further demonstrate their effectiveness as testing environments for AV agents. △ Less

Submitted 11 March, 2025; v1 submitted 17 October, 2024; originally announced October 2024.

Comments: 8 pages, 8 figures

arXiv:2410.11031 [pdf, other]

NAR-*ICP: Neural Execution of Classical ICP-based Pointcloud Registration Algorithms

Authors: Efimia Panagiotaki, Daniele De Martini, Lars Kunze, Petar Veličković

Abstract: This study explores the intersection of neural networks and classical robotics algorithms through the Neural Algorithmic Reasoning (NAR) framework, allowing to train neural networks to effectively reason like classical robotics algorithms by learning to execute them. Algorithms are integral to robotics and safety-critical applications due to their predictable and consistent performance through log… ▽ More This study explores the intersection of neural networks and classical robotics algorithms through the Neural Algorithmic Reasoning (NAR) framework, allowing to train neural networks to effectively reason like classical robotics algorithms by learning to execute them. Algorithms are integral to robotics and safety-critical applications due to their predictable and consistent performance through logical and mathematical principles. In contrast, while neural networks are highly adaptable, handling complex, high-dimensional data and generalising across tasks, they often lack interpretability and transparency in their internal computations. We propose a Graph Neural Network (GNN)-based learning framework, NAR-*ICP, which learns the intermediate algorithmic steps of classical ICP-based pointcloud registration algorithms, and extend the CLRS Algorithmic Reasoning Benchmark with classical robotics perception algorithms. We evaluate our approach across diverse datasets, from real-world to synthetic, demonstrating its flexibility in handling complex and noisy inputs, along with its potential to be used as part of a larger learning system. Our results indicate that our method achieves superior performance across all benchmarks and datasets, consistently surpassing even the algorithms it has been trained on, further demonstrating its ability to generalise beyond the capabilities of traditional algorithms. △ Less

Submitted 16 October, 2024; v1 submitted 14 October, 2024; originally announced October 2024.

Comments: 17 pages, 9 figures

arXiv:2408.01251 [pdf, other]

NeRFoot: Robot-Footprint Estimation for Image-Based Visual Servoing

Authors: Daoxin Zhong, Luke Robinson, Daniele De Martini

Abstract: This paper investigates the utility of Neural Radiance Fields (NeRF) models in extending the regions of operation of a mobile robot, controlled by Image-Based Visual Servoing (IBVS) via static CCTV cameras. Using NeRF as a 3D-representation prior, the robot's footprint may be extrapolated geometrically and used to train a CNN-based network to extract it online from the robot's appearance alone. Th… ▽ More This paper investigates the utility of Neural Radiance Fields (NeRF) models in extending the regions of operation of a mobile robot, controlled by Image-Based Visual Servoing (IBVS) via static CCTV cameras. Using NeRF as a 3D-representation prior, the robot's footprint may be extrapolated geometrically and used to train a CNN-based network to extract it online from the robot's appearance alone. The resulting footprint results in a tighter bound than a robot-wide bounding box, allowing the robot's controller to prescribe more optimal trajectories and expand its safe operational floor area. △ Less

Submitted 2 October, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

Comments: Accepted as extended abstract for ICRA@40

arXiv:2404.10446 [pdf, other]

Watching Grass Grow: Long-term Visual Navigation and Mission Planning for Autonomous Biodiversity Monitoring

Authors: Matthew Gadd, Daniele De Martini, Luke Pitt, Wayne Tubby, Matthew Towlson, Chris Prahacs, Oliver Bartlett, John Jackson, Man Qi, Paul Newman, Andrew Hector, Roberto Salguero-Gómez, Nick Hawes

Abstract: We describe a challenging robotics deployment in a complex ecosystem to monitor a rich plant community. The study site is dominated by dynamic grassland vegetation and is thus visually ambiguous and liable to drastic appearance change over the course of a day and especially through the growing season. This dynamism and complexity in appearance seriously impact the stability of the robotics platfor… ▽ More We describe a challenging robotics deployment in a complex ecosystem to monitor a rich plant community. The study site is dominated by dynamic grassland vegetation and is thus visually ambiguous and liable to drastic appearance change over the course of a day and especially through the growing season. This dynamism and complexity in appearance seriously impact the stability of the robotics platform, as localisation is a foundational part of that control loop, and so routes must be carefully taught and retaught until autonomy is robust and repeatable. Our system is demonstrated over a 6-week period monitoring the response of grass species to experimental climate change manipulations. We also discuss the applicability of our pipeline to monitor biodiversity in other complex natural settings. △ Less

Submitted 1 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: to be presented at the Workshop on Field Robotics - ICRA 2024

arXiv:2403.09025 [pdf, other]

VDNA-PR: Using General Dataset Representations for Robust Sequential Visual Place Recognition

Authors: Benjamin Ramtoula, Daniele De Martini, Matthew Gadd, Paul Newman

Abstract: This paper adapts a general dataset representation technique to produce robust Visual Place Recognition (VPR) descriptors, crucial to enable real-world mobile robot localisation. Two parallel lines of work on VPR have shown, on one side, that general-purpose off-the-shelf feature representations can provide robustness to domain shifts, and, on the other, that fused information from sequences of im… ▽ More This paper adapts a general dataset representation technique to produce robust Visual Place Recognition (VPR) descriptors, crucial to enable real-world mobile robot localisation. Two parallel lines of work on VPR have shown, on one side, that general-purpose off-the-shelf feature representations can provide robustness to domain shifts, and, on the other, that fused information from sequences of images improves performance. In our recent work on measuring domain gaps between image datasets, we proposed a Visual Distribution of Neuron Activations (VDNA) representation to represent datasets of images. This representation can naturally handle image sequences and provides a general and granular feature representation derived from a general-purpose model. Moreover, our representation is based on tracking neuron activation values over the list of images to represent and is not limited to a particular neural network layer, therefore having access to high- and low-level concepts. This work shows how VDNAs can be used for VPR by learning a very lightweight and simple encoder to generate task-specific descriptors. Our experiments show that our representation can allow for better robustness than current solutions to serious domain shifts away from the training data distribution, such as to indoor environments and aerial imagery. △ Less

Submitted 13 March, 2024; originally announced March 2024.

Comments: Published at ICRA 2024

arXiv:2403.07789 [pdf, other]

RobotCycle: Assessing Cycling Safety in Urban Environments

Authors: Efimia Panagiotaki, Tyler Reinmund, Stephan Mouton, Luke Pitt, Arundathi Shaji Shanthini, Wayne Tubby, Matthew Towlson, Samuel Sze, Brian Liu, Chris Prahacs, Daniele De Martini, Lars Kunze

Abstract: This paper introduces RobotCycle, a novel ongoing project that leverages Autonomous Vehicle (AV) research to investigate how road infrastructure influences cyclist behaviour and safety during real-world journeys. The project's requirements were defined in collaboration with key stakeholders, including city planners, cyclists, and policymakers, informing the design of risk and safety metrics and th… ▽ More This paper introduces RobotCycle, a novel ongoing project that leverages Autonomous Vehicle (AV) research to investigate how road infrastructure influences cyclist behaviour and safety during real-world journeys. The project's requirements were defined in collaboration with key stakeholders, including city planners, cyclists, and policymakers, informing the design of risk and safety metrics and the data collection criteria. We propose a data-driven approach relying on a novel, rich dataset of diverse traffic scenes and scenarios captured using a custom-designed wearable sensing unit. By analysing road-user trajectories, we identify normal path deviations indicating potential risks or hazardous interactions related to infrastructure elements in the environment. Our analysis correlates driving profiles and trajectory patterns with local road segments, driving conditions, and road-user interactions to predict traffic behaviours and identify critical scenarios. Moreover, by leveraging advancements in AV research, the project generates detailed 3D High-Definition Maps (HD Maps), traffic flow patterns, and trajectory models to provide a comprehensive assessment and analysis of the behaviour of all traffic agents. These data can then inform the design of cyclist-friendly road infrastructure, ultimately enhancing road safety and cyclability. The project provides valuable insights for enhancing cyclist protection and advancing sustainable urban mobility. △ Less

Submitted 26 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: IEEE Intelligent Vehicles Symposium (IV 2024)

arXiv:2403.04755 [pdf, other]

That's My Point: Compact Object-centric LiDAR Pose Estimation for Large-scale Outdoor Localisation

Authors: Georgi Pramatarov, Matthew Gadd, Paul Newman, Daniele De Martini

Abstract: This paper is about 3D pose estimation on LiDAR scans with extremely minimal storage requirements to enable scalable mapping and localisation. We achieve this by clustering all points of segmented scans into semantic objects and representing them only with their respective centroid and semantic class. In this way, each LiDAR scan is reduced to a compact collection of four-number vectors. This abst… ▽ More This paper is about 3D pose estimation on LiDAR scans with extremely minimal storage requirements to enable scalable mapping and localisation. We achieve this by clustering all points of segmented scans into semantic objects and representing them only with their respective centroid and semantic class. In this way, each LiDAR scan is reduced to a compact collection of four-number vectors. This abstracts away important structural information from the scenes, which is crucial for traditional registration approaches. To mitigate this, we introduce an object-matching network based on self- and cross-correlation that captures geometric and semantic relationships between entities. The respective matches allow us to recover the relative transformation between scans through weighted Singular Value Decomposition (SVD) and RANdom SAmple Consensus (RANSAC). We demonstrate that such representation is sufficient for metric localisation by registering point clouds taken under different viewpoints on the KITTI dataset, and at different periods of time localising between KITTI and KITTI-360. We achieve accurate metric estimates comparable with state-of-the-art methods with almost half the representation size, specifically 1.33 kB on average. △ Less

Submitted 7 March, 2024; originally announced March 2024.

Comments: Accepted for publication at the IEEE International Conference on Robotics and Automation (ICRA) 2024

arXiv:2403.02845 [pdf, other]

OORD: The Oxford Offroad Radar Dataset

Authors: Matthew Gadd, Daniele De Martini, Oliver Bartlett, Paul Murcutt, Matt Towlson, Matthew Widojo, Valentina Muşat, Luke Robinson, Efimia Panagiotaki, Georgi Pramatarov, Marc Alexander Kühn, Letizia Marchegiani, Paul Newman, Lars Kunze

Abstract: There is a growing academic interest as well as commercial exploitation of millimetre-wave scanning radar for autonomous vehicle localisation and scene understanding. Although several datasets to support this research area have been released, they are primarily focused on urban or semi-urban environments. Nevertheless, rugged offroad deployments are important application areas which also present u… ▽ More There is a growing academic interest as well as commercial exploitation of millimetre-wave scanning radar for autonomous vehicle localisation and scene understanding. Although several datasets to support this research area have been released, they are primarily focused on urban or semi-urban environments. Nevertheless, rugged offroad deployments are important application areas which also present unique challenges and opportunities for this sensor technology. Therefore, the Oxford Offroad Radar Dataset (OORD) presents data collected in the rugged Scottish highlands in extreme weather. The radar data we offer to the community are accompanied by GPS/INS reference - to further stimulate research in radar place recognition. In total we release over 90GiB of radar scans as well as GPS and IMU readings by driving a diverse set of four routes over 11 forays, totalling approximately 154km of rugged driving. This is an area increasingly explored in literature, and we therefore present and release examples of recent open-sourced radar place recognition systems and their performance on our dataset. This includes a learned neural network, the weights of which we also release. The data and tools are made freely available to the community at https://oxford-robotics-institute.github.io/oord-dataset. △ Less

Submitted 25 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.17653 [pdf, other]

Mitigating Distributional Shift in Semantic Segmentation via Uncertainty Estimation from Unlabelled Data

Authors: David S. W. Williams, Daniele De Martini, Matthew Gadd, Paul Newman

Abstract: Knowing when a trained segmentation model is encountering data that is different to its training data is important. Understanding and mitigating the effects of this play an important part in their application from a performance and assurance perspective - this being a safety concern in applications such as autonomous vehicles (AVs). This work presents a segmentation network that can detect errors… ▽ More Knowing when a trained segmentation model is encountering data that is different to its training data is important. Understanding and mitigating the effects of this play an important part in their application from a performance and assurance perspective - this being a safety concern in applications such as autonomous vehicles (AVs). This work presents a segmentation network that can detect errors caused by challenging test domains without any additional annotation in a single forward pass. As annotation costs limit the diversity of labelled datasets, we use easy-to-obtain, uncurated and unlabelled data to learn to perform uncertainty estimation by selectively enforcing consistency over data augmentation. To this end, a novel segmentation benchmark based on the SAX Dataset is used, which includes labelled test data spanning three autonomous-driving domains, ranging in appearance from dense urban to off-road. The proposed method, named Gamma-SSL, consistently outperforms uncertainty estimation and Out-of-Distribution (OoD) techniques on this difficult benchmark - by up to 10.7% in area under the receiver operating characteristic (ROC) curve and 19.2% in area under the precision-recall (PR) curve in the most challenging of the three scenarios. △ Less

Submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted for publication in IEEE Transactions on Robotics (T-RO)

arXiv:2402.17622 [pdf, other]

Masked Gamma-SSL: Learning Uncertainty Estimation via Masked Image Modeling

Authors: David S. W. Williams, Matthew Gadd, Paul Newman, Daniele De Martini

Abstract: This work proposes a semantic segmentation network that produces high-quality uncertainty estimates in a single forward pass. We exploit general representations from foundation models and unlabelled datasets through a Masked Image Modeling (MIM) approach, which is robust to augmentation hyper-parameters and simpler than previous techniques. For neural networks used in safety-critical applications,… ▽ More This work proposes a semantic segmentation network that produces high-quality uncertainty estimates in a single forward pass. We exploit general representations from foundation models and unlabelled datasets through a Masked Image Modeling (MIM) approach, which is robust to augmentation hyper-parameters and simpler than previous techniques. For neural networks used in safety-critical applications, bias in the training data can lead to errors; therefore it is crucial to understand a network's limitations at run time and act accordingly. To this end, we test our proposed method on a number of test domains including the SAX Segmentation benchmark, which includes labelled test data from dense urban, rural and off-road driving domains. The proposed method consistently outperforms uncertainty estimation and Out-of-Distribution (OoD) techniques on this difficult benchmark. △ Less

Submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted for publication at 2024 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2310.15677 [pdf, other]

Robot-Relay : Building-Wide, Calibration-Less Visual Servoing with Learned Sensor Handover Network

Authors: Luke Robinson, Matthew Gadd, Paul Newman, Daniele De Martini

Abstract: We present a system which grows and manages a network of remote viewpoints during the natural installation cycle for a newly installed camera network or a newly deployed robot fleet. No explicit notion of camera position or orientation is required, neither global - i.e. relative to a building plan - nor local - i.e. relative to an interesting point in a room. Furthermore, no metric relationship be… ▽ More We present a system which grows and manages a network of remote viewpoints during the natural installation cycle for a newly installed camera network or a newly deployed robot fleet. No explicit notion of camera position or orientation is required, neither global - i.e. relative to a building plan - nor local - i.e. relative to an interesting point in a room. Furthermore, no metric relationship between viewpoints is required. Instead, we leverage our prior work in effective remote control without extrinsic or intrinsic calibration and extend it to the multi-camera setting. In this, we memorise, from simultaneous robot detections in the tracker thread, soft pixel-wise topological connections between viewpoints. We demonstrate our system with repeated autonomous traversals of workspaces connected by a network of six cameras across a productive office environment. △ Less

Submitted 24 October, 2023; originally announced October 2023.

Comments: Paper accepted to the 18th International Symposium on Experimental Robotics (ISER 2023)

arXiv:2310.13622 [pdf, other]

What you see is what you get: Experience ranking with deep neural dataset-to-dataset similarity for topological localisation

Authors: Matthew Gadd, Benjamin Ramtoula, Daniele De Martini, Paul Newman

Abstract: Recalling the most relevant visual memories for localisation or understanding a priori the likely outcome of localisation effort against a particular visual memory is useful for efficient and robust visual navigation. Solutions to this problem should be divorced from performance appraisal against ground truth - as this is not available at run-time - and should ideally be based on generalisable env… ▽ More Recalling the most relevant visual memories for localisation or understanding a priori the likely outcome of localisation effort against a particular visual memory is useful for efficient and robust visual navigation. Solutions to this problem should be divorced from performance appraisal against ground truth - as this is not available at run-time - and should ideally be based on generalisable environmental observations. For this, we propose applying the recently developed Visual DNA as a highly scalable tool for comparing datasets of images - in this work, sequences of map and live experiences. In the case of localisation, important dataset differences impacting performance are modes of appearance change, including weather, lighting, and season. Specifically, for any deep architecture which is used for place recognition by matching feature volumes at a particular layer, we use distribution measures to compare neuron-wise activation statistics between live images and multiple previously recorded past experiences, with a potentially large seasonal (winter/summer) or time of day (day/night) shift. We find that differences in these statistics correlate to performance when localising using a past experience with the same appearance gap. We validate our approach over the Nordland cross-season dataset as well as data from Oxford's University Parks with lighting and mild seasonal change, showing excellent ability of our system to rank actual localisation performance across candidate experiences. △ Less

Submitted 20 October, 2023; originally announced October 2023.

Comments: 18th International Symposium on Experimental Robotics (ISER 2023)

arXiv:2309.15049 [pdf, other]

When Prolog meets generative models: a new approach for managing knowledge and planning in robotic applications

Authors: Enrico Saccon, Ahmet Tikna, Davide De Martini, Edoardo Lamon, Marco Roveri, Luigi Palopoli

Abstract: In this paper, we propose a robot oriented knowledge management system based on the use of the Prolog language. Our framework hinges on a special organisation of knowledge base that enables: 1. its efficient population from natural language texts using semi-automated procedures based on Large Language Models, 2. the bumpless generation of temporal parallel plans for multi-robot systems through a s… ▽ More In this paper, we propose a robot oriented knowledge management system based on the use of the Prolog language. Our framework hinges on a special organisation of knowledge base that enables: 1. its efficient population from natural language texts using semi-automated procedures based on Large Language Models, 2. the bumpless generation of temporal parallel plans for multi-robot systems through a sequence of transformations, 3. the automated translation of the plan into an executable formalism (the behaviour trees). The framework is supported by a set of open source tools and is shown on a realistic application. △ Less

Submitted 26 September, 2023; originally announced September 2023.

Comments: 7 pages, 4 figures, submitted to ICRA 2024

arXiv:2308.10597 [pdf, other]

Doppler-aware Odometry from FMCW Scanning Radar

Authors: Fraser Rennie, David Williams, Paul Newman, Daniele De Martini

Abstract: This work explores Doppler information from a millimetre-Wave (mm-W) Frequency-Modulated Continuous-Wave (FMCW) scanning radar to make odometry estimation more robust and accurate. Firstly, doppler information is added to the scan masking process to enhance correlative scan matching. Secondly, we train a Neural Network (NN) for regressing forward velocity directly from a single radar scan; we fuse… ▽ More This work explores Doppler information from a millimetre-Wave (mm-W) Frequency-Modulated Continuous-Wave (FMCW) scanning radar to make odometry estimation more robust and accurate. Firstly, doppler information is added to the scan masking process to enhance correlative scan matching. Secondly, we train a Neural Network (NN) for regressing forward velocity directly from a single radar scan; we fuse this estimate with the correlative scan matching estimate and show improved robustness to bad estimates caused by challenging environment geometries, e.g. narrow tunnels. We test our method with a novel custom dataset which is released with this work at https://ori.ox.ac.uk/publications/datasets. △ Less

Submitted 14 December, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

Comments: Accepted to ITSC 2023

arXiv:2308.04220 [pdf, other]

Semantic Interpretation and Validation of Graph Attention-based Explanations for GNN Models

Authors: Efimia Panagiotaki, Daniele De Martini, Lars Kunze

Abstract: In this work, we propose a methodology for investigating the use of semantic attention to enhance the explainability of Graph Neural Network (GNN)-based models. Graph Deep Learning (GDL) has emerged as a promising field for tasks like scene interpretation, leveraging flexible graph structures to concisely describe complex features and relationships. As traditional explainability methods used in eX… ▽ More In this work, we propose a methodology for investigating the use of semantic attention to enhance the explainability of Graph Neural Network (GNN)-based models. Graph Deep Learning (GDL) has emerged as a promising field for tasks like scene interpretation, leveraging flexible graph structures to concisely describe complex features and relationships. As traditional explainability methods used in eXplainable AI (XAI) cannot be directly applied to such structures, graph-specific approaches are introduced. Attention has been previously employed to estimate the importance of input features in GDL, however, the fidelity of this method in generating accurate and consistent explanations has been questioned. To evaluate the validity of using attention weights as feature importance indicators, we introduce semantically-informed perturbations and correlate predicted attention weights with the accuracy of the model. Our work extends existing attention-based graph explainability methods by analysing the divergence in the attention distributions in relation to semantically sorted feature sets and the behaviour of a GNN model, efficiently estimating feature importance. We apply our methodology on a lidar pointcloud estimation model successfully identifying key semantic classes that contribute to enhanced performance, effectively generating reliable post-hoc semantic explanations. △ Less

Submitted 20 October, 2023; v1 submitted 8 August, 2023; originally announced August 2023.

Comments: International Conference on Advanced Robotics (ICAR 2023)

ACM Class: G.3; I.2.10; I.2.9; I.4.8; I.5.2; I.5.1

arXiv:2308.03718 [pdf, other]

SEM-GAT: Explainable Semantic Pose Estimation using Learned Graph Attention

Authors: Efimia Panagiotaki, Daniele De Martini, Georgi Pramatarov, Matthew Gadd, Lars Kunze

Abstract: This paper proposes a Graph Neural Network(GNN)-based method for exploiting semantics and local geometry to guide the identification of reliable pointcloud registration candidates. Semantic and morphological features of the environment serve as key reference points for registration, enabling accurate lidar-based pose estimation. Our novel lightweight static graph structure informs our attention-ba… ▽ More This paper proposes a Graph Neural Network(GNN)-based method for exploiting semantics and local geometry to guide the identification of reliable pointcloud registration candidates. Semantic and morphological features of the environment serve as key reference points for registration, enabling accurate lidar-based pose estimation. Our novel lightweight static graph structure informs our attention-based node aggregation network by identifying semantic-instance relationships, acting as an inductive bias to significantly reduce the computational burden of pointcloud registration. By connecting candidate nodes and exploiting cross-graph attention, we identify confidence scores for all potential registration correspondences and estimate the displacement between pointcloud scans. Our pipeline enables introspective analysis of the model's performance by correlating it with the individual contributions of local structures in the environment, providing valuable insights into the system's behaviour. We test our method on the KITTI odometry dataset, achieving competitive accuracy compared to benchmark methods and a higher track smoothness while relying on significantly fewer network parameters. △ Less

Submitted 22 October, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

Comments: International Conference on Advanced Robotics (ICAR 2023)

ACM Class: I.2.9; I.2.10; I.2.4; I.4.8; I.5.1; I.5.2

arXiv:2306.14848 [pdf, other]

Visual Servoing on Wheels: Robust Robot Orientation Estimation in Remote Viewpoint Control

Authors: Luke Robinson, Daniele De Martini, Matthew Gadd, Paul Newman

Abstract: This work proposes a fast deployment pipeline for visually-servoed robots which does not assume anything about either the robot - e.g. sizes, colour or the presence of markers - or the deployment environment. In this, accurate estimation of robot orientation is crucial for successful navigation in complex environments; manual labelling of angular values is, though, time-consuming and possibly hard… ▽ More This work proposes a fast deployment pipeline for visually-servoed robots which does not assume anything about either the robot - e.g. sizes, colour or the presence of markers - or the deployment environment. In this, accurate estimation of robot orientation is crucial for successful navigation in complex environments; manual labelling of angular values is, though, time-consuming and possibly hard to perform. For this reason, we propose a weakly supervised pipeline that can produce a vast amount of data in a small amount of time. We evaluate our approach on a dataset of remote camera images captured in various indoor environments demonstrating high tracking performances when integrated into a fully-autonomous pipeline with a simple controller. With this, we then analyse the data requirement of our approach, showing how it is possible to deploy a new robot in a new environment in less than 30.00 min. △ Less

Submitted 26 June, 2023; originally announced June 2023.

Comments: Accepted at IROS 2023

arXiv:2304.13150 [pdf, other]

Roll-Drop: accounting for observation noise with a single parameter

Authors: Luigi Campanaro, Daniele De Martini, Siddhant Gangapurwala, Wolfgang Merkt, Ioannis Havoutis

Abstract: This paper proposes a simple strategy for sim-to-real in Deep-Reinforcement Learning (DRL) -- called Roll-Drop -- that uses dropout during simulation to account for observation noise during deployment without explicitly modelling its distribution for each state. DRL is a promising approach to control robots for highly dynamic and feedback-based manoeuvres, and accurate simulators are crucial to pr… ▽ More This paper proposes a simple strategy for sim-to-real in Deep-Reinforcement Learning (DRL) -- called Roll-Drop -- that uses dropout during simulation to account for observation noise during deployment without explicitly modelling its distribution for each state. DRL is a promising approach to control robots for highly dynamic and feedback-based manoeuvres, and accurate simulators are crucial to providing cheap and abundant data to learn the desired behaviour. Nevertheless, the simulated data are noiseless and generally show a distributional shift that challenges the deployment on real machines where sensor readings are affected by noise. The standard solution is modelling the latter and injecting it during training; while this requires a thorough system identification, Roll-Drop enhances the robustness to sensor noise by tuning only a single parameter. We demonstrate an 80% success rate when up to 25% noise is injected in the observations, with twice higher robustness than the baselines. We deploy the controller trained in simulation on a Unitree A1 platform and assess this improved robustness on the physical system. △ Less

Submitted 25 April, 2023; originally announced April 2023.

Comments: Accepted at Learning for Dynamics & Control Conference 2023 (L4DC), 10 pages, 7 figures

arXiv:2304.10036 [pdf, other]

Visual DNA: Representing and Comparing Images using Distributions of Neuron Activations

Authors: Benjamin Ramtoula, Matthew Gadd, Paul Newman, Daniele De Martini

Abstract: Selecting appropriate datasets is critical in modern computer vision. However, no general-purpose tools exist to evaluate the extent to which two datasets differ. For this, we propose representing images - and by extension datasets - using Distributions of Neuron Activations (DNAs). DNAs fit distributions, such as histograms or Gaussians, to activations of neurons in a pre-trained feature extracto… ▽ More Selecting appropriate datasets is critical in modern computer vision. However, no general-purpose tools exist to evaluate the extent to which two datasets differ. For this, we propose representing images - and by extension datasets - using Distributions of Neuron Activations (DNAs). DNAs fit distributions, such as histograms or Gaussians, to activations of neurons in a pre-trained feature extractor through which we pass the image(s) to represent. This extractor is frozen for all datasets, and we rely on its generally expressive power in feature space. By comparing two DNAs, we can evaluate the extent to which two datasets differ with granular control over the comparison attributes of interest, providing the ability to customise the way distances are measured to suit the requirements of the task at hand. Furthermore, DNAs are compact, representing datasets of any size with less than 15 megabytes. We demonstrate the value of DNAs by evaluating their applicability on several tasks, including conditional dataset comparison, synthetic image evaluation, and transfer learning, and across diverse datasets, ranging from synthetic cat images to celebrity faces and urban driving scenes. △ Less

Submitted 19 April, 2023; originally announced April 2023.

Comments: Published at CVPR 2023. Project page with code: https://bramtoula.github.io/vdna/

arXiv:2302.03477 [pdf, other]

Explainable Action Prediction through Self-Supervision on Scene Graphs

Authors: Pawit Kochakarn, Daniele De Martini, Daniel Omeiza, Lars Kunze

Abstract: This work explores scene graphs as a distilled representation of high-level information for autonomous driving, applied to future driver-action prediction. Given the scarcity and strong imbalance of data samples, we propose a self-supervision pipeline to infer representative and well-separated embeddings. Key aspects are interpretability and explainability; as such, we embed in our architecture at… ▽ More This work explores scene graphs as a distilled representation of high-level information for autonomous driving, applied to future driver-action prediction. Given the scarcity and strong imbalance of data samples, we propose a self-supervision pipeline to infer representative and well-separated embeddings. Key aspects are interpretability and explainability; as such, we embed in our architecture attention mechanisms that can create spatial and temporal heatmaps on the scene graphs. We evaluate our system on the ROAD dataset against a fully-supervised approach, showing the superiority of our training regime. △ Less

Submitted 7 February, 2023; originally announced February 2023.

Comments: Accepted to the 2023 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2208.04233 [pdf, other]

Sampling, Communication, and Prediction Co-Design for Synchronizing the Real-World Device and Digital Model in Metaverse

Authors: Zhen Meng, Changyang She, Guodong Zhao, Daniele De Martini

Abstract: The metaverse has the potential to revolutionize the next generation of the Internet by supporting highly interactive services with the help of Mixed Reality (MR) technologies; still, to provide a satisfactory experience for users, the synchronization between the physical world and its digital models is crucial. This work proposes a sampling, communication and prediction co-design framework to min… ▽ More The metaverse has the potential to revolutionize the next generation of the Internet by supporting highly interactive services with the help of Mixed Reality (MR) technologies; still, to provide a satisfactory experience for users, the synchronization between the physical world and its digital models is crucial. This work proposes a sampling, communication and prediction co-design framework to minimize the communication load subject to a constraint on tracking the Mean Squared Error (MSE) between a real-world device and its digital model in the metaverse. To optimize the sampling rate and the prediction horizon, we exploit expert knowledge and develop a constrained Deep Reinforcement Learning (DRL) algorithm, named Knowledge-assisted Constrained Twin-Delayed Deep Deterministic (KC-TD3) policy gradient algorithm. We validate our framework on a prototype composed of a real-world robotic arm and its digital model. Compared with existing approaches: (1) When the tracking error constraint is stringent (MSE=0.002 degrees), our policy degenerates into the policy in the sampling-communication co-design framework. (2) When the tracking error constraint is mild (MSE=0.007 degrees), our policy degenerates into the policy in the prediction-communication co-design framework. (3) Our framework achieves a better trade-off between the average MSE and the average communication load compared with a communication system without sampling and prediction. For example, the average communication load can be reduced up to 87% when the track error constraint is 0.002 degrees. (4) Our policy outperforms the benchmark with the static sampling rate and prediction horizon optimized by exhaustive search, in terms of the tail probability of the tracking error. Furthermore, with the assistance of expert knowledge, the proposed algorithm KC-TD3 achieves better convergence time, stability, and final policy performance. △ Less

Submitted 31 July, 2022; originally announced August 2022.

arXiv:2206.15154 [pdf, other]

BoxGraph: Semantic Place Recognition and Pose Estimation from 3D LiDAR

Authors: Georgi Pramatarov, Daniele De Martini, Matthew Gadd, Paul Newman

Abstract: This paper is about extremely robust and lightweight localisation using LiDAR point clouds based on instance segmentation and graph matching. We model 3D point clouds as fully-connected graphs of semantically identified components where each vertex corresponds to an object instance and encodes its shape. Optimal vertex association across graphs allows for full 6-Degree-of-Freedom (DoF) pose estima… ▽ More This paper is about extremely robust and lightweight localisation using LiDAR point clouds based on instance segmentation and graph matching. We model 3D point clouds as fully-connected graphs of semantically identified components where each vertex corresponds to an object instance and encodes its shape. Optimal vertex association across graphs allows for full 6-Degree-of-Freedom (DoF) pose estimation and place recognition by measuring similarity. This representation is very concise, condensing the size of maps by a factor of 25 against the state-of-the-art, requiring only 3kB to represent a 1.4MB laser scan. We verify the efficacy of our system on the SemanticKITTI dataset, where we achieve a new state-of-the-art in place recognition, with an average of 88.4% recall at 100% precision where the next closest competitor follows with 64.9%. We also show accurate metric pose estimation performance - estimating 6-DoF pose with median errors of 10 cm and 0.33 deg. △ Less

Submitted 30 June, 2022; originally announced June 2022.

Comments: Accepted for publication at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2022

arXiv:2206.10517 [pdf, other]

What Goes Around: Leveraging a Constant-curvature Motion Constraint in Radar Odometry

Authors: Roberto Aldera, Matthew Gadd, Daniele De Martini, Paul Newman

Abstract: This paper presents a method that leverages vehicle motion constraints to refine data associations in a point-based radar odometry system. By using the strong prior on how a non-holonomic robot is constrained to move smoothly through its environment, we develop the necessary framework to estimate ego-motion from a single landmark association rather than considering all of these correspondences at… ▽ More This paper presents a method that leverages vehicle motion constraints to refine data associations in a point-based radar odometry system. By using the strong prior on how a non-holonomic robot is constrained to move smoothly through its environment, we develop the necessary framework to estimate ego-motion from a single landmark association rather than considering all of these correspondences at once. This allows for informed outlier detection of poor matches that are a dominant source of pose estimate error. By refining the subset of matched landmarks, we see an absolute decrease of 2.15% (from 4.68% to 2.53%) in translational error, approximately halving the error in odometry (reducing by 45.94%) than when using the full set of correspondences. This contribution is relevant to other point-based odometry implementations that rely on a range sensor and provides a lightweight and interpretable means of incorporating vehicle dynamics for ego-motion estimation. △ Less

Submitted 21 June, 2022; originally announced June 2022.

Comments: Accepted for RA-L

arXiv:2203.03405 [pdf, other]

Depth-SIMS: Semi-Parametric Image and Depth Synthesis

Authors: Valentina Musat, Daniele De Martini, Matthew Gadd, Paul Newman

Abstract: In this paper we present a compositing image synthesis method that generates RGB canvases with well aligned segmentation maps and sparse depth maps, coupled with an in-painting network that transforms the RGB canvases into high quality RGB images and the sparse depth maps into pixel-wise dense depth maps. We benchmark our method in terms of structural alignment and image quality, showing an increa… ▽ More In this paper we present a compositing image synthesis method that generates RGB canvases with well aligned segmentation maps and sparse depth maps, coupled with an in-painting network that transforms the RGB canvases into high quality RGB images and the sparse depth maps into pixel-wise dense depth maps. We benchmark our method in terms of structural alignment and image quality, showing an increase in mIoU over SOTA by 3.7 percentage points and a highly competitive FID. Furthermore, we analyse the quality of the generated data as training data for semantic segmentation and depth completion, and show that our approach is more suited for this purpose than other methods. △ Less

Submitted 2 June, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

arXiv:2203.00459 [pdf, other]

Fast-MbyM: Leveraging Translational Invariance of the Fourier Transform for Efficient and Accurate Radar Odometry

Authors: Robert Weston, Matthew Gadd, Daniele De Martini, Paul Newman, Ingmar Posner

Abstract: Masking By Moving (MByM), provides robust and accurate radar odometry measurements through an exhaustive correlative search across discretised pose candidates. However, this dense search creates a significant computational bottleneck which hinders real-time performance when high-end GPUs are not available. Utilising the translational invariance of the Fourier Transform, in our approach, f-MByM, we… ▽ More Masking By Moving (MByM), provides robust and accurate radar odometry measurements through an exhaustive correlative search across discretised pose candidates. However, this dense search creates a significant computational bottleneck which hinders real-time performance when high-end GPUs are not available. Utilising the translational invariance of the Fourier Transform, in our approach, f-MByM, we decouple the search for angle and translation. By maintaining end-to-end differentiability a neural network is used to mask scans and trained by supervising pose prediction directly. Training faster and with less memory, utilising a decoupled search allows f-MByM to achieve significant run-time performance improvements on a CPU (168%) and to run in real-time on embedded devices, in stark contrast to MByM. Throughout, our approach remains accurate and competitive with the best radar odometry variants available in the literature -- achieving an end-point drift of 2.01% in translation and 6.3deg/km on the Oxford Radar RobotCar Dataset. △ Less

Submitted 1 March, 2022; originally announced March 2022.

Comments: 7 pages

arXiv:2111.02995 [pdf, other]

Unsupervised Change Detection of Extreme Events Using ML On-Board

Authors: Vít Růžička, Anna Vaughan, Daniele De Martini, James Fulton, Valentina Salvatelli, Chris Bridges, Gonzalo Mateo-Garcia, Valentina Zantedeschi

Abstract: In this paper, we introduce RaVAEn, a lightweight, unsupervised approach for change detection in satellite data based on Variational Auto-Encoders (VAEs) with the specific purpose of on-board deployment. Applications such as disaster management enormously benefit from the rapid availability of satellite observations. Traditionally, data analysis is performed on the ground after all data is transfe… ▽ More In this paper, we introduce RaVAEn, a lightweight, unsupervised approach for change detection in satellite data based on Variational Auto-Encoders (VAEs) with the specific purpose of on-board deployment. Applications such as disaster management enormously benefit from the rapid availability of satellite observations. Traditionally, data analysis is performed on the ground after all data is transferred - downlinked - to a ground station. Constraint on the downlink capabilities therefore affects any downstream application. In contrast, RaVAEn pre-processes the sampled data directly on the satellite and flags changed areas to prioritise for downlink, shortening the response time. We verified the efficacy of our system on a dataset composed of time series of catastrophic events - which we plan to release alongside this publication - demonstrating that RaVAEn outperforms pixel-wise baselines. Finally we tested our approach on resource-limited hardware for assessing computational and memory limitations. △ Less

Submitted 4 November, 2021; originally announced November 2021.

Comments: 5 pages (+2 in appendix), 5 figures (+1 in appendix), 2 tables (+3 in appendix), NeurIPS Workshop on Artificial Intelligence for Humanitarian Assistance and Disaster Response Workshop (AI+HADR), 2021

arXiv:2110.02744 [pdf, other]

Contrastive Learning for Unsupervised Radar Place Recognition

Authors: Matthew Gadd, Daniele De Martini, Paul Newman

Abstract: We learn, in an unsupervised way, an embedding from sequences of radar images that is suitable for solving the place recognition problem with complex radar data. Our method is based on invariant instance feature learning but is tailored for the task of re-localisation by exploiting for data augmentation the temporal successivity of data as collected by a mobile platform moving through the scene sm… ▽ More We learn, in an unsupervised way, an embedding from sequences of radar images that is suitable for solving the place recognition problem with complex radar data. Our method is based on invariant instance feature learning but is tailored for the task of re-localisation by exploiting for data augmentation the temporal successivity of data as collected by a mobile platform moving through the scene smoothly. We experiment across two prominent urban radar datasets totalling over 400 km of driving and show that we achieve a new radar place recognition state-of-the-art. Specifically, the proposed system proves correct for 98.38% of the queries that it is presented with over a challenging re-localisation sequence, using only the single nearest neighbour in the learned metric space. We also find that our learned model shows better understanding of out-of-lane loop closures at arbitrary orientation than non-learned radar scan descriptors. △ Less

Submitted 6 October, 2021; originally announced October 2021.

Comments: accepted for publication at the IEEE International Conference on Advanced Robotics (ICAR) 2021. arXiv admin note: substantial text overlap with arXiv:2106.06703

arXiv:2106.08983 [pdf, other]

The Oxford Road Boundaries Dataset

Authors: Tarlan Suleymanov, Matthew Gadd, Daniele De Martini, Paul Newman

Abstract: In this paper we present the Oxford Road Boundaries Dataset, designed for training and testing machine-learning-based road-boundary detection and inference approaches. We have hand-annotated two of the 10 km-long forays from the Oxford Robotcar Dataset and generated from other forays several thousand further examples with semi-annotated road-boundary masks. To boost the number of training samples… ▽ More In this paper we present the Oxford Road Boundaries Dataset, designed for training and testing machine-learning-based road-boundary detection and inference approaches. We have hand-annotated two of the 10 km-long forays from the Oxford Robotcar Dataset and generated from other forays several thousand further examples with semi-annotated road-boundary masks. To boost the number of training samples in this way, we used a vision-based localiser to project labels from the annotated datasets to other traversals at different times and weather conditions. As a result, we release 62605 labelled samples, of which 47639 samples are curated. Each of these samples contains both raw and classified masks for left and right lenses. Our data contains images from a diverse set of scenarios such as straight roads, parked cars, junctions, etc. Files for download and tools for manipulating the labelled data are available at: oxford-robotics-institute.github.io/road-boundaries-dataset △ Less

Submitted 16 June, 2021; originally announced June 2021.

Comments: Accepted for publication at the workshop "3D-DLAD: 3D-Deep Learning for Autonomous Driving" (WS15), Intelligent Vehicles Symposium (IV 2021)

arXiv:2106.06703 [pdf, other]

Unsupervised Place Recognition with Deep Embedding Learning over Radar Videos

Authors: Matthew Gadd, Daniele De Martini, Paul Newman

Abstract: We learn, in an unsupervised way, an embedding from sequences of radar images that is suitable for solving place recognition problem using complex radar data. We experiment on 280 km of data and show performance exceeding state-of-the-art supervised approaches, localising correctly 98.38% of the time when using just the nearest database candidate. We learn, in an unsupervised way, an embedding from sequences of radar images that is suitable for solving place recognition problem using complex radar data. We experiment on 280 km of data and show performance exceeding state-of-the-art supervised approaches, localising correctly 98.38% of the time when using just the nearest database candidate. △ Less

Submitted 12 June, 2021; originally announced June 2021.

Comments: to be presented at the Workshop on Radar Perception for All-Weather Autonomy at the IEEE International Conference on Robotics and Automation (ICRA) 2021

arXiv:2103.00869 [pdf, other]

Fool Me Once: Robust Selective Segmentation via Out-of-Distribution Detection with Contrastive Learning

Authors: David Williams, Matthew Gadd, Daniele De Martini, Paul Newman

Abstract: In this work, we train a network to simultaneously perform segmentation and pixel-wise Out-of-Distribution (OoD) detection, such that the segmentation of unknown regions of scenes can be rejected. This is made possible by leveraging an OoD dataset with a novel contrastive objective and data augmentation scheme. By combining data including unknown classes in the training data, a more robust feature… ▽ More In this work, we train a network to simultaneously perform segmentation and pixel-wise Out-of-Distribution (OoD) detection, such that the segmentation of unknown regions of scenes can be rejected. This is made possible by leveraging an OoD dataset with a novel contrastive objective and data augmentation scheme. By combining data including unknown classes in the training data, a more robust feature representation can be learned with known classes represented distinctly from those unknown. When presented with unknown classes or conditions, many current approaches for segmentation frequently exhibit high confidence in their inaccurate segmentations and cannot be trusted in many operational environments. We validate our system on a real-world dataset of unusual driving scenes, and show that by selectively segmenting scenes based on what is predicted as OoD, we can increase the segmentation accuracy by an IoU of 0.2 with respect to alternative techniques. △ Less

Submitted 1 March, 2021; originally announced March 2021.

Comments: Accepted for publication at the 2021 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2102.12891 [pdf, other]

CPG-ACTOR: Reinforcement Learning for Central Pattern Generators

Authors: Luigi Campanaro, Siddhant Gangapurwala, Daniele De Martini, Wolfgang Merkt, Ioannis Havoutis

Abstract: Central Pattern Generators (CPGs) have several properties desirable for locomotion: they generate smooth trajectories, are robust to perturbations and are simple to implement. Although conceptually promising, we argue that the full potential of CPGs has so far been limited by insufficient sensory-feedback information. This paper proposes a new methodology that allows tuning CPG controllers through… ▽ More Central Pattern Generators (CPGs) have several properties desirable for locomotion: they generate smooth trajectories, are robust to perturbations and are simple to implement. Although conceptually promising, we argue that the full potential of CPGs has so far been limited by insufficient sensory-feedback information. This paper proposes a new methodology that allows tuning CPG controllers through gradient-based optimization in a Reinforcement Learning (RL) setting. To the best of our knowledge, this is the first time CPGs have been trained in conjunction with a MultilayerPerceptron (MLP) network in a Deep-RL context. In particular, we show how CPGs can directly be integrated as the Actor in an Actor-Critic formulation. Additionally, we demonstrate how this change permits us to integrate highly non-linear feedback directly from sensory perception to reshape the oscillators' dynamics. Our results on a locomotion task using a single-leg hopper demonstrate that explicitly using the CPG as the Actor rather than as part of the environment results in a significant increase in the reward gained over time (6x more) compared with previous approaches. Furthermore, we show that our method without feedback reproduces results similar to prior work with feedback. Finally, we demonstrate how our closed-loop CPG progressively improves the hopping behaviour for longer training epochs relying only on basic reward functions. △ Less

Submitted 25 February, 2021; originally announced February 2021.

arXiv:2012.09670 [pdf, other]

RainBench: Towards Global Precipitation Forecasting from Satellite Imagery

Authors: Christian Schroeder de Witt, Catherine Tong, Valentina Zantedeschi, Daniele De Martini, Freddie Kalaitzis, Matthew Chantry, Duncan Watson-Parris, Piotr Bilinski

Abstract: Extreme precipitation events, such as violent rainfall and hail storms, routinely ravage economies and livelihoods around the developing world. Climate change further aggravates this issue. Data-driven deep learning approaches could widen the access to accurate multi-day forecasts, to mitigate against such events. However, there is currently no benchmark dataset dedicated to the study of global pr… ▽ More Extreme precipitation events, such as violent rainfall and hail storms, routinely ravage economies and livelihoods around the developing world. Climate change further aggravates this issue. Data-driven deep learning approaches could widen the access to accurate multi-day forecasts, to mitigate against such events. However, there is currently no benchmark dataset dedicated to the study of global precipitation forecasts. In this paper, we introduce \textbf{RainBench}, a new multi-modal benchmark dataset for data-driven precipitation forecasting. It includes simulated satellite data, a selection of relevant meteorological data from the ERA5 reanalysis product, and IMERG precipitation data. We also release \textbf{PyRain}, a library to process large precipitation datasets efficiently. We present an extensive analysis of our novel dataset and establish baseline results for two benchmark medium-range precipitation forecasting tasks. Finally, we discuss existing data-driven weather forecasting methodologies and suggest future research avenues. △ Less

Submitted 17 December, 2020; originally announced December 2020.

Comments: Work completed during the 2020 Frontier Development Lab research accelerator, a private-public partnership with NASA in the US, and ESA in Europe. Accepted as a spotlight/long oral talk at both Climate Change and AI, as well as AI for Earth Sciences Workshops at NeurIPS 2020

arXiv:2006.02108 [pdf, other]

Self-Supervised Localisation between Range Sensors and Overhead Imagery

Authors: Tim Y. Tang, Daniele De Martini, Shangzhe Wu, Paul Newman

Abstract: Publicly available satellite imagery can be an ubiquitous, cheap, and powerful tool for vehicle localisation when a prior sensor map is unavailable. However, satellite images are not directly comparable to data from ground range sensors because of their starkly different modalities. We present a learned metric localisation method that not only handles the modality difference, but is cheap to train… ▽ More Publicly available satellite imagery can be an ubiquitous, cheap, and powerful tool for vehicle localisation when a prior sensor map is unavailable. However, satellite images are not directly comparable to data from ground range sensors because of their starkly different modalities. We present a learned metric localisation method that not only handles the modality difference, but is cheap to train, learning in a self-supervised fashion without metrically accurate ground truth. By evaluating across multiple real-world datasets, we demonstrate the robustness and versatility of our method for various sensor configurations. We pay particular attention to the use of millimetre wave radar, which, owing to its complex interaction with the scene and its immunity to weather and lighting, makes for a compelling and valuable use case. △ Less

Submitted 23 September, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

Comments: Robotics: Science and Systems (RSS) 2020

arXiv:2005.05175 [pdf, other]

Keep off the Grass: Permissible Driving Routes from Radar with Weak Audio Supervision

Authors: David Williams, Daniele De Martini, Matthew Gadd, Letizia Marchegiani, Paul Newman

Abstract: Reliable outdoor deployment of mobile robots requires the robust identification of permissible driving routes in a given environment. The performance of LiDAR and vision-based perception systems deteriorates significantly if certain environmental factors are present e.g. rain, fog, darkness. Perception systems based on FMCW scanning radar maintain full performance regardless of environmental condi… ▽ More Reliable outdoor deployment of mobile robots requires the robust identification of permissible driving routes in a given environment. The performance of LiDAR and vision-based perception systems deteriorates significantly if certain environmental factors are present e.g. rain, fog, darkness. Perception systems based on FMCW scanning radar maintain full performance regardless of environmental conditions and with a longer range than alternative sensors. Learning to segment a radar scan based on driveability in a fully supervised manner is not feasible as labelling each radar scan on a bin-by-bin basis is both difficult and time-consuming to do by hand. We therefore weakly supervise the training of the radar-based classifier through an audio-based classifier that is able to predict the terrain type underneath the robot. By combining odometry, GPS and the terrain labels from the audio classifier, we are able to construct a terrain labelled trajectory of the robot in the environment which is then used to label the radar scans. Using a curriculum learning procedure, we then train a radar segmentation network to generalise beyond the initial labelling and to detect all permissible driving routes in the environment. △ Less

Submitted 22 September, 2020; v1 submitted 11 May, 2020; originally announced May 2020.

Comments: accepted for publication at the IEEE Intelligent Transportation Systems Conference (ITSC) 2020

arXiv:2005.02031 [pdf, other]

Sense-Assess-eXplain (SAX): Building Trust in Autonomous Vehicles in Challenging Real-World Driving Scenarios

Authors: Matthew Gadd, Daniele De Martini, Letizia Marchegiani, Paul Newman, Lars Kunze

Abstract: This paper discusses ongoing work in demonstrating research in mobile autonomy in challenging driving scenarios. In our approach, we address fundamental technical issues to overcome critical barriers to assurance and regulation for large-scale deployments of autonomous systems. To this end, we present how we build robots that (1) can robustly sense and interpret their environment using traditional… ▽ More This paper discusses ongoing work in demonstrating research in mobile autonomy in challenging driving scenarios. In our approach, we address fundamental technical issues to overcome critical barriers to assurance and regulation for large-scale deployments of autonomous systems. To this end, we present how we build robots that (1) can robustly sense and interpret their environment using traditional as well as unconventional sensors; (2) can assess their own capabilities; and (3), vitally in the purpose of assurance and trust, can provide causal explanations of their interpretations and assessments. As it is essential that robots are safe and trusted, we design, develop, and demonstrate fundamental technologies in real-world applications to overcome critical barriers which impede the current deployment of robots in economically and socially important areas. Finally, we describe ongoing work in the collection of an unusual, rare, and highly valuable dataset. △ Less

Submitted 5 May, 2020; originally announced May 2020.

Comments: accepted for publication at the IEEE Intelligent Vehicles Symposium (IV), Workshop on Ensuring and Validating Safety for Automated Vehicles (EVSAV), 2020, project URL: https://ori.ox.ac.uk/projects/sense-assess-explain-sax

arXiv:2004.03451 [pdf, other]

RSS-Net: Weakly-Supervised Multi-Class Semantic Segmentation with FMCW Radar

Authors: Prannay Kaul, Daniele De Martini, Matthew Gadd, Paul Newman

Abstract: This paper presents an efficient annotation procedure and an application thereof to end-to-end, rich semantic segmentation of the sensed environment using FMCW scanning radar. We advocate radar over the traditional sensors used for this task as it operates at longer ranges and is substantially more robust to adverse weather and illumination conditions. We avoid laborious manual labelling by exploi… ▽ More This paper presents an efficient annotation procedure and an application thereof to end-to-end, rich semantic segmentation of the sensed environment using FMCW scanning radar. We advocate radar over the traditional sensors used for this task as it operates at longer ranges and is substantially more robust to adverse weather and illumination conditions. We avoid laborious manual labelling by exploiting the largest radar-focused urban autonomy dataset collected to date, correlating radar scans with RGB cameras and LiDAR sensors, for which semantic segmentation is an already consolidated procedure. The training procedure leverages a state-of-the-art natural image segmentation system which is publicly available and as such, in contrast to previous approaches, allows for the production of copious labels for the radar stream by incorporating four camera and two LiDAR streams. Additionally, the losses are computed taking into account labels to the radar sensor horizon by accumulating LiDAR returns along a pose-chain ahead and behind of the current vehicle position. Finally, we present the network with multi-channel radar scan inputs in order to deal with ephemeral and dynamic scene objects. △ Less

Submitted 2 April, 2020; originally announced April 2020.

Comments: submitted to IEEE Intelligent Vehicles Symposium (IV) 2020

arXiv:2003.04699 [pdf, other]

Look Around You: Sequence-based Radar Place Recognition with Learned Rotational Invariance

Authors: Matthew Gadd, Daniele De Martini, Paul Newman

Abstract: This paper details an application which yields significant improvements to the adeptness of place recognition with Frequency-Modulated Continuous-Wave radar - a commercially promising sensor poised for exploitation in mobile autonomy. We show how a rotationally-invariant metric embedding for radar scans can be integrated into sequence-based trajectory matching systems typically applied to videos t… ▽ More This paper details an application which yields significant improvements to the adeptness of place recognition with Frequency-Modulated Continuous-Wave radar - a commercially promising sensor poised for exploitation in mobile autonomy. We show how a rotationally-invariant metric embedding for radar scans can be integrated into sequence-based trajectory matching systems typically applied to videos taken by visual sensors. Due to the complete horizontal field of view inherent to the radar scan formation process, we show how this off-the-shelf sequence-based trajectory matching system can be manipulated to detect place matches when the vehicle is travelling down a previously visited stretch of road in the opposite direction. We demonstrate the efficacy of the approach on 26 km of challenging urban driving taken from the largest radar-focused urban autonomy dataset released to date -- showing a boost of 30% in recall at high levels of precision over a nearest neighbour approach. △ Less

Submitted 10 March, 2020; originally announced March 2020.

Comments: accepted for publication at the IEEE/ION Position, Location and Navigation Symposium (PLANS) 2020

arXiv:2001.09438 [pdf, other]

Kidnapped Radar: Topological Radar Localisation using Rotationally-Invariant Metric Learning

Authors: Ştefan Săftescu, Matthew Gadd, Daniele De Martini, Dan Barnes, Paul Newman

Abstract: This paper presents a system for robust, large-scale topological localisation using Frequency-Modulated Continuous-Wave (FMCW) scanning radar. We learn a metric space for embedding polar radar scans using CNN and NetVLAD architectures traditionally applied to the visual domain. However, we tailor the feature extraction for more suitability to the polar nature of radar scan formation using cylindri… ▽ More This paper presents a system for robust, large-scale topological localisation using Frequency-Modulated Continuous-Wave (FMCW) scanning radar. We learn a metric space for embedding polar radar scans using CNN and NetVLAD architectures traditionally applied to the visual domain. However, we tailor the feature extraction for more suitability to the polar nature of radar scan formation using cylindrical convolutions, anti-aliasing blurring, and azimuth-wise max-pooling; all in order to bolster the rotational invariance. The enforced metric space is then used to encode a reference trajectory, serving as a map, which is queried for nearest neighbours (NNs) for recognition of places at run-time. We demonstrate the performance of our topological localisation system over the course of many repeat forays using the largest radar-focused mobile autonomy dataset released to date, totalling 280 km of urban driving, a small portion of which we also use to learn the weights of the modified architecture. As this work represents a novel application for FMCW radar, we analyse the utility of the proposed method via a comprehensive set of metrics which provide insight into the efficacy when used in a realistic system, showing improved performance over the root architecture even in the face of random rotational perturbation. △ Less

Submitted 26 January, 2020; originally announced January 2020.

Comments: submitted to the 2020 International Conference on Robotics and Automation (ICRA)

arXiv:2001.03233 [pdf, other]

RSL-Net: Localising in Satellite Images From a Radar on the Ground

Authors: Tim Y. Tang, Daniele De Martini, Dan Barnes, Paul Newman

Abstract: This paper is about localising a vehicle in an overhead image using FMCW radar mounted on a ground vehicle. FMCW radar offers extraordinary promise and efficacy for vehicle localisation. It is impervious to all weather types and lighting conditions. However the complexity of the interactions between millimetre radar wave and the physical environment makes it a challenging domain. Infrastructure-fr… ▽ More This paper is about localising a vehicle in an overhead image using FMCW radar mounted on a ground vehicle. FMCW radar offers extraordinary promise and efficacy for vehicle localisation. It is impervious to all weather types and lighting conditions. However the complexity of the interactions between millimetre radar wave and the physical environment makes it a challenging domain. Infrastructure-free large-scale radar-based localisation is in its infancy. Typically here a map is built and suitable techniques, compatible with the nature of sensor, are brought to bear. In this work we eschew the need for a radar-based map; instead we simply use an overhead image -- a resource readily available everywhere. This paper introduces a method that not only naturally deals with the complexity of the signal type but does so in the context of cross modal processing. △ Less

Submitted 6 February, 2020; v1 submitted 9 January, 2020; originally announced January 2020.

Comments: Accepted to IEEE Robotics and Automation Letters (RA-L)

arXiv:1804.04976 [pdf, other]

doi 10.1109/TETC.2020.3027454

Online Fall Detection using Recurrent Neural Networks

Authors: Mirto Musci, Daniele De Martini, Nicola Blago, Tullio Facchinetti, Marco Piastra

Abstract: Unintentional falls can cause severe injuries and even death, especially if no immediate assistance is given. The aim of Fall Detection Systems (FDSs) is to detect an occurring fall. This information can be used to trigger the necessary assistance in case of injury. This can be done by using either ambient-based sensors, e.g. cameras, or wearable devices. The aim of this work is to study the techn… ▽ More Unintentional falls can cause severe injuries and even death, especially if no immediate assistance is given. The aim of Fall Detection Systems (FDSs) is to detect an occurring fall. This information can be used to trigger the necessary assistance in case of injury. This can be done by using either ambient-based sensors, e.g. cameras, or wearable devices. The aim of this work is to study the technical aspects of FDSs based on wearable devices and artificial intelligence techniques, in particular Deep Learning (DL), to implement an effective algorithm for on-line fall detection. The proposed classifier is based on a Recurrent Neural Network (RNN) model with underlying Long Short-Term Memory (LSTM) blocks. The method is tested on the publicly available SisFall dataset, with extended annotation, and compared with the results obtained by the SisFall authors. △ Less

Submitted 13 April, 2018; originally announced April 2018.

Comments: 6 pages, ICRA 2018

Showing 1–45 of 45 results for author: De Martini, D