VICI: VLM-Instructed Cross-view Image-localisation

Authors: Xiaohan Zhang, Tavis Shore, Chen Chen, Oscar Mendez, Simon Hadfield, Safwan Wshah

Abstract: In this paper, we present a high-performing solution to the UAVM 2025 Challenge, which focuses on matching narrow FOV street-level images to corresponding satellite imagery using the University-1652 dataset. As panoramic Cross-View Geo-Localisation nears peak performance, it becomes increasingly important to explore more practical problem formulations. Real-world scenarios rarely offer panoramic street-level queries; instead, queries typically consist of limited-FOV images captured with unknown camera parameters. Our work prioritises discovering the highest achievable performance under these constraints, pushing the limits of existing architectures. Our method begins by retrieving candidate satellite image embeddings for a given query, followed by a re-ranking stage that selectively enhances retrieval accuracy within the top candidates. This two-stage approach enables more precise matching, even under the significant viewpoint and scale variations inherent in the task. Through experimentation, we demonstrate that our approach achieves competitive results, specifically attaining R@1 and R@10 retrieval rates of \topone\% and \topten\% respectively. This underscores the potential of optimised retrieval and re-ranking strategies in advancing practical geo-localisation performance. Code is available at https://github.com/tavisshore/VICI.

Submitted 5 July, 2025; originally announced July 2025.
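
The retrieve-then-re-rank scheme and the R@k metric mentioned in the VICI abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy embeddings, the use of cosine similarity for the first stage, and the caller-supplied second-stage score are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(query, gallery, k):
    """Stage 1 (assumed): indices of the k gallery embeddings most similar to the query."""
    ranked = sorted(range(len(gallery)),
                    key=lambda i: cosine(query, gallery[i]),
                    reverse=True)
    return ranked[:k]

def rerank(candidates, score):
    """Stage 2 (assumed): reorder only the shortlisted candidates by a second score."""
    return sorted(candidates, key=score, reverse=True)

def recall_at_k(rankings, ground_truth, k):
    """Fraction of queries whose true match appears in the first k ranked results."""
    hits = sum(1 for ranked, gt in zip(rankings, ground_truth) if gt in ranked[:k])
    return hits / len(rankings)

# Toy data (hypothetical): three "satellite" embeddings and two "street" queries.
gallery = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
queries = [[1.0, 0.1, 0.0], [0.2, 1.0, 0.9]]
ground_truth = [0, 2]

stage1 = [retrieve_top_k(q, gallery, 2) for q in queries]
print(recall_at_k(stage1, ground_truth, 1), recall_at_k(stage1, ground_truth, 2))
```

Re-ranking can only help when the true match is already in the shortlist, which is why R@k of the first stage bounds R@1 after the second stage in this sketch.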

arXiv:2506.04351 [pdf, ps, other]

HuGeDiff: 3D Human Generation via Diffusion with Gaussian Splatting

Authors: Maksym Ivashechkin, Oscar Mendez, Richard Bowden

Abstract: 3D human generation is an important problem with a wide range of applications in computer vision and graphics. Despite recent progress in generative AI such as diffusion models or rendering methods like Neural Radiance Fields or Gaussian Splatting, controlling the generation of accurate 3D humans from text prompts remains an open challenge. Current methods struggle with fine detail, accurate rendering of hands and faces, human realism, and controllability over appearance. The lack of diversity, realism, and annotation in human image data also remains a challenge, hindering the development of a foundational 3D human model. We present a weakly supervised pipeline that tries to address these challenges. In the first step, we generate a photorealistic human image dataset with controllable attributes such as appearance, race, gender, etc., using a state-of-the-art image diffusion model. Next, we propose an efficient mapping approach from image features to 3D point clouds using a transformer-based architecture. Finally, we close the loop by training a point-cloud diffusion model that is conditioned on the same text prompts used to generate the original samples. We demonstrate orders-of-magnitude speed-ups in 3D human generation compared to the state-of-the-art approaches, along with significantly improved text-prompt alignment, realism, and rendering quality. We will make the code and dataset available.

Submitted 4 June, 2025; originally announced June 2025.

arXiv:2505.02108 [pdf, other]

SignSplat: Rendering Sign Language via Gaussian Splatting

Authors: Maksym Ivashechkin, Oscar Mendez, Richard Bowden

Abstract: State-of-the-art approaches for conditional human body rendering via Gaussian splatting typically focus on simple body motions captured from many views. This is often in the context of dancing or walking. However, for more complex use cases, such as sign language, we care less about large body motion and more about subtle and complex motions of the hands and face. The problems of building high fidelity models are compounded by the complexity of capturing multi-view data of sign. The solution is to make better use of sequence data, ensuring that we can overcome the limited information from only a few views by exploiting temporal variability. Nevertheless, learning from sequence-level data requires extremely accurate and consistent model fitting to ensure that appearance is consistent across complex motions. We focus on how to achieve this, constraining mesh parameters to build an accurate Gaussian splatting framework from few views capable of modelling subtle human motion. We leverage regularization techniques on the Gaussian parameters to mitigate overfitting and rendering artifacts. Additionally, we propose a new adaptive control method to densify Gaussians and prune splat points on the mesh surface. To demonstrate the accuracy of our approach, we render novel sequences of sign language video, building on neural machine translation approaches to sign stitching. On benchmark datasets, our approach achieves state-of-the-art performance; and on highly articulated and complex sign language motion, we significantly outperform competing approaches.

Submitted 4 May, 2025; originally announced May 2025.

arXiv:2505.02079 [pdf, other]

HandOcc: NeRF-based Hand Rendering with Occupancy Networks

Authors: Maksym Ivashechkin, Oscar Mendez, Richard Bowden

Abstract: We propose HandOcc, a novel framework for hand rendering based upon occupancy. Popular rendering methods such as NeRF are often combined with parametric meshes to provide deformable hand models. However, in doing so, such approaches present a trade-off between the fidelity of the mesh and the complexity and dimensionality of the parametric model. The simplicity of parametric mesh structures is appealing, but the underlying issue is that it binds methods to mesh initialization, making them unable to generalize to objects where a parametric model does not exist. It also means that estimation is tied to mesh resolution and the accuracy of mesh fitting. This paper presents a pipeline for meshless 3D rendering, which we apply to the hands. By providing only a 3D skeleton, the desired appearance is extracted via a convolutional model. We do this by exploiting a NeRF renderer conditioned upon an occupancy-based representation. The approach uses the hand occupancy to resolve hand-to-hand interactions, further improving results and allowing fast rendering and excellent hand appearance transfer. On the benchmark InterHand2.6M dataset, we achieved state-of-the-art results.

Submitted 4 May, 2025; originally announced May 2025.

arXiv:2505.00594 [pdf, other]

Decomposing graphs into stable and ordered parts

Authors: Hector Buffière, Patrice Ossona de Mendez

Abstract: Connections between structural graph theory and finite model theory have recently gained a lot of attention. In this setting, many interesting questions remain on the properties of hereditary dependent (NIP) classes of graphs, in particular related to first-order transductions. Motivated by Simon's decomposition theorem of dependent types into a stable part and a distal (order-like) part, we conjecture that every hereditary dependent class of graphs is transduction-equivalent to a hereditary dependent class of partially ordered graphs, where the cover graph of the partial order has bounded treewidth and the unordered graph is (edge) stable. In this paper, we consider the first non-trivial case (classes with bounded linear cliquewidth) and prove that the conjecture holds in a strong form, where the cover graph of the partial order has bounded pathwidth. Then, we extend our study to classes that admit bounded-size bounded linear cliquewidth covers, and prove that the conjecture holds for these classes, too.

Submitted 1 May, 2025; originally announced May 2025.

arXiv:2503.18614 [pdf, other]

Path degeneracy and applications

Authors: Y. Lin, P. Ossona de Mendez

Abstract: In this work, we relate girth and path-degeneracy in classes with sub-exponential expansion, with explicit bounds for classes with polynomial expansion and proper minor-closed classes that are tight up to a constant factor (and tight up to second order terms if a classical conjecture on existence of $g$-cages is verified). As an application, we derive bounds on the generalized acyclic indices, on the generalized arboricities, and on the weak coloring numbers of high-girth graphs in such classes. Along the way, we prove a conjecture proposed in [T. Bartnicki et al., Generalized arboricity of graphs with large girth, Discrete Mathematics 342 (2019), no. 5, 1343-1350], which asserts that, for every integer $k$, there is an integer $g(p,k)$ such that every $K_k$ minor-free graph with girth at least $g(p,k)$ has $p$-arboricity at most $p+1$.

Submitted 24 March, 2025; originally announced March 2025.

arXiv:2412.12849 [pdf, other]

HyperGS: Hyperspectral 3D Gaussian Splatting

Authors: Christopher Thirgood, Oscar Mendez, Erin Chao Ling, Jon Storey, Simon Hadfield

Abstract: We introduce HyperGS, a novel framework for Hyperspectral Novel View Synthesis (HNVS), based on a new latent 3D Gaussian Splatting (3DGS) technique. Our approach enables simultaneous spatial and spectral renderings by encoding material properties from multi-view 3D hyperspectral datasets. HyperGS reconstructs high-fidelity views from arbitrary perspectives with improved accuracy and speed, outperforming existing methods. To address the challenges of high-dimensional data, we perform view synthesis in a learned latent space, incorporating a pixel-wise adaptive density function and a pruning technique for increased training stability and efficiency. Additionally, we introduce the first HNVS benchmark, implementing a number of new baselines based on recent SOTA RGB-NVS techniques, alongside the small number of prior works on HNVS. We demonstrate HyperGS's robustness through extensive evaluation of real and simulated hyperspectral scenes with a 14 dB accuracy improvement upon previously published models.

Submitted 17 December, 2024; originally announced December 2024.

arXiv:2411.16940 [pdf, other]

The Radiance of Neural Fields: Democratizing Photorealistic and Dynamic Robotic Simulation

Authors: Georgina Nuthall, Richard Bowden, Oscar Mendez

Abstract: As robots increasingly coexist with humans, they must navigate complex, dynamic environments rich in visual information and implicit social dynamics, like when to yield or move through crowds. Addressing these challenges requires significant advances in vision-based sensing and a deeper understanding of socio-dynamic factors, particularly in tasks like navigation. To facilitate this, robotics researchers need advanced simulation platforms offering dynamic, photorealistic environments with realistic actors. Unfortunately, most existing simulators fall short, prioritizing geometric accuracy over visual fidelity, and employing unrealistic agents with fixed trajectories and low-quality visuals. To overcome these limitations, we developed a simulator that incorporates three essential elements: (1) photorealistic neural rendering of environments, (2) neurally animated human entities with behavior management, and (3) an ego-centric robotic agent providing multi-sensor output. By utilizing advanced neural rendering techniques in a dual-NeRF simulator, our system produces high-fidelity, photorealistic renderings of both environments and human entities. Additionally, it integrates a state-of-the-art Social Force Model to model dynamic human-human and human-robot interactions, creating the first photorealistic and accessible human-robot simulation system powered by neural rendering.

Submitted 25 November, 2024; originally announced November 2024.

Comments: 8 pages, 5 figures

arXiv:2411.15742 [pdf, other]

PEnG: Pose-Enhanced Geo-Localisation

Authors: Tavis Shore, Oscar Mendez, Simon Hadfield

Abstract: Cross-view Geo-localisation is typically performed at a coarse granularity, because densely sampled satellite image patches overlap heavily. This heavy overlap would make disambiguating patches very challenging. However, by opting for sparsely sampled patches, prior work has placed an artificial upper bound on the localisation accuracy that is possible. Even a perfect oracle system cannot achieve accuracy greater than the average separation of the tiles. To overcome this limitation, we propose combining cross-view geo-localisation and relative pose estimation to increase precision to a level practical for real-world application. We develop PEnG, a 2-stage system which first predicts the most likely edges from a city-scale graph representation upon which a query image lies. It then performs relative pose estimation within these edges to determine a precise position. PEnG presents the first technique to utilise both viewpoints available within cross-view geo-localisation datasets to enhance precision to a sub-metre level, with some examples achieving centimetre level accuracy. Our proposed ensemble achieves state-of-the-art precision, with relative Top-5m retrieval improvements on previous works of 213%, and decreases the median Euclidean distance error by 96.90%, from the previous best of 734m down to 22.77m, when evaluating with 90 degree horizontal FOV images. Code will be made available: tavisshore.co.uk/PEnG

Submitted 24 November, 2024; originally announced November 2024.

Comments: 8 pages, 6 figures

arXiv:2409.15514 [pdf, other]

SpaGBOL: Spatial-Graph-Based Orientated Localisation

Authors: Tavis Shore, Oscar Mendez, Simon Hadfield

Abstract: Cross-View Geo-Localisation within urban regions is challenging in part due to the lack of geo-spatial structuring within current datasets and techniques. We propose utilising graph representations to model sequences of local observations and the connectivity of the target location. Modelling as a graph enables generating previously unseen sequences by sampling with new parameter configurations. To leverage this newly available information, we propose a GNN-based architecture, producing spatially strong embeddings and improving discriminability over isolated image embeddings. We outline SpaGBOL, introducing three novel contributions. 1) The first graph-structured dataset for Cross-View Geo-Localisation, containing multiple streetview images per node to improve generalisation. 2) Introducing GNNs to the problem, we develop the first system that exploits the correlation between node proximity and feature similarity. 3) Leveraging the unique properties of the graph representation, we demonstrate a novel retrieval filtering approach based on neighbourhood bearings. SpaGBOL achieves state-of-the-art accuracies on the unseen test graph, with relative Top-1 retrieval improvements on previous techniques of 11%, and 50% when filtering with Bearing Vector Matching on the SpaGBOL dataset.

Submitted 3 December, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

arXiv:2404.07240 [pdf, other]

Interactions Between Brauer Configuration Algebras and Classical Cryptanalysis to Analyze Bach's Canons

Authors: Agustín Moreno Cañadas, Pedro Fernando Fernández Espinosa, José Gregorio Rodríguez Nieto, Odette M. Mendez, Ricardo Hugo Arteaga-Bastidas

Abstract: Since their introduction, Brauer configuration algebras (BCAs) and their specialized messages have helped research in several fields of mathematics and sciences. This paper deals with a new perspective on using such algebras as a theoretical framework in classical cryptography and music theory. It is proved that some block cyphers define labeled Brauer configuration algebras. Particularly, the dimension of the BCA associated with a ciphertext-only attack of the Vigenere cryptosystem is given by the corresponding key's length and the captured ciphertext's coincidence index. On the other hand, historically, Bach's canons have been considered solved music puzzles. However, due to how Bach posed such canons, the question remains whether their solutions are only limited to musical issues. This paper gives alternative solutions based on the theory of Brauer configuration algebras to some of the puzzle canons proposed by Bach in his Musical Offering (BWV 1079) and the canon â 4 Voc: Perpetuus (BWV 1073). Specifically to the canon â 6 Voc (BWV 1076), canon 1 â2 (also known as the crab canon), and canon â4 Quaerendo Invenietis. These solutions are obtained by interpreting such canons as ciphertexts (via route and transposition cyphers) of some specialized Brauer messages. In particular, it is noted that the structure or form of the notes used in such canons can be described via the shape of the most used symbols in Bach's works.

Submitted 25 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

Comments: 50 pages

MSC Class: 00A65; 16G20; 16G30; 16G60

arXiv:2404.05414 [pdf, other]

Two Hands Are Better Than One: Resolving Hand to Hand Intersections via Occupancy Networks

Authors: Maksym Ivashechkin, Oscar Mendez, Richard Bowden

Abstract: 3D hand pose estimation from images has seen considerable interest from the literature, with new methods improving overall 3D accuracy. One current challenge is to address hand-to-hand interaction where self-occlusions and finger articulation pose a significant problem to estimation. Little work has applied physical constraints that minimize the hand intersections that occur as a result of noisy estimation. This work addresses the intersection of hands by exploiting an occupancy network that represents the hand's volume as a continuous manifold. This allows us to model the probability distribution of points being inside a hand. We designed an intersection loss function to minimize the likelihood of hand-to-point intersections. Moreover, we propose a new hand mesh parameterization that is superior to the commonly used MANO model in many respects including lower mesh complexity, underlying 3D skeleton extraction, watertightness, etc. On the benchmark InterHand2.6M dataset, the models trained using our intersection loss achieve better results than the state-of-the-art by significantly decreasing the number of hand intersections while lowering the mean per-joint positional error. Additionally, we demonstrate superior performance for 3D hand uplift on Re:InterHand and SMILE datasets and show reduced hand-to-hand intersections for complex domains such as sign-language pose estimation.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2312.15363 [pdf, other]

BEV-CV: Birds-Eye-View Transform for Cross-View Geo-Localisation

Authors: Tavis Shore, Simon Hadfield, Oscar Mendez

Abstract: Cross-view image matching for geo-localisation is a challenging problem due to the significant visual difference between aerial and ground-level viewpoints. The method provides localisation capabilities from geo-referenced images, eliminating the need for external devices or costly equipment. This enhances the capacity of agents to autonomously determine their position, navigate, and operate effectively in GNSS-denied environments. Current research employs a variety of techniques to reduce the domain gap such as applying polar transforms to aerial images or synthesising between perspectives. However, these approaches generally rely on having a 360° field of view, limiting real-world feasibility. We propose BEV-CV, an approach introducing two key novelties with a focus on improving the real-world viability of cross-view geo-localisation. Firstly, we bring ground-level images into a semantic Birds-Eye-View before matching embeddings, allowing for direct comparison with aerial image representations. Secondly, we adapt datasets into an application-realistic format: limited Field-of-View images aligned to vehicle direction. BEV-CV achieves state-of-the-art recall accuracies, improving Top-1 rates of 70° crops of CVUSA and CVACT by 23% and 24% respectively. It also decreases computational requirements by reducing floating point operations to below those of previous works and decreasing embedding dimensionality by 33%, together allowing for faster localisation.

Submitted 23 September, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

Comments: 8 pages, 6 figures

arXiv:2308.09525 [pdf, other]

Improving 3D Pose Estimation for Sign Language

Authors: Maksym Ivashechkin, Oscar Mendez, Richard Bowden

Abstract: This work addresses 3D human pose reconstruction in single images. We present a method that combines Forward Kinematics (FK) with neural networks to ensure a fast and valid prediction of 3D pose. Pose is represented as a hierarchical tree/graph with nodes corresponding to human joints that model their physical limits. Given a 2D detection of keypoints in the image, we lift the skeleton to 3D using neural networks to predict both the joint rotations and bone lengths. These predictions are then combined with skeletal constraints using an FK layer implemented as a network layer in PyTorch. The result is a fast and accurate approach to the estimation of 3D skeletal pose. Through quantitative and qualitative evaluation, we demonstrate the method is significantly more accurate than MediaPipe in terms of both per joint positional error and visual appearance. Furthermore, we demonstrate generalization over different datasets. The implementation in PyTorch runs at between 100 and 200 milliseconds per image (including CNN detection) using CPU only.

Submitted 18 August, 2023; originally announced August 2023.
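
The forward-kinematics idea in the abstract above (joint positions derived from per-joint rotations and bone lengths along a joint tree) can be sketched in a few lines. This is a simplified planar illustration in plain Python, not the paper's PyTorch layer: the function name, the 2D setting, the single angle per joint, and the toy skeleton are assumptions.

```python
import math

def forward_kinematics(parents, angles, lengths):
    """Planar FK over a joint tree (hypothetical simplification to 2D).

    parents[i] is the index of joint i's parent, or -1 for the root;
    parents must be listed before their children.
    angles[i] is the rotation of bone i relative to its parent bone (radians).
    lengths[i] is the length of the bone from parents[i] to joint i.
    Returns a list of (x, y) joint positions with the root at the origin.
    """
    positions = []
    global_angles = []
    for i, parent in enumerate(parents):
        if parent < 0:
            # The root joint carries only a global orientation.
            positions.append((0.0, 0.0))
            global_angles.append(angles[i])
            continue
        theta = global_angles[parent] + angles[i]  # accumulate rotation down the chain
        px, py = positions[parent]
        positions.append((px + lengths[i] * math.cos(theta),
                          py + lengths[i] * math.sin(theta)))
        global_angles.append(theta)
    return positions

# Toy three-joint chain: root -> joint 1 -> joint 2, each bone of length 1.
parents = [-1, 0, 1]
angles = [0.0, math.pi / 2, -math.pi / 2]
lengths = [0.0, 1.0, 1.0]
joints = forward_kinematics(parents, angles, lengths)
print(joints)
```

Because positions are built from lengths and angles rather than predicted directly, every bone in the output has exactly the requested length, which is the sense in which such a layer keeps predictions structurally valid.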

arXiv:2308.09523 [pdf, other]

Denoising Diffusion for 3D Hand Pose Estimation from Images

Authors: Maksym Ivashechkin, Oscar Mendez, Richard Bowden

Abstract: Hand pose estimation from a single image has many applications. However, approaches to full 3D body pose estimation are typically trained on day-to-day activities or actions. As such, detailed hand-to-hand interactions are poorly represented, especially during motion. We see this in the failure cases of techniques such as OpenPose or MediaPipe. However, accurate hand pose estimation is crucial for many applications where the global body motion is less important than the pose of the hands. This paper addresses the problem of 3D hand pose estimation from monocular images or sequences. We present a novel end-to-end framework for 3D hand regression that employs diffusion models that have shown excellent ability to capture the distribution of data for generative purposes. Moreover, we enforce kinematic constraints to ensure realistic poses are generated by incorporating an explicit forward kinematic layer as part of the network. The proposed model provides state-of-the-art performance when lifting a 2D single-hand image to 3D. However, when sequence data is available, we add a Transformer module over a temporal window of consecutive frames to refine the results, overcoming jittering and further increasing accuracy. The method is quantitatively and qualitatively evaluated showing state-of-the-art robustness, generalization, and accuracy on several different datasets.

Submitted 18 August, 2023; originally announced August 2023.

arXiv:2307.09065 [pdf, other]

Learning Adaptive Neighborhoods for Graph Neural Networks

Authors: Avishkar Saha, Oscar Mendez, Chris Russell, Richard Bowden

Abstract: Graph convolutional networks (GCNs) enable end-to-end learning on graph structured data. However, many works assume a given graph structure. When the input graph is noisy or unavailable, one approach is to construct or learn a latent graph structure. These methods typically fix the choice of node degree for the entire graph, which is suboptimal. Instead, we propose a novel end-to-end differentiable graph generator which builds graph topologies where each node selects both its neighborhood and its size. Our module can be readily integrated into existing pipelines involving graph convolution operations, replacing the predetermined or existing adjacency matrix with one that is learned, and optimized, as part of the general objective. As such it is applicable to any GCN. We integrate our module into trajectory prediction, point cloud classification and node classification pipelines resulting in improved accuracy over other structure-learning methods across a wide range of datasets and GCN backbones.

Submitted 18 July, 2023; originally announced July 2023.

Comments: ICCV 2023

arXiv:2306.02195 [pdf, other]

Subchromatic numbers of powers of graphs with excluded minors

Authors: Pedro P. Cortés, Pankaj Kumar, Benjamin Moore, Patrice Ossona de Mendez, Daniel A. Quiroz

Abstract: A $k$-subcolouring of a graph $G$ is a function $f:V(G) \to \{0,\ldots,k-1\}$ such that the set of vertices coloured $i$ induces a disjoint union of cliques. The subchromatic number, $\chi_{\textrm{sub}}(G)$, is the minimum $k$ such that $G$ admits a $k$-subcolouring. Nešetřil, Ossona de Mendez, Pilipczuk, and Zhu (2020) recently raised the problem of finding tight upper bounds for $\chi_{\textrm{sub}}(G^2)$ when $G$ is planar. We show that $\chi_{\textrm{sub}}(G^2)\le 43$ when $G$ is planar, improving their bound of 135. We give even better bounds when the planar graph $G$ has larger girth. Moreover, we show that $\chi_{\textrm{sub}}(G^{3})\le 95$, improving the previous bound of 364. For these we adapt some recent techniques of Almulhim and Kierstead (2022), while also extending the decompositions of triangulated planar graphs of Van den Heuvel, Ossona de Mendez, Quiroz, Rabinovich and Siebertz (2017), to planar graphs of arbitrary girth. Note that these decompositions are the precursors of the graph product structure theorem of planar graphs. We give improved bounds for $\chi_{\textrm{sub}}(G^p)$ for all $p$, whenever $G$ has bounded treewidth, bounded simple treewidth, bounded genus, or excludes a clique or biclique as a minor. For this we introduce a family of parameters which form a gradation between the strong and the weak colouring numbers. We give upper bounds for these parameters for graphs coming from such classes. Finally, we give a 2-approximation algorithm for the subchromatic number of graphs coming from any fixed class with bounded layered cliquewidth. In particular, this implies a 2-approximation algorithm for the subchromatic number of powers $G^p$ of graphs coming from any fixed class with bounded layered treewidth (such as the class of planar graphs). This algorithm works even if the power $p$ and the graph $G$ are unknown.

Submitted 29 January, 2024; v1 submitted 3 June, 2023; originally announced June 2023.

Comments: 21 pages, 2 figures, version 2 incorporates referee comments

MSC Class: 05C15; 05C10; 05C83
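
The definition of a $k$-subcolouring in the abstract above can be checked mechanically. The following is a minimal sketch, not taken from the paper: the function name `is_subcolouring` and the adjacency-dictionary representation are assumptions made for illustration. It relies on the standard fact that a graph is a disjoint union of cliques exactly when it contains no induced path on three vertices.

```python
from itertools import combinations


def is_subcolouring(adj, colour):
    """Return True if every colour class induces a disjoint union of cliques.

    adj    -- dict mapping each vertex to the set of its neighbours (hypothetical format)
    colour -- dict mapping each vertex to its colour
    """
    # Group the vertices by colour.
    classes = {}
    for v, c in colour.items():
        classes.setdefault(c, set()).add(v)

    for members in classes.values():
        for v in members:
            # Neighbours of v that share its colour.
            same = adj[v] & members
            # If two same-coloured neighbours of v are not adjacent,
            # they form an induced path u - v - w, so the class is
            # not a disjoint union of cliques.
            for u, w in combinations(same, 2):
                if w not in adj[u]:
                    return False
    return True


# A path on three vertices: 0 - 1 - 2.
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(is_subcolouring(path, {0: 0, 1: 0, 2: 0}))  # one colour is not enough
print(is_subcolouring(path, {0: 0, 1: 0, 2: 1}))  # two colours suffice
```

The check only validates a given colouring; it does not compute the subchromatic number or reproduce any of the bounds stated in the abstract.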

arXiv:2211.03704 [pdf, other]

Modulo-Counting First-Order Logic on Bounded Expansion Classes

Authors: J. Nesetril, P. Ossona de Mendez, S. Siebertz

Abstract: We prove that, on bounded expansion classes, every first-order formula with modulo counting is equivalent, in a linear-time computable monadic expansion, to an existential first-order formula. As a consequence, we derive, on bounded expansion classes, that first-order transductions with modulo counting have the same encoding power as existential first-order transductions. Also, modulo-counting first-order model checking and computation of the size of sets definable in modulo-counting first-order logic can be achieved in linear time on bounded expansion classes. As an application, we prove that a class has structurally bounded expansion if and only if it is a class of bounded depth vertex-minors of graphs in a bounded expansion class. We also show how our results can be used to implement fast matrix calculus on bounded expansion matrices over a finite field.

Submitted 23 March, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

Comments: submitted to CSGT2022 special issue

arXiv:2209.12023 [pdf, other]

Twin-width V: linear minors, modular counting, and matrix multiplication

Authors: Édouard Bonnet, Ugo Giocanti, Patrice Ossona de Mendez, Stéphan Thomassé

Abstract: We continue developing the theory around the twin-width of totally ordered binary structures, initiated in the previous paper of the series. We first introduce the notion of parity and linear minors of a matrix, which consists of iteratively replacing consecutive rows or consecutive columns with a linear combination of them. We show that a matrix class has bounded twin-width if and only if its linear-minor closure does not contain all matrices. We observe that the fixed-parameter tractable algorithm for first-order model checking on structures given with an $O(1)$-sequence (certificate of bounded twin-width) and the fact that first-order transductions of bounded twin-width classes have bounded twin-width, both established in Twin-width I, extend to first-order logic with modular counting quantifiers. We make explicit a win-win argument obtained as a by-product of Twin-width IV, and somewhat similar to bidimensionality, that we call rank-bidimensionality. Armed with the above-mentioned extension to modular counting, we show that the twin-width of the product of two conformal matrices $A, B$ over a finite field is bounded by a function of the twin-width of $A$, of $B$, and of the size of the field. Furthermore, if $A$ and $B$ are $n \times n$ matrices of twin-width $d$ over $\mathbb F_q$, we show that $AB$ can be computed in time $O_{d,q}(n^2 \log n)$. We finally present an ad hoc algorithm to efficiently multiply two matrices of bounded twin-width, with a single-exponential dependence in the twin-width bound: if two $n \times n$ matrices $A, B$ over $\mathbb F_2$ are given in a compact tree-like form, called a twin-decomposition (of width $d$), then a twin-decomposition of $AB$ with width $2^{d+o(d)}$ can be computed in time $4^{d+o(d)}n$ (resp. $4^{d+o(d)}n^{1+\varepsilon}$), and entries queried in doubly-logarithmic (resp. constant) time.

Submitted 24 September, 2022; originally announced September 2022.

Comments: 45 pages, 9 figures

MSC Class: 68W01 ACM Class: F.2.2

arXiv:2209.11229 [pdf, other]

Decomposition horizons and a characterization of stable hereditary classes of graphs

Authors: Samuel Braunfeld, Jaroslav Nešetřil, Patrice Ossona de Mendez, Sebastian Siebertz

Abstract: The notions of bounded-size and quasibounded-size decompositions with bounded treedepth base classes are central to the structural theory of graph sparsity introduced by two of the authors years ago, and provide a characterization of both classes with bounded expansion and nowhere dense classes. Strong connections of this theory with model theory led to considering first-order transductions, which are logically defined graph transformations, and to initiating a comparative study of combinatorial and model theoretical properties of graph classes, with an emphasis on the model theoretical notions of dependence (or NIP) and stability. In this paper, we first prove that every hereditary class with quasibounded-size decompositions with dependent (resp. stable) base classes is itself dependent (resp. stable). This result is obtained in a more general study of "decomposition horizons", which are class properties compatible with quasibounded-size decompositions. We deduce that hereditary classes with quasibounded-size decompositions with bounded shrubdepth base classes are stable. In the second part of the paper, we prove the converse. Thus, we characterize stable hereditary classes of graphs as those hereditary classes that admit quasibounded-size decompositions with bounded shrubdepth base classes. This result is obtained by proving that every hereditary stable class of graphs admits almost nowhere dense quasi-bush representations, thus answering positively a conjecture of Dreier et al. These results have several consequences. For example, we show that every graph $G$ in a stable, hereditary class of graphs $\mathscr C$ has a clique or a stable set of size $Ω_{\mathscr C,ε}(|G|^{1/2-ε})$, for every $ε>0$, which is tight in the sense that it cannot be improved to $Ω_{\mathscr C}(|G|^{1/2})$.

Submitted 22 December, 2024; v1 submitted 15 September, 2022; originally announced September 2022.

arXiv:2208.14412 [pdf, other]

doi 10.46298/lmcs-21(2:26)2025

On first-order transductions of classes of graphs

Authors: Samuel Braunfeld, Jaroslav Nešetřil, Patrice Ossona de Mendez, Sebastian Siebertz

Abstract: We study various aspects of the first-order transduction quasi-order on graph classes, which provides a way of measuring the relative complexity of graph classes based on whether one can encode the other using a formula of first-order (FO) logic. In contrast with the conjectured simplicity of the transduction quasi-order for monadic second-order logic, the FO-transduction quasi-order is very complex, and many standard properties from structural graph theory and model theory naturally appear in it. We prove a local normal form for transductions among other general results and constructions, which we illustrate via several examples and via the characterizations of the transductions of some simple classes. We then turn to various aspects of the quasi-order, including the (non-)existence of minimum and maximum classes for certain properties, the strictness of the pathwidth hierarchy, the fact that the quasi-order is not a lattice, and the role of weakly sparse classes in the quasi-order.

Submitted 19 June, 2025; v1 submitted 30 August, 2022; originally announced August 2022.

Journal ref: Logical Methods in Computer Science, Volume 21, Issue 2 (June 23, 2025) lmcs:9981

arXiv:2207.02669 [pdf, other]

Distributed domination on sparse graph classes

Authors: Ozan Heydt, Simeon Kublenz, Patrice Ossona de Mendez, Sebastian Siebertz, Alexandre Vigny

Abstract: We show that the dominating set problem admits a constant factor approximation in a constant number of rounds in the LOCAL model of distributed computing on graph classes with bounded expansion. This generalizes a result of Czygrinow et al. for graphs with excluded topological minors to very general classes of uniformly sparse graphs. We demonstrate how our general algorithm can be modified and fine-tuned to compute an $(11+ε)$-approximation (for any $ε>0$) of a minimum dominating set on planar graphs. This improves on the previously best known approximation factor of 52 on planar graphs, which was achieved by an elegant and simple algorithm of Lenzen et al.

Submitted 6 July, 2022; originally announced July 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2111.14506, arXiv:2012.02701

arXiv:2206.12946 [pdf, other]

AFT-VO: Asynchronous Fusion Transformers for Multi-View Visual Odometry Estimation

Authors: Nimet Kaygusuz, Oscar Mendez, Richard Bowden

Abstract: Motion estimation approaches typically employ sensor fusion techniques, such as the Kalman Filter, to handle individual sensor failures. More recently, deep learning-based fusion approaches have been proposed, increasing the performance and requiring less model-specific implementations. However, current deep fusion approaches often assume that sensors are synchronised, which is not always practical, especially for low-cost hardware. To address this limitation, in this work, we propose AFT-VO, a novel transformer-based sensor fusion architecture to estimate VO from multiple sensors. Our framework combines predictions from asynchronous multi-view cameras and accounts for the time discrepancies of measurements coming from different sources. Our approach first employs a Mixture Density Network (MDN) to estimate the probability distributions of the 6-DoF poses for every camera in the system. Then a novel transformer-based fusion module, AFT-VO, is introduced, which combines these asynchronous pose estimations, along with their confidences. More specifically, we introduce Discretiser and Source Encoding techniques which enable the fusion of multi-source asynchronous signals. We evaluate our approach on the popular nuScenes and KITTI datasets. Our experiments demonstrate that multi-view fusion for VO estimation provides robust and accurate trajectories, outperforming the state of the art in both challenging weather and lighting conditions.

Submitted 16 September, 2022; v1 submitted 26 June, 2022; originally announced June 2022.

arXiv:2205.07716 [pdf, other]

Generalizing to New Tasks via One-Shot Compositional Subgoals

Authors: Xihan Bian, Oscar Mendez, Simon Hadfield

Abstract: The ability to generalize to previously unseen tasks with little to no supervision is a key challenge in modern machine learning research. It is also a cornerstone of a future "General AI". Any artificially intelligent agent deployed in a real-world application must adapt on the fly to unknown environments. Researchers often rely on reinforcement and imitation learning to provide online adaptation to new tasks, through trial and error learning. However, this can be challenging for complex tasks which require many timesteps or large numbers of subtasks to complete. These "long horizon" tasks suffer from sample inefficiency and can require extremely long training times before the agent can learn to perform the necessary long-term planning. In this work, we introduce CASE, which attempts to address these issues by training an Imitation Learning agent using adaptive "near future" subgoals. These subgoals are recalculated at each step using compositional arithmetic in a learned latent representation space. In addition to improving learning efficiency for standard long-term tasks, this approach also makes it possible to perform one-shot generalization to previously unseen tasks, given only a single reference trajectory for the task in a different environment. Our experiments show that the proposed approach consistently outperforms the previous state-of-the-art compositional Imitation Learning approach by 30%.

Submitted 25 July, 2022; v1 submitted 16 May, 2022; originally announced May 2022.

Comments: Presented at ICRA 2022 "Compositional Robotics: Mathematics and Tools"

arXiv:2205.03130 [pdf, other]

SKILL-IL: Disentangling Skill and Knowledge in Multitask Imitation Learning

Authors: Bian Xihan, Oscar Mendez, Simon Hadfield

Abstract: In this work, we introduce a new perspective for learning transferable content in multi-task imitation learning. Humans are able to transfer skills and knowledge. If we can cycle to work and drive to the store, we can also cycle to the store and drive to work. We take inspiration from this and hypothesize the latent memory of a policy network can be disentangled into two partitions. These contain either the knowledge of the environmental context for the task or the generalizable skill needed to solve the task. This allows improved training efficiency and better generalization over previously unseen combinations of skills in the same environment, and the same task in unseen environments. We used the proposed approach to train a disentangled agent for two different multi-task IL environments. In both cases we outperformed the SOTA by 30% in task success rate. We also demonstrated this for navigation on a real robot.

Submitted 26 July, 2022; v1 submitted 6 May, 2022; originally announced May 2022.

Comments: Submitted to IROS 2022, under review

arXiv:2204.02944 [pdf, other]

"The Pedestrian next to the Lamppost" Adaptive Object Graphs for Better Instantaneous Mapping

Authors: Avishkar Saha, Oscar Mendez, Chris Russell, Richard Bowden

Abstract: Estimating a semantically segmented bird's-eye-view (BEV) map from a single image has become a popular technique for autonomous control and navigation. However, existing methods show an increase in localization error with distance from the camera. While such an increase in error is entirely expected - localization is harder at distance - much of the drop in performance can be attributed to the cues used by current texture-based models. In particular, they make heavy use of object-ground intersections (such as shadows), which become increasingly sparse and uncertain for distant objects. In this work, we address these shortcomings in BEV-mapping by learning the spatial relationship between objects in a scene. We propose a graph neural network which predicts BEV objects from a monocular image by spatially reasoning about an object within the context of other objects. Our approach sets a new state-of-the-art in BEV estimation from monocular images across three large-scale datasets, including a 50% relative improvement for objects on nuScenes.

Submitted 6 April, 2022; originally announced April 2022.

Comments: Accepted to CVPR 2022

arXiv:2203.16900 [pdf, other]

Transducing paths in graph classes with unbounded shrubdepth

Authors: Michał Pilipczuk, Patrice Ossona de Mendez, Sebastian Siebertz

Abstract: Transductions are a general formalism for expressing transformations of graphs (and more generally, of relational structures) in logic. We prove that a graph class $\mathscr{C}$ can be $\mathsf{FO}$-transduced from a class of bounded-height trees (that is, has bounded shrubdepth) if, and only if, from $\mathscr{C}$ one cannot $\mathsf{FO}$-transduce the class of all paths. This establishes one of the three remaining open questions posed by Blumensath and Courcelle about the $\mathsf{MSO}$-transduction quasi-order, even in the stronger form that concerns $\mathsf{FO}$-transductions instead of $\mathsf{MSO}$-transductions. The backbone of our proof is a graph-theoretic statement that says the following: If a graph $G$ excludes a path, the bipartite complement of a path, and a half-graph as semi-induced subgraphs, then the vertex set of $G$ can be partitioned into a bounded number of parts so that every part induces a cograph of bounded height, and every pair of parts semi-induces a bi-cograph of bounded height. This statement may be of independent interest; for instance, it implies that the graphs in question form a class that is linearly $χ$-bounded.

Submitted 31 March, 2022; originally announced March 2022.

arXiv:2112.12818 [pdf, other]

doi 10.1109/ITSC48978.2021.9565079

Multi-Camera Sensor Fusion for Visual Odometry using Deep Uncertainty Estimation

Authors: Nimet Kaygusuz, Oscar Mendez, Richard Bowden

Abstract: Visual Odometry (VO) estimation is an important source of information for vehicle state estimation and autonomous driving. Recently, deep learning based approaches have begun to appear in the literature. However, in the context of driving, single sensor based approaches are often prone to failure because of degraded image quality due to environmental factors, camera placement, etc. To address this issue, we propose a deep sensor fusion framework which estimates vehicle motion using both pose and uncertainty estimations from multiple on-board cameras. We extract spatio-temporal feature representations from a set of consecutive images using a hybrid CNN-RNN model. We then utilise a Mixture Density Network (MDN) to estimate the 6-DoF pose as a mixture of distributions and a fusion module to estimate the final pose using MDN outputs from multiple cameras. We evaluate our approach on the publicly available, large scale autonomous vehicle dataset, nuScenes. The results show that the proposed fusion approach surpasses the state-of-the-art, and provides robust estimates and accurate trajectories compared to individual camera-based estimations.

Submitted 23 December, 2021; originally announced December 2021.

Journal ref: 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), 2021, pp. 2944-2949

arXiv:2112.12812 [pdf, other]

doi 10.1109/IROS51168.2021.9636827

MDN-VO: Estimating Visual Odometry with Confidence

Authors: Nimet Kaygusuz, Oscar Mendez, Richard Bowden

Abstract: Visual Odometry (VO) is used in many applications including robotics and autonomous systems. However, traditional approaches based on feature matching are computationally expensive and do not directly address failure cases, instead relying on heuristic methods to detect failure. In this work, we propose a deep learning-based VO model to efficiently estimate 6-DoF poses, as well as a confidence model for these estimates. We utilise a CNN-RNN hybrid model to learn feature representations from image sequences. We then employ a Mixture Density Network (MDN) which estimates camera motion as a mixture of Gaussians, based on the extracted spatio-temporal representations. Our model uses pose labels as a source of supervision, but derives uncertainties in an unsupervised manner. We evaluate the proposed model on the KITTI and nuScenes datasets and report extensive quantitative and qualitative results to analyse the performance of both pose and uncertainty estimation. Our experiments show that the proposed model exceeds state-of-the-art performance in addition to detecting failure cases using the predicted pose uncertainty.

Submitted 23 December, 2021; originally announced December 2021.

Journal ref: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021, pp. 3528-3533

arXiv:2107.11857 [pdf, other]

Improving Robot Localisation by Ignoring Visual Distraction

Authors: Oscar Mendez, Matthew Vowels, Richard Bowden

Abstract: Attention is an important component of modern deep learning. However, less emphasis has been put on its inverse: ignoring distraction. Our daily lives require us to explicitly avoid giving attention to salient visual features that confound the task we are trying to accomplish. This visual prioritisation allows us to concentrate on important tasks while ignoring visual distractors. In this work, we introduce Neural Blindness, which gives an agent the ability to completely ignore objects or classes that are deemed distractors. More explicitly, we aim to render a neural network completely incapable of representing specific chosen classes in its latent space. In a very real sense, this makes the network "blind" to certain classes, allowing an agent to focus on what is important for a given task. We demonstrate how this can be used to improve localisation.

Submitted 25 July, 2021; originally announced July 2021.

Comments: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2106.01434 [pdf, other]

Robot in a China Shop: Using Reinforcement Learning for Location-Specific Navigation Behaviour

Authors: Xihan Bian, Oscar Mendez, Simon Hadfield

Abstract: Robots need to be able to work in multiple different environments. Even when performing similar tasks, different behaviour should be deployed to best fit the current environment. In this paper, we propose a new approach to navigation, where it is treated as a multi-task learning problem. This enables the robot to learn to behave differently in visual navigation tasks for different environments while also learning shared expertise across environments. We evaluated our approach both in simulated environments and on real-world data. Our method allows our system to converge with a 26% reduction in training time, while also increasing accuracy.

Submitted 2 June, 2021; originally announced June 2021.

Comments: Published at ICRA 2021

arXiv:2106.00371 [pdf, other]

Markov Localisation using Heatmap Regression and Deep Convolutional Odometry

Authors: Oscar Mendez, Simon Hadfield, Richard Bowden

Abstract: In the context of self-driving vehicles there is strong competition between approaches based on visual localisation and LiDAR. While LiDAR provides important depth information, it is sparse in resolution and expensive. On the other hand, cameras are low-cost and recent developments in deep learning mean they can provide high localisation performance. However, several fundamental problems remain, particularly in the domain of uncertainty, where learning-based approaches can be notoriously over-confident. Markov, or grid-based, localisation was an early solution to the localisation problem but fell out of favour due to its computational complexity. Representing the likelihood field as a grid (or volume) means there is a trade-off between accuracy and memory size. Furthermore, it is necessary to perform expensive convolutions across the entire likelihood volume. Despite the benefit of simultaneously maintaining a likelihood for all possible locations, grid-based approaches were superseded by more efficient particle filters and Monte Carlo Localisation (MCL). However, MCL introduces its own problems, e.g. particle deprivation. Recent advances in deep learning hardware allow large likelihood volumes to be stored directly on the GPU, along with the hardware necessary to efficiently perform GPU-bound 3D convolutions, and this obviates many of the disadvantages of grid-based methods. In this work, we present a novel CNN-based localisation approach that can leverage modern deep learning hardware. By implementing a grid-based Markov localisation approach directly on the GPU, we create a hybrid CNN that can perform image-based localisation and odometry-based likelihood propagation within a single neural network. The resulting approach is capable of outperforming direct pose regression methods as well as state-of-the-art localisation systems.

Submitted 1 June, 2021; originally announced June 2021.

Comments: IEEE International Conference on Robotics and Automation (ICRA) 2021

arXiv:2105.03693 [pdf, other]

Discrepancy and Sparsity

Authors: Mario Grobler, Yiting Jiang, Patrice Ossona de Mendez, Sebastian Siebertz, Alexandre Vigny

Abstract: We study the connections between the notions of combinatorial discrepancy and graph degeneracy. In particular, we prove that the maximum discrepancy over all subgraphs $H$ of a graph $G$ of the neighborhood set system of $H$ is sandwiched between $Ω(\log\mathrm{deg}(G))$ and $\mathcal{O}(\mathrm{deg}(G))$, where $\mathrm{deg}(G)$ denotes the degeneracy of $G$. We extend this result to inequalities relating weak coloring numbers and discrepancy of graph powers and deduce a new characterization of bounded expansion classes. Then, we switch to a model theoretical point of view, introduce pointer structures, and study their relations to graph classes with bounded expansion. We deduce that a monotone class of graphs has bounded expansion if and only if all the set systems definable in this class have bounded hereditary discrepancy. Using known bounds on the VC-density of set systems definable in nowhere dense classes we also give a characterization of nowhere dense classes in terms of discrepancy. As consequences of our results, we obtain a corollary on the discrepancy of neighborhood set systems of edge-colored graphs, a polynomial-time algorithm to compute $\varepsilon$-approximations of size $\mathcal{O}(1/\varepsilon)$ for set systems definable in bounded expansion classes, an application to clique coloring, and even the non-existence of a quantifier elimination scheme for nowhere dense classes.

Submitted 29 November, 2021; v1 submitted 8 May, 2021; originally announced May 2021.

Comments: Submitted version
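
In the abstract above, $\mathrm{deg}(G)$ denotes the degeneracy of $G$, not a vertex degree. As an illustration only, the sketch below computes degeneracy by the standard greedy procedure of repeatedly deleting a vertex of minimum degree; the function name `degeneracy` and the adjacency-dictionary input are assumptions, and this simple quadratic-time version is not code from the paper.

```python
def degeneracy(adj):
    """Return the degeneracy of an undirected graph.

    adj -- dict mapping each vertex to the set of its neighbours (hypothetical format)

    The degeneracy is the largest minimum degree seen while repeatedly
    removing a vertex of minimum degree from the remaining graph.
    """
    # Work on a copy so the caller's graph is left untouched.
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    best = 0
    while remaining:
        # Pick a vertex of minimum degree in the remaining graph.
        v = min(remaining, key=lambda x: len(remaining[x]))
        best = max(best, len(remaining[v]))
        # Delete v and its incident edges.
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
    return best


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(degeneracy(triangle), degeneracy(star))
```

The star shows the difference from maximum degree: its centre has degree 3, yet every subgraph has a vertex of degree at most 1, so its degeneracy is 1.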

arXiv:2103.10768 [pdf, other]

doi 10.1109/ICRA48506.2021.9561621

There and Back Again: Self-supervised Multispectral Correspondence Estimation

Authors: Celyn Walters, Oscar Mendez, Mark Johnson, Richard Bowden

Abstract: Across a wide range of applications, from autonomous vehicles to medical imaging, multi-spectral images provide an opportunity to extract additional information not present in color images. One of the most important steps in making this information readily available is the accurate estimation of dense correspondences between different spectra. Due to the nature of cross-spectral images, most correspondence solving techniques for the visual domain are simply not applicable. Furthermore, most cross-spectral techniques utilize spectra-specific characteristics to perform the alignment. In this work, we aim to address the dense correspondence estimation problem in a way that generalizes to more than one spectrum. We do this by introducing a novel cycle-consistency metric that allows us to self-supervise. This, combined with our spectra-agnostic loss functions, allows us to train the same network across multiple spectra. We demonstrate our approach on the challenging task of dense RGB-FIR correspondence estimation. We also show the performance of our unmodified network on the cases of RGB-NIR and RGB-RGB, where we achieve higher accuracy than similar self-supervised approaches. Our work shows that cross-spectral correspondence estimation can be solved in a common framework that learns to generalize alignment across spectra.

Submitted 26 May, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

Comments: To be published in IEEE International Conference on Robotics and Automation (ICRA) 2021

arXiv:2103.09641 [pdf, other]

doi 10.1109/IROS40897.2019.8968244

A Robust Extrinsic Calibration Framework for Vehicles with Unscaled Sensors

Authors: Celyn Walters, Oscar Mendez, Simon Hadfield, Richard Bowden

Abstract: Accurate extrinsic sensor calibration is essential for both autonomous vehicles and robots. Traditionally this is an involved process requiring calibration targets, known fiducial markers and is generally performed in a lab. Moreover, even a small change in the sensor layout requires recalibration. With the anticipated arrival of consumer autonomous vehicles, there is demand for a system which can do this automatically, after deployment and without specialist human expertise. To solve these limitations, we propose a flexible framework which can estimate extrinsic parameters without an explicit calibration stage, even for sensors with unknown scale. Our first contribution builds upon standard hand-eye calibration by jointly recovering scale. Our second contribution is that our system is made robust to imperfect and degenerate sensor data, by collecting independent sets of poses and automatically selecting those which are most ideal. We show that our approach's robustness is essential for the target scenario. Unlike previous approaches, ours runs in real time and constantly estimates the extrinsic transform. For both an ideal experimental setup and a real use case, comparison against these approaches shows that we outperform the state-of-the-art. Furthermore, we demonstrate that the recovered scale may be applied to the full trajectory, circumventing the need for scale estimation via sensor fusion.

Submitted 17 March, 2021; originally announced March 2021.

Journal ref: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 2019, pp. 36-42

arXiv:2102.06880 [pdf, other]

doi 10.46298/lmcs-20(3:4)2024

Twin-width and permutations

Authors: Édouard Bonnet, Jaroslav Nešetřil, Patrice Ossona de Mendez, Sebastian Siebertz, Stéphan Thomassé

Abstract: Inspired by a width invariant on permutations defined by Guillemot and Marx, Bonnet, Kim, Thomassé, and Watrigant introduced the twin-width of graphs, a parameter describing their structural complexity. This invariant has been further extended to binary structures, in several (basically equivalent) ways. We prove that a class of binary relational structures (that is: edge-colored partially directed graphs) has bounded twin-width if and only if it is a first-order transduction of a proper permutation class. As a by-product, we show that every class with bounded twin-width contains at most $2^{O(n)}$ pairwise non-isomorphic $n$-vertex graphs.

Submitted 4 July, 2024; v1 submitted 13 February, 2021; originally announced February 2021.

Journal ref: Logical Methods in Computer Science, Volume 20, Issue 3 (July 8, 2024) lmcs:11112

arXiv:2102.03117 [pdf, other]

Twin-width IV: ordered graphs and matrices

Authors: Édouard Bonnet, Ugo Giocanti, Patrice Ossona de Mendez, Pierre Simon, Stéphan Thomassé, Szymon Toruńczyk

Abstract: We establish a list of characterizations of bounded twin-width for hereditary, totally ordered binary structures. This has several consequences. First, it allows us to show that a (hereditary) class of matrices over a finite alphabet either contains at least $n!$ matrices of size $n \times n$, or at most $c^n$ for some constant $c$. This generalizes the celebrated Stanley-Wilf conjecture/Marcus-Tardos theorem from permutation classes to any matrix class over a finite alphabet, answers our small conjecture [SODA '21] in the case of ordered graphs, and with more work, settles a question first asked by Balogh, Bollobás, and Morris [Eur. J. Comb. '06] on the growth of hereditary classes of ordered graphs. Second, it gives a fixed-parameter approximation algorithm for twin-width on ordered graphs. Third, it yields a full classification of fixed-parameter tractable first-order model checking on hereditary classes of ordered binary structures. Fourth, it provides a model-theoretic characterization of classes with bounded twin-width.

Submitted 5 July, 2021; v1 submitted 5 February, 2021; originally announced February 2021.

Comments: 53 pages, 18 figures

MSC Class: 05A05; 05A16; 05C30 ACM Class: F.2.2

arXiv:2010.02607 [pdf, other]

Structural properties of the first-order transduction quasiorder

Authors: Jaroslav Nešetřil, Patrice Ossona de Mendez, Sebastian Siebertz

Abstract: Logical transductions provide a very useful tool to encode classes of structures inside other classes of structures. In this paper we study first-order (FO) transductions and the quasiorder they induce on infinite classes of finite graphs. Surprisingly, this quasiorder is very complex, though shaped by the locality properties of first-order logic. This contrasts with the conjectured simplicity of the monadic second order (MSO) transduction quasiorder. We first establish a local normal form for FO transductions, which is of independent interest. Then we prove that the quotient partial order is a bounded distributive join-semilattice, and that the subposet of additive classes is also a bounded distributive join-semilattice. The FO transduction quasiorder has a great expressive power, and many well studied class properties can be defined using it. We apply these structural properties to prove, among other results, that FO transductions of the class of paths are exactly perturbations of classes with bounded bandwidth, that the local variants of monadic stability and monadic dependence are equivalent to their (standard) non-local versions, and that the classes with pathwidth at most $k$, for $k\geq 1$, form a strict hierarchy in the FO transduction quasiorder.

Submitted 13 July, 2021; v1 submitted 6 October, 2020; originally announced October 2020.

arXiv:2007.07857 [pdf, other]

Rankwidth meets stability

Authors: Jaroslav Nešetřil, Patrice Ossona de Mendez, Michał Pilipczuk, Roman Rabinovich, Sebastian Siebertz

Abstract: We study two notions of being well-structured for classes of graphs that are inspired by classic model theory. A class of graphs $C$ is monadically stable if it is impossible to define arbitrarily long linear orders in vertex-colored graphs from $C$ using a fixed first-order formula. Similarly, monadic dependence corresponds to the impossibility of defining all graphs in this way. Examples of monadically stable graph classes are nowhere dense classes, which provide a robust theory of sparsity. Examples of monadically dependent classes are classes of bounded rankwidth (or equivalently, bounded cliquewidth), which can be seen as a dense analog of classes of bounded treewidth. Thus, monadic stability and monadic dependence extend classical structural notions for graphs by viewing them in a wider, model-theoretical context. We explore this emerging theory by proving the following: - A class of graphs $C$ is a first-order transduction of a class with bounded treewidth if and only if $C$ has bounded rankwidth and a stable edge relation (i.e. graphs from $C$ exclude some half-graph as a semi-induced subgraph). - If a class of graphs $C$ is monadically dependent and not monadically stable, then $C$ has in fact an unstable edge relation. As a consequence, we show that classes with bounded rankwidth excluding some half-graph as a semi-induced subgraph are linearly $χ$-bounded. Our proofs are effective and lead to polynomial time algorithms.

Submitted 15 July, 2020; originally announced July 2020.

arXiv:2003.11692 [pdf, other]

Regular partitions of gentle graphs

Authors: Yiting Jiang, Jaroslav Nešetřil, Patrice Ossona de Mendez, Sebastian Siebertz

Abstract: Szemerédi's Regularity Lemma is a very useful tool of extremal combinatorics. Recently, several refinements of this seminal result were obtained for special, more structured classes of graphs. We survey these results in their rich combinatorial context. In particular, we stress the link to the theory of (structural) sparsity, which leads to alternative proofs, refinements and solutions of open problems. It is interesting to note that many of these classes present challenging problems. Nevertheless, from the point of view of regularity lemma type statements, they appear as "gentle" classes.

Submitted 29 March, 2020; v1 submitted 25 March, 2020; originally announced March 2020.

arXiv:2003.03605 [pdf, other]

Clustering powers of sparse graphs

Authors: Jaroslav Nešetřil, Patrice Ossona de Mendez, Michał Pilipczuk, Xuding Zhu

Abstract: We prove that if $G$ is a sparse graph (that is, it belongs to a fixed class of bounded expansion $\mathcal{C}$) and $d\in \mathbb{N}$ is fixed, then the $d$th power of $G$ can be partitioned into cliques so that contracting each of these cliques to a single vertex again yields a sparse graph. This result has several graph-theoretic and algorithmic consequences for powers of sparse graphs, including bounds on their subchromatic number and efficient approximation algorithms for the chromatic number and the clique number.

Submitted 7 March, 2020; originally announced March 2020.

Comments: 14 pages

arXiv:1911.07748 [pdf, other]

Linear rankwidth meets stability

Authors: Jaroslav Nešetřil, Patrice Ossona de Mendez, Roman Rabinovich, Sebastian Siebertz

Abstract: Classes with bounded rankwidth are MSO-transductions of trees and classes with bounded linear rankwidth are MSO-transductions of paths. These results show a strong link between the properties of these graph classes considered from the point of view of structural graph theory and from the point of view of finite model theory. We take both views on classes with bounded linear rankwidth and prove structural and model theoretic properties of these classes: 1) Graphs with linear rankwidth at most $r$ are linearly $χ$-bounded. Actually, they have bounded $c$-chromatic number, meaning that they can be colored with $f(r)$ colors, each color inducing a cograph. 2) Based on a Ramsey-like argument, we prove for every proper hereditary family $\mathcal F$ of graphs (like cographs) that there is a class with bounded rankwidth that does not have the property that graphs in it can be colored by a bounded number of colors, each inducing a subgraph in $\mathcal F$. 3) For a class $\mathcal C$ with bounded linear rankwidth the following conditions are equivalent: a) $\mathcal C$ is stable, b) $\mathcal C$ excludes some half-graph as a semi-induced subgraph, c) $\mathcal C$ is a first-order transduction of a class with bounded pathwidth. These results open the perspective to study classes admitting low linear rankwidth covers.

Submitted 15 November, 2019; originally announced November 2019.

Comments: accepted at SODA 2020 conference. arXiv admin note: text overlap with arXiv:1909.01564

arXiv:1909.01564 [pdf, other]

Classes of graphs with low complexity: the case of classes with bounded linear rankwidth

Authors: Jaroslav Nešetřil, Patrice Ossona de Mendez, Roman Rabinovich, Sebastian Siebertz

Abstract: Classes with bounded rankwidth are MSO-transductions of trees and classes with bounded linear rankwidth are MSO-transductions of paths, a result that shows a strong link between the properties of these graph classes considered from the point of view of structural graph theory and from the point of view of finite model theory. We take both views on classes with bounded linear rankwidth and prove structural and model theoretic properties of these classes. The structural results we obtain are the following. 1) The number of unlabeled graphs of order $n$ with linear rank-width at most $r$ is at most $\bigl[(r/2)!\,2^{\binom{r}{2}}3^{r+2}\bigr]^n$. 2) Graphs with linear rankwidth at most $r$ are linearly $χ$-bounded. Actually, they have bounded $c$-chromatic number, meaning that they can be colored with $f(r)$ colors, each color inducing a cograph. 3) To the contrary, based on a Ramsey-like argument, we prove for every proper hereditary family $F$ of graphs (like cographs) that there is a class with bounded rankwidth that does not have the property that graphs in it can be colored by a bounded number of colors, each inducing a subgraph in $F$. From the model theoretical side we obtain the following results: 1) A direct short proof that graphs with linear rankwidth at most $r$ are first-order transductions of linear orders. This result could also be derived from Colcombet's theorem on first-order transduction of linear orders and the equivalence of linear rankwidth with linear cliquewidth. 2) For a class $C$ with bounded linear rankwidth the following conditions are equivalent: a) $C$ is stable, b) $C$ excludes some half-graph as a semi-induced subgraph, c) $C$ is a first-order transduction of a class with bounded pathwidth. These results open the perspective to study classes admitting low linear rankwidth covers.

Submitted 4 September, 2019; originally announced September 2019.

arXiv:1907.09296 [pdf, other]

A-Phase classification using convolutional neural networks

Authors: Edgar R. Arce-Santana, Alfonso Alba, Martin O. Mendez, Valdemar Arce-Guevara

Abstract: A series of short events, called A-phases, can be observed in the human electroencephalogram during NREM sleep. These events can be classified in three groups (A1, A2 and A3) according to their spectral contents, and are thought to play a role in the transitions between the different sleep stages. A-phase detection and classification is usually performed manually by a trained expert, but it is a tedious and time-consuming task. In the past two decades, various researchers have designed algorithms to automatically detect and classify the A-phases with varying degrees of success, but the problem remains open. In this paper, a different approach is proposed: instead of attempting to design a general classifier for all subjects, we propose to train ad-hoc classifiers for each subject using as little data as possible, in order to drastically reduce the amount of time required from the expert. The proposed classifiers are based on deep convolutional neural networks using the log-spectrogram of the EEG signal as input data. Results are encouraging, achieving average accuracies of 80.31% when discriminating between A-phases and non A-phases, and 71.87% when classifying among A-phase sub-types, with only 25% of the total A-phases used for training. When additional expert-validated data is considered, the sub-type classification accuracy increases to 78.92%. These results show that a semi-automatic annotation system with assistance from an expert could provide a better alternative to fully automatic classifiers.

Submitted 22 July, 2019; originally announced July 2019.

Comments: 19 pages, 5 figures, 4 tables

arXiv:1812.08003 [pdf, other]

doi 10.1145/3360011

Model-Checking on Ordered Structures

Authors: Kord Eickmeyer, Jan van den Heuvel, Ken-ichi Kawarabayashi, Stephan Kreutzer, Patrice Ossona de Mendez, Michał Pilipczuk, Daniel A. Quiroz, Roman Rabinovich, Sebastian Siebertz

Abstract: We study the model-checking problem for first- and monadic second-order logic on finite relational structures. The problem of verifying whether a formula of these logics is true on a given structure is considered intractable in general, but it does become tractable on interesting classes of structures, such as on classes whose Gaifman graphs have bounded treewidth. In this paper we continue this line of research and study model-checking for first- and monadic second-order logic in the presence of an ordering on the input structure. We do so in two settings: the general ordered case, where the input structures are equipped with a fixed order or successor relation, and the order invariant case, where the formulas may resort to an ordering, but their truth must be independent of the particular choice of order. In the first setting we show very strong intractability results for most interesting classes of structures. In contrast, in the order invariant case we obtain tractability results for order-invariant monadic second-order formulas on the same classes of graphs as in the unordered case. For first-order logic, we obtain tractability of successor-invariant formulas on classes whose Gaifman graphs have bounded expansion. Furthermore, we show that model-checking for order-invariant first-order formulas is tractable on coloured posets of bounded width.

Submitted 18 December, 2018; originally announced December 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1701.08516

arXiv:1811.07583 [pdf, other]

doi 10.1007/978-3-030-11021-5_44

Localisation via Deep Imagination: learn the features not the map

Authors: Jaime Spencer, Oscar Mendez, Richard Bowden, Simon Hadfield

Abstract: How many times does a human have to drive through the same area to become familiar with it? To begin with, we might first build a mental model of our surroundings. Upon revisiting this area, we can use this model to extrapolate to new unseen locations and imagine their appearance. Based on this, we propose an approach where an agent is capable of modelling new environments after a single visitation. To this end, we introduce "Deep Imagination", a combination of classical Visual-based Monte Carlo Localisation and deep learning. By making use of a feature embedded 3D map, the system can "imagine" the view from any novel location. These "imagined" views are contrasted with the current observation in order to estimate the agent's current location. In order to build the embedded map, we train a deep Siamese Fully Convolutional U-Net to perform dense feature extraction. By training these features to be generic, no additional training or fine tuning is required to adapt to new environments. Our results demonstrate the generality and transfer capability of our learnt dense features by training and evaluating on multiple datasets. Additionally, we include several visualizations of the feature representations and resulting 3D maps, as well as their application to localisation.

Submitted 19 November, 2018; originally announced November 2018.

Comments: VNAD @ ECCV2018

arXiv:1810.02389 [pdf, other]

First-order interpretations of bounded expansion classes

Authors: Jakub Gajarský, Stephan Kreutzer, Jaroslav Nešetřil, Patrice Ossona de Mendez, Michał Pilipczuk, Sebastian Siebertz, Szymon Toruńczyk

Abstract: The notion of bounded expansion captures uniform sparsity of graph classes and renders various algorithmic problems that are hard in general tractable. In particular, the model-checking problem for first-order logic is fixed-parameter tractable over such graph classes. With the aim of generalizing such results to dense graphs, we introduce classes of graphs with structurally bounded expansion, defined as first-order interpretations of classes of bounded expansion. As a first step towards their algorithmic treatment, we provide their characterization analogous to the characterization of classes of bounded expansion via low treedepth decompositions, replacing treedepth by its dense analogue called shrubdepth.

Submitted 4 October, 2018; originally announced October 2018.

arXiv:1709.01500 [pdf, other]

SeDAR - Semantic Detection and Ranging: Humans can localise without LiDAR, can robots?

Authors: Oscar Mendez, Simon Hadfield, Nicolas Pugeault, Richard Bowden

Abstract: How does a person work out their location using a floorplan? It is probably safe to say that we do not explicitly measure depths to every visible surface and try to match them against different pose estimates in the floorplan. And yet, this is exactly how most robotic scan-matching algorithms operate. Similarly, we do not extrude the 2D geometry present in the floorplan into 3D and try to align it to the real world. And yet, this is how most vision-based approaches localise. Humans do the exact opposite. Instead of depth, we use high level semantic cues. Instead of extruding the floorplan up into the third dimension, we collapse the 3D world into a 2D representation. Evidence of this is that many of the floorplans we use in everyday life are not accurate, opting instead for high levels of discriminative landmarks. In this work, we use this insight to present a global localisation approach that relies solely on the semantic labels present in the floorplan and extracted from RGB images. While our approach is able to use range measurements if available, we demonstrate that they are unnecessary as we can achieve results comparable to state-of-the-art without them.

Submitted 2 May, 2018; v1 submitted 5 September, 2017; originally announced September 2017.

arXiv:1708.05424 [pdf, other]

Nowhere Dense Graph Classes and Dimension

Authors: Gwenaël Joret, Piotr Micek, Patrice Ossona de Mendez, Veit Wiechert

Abstract: Nowhere dense graph classes provide one of the least restrictive notions of sparsity for graphs. Several equivalent characterizations of nowhere dense classes have been obtained over the years, using a wide range of combinatorial objects. In this paper we establish a new characterization of nowhere dense classes, in terms of poset dimension: A monotone graph class is nowhere dense if and only if for every $h \geq 1$ and every $ε > 0$, posets of height at most $h$ with $n$ elements and whose cover graphs are in the class have dimension $\mathcal{O}(n^ε)$.

Submitted 31 January, 2019; v1 submitted 17 August, 2017; originally announced August 2017.

Comments: v4: Minor changes suggested by a referee

arXiv:1707.01701 [pdf, ps, other]

Algorithmic Properties of Sparse Digraphs

Authors: Stephan Kreutzer, Patrice Ossona de Mendez, Roman Rabinovich, Sebastian Siebertz

Abstract: The notions of bounded expansion and nowhere denseness have been applied very successfully in algorithmic graph theory. We study the corresponding notions of directed bounded expansion and nowhere crownfulness on directed graphs. We show that many of the algorithmic tools that were developed for undirected bounded expansion classes can, with some care, also be applied in their directed counterparts, and thereby we highlight a rich algorithmic structure theory of directed bounded expansion classes. More specifically, we show that the directed Steiner tree problem is fixed-parameter tractable on any class of directed bounded expansion parameterized by the number $k$ of non-terminals plus the maximal diameter $s$ of a strongly connected component in the subgraph induced by the terminals. Our result strongly generalizes a result of Jones et al., who proved that the problem is fixed parameter tractable on digraphs of bounded degeneracy if the set of terminals is required to be acyclic. We furthermore prove that for every integer $r\geq 1$, the distance-$r$ dominating set problem can be approximated up to a factor $O(\log k)$ and the connected distance-$r$ dominating set problem can be approximated up to a factor $O(k\cdot \log k)$ on any class of directed bounded expansion, where $k$ denotes the size of an optimal solution. If furthermore, the class is nowhere crownful, we are able to compute a polynomial kernel for distance-$r$ dominating sets. Polynomial kernels for this problem were not known to exist on any other existing digraph measure for sparse classes.

Submitted 7 July, 2017; v1 submitted 6 July, 2017; originally announced July 2017.

Showing 1–50 of 53 results for author: Mendez, O