-
Existence and Uniqueness for Double-Phase Poisson Equations with Variable Growth
Authors:
Mohamed Khamsi,
Osvaldo Mendez
Abstract:
We study a class of nonlinear elliptic problems driven by a double-phase operator with variable exponents, arising in the modeling of heterogeneous materials undergoing phase transitions. The associated Poisson problem features a combination of two distinct growth conditions, modulated by a measurable weight function \( μ\), leading to spatially varying ellipticity. Working within the framework of modular function spaces, we establish the uniform convexity of the modular associated with the gradient term. This structural property enables a purely variational treatment of the problem. As a consequence, we prove existence and uniqueness of weak solutions under natural and minimal assumptions on the variable exponents and the weight.
Submitted 7 July, 2025;
originally announced July 2025.
-
VICI: VLM-Instructed Cross-view Image-localisation
Authors:
Xiaohan Zhang,
Tavis Shore,
Chen Chen,
Oscar Mendez,
Simon Hadfield,
Safwan Wshah
Abstract:
In this paper, we present a high-performing solution to the UAVM 2025 Challenge, which focuses on matching narrow FOV street-level images to corresponding satellite imagery using the University-1652 dataset. As panoramic Cross-View Geo-Localisation nears peak performance, it becomes increasingly important to explore more practical problem formulations. Real-world scenarios rarely offer panoramic street-level queries; instead, queries typically consist of limited-FOV images captured with unknown camera parameters. Our work prioritises discovering the highest achievable performance under these constraints, pushing the limits of existing architectures. Our method begins by retrieving candidate satellite image embeddings for a given query, followed by a re-ranking stage that selectively enhances retrieval accuracy within the top candidates. This two-stage approach enables more precise matching, even under the significant viewpoint and scale variations inherent in the task. Through experimentation, we demonstrate that our approach achieves competitive results, specifically attaining R@1 and R@10 retrieval rates of \topone\% and \topten\% respectively. This underscores the potential of optimised retrieval and re-ranking strategies in advancing practical geo-localisation performance. Code is available at https://github.com/tavisshore/VICI.
Submitted 5 July, 2025;
originally announced July 2025.
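The two-stage idea in the VICI abstract above (retrieve a shortlist by embedding similarity, then re-rank only that shortlist) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy embeddings, and the `fine_scores` table standing in for a re-ranking model are all hypothetical, and cosine similarity is an assumed choice of retrieval metric.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query, gallery, k):
    """Stage 1: indices of the k gallery embeddings most similar to the query."""
    order = sorted(range(len(gallery)),
                   key=lambda i: cosine(query, gallery[i]),
                   reverse=True)
    return order[:k]

def rerank(candidates, score_fn):
    """Stage 2: reorder only the shortlisted candidates using a finer scorer."""
    return sorted(candidates, key=score_fn, reverse=True)

def recall_at_k(ranked_lists, ground_truth, k):
    """Fraction of queries whose true match appears in the top k results."""
    hits = sum(1 for ranked, gt in zip(ranked_lists, ground_truth)
               if gt in ranked[:k])
    return hits / len(ground_truth)

# Toy data (hypothetical): three satellite embeddings, one street-level query.
gallery = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
query = [1.0, 0.0]
true_match = 1

shortlist = retrieve(query, gallery, k=2)
fine_scores = {0: 0.2, 1: 0.9}  # stand-in for a re-ranking model's scores
reranked = rerank(shortlist, lambda i: fine_scores[i])

print("R@1 before re-ranking:", recall_at_k([shortlist], [true_match], 1))
print("R@1 after re-ranking: ", recall_at_k([reranked], [true_match], 1))
```

Re-ranking touches only the shortlist, so a costlier scorer can be afforded there without scanning the whole gallery.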
-
HuGeDiff: 3D Human Generation via Diffusion with Gaussian Splatting
Authors:
Maksym Ivashechkin,
Oscar Mendez,
Richard Bowden
Abstract:
3D human generation is an important problem with a wide range of applications in computer vision and graphics. Despite recent progress in generative AI such as diffusion models or rendering methods like Neural Radiance Fields or Gaussian Splatting, controlling the generation of accurate 3D humans from text prompts remains an open challenge. Current methods struggle with fine detail, accurate rendering of hands and faces, human realism, and controllability over appearance. The lack of diversity, realism, and annotation in human image data also remains a challenge, hindering the development of a foundational 3D human model. We present a weakly supervised pipeline that tries to address these challenges. In the first step, we generate a photorealistic human image dataset with controllable attributes such as appearance, race, gender, etc., using a state-of-the-art image diffusion model. Next, we propose an efficient mapping approach from image features to 3D point clouds using a transformer-based architecture. Finally, we close the loop by training a point-cloud diffusion model that is conditioned on the same text prompts used to generate the original samples. We demonstrate orders-of-magnitude speed-ups in 3D human generation compared to the state-of-the-art approaches, along with significantly improved text-prompt alignment, realism, and rendering quality. We will make the code and dataset available.
Submitted 4 June, 2025;
originally announced June 2025.
-
SignSplat: Rendering Sign Language via Gaussian Splatting
Authors:
Maksym Ivashechkin,
Oscar Mendez,
Richard Bowden
Abstract:
State-of-the-art approaches for conditional human body rendering via Gaussian splatting typically focus on simple body motions captured from many views. This is often in the context of dancing or walking. However, for more complex use cases, such as sign language, we care less about large body motion and more about subtle and complex motions of the hands and face. The problems of building high fidelity models are compounded by the complexity of capturing multi-view data of sign. The solution is to make better use of sequence data, ensuring that we can overcome the limited information from only a few views by exploiting temporal variability. Nevertheless, learning from sequence-level data requires extremely accurate and consistent model fitting to ensure that appearance is consistent across complex motions. We focus on how to achieve this, constraining mesh parameters to build an accurate Gaussian splatting framework from few views capable of modelling subtle human motion. We leverage regularization techniques on the Gaussian parameters to mitigate overfitting and rendering artifacts. Additionally, we propose a new adaptive control method to densify Gaussians and prune splat points on the mesh surface. To demonstrate the accuracy of our approach, we render novel sequences of sign language video, building on neural machine translation approaches to sign stitching. On benchmark datasets, our approach achieves state-of-the-art performance; and on highly articulated and complex sign language motion, we significantly outperform competing approaches.
Submitted 4 May, 2025;
originally announced May 2025.
-
HandOcc: NeRF-based Hand Rendering with Occupancy Networks
Authors:
Maksym Ivashechkin,
Oscar Mendez,
Richard Bowden
Abstract:
We propose HandOcc, a novel framework for hand rendering based upon occupancy. Popular rendering methods such as NeRF are often combined with parametric meshes to provide deformable hand models. However, in doing so, such approaches present a trade-off between the fidelity of the mesh and the complexity and dimensionality of the parametric model. The simplicity of parametric mesh structures is appealing, but the underlying issue is that it binds methods to mesh initialization, making them unable to generalize to objects where a parametric model does not exist. It also means that estimation is tied to mesh resolution and the accuracy of mesh fitting. This paper presents a pipeline for meshless 3D rendering, which we apply to the hands. By providing only a 3D skeleton, the desired appearance is extracted via a convolutional model. We do this by exploiting a NeRF renderer conditioned upon an occupancy-based representation. The approach uses the hand occupancy to resolve hand-to-hand interactions, further improving results and allowing fast rendering and excellent hand appearance transfer. On the benchmark InterHand2.6M dataset, we achieved state-of-the-art results.
Submitted 4 May, 2025;
originally announced May 2025.
-
Decomposing graphs into stable and ordered parts
Authors:
Hector Buffière,
Patrice Ossona de Mendez
Abstract:
Connections between structural graph theory and finite model theory have recently gained a lot of attention. In this setting, many interesting questions remain on the properties of hereditary dependent (NIP) classes of graphs, in particular related to first-order transductions.
Motivated by Simon's decomposition theorem of dependent types into a stable part and a distal (order-like) part, we conjecture that every hereditary dependent class of graphs is transduction-equivalent to a hereditary dependent class of partially ordered graphs, where the cover graph of the partial order has bounded treewidth and the unordered graph is (edge) stable.
In this paper, we consider the first non-trivial case (classes with bounded linear cliquewidth) and prove that the conjecture holds in a strong form, where the cover graph of the partial order has bounded pathwidth. Then, we extend our study to classes that admit bounded-size bounded linear cliquewidth covers, and prove that the conjecture holds for these classes, too.
Submitted 1 May, 2025;
originally announced May 2025.
-
Modular topologies on vector spaces
Authors:
Mohamed Khamsi,
Jan Lang,
Osvaldo Mendez
Abstract:
This paper addresses the topological structures induced on vector spaces by convex modulars that do not satisfy the $Δ_2$ condition, with particular focus on their applications to variable exponent spaces such as \( \ell^{(p_n)} \) and \( L^{p(\cdot)} \). The motivation behind this investigation is its applicability to the study of boundary value problems involving the variable exponent $p(x)$-Laplacian when $p(x)$ is unbounded, a line of research recently opened by the authors. Fundamental topological properties are analyzed, including separation axioms, countability axioms, and the relationship between modular convergence and classical topological concepts such as continuity. Attention is given to the relation between modular and norm topologies. Special emphasis is placed on the openness of modular balls, the impact of the \(Δ_2\)-condition, and duality with respect to modular topologies.
Submitted 21 April, 2025;
originally announced April 2025.
-
Path degeneracy and applications
Authors:
Y. Lin,
P. Ossona de Mendez
Abstract:
In this work, we relate girth and path-degeneracy in classes with sub-exponential expansion, with explicit bounds for classes with polynomial expansion and proper minor-closed classes that are tight up to a constant factor (and tight up to second order terms if a classical conjecture on existence of $g$-cages is verified). As an application, we derive bounds on the generalized acyclic indices, on the generalized arboricities, and on the weak coloring numbers of high-girth graphs in such classes. Along the way, we prove a conjecture proposed in [T.~Bartnicki et al., Generalized arboricity of graphs with large girth, Discrete Mathematics 342 (2019), no.~5, 1343--1350.], which asserts that, for every integer $k$, there is an integer $g(p,k)$ such that every $K_k$ minor-free graph with girth at least $g(p,k)$ has $p$-arboricity at most $p+1$.
Submitted 24 March, 2025;
originally announced March 2025.
-
HyperGS: Hyperspectral 3D Gaussian Splatting
Authors:
Christopher Thirgood,
Oscar Mendez,
Erin Chao Ling,
Jon Storey,
Simon Hadfield
Abstract:
We introduce HyperGS, a novel framework for Hyperspectral Novel View Synthesis (HNVS), based on a new latent 3D Gaussian Splatting (3DGS) technique. Our approach enables simultaneous spatial and spectral renderings by encoding material properties from multi-view 3D hyperspectral datasets. HyperGS reconstructs high-fidelity views from arbitrary perspectives with improved accuracy and speed, outperforming existing methods. To address the challenges of high-dimensional data, we perform view synthesis in a learned latent space, incorporating a pixel-wise adaptive density function and a pruning technique for increased training stability and efficiency. Additionally, we introduce the first HNVS benchmark, implementing a number of new baselines based on recent SOTA RGB-NVS techniques, alongside the small number of prior works on HNVS. We demonstrate HyperGS's robustness through extensive evaluation of real and simulated hyperspectral scenes, with a 14 dB accuracy improvement upon previously published models.
Submitted 17 December, 2024;
originally announced December 2024.
-
The Radiance of Neural Fields: Democratizing Photorealistic and Dynamic Robotic Simulation
Authors:
Georgina Nuthall,
Richard Bowden,
Oscar Mendez
Abstract:
As robots increasingly coexist with humans, they must navigate complex, dynamic environments rich in visual information and implicit social dynamics, like when to yield or move through crowds. Addressing these challenges requires significant advances in vision-based sensing and a deeper understanding of socio-dynamic factors, particularly in tasks like navigation. To facilitate this, robotics researchers need advanced simulation platforms offering dynamic, photorealistic environments with realistic actors. Unfortunately, most existing simulators fall short, prioritizing geometric accuracy over visual fidelity, and employing unrealistic agents with fixed trajectories and low-quality visuals. To overcome these limitations, we developed a simulator that incorporates three essential elements: (1) photorealistic neural rendering of environments, (2) neurally animated human entities with behavior management, and (3) an ego-centric robotic agent providing multi-sensor output. By utilizing advanced neural rendering techniques in a dual-NeRF simulator, our system produces high-fidelity, photorealistic renderings of both environments and human entities. Additionally, it integrates a state-of-the-art Social Force Model to model dynamic human-human and human-robot interactions, creating the first photorealistic and accessible human-robot simulation system powered by neural rendering.
Submitted 25 November, 2024;
originally announced November 2024.
-
PEnG: Pose-Enhanced Geo-Localisation
Authors:
Tavis Shore,
Oscar Mendez,
Simon Hadfield
Abstract:
Cross-view Geo-localisation is typically performed at a coarse granularity, because densely sampled satellite image patches overlap heavily. This heavy overlap would make disambiguating patches very challenging. However, by opting for sparsely sampled patches, prior work has placed an artificial upper bound on the localisation accuracy that is possible. Even a perfect oracle system cannot achieve accuracy greater than the average separation of the tiles. To overcome this limitation, we propose combining cross-view geo-localisation and relative pose estimation to increase precision to a level practical for real-world application. We develop PEnG, a 2-stage system which first predicts the most likely edges from a city-scale graph representation upon which a query image lies. It then performs relative pose estimation within these edges to determine a precise position. PEnG presents the first technique to utilise both viewpoints available within cross-view geo-localisation datasets to enhance precision to a sub-metre level, with some examples achieving centimetre level accuracy. Our proposed ensemble achieves state-of-the-art precision, with relative Top-5m retrieval improvements on previous works of 213%, and decreases the median Euclidean distance error by 96.90%, from the previous best of 734 m down to 22.77 m, when evaluating with 90 degree horizontal FOV images. Code will be made available: tavisshore.co.uk/PEnG
Submitted 24 November, 2024;
originally announced November 2024.
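As a quick arithmetic check on the PEnG abstract above, the quoted 96.90% figure is consistent with the two median errors it reports (734 m and 22.77 m) when read as a relative reduction. The helper name below is hypothetical and the snippet only reproduces that calculation from the numbers stated in the abstract.

```python
def relative_reduction(previous, current):
    """Percentage by which `current` is smaller than `previous`."""
    return (previous - current) / previous * 100.0

# Median Euclidean distance errors quoted in the abstract, in metres.
previous_best_m = 734.0
peng_m = 22.77

print(f"{relative_reduction(previous_best_m, peng_m):.2f}% reduction")
```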
-
SpaGBOL: Spatial-Graph-Based Orientated Localisation
Authors:
Tavis Shore,
Oscar Mendez,
Simon Hadfield
Abstract:
Cross-View Geo-Localisation within urban regions is challenging in part due to the lack of geo-spatial structuring within current datasets and techniques. We propose utilising graph representations to model sequences of local observations and the connectivity of the target location. Modelling as a graph enables generating previously unseen sequences by sampling with new parameter configurations. To leverage this newly available information, we propose a GNN-based architecture, producing spatially strong embeddings and improving discriminability over isolated image embeddings. We outline SpaGBOL, introducing three novel contributions. 1) The first graph-structured dataset for Cross-View Geo-Localisation, containing multiple streetview images per node to improve generalisation. 2) Introducing GNNs to the problem, we develop the first system that exploits the correlation between node proximity and feature similarity. 3) Leveraging the unique properties of the graph representation, we demonstrate a novel retrieval filtering approach based on neighbourhood bearings. SpaGBOL achieves state-of-the-art accuracies on the unseen test graph, with relative Top-1 retrieval improvements on previous techniques of 11%, and 50% when filtering with Bearing Vector Matching on the SpaGBOL dataset.
Submitted 3 December, 2024; v1 submitted 23 September, 2024;
originally announced September 2024.
-
Colouring negative exact-distance graphs of signed graphs
Authors:
Reza Naserasr,
Patrice Ossona de Mendez,
Daniel A. Quiroz,
Robert Šámal,
Weiqiang Yu
Abstract:
The $k$-th exact-distance graph of a graph $G$ has $V(G)$ as its vertex set, and $xy$ as an edge if and only if the distance between $x$ and $y$ is (exactly) $k$ in $G$. We consider two possible extensions of this notion for signed graphs. Finding the chromatic number of a negative exact-distance square of a signed graph is a weakening of the problem of finding the smallest target graph to which the signed graph has a sign-preserving homomorphism. We study the chromatic number of negative exact-distance graphs of signed graphs that are planar, and also the relation of these chromatic numbers with the generalised colouring numbers of the underlying graphs. Our results are related to a theorem of Alon and Marshall about homomorphisms of signed graphs.
Submitted 15 June, 2024;
originally announced June 2024.
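The definition of the $k$-th exact-distance graph given in the abstract above can be made concrete with a small sketch for ordinary (unsigned) graphs. It covers only that base definition, not the signed extensions the paper studies. The adjacency-dictionary representation and the function names are assumptions made for illustration.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def exact_distance_graph(adj, k):
    """Edge set {x, y} over V(G) with dist_G(x, y) exactly k (k >= 1)."""
    edges = set()
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if d == k:
                edges.add(frozenset((u, v)))
    return edges

# Example: the 5-cycle; vertices at distance exactly 2 form another 5-cycle.
cycle5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
for edge in sorted(tuple(sorted(e)) for e in exact_distance_graph(cycle5, 2)):
    print(edge)
```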
-
Shallow vertex minors, stability, and dependence
Authors:
H. Buffière,
E. Kim,
P. Ossona de Mendez
Abstract:
Stability and dependence are model-theoretic notions that have recently proved highly effective in the study of structural and algorithmic properties of hereditary graph classes, and are considered key notions for generalizing to hereditary graph classes the theory of sparsity developed for monotone graph classes (where an essential notion is that of nowhere dense class). The theory of sparsity was initially built on the notion of shallow minors and on the idea of excluding different sets of minors, depending on the depth at which these minors can appear.
In this paper, we follow a similar path, where shallow vertex minors replace shallow minors. In this setting, we provide a neat characterization of stable / dependent hereditary classes of graphs: A hereditary class of graphs $\mathscr C$ is
(1) dependent if and only if it does not contain all permutation graphs and, for each integer $r$, it excludes some split interval graph as a depth-$r$ vertex minor;
(2) stable if and only if, for each integer $r$, it excludes some half-graph as a depth-$r$ vertex minor.
A key ingredient in proving these results is the preservation of stability and dependence of a class when taking bounded depth shallow vertex minors. We extend this preservation result to binary structures and get, as a direct consequence, that bounded depth shallow vertex minors of graphs with bounded twin-width have bounded twin-width.
Submitted 1 May, 2024;
originally announced May 2024.
-
Interactions Between Brauer Configuration Algebras and Classical Cryptanalysis to Analyze Bach's Canons
Authors:
Agustín Moreno Cañadas,
Pedro Fernando Fernández Espinosa,
José Gregorio Rodríguez Nieto,
Odette M. Mendez,
Ricardo Hugo Arteaga-Bastidas
Abstract:
Since their introduction, Brauer configuration algebras (BCAs) and their specialized messages have helped research in several fields of mathematics and sciences. This paper deals with a new perspective on using such algebras as a theoretical framework in classical cryptography and music theory. It is proved that some block cyphers define labeled Brauer configuration algebras. Particularly, the dimension of the BCA associated with a ciphertext-only attack of the Vigenère cryptosystem is given by the corresponding key's length and the captured ciphertext's coincidence index. On the other hand, historically, Bach's canons have been considered solved music puzzles. However, due to how Bach posed such canons, the question remains whether their solutions are only limited to musical issues. This paper gives alternative solutions based on the theory of Brauer configuration algebras to some of the puzzle canons proposed by Bach in his Musical Offering (BWV 1079) and the canon à 4 Voc: Perpetuus (BWV 1073), specifically to the canon à 6 Voc (BWV 1076), canon 1 à 2 (also known as the crab canon), and canon à 4 Quaerendo Invenietis. These solutions are obtained by interpreting such canons as ciphertexts (via route and transposition cyphers) of some specialized Brauer messages. In particular, it is noted that the structure or form of the notes used in such canons can be described via the shape of the most used symbols in Bach's works.
Submitted 25 April, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
Two Hands Are Better Than One: Resolving Hand to Hand Intersections via Occupancy Networks
Authors:
Maksym Ivashechkin,
Oscar Mendez,
Richard Bowden
Abstract:
3D hand pose estimation from images has seen considerable interest from the literature, with new methods improving overall 3D accuracy. One current challenge is to address hand-to-hand interaction where self-occlusions and finger articulation pose a significant problem to estimation. Little work has applied physical constraints that minimize the hand intersections that occur as a result of noisy estimation. This work addresses the intersection of hands by exploiting an occupancy network that represents the hand's volume as a continuous manifold. This allows us to model the probability distribution of points being inside a hand. We designed an intersection loss function to minimize the likelihood of hand-to-point intersections. Moreover, we propose a new hand mesh parameterization that is superior to the commonly used MANO model in many respects including lower mesh complexity, underlying 3D skeleton extraction, watertightness, etc. On the benchmark InterHand2.6M dataset, the models trained using our intersection loss achieve better results than the state-of-the-art by significantly decreasing the number of hand intersections while lowering the mean per-joint positional error. Additionally, we demonstrate superior performance for 3D hand uplift on Re:InterHand and SMILE datasets and show reduced hand-to-hand intersections for complex domains such as sign-language pose estimation.
Submitted 8 April, 2024;
originally announced April 2024.
-
The Dirichlet problem for the $p(x)$-Laplacian with unbounded exponent $p(x)$
Authors:
M. Khamsi,
J. Lang,
O. Mendez,
A. Nekvinda
Abstract:
We prove the solvability of the Dirichlet problem for the variable exponent $p$-Laplacian with boundary data in $W^{1,p(x)}(Ω)$ on a bounded, smooth domain $Ω\subset {\mathbb R}^n$. Our main focus will be on an a.e. finite variable exponent $p(\cdot)$ with $n < \inf\limits_{x\in Ω}p(x)$ and $\sup\limits_{x\in Ω}p(x) = \infty$ under the sole assumption that $p\in C(Ω)$.
Submitted 24 May, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
Symbolic powers: Simis and weighted monomial ideals
Authors:
Fernando O. Méndez,
Maria Vaz Pinto,
Rafael H. Villarreal
Abstract:
The aim of this work is to compare symbolic and ordinary powers of monomial ideals using commutative algebra and combinatorics. Monomial ideals whose symbolic and ordinary powers coincide are called Simis ideals. Weighted monomial ideals are defined by assigning linear weights to monomials. We examine Simis and normally torsion-free ideals, relate some of the properties of monomial ideals and weighted monomial ideals, and present a structure theorem for edge ideals of $d$-uniform clutters whose ideal of covers is Simis in degree $d$. One of our main results is a combinatorial classification of when the dual of the edge ideal of a weighted oriented graph is Simis in degree $2$.
Submitted 30 August, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
BEV-CV: Birds-Eye-View Transform for Cross-View Geo-Localisation
Authors:
Tavis Shore,
Simon Hadfield,
Oscar Mendez
Abstract:
Cross-view image matching for geo-localisation is a challenging problem due to the significant visual difference between aerial and ground-level viewpoints. The method provides localisation capabilities from geo-referenced images, eliminating the need for external devices or costly equipment. This enhances the capacity of agents to autonomously determine their position, navigate, and operate effectively in GNSS-denied environments. Current research employs a variety of techniques to reduce the domain gap such as applying polar transforms to aerial images or synthesising between perspectives. However, these approaches generally rely on having a 360° field of view, limiting real-world feasibility. We propose BEV-CV, an approach introducing two key novelties with a focus on improving the real-world viability of cross-view geo-localisation. Firstly, we bring ground-level images into a semantic Birds-Eye-View before matching embeddings, allowing for direct comparison with aerial image representations. Secondly, we adapt datasets into an application-realistic format: limited Field-of-View images aligned to vehicle direction. BEV-CV achieves state-of-the-art recall accuracies, improving Top-1 rates of 70° crops of CVUSA and CVACT by 23% and 24% respectively. It also decreases computational requirements by reducing floating point operations to below previous works, and decreases embedding dimensionality by 33%, together allowing for faster localisation capabilities.
Submitted 23 September, 2024; v1 submitted 23 December, 2023;
originally announced December 2023.
-
Modular uniform convexity structures and applications to boundary value problems with non-standard growth
Authors:
M. A. Khamsi,
Osvaldo Mendez
Abstract:
We establish the existence and uniqueness of the solution to the Dirichlet problem for the variable exponent $p$-Laplacian on a bounded, smooth domain $Ω\subset {\mathbb R}^n$, where the boundary datum belongs to $W^{1,p}(Ω)$. Our analysis considers a continuous and bounded exponent $p$ satisfying $1<\inf\limits_{x\in Ω}p(x)$ and $\sup\limits_{x\in Ω}p(x)<\infty$, and is based on the uniform convexity of the Dirichlet integral, which is highly nontrivial and, in the variable exponent case, is not related to the uniform convexity of the Sobolev norm.
Submitted 24 October, 2023;
originally announced October 2023.
-
Improving 3D Pose Estimation for Sign Language
Authors:
Maksym Ivashechkin,
Oscar Mendez,
Richard Bowden
Abstract:
This work addresses 3D human pose reconstruction from single images. We present a method that combines Forward Kinematics (FK) with neural networks to ensure a fast and valid prediction of 3D pose. Pose is represented as a hierarchical tree/graph with nodes corresponding to human joints that model their physical limits. Given a 2D detection of keypoints in the image, we lift the skeleton to 3D using neural networks to predict both the joint rotations and bone lengths. These predictions are then combined with skeletal constraints using an FK layer implemented as a network layer in PyTorch. The result is a fast and accurate approach to the estimation of 3D skeletal pose. Through quantitative and qualitative evaluation, we demonstrate the method is significantly more accurate than MediaPipe in terms of both per-joint positional error and visual appearance. Furthermore, we demonstrate generalization over different datasets. The implementation in PyTorch runs at 100-200 milliseconds per image (including CNN detection) using CPU only.
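As a rough illustration of the forward-kinematics idea described in this abstract, the following Python sketch composes per-joint rotations and bone lengths along a kinematic tree to obtain joint positions. It is a 2D toy under stated assumptions, not the authors' PyTorch layer; the names `forward_kinematics`, `parents`, `angles`, and `bone_lengths` are hypothetical.

```python
import math


def forward_kinematics(parents, angles, bone_lengths, root=(0.0, 0.0)):
    """Toy 2D forward kinematics over a joint tree (illustrative only).

    parents[i]:      index of joint i's parent, or -1 for the root.
                     Assumption: parents are listed before their children.
    angles[i]:       rotation of bone i relative to its parent's bone (radians).
    bone_lengths[i]: length of the bone from parents[i] to joint i.
    """
    n = len(parents)
    positions = [None] * n
    world_angle = [0.0] * n
    for i in range(n):
        p = parents[i]
        if p < 0:
            # The root joint has no bone; it only fixes the origin and heading.
            positions[i] = root
            world_angle[i] = angles[i]
            continue
        # Accumulate rotations down the tree, then step along the bone.
        world_angle[i] = world_angle[p] + angles[i]
        px, py = positions[p]
        positions[i] = (
            px + bone_lengths[i] * math.cos(world_angle[i]),
            py + bone_lengths[i] * math.sin(world_angle[i]),
        )
    return positions


# A three-joint chain: the first bone points up, the second points right.
chain = forward_kinematics(
    parents=[-1, 0, 1],
    angles=[0.0, math.pi / 2, -math.pi / 2],
    bone_lengths=[0.0, 1.0, 1.0],
)
```

Because positions are derived from rotations and fixed bone lengths, every output skeleton has the prescribed bone lengths by construction, which is the sense in which an FK layer keeps predictions valid.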
Submitted 18 August, 2023;
originally announced August 2023.
-
Denoising Diffusion for 3D Hand Pose Estimation from Images
Authors:
Maksym Ivashechkin,
Oscar Mendez,
Richard Bowden
Abstract:
Hand pose estimation from a single image has many applications. However, approaches to full 3D body pose estimation are typically trained on day-to-day activities or actions. As such, detailed hand-to-hand interactions are poorly represented, especially during motion. We see this in the failure cases of techniques such as OpenPose or MediaPipe. However, accurate hand pose estimation is crucial for many applications where the global body motion is less important than accurate hand pose estimation.
This paper addresses the problem of 3D hand pose estimation from monocular images or sequences. We present a novel end-to-end framework for 3D hand regression that employs diffusion models that have shown excellent ability to capture the distribution of data for generative purposes. Moreover, we enforce kinematic constraints to ensure realistic poses are generated by incorporating an explicit forward kinematic layer as part of the network. The proposed model provides state-of-the-art performance when lifting a 2D single-hand image to 3D. However, when sequence data is available, we add a Transformer module over a temporal window of consecutive frames to refine the results, overcoming jittering and further increasing accuracy.
The method is quantitatively and qualitatively evaluated, showing state-of-the-art robustness, generalization, and accuracy on several different datasets.
Submitted 18 August, 2023;
originally announced August 2023.
-
Learning Adaptive Neighborhoods for Graph Neural Networks
Authors:
Avishkar Saha,
Oscar Mendez,
Chris Russell,
Richard Bowden
Abstract:
Graph convolutional networks (GCNs) enable end-to-end learning on graph structured data. However, many works assume a given graph structure. When the input graph is noisy or unavailable, one approach is to construct or learn a latent graph structure. These methods typically fix the choice of node degree for the entire graph, which is suboptimal. Instead, we propose a novel end-to-end differentiable graph generator which builds graph topologies where each node selects both its neighborhood and its size. Our module can be readily integrated into existing pipelines involving graph convolution operations, replacing the predetermined or existing adjacency matrix with one that is learned, and optimized, as part of the general objective. As such it is applicable to any GCN. We integrate our module into trajectory prediction, point cloud classification and node classification pipelines resulting in improved accuracy over other structure-learning methods across a wide range of datasets and GCN backbones.
Submitted 18 July, 2023;
originally announced July 2023.
-
Subchromatic numbers of powers of graphs with excluded minors
Authors:
Pedro P. Cortés,
Pankaj Kumar,
Benjamin Moore,
Patrice Ossona de Mendez,
Daniel A. Quiroz
Abstract:
A $k$-subcolouring of a graph $G$ is a function $f:V(G) \to \{0,\ldots,k-1\}$ such that, for each $i$, the set of vertices coloured $i$ induces a disjoint union of cliques. The subchromatic number, $χ_{\textrm{sub}}(G)$, is the minimum $k$ such that $G$ admits a $k$-subcolouring. Nešetřil, Ossona de Mendez, Pilipczuk, and Zhu (2020) recently raised the problem of finding tight upper bounds for $χ_{\textrm{sub}}(G^2)$ when $G$ is planar. We show that $χ_{\textrm{sub}}(G^2)\le 43$ when $G$ is planar, improving their bound of 135. We give even better bounds when the planar graph $G$ has larger girth. Moreover, we show that $χ_{\textrm{sub}}(G^{3})\le 95$, improving the previous bound of 364. For these we adapt some recent techniques of Almulhim and Kierstead (2022), while also extending the decompositions of triangulated planar graphs of Van den Heuvel, Ossona de Mendez, Quiroz, Rabinovich and Siebertz (2017) to planar graphs of arbitrary girth. Note that these decompositions are the precursors of the graph product structure theorem of planar graphs.
We give improved bounds for $χ_{\textrm{sub}}(G^p)$ for all $p$, whenever $G$ has bounded treewidth, bounded simple treewidth, bounded genus, or excludes a clique or biclique as a minor. For this we introduce a family of parameters which form a gradation between the strong and the weak colouring numbers. We give upper bounds for these parameters for graphs coming from such classes.
Finally, we give a 2-approximation algorithm for the subchromatic number of graphs coming from any fixed class with bounded layered cliquewidth. In particular, this implies a 2-approximation algorithm for the subchromatic number of powers $G^p$ of graphs coming from any fixed class with bounded layered treewidth (such as the class of planar graphs). This algorithm works even if the power $p$ and the graph $G$ are unknown.
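To make the definition of a subcolouring concrete, here is a small Python sketch that checks whether a given colouring is a subcolouring. It relies on the standard fact that a graph is a disjoint union of cliques exactly when it contains no induced path on three vertices. The function name `is_subcolouring` and the adjacency-dictionary input format are assumptions made for illustration; this is a verifier only, not the approximation algorithm from the paper.

```python
from itertools import combinations


def is_subcolouring(adj, colour):
    """Return True if every colour class induces a disjoint union of cliques.

    adj:    dict mapping each vertex to the set of its neighbours
            (undirected simple graph, so adj must be symmetric).
    colour: dict mapping each vertex to its colour.
    """
    for v in adj:
        # Neighbours of v that share v's colour.
        same = [u for u in adj[v] if colour[u] == colour[v]]
        # If two of them are non-adjacent, u - v - w is an induced path
        # inside one colour class, so that class is not a union of cliques.
        for u, w in combinations(same, 2):
            if w not in adj[u]:
                return False
    return True


# Path a - b - c: one colour fails, two colours succeed.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
one_colour_ok = is_subcolouring(path, {"a": 0, "b": 0, "c": 0})
two_colours_ok = is_subcolouring(path, {"a": 0, "b": 0, "c": 1})
```

In this example the path on three vertices is not itself a union of cliques, so it needs two colours, whereas any complete graph is subcoloured by a single colour.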
Submitted 29 January, 2024; v1 submitted 3 June, 2023;
originally announced June 2023.
-
A few words about maps
Authors:
Robert Cori,
Yiting Jiang,
Patrice Ossona de Mendez,
Pierre Rosenstiehl
Abstract:
In this paper, we survey some properties, encodings, and bijections involving combinatorial maps, double occurrence words, and chord diagrams. We particularly study quasi-trees from a purely combinatorial point of view and derive a topological representation of maps with a given spanning quasi-tree using two fundamental polygons, which extends the representation of planar maps based on the equivalence with bipartite circle graphs. Then, we focus on Depth-First Search trees and their connection with a poset we define on the spanning quasi-trees of a map. We apply the bijections obtained in the first section to the problem of enumerating loopless rooted maps. Finally, we return to the planar case and discuss a decomposition of planar rooted loopless maps and its consequences on planar rooted loopless map enumeration.
Submitted 15 November, 2022;
originally announced November 2022.
-
Modulo-Counting First-Order Logic on Bounded Expansion Classes
Authors:
J. Nesetril,
P. Ossona de Mendez,
S. Siebertz
Abstract:
We prove that, on bounded expansion classes, every first-order formula with modulo counting is equivalent, in a linear-time computable monadic expansion, to an existential first-order formula. As a consequence, we derive, on bounded expansion classes, that first-order transductions with modulo counting have the same encoding power as existential first-order transductions. Also, modulo-counting first-order model checking and computation of the size of sets definable in modulo-counting first-order logic can be achieved in linear time on bounded expansion classes. As an application, we prove that a class has structurally bounded expansion if and only if it is a class of bounded depth vertex-minors of graphs in a bounded expansion class. We also show how our results can be used to implement fast matrix calculus on bounded expansion matrices over a finite field.
Submitted 23 March, 2023; v1 submitted 7 November, 2022;
originally announced November 2022.
-
Twin-width V: linear minors, modular counting, and matrix multiplication
Authors:
Édouard Bonnet,
Ugo Giocanti,
Patrice Ossona de Mendez,
Stéphan Thomassé
Abstract:
We continue developing the theory around the twin-width of totally ordered binary structures, initiated in the previous paper of the series. We first introduce the notion of parity and linear minors of a matrix, which consists of iteratively replacing consecutive rows or consecutive columns with a linear combination of them. We show that a matrix class has bounded twin-width if and only if its linear-minor closure does not contain all matrices. We observe that the fixed-parameter tractable algorithm for first-order model checking on structures given with an $O(1)$-sequence (certificate of bounded twin-width) and the fact that first-order transductions of bounded twin-width classes have bounded twin-width, both established in Twin-width I, extend to first-order logic with modular counting quantifiers. We make explicit a win-win argument obtained as a by-product of Twin-width IV, and somewhat similar to bidimensionality, that we call rank-bidimensionality. Armed with the above-mentioned extension to modular counting, we show that the twin-width of the product of two conformal matrices $A, B$ over a finite field is bounded by a function of the twin-width of $A$, of $B$, and of the size of the field. Furthermore, if $A$ and $B$ are $n \times n$ matrices of twin-width $d$ over $\mathbb F_q$, we show that $AB$ can be computed in time $O_{d,q}(n^2 \log n)$. We finally present an ad hoc algorithm to efficiently multiply two matrices of bounded twin-width, with a single-exponential dependence in the twin-width bound: If the inputs are given in a compact tree-like form, called twin-decomposition (of width $d$), then for two $n \times n$ matrices $A, B$ over $\mathbb F_2$, a twin-decomposition of $AB$ with width $2^{d+o(d)}$ can be computed in time $4^{d+o(d)}n$ (resp. $4^{d+o(d)}n^{1+\varepsilon}$), and entries queried in doubly-logarithmic (resp. constant) time.
Submitted 24 September, 2022;
originally announced September 2022.
-
Decomposition horizons and a characterization of stable hereditary classes of graphs
Authors:
Samuel Braunfeld,
Jaroslav Nešetřil,
Patrice Ossona de Mendez,
Sebastian Siebertz
Abstract:
The notions of bounded-size and quasibounded-size decompositions with bounded treedepth base classes are central to the structural theory of graph sparsity introduced by two of the authors years ago, and provide a characterization of both classes with bounded expansions and nowhere dense classes. Strong connections of this theory with model theory led to considering first-order transductions, which are logically defined graph transformations, and to initiate a comparative study of combinatorial and model theoretical properties of graph classes, with an emphasis on the model theoretical notions of dependence (or NIP) and stability. In this paper, we first prove that every hereditary class with quasibounded-size decompositions with dependent (resp. stable) base classes is itself dependent (resp. stable). This result is obtained in a more general study of "decomposition horizons", which are class properties compatible with quasibounded-size decompositions. We deduce that hereditary classes with quasibounded-size decompositions with bounded shrubdepth base classes are stable. In the second part of the paper, we prove the converse. Thus, we characterize stable hereditary classes of graphs as those hereditary classes that admit quasibounded-size decompositions with bounded shrubdepth base classes. This result is obtained by proving that every hereditary stable class of graphs admits almost nowhere dense quasi-bush representations, thus answering positively a conjecture of Dreier et al. These results have several consequences. For example, we show that every graph $G$ in a stable, hereditary class of graphs $\mathscr C$ has a clique or a stable set of size $Ω_{\mathscr C,ε}(|G|^{1/2-ε})$, for every $ε>0$, which is tight in the sense that it cannot be improved to $Ω_{\mathscr C}(|G|^{1/2})$.
Submitted 22 December, 2024; v1 submitted 15 September, 2022;
originally announced September 2022.
-
On first-order transductions of classes of graphs
Authors:
Samuel Braunfeld,
Jaroslav Nešetřil,
Patrice Ossona de Mendez,
Sebastian Siebertz
Abstract:
We study various aspects of the first-order transduction quasi-order on graph classes, which provides a way of measuring the relative complexity of graph classes based on whether one can encode the other using a formula of first-order (FO) logic. In contrast with the conjectured simplicity of the transduction quasi-order for monadic second-order logic, the FO-transduction quasi-order is very complex, and many standard properties from structural graph theory and model theory naturally appear in it. We prove a local normal form for transductions among other general results and constructions, which we illustrate via several examples and via the characterizations of the transductions of some simple classes. We then turn to various aspects of the quasi-order, including the (non-)existence of minimum and maximum classes for certain properties, the strictness of the pathwidth hierarchy, the fact that the quasi-order is not a lattice, and the role of weakly sparse classes in the quasi-order.
Submitted 19 June, 2025; v1 submitted 30 August, 2022;
originally announced August 2022.
-
Distributed domination on sparse graph classes
Authors:
Ozan Heydt,
Simeon Kublenz,
Patrice Ossona de Mendez,
Sebastian Siebertz,
Alexandre Vigny
Abstract:
We show that the dominating set problem admits a constant factor approximation in a constant number of rounds in the LOCAL model of distributed computing on graph classes with bounded expansion. This generalizes a result of Czygrinow et al. for graphs with excluded topological minors to very general classes of uniformly sparse graphs. We demonstrate how our general algorithm can be modified and fine-tuned to compute an $(11+ε)$-approximation (for any $ε>0$) of a minimum dominating set on planar graphs. This improves on the previously best known approximation factor of 52 on planar graphs, which was achieved by an elegant and simple algorithm of Lenzen et al.
Submitted 6 July, 2022;
originally announced July 2022.
-
AFT-VO: Asynchronous Fusion Transformers for Multi-View Visual Odometry Estimation
Authors:
Nimet Kaygusuz,
Oscar Mendez,
Richard Bowden
Abstract:
Motion estimation approaches typically employ sensor fusion techniques, such as the Kalman Filter, to handle individual sensor failures. More recently, deep learning-based fusion approaches have been proposed, increasing the performance and requiring fewer model-specific implementations. However, current deep fusion approaches often assume that sensors are synchronised, which is not always practical, especially for low-cost hardware. To address this limitation, in this work, we propose AFT-VO, a novel transformer-based sensor fusion architecture to estimate VO from multiple sensors. Our framework combines predictions from asynchronous multi-view cameras and accounts for the time discrepancies of measurements coming from different sources.
Our approach first employs a Mixture Density Network (MDN) to estimate the probability distributions of the 6-DoF poses for every camera in the system. Then a novel transformer-based fusion module, AFT-VO, is introduced, which combines these asynchronous pose estimations, along with their confidences. More specifically, we introduce Discretiser and Source Encoding techniques which enable the fusion of multi-source asynchronous signals.
We evaluate our approach on the popular nuScenes and KITTI datasets. Our experiments demonstrate that multi-view fusion for VO estimation provides robust and accurate trajectories, outperforming the state of the art in both challenging weather and lighting conditions.
Submitted 16 September, 2022; v1 submitted 26 June, 2022;
originally announced June 2022.
-
Generalizing to New Tasks via One-Shot Compositional Subgoals
Authors:
Xihan Bian,
Oscar Mendez,
Simon Hadfield
Abstract:
The ability to generalize to previously unseen tasks with little to no supervision is a key challenge in modern machine learning research. It is also a cornerstone of a future "General AI". Any artificially intelligent agent deployed in a real-world application must adapt on the fly to unknown environments. Researchers often rely on reinforcement and imitation learning to provide online adaptation to new tasks, through trial and error learning. However, this can be challenging for complex tasks which require many timesteps or large numbers of subtasks to complete. These "long horizon" tasks suffer from sample inefficiency and can require extremely long training times before the agent can learn to perform the necessary long-term planning. In this work, we introduce CASE, which attempts to address these issues by training an Imitation Learning agent using adaptive "near future" subgoals. These subgoals are recalculated at each step using compositional arithmetic in a learned latent representation space. In addition to improving learning efficiency for standard long-term tasks, this approach also makes it possible to perform one-shot generalization to previously unseen tasks, given only a single reference trajectory for the task in a different environment. Our experiments show that the proposed approach consistently outperforms the previous state-of-the-art compositional Imitation Learning approach by 30%.
Submitted 25 July, 2022; v1 submitted 16 May, 2022;
originally announced May 2022.
-
SKILL-IL: Disentangling Skill and Knowledge in Multitask Imitation Learning
Authors:
Bian Xihan,
Oscar Mendez,
Simon Hadfield
Abstract:
In this work, we introduce a new perspective for learning transferable content in multi-task imitation learning. Humans are able to transfer skills and knowledge. If we can cycle to work and drive to the store, we can also cycle to the store and drive to work. We take inspiration from this and hypothesize the latent memory of a policy network can be disentangled into two partitions. These contain either the knowledge of the environmental context for the task or the generalizable skill needed to solve the task. This allows improved training efficiency and better generalization over previously unseen combinations of skills in the same environment, and the same task in unseen environments.
We used the proposed approach to train a disentangled agent for two different multi-task IL environments. In both cases we outperformed the SOTA by 30% in task success rate. We also demonstrated this for navigation on a real robot.
Submitted 26 July, 2022; v1 submitted 6 May, 2022;
originally announced May 2022.
-
"The Pedestrian next to the Lamppost" Adaptive Object Graphs for Better Instantaneous Mapping
Authors:
Avishkar Saha,
Oscar Mendez,
Chris Russell,
Richard Bowden
Abstract:
Estimating a semantically segmented bird's-eye-view (BEV) map from a single image has become a popular technique for autonomous control and navigation. However, existing methods show an increase in localization error with distance from the camera. While such an increase in error is entirely expected (localization is harder at distance), much of the drop in performance can be attributed to the cues used by current texture-based models; in particular, they make heavy use of object-ground intersections (such as shadows), which become increasingly sparse and uncertain for distant objects. In this work, we address these shortcomings in BEV-mapping by learning the spatial relationship between objects in a scene. We propose a graph neural network which predicts BEV objects from a monocular image by spatially reasoning about an object within the context of other objects. Our approach sets a new state-of-the-art in BEV estimation from monocular images across three large-scale datasets, including a 50% relative improvement for objects on nuScenes.
Submitted 6 April, 2022;
originally announced April 2022.
-
Transducing paths in graph classes with unbounded shrubdepth
Authors:
Michał Pilipczuk,
Patrice Ossona de Mendez,
Sebastian Siebertz
Abstract:
Transductions are a general formalism for expressing transformations of graphs (and more generally, of relational structures) in logic. We prove that a graph class $\mathscr{C}$ can be $\mathsf{FO}$-transduced from a class of bounded-height trees (that is, has bounded shrubdepth) if, and only if, from $\mathscr{C}$ one cannot $\mathsf{FO}$-transduce the class of all paths. This establishes one of the three remaining open questions posed by Blumensath and Courcelle about the $\mathsf{MSO}$-transduction quasi-order, even in the stronger form that concerns $\mathsf{FO}$-transductions instead of $\mathsf{MSO}$-transductions.
The backbone of our proof is a graph-theoretic statement that says the following: If a graph $G$ excludes a path, the bipartite complement of a path, and a half-graph as semi-induced subgraphs, then the vertex set of $G$ can be partitioned into a bounded number of parts so that every part induces a cograph of bounded height, and every pair of parts semi-induce a bi-cograph of bounded height. This statement may be of independent interest; for instance, it implies that the graphs in question form a class that is linearly $χ$-bounded.
Submitted 31 March, 2022;
originally announced March 2022.
-
Multi-Camera Sensor Fusion for Visual Odometry using Deep Uncertainty Estimation
Authors:
Nimet Kaygusuz,
Oscar Mendez,
Richard Bowden
Abstract:
Visual Odometry (VO) estimation is an important source of information for vehicle state estimation and autonomous driving. Recently, deep learning based approaches have begun to appear in the literature. However, in the context of driving, single sensor based approaches are often prone to failure because of degraded image quality due to environmental factors, camera placement, etc. To address this issue, we propose a deep sensor fusion framework which estimates vehicle motion using both pose and uncertainty estimations from multiple on-board cameras. We extract spatio-temporal feature representations from a set of consecutive images using a hybrid CNN-RNN model. We then utilise a Mixture Density Network (MDN) to estimate the 6-DoF pose as a mixture of distributions and a fusion module to estimate the final pose using MDN outputs from multiple cameras. We evaluate our approach on the publicly available, large scale autonomous vehicle dataset, nuScenes. The results show that the proposed fusion approach surpasses the state-of-the-art, and provides robust estimates and accurate trajectories compared to individual camera-based estimations.
Submitted 23 December, 2021;
originally announced December 2021.
-
MDN-VO: Estimating Visual Odometry with Confidence
Authors:
Nimet Kaygusuz,
Oscar Mendez,
Richard Bowden
Abstract:
Visual Odometry (VO) is used in many applications including robotics and autonomous systems. However, traditional approaches based on feature matching are computationally expensive and do not directly address failure cases, instead relying on heuristic methods to detect failure. In this work, we propose a deep learning-based VO model to efficiently estimate 6-DoF poses, as well as a confidence model for these estimates. We utilise a CNN-RNN hybrid model to learn feature representations from image sequences. We then employ a Mixture Density Network (MDN) which estimates camera motion as a mixture of Gaussians, based on the extracted spatio-temporal representations. Our model uses pose labels as a source of supervision, but derives uncertainties in an unsupervised manner. We evaluate the proposed model on the KITTI and nuScenes datasets and report extensive quantitative and qualitative results to analyse the performance of both pose and uncertainty estimation. Our experiments show that the proposed model exceeds state-of-the-art performance in addition to detecting failure cases using the predicted pose uncertainty.
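The following Python sketch illustrates, for a single scalar pose component, how a mixture of Gaussians can be scored against a ground-truth label and how a variance can be read off as an uncertainty estimate. The abstract does not state the loss used in the paper; the negative log-likelihood shown here is the common choice for Mixture Density Networks and is an assumption, as are the names `mixture_nll` and `mixture_mean_and_variance`.

```python
import math


def mixture_nll(y, weights, means, sigmas):
    """Negative log-likelihood of a scalar y under a 1D Gaussian mixture.

    weights: positive mixing coefficients that sum to 1.
    means, sigmas: per-component mean and standard deviation (sigmas > 0).
    """
    log_terms = []
    for w, mu, s in zip(weights, means, sigmas):
        log_pdf = (-0.5 * math.log(2.0 * math.pi) - math.log(s)
                   - 0.5 * ((y - mu) / s) ** 2)
        log_terms.append(math.log(w) + log_pdf)
    # Log-sum-exp for numerical stability.
    m = max(log_terms)
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))


def mixture_mean_and_variance(weights, means, sigmas):
    """Mean and variance of the mixture; the variance can serve as a confidence cue."""
    mean = sum(w * mu for w, mu in zip(weights, means))
    second_moment = sum(w * (s * s + mu * mu)
                        for w, mu, s in zip(weights, means, sigmas))
    return mean, second_moment - mean * mean


nll = mixture_nll(0.0, [1.0], [0.0], [1.0])
mean, var = mixture_mean_and_variance([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

Only the label `y` supervises this loss; the standard deviations are fitted as a side effect of maximising likelihood, which is one way uncertainties can be learned without uncertainty labels.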
Submitted 23 December, 2021;
originally announced December 2021.
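As a rough illustration of how a mixture-of-Gaussians output can yield uncertainty from pose labels alone, the sketch below evaluates the negative log-likelihood of a label under a 1-D mixture and the mixture's overall mean and variance. It is not the authors' implementation: the function names, the restriction to a single scalar pose component, and the choice of the mixture variance as the confidence measure are all assumptions made for illustration.

```python
import math

def mixture_nll(target, weights, means, sigmas):
    """Negative log-likelihood of a scalar target under a 1-D Gaussian mixture.

    Hypothetical helper: weights must be positive and sum to 1, sigmas > 0.
    """
    log_terms = []
    for w, mu, s in zip(weights, means, sigmas):
        log_pdf = (-0.5 * ((target - mu) / s) ** 2
                   - math.log(s)
                   - 0.5 * math.log(2.0 * math.pi))
        log_terms.append(math.log(w) + log_pdf)
    # log-sum-exp over components for numerical stability
    m = max(log_terms)
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))

def mixture_mean_and_variance(weights, means, sigmas):
    """Mean and variance of the whole mixture (law of total variance)."""
    mean = sum(w * mu for w, mu in zip(weights, means))
    second_moment = sum(w * (s ** 2 + mu ** 2)
                        for w, mu, s in zip(weights, means, sigmas))
    return mean, second_moment - mean ** 2

# Example: two equally weighted components centred at -1 and +1.
nll = mixture_nll(0.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
mean, var = mixture_mean_and_variance([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

Minimising such a loss only requires the pose label as `target`; the spreads are fitted as a by-product, which is one way to read "derives uncertainties in an unsupervised manner".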
-
Improving Robot Localisation by Ignoring Visual Distraction
Authors:
Oscar Mendez,
Matthew Vowels,
Richard Bowden
Abstract:
Attention is an important component of modern deep learning. However, less emphasis has been put on its inverse: ignoring distraction. Our daily lives require us to explicitly avoid giving attention to salient visual features that confound the task we are trying to accomplish. This visual prioritisation allows us to concentrate on important tasks while ignoring visual distractors.
In this work, we introduce Neural Blindness, which gives an agent the ability to completely ignore objects or classes that are deemed distractors. More explicitly, we aim to render a neural network completely incapable of representing specific chosen classes in its latent space. In a very real sense, this makes the network "blind" to certain classes, allowing an agent to focus on what is important for a given task. We demonstrate how this can be used to improve localisation.
Submitted 25 July, 2021;
originally announced July 2021.
-
Robot in a China Shop: Using Reinforcement Learning for Location-Specific Navigation Behaviour
Authors:
Xihan Bian,
Oscar Mendez,
Simon Hadfield
Abstract:
Robots need to be able to work in multiple different environments. Even when performing similar tasks, different behaviour should be deployed to best fit the current environment. In this paper, we propose a new approach to navigation, where it is treated as a multi-task learning problem. This enables the robot to learn to behave differently in visual navigation tasks for different environments while also learning shared expertise across environments. We evaluated our approach both in simulated environments and on real-world data. Our method allows our system to converge with a 26% reduction in training time, while also increasing accuracy.
Submitted 2 June, 2021;
originally announced June 2021.
-
Markov Localisation using Heatmap Regression and Deep Convolutional Odometry
Authors:
Oscar Mendez,
Simon Hadfield,
Richard Bowden
Abstract:
In the context of self-driving vehicles there is strong competition between approaches based on visual localisation and LiDAR. While LiDAR provides important depth information, it is sparse in resolution and expensive. On the other hand, cameras are low-cost and recent developments in deep learning mean they can provide high localisation performance. However, several fundamental problems remain, particularly in the domain of uncertainty, where learning based approaches can be notoriously over-confident.
Markov, or grid-based, localisation was an early solution to the localisation problem but fell out of favour due to its computational complexity. Representing the likelihood field as a grid (or volume) means there is a trade-off between accuracy and memory size. Furthermore, it is necessary to perform expensive convolutions across the entire likelihood volume. Despite the benefit of simultaneously maintaining a likelihood for all possible locations, grid-based approaches were superseded by more efficient particle filters and Monte Carlo Localisation (MCL). However, MCL introduces its own problems, e.g. particle deprivation.
Recent advances in deep learning hardware allow large likelihood volumes to be stored directly on the GPU and provide the hardware needed to perform GPU-bound 3D convolutions efficiently; together, these obviate many of the disadvantages of grid-based methods. In this work, we present a novel CNN-based localisation approach that can leverage modern deep learning hardware. By implementing a grid-based Markov localisation approach directly on the GPU, we create a hybrid CNN that can perform image-based localisation and odometry-based likelihood propagation within a single neural network. The resulting approach is capable of outperforming direct pose regression methods as well as state-of-the-art localisation systems.
Submitted 1 June, 2021;
originally announced June 2021.
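The grid-based Markov localisation loop the abstract refers to can be sketched in a few lines: a motion step that convolves the belief grid with an odometry kernel, followed by a measurement step that reweights each cell. This is a generic textbook sketch on a 1-D cyclic grid in plain Python, not the paper's GPU/CNN implementation; `predict`, `correct`, the kernel and the likelihood values are all hypothetical.

```python
def predict(belief, motion_kernel):
    """Motion update: circular convolution of the belief with an odometry kernel.

    motion_kernel maps a displacement in cells to its probability.
    """
    n = len(belief)
    new_belief = [0.0] * n
    for i, b in enumerate(belief):
        for shift, p in motion_kernel.items():
            new_belief[(i + shift) % n] += b * p
    return new_belief

def correct(belief, likelihood):
    """Measurement update: multiply by the per-cell likelihood and renormalise."""
    posterior = [b * l for b, l in zip(belief, likelihood)]
    total = sum(posterior)
    return [p / total for p in posterior]

# Start certain of cell 0; odometry says "moved about one cell".
belief = [1.0, 0.0, 0.0, 0.0, 0.0]
belief = predict(belief, {0: 0.1, 1: 0.8, 2: 0.1})
# A (made-up) image-based likelihood that favours cell 2.
belief = correct(belief, [0.1, 0.1, 0.9, 0.1, 0.1])
```

Because every cell keeps a likelihood, the belief never collapses onto a wrong hypothesis the way a depleted particle set can; the cost is the convolution over the whole grid, which is the part the abstract proposes to run on the GPU.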
-
Discrepancy and Sparsity
Authors:
Mario Grobler,
Yiting Jiang,
Patrice Ossona de Mendez,
Sebastian Siebertz,
Alexandre Vigny
Abstract:
We study the connections between the notions of combinatorial discrepancy and graph degeneracy. In particular, we prove that the maximum discrepancy over all subgraphs $H$ of a graph $G$ of the neighborhood set system of $H$ is sandwiched between $Ω(\log\mathrm{deg}(G))$ and $\mathcal{O}(\mathrm{deg}(G))$, where $\mathrm{deg}(G)$ denotes the degeneracy of $G$. We extend this result to inequalities relating weak coloring numbers and discrepancy of graph powers and deduce a new characterization of bounded expansion classes.
Then, we switch to a model theoretical point of view, introduce pointer structures, and study their relations to graph classes with bounded expansion. We deduce that a monotone class of graphs has bounded expansion if and only if all the set systems definable in this class have bounded hereditary discrepancy.
Using known bounds on the VC-density of set systems definable in nowhere dense classes we also give a characterization of nowhere dense classes in terms of discrepancy. As consequences of our results, we obtain a corollary on the discrepancy of neighborhood set systems of edge colored graphs, a polynomial-time algorithm to compute $\varepsilon$-approximations of size $\mathcal{O}(1/\varepsilon)$ for set systems definable in bounded expansion classes, an application to clique coloring, and even the non-existence of a quantifier elimination scheme for nowhere dense classes.
Submitted 29 November, 2021; v1 submitted 8 May, 2021;
originally announced May 2021.
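For readers unfamiliar with the degeneracy $\mathrm{deg}(G)$ that bounds the discrepancy above, the following sketch computes it by the standard greedy peeling procedure: repeatedly delete a vertex of minimum degree and record the largest degree seen at deletion time. The function name and the adjacency-dictionary input format are assumptions for illustration; nothing here is taken from the paper.

```python
def degeneracy(adj):
    """Degeneracy of an undirected simple graph.

    adj maps each vertex to the set of its neighbours (hypothetical input format).
    """
    remaining = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    k = 0
    while remaining:
        # Pick a vertex of minimum degree in what is left of the graph.
        v = min(remaining, key=lambda u: len(remaining[u]))
        k = max(k, len(remaining[v]))
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
    return k

path = {0: {1}, 1: {0, 2}, 2: {1}}
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
```

This simple version takes quadratic time because of the repeated minimum search; bucket-based variants run in linear time.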
-
Twin-width and generalized coloring numbers
Authors:
Jan Dreier,
Jakub Gajarsky,
Yiting Jiang,
Patrice Ossona de Mendez,
Jean-Florent Raymond
Abstract:
In this paper, we prove that a graph $G$ with no $K_{s,s}$-subgraph and twin-width $d$ has $r$-admissibility and $r$-coloring numbers bounded from above by an exponential function of $r$ and that we can construct graphs achieving such a dependency in $r$.
Submitted 19 April, 2021;
originally announced April 2021.
-
Füredi-Hajnal and Stanley-Wilf conjectures in higher dimensions
Authors:
Y. Jang,
J. Nesetril,
P. Ossona de Mendez
Abstract:
In this paper we discuss analogs of Füredi-Hajnal and Stanley-Wilf conjectures for $t$-dimensional matrices with $t>2$.
Submitted 26 March, 2021;
originally announced March 2021.
-
There and Back Again: Self-supervised Multispectral Correspondence Estimation
Authors:
Celyn Walters,
Oscar Mendez,
Mark Johnson,
Richard Bowden
Abstract:
Across a wide range of applications, from autonomous vehicles to medical imaging, multi-spectral images provide an opportunity to extract additional information not present in color images. One of the most important steps in making this information readily available is the accurate estimation of dense correspondences between different spectra.
Due to the nature of cross-spectral images, most correspondence solving techniques for the visual domain are simply not applicable. Furthermore, most cross-spectral techniques utilize spectra-specific characteristics to perform the alignment. In this work, we aim to address the dense correspondence estimation problem in a way that generalizes to more than one spectrum. We do this by introducing a novel cycle-consistency metric that allows us to self-supervise. This, combined with our spectra-agnostic loss functions, allows us to train the same network across multiple spectra.
We demonstrate our approach on the challenging task of dense RGB-FIR correspondence estimation. We also show the performance of our unmodified network on the cases of RGB-NIR and RGB-RGB, where we achieve higher accuracy than similar self-supervised approaches. Our work shows that cross-spectral correspondence estimation can be solved in a common framework that learns to generalize alignment across spectra.
Submitted 26 May, 2021; v1 submitted 19 March, 2021;
originally announced March 2021.
-
A Robust Extrinsic Calibration Framework for Vehicles with Unscaled Sensors
Authors:
Celyn Walters,
Oscar Mendez,
Simon Hadfield,
Richard Bowden
Abstract:
Accurate extrinsic sensor calibration is essential for both autonomous vehicles and robots. Traditionally this is an involved process that requires calibration targets and known fiducial markers, and is generally performed in a lab. Moreover, even a small change in the sensor layout requires recalibration. With the anticipated arrival of consumer autonomous vehicles, there is demand for a system which can do this automatically, after deployment and without specialist human expertise. To address these limitations, we propose a flexible framework which can estimate extrinsic parameters without an explicit calibration stage, even for sensors with unknown scale. Our first contribution builds upon standard hand-eye calibration by jointly recovering scale. Our second contribution is that our system is made robust to imperfect and degenerate sensor data, by collecting independent sets of poses and automatically selecting those which are closest to ideal. We show that our approach's robustness is essential for the target scenario. Unlike previous approaches, ours runs in real time and constantly estimates the extrinsic transform. For both an ideal experimental setup and a real use case, comparison against these approaches shows that we outperform the state-of-the-art. Furthermore, we demonstrate that the recovered scale may be applied to the full trajectory, circumventing the need for scale estimation via sensor fusion.
Submitted 17 March, 2021;
originally announced March 2021.
-
Twin-width and permutations
Authors:
Édouard Bonnet,
Jaroslav Nešetřil,
Patrice Ossona de Mendez,
Sebastian Siebertz,
Stéphan Thomassé
Abstract:
Inspired by a width invariant on permutations defined by Guillemot and Marx, Bonnet, Kim, Thomassé, and Watrigant introduced the twin-width of graphs, which is a parameter describing its structural complexity. This invariant has been further extended to binary structures, in several (basically equivalent) ways. We prove that a class of binary relational structures (that is: edge-colored partially directed graphs) has bounded twin-width if and only if it is a first-order transduction of a proper permutation class. As a by-product, we show that every class with bounded twin-width contains at most $2^{O(n)}$ pairwise non-isomorphic $n$-vertex graphs.
Submitted 4 July, 2024; v1 submitted 13 February, 2021;
originally announced February 2021.
-
Twin-width IV: ordered graphs and matrices
Authors:
Édouard Bonnet,
Ugo Giocanti,
Patrice Ossona de Mendez,
Pierre Simon,
Stéphan Thomassé,
Szymon Toruńczyk
Abstract:
We establish a list of characterizations of bounded twin-width for hereditary, totally ordered binary structures. This has several consequences. First, it allows us to show that a (hereditary) class of matrices over a finite alphabet either contains at least $n!$ matrices of size $n \times n$, or at most $c^n$ for some constant $c$. This generalizes the celebrated Stanley-Wilf conjecture/Marcus-Tardos theorem from permutation classes to any matrix class over a finite alphabet, answers our small conjecture [SODA '21] in the case of ordered graphs, and with more work, settles a question first asked by Balogh, Bollobás, and Morris [Eur. J. Comb. '06] on the growth of hereditary classes of ordered graphs. Second, it gives a fixed-parameter approximation algorithm for twin-width on ordered graphs. Third, it yields a full classification of fixed-parameter tractable first-order model checking on hereditary classes of ordered binary structures. Fourth, it provides a model-theoretic characterization of classes with bounded twin-width.
Submitted 5 July, 2021; v1 submitted 5 February, 2021;
originally announced February 2021.
-
Structural properties of the first-order transduction quasiorder
Authors:
Jaroslav Nesetril,
Patrice Ossona de Mendez,
Sebastian Siebertz
Abstract:
Logical transductions provide a very useful tool to encode classes of structures inside other classes of structures. In this paper we study first-order (FO) transductions and the quasiorder they induce on infinite classes of finite graphs. Surprisingly, this quasiorder is very complex, though shaped by the locality properties of first-order logic. This contrasts with the conjectured simplicity of the monadic second order (MSO) transduction quasiorder.
We first establish a local normal form for FO transductions, which is of independent interest. Then we prove that the quotient partial order is a bounded distributive join-semilattice, and that the subposet of additive classes is also a bounded distributive join-semilattice. The FO transduction quasiorder has great expressive power, and many well-studied class properties can be defined using it. We apply these structural properties to prove, among other results, that FO transductions of the class of paths are exactly perturbations of classes with bounded bandwidth, that the local variants of monadic stability and monadic dependence are equivalent to their (standard) non-local versions, and that the classes with pathwidth at most $k$, for $k\geq 1$, form a strict hierarchy in the FO transduction quasiorder.
Submitted 13 July, 2021; v1 submitted 6 October, 2020;
originally announced October 2020.
-
From $χ$- to $χ_p$-bounded classes
Authors:
Y. Jiang,
J. Nesetril,
P. Ossona de Mendez
Abstract:
$χ$-bounded classes are studied here in the context of star colorings and more generally $χ_p$-colorings. This leads to natural extensions of the notion of bounded expansion class and to structural characterization of these. In this paper we solve two conjectures related to star coloring boundedness. One of the conjectures is disproved and in fact we determine which weakening holds true. We give structural characterizations of (strong and weak) $χ_p$-bounded classes. On the way, we generalize a result of Wood relating the chromatic number of a graph to the star chromatic number of its $1$-subdivision. As an application of our characterizations, among other things, we show that for every odd integer $g>3$ even hole-free graphs $G$ contain at most $\varphi(g,ω(G))\,|G|$ holes of length $g$.
Submitted 27 February, 2021; v1 submitted 7 September, 2020;
originally announced September 2020.
-
Rankwidth meets stability
Authors:
Jaroslav Nesetril,
Patrice Ossona de Mendez,
Michal Pilipczuk,
Roman Rabinovich,
Sebastian Siebertz
Abstract:
We study two notions of being well-structured for classes of graphs that are inspired by classic model theory. A class of graphs $C$ is monadically stable if it is impossible to define arbitrarily long linear orders in vertex-colored graphs from $C$ using a fixed first-order formula. Similarly, monadic dependence corresponds to the impossibility of defining all graphs in this way. Examples of monadically stable graph classes are nowhere dense classes, which provide a robust theory of sparsity. Examples of monadically dependent classes are classes of bounded rankwidth (or equivalently, bounded cliquewidth), which can be seen as a dense analog of classes of bounded treewidth. Thus, monadic stability and monadic dependence extend classical structural notions for graphs by viewing them in a wider, model-theoretical context. We explore this emerging theory by proving the following:
- A class of graphs $C$ is a first-order transduction of a class with bounded treewidth if and only if $C$ has bounded rankwidth and a stable edge relation (i.e. graphs from $C$ exclude some half-graph as a semi-induced subgraph).
- If a class of graphs $C$ is monadically dependent and not monadically stable, then $C$ has in fact an unstable edge relation.
As a consequence, we show that classes with bounded rankwidth excluding some half-graph as a semi-induced subgraph are linearly $χ$-bounded. Our proofs are effective and lead to polynomial time algorithms.
Submitted 15 July, 2020;
originally announced July 2020.