Search | arXiv e-print repository

Fast 3D point clouds retrieval for Large-scale 3D Place Recognition

Authors: Chahine-Nicolas Zede, Laurent Carrafa, Valérie Gouet-Brunet

Abstract: Retrieval in 3D point clouds is a challenging task that consists in retrieving the most similar point clouds to a given query within a reference of 3D points. Current methods focus on comparing descriptors of point clouds in order to identify similar ones. Due to the complexity of this latter step, here we focus on the acceleration of the retrieval by adapting the Differentiable Search Index (DSI)… ▽ More Retrieval in 3D point clouds is a challenging task that consists in retrieving the most similar point clouds to a given query within a reference of 3D points. Current methods focus on comparing descriptors of point clouds in order to identify similar ones. Due to the complexity of this latter step, here we focus on the acceleration of the retrieval by adapting the Differentiable Search Index (DSI), a transformer-based approach initially designed for text information retrieval, for 3D point clouds retrieval. Our approach generates 1D identifiers based on the point descriptors, enabling direct retrieval in constant time. To adapt DSI to 3D data, we integrate Vision Transformers to map descriptors to these identifiers while incorporating positional and semantic encoding. The approach is evaluated for place recognition on a public benchmark comparing its retrieval capabilities against state-of-the-art methods, in terms of quality and speed of returned point clouds. △ Less

Submitted 28 May, 2025; v1 submitted 28 February, 2025; originally announced February 2025.

Comments: 8 pages, 1 figures

MSC Class: 68T10; 68T45 ACM Class: I.5.4; I.2.10

arXiv:2410.23742 [pdf, other]

Fused-Planes: Improving Planar Representations for Learning Large Sets of 3D Scenes

Authors: Karim Kassab, Antoine Schnepf, Jean-Yves Franceschi, Laurent Caraffa, Flavian Vasile, Jeremie Mary, Andrew Comport, Valérie Gouet-Brunet

Abstract: To learn large sets of scenes, Tri-Planes are commonly employed for their planar structure that enables an interoperability with image models, and thus diverse 3D applications. However, this advantage comes at the cost of resource efficiency, as Tri-Planes are not the most computationally efficient option. In this paper, we introduce Fused-Planes, a new planar architecture that improves Tri-Planes… ▽ More To learn large sets of scenes, Tri-Planes are commonly employed for their planar structure that enables an interoperability with image models, and thus diverse 3D applications. However, this advantage comes at the cost of resource efficiency, as Tri-Planes are not the most computationally efficient option. In this paper, we introduce Fused-Planes, a new planar architecture that improves Tri-Planes resource-efficiency in the framework of learning large sets of scenes, which we call "multi-scene inverse graphics". To learn a large set of scenes, our method divides it into two subsets and operates as follows: (i) we train the first subset of scenes jointly with a compression model, (ii) we use that compression model to learn the remaining scenes. This compression model consists of a 3D-aware latent space in which Fused-Planes are learned, enabling a reduced rendering resolution, and shared structures across scenes that reduce scene representation complexity. Fused-Planes present competitive resource costs in multi-scene inverse graphics, while preserving Tri-Planes rendering quality, and maintaining their widely favored planar structure. Our codebase is publicly available as open-source. Our project page can be found at https://fused-planes.github.io . △ Less

Submitted 31 January, 2025; v1 submitted 31 October, 2024; originally announced October 2024.

arXiv:2410.22936 [pdf, other]

Bringing NeRFs to the Latent Space: Inverse Graphics Autoencoder

Authors: Antoine Schnepf, Karim Kassab, Jean-Yves Franceschi, Laurent Caraffa, Flavian Vasile, Jeremie Mary, Andrew Comport, Valerie Gouet-Brunet

Abstract: While pre-trained image autoencoders are increasingly utilized in computer vision, the application of inverse graphics in 2D latent spaces has been under-explored. Yet, besides reducing the training and rendering complexity, applying inverse graphics in the latent space enables a valuable interoperability with other latent-based 2D methods. The major challenge is that inverse graphics cannot be di… ▽ More While pre-trained image autoencoders are increasingly utilized in computer vision, the application of inverse graphics in 2D latent spaces has been under-explored. Yet, besides reducing the training and rendering complexity, applying inverse graphics in the latent space enables a valuable interoperability with other latent-based 2D methods. The major challenge is that inverse graphics cannot be directly applied to such image latent spaces because they lack an underlying 3D geometry. In this paper, we propose an Inverse Graphics Autoencoder (IG-AE) that specifically addresses this issue. To this end, we regularize an image autoencoder with 3D-geometry by aligning its latent space with jointly trained latent 3D scenes. We utilize the trained IG-AE to bring NeRFs to the latent space with a latent NeRF training pipeline, which we implement in an open-source extension of the Nerfstudio framework, thereby unlocking latent scene learning for its supported methods. We experimentally confirm that Latent NeRFs trained with IG-AE present an improved quality compared to a standard autoencoder, all while exhibiting training and rendering accelerations with respect to NeRFs trained in the image space. Our project page can be found at https://ig-ae.github.io . △ Less

Submitted 24 February, 2025; v1 submitted 30 October, 2024; originally announced October 2024.

Comments: Accepted at ICLR 2025. Available at https://openreview.net/forum?id=LTDtjrv02Y

arXiv:2403.11678 [pdf, other]

Exploring 3D-aware Latent Spaces for Efficiently Learning Numerous Scenes

Authors: Antoine Schnepf, Karim Kassab, Jean-Yves Franceschi, Laurent Caraffa, Flavian Vasile, Jeremie Mary, Andrew Comport, Valérie Gouet-Brunet

Abstract: We present a method enabling the scaling of NeRFs to learn a large number of semantically-similar scenes. We combine two techniques to improve the required training time and memory cost per scene. First, we learn a 3D-aware latent space in which we train Tri-Plane scene representations, hence reducing the resolution at which scenes are learned. Moreover, we present a way to share common informatio… ▽ More We present a method enabling the scaling of NeRFs to learn a large number of semantically-similar scenes. We combine two techniques to improve the required training time and memory cost per scene. First, we learn a 3D-aware latent space in which we train Tri-Plane scene representations, hence reducing the resolution at which scenes are learned. Moreover, we present a way to share common information across scenes, hence allowing for a reduction of model complexity to learn a particular scene. Our method reduces effective per-scene memory costs by 44% and per-scene time costs by 86% when training 1000 scenes. Our project page can be found at https://3da-ae.github.io . △ Less

Submitted 17 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: Camera-ready version accepted at 3DMV-CVPR 2024

arXiv:2312.00639 [pdf, other]

RefinedFields: Radiance Fields Refinement for Planar Scene Representations

Authors: Karim Kassab, Antoine Schnepf, Jean-Yves Franceschi, Laurent Caraffa, Jeremie Mary, Valérie Gouet-Brunet

Abstract: Planar scene representations have recently witnessed increased interests for modeling scenes from images, as their lightweight planar structure enables compatibility with image-based models. Notably, K-Planes have gained particular attention as they extend planar scene representations to support in-the-wild scenes, in addition to object-level scenes. However, their visual quality has recently lagg… ▽ More Planar scene representations have recently witnessed increased interests for modeling scenes from images, as their lightweight planar structure enables compatibility with image-based models. Notably, K-Planes have gained particular attention as they extend planar scene representations to support in-the-wild scenes, in addition to object-level scenes. However, their visual quality has recently lagged behind that of state-of-the-art techniques. To reduce this gap, we propose RefinedFields, a method that leverages pre-trained networks to refine K-Planes scene representations via optimization guidance using an alternating training procedure. We carry out extensive experiments and verify the merit of our method on synthetic data and real tourism photo collections. RefinedFields enhances rendered scenes with richer details and improves upon its base representation on the task of novel view synthesis. Our project page can be found at https://refinedfields.github.io . △ Less

Submitted 26 May, 2025; v1 submitted 1 December, 2023; originally announced December 2023.

Comments: Accepted at TMLR

arXiv:2208.00783 [pdf, other]

Location retrieval using visible landmarks based qualitative place signatures

Authors: Lijun Wei, Valerie Gouet-Brunet, Anthony Cohn

Abstract: Location retrieval based on visual information is to retrieve the location of an agent (e.g. human, robot) or the area they see by comparing the observations with a certain form of representation of the environment. Existing methods generally require precise measurement and storage of the observed environment features, which may not always be robust due to the change of season, viewpoint, occlusio… ▽ More Location retrieval based on visual information is to retrieve the location of an agent (e.g. human, robot) or the area they see by comparing the observations with a certain form of representation of the environment. Existing methods generally require precise measurement and storage of the observed environment features, which may not always be robust due to the change of season, viewpoint, occlusion, etc. They are also challenging to scale up and may not be applicable for humans due to the lack of measuring/imaging devices. Considering that humans often use less precise but easily produced qualitative spatial language and high-level semantic landmarks when describing an environment, a qualitative location retrieval method is proposed in this work by describing locations/places using qualitative place signatures (QPS), defined as the perceived spatial relations between ordered pairs of co-visible landmarks from viewers' perspective. After dividing the space into place cells each with individual signatures attached, a coarse-to-fine location retrieval method is proposed to efficiently identify the possible location(s) of viewers based on their qualitative observations. The usability and effectiveness of the proposed method were evaluated using openly available landmark datasets, together with simulated observations by considering the possible perception error. △ Less

Submitted 26 July, 2022; originally announced August 2022.

arXiv:2103.10729 [pdf, other]

Connecting Images through Time and Sources: Introducing Low-data, Heterogeneous Instance Retrieval

Authors: Dimitri Gominski, Valérie Gouet-Brunet, Liming Chen

Abstract: With impressive results in applications relying on feature learning, deep learning has also blurred the line between algorithm and data. Pick a training dataset, pick a backbone network for feature extraction, and voilà ; this usually works for a variety of use cases. But the underlying hypothesis that there exists a training dataset matching the use case is not always met. Moreover, the demand fo… ▽ More With impressive results in applications relying on feature learning, deep learning has also blurred the line between algorithm and data. Pick a training dataset, pick a backbone network for feature extraction, and voilà ; this usually works for a variety of use cases. But the underlying hypothesis that there exists a training dataset matching the use case is not always met. Moreover, the demand for interconnections regardless of the variations of the content calls for increasing generalization and robustness in features. An interesting application characterized by these problematics is the connection of historical and cultural databases of images. Through the seemingly simple task of instance retrieval, we propose to show that it is not trivial to pick features responding well to a panel of variations and semantic content. Introducing a new enhanced version of the Alegoria benchmark, we compare descriptors using the detailed annotations. We further give insights about the core problems in instance retrieval, testing four state-of-the-art additional techniques to increase performance. △ Less

Submitted 19 March, 2021; originally announced March 2021.

arXiv:2102.13392

Unifying Remote Sensing Image Retrieval and Classification with Robust Fine-tuning

Authors: Dimitri Gominski, Valérie Gouet-Brunet, Liming Chen

Abstract: Advances in high resolution remote sensing image analysis are currently hampered by the difficulty of gathering enough annotated data for training deep learning methods, giving rise to a variety of small datasets and associated dataset-specific methods. Moreover, typical tasks such as classification and retrieval lack a systematic evaluation on standard benchmarks and training datasets, which make… ▽ More Advances in high resolution remote sensing image analysis are currently hampered by the difficulty of gathering enough annotated data for training deep learning methods, giving rise to a variety of small datasets and associated dataset-specific methods. Moreover, typical tasks such as classification and retrieval lack a systematic evaluation on standard benchmarks and training datasets, which make it hard to identify durable and generalizable scientific contributions. We aim at unifying remote sensing image retrieval and classification with a new large-scale training and testing dataset, SF300, including both vertical and oblique aerial images and made available to the research community, and an associated fine-tuning method. We additionally propose a new adversarial fine-tuning method for global descriptors. We show that our framework systematically achieves a boost of retrieval and classification performance on nine different datasets compared to an ImageNet pretrained baseline, with currently no other method to compare to. △ Less

Submitted 7 March, 2023; v1 submitted 26 February, 2021; originally announced February 2021.

Comments: Performance margin with the proposed method is not statistically significant. Please refer to http://alegoria.ign.fr/en/SF300_dataset if you are interested in the dataset

arXiv:2005.03388 [pdf, other]

doi 10.1007/s11042-020-08992-6

Semantic Signatures for Large-scale Visual Localization

Authors: Li Weng, Valerie Gouet-Brunet, Bahman Soheilian

Abstract: Visual localization is a useful alternative to standard localization techniques. It works by utilizing cameras. In a typical scenario, features are extracted from captured images and compared with geo-referenced databases. Location information is then inferred from the matching results. Conventional schemes mainly use low-level visual features. These approaches offer good accuracy but suffer from… ▽ More Visual localization is a useful alternative to standard localization techniques. It works by utilizing cameras. In a typical scenario, features are extracted from captured images and compared with geo-referenced databases. Location information is then inferred from the matching results. Conventional schemes mainly use low-level visual features. These approaches offer good accuracy but suffer from scalability issues. In order to assist localization in large urban areas, this work explores a different path by utilizing high-level semantic information. It is found that object information in a street view can facilitate localization. A novel descriptor scheme called "semantic signature" is proposed to summarize this information. A semantic signature consists of type and angle information of visible objects at a spatial location. Several metrics and protocols are proposed for signature comparison and retrieval. They illustrate different trade-offs between accuracy and complexity. Extensive simulation results confirm the potential of the proposed scheme in large-scale applications. This paper is an extended version of a conference paper in CBMI'18. A more efficient retrieval protocol is presented with additional experiment results. △ Less

Submitted 7 May, 2020; originally announced May 2020.

Comments: 12 pages, 22 figures, Multimedia Tools and Applications (2020)

ACM Class: H.3; I.4; I.6

arXiv:1909.08866 [pdf, other]

doi 10.1145/3347317.3357246

Challenging deep image descriptors for retrieval in heterogeneous iconographic collections

Authors: Dimitri Gominski, Martyna Poreba, Valérie Gouet-Brunet, Liming Chen

Abstract: This article proposes to study the behavior of recent and efficient state-of-the-art deep-learning based image descriptors for content-based image retrieval, facing a panel of complex variations appearing in heterogeneous image datasets, in particular in cultural collections that may involve multi-source, multi-date and multi-view Permission to make digital This article proposes to study the behavior of recent and efficient state-of-the-art deep-learning based image descriptors for content-based image retrieval, facing a panel of complex variations appearing in heterogeneous image datasets, in particular in cultural collections that may involve multi-source, multi-date and multi-view Permission to make digital △ Less

Submitted 19 September, 2019; originally announced September 2019.

Comments: SUMAC '19, 2019

Showing 1–10 of 10 results for author: Gouet-Brunet, V