-
A3D: Does Diffusion Dream about 3D Alignment?
Authors:
Savva Ignatyev,
Nina Konovalova,
Daniil Selikhanovych,
Oleg Voynov,
Nikolay Patakin,
Ilya Olkov,
Dmitry Senushkin,
Alexey Artemov,
Anton Konushin,
Alexander Filippov,
Peter Wonka,
Evgeny Burnaev
Abstract:
We tackle the problem of text-driven 3D generation from a geometry alignment perspective. Given a set of text prompts, we aim to generate a collection of objects with semantically corresponding parts aligned across them. Recent methods based on Score Distillation have succeeded in distilling the knowledge from 2D diffusion models to high-quality representations of the 3D objects. These methods handle multiple text queries separately, and therefore the resulting objects have a high variability in object pose and structure. However, in some applications, such as 3D asset design, it may be desirable to obtain a set of objects aligned with each other. In order to achieve the alignment of the corresponding parts of the generated objects, we propose to embed these objects into a common latent space and optimize the continuous transitions between these objects. We enforce two kinds of properties of these transitions: smoothness of the transition and plausibility of the intermediate objects along the transition. We demonstrate that both of these properties are essential for good alignment. We provide several practical scenarios that benefit from alignment between the objects, including 3D editing and object hybridization, and experimentally demonstrate the effectiveness of our method. https://voyleg.github.io/a3d/
Submitted 16 March, 2025; v1 submitted 21 June, 2024;
originally announced June 2024.
-
NeuSD: Surface Completion with Multi-View Text-to-Image Diffusion
Authors:
Savva Ignatyev,
Daniil Selikhanovych,
Oleg Voynov,
Yiqun Wang,
Peter Wonka,
Stamatios Lefkimmiatis,
Evgeny Burnaev
Abstract:
We present a novel method for 3D surface reconstruction from multiple images where only a part of the object of interest is captured. Our approach builds on two recent developments: surface reconstruction using neural radiance fields for the reconstruction of the visible parts of the surface, and guidance of pre-trained 2D diffusion models in the form of Score Distillation Sampling (SDS) to complete the shape in unobserved regions in a plausible manner. We introduce three components. First, we suggest employing normal maps as a pure geometric representation for SDS instead of color renderings which are entangled with the appearance information. Second, we introduce the freezing of the SDS noise during training which results in more coherent gradients and better convergence. Third, we propose Multi-View SDS as a way to condition the generation of the non-observable part of the surface without fine-tuning or making changes to the underlying 2D Stable Diffusion model. We evaluate our approach on the BlendedMVS dataset demonstrating significant qualitative and quantitative improvements over competing methods.
Submitted 7 December, 2023;
originally announced December 2023.
-
Factored-NeuS: Reconstructing Surfaces, Illumination, and Materials of Possibly Glossy Objects
Authors:
Yue Fan,
Ningjing Fan,
Ivan Skorokhodov,
Oleg Voynov,
Savva Ignatyev,
Evgeny Burnaev,
Peter Wonka,
Yiqun Wang
Abstract:
We develop a method that recovers the surface, materials, and illumination of a scene from its posed multi-view images. In contrast to prior work, it does not require any additional data and can handle glossy objects or bright lighting. It is a progressive inverse rendering approach, which consists of three stages. In the first stage, we reconstruct the scene radiance and signed distance function (SDF) with a novel regularization strategy for specular reflections. We propose to explain a pixel color using both surface and volume rendering jointly, which allows for handling complex view-dependent lighting effects for surface reconstruction. In the second stage, we distill light visibility and indirect illumination from the learned SDF and radiance field using learnable mapping functions. Finally, we design a method for estimating the ratio of incoming direct light reflected in a specular manner and use it to reconstruct the materials and direct illumination. Experimental results demonstrate that the proposed method outperforms the current state-of-the-art in recovering surfaces, materials, and lighting without relying on any additional data.
Submitted 7 April, 2025; v1 submitted 29 May, 2023;
originally announced May 2023.
-
Multi-sensor large-scale dataset for multi-view 3D reconstruction
Authors:
Oleg Voynov,
Gleb Bobrovskikh,
Pavel Karpyshev,
Saveliy Galochkin,
Andrei-Timotei Ardelean,
Arseniy Bozhenko,
Ekaterina Karmanova,
Pavel Kopanev,
Yaroslav Labutin-Rymsho,
Ruslan Rakhimov,
Aleksandr Safin,
Valerii Serpiva,
Alexey Artemov,
Evgeny Burnaev,
Dzmitry Tsetserukou,
Denis Zorin
Abstract:
We present a new multi-sensor dataset for multi-view 3D surface reconstruction. It includes registered RGB and depth data from sensors of different resolutions and modalities: smartphones, Intel RealSense, Microsoft Kinect, industrial cameras, and a structured-light scanner. The scenes are selected to emphasize a diverse set of material properties challenging for existing algorithms. We provide around 1.4 million images of 107 different scenes acquired from 100 viewing directions under 14 lighting conditions. We expect our dataset will be useful for evaluation and training of 3D reconstruction algorithms and for related tasks. The dataset is available at skoltech3d.appliedai.tech.
Submitted 28 March, 2023; v1 submitted 11 March, 2022;
originally announced March 2022.
-
Can We Use Neural Regularization to Solve Depth Super-Resolution?
Authors:
Milena Gazdieva,
Oleg Voynov,
Alexey Artemov,
Youyi Zheng,
Luiz Velho,
Evgeny Burnaev
Abstract:
Depth maps captured with commodity sensors often require super-resolution to be used in applications. In this work we study a super-resolution approach based on a variational problem statement with Tikhonov regularization where the regularizer is parametrized with a deep neural network. This approach was previously applied successfully in photoacoustic tomography. We experimentally show that its application to depth map super-resolution is difficult, and provide suggestions about the reasons for that.
Submitted 21 December, 2021;
originally announced December 2021.
-
Unpaired Depth Super-Resolution in the Wild
Authors:
Aleksandr Safin,
Maxim Kan,
Nikita Drobyshev,
Oleg Voynov,
Alexey Artemov,
Alexander Filippov,
Denis Zorin,
Evgeny Burnaev
Abstract:
Depth maps captured with commodity sensors are often of low quality and resolution; these maps need to be enhanced to be used in many applications. State-of-the-art data-driven methods of depth map super-resolution rely on registered pairs of low- and high-resolution depth maps of the same scenes. Acquisition of real-world paired data requires specialized setups. Another alternative, generating low-resolution maps from high-resolution maps by subsampling, adding noise and other artificial degradation methods, does not fully capture the characteristics of real-world low-resolution images. As a consequence, supervised learning methods trained on such artificial paired data may not perform well on real-world low-resolution inputs. We consider an approach to depth super-resolution based on learning from unpaired data. While many techniques for unpaired image-to-image translation have been proposed, most fail to deliver effective hole-filling or reconstruct accurate surfaces using depth maps. We propose an unpaired learning method for depth super-resolution, which is based on a learnable degradation model, enhancement component and surface normal estimates as features to produce more accurate depth maps. We propose a benchmark for unpaired depth SR and demonstrate that our method outperforms existing unpaired methods and performs on par with paired methods.
Submitted 23 September, 2022; v1 submitted 25 May, 2021;
originally announced May 2021.
-
How Good MVSNets Are at Depth Fusion
Authors:
Oleg Voynov,
Aleksandr Safin,
Savva Ignatyev,
Evgeny Burnaev
Abstract:
We study the effects of the additional input to deep multi-view stereo methods in the form of low-quality sensor depth. We modify two state-of-the-art deep multi-view stereo methods to use the input depth. We show that the additional input depth may improve the quality of deep multi-view stereo.
Submitted 30 November, 2020;
originally announced November 2020.
-
Deep Vectorization of Technical Drawings
Authors:
Vage Egiazarian,
Oleg Voynov,
Alexey Artemov,
Denis Volkhonskiy,
Aleksandr Safin,
Maria Taktasheva,
Denis Zorin,
Evgeny Burnaev
Abstract:
We present a new method for vectorization of technical line drawings, such as floor plans, architectural drawings, and 2D CAD images. Our method includes (1) a deep learning-based cleaning stage to eliminate the background and imperfections in the image and fill in missing parts, (2) a transformer-based network to estimate vector primitives, and (3) an optimization procedure to obtain the final primitive configurations. We train the networks on synthetic data, renderings of vector line drawings, and manually vectorized scans of line drawings. Our method quantitatively and qualitatively outperforms a number of existing techniques on a collection of representative technical drawings.
Submitted 30 July, 2020; v1 submitted 11 March, 2020;
originally announced March 2020.
-
Latent-Space Laplacian Pyramids for Adversarial Representation Learning with 3D Point Clouds
Authors:
Vage Egiazarian,
Savva Ignatyev,
Alexey Artemov,
Oleg Voynov,
Andrey Kravchenko,
Youyi Zheng,
Luiz Velho,
Evgeny Burnaev
Abstract:
Constructing high-quality generative models for 3D shapes is a fundamental task in computer vision with diverse applications in geometry processing, engineering, and design. Despite the recent progress in deep generative modelling, synthesis of finely detailed 3D surfaces, such as high-resolution point clouds, from scratch has not been achieved with existing approaches. In this work, we propose to employ the latent-space Laplacian pyramid representation within a hierarchical generative model for 3D point clouds. We combine the recently proposed latent-space GAN and Laplacian GAN architectures to form a multi-scale model capable of generating 3D point clouds at increasing levels of detail. Our evaluation demonstrates that our model outperforms the existing generative models for 3D point clouds.
Submitted 13 December, 2019;
originally announced December 2019.
-
Perceptual deep depth super-resolution
Authors:
Oleg Voynov,
Alexey Artemov,
Vage Egiazarian,
Alexander Notchenko,
Gleb Bobrovskikh,
Denis Zorin,
Evgeny Burnaev
Abstract:
RGBD images, combining high-resolution color and lower-resolution depth from various types of depth sensors, are increasingly common. One can significantly improve the resolution of depth maps by taking advantage of color information; deep learning methods make combining color and depth information particularly easy. However, fusing these two sources of data may lead to a variety of artifacts. If depth maps are used to reconstruct 3D shapes, e.g., for virtual reality applications, the visual quality of upsampled images is particularly important. The main idea of our approach is to measure the quality of depth map upsampling using renderings of resulting 3D surfaces. We demonstrate that a simple visual appearance-based loss, when used with either a trained CNN or simply a deep prior, yields significantly improved 3D shapes, as measured by a number of existing perceptual metrics. We compare this approach with a number of existing optimization and learning-based techniques.
Submitted 9 September, 2019; v1 submitted 24 December, 2018;
originally announced December 2018.