-
ViVo: A Dataset for Volumetric Video Reconstruction and Compression
Authors:
Adrian Azzarelli,
Ge Gao,
Ho Man Kwan,
Fan Zhang,
Nantheera Anantrasirichai,
Ollie Moolan-Feroze,
David Bull
Abstract:
As research on neural volumetric video reconstruction and compression flourishes, there is a need for diverse and realistic datasets, which can be used to develop and validate reconstruction and compression models. However, existing volumetric video datasets lack diverse content in terms of both semantic and low-level features that are commonly present in real-world production pipelines. In this c…
▽ More
As research on neural volumetric video reconstruction and compression flourishes, there is a need for diverse and realistic datasets, which can be used to develop and validate reconstruction and compression models. However, existing volumetric video datasets lack diverse content in terms of both semantic and low-level features that are commonly present in real-world production pipelines. In this context, we propose a new dataset, ViVo, for VolumetrIc VideO reconstruction and compression. The dataset is faithful to real-world volumetric video production and is the first dataset to extend the definition of diversity to include both human-centric characteristics (skin, hair, etc.) and dynamic visual phenomena (transparent, reflective, liquid, etc.). Each video sequence in this database contains raw data including fourteen multi-view RGB and depth video pairs, synchronized at 30FPS with per-frame calibration and audio data, and their associated 2-D foreground masks and 3-D point clouds. To demonstrate the use of this database, we have benchmarked three state-of-the-art (SotA) 3-D reconstruction methods and two volumetric video compression algorithms. The obtained results evidence the challenging nature of the proposed dataset and the limitations of existing datasets for both volumetric video reconstruction and compression tasks, highlighting the need to develop more effective algorithms for these applications. The database and the associated results are available at https://vivo-bvicr.github.io/
△ Less
Submitted 9 June, 2025; v1 submitted 31 May, 2025;
originally announced June 2025.
-
AquaNeRF: Neural Radiance Fields in Underwater Media with Distractor Removal
Authors:
Luca Gough,
Adrian Azzarelli,
Fan Zhang,
Nantheera Anantrasirichai
Abstract:
Neural radiance field (NeRF) research has made significant progress in modeling static video content captured in the wild. However, current models and rendering processes rarely consider scenes captured underwater, which are useful for studying and filming ocean life. They fail to address visual artifacts unique to underwater scenes, such as moving fish and suspended particles. This paper introduc…
▽ More
Neural radiance field (NeRF) research has made significant progress in modeling static video content captured in the wild. However, current models and rendering processes rarely consider scenes captured underwater, which are useful for studying and filming ocean life. They fail to address visual artifacts unique to underwater scenes, such as moving fish and suspended particles. This paper introduces a novel NeRF renderer and optimization scheme for an implicit MLP-based NeRF model. Our renderer reduces the influence of floaters and moving objects that interfere with static objects of interest by estimating a single surface per ray. We use a Gaussian weight function with a small offset to ensure that the transmittance of the surrounding media remains constant. Additionally, we enhance our model with a depth-based scaling function to upscale gradients for near-camera volumes. Overall, our method outperforms the baseline Nerfacto by approximately 7.5\% and SeaThru-NeRF by 6.2% in terms of PSNR. Subjective evaluation also shows a significant reduction of artifacts while preserving details of static targets and background compared to the state of the arts.
△ Less
Submitted 22 February, 2025;
originally announced February 2025.
-
Exploring Dynamic Novel View Synthesis Technologies for Cinematography
Authors:
Adrian Azzarelli,
Nantheera Anantrasirichai,
David R Bull
Abstract:
Novel view synthesis (NVS) has shown significant promise for applications in cinematographic production, particularly through the exploitation of Neural Radiance Fields (NeRF) and Gaussian Splatting (GS). These methods model real 3D scenes, enabling the creation of new shots that are challenging to capture in the real world due to set topology or expensive equipment requirement. This innovation al…
▽ More
Novel view synthesis (NVS) has shown significant promise for applications in cinematographic production, particularly through the exploitation of Neural Radiance Fields (NeRF) and Gaussian Splatting (GS). These methods model real 3D scenes, enabling the creation of new shots that are challenging to capture in the real world due to set topology or expensive equipment requirement. This innovation also offers cinematographic advantages such as smooth camera movements, virtual re-shoots, slow-motion effects, etc. This paper explores dynamic NVS with the aim of facilitating the model selection process. We showcase its potential through a short montage filmed using various NVS models.
△ Less
Submitted 23 December, 2024;
originally announced December 2024.
-
BVI-CR: A Multi-View Human Dataset for Volumetric Video Compression
Authors:
Ge Gao,
Adrian Azzarelli,
Ho Man Kwan,
Nantheera Anantrasirichai,
Fan Zhang,
Oliver Moolan-Feroze,
David Bull
Abstract:
The advances in immersive technologies and 3D reconstruction have enabled the creation of digital replicas of real-world objects and environments with fine details. These processes generate vast amounts of 3D data, requiring more efficient compression methods to satisfy the memory and bandwidth constraints associated with data storage and transmission. However, the development and validation of ef…
▽ More
The advances in immersive technologies and 3D reconstruction have enabled the creation of digital replicas of real-world objects and environments with fine details. These processes generate vast amounts of 3D data, requiring more efficient compression methods to satisfy the memory and bandwidth constraints associated with data storage and transmission. However, the development and validation of efficient 3D data compression methods are constrained by the lack of comprehensive and high-quality volumetric video datasets, which typically require much more effort to acquire and consume increased resources compared to 2D image and video databases. To bridge this gap, we present an open multi-view volumetric human dataset, denoted BVI-CR, which contains 18 multi-view RGB-D captures and their corresponding textured polygonal meshes, depicting a range of diverse human actions. Each video sequence contains 10 views in 1080p resolution with durations between 10-15 seconds at 30FPS. Using BVI-CR, we benchmarked three conventional and neural coordinate-based multi-view video compression methods, following the MPEG MIV Common Test Conditions, and reported their rate quality performance based on various quality metrics. The results show the great potential of neural representation based methods in volumetric video compression compared to conventional video coding methods (with an up to 38\% average coding gain in PSNR). This dataset provides a development and validation platform for a variety of tasks including volumetric reconstruction, compression, and quality assessment. The database will be shared publicly at \url{https://github.com/fan-aaron-zhang/bvi-cr}.
△ Less
Submitted 17 November, 2024;
originally announced November 2024.
-
Reviewing Intelligent Cinematography: AI research for camera-based video production
Authors:
Adrian Azzarelli,
Nantheera Anantrasirichai,
David R Bull
Abstract:
This paper offers the first comprehensive review of artificial intelligence (AI) research in the context of real camera content acquisition for entertainment purposes and is aimed at both researchers and cinematographers. Addressing the lack of review papers in the field of intelligent cinematography} (IC) and the breadth of related computer vision research, we present a holistic view of the IC la…
▽ More
This paper offers the first comprehensive review of artificial intelligence (AI) research in the context of real camera content acquisition for entertainment purposes and is aimed at both researchers and cinematographers. Addressing the lack of review papers in the field of intelligent cinematography} (IC) and the breadth of related computer vision research, we present a holistic view of the IC landscape while providing technical insight, important for experts across disciplines. We provide technical background on generative AI, object detection, automated camera calibration and 3-D content acquisition, with references to assist non-technical readers. The application sections categorize work in terms of four production types: General Production, Virtual Production, Live Production and Aerial Production. Within each application section, we (1) sub-classify work according to research topic and (2) describe the trends and challenges relevant to each type of production. In the final chapter, we address the greater scope of IC research and summarize the significant potential of this area to influence the creative industries sector. We suggest that work relating to virtual production has the greatest potential to impact other mediums of production, driven by the growing interest in LED volumes/stages for in-camera virtual effects (ICVFX) and automated 3-D capture for virtual modeling of real world scenes and actors. We also address ethical and legal concerns regarding the use of creative AI that impact on artists, actors, technologists and the general public.
△ Less
Submitted 6 January, 2025; v1 submitted 8 May, 2024;
originally announced May 2024.
-
WavePlanes: Compact Hex Planes for Dynamic Novel View Synthesis
Authors:
Adrian Azzarelli,
Nantheera Anantrasirichai,
David R Bull
Abstract:
Dynamic Novel View Synthesis (Dynamic NVS) enhances NVS technologies to model moving 3-D scenes. However, current methods are resource intensive and challenging to compress. To address this, we present WavePlanes, a fast and more compact hex plane representation, applicable to both Neural Radiance Fields and Gaussian Splatting methods. Rather than modeling many feature scales separately (as done p…
▽ More
Dynamic Novel View Synthesis (Dynamic NVS) enhances NVS technologies to model moving 3-D scenes. However, current methods are resource intensive and challenging to compress. To address this, we present WavePlanes, a fast and more compact hex plane representation, applicable to both Neural Radiance Fields and Gaussian Splatting methods. Rather than modeling many feature scales separately (as done previously), we use the inverse discrete wavelet transform to reconstruct features at varying scales. This leads to a more compact representation and allows us to explore wavelet-based compression schemes for further gains. The proposed compression scheme exploits the sparsity of wavelet coefficients, by applying hard thresholding to the wavelet planes and storing nonzero coefficients and their locations on each plane in a Hash Map. Compared to the state-of-the-art (SotA), WavePlanes is significantly smaller, less resource demanding and competitive in reconstruction quality. Compared to small SotA models, WavePlanes outperforms methods in both model size and quality of novel views.
△ Less
Submitted 23 December, 2024; v1 submitted 3 December, 2023;
originally announced December 2023.
-
Towards a Robust Framework for NeRF Evaluation
Authors:
Adrian Azzarelli,
Nantheera Anantrasirichai,
David R Bull
Abstract:
Neural Radiance Field (NeRF) research has attracted significant attention recently, with 3D modelling, virtual/augmented reality, and visual effects driving its application. While current NeRF implementations can produce high quality visual results, there is a conspicuous lack of reliable methods for evaluating them. Conventional image quality assessment methods and analytical metrics (e.g. PSNR,…
▽ More
Neural Radiance Field (NeRF) research has attracted significant attention recently, with 3D modelling, virtual/augmented reality, and visual effects driving its application. While current NeRF implementations can produce high quality visual results, there is a conspicuous lack of reliable methods for evaluating them. Conventional image quality assessment methods and analytical metrics (e.g. PSNR, SSIM, LPIPS etc.) only provide approximate indicators of performance since they generalise the ability of the entire NeRF pipeline. Hence, in this paper, we propose a new test framework which isolates the neural rendering network from the NeRF pipeline and then performs a parametric evaluation by training and evaluating the NeRF on an explicit radiance field representation. We also introduce a configurable approach for generating representations specifically for evaluation purposes. This employs ray-casting to transform mesh models into explicit NeRF samples, as well as to "shade" these representations. Combining these two approaches, we demonstrate how different "tasks" (scenes with different visual effects or learning strategies) and types of networks (NeRFs and depth-wise implicit neural representations (INRs)) can be evaluated within this framework. Additionally, we propose a novel metric to measure task complexity of the framework which accounts for the visual parameters and the distribution of the spatial data. Our approach offers the potential to create a comparative objective evaluation framework for NeRF methods.
△ Less
Submitted 31 May, 2023; v1 submitted 29 May, 2023;
originally announced May 2023.