
Exploring Dynamic Novel View Synthesis Technologies for Cinematography

Authors: Adrian Azzarelli, Nantheera Anantrasirichai, David R Bull

Abstract: Novel view synthesis (NVS) has shown significant promise for applications in cinematographic production, particularly through the exploitation of Neural Radiance Fields (NeRF) and Gaussian Splatting (GS). These methods model real 3D scenes, enabling the creation of new shots that are challenging to capture in the real world due to set topology or expensive equipment requirements. This innovation also offers cinematographic advantages such as smooth camera movements, virtual re-shoots, slow-motion effects, etc. This paper explores dynamic NVS with the aim of facilitating the model selection process. We showcase its potential through a short montage filmed using various NVS models.

Submitted 23 December, 2024; originally announced December 2024.

arXiv:2407.03535 [pdf, other]

BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement

Authors: Ruirui Lin, Nantheera Anantrasirichai, Guoxi Huang, Joanne Lin, Qi Sun, Alexandra Malyugina, David R Bull

Abstract: Low-light videos often exhibit spatiotemporal incoherent noise, compromising visibility and performance in computer vision applications. One significant challenge in enhancing such content using deep learning is the scarcity of training data. This paper introduces a novel low-light video dataset, consisting of 40 scenes with various motion scenarios under two distinct low-lighting conditions, incorporating genuine noise and temporal artifacts. We provide fully registered ground truth data captured in normal light using a programmable motorized dolly and refine it via an image-based approach for pixel-wise frame alignment across different light levels. We provide benchmarks based on four different technologies: convolutional neural networks, transformers, diffusion models, and state space models (mamba). Our experimental results demonstrate the significance of fully registered video pairs for low-light video enhancement (LLVE) and the comprehensive evaluation shows that the models trained with our dataset outperform those trained with the existing datasets. Our dataset and links to benchmarks are publicly available at https://doi.org/10.21227/mzny-8c77.

Submitted 28 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2402.01970

arXiv:2405.05039 [pdf, other]

doi 10.1007/s10462-024-11089-3

Reviewing Intelligent Cinematography: AI research for camera-based video production

Authors: Adrian Azzarelli, Nantheera Anantrasirichai, David R Bull

Abstract: This paper offers the first comprehensive review of artificial intelligence (AI) research in the context of real camera content acquisition for entertainment purposes and is aimed at both researchers and cinematographers. Addressing the lack of review papers in the field of intelligent cinematography (IC) and the breadth of related computer vision research, we present a holistic view of the IC landscape while providing technical insight, important for experts across disciplines. We provide technical background on generative AI, object detection, automated camera calibration and 3-D content acquisition, with references to assist non-technical readers. The application sections categorize work in terms of four production types: General Production, Virtual Production, Live Production and Aerial Production. Within each application section, we (1) sub-classify work according to research topic and (2) describe the trends and challenges relevant to each type of production. In the final chapter, we address the greater scope of IC research and summarize the significant potential of this area to influence the creative industries sector. We suggest that work relating to virtual production has the greatest potential to impact other mediums of production, driven by the growing interest in LED volumes/stages for in-camera virtual effects (ICVFX) and automated 3-D capture for virtual modeling of real world scenes and actors. We also address ethical and legal concerns regarding the use of creative AI that impact on artists, actors, technologists and the general public.

Submitted 6 January, 2025; v1 submitted 8 May, 2024; originally announced May 2024.

Comments: This paper has been accepted for publication with "Artificial Intelligence Review" Journal (https://link.springer.com/journal/10462) and we are in the process of publishing it

arXiv:2402.19041 [pdf, other]

Atmospheric Turbulence Removal with Video Sequence Deep Visual Priors

Authors: P. Hill, N. Anantrasirichai, A. Achim, D. R. Bull

Abstract: Atmospheric turbulence poses a challenge for the interpretation and visual perception of visual imagery due to its distortion effects. Model-based approaches have been used to address this, but such methods often suffer from artefacts associated with moving content. Conversely, deep learning based methods are dependent on large and diverse datasets that may not effectively represent any specific content. In this paper, we address these problems with a self-supervised learning method that does not require ground truth. The proposed method is not dependent on any dataset outside of the single data sequence being processed but is also able to improve the quality of any input raw sequences or pre-processed sequences. Specifically, our method is based on an accelerated Deep Image Prior (DIP), but integrates temporal information using pixel shuffling and a temporal sliding window. This efficiently learns spatio-temporal priors leading to a system that effectively mitigates atmospheric turbulence distortions. The experiments show that our method improves visual quality results qualitatively and quantitatively.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2312.02218 [pdf, other]

WavePlanes: Compact Hex Planes for Dynamic Novel View Synthesis

Authors: Adrian Azzarelli, Nantheera Anantrasirichai, David R Bull

Abstract: Dynamic Novel View Synthesis (Dynamic NVS) enhances NVS technologies to model moving 3-D scenes. However, current methods are resource intensive and challenging to compress. To address this, we present WavePlanes, a fast and more compact hex plane representation, applicable to both Neural Radiance Fields and Gaussian Splatting methods. Rather than modeling many feature scales separately (as done previously), we use the inverse discrete wavelet transform to reconstruct features at varying scales. This leads to a more compact representation and allows us to explore wavelet-based compression schemes for further gains. The proposed compression scheme exploits the sparsity of wavelet coefficients, by applying hard thresholding to the wavelet planes and storing nonzero coefficients and their locations on each plane in a Hash Map. Compared to the state-of-the-art (SotA), WavePlanes is significantly smaller, less resource demanding and competitive in reconstruction quality. Compared to small SotA models, WavePlanes outperforms methods in both model size and quality of novel views.

Submitted 23 December, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

arXiv:2305.18079 [pdf, other]

Towards a Robust Framework for NeRF Evaluation

Authors: Adrian Azzarelli, Nantheera Anantrasirichai, David R Bull

Abstract: Neural Radiance Field (NeRF) research has attracted significant attention recently, with 3D modelling, virtual/augmented reality, and visual effects driving its application. While current NeRF implementations can produce high quality visual results, there is a conspicuous lack of reliable methods for evaluating them. Conventional image quality assessment methods and analytical metrics (e.g. PSNR, SSIM, LPIPS etc.) only provide approximate indicators of performance since they generalise the ability of the entire NeRF pipeline. Hence, in this paper, we propose a new test framework which isolates the neural rendering network from the NeRF pipeline and then performs a parametric evaluation by training and evaluating the NeRF on an explicit radiance field representation. We also introduce a configurable approach for generating representations specifically for evaluation purposes. This employs ray-casting to transform mesh models into explicit NeRF samples, as well as to "shade" these representations. Combining these two approaches, we demonstrate how different "tasks" (scenes with different visual effects or learning strategies) and types of networks (NeRFs and depth-wise implicit neural representations (INRs)) can be evaluated within this framework. Additionally, we propose a novel metric to measure task complexity of the framework which accounts for the visual parameters and the distribution of the spatial data. Our approach offers the potential to create a comparative objective evaluation framework for NeRF methods.

Submitted 31 May, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: 9 pages, 2 main experiments, 2 additional experiments

arXiv:2110.06740 [pdf, other]

Transform and Bitstream Domain Image Classification

Authors: P. R. Hill, D. R. Bull

Abstract: Classification of images within the compressed domain offers significant benefits. These benefits include reduced memory and computational requirements of a classification system. This paper proposes two such methods as a proof of concept: The first classifies within the JPEG image transform domain (i.e. DCT transform data); the second classifies the JPEG compressed binary bitstream directly. These two methods are implemented using Residual Network CNNs and an adapted Vision Transformer. Top-1 accuracies of approximately 70% and 60% were achieved using these methods respectively when classifying the Caltech C101 database. Although these results are significantly behind the state of the art for classification for this database (~95%), they illustrate the first time direct bitstream image classification has been achieved. This work confirms that direct bitstream image classification is possible and could be utilised in a first pass database screening of a raw bitstream (within a wired or wireless network) or where computational, memory and bandwidth requirements are severely restricted.

Submitted 13 October, 2021; originally announced October 2021.

Comments: 7 pages, 3 figures, one table

arXiv:2110.06697 [pdf, other]

Semantic Image Fusion

Authors: P. R. Hill, D. R. Bull

Abstract: Image fusion methods and metrics for their evaluation have conventionally used pixel-based or low-level features. However, for many applications, the aim of image fusion is to effectively combine the semantic content of the input images. This paper proposes a novel system for the semantic combination of visual content using pre-trained CNN network architectures. Our proposed semantic fusion is initiated through the fusion of the top layer feature map outputs (for each input image) through gradient updating of the fused image input (so-called image optimisation). Simple "choose maximum" and "local majority" filter based fusion rules are utilised for feature map fusion. This provides a simple method to combine layer outputs and thus a unique framework to fuse single-channel and colour images within a decomposition pre-trained for classification and therefore aligned with semantic fusion. Furthermore, class activation mappings of each input image are used to combine semantic information at a higher level. The developed methods are able to give equivalent low-level fusion performance to state of the art methods while providing a unique architecture to combine semantic information from multiple images.

Submitted 13 October, 2021; originally announced October 2021.

Comments: 10 pages, 3 figures and 2 tables. To be submitted to IEEE Transactions on Image Processing

arXiv:2106.08147 [pdf, other]

doi 10.1117/12.2530688

Perceptually-inspired super-resolution of compressed videos

Authors: Di Ma, Mariana Afonso, Fan Zhang, David R. Bull

Abstract: Spatial resolution adaptation is a technique which has often been employed in video compression to enhance coding efficiency. This approach encodes a lower resolution version of the input video and reconstructs the original resolution during decoding. Instead of using conventional up-sampling filters, recent work has employed advanced super-resolution methods based on convolutional neural networks (CNNs) to further improve reconstruction quality. These approaches are usually trained to minimise pixel-based losses such as Mean-Squared Error (MSE), despite the fact that this type of loss metric does not correlate well with subjective opinions. In this paper, a perceptually-inspired super-resolution approach (M-SRGAN) is proposed for spatial up-sampling of compressed video using a modified CNN model, which has been trained using a generative adversarial network (GAN) on compressed content with perceptual loss functions. The proposed method was integrated with HEVC HM 16.20, and has been evaluated on the JVET Common Test Conditions (UHD test sequences) using the Random Access configuration. The results show evident perceptual quality improvement over the original HM 16.20, with an average bitrate saving of 35.6% (Bjøntegaard Delta measurement) based on a perceptual quality metric, VMAF.

Submitted 15 June, 2021; originally announced June 2021.

arXiv:2011.09190 [pdf, other]

doi 10.1016/j.image.2024.117127

CVEGAN: A Perceptually-inspired GAN for Compressed Video Enhancement

Authors: Di Ma, Fan Zhang, David R. Bull

Abstract: We propose a new Generative Adversarial Network for Compressed Video quality Enhancement (CVEGAN). The CVEGAN generator benefits from the use of a novel Mul2Res block (with multiple levels of residual learning branches), an enhanced residual non-local block (ERNB) and an enhanced convolutional block attention module (ECBAM). The ERNB has also been employed in the discriminator to improve the representational capability. The training strategy has also been re-designed specifically for video compression applications, to employ a relativistic sphere GAN (ReSphereGAN) training methodology together with new perceptual loss functions. The proposed network has been fully evaluated in the context of two typical video compression enhancement tools: post-processing (PP) and spatial resolution adaptation (SRA). CVEGAN has been fully integrated into the MPEG HEVC video coding test model (HM16.20) and experimental results demonstrate significant coding gains (up to 28% for PP and 38% for SRA compared to the anchor) over existing state-of-the-art architectures for both coding tools across multiple datasets.

Submitted 26 November, 2020; v1 submitted 18 November, 2020; originally announced November 2020.

arXiv:2009.07583 [pdf, other]

doi 10.1109/MMUL.2021.3052437

Video Compression with CNN-based Post Processing

Authors: Fan Zhang, Di Ma, Chen Feng, David R. Bull

Abstract: In recent years, video compression techniques have been significantly challenged by the rapidly increased demands associated with high quality and immersive video content. Among various compression tools, post-processing can be applied on reconstructed video content to mitigate visible compression artefacts and to enhance overall perceptual quality. Inspired by advances in deep learning, we propose a new CNN-based post-processing approach, which has been integrated with two state-of-the-art coding standards, VVC and AV1. The results show consistent coding gains on all tested sequences at various spatial resolutions, with average bit rate savings of 4.0% and 5.8% against original VVC and AV1 respectively (based on the assessment of PSNR). This network has also been trained with perceptually inspired loss functions, which have further improved reconstruction quality based on perceptual quality assessment (VMAF), with average coding gains of 13.9% over VVC and 10.5% against AV1.

Submitted 14 January, 2021; v1 submitted 16 September, 2020; originally announced September 2020.

arXiv:2007.14726 [pdf, other]

doi 10.1117/12.2567633

Video compression with low complexity CNN-based spatial resolution adaptation

Authors: Di Ma, Fan Zhang, David R. Bull

Abstract: It has recently been demonstrated that spatial resolution adaptation can be integrated within video compression to improve overall coding performance by spatially down-sampling before encoding and super-resolving at the decoder. Significant improvements have been reported when convolutional neural networks (CNNs) were used to perform the resolution up-sampling. However, this approach suffers from high complexity at the decoder due to the employment of CNN-based super-resolution. In this paper, a novel framework is proposed which supports the flexible allocation of complexity between the encoder and decoder. This approach employs a CNN model for video down-sampling at the encoder and uses a Lanczos3 filter to reconstruct full resolution at the decoder. The proposed method was integrated into the HEVC HM 16.20 software and evaluated on JVET UHD test sequences using the All Intra configuration. The experimental results demonstrate the potential of the proposed approach, with significant bitrate savings (more than 10%) over the original HEVC HM, coupled with reduced computational complexity at both encoder (29%) and decoder (10%).

Submitted 29 July, 2020; originally announced July 2020.

arXiv:2007.07099 [pdf, other]

doi 10.1109/JSTSP.2020.3043064

MFRNet: A New CNN Architecture for Post-Processing and In-loop Filtering

Authors: Di Ma, Fan Zhang, David R. Bull

Abstract: In this paper, we propose a novel convolutional neural network (CNN) architecture, MFRNet, for post-processing (PP) and in-loop filtering (ILF) in the context of video compression. This network consists of four Multi-level Feature review Residual dense Blocks (MFRBs), which are connected using a cascading structure. Each MFRB extracts features from multiple convolutional layers using dense connections and a multi-level residual learning structure. In order to further improve information flow between these blocks, each of them also reuses high dimensional features from the previous MFRB. This network has been integrated into PP and ILF coding modules for both HEVC (HM 16.20) and VVC (VTM 7.0), and fully evaluated under the JVET Common Test Conditions using the Random Access configuration. The experimental results show significant and consistent coding gains over both anchor codecs (HEVC HM and VVC VTM) and also over other existing CNN-based PP/ILF approaches based on Bjontegaard Delta measurements using both PSNR and VMAF for quality assessment. When MFRNet is integrated into HM 16.20, gains up to 16.0% (BD-rate VMAF) are demonstrated for ILF, and up to 21.0% (BD-rate VMAF) for PP. The respective gains for VTM 7.0 are up to 5.1% for ILF and up to 7.1% for PP.

Submitted 11 December, 2020; v1 submitted 14 July, 2020; originally announced July 2020.

arXiv:2003.13552 [pdf, other]

doi 10.1109/TMM.2021.3108943

BVI-DVC: A Training Database for Deep Video Compression

Authors: Di Ma, Fan Zhang, David R. Bull

Abstract: Deep learning methods are increasingly being applied in the optimisation of video compression algorithms and can achieve significantly enhanced coding gains, compared to conventional approaches. Such approaches often employ Convolutional Neural Networks (CNNs) which are trained on databases with relatively limited content coverage. In this paper, a new extensive and representative video database, BVI-DVC, is presented for training CNN-based video compression systems, with specific emphasis on machine learning tools that enhance conventional coding architectures, including spatial resolution and bit depth up-sampling, post-processing and in-loop filtering. BVI-DVC contains 800 sequences at various spatial resolutions from 270p to 2160p and has been evaluated on ten existing network architectures for four different coding tools. Experimental results show that this database produces significant improvements in terms of coding gains over three existing (commonly used) image/video training databases under the same training and evaluation configurations. The overall additional coding improvements by using the proposed database for all tested coding modules and CNN architectures are up to 10.3% based on the assessment of PSNR and 8.1% based on VMAF.

Submitted 8 October, 2020; v1 submitted 30 March, 2020; originally announced March 2020.

arXiv:2003.06637 [pdf, other]

Fast Depth Estimation for View Synthesis

Authors: Nantheera Anantrasirichai, Majid Geravand, David Braendler, David R. Bull

Abstract: Disparity/depth estimation from sequences of stereo images is an important element in 3D vision. Owing to occlusions, imperfect settings and homogeneous luminance, accurate estimation of depth remains a challenging problem. Targeting view synthesis, we propose a novel learning-based framework making use of dilated convolution, densely connected convolutional modules, compact decoder and skip connections. The network is shallow but dense, so it is fast and accurate. Two additional contributions -- a non-linear adjustment of the depth resolution and the introduction of a projection loss, lead to reduction of estimation error by up to 20% and 25% respectively. The results show that our network outperforms state-of-the-art methods with an average improvement in accuracy of depth estimation and view synthesis by approximately 45% and 34% respectively. Where our method generates comparable quality of estimated depth, it performs 10 times faster than those methods.

Submitted 14 March, 2020; originally announced March 2020.

Comments: 5 pages

arXiv:1912.02305 [pdf, other]

HABNet: Machine Learning, Remote Sensing Based Detection and Prediction of Harmful Algal Blooms

Authors: P. R. Hill, A. Kumar, M. Temimi, D. R. Bull

Abstract: This paper describes the application of machine learning techniques to develop a state-of-the-art detection and prediction system for spatiotemporal events found within remote sensing data; specifically, Harmful Algal Bloom events (HABs). We propose an HAB detection system based on: a ground truth historical record of HAB events, a novel spatiotemporal datacube representation of each event (from MODIS and GEBCO bathymetry data) and a variety of machine learning architectures utilising state-of-the-art spatial and temporal analysis methods based on Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) components together with Random Forest and Support Vector Machine (SVM) classification methods. This work has focused specifically on the case study of the detection of Karenia Brevis Algae (K. brevis) HAB events within the coastal waters of Florida (over 2850 events from 2003 to 2018; an order of magnitude larger than any previous machine learning detection study into HAB events). The development of multimodal spatiotemporal datacube data structures and associated novel machine learning methods gives a unique architecture for the automatic detection of environmental events. Specifically, when applied to the detection of HAB events it gives a maximum detection accuracy of 91% and a Kappa coefficient of 0.81 for the Florida data considered. A HAB forecast system was also developed where a temporal subset of each datacube was used to predict the presence of a HAB in the future. This system was not significantly less accurate than the detection system, being able to predict with 86% accuracy up to 8 days in the future.

Submitted 16 April, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

arXiv:1911.02833 [pdf, other]

doi 10.1016/j.image.2021.116355

ViSTRA2: Video Coding using Spatial Resolution and Effective Bit Depth Adaptation

Authors: Fan Zhang, Mariana Afonso, David R. Bull

Abstract: We present a new video compression framework (ViSTRA2) which exploits adaptation of spatial resolution and effective bit depth, down-sampling these parameters at the encoder based on perceptual criteria, and up-sampling at the decoder using a deep convolution neural network. ViSTRA2 has been integrated with the reference software of both the HEVC (HM 16.20) and VVC (VTM 4.01), and evaluated under the Joint Video Exploration Team Common Test Conditions using the Random Access configuration. Our results show consistent and significant compression gains against HM and VVC based on Bjøntegaard Delta measurements, with average BD-rate savings of 12.6% (PSNR) and 19.5% (VMAF) over HM and 5.5% (PSNR) and 8.6% (VMAF) over VTM.

Submitted 7 November, 2019; originally announced November 2019.

Comments: 9 pages

arXiv:1711.04853 [pdf, other]

doi 10.1364/JOSAA.35.000690

Denoising Imaging Polarimetry by an Adapted BM3D Method

Authors: Alexander B. Tibbs, Ilse M. Daly, Nicholas W. Roberts, David R. Bull

Abstract: Imaging polarimetry allows more information to be extracted from a scene than conventional intensity or colour imaging. However, a major challenge of imaging polarimetry is image degradation due to noise. This paper investigates the mitigation of noise through denoising algorithms and compares existing denoising algorithms with a new method, based on BM3D. This algorithm, PBM3D, gives visual quality superior to the state of the art across all images and noise standard deviations tested. We show that denoising polarization images using PBM3D allows the degree of polarization to be more accurately calculated by comparing it to spectroscopy methods.

Submitted 16 November, 2017; v1 submitted 13 November, 2017; originally announced November 2017.

Showing 1–18 of 18 results for author: Bull, D R