ICME 2025 Generalizable HDR and SDR Video Quality Measurement Grand Challenge

Authors: Yixu Chen, Bowen Chen, Hai Wei, Alan C. Bovik, Baojun Li, Wei Sun, Linhan Cao, Kang Fu, Dandan Zhu, Jun Jia, Menghan Hu, Xiongkuo Min, Guangtao Zhai, Dounia Hammou, Fei Yin, Rafal Mantiuk, Amritha Premkumar, Prajit T Rajendran, Vignesh V Menon

Abstract: This paper reports on the IEEE International Conference on Multimedia & Expo (ICME) 2025 Grand Challenge on Generalizable HDR and SDR Video Quality Measurement. With the rapid development of video technology, especially High Dynamic Range (HDR) and Standard Dynamic Range (SDR) content, the need for robust and generalizable Video Quality Assessment (VQA) methods has become increasingly pressing. Existing VQA models often struggle to deliver consistent performance across varying dynamic ranges, distortion types, and diverse content. This challenge was established to benchmark and promote VQA approaches capable of jointly handling HDR and SDR content. In the final evaluation phase, five teams submitted seven models along with technical reports to the Full Reference (FR) and No Reference (NR) tracks. Among them, four methods outperformed the VMAF baseline, while the top-performing model achieved state-of-the-art performance, setting a new benchmark for generalizable video quality assessment.

Submitted 28 June, 2025; originally announced June 2025.

Comments: ICME 2025 Grand Challenges

arXiv:2506.09795 [pdf, ps, other]

Learning Quality from Complexity and Structure: A Feature-Fused XGBoost Model for Video Quality Assessment

Authors: Amritha Premkumar, Prajit T Rajendran, Vignesh V Menon

Abstract: This paper presents a novel approach for reduced-reference video quality assessment (VQA), developed as part of the recent VQA Grand Challenge. Our method leverages low-level complexity and structural information from reference and test videos to predict perceptual quality scores. Specifically, we extract spatio-temporal features using Video Complexity Analyzer (VCA) and compute SSIM values from the test video to capture both texture and structural characteristics. These features are aggregated through temporal pooling, and residual features are calculated by comparing the original and distorted feature sets. The combined features are used to train an XGBoost regression model that estimates the overall video quality. The pipeline is fully automated, interpretable, and highly scalable, requiring no deep neural networks or GPU inference. Experimental results on the challenge dataset demonstrate that our proposed method achieves competitive correlation with subjective quality scores while maintaining a low computational footprint. The model's lightweight design and strong generalization performance suit real-time streaming quality monitoring and adaptive encoding scenarios.

Submitted 11 June, 2025; originally announced June 2025.

Comments: ICME 2025

arXiv:2410.06986 [pdf, other]

Diffusion Density Estimators

Authors: Akhil Premkumar

Abstract: We investigate the use of diffusion models as neural density estimators. The current approach to this problem involves converting the generative process to a smooth flow, known as the Probability Flow ODE. The log density at a given sample can be obtained by solving the ODE with a black-box solver. We introduce a new, highly parallelizable method that computes log densities without the need to solve a flow. Our approach is based on estimating a path integral by Monte Carlo, in a manner identical to the simulation-free training of diffusion models. We also study how different training parameters affect the accuracy of the density calculation, and offer insights into how these models can be made more scalable and efficient.

Submitted 9 October, 2024; originally announced October 2024.

Comments: 20 pages + references, 7 figures

arXiv:2409.03817 [pdf, other]

Neural Entropy

Authors: Akhil Premkumar

Abstract: We explore the connection between deep learning and information theory through the paradigm of diffusion models. A diffusion model converts noise into structured data by reinstating, imperfectly, information that is erased when data was diffused to noise. This information is stored in a neural network during training. We quantify this information by introducing a measure called neural entropy, which is related to the total entropy produced by diffusion. Neural entropy is a function of not just the data distribution, but also the diffusive process itself. Measurements of neural entropy on a few simple image diffusion models reveal that they are extremely efficient at compressing large ensembles of structured data.

Submitted 21 May, 2025; v1 submitted 5 September, 2024; originally announced September 2024.

Comments: 29 pages + references, 18 figures. Significantly revised from the previous version

arXiv:2403.10976 [pdf, other]

doi 10.1145/3625468.3652172

Quality-Aware Dynamic Resolution Adaptation Framework for Adaptive Video Streaming

Authors: Amritha Premkumar, Prajit T Rajendran, Vignesh V Menon, Adam Wieckowski, Benjamin Bross, Detlev Marpe

Abstract: Traditional per-title encoding schemes aim to optimize encoding resolutions to deliver the highest perceptual quality for each representation. XPSNR is observed to correlate better with the subjective quality of VVC-coded bitstreams. Towards this realization, we predict the average XPSNR of VVC-coded bitstreams using spatiotemporal complexity features of the video and the target encoding configuration using an XGBoost-based model. Based on the predicted XPSNR scores, we introduce a Quality-Aware Dynamic Resolution Adaptation (QADRA) framework for adaptive video streaming applications, where we determine the convex-hull online. Furthermore, keeping the encoding and decoding times within an acceptable threshold is mandatory for smooth and energy-efficient streaming. Hence, QADRA determines the encoding resolution and quantization parameter (QP) for each target bitrate by maximizing XPSNR while constraining the maximum encoding and/or decoding time below a threshold. QADRA implements a JND-based representation elimination algorithm to remove perceptually redundant representations from the bitrate ladder. QADRA is an open-source Python-based framework published under the GNU GPLv3 license. GitHub: https://github.com/PhoenixVideo/QADRA Online documentation: https://phoenixvideo.github.io/QADRA/

Submitted 16 March, 2024; originally announced March 2024.

Comments: ACM MMSys '24 | Open-Source Software and Dataset. arXiv admin note: substantial text overlap with arXiv:2401.15346

arXiv:2402.03513 [pdf, other]

Video Super-Resolution for Optimized Bitrate and Green Online Streaming

Authors: Vignesh V Menon, Prajit T Rajendran, Amritha Premkumar, Benjamin Bross, Detlev Marpe

Abstract: Conventional per-title encoding schemes strive to optimize encoding resolutions to deliver the utmost perceptual quality for each bitrate ladder representation. Nevertheless, maintaining encoding time within an acceptable threshold is equally imperative in online streaming applications. Furthermore, modern client devices are equipped with the capability for fast deep-learning-based video super-resolution (VSR) techniques, enhancing the perceptual quality of the decoded bitstream. This suggests that opting for lower resolutions in representations during the encoding process can curtail the overall energy consumption without substantially compromising perceptual quality. In this context, this paper introduces a video super-resolution-based latency-aware optimized bitrate encoding scheme (ViSOR) designed for online adaptive streaming applications. ViSOR determines the encoding resolution for each target bitrate, ensuring the highest achievable perceptual quality after VSR within the bound of a maximum acceptable latency. Random forest-based prediction models are trained to predict the perceptual quality after VSR and the encoding time for each resolution using the spatiotemporal features extracted for each video segment. Experimental results show that ViSOR targeting fast super-resolution convolutional neural network (FSRCNN) achieves an overall average bitrate reduction of 24.65 % and 32.70 % to maintain the same PSNR and VMAF, compared to the HTTP Live Streaming (HLS) bitrate ladder encoding of 4 s segments using the x265 encoder, when the maximum acceptable latency for each representation is set as two seconds. Considering a just noticeable difference (JND) of six VMAF points, the average cumulative storage consumption and encoding energy for each segment is reduced by 79.32 % and 68.21 %, respectively, contributing towards greener streaming.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 2024 Picture Coding Symposium (PCS)

arXiv:2401.15346 [pdf, other]

doi 10.1145/3638036.3640801

Energy-efficient Adaptive Video Streaming with Latency-Aware Dynamic Resolution Encoding

Authors: Vignesh V Menon, Amritha Premkumar, Prajit T Rajendran, Adam Wieckowski, Benjamin Bross, Christian Timmerer, Detlev Marpe

Abstract: Traditional per-title encoding schemes aim to optimize encoding resolutions to deliver the highest perceptual quality for each representation. However, keeping the encoding time within an acceptable threshold for a smooth user experience is important to reduce the carbon footprint and energy consumption on encoding servers in video streaming applications. Toward this realization, we introduce an encoding latency-aware dynamic resolution encoding scheme (LADRE) for adaptive video streaming applications. LADRE determines the encoding resolution for each target bitrate by utilizing a random forest-based prediction model for every video segment based on spatiotemporal features and the acceptable target latency. Experimental results show that LADRE achieves an overall average quality improvement of 0.58 dB PSNR and 0.43 dB XPSNR while maintaining the same bitrate, compared to the HTTP Live Streaming (HLS) bitrate ladder encoding of 200 s segments using the VVenC encoder, when the encoding latency for each representation is set to remain below the 200 s threshold. This is accompanied by an 84.17 % reduction in overall encoding energy consumption.

Submitted 27 January, 2024; originally announced January 2024.

Comments: 2024 Mile High Video (MHV)

arXiv:2310.04490 [pdf, other]

Generative Diffusion From An Action Principle

Authors: Akhil Premkumar

Abstract: Generative diffusion models synthesize new samples by reversing a diffusive process that converts a given data set to generic noise. This is accomplished by training a neural network to match the gradient of the log of the probability distribution of a given data set, also called the score. By casting reverse diffusion as an optimal control problem, we show that score matching can be derived from an action principle, like the ones commonly used in physics. We use this insight to demonstrate the connection between different classes of diffusion models.

Submitted 6 October, 2023; originally announced October 2023.

Comments: 32 pages + references, 2 figures

arXiv:2004.06856 [pdf, other]

Combining Geometric and Information-Theoretic Approaches for Multi-Robot Exploration

Authors: Aravind Preshant Premkumar, Kevin Yu, Pratap Tokekar

Abstract: We present an algorithm to explore an orthogonal polygon using a team of $p$ robots. This algorithm combines ideas from information-theoretic exploration algorithms and computational geometry based exploration algorithms. We show that the exploration time of our algorithm is competitive (as a function of $p$) with respect to the offline optimal exploration algorithm. The algorithm is based on a single-robot polygon exploration algorithm, a tree exploration algorithm for higher level planning and a submodular orienteering algorithm for lower level planning. We discuss how this strategy can be adapted to real-world settings to deal with noisy sensors. In addition to theoretical analysis, we investigate the performance of our algorithm through simulations for multiple robots and experiments with a single robot.

Submitted 14 April, 2020; originally announced April 2020.

arXiv:1307.5435 [pdf, ps, other]

Distributed Computation of the Conditional PCRLB for Quantized Decentralized Particle Filters

Authors: Arash Mohammadi, Amir Asif, Xionghu Zhong, A. B. Premkumar

Abstract: The conditional posterior Cramer-Rao lower bound (PCRLB) is an effective sensor resource management criterion for large, geographically distributed sensor networks. Existing algorithms for distributed computation of the PCRLB (dPCRLB) are based on raw observations, leading to significant communication overhead to the estimation mechanism. This letter derives distributed computational techniques for determining the conditional dPCRLB for quantized, decentralized sensor networks (CQ/dPCRLB). Analytical expressions for the CQ/dPCRLB are derived, which are particularly useful for particle filter-based estimators. The CQ/dPCRLB is compared for accuracy with its centralized counterpart through Monte-Carlo simulations.

Submitted 20 July, 2013; originally announced July 2013.

Showing 1–10 of 10 results for author: Premkumar, A