-
NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results
Authors:
Xin Li,
Kun Yuan,
Bingchen Li,
Fengbin Guan,
Yizhen Shao,
Zihao Yu,
Xijun Wang,
Yiting Lu,
Wei Luo,
Suhang Yao,
Ming Sun,
Chao Zhou,
Zhibo Chen,
Radu Timofte,
Yabin Zhang,
Ao-Xiang Zhang,
Tianwu Zhi,
Jianzhao Liu,
Yang Li,
Jingwen Xu,
Yiting Liao,
Yushen Zuo,
Mingyang Wu,
Renjie Li,
Shengyun Zhong,
et al. (88 additional authors not shown)
Abstract:
This paper presents a review of the NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement. The challenge comprises two tracks: (i) Efficient Video Quality Assessment (KVQ), and (ii) Diffusion-based Image Super-Resolution (KwaiSR). Track 1 aims to advance the development of lightweight and efficient video quality assessment (VQA) models, with an emphasis on eliminating the reliance on model ensembles, redundant weights, and other computationally expensive components seen in previous IQA/VQA competitions. Track 2 introduces a new short-form UGC dataset tailored for single image super-resolution, i.e., the KwaiSR dataset. It consists of 1,800 synthetically generated S-UGC image pairs and 1,900 real-world S-UGC images, which are split into training, validation, and test sets using a ratio of 8:1:1. The primary objective of the challenge is to drive research that benefits the user experience of short-form UGC platforms such as Kwai and TikTok. The challenge attracted 266 participants and received 18 valid final submissions with corresponding fact sheets, significantly contributing to the progress of short-form UGC VQA and image super-resolution. The project is publicly available at https://github.com/lixinustc/KVQE-ChallengeCVPR-NTIRE2025.
Submitted 17 April, 2025;
originally announced April 2025.
-
On the Solution of Linearized Inverse Scattering Problems in Near-Field Microwave Imaging by Operator Inversion and Matched Filtering
Authors:
Matthias M. Saurer,
Han Na,
Marius Brinkmann,
Thomas F. Eibert
Abstract:
Microwave imaging is commonly based on the solution of linearized inverse scattering problems by matched filtering algorithms, i.e., by applying the adjoint of the forward scattering operator to the observation data. A more rigorous approach is the explicit inversion of the forward scattering operator, which is performed in this work for quasi-monostatic imaging scenarios based on a planar plane-wave representation according to the Weyl identity and hierarchical acceleration algorithms. The inversion is achieved by a regularized iterative linear system of equations solver, where irregular observations as well as full probe correction are supported. In the spatial image generation, low-pass filtering can be applied in order to reduce imaging artifacts. A corresponding spectral back-projection algorithm and a spatial back-projection algorithm, together with improved focusing operators, are also introduced, and the resulting image generation algorithms are analyzed and compared for a variety of examples comprising both simulated and measured observation data. It is found that the inverse source solution generally performs better in terms of robustness, focusing capabilities, and image accuracy than the adjoint imaging algorithms operating in either the spatial or the spectral domain. This is demonstrated in particular for irregular sampling grids with non-ideal or truncated observation data, with all reconstruction results evaluated by a rigorous quantitative analysis.
Submitted 20 December, 2024; v1 submitted 8 October, 2024;
originally announced October 2024.
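The contrast drawn in this abstract between matched filtering (applying the adjoint of the forward operator) and explicit regularized inversion can be illustrated with a small sketch. Everything below is an assumption made for illustration: a dense random complex matrix `A` stands in for the forward scattering operator, Tikhonov regularization with parameter `lam` stands in for the paper's regularization, and a plain conjugate-gradient loop on the normal equations stands in for its iterative solver. The plane-wave representation, hierarchical acceleration, and probe correction described in the abstract are not modeled.

```python
import numpy as np


def matched_filter(A, y):
    """Adjoint imaging: apply the Hermitian transpose of A to the data."""
    return A.conj().T @ y


def regularized_inverse(A, y, lam=1e-8, n_iter=200, tol=1e-10):
    """Solve (A^H A + lam*I) x = A^H y with conjugate gradients.

    Illustrative stand-in only; not the solver used in the paper.
    """
    def normal_op(v):
        # Normal-equations operator with a Tikhonov term.
        return A.conj().T @ (A @ v) + lam * v

    b = A.conj().T @ y
    x = np.zeros_like(b)
    r = b.copy()                 # residual for the zero initial guess
    p = r.copy()
    rs = np.vdot(r, r).real
    b_norm = np.linalg.norm(b)
    for _ in range(n_iter):
        if np.sqrt(rs) <= tol * b_norm:
            break
        Ap = normal_op(p)
        step = rs / np.vdot(p, Ap).real
        x = x + step * p
        r = r - step * Ap
        rs_new = np.vdot(r, r).real
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


# Hypothetical toy scene: two point scatterers among 10 unknowns.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 10)) + 1j * rng.standard_normal((40, 10))
x_true = np.zeros(10, dtype=complex)
x_true[2], x_true[7] = 1.0, 0.5j
y = A @ x_true

x_mf = matched_filter(A, y)
x_inv = regularized_inverse(A, y)
print("matched-filter error:", np.linalg.norm(x_mf - x_true))
print("inversion error:     ", np.linalg.norm(x_inv - x_true))
```

In this toy setting the matched-filter image equals `A^H A x_true`, so it is scaled and blurred by the off-diagonal entries of `A^H A`, whereas the inversion removes that blur up to the regularization bias.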
-
Finite-Time Analysis of Simultaneous Double Q-learning
Authors:
Hyunjun Na,
Donghwan Lee
Abstract:
$Q$-learning is one of the most fundamental reinforcement learning (RL) algorithms. Despite its widespread success in various applications, it is prone to overestimation bias in the $Q$-learning update. To address this issue, double $Q$-learning employs two independent $Q$-estimators which are randomly selected and updated during the learning process. This paper proposes a modified double $Q$-learning, called simultaneous double $Q$-learning (SDQ), with its finite-time analysis. SDQ eliminates the need for random selection between the two $Q$-estimators, and this modification allows us to analyze double $Q$-learning through the lens of a novel switching system framework facilitating efficient finite-time analysis. Empirical studies demonstrate that SDQ converges faster than double $Q$-learning while retaining the ability to mitigate the maximization bias. Finally, we derive a finite-time expected error bound for SDQ.
Submitted 14 June, 2024;
originally announced June 2024.
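The difference between the two update schemes described in this abstract can be sketched in tabular form. The first function follows the standard double $Q$-learning rule, in which only one of the two estimators is updated per step. The second is an assumed reading of the simultaneous variant: the abstract states only that SDQ removes the random selection, so the exact update rule below (both tables updated at every step from the same pre-update values) is an illustration and may differ from the rule analyzed in the paper. All function and parameter names are hypothetical.

```python
def argmax(values):
    """Index of the largest entry (first index on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def double_q_update(q_a, q_b, s, a, r, s_next, alpha, gamma, update_a):
    """Standard double Q-learning: update one table, evaluate with the other.

    `update_a` is normally drawn at random with probability 0.5.
    """
    updated, other = (q_a, q_b) if update_a else (q_b, q_a)
    a_star = argmax(updated[s_next])            # action chosen by one table
    td_target = r + gamma * other[s_next][a_star]  # value from the other
    updated[s][a] += alpha * (td_target - updated[s][a])


def simultaneous_update(q_a, q_b, s, a, r, s_next, alpha, gamma):
    """Assumed simultaneous variant: update both tables at every step.

    Both targets are computed from the pre-update values before
    either table is modified.
    """
    target_a = r + gamma * q_b[s_next][argmax(q_a[s_next])]
    target_b = r + gamma * q_a[s_next][argmax(q_b[s_next])]
    q_a[s][a] += alpha * (target_a - q_a[s][a])
    q_b[s][a] += alpha * (target_b - q_b[s][a])
```

Both functions modify the tables in place; `q_a` and `q_b` are nested lists indexed as `q[state][action]`.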
-
SpeechBrain: A General-Purpose Speech Toolkit
Authors:
Mirco Ravanelli,
Titouan Parcollet,
Peter Plantinga,
Aku Rouhe,
Samuele Cornell,
Loren Lugosch,
Cem Subakan,
Nauman Dawalatabad,
Abdelwahab Heba,
Jianyuan Zhong,
Ju-Chieh Chou,
Sung-Lin Yeh,
Szu-Wei Fu,
Chien-Feng Liao,
Elena Rastorgueva,
François Grondin,
William Aris,
Hwidong Na,
Yan Gao,
Renato De Mori,
Yoshua Bengio
Abstract:
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the core architecture designed to support several tasks of common interest, allowing users to naturally conceive, compare and share novel speech processing pipelines. SpeechBrain achieves competitive or state-of-the-art performance in a wide range of speech benchmarks. It also provides training recipes, pretrained models, and inference scripts for popular speech datasets, as well as tutorials which allow anyone with basic Python proficiency to familiarize themselves with speech technologies.
Submitted 8 June, 2021;
originally announced June 2021.
-
ECAPA-TDNN Embeddings for Speaker Diarization
Authors:
Nauman Dawalatabad,
Mirco Ravanelli,
François Grondin,
Jenthe Thienpondt,
Brecht Desplanques,
Hwidong Na
Abstract:
Learning robust speaker embeddings is a crucial step in speaker diarization. Deep neural networks can accurately capture speaker-discriminative characteristics, and popular deep embeddings such as x-vectors are nowadays a fundamental component of modern diarization systems. Recently, some improvements over the standard TDNN architecture used for x-vectors have been proposed. The ECAPA-TDNN model, for instance, has shown impressive performance in the speaker verification domain, thanks to a carefully designed neural model.
In this work, we extend, for the first time, the use of the ECAPA-TDNN model to speaker diarization. Moreover, we improve its robustness with a powerful augmentation scheme that concatenates several contaminated versions of the same signal within the same training batch. The ECAPA-TDNN model turns out to provide robust speaker embeddings under both close-talking and distant-talking conditions. Our results on the popular AMI meeting corpus show that our system significantly outperforms recently proposed approaches.
Submitted 3 April, 2021;
originally announced April 2021.