-
NTIRE 2025 Challenge on Efficient Burst HDR and Restoration: Datasets, Methods, and Results
Authors:
Sangmin Lee,
Eunpil Park,
Angel Canelo,
Hyunhee Park,
Youngjo Kim,
Hyung-Ju Chun,
Xin Jin,
Chongyi Li,
Chun-Le Guo,
Radu Timofte,
Qi Wu,
Tianheng Qiu,
Yuchun Dong,
Shenglin Ding,
Guanghua Pan,
Weiyu Zhou,
Tao Hu,
Yixu Feng,
Duwei Dai,
Yu Cao,
Peng Wu,
Wei Dong,
Yanning Zhang,
Qingsen Yan,
Simon J. Larsen
, et al. (11 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2025 Efficient Burst HDR and Restoration Challenge, which aims to advance efficient multi-frame high dynamic range (HDR) and restoration techniques. The challenge is based on a novel RAW multi-frame fusion dataset, comprising nine noisy and misaligned RAW frames with various exposure levels per scene. Participants were tasked with developing solutions capable of effect…
▽ More
This paper reviews the NTIRE 2025 Efficient Burst HDR and Restoration Challenge, which aims to advance efficient multi-frame high dynamic range (HDR) and restoration techniques. The challenge is based on a novel RAW multi-frame fusion dataset, comprising nine noisy and misaligned RAW frames with various exposure levels per scene. Participants were tasked with developing solutions capable of effectively fusing these frames while adhering to strict efficiency constraints: fewer than 30 million model parameters and a computational budget under 4.0 trillion FLOPs. A total of 217 participants registered, with six teams finally submitting valid solutions. The top-performing approach achieved a PSNR of 43.22 dB, showcasing the potential of novel methods in this domain. This paper provides a comprehensive overview of the challenge, compares the proposed solutions, and serves as a valuable reference for researchers and practitioners in efficient burst HDR and restoration.
△ Less
Submitted 17 May, 2025;
originally announced May 2025.
-
Modeling Dual-Exposure Quad-Bayer Patterns for Joint Denoising and Deblurring
Authors:
Yuzhi Zhao,
Lai-Man Po,
Xin Ye,
Yongzhe Xu,
Qiong Yan
Abstract:
Image degradation caused by noise and blur remains a persistent challenge in imaging systems, stemming from limitations in both hardware and methodology. Single-image solutions face an inherent tradeoff between noise reduction and motion blur. While short exposures can capture clear motion, they suffer from noise amplification. Long exposures reduce noise but introduce blur. Learning-based single-…
▽ More
Image degradation caused by noise and blur remains a persistent challenge in imaging systems, stemming from limitations in both hardware and methodology. Single-image solutions face an inherent tradeoff between noise reduction and motion blur. While short exposures can capture clear motion, they suffer from noise amplification. Long exposures reduce noise but introduce blur. Learning-based single-image enhancers tend to be over-smooth due to the limited information. Multi-image solutions using burst mode avoid this tradeoff by capturing more spatial-temporal information but often struggle with misalignment from camera/scene motion. To address these limitations, we propose a physical-model-based image restoration approach leveraging a novel dual-exposure Quad-Bayer pattern sensor. By capturing pairs of short and long exposures at the same starting point but with varying durations, this method integrates complementary noise-blur information within a single image. We further introduce a Quad-Bayer synthesis method (B2QB) to simulate sensor data from Bayer patterns to facilitate training. Based on this dual-exposure sensor model, we design a hierarchical convolutional neural network called QRNet to recover high-quality RGB images. The network incorporates input enhancement blocks and multi-level feature extraction to improve restoration quality. Experiments demonstrate superior performance over state-of-the-art deblurring and denoising methods on both synthetic and real-world datasets. The code, model, and datasets are publicly available at https://github.com/zhaoyuzhi/QRNet.
△ Less
Submitted 10 December, 2024;
originally announced December 2024.
-
Decoding Emotions: Unveiling Facial Expressions through Acoustic Sensing with Contrastive Attention
Authors:
Guangjing Wang,
Juexing Wang,
Ce Zhou,
Weikang Ding,
Huacheng Zeng,
Tianxing Li,
Qiben Yan
Abstract:
Expression recognition holds great promise for applications such as content recommendation and mental healthcare by accurately detecting users' emotional states. Traditional methods often rely on cameras or wearable sensors, which raise privacy concerns and add extra device burdens. In addition, existing acoustic-based methods struggle to maintain satisfactory performance when there is a distribut…
▽ More
Expression recognition holds great promise for applications such as content recommendation and mental healthcare by accurately detecting users' emotional states. Traditional methods often rely on cameras or wearable sensors, which raise privacy concerns and add extra device burdens. In addition, existing acoustic-based methods struggle to maintain satisfactory performance when there is a distribution shift between the training dataset and the inference dataset. In this paper, we introduce FacER+, an active acoustic facial expression recognition system, which eliminates the requirement for external microphone arrays. FacER+ extracts facial expression features by analyzing the echoes of near-ultrasound signals emitted between the 3D facial contour and the earpiece speaker on a smartphone. This approach not only reduces background noise but also enables the identification of different expressions from various users with minimal training data. We develop a contrastive external attention-based model to consistently learn expression features across different users, reducing the distribution differences. Extensive experiments involving 20 volunteers, both with and without masks, demonstrate that FacER+ can accurately recognize six common facial expressions with over 90% accuracy in diverse, user-independent real-life scenarios, surpassing the performance of the leading acoustic sensing methods by 10%. FacER+ offers a robust and practical solution for facial expression recognition.
△ Less
Submitted 30 September, 2024;
originally announced October 2024.
-
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models
Authors:
Sijing Chen,
Yuan Feng,
Laipeng He,
Tianwei He,
Wendi He,
Yanni Hu,
Bin Lin,
Yiting Lin,
Yu Pan,
Pengfei Tan,
Chengwei Tian,
Chen Wang,
Zhicheng Wang,
Ruoye Xie,
Jixun Yao,
Quanlei Yan,
Yuguang Yang,
Jianhao Ye,
Jingjing Yin,
Yanzhen Yu,
Huimin Zhang,
Xiang Zhang,
Guangcheng Zhao,
Hongbin Zhou,
Pengpeng Zou
Abstract:
With the advent of the big data and large language model era, zero-shot personalized rapid customization has emerged as a significant trend. In this report, we introduce Takin AudioLLM, a series of techniques and models, mainly including Takin TTS, Takin VC, and Takin Morphing, specifically designed for audiobook production. These models are capable of zero-shot speech production, generating high-…
▽ More
With the advent of the big data and large language model era, zero-shot personalized rapid customization has emerged as a significant trend. In this report, we introduce Takin AudioLLM, a series of techniques and models, mainly including Takin TTS, Takin VC, and Takin Morphing, specifically designed for audiobook production. These models are capable of zero-shot speech production, generating high-quality speech that is nearly indistinguishable from real human speech and facilitating individuals to customize the speech content according to their own needs. Specifically, we first introduce Takin TTS, a neural codec language model that builds upon an enhanced neural speech codec and a multi-task training framework, capable of generating high-fidelity natural speech in a zero-shot way. For Takin VC, we advocate an effective content and timbre joint modeling approach to improve the speaker similarity, while advocating for a conditional flow matching based decoder to further enhance its naturalness and expressiveness. Last, we propose the Takin Morphing system with highly decoupled and advanced timbre and prosody modeling approaches, which enables individuals to customize speech production with their preferred timbre and prosody in a precise and controllable manner. Extensive experiments validate the effectiveness and robustness of our Takin AudioLLM series models. For detailed demos, please refer to https://everest-ai.github.io/takinaudiollm/.
△ Less
Submitted 23 September, 2024; v1 submitted 18 September, 2024;
originally announced September 2024.
-
Coordinated Spectral Efficiency Prediction for Real-World 5G CoMP Systems
Authors:
Zhixing Chen,
Zhaoyu Fan,
Yang Li,
Yibin Kang,
Qi Yan,
Qingjiang Shi
Abstract:
Coordinated multipoint (CoMP) systems incur substantial resource consumption due to the management of backhaul links and the coordination among various base stations (BSs). Accurate prediction of coordinated spectral efficiency (CSE) can guide the optimization of network parameters, resulting in enhanced resource utilization efficiency. However, characterizing the CSE is intractable due to the inh…
▽ More
Coordinated multipoint (CoMP) systems incur substantial resource consumption due to the management of backhaul links and the coordination among various base stations (BSs). Accurate prediction of coordinated spectral efficiency (CSE) can guide the optimization of network parameters, resulting in enhanced resource utilization efficiency. However, characterizing the CSE is intractable due to the inherent complexity of the CoMP channel model and the diversity of the 5G dynamic network environment, which poses a great challenge for CSE prediction in real-world 5G CoMP systems. To address this challenge, in this letter, we propose a data-driven model-assisted approach. Initially, we leverage domain knowledge to preprocess the collected raw data, thereby creating a well-informed dataset. Within this dataset, we explicitly define the target variable and the input feature space relevant to channel statistics for CSE prediction. Subsequently, a residual-based network model is built to capture the high-dimensional non-linear mapping function from the channel statistics to the CSE. The effectiveness of the proposed approach is validated by experimental results on real-world data.
△ Less
Submitted 14 August, 2024;
originally announced August 2024.
-
Enhanced Battery Degradation-Aware Scheduling for Distribution Network with Electric Vehicle Load
Authors:
Vijay Babu Pamshetti,
Wei Zhang,
Andy Man-Fai Ng,
Qingyu Yan,
Kuan Tak Tan
Abstract:
Batteries play a key role in today's power grid. In this paper, we investigate the impact of battery degradation on the distribution network. We formulate a multi-objective framework for optimizing battery scheduling with the goals of minimizing monetary costs and improving network performance. Our framework incorporates energy purchase and battery degradation into the costs and measures the netwo…
▽ More
Batteries play a key role in today's power grid. In this paper, we investigate the impact of battery degradation on the distribution network. We formulate a multi-objective framework for optimizing battery scheduling with the goals of minimizing monetary costs and improving network performance. Our framework incorporates energy purchase and battery degradation into the costs and measures the network performance through energy losses and voltage deviation. We propose Bach for battery degradation-aware cheduling based on e-constraint and fuzzy logic methods. Bach is implemented for the IEEE 33-bus network for an experimental study. The results show the effectiveness of Bach in optimizing costs and performance simultaneously with battery degradation awareness and demonstrate the flexibility of further customization.
△ Less
Submitted 9 July, 2024;
originally announced July 2024.
-
CRNet: A Detail-Preserving Network for Unified Image Restoration and Enhancement Task
Authors:
Kangzhen Yang,
Tao Hu,
Kexin Dai,
Genggeng Chen,
Yu Cao,
Wei Dong,
Peng Wu,
Yanning Zhang,
Qingsen Yan
Abstract:
In real-world scenarios, images captured often suffer from blurring, noise, and other forms of image degradation, and due to sensor limitations, people usually can only obtain low dynamic range images. To achieve high-quality images, researchers have attempted various image restoration and enhancement operations on photographs, including denoising, deblurring, and high dynamic range imaging. Howev…
▽ More
In real-world scenarios, images captured often suffer from blurring, noise, and other forms of image degradation, and due to sensor limitations, people usually can only obtain low dynamic range images. To achieve high-quality images, researchers have attempted various image restoration and enhancement operations on photographs, including denoising, deblurring, and high dynamic range imaging. However, merely performing a single type of image enhancement still cannot yield satisfactory images. In this paper, to deal with the challenge above, we propose the Composite Refinement Network (CRNet) to address this issue using multiple exposure images. By fully integrating information-rich multiple exposure inputs, CRNet can perform unified image restoration and enhancement. To improve the quality of image details, CRNet explicitly separates and strengthens high and low-frequency information through pooling layers, using specially designed Multi-Branch Blocks for effective fusion of these frequencies. To increase the receptive field and fully integrate input features, CRNet employs the High-Frequency Enhancement Module, which includes large kernel convolutions and an inverted bottleneck ConvFFN. Our model secured third place in the first track of the Bracketing Image Restoration and Enhancement Challenge, surpassing previous SOTA models in both testing metrics and visual quality.
△ Less
Submitted 22 April, 2024;
originally announced April 2024.
-
Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition
Authors:
Genggeng Chen,
Kexin Dai,
Kangzhen Yang,
Tao Hu,
Xiangyu Chen,
Yongqing Yang,
Wei Dong,
Peng Wu,
Yanning Zhang,
Qingsen Yan
Abstract:
In real-world scenarios, due to a series of image degradations, obtaining high-quality, clear content photos is challenging. While significant progress has been made in synthesizing high-quality images, previous methods for image restoration and enhancement often overlooked the characteristics of different degradations. They applied the same structure to address various types of degradation, resul…
▽ More
In real-world scenarios, due to a series of image degradations, obtaining high-quality, clear content photos is challenging. While significant progress has been made in synthesizing high-quality images, previous methods for image restoration and enhancement often overlooked the characteristics of different degradations. They applied the same structure to address various types of degradation, resulting in less-than-ideal restoration outcomes. Inspired by the notion that high/low frequency information is applicable to different degradations, we introduce HLNet, a Bracketing Image Restoration and Enhancement method based on high-low frequency decomposition. Specifically, we employ two modules for feature extraction: shared weight modules and non-shared weight modules. In the shared weight modules, we use SCConv to extract common features from different degradations. In the non-shared weight modules, we introduce the High-Low Frequency Decomposition Block (HLFDB), which employs different methods to handle high-low frequency information, enabling the model to address different degradations more effectively. Compared to other networks, our method takes into account the characteristics of different degradations, thus achieving higher-quality image restoration.
△ Less
Submitted 24 April, 2024; v1 submitted 21 April, 2024;
originally announced April 2024.
-
Budget Recycling Differential Privacy
Authors:
Bo Jiang,
Jian Du,
Sagar Sharma,
Qiang Yan
Abstract:
Differential Privacy (DP) mechanisms usually {force} reduction in data utility by producing "out-of-bound" noisy results for a tight privacy budget. We introduce the Budget Recycling Differential Privacy (BR-DP) framework, designed to provide soft-bounded noisy outputs for a broad range of existing DP mechanisms. By "soft-bounded," we refer to the mechanism's ability to release most outputs within…
▽ More
Differential Privacy (DP) mechanisms usually {force} reduction in data utility by producing "out-of-bound" noisy results for a tight privacy budget. We introduce the Budget Recycling Differential Privacy (BR-DP) framework, designed to provide soft-bounded noisy outputs for a broad range of existing DP mechanisms. By "soft-bounded," we refer to the mechanism's ability to release most outputs within a predefined error boundary, thereby improving utility and maintaining privacy simultaneously. The core of BR-DP consists of two components: a DP kernel responsible for generating a noisy answer per iteration, and a recycler that probabilistically recycles/regenerates or releases the noisy answer. We delve into the privacy accounting of BR-DP, culminating in the development of a budgeting principle that optimally sub-allocates the available budget between the DP kernel and the recycler. Furthermore, we introduce algorithms for tight BR-DP accounting in composition scenarios, and our findings indicate that BR-DP achieves reduced privacy leakage post-composition compared to DP. Additionally, we explore the concept of privacy amplification via subsampling within the BR-DP framework and propose optimal sampling rates for BR-DP across various queries. We experiment with real data, and the results demonstrate BR-DP's effectiveness in lifting the utility-privacy tradeoff provided by DP mechanisms.
△ Less
Submitted 12 July, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
A Practical Survey on Emerging Threats from AI-driven Voice Attacks: How Vulnerable are Commercial Voice Control Systems?
Authors:
Yuanda Wang,
Qiben Yan,
Nikolay Ivanov,
Xun Chen
Abstract:
The emergence of Artificial Intelligence (AI)-driven audio attacks has revealed new security vulnerabilities in voice control systems. While researchers have introduced a multitude of attack strategies targeting voice control systems (VCS), the continual advancements of VCS have diminished the impact of many such attacks. Recognizing this dynamic landscape, our study endeavors to comprehensively a…
▽ More
The emergence of Artificial Intelligence (AI)-driven audio attacks has revealed new security vulnerabilities in voice control systems. While researchers have introduced a multitude of attack strategies targeting voice control systems (VCS), the continual advancements of VCS have diminished the impact of many such attacks. Recognizing this dynamic landscape, our study endeavors to comprehensively assess the resilience of commercial voice control systems against a spectrum of malicious audio attacks. Through extensive experimentation, we evaluate six prominent attack techniques across a collection of voice control interfaces and devices. Contrary to prevailing narratives, our results suggest that commercial voice control systems exhibit enhanced resistance to existing threats. Particularly, our research highlights the ineffectiveness of white-box attacks in black-box scenarios. Furthermore, the adversaries encounter substantial obstacles in obtaining precise gradient estimations during query-based interactions with commercial systems, such as Apple Siri and Samsung Bixby. Meanwhile, we find that current defense strategies are not completely immune to advanced attacks. Our findings contribute valuable insights for enhancing defense mechanisms in VCS. Through this survey, we aim to raise awareness within the academic community about the security concerns of VCS and advocate for continued research in this crucial area.
△ Less
Submitted 4 January, 2024; v1 submitted 10 December, 2023;
originally announced December 2023.
-
Towards High-quality HDR Deghosting with Conditional Diffusion Models
Authors:
Qingsen Yan,
Tao Hu,
Yuan Sun,
Hao Tang,
Yu Zhu,
Wei Dong,
Luc Van Gool,
Yanning Zhang
Abstract:
High Dynamic Range (HDR) images can be recovered from several Low Dynamic Range (LDR) images by existing Deep Neural Networks (DNNs) techniques. Despite the remarkable progress, DNN-based methods still generate ghosting artifacts when LDR images have saturation and large motion, which hinders potential applications in real-world scenarios. To address this challenge, we formulate the HDR deghosting…
▽ More
High Dynamic Range (HDR) images can be recovered from several Low Dynamic Range (LDR) images by existing Deep Neural Networks (DNNs) techniques. Despite the remarkable progress, DNN-based methods still generate ghosting artifacts when LDR images have saturation and large motion, which hinders potential applications in real-world scenarios. To address this challenge, we formulate the HDR deghosting problem as an image generation that leverages LDR features as the diffusion model's condition, consisting of the feature condition generator and the noise predictor. Feature condition generator employs attention and Domain Feature Alignment (DFA) layer to transform the intermediate features to avoid ghosting artifacts. With the learned features as conditions, the noise predictor leverages a stochastic iterative denoising process for diffusion models to generate an HDR image by steering the sampling process. Furthermore, to mitigate semantic confusion caused by the saturation problem of LDR images, we design a sliding window noise estimator to sample smooth noise in a patch-based manner. In addition, an image space loss is proposed to avoid the color distortion of the estimated HDR results. We empirically evaluate our model on benchmark datasets for HDR imaging. The results demonstrate that our approach achieves state-of-the-art performances and well generalization to real-world images.
△ Less
Submitted 1 November, 2023;
originally announced November 2023.
-
MASTERKEY: Practical Backdoor Attack Against Speaker Verification Systems
Authors:
Hanqing Guo,
Xun Chen,
Junfeng Guo,
Li Xiao,
Qiben Yan
Abstract:
Speaker Verification (SV) is widely deployed in mobile systems to authenticate legitimate users by using their voice traits. In this work, we propose a backdoor attack MASTERKEY, to compromise the SV models. Different from previous attacks, we focus on a real-world practical setting where the attacker possesses no knowledge of the intended victim. To design MASTERKEY, we investigate the limitation…
▽ More
Speaker Verification (SV) is widely deployed in mobile systems to authenticate legitimate users by using their voice traits. In this work, we propose a backdoor attack MASTERKEY, to compromise the SV models. Different from previous attacks, we focus on a real-world practical setting where the attacker possesses no knowledge of the intended victim. To design MASTERKEY, we investigate the limitation of existing poisoning attacks against unseen targets. Then, we optimize a universal backdoor that is capable of attacking arbitrary targets. Next, we embed the speaker's characteristics and semantics information into the backdoor, making it imperceptible. Finally, we estimate the channel distortion and integrate it into the backdoor. We validate our attack on 6 popular SV models. Specifically, we poison a total of 53 models and use our trigger to attack 16,430 enrolled speakers, composed of 310 target speakers enrolled in 53 poisoned models. Our attack achieves 100% attack success rate with a 15% poison rate. By decreasing the poison rate to 3%, the attack success rate remains around 50%. We validate our attack in 3 real-world scenarios and successfully demonstrate the attack through both over-the-air and over-the-telephony-line scenarios.
△ Less
Submitted 13 September, 2023;
originally announced September 2023.
-
Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease
Authors:
Lan Wang,
Ruiling He,
Lili Zhao,
Jia Wang,
Zhengzi Geng,
Tao Ren,
Guo Zhang,
Peng Zhang,
Kaiqiang Tang,
Chaofei Gao,
Fei Chen,
Liting Zhang,
Yonghe Zhou,
Xin Li,
Fanbin He,
Hui Huan,
Wenjuan Wang,
Yunxiao Liang,
Juan Tang,
Fang Ai,
Tingyu Wang,
Liyun Zheng,
Zhongwei Zhao,
Jiansong Ji,
Wei Liu
, et al. (22 additional authors not shown)
Abstract:
Objective: Bleeding from gastroesophageal varices (GEV) is a medical emergency associated with high mortality. We aim to construct an artificial intelligence-based model of two-dimensional shear wave elastography (2D-SWE) of the liver and spleen to precisely assess the risk of GEV and high-risk gastroesophageal varices (HRV).
Design: A prospective multicenter study was conducted in patients with…
▽ More
Objective: Bleeding from gastroesophageal varices (GEV) is a medical emergency associated with high mortality. We aim to construct an artificial intelligence-based model of two-dimensional shear wave elastography (2D-SWE) of the liver and spleen to precisely assess the risk of GEV and high-risk gastroesophageal varices (HRV).
Design: A prospective multicenter study was conducted in patients with compensated advanced chronic liver disease. 305 patients were enrolled from 12 hospitals, and finally 265 patients were included, with 1136 liver stiffness measurement (LSM) images and 1042 spleen stiffness measurement (SSM) images generated by 2D-SWE. We leveraged deep learning methods to uncover associations between image features and patient risk, and thus conducted models to predict GEV and HRV.
Results: A multi-modality Deep Learning Risk Prediction model (DLRP) was constructed to assess GEV and HRV, based on LSM and SSM images, and clinical information. Validation analysis revealed that the AUCs of DLRP were 0.91 for GEV (95% CI 0.90 to 0.93, p < 0.05) and 0.88 for HRV (95% CI 0.86 to 0.89, p < 0.01), which were significantly and robustly better than canonical risk indicators, including the value of LSM and SSM. Moreover, DLPR was better than the model using individual parameters, including LSM and SSM images. In HRV prediction, the 2D-SWE images of SSM outperform LSM (p < 0.01).
Conclusion: DLRP shows excellent performance in predicting GEV and HRV over canonical risk indicators LSM and SSM. Additionally, the 2D-SWE images of SSM provided more information for better accuracy in predicting HRV than the LSM.
△ Less
Submitted 12 June, 2023;
originally announced June 2023.
-
VSMask: Defending Against Voice Synthesis Attack via Real-Time Predictive Perturbation
Authors:
Yuanda Wang,
Hanqing Guo,
Guangjing Wang,
Bocheng Chen,
Qiben Yan
Abstract:
Deep learning based voice synthesis technology generates artificial human-like speeches, which has been used in deepfakes or identity theft attacks. Existing defense mechanisms inject subtle adversarial perturbations into the raw speech audios to mislead the voice synthesis models. However, optimizing the adversarial perturbation not only consumes substantial computation time, but it also requires…
▽ More
Deep learning based voice synthesis technology generates artificial human-like speeches, which has been used in deepfakes or identity theft attacks. Existing defense mechanisms inject subtle adversarial perturbations into the raw speech audios to mislead the voice synthesis models. However, optimizing the adversarial perturbation not only consumes substantial computation time, but it also requires the availability of entire speech. Therefore, they are not suitable for protecting live speech streams, such as voice messages or online meetings. In this paper, we propose VSMask, a real-time protection mechanism against voice synthesis attacks. Different from offline protection schemes, VSMask leverages a predictive neural network to forecast the most effective perturbation for the upcoming streaming speech. VSMask introduces a universal perturbation tailored for arbitrary speech input to shield a real-time speech in its entirety. To minimize the audio distortion within the protected speech, we implement a weight-based perturbation constraint to reduce the perceptibility of the added perturbation. We comprehensively evaluate VSMask protection performance under different scenarios. The experimental results indicate that VSMask can effectively defend against 3 popular voice synthesis models. None of the synthetic voice could deceive the speaker verification models or human ears with VSMask protection. In a physical world experiment, we demonstrate that VSMask successfully safeguards the real-time speech by injecting the perturbation over the air.
△ Less
Submitted 9 May, 2023;
originally announced May 2023.
-
Coherently amplified ultrafast imaging using a free-electron interferometer
Authors:
Tomer Bucher,
Harel Nahari,
Hanan Herzig Sheinfux,
Ron Ruimy,
Arthur Niedermayr,
Raphael Dahan,
Qinghui Yan,
Yuval Adiv,
Michael Yannai,
Jialin Chen,
Yaniv Kurman,
Sang Tae Park,
Daniel J. Masiel,
Eli Janzen,
James H. Edgar,
Fabrizio Carbone,
Guy Bartal,
Shai Tsesses,
Frank H. L. Koppens,
Giovanni Maria Vanacore,
Ido Kaminer
Abstract:
Accessing the low-energy non-equilibrium dynamics of materials and their polaritons with simultaneous high spatial and temporal resolution has been a bold frontier of electron microscopy in recent years. One of the main challenges lies in the ability to retrieve extremely weak signals while simultaneously disentangling amplitude and phase information. Here, we present Free-Electron Ramsey Imaging…
▽ More
Accessing the low-energy non-equilibrium dynamics of materials and their polaritons with simultaneous high spatial and temporal resolution has been a bold frontier of electron microscopy in recent years. One of the main challenges lies in the ability to retrieve extremely weak signals while simultaneously disentangling amplitude and phase information. Here, we present Free-Electron Ramsey Imaging (FERI), a microscopy approach based on light-induced electron modulation that enables coherent amplification of optical near-fields in electron imaging. We provide simultaneous time-, space-, and phase-resolved measurements of a micro-drum made from a hexagonal boron nitride membrane visualizing the sub-cycle dynamics of 2D polariton wavepackets therein. The phase-resolved measurements reveals vortex-anti-vortex singularities on the polariton wavefronts, together with an intriguing phenomenon of a traveling wave mimicking the amplitude profile of a standing wave. Our experiments show a 20-fold coherent amplification of the near-field signal compared to conventional electron near-field imaging, resolving peak field intensities in the order of ~W/cm2, corresponding to field amplitudes of a few kV/m. As a result, our work paves the way for spatio-temporal electron microscopy of biological specimens and quantum materials, exciting yet delicate samples that are currently difficult to investigate.
△ Less
Submitted 16 July, 2024; v1 submitted 8 May, 2023;
originally announced May 2023.
-
Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives
Authors:
Haoning Wu,
Erli Zhang,
Liang Liao,
Chaofeng Chen,
Jingwen Hou,
Annan Wang,
Wenxiu Sun,
Qiong Yan,
Weisi Lin
Abstract:
The rapid increase in user-generated-content (UGC) videos calls for the development of effective video quality assessment (VQA) algorithms. However, the objective of the UGC-VQA problem is still ambiguous and can be viewed from two perspectives: the technical perspective, measuring the perception of distortions; and the aesthetic perspective, which relates to preference and recommendation on conte…
▽ More
The rapid increase in user-generated-content (UGC) videos calls for the development of effective video quality assessment (VQA) algorithms. However, the objective of the UGC-VQA problem is still ambiguous and can be viewed from two perspectives: the technical perspective, measuring the perception of distortions; and the aesthetic perspective, which relates to preference and recommendation on contents. To understand how these two perspectives affect overall subjective opinions in UGC-VQA, we conduct a large-scale subjective study to collect human quality opinions on overall quality of videos as well as perceptions from aesthetic and technical perspectives. The collected Disentangled Video Quality Database (DIVIDE-3k) confirms that human quality opinions on UGC videos are universally and inevitably affected by both aesthetic and technical perspectives. In light of this, we propose the Disentangled Objective Video Quality Evaluator (DOVER) to learn the quality of UGC videos based on the two perspectives. The DOVER proves state-of-the-art performance in UGC-VQA under very high efficiency. With perspective opinions in DIVIDE-3k, we further propose DOVER++, the first approach to provide reliable clear-cut quality evaluations from a single aesthetic or technical perspective. Code at https://github.com/VQAssessment/DOVER.
△ Less
Submitted 7 March, 2023; v1 submitted 9 November, 2022;
originally announced November 2022.
-
SSDPT: Self-Supervised Dual-Path Transformer for Anomalous Sound Detection in Machine Condition Monitoring
Authors:
Jisheng Bai,
Jianfeng Chen,
Mou Wang,
Muhammad Saad Ayub,
Qingli Yan
Abstract:
Anomalous sound detection for machine condition monitoring has great potential in the development of Industry 4.0. However, these anomalous sounds of machines are usually unavailable in normal conditions. Therefore, the models employed have to learn acoustic representations with normal sounds for training, and detect anomalous sounds while testing. In this article, we propose a self-supervised dua…
▽ More
Anomalous sound detection for machine condition monitoring has great potential in the development of Industry 4.0. However, these anomalous sounds of machines are usually unavailable in normal conditions. Therefore, the models employed have to learn acoustic representations with normal sounds for training, and detect anomalous sounds while testing. In this article, we propose a self-supervised dual-path Transformer (SSDPT) network to detect anomalous sounds in machine monitoring. The SSDPT network splits the acoustic features into segments and employs several DPT blocks for time and frequency modeling. DPT blocks use attention modules to alternately model the interactive information about the frequency and temporal components of the segmented acoustic features. To address the problem of lack of anomalous sound, we adopt a self-supervised learning approach to train the network with normal sound. Specifically, this approach randomly masks and reconstructs the acoustic features, and jointly classifies machine identity information to improve the performance of anomalous sound detection. We evaluated our method on the DCASE2021 task2 dataset. The experimental results show that the SSDPT network achieves a significant increase in the harmonic mean AUC score, in comparison to present state-of-the-art methods of anomalous sound detection.
△ Less
Submitted 5 August, 2022;
originally announced August 2022.
-
Sparse-based Domain Adaptation Network for OCTA Image Super-Resolution Reconstruction
Authors:
Huaying Hao,
Cong Xu,
Dan Zhang,
Qifeng Yan,
Jiong Zhang,
Yue Liu,
Yitian Zhao
Abstract:
Retinal Optical Coherence Tomography Angiography (OCTA) with high-resolution is important for the quantification and analysis of retinal vasculature. However, the resolution of OCTA images is inversely proportional to the field of view at the same sampling frequency, which is not conducive to clinicians for analyzing larger vascular areas. In this paper, we propose a novel Sparse-based domain Adap…
▽ More
Retinal Optical Coherence Tomography Angiography (OCTA) with high-resolution is important for the quantification and analysis of retinal vasculature. However, the resolution of OCTA images is inversely proportional to the field of view at the same sampling frequency, which is not conducive to clinicians for analyzing larger vascular areas. In this paper, we propose a novel Sparse-based domain Adaptation Super-Resolution network (SASR) for the reconstruction of realistic 6x6 mm2/low-resolution (LR) OCTA images to high-resolution (HR) representations. To be more specific, we first perform a simple degradation of the 3x3 mm2/high-resolution (HR) image to obtain the synthetic LR image. An efficient registration method is then employed to register the synthetic LR with its corresponding 3x3 mm2 image region within the 6x6 mm2 image to obtain the cropped realistic LR image. We then propose a multi-level super-resolution model for the fully-supervised reconstruction of the synthetic data, guiding the reconstruction of the realistic LR images through a generative-adversarial strategy that allows the synthetic and realistic LR images to be unified in the feature domain. Finally, a novel sparse edge-aware loss is designed to dynamically optimize the vessel edge structure. Extensive experiments on two OCTA sets have shown that our method performs better than state-of-the-art super-resolution reconstruction methods. In addition, we have investigated the performance of the reconstruction results on retina structure segmentations, which further validate the effectiveness of our approach.
△ Less
Submitted 24 July, 2022;
originally announced July 2022.
-
NEC: Speaker Selective Cancellation via Neural Enhanced Ultrasound Shadowing
Authors:
Hanqing Guo,
Chenning Li,
Lingkun Li,
Zhichao Cao,
Qiben Yan,
Li Xiao
Abstract:
In this paper, we propose NEC (Neural Enhanced Cancellation), a defense mechanism, which prevents unauthorized microphones from capturing a target speaker's voice. Compared with the existing scrambling-based audio cancellation approaches, NEC can selectively remove a target speaker's voice from a mixed speech without causing interference to others. Specifically, for a target speaker, we design a D…
▽ More
In this paper, we propose NEC (Neural Enhanced Cancellation), a defense mechanism, which prevents unauthorized microphones from capturing a target speaker's voice. Compared with the existing scrambling-based audio cancellation approaches, NEC can selectively remove a target speaker's voice from a mixed speech without causing interference to others. Specifically, for a target speaker, we design a Deep Neural Network (DNN) model to extract high-level speaker-specific but utterance-independent vocal features from his/her reference audios. When the microphone is recording, the DNN generates a shadow sound to cancel the target voice in real-time. Moreover, we modulate the audible shadow sound onto an ultrasound frequency, making it inaudible for humans. By leveraging the non-linearity of the microphone circuit, the microphone can accurately decode the shadow sound for target voice cancellation. We implement and evaluate NEC comprehensively with 8 smartphone microphones in different settings. The results show that NEC effectively mutes the target speaker at a microphone without interfering with other users' normal conversations.
△ Less
Submitted 12 July, 2022;
originally announced July 2022.
-
Exploring the Effectiveness of Video Perceptual Representation in Blind Video Quality Assessment
Authors:
Liang Liao,
Kangmin Xu,
Haoning Wu,
Chaofeng Chen,
Wenxiu Sun,
Qiong Yan,
Weisi Lin
Abstract:
With the rapid growth of in-the-wild videos taken by non-specialists, blind video quality assessment (VQA) has become a challenging and demanding problem. Although lots of efforts have been made to solve this problem, it remains unclear how the human visual system (HVS) relates to the temporal quality of videos. Meanwhile, recent work has found that the frames of natural video transformed into the…
▽ More
With the rapid growth of in-the-wild videos taken by non-specialists, blind video quality assessment (VQA) has become a challenging and demanding problem. Although lots of efforts have been made to solve this problem, it remains unclear how the human visual system (HVS) relates to the temporal quality of videos. Meanwhile, recent work has found that the frames of natural video transformed into the perceptual domain of the HVS tend to form a straight trajectory of the representations. With the obtained insight that distortion impairs the perceived video quality and results in a curved trajectory of the perceptual representation, we propose a temporal perceptual quality index (TPQI) to measure the temporal distortion by describing the graphic morphology of the representation. Specifically, we first extract the video perceptual representations from the lateral geniculate nucleus (LGN) and primary visual area (V1) of the HVS, and then measure the straightness and compactness of their trajectories to quantify the degradation in naturalness and content continuity of video. Experiments show that the perceptual representation in the HVS is an effective way of predicting subjective temporal quality, and thus TPQI can, for the first time, achieve comparable performance to the spatial quality metric and be even more effective in assessing videos with large temporal variations. We further demonstrate that by combining with NIQE, a spatial quality metric, TPQI can achieve top performance over popular in-the-wild video datasets. More importantly, TPQI does not require any additional information beyond the video being evaluated and thus can be applied to any datasets without parameter tuning. Source code is available at https://github.com/UoLMM/TPQI-VQA.
△ Less
Submitted 8 July, 2022;
originally announced July 2022.
-
D2HNet: Joint Denoising and Deblurring with Hierarchical Network for Robust Night Image Restoration
Authors:
Yuzhi Zhao,
Yongzhe Xu,
Qiong Yan,
Dingdong Yang,
Xuehui Wang,
Lai-Man Po
Abstract:
Night imaging with modern smartphone cameras is troublesome due to low photon count and unavoidable noise in the imaging system. Directly adjusting exposure time and ISO ratings cannot obtain sharp and noise-free images at the same time in low-light conditions. Though many methods have been proposed to enhance noisy or blurry night images, their performances on real-world night photos are still un…
▽ More
Night imaging with modern smartphone cameras is troublesome due to low photon count and unavoidable noise in the imaging system. Directly adjusting exposure time and ISO ratings cannot obtain sharp and noise-free images at the same time in low-light conditions. Though many methods have been proposed to enhance noisy or blurry night images, their performances on real-world night photos are still unsatisfactory due to two main reasons: 1) Limited information in a single image and 2) Domain gap between synthetic training images and real-world photos (e.g., differences in blur area and resolution). To exploit the information from successive long- and short-exposure images, we propose a learning-based pipeline to fuse them. A D2HNet framework is developed to recover a high-quality image by deblurring and enhancing a long-exposure image under the guidance of a short-exposure image. To shrink the domain gap, we leverage a two-phase DeblurNet-EnhanceNet architecture, which performs accurate blur removal on a fixed low resolution so that it is able to handle large ranges of blur in different resolution inputs. In addition, we synthesize a D2-Dataset from HD videos and experiment on it. The results on the validation set and real photos demonstrate our methods achieve better visual quality and state-of-the-art quantitative scores. The D2HNet codes and D2-Dataset can be found at https://github.com/zhaoyuzhi/D2HNet.
△ Less
Submitted 14 July, 2022; v1 submitted 7 July, 2022;
originally announced July 2022.
-
SuperVoice: Text-Independent Speaker Verification Using Ultrasound Energy in Human Speech
Authors:
Hanqing Guo,
Qiben Yan,
Nikolay Ivanov,
Ying Zhu,
Li Xiao,
Eric J. Hunter
Abstract:
Voice-activated systems are integrated into a variety of desktop, mobile, and Internet-of-Things (IoT) devices. However, voice spoofing attacks, such as impersonation and replay attacks, in which malicious attackers synthesize the voice of a victim or simply replay it, have brought growing security concerns. Existing speaker verification techniques distinguish individual speakers via the spectrogr…
▽ More
Voice-activated systems are integrated into a variety of desktop, mobile, and Internet-of-Things (IoT) devices. However, voice spoofing attacks, such as impersonation and replay attacks, in which malicious attackers synthesize the voice of a victim or simply replay it, have brought growing security concerns. Existing speaker verification techniques distinguish individual speakers via the spectrographic features extracted from an audible frequency range of voice commands. However, they often have high error rates and/or long delays. In this paper, we explore a new direction of human voice research by scrutinizing the unique characteristics of human speech at the ultrasound frequency band. Our research indicates that the high-frequency ultrasound components (e.g. speech fricatives) from 20 to 48 kHz can significantly enhance the security and accuracy of speaker verification. We propose a speaker verification system, SUPERVOICE that uses a two-stream DNN architecture with a feature fusion mechanism to generate distinctive speaker models. To test the system, we create a speech dataset with 12 hours of audio (8,950 voice samples) from 127 participants. In addition, we create a second spoofed voice dataset to evaluate its security. In order to balance between controlled recordings and real-world applications, the audio recordings are collected from two quiet rooms by 8 different recording devices, including 7 smartphones and an ultrasound microphone. Our evaluation shows that SUPERVOICE achieves 0.58% equal error rate in the speaker verification task, it only takes 120 ms for testing an incoming utterance, outperforming all existing speaker verification systems. Moreover, within 91 ms processing time, SUPERVOICE achieves 0% equal error rate in detecting replay attacks launched by 5 different loudspeakers.
△ Less
Submitted 28 May, 2022;
originally announced May 2022.
-
NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results
Authors:
Eduardo Pérez-Pellitero,
Sibi Catley-Chandar,
Richard Shaw,
Aleš Leonardis,
Radu Timofte,
Zexin Zhang,
Cen Liu,
Yunbo Peng,
Yue Lin,
Gaocheng Yu,
Jin Zhang,
Zhe Ma,
Hongbin Wang,
Xiangyu Chen,
Xintao Wang,
Haiwei Wu,
Lin Liu,
Chao Dong,
Jiantao Zhou,
Qingsen Yan,
Song Zhang,
Weiye Chen,
Yuhang Liu,
Zhen Zhang,
Yanning Zhang
, et al. (68 additional authors not shown)
Abstract:
This paper reviews the challenge on constrained high dynamic range (HDR) imaging that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2022. This manuscript focuses on the competition set-up, datasets, the proposed methods and their results. The challenge aims at estimating an HDR image from multiple respective low dynamic range (LDR)…
▽ More
This paper reviews the challenge on constrained high dynamic range (HDR) imaging that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2022. This manuscript focuses on the competition set-up, datasets, the proposed methods and their results. The challenge aims at estimating an HDR image from multiple respective low dynamic range (LDR) observations, which might suffer from under- or over-exposed regions and different sources of noise. The challenge is composed of two tracks with an emphasis on fidelity and complexity constraints: In Track 1, participants are asked to optimize objective fidelity scores while imposing a low-complexity constraint (i.e. solutions can not exceed a given number of operations). In Track 2, participants are asked to minimize the complexity of their solutions while imposing a constraint on fidelity scores (i.e. solutions are required to obtain a higher fidelity score than the prescribed baseline). Both tracks use the same data and metrics: Fidelity is measured by means of PSNR with respect to a ground-truth HDR image (computed both directly and with a canonical tonemapping operation), while complexity metrics include the number of Multiply-Accumulate (MAC) operations and runtime (in seconds).
△ Less
Submitted 25 May, 2022;
originally announced May 2022.
-
ChildPredictor: A Child Face Prediction Framework with Disentangled Learning
Authors:
Yuzhi Zhao,
Lai-Man Po,
Xuehui Wang,
Qiong Yan,
Wei Shen,
Yujia Zhang,
Wei Liu,
Chun-Kit Wong,
Chiu-Sing Pang,
Weifeng Ou,
Wing-Yin Yu,
Buhua Liu
Abstract:
The appearances of children are inherited from their parents, which makes it feasible to predict them. Predicting realistic children's faces may help settle many social problems, such as age-invariant face recognition, kinship verification, and missing child identification. It can be regarded as an image-to-image translation task. Existing approaches usually assume domain information in the image-…
▽ More
The appearances of children are inherited from their parents, which makes it feasible to predict them. Predicting realistic children's faces may help settle many social problems, such as age-invariant face recognition, kinship verification, and missing child identification. It can be regarded as an image-to-image translation task. Existing approaches usually assume domain information in the image-to-image translation can be interpreted by "style", i.e., the separation of image content and style. However, such separation is improper for the child face prediction, because the facial contours between children and parents are not the same. To address this issue, we propose a new disentangled learning strategy for children's face prediction. We assume that children's faces are determined by genetic factors (compact family features, e.g., face contour), external factors (facial attributes irrelevant to prediction, such as moustaches and glasses), and variety factors (individual properties for each child). On this basis, we formulate predictions as a mapping from parents' genetic factors to children's genetic factors, and disentangle them from external and variety factors. In order to obtain accurate genetic factors and perform the mapping, we propose a ChildPredictor framework. It transfers human faces to genetic factors by encoders and back by generators. Then, it learns the relationship between the genetic factors of parents and children through a mapping function. To ensure the generated faces are realistic, we collect a large Family Face Database to train ChildPredictor and evaluate it on the FF-Database validation set. Experimental results demonstrate that ChildPredictor is superior to other well-known image-to-image translation methods in predicting realistic and diverse child faces. Implementation codes can be found at https://github.com/zhaoyuzhi/ChildPredictor.
△ Less
Submitted 21 April, 2022;
originally announced April 2022.
-
System-Wide Security for Offline Payment Terminals
Authors:
Nikolay Ivanov,
Qiben Yan
Abstract:
Most self-service payment terminals require network connectivity for processing electronic payments. The necessity to maintain network connectivity increases costs, introduces cybersecurity risks, and significantly limits the number of places where the terminals can be installed. Leading payment service providers have proposed offline payment solutions that rely on algorithmically generated paymen…
▽ More
Most self-service payment terminals require network connectivity for processing electronic payments. The necessity to maintain network connectivity increases costs, introduces cybersecurity risks, and significantly limits the number of places where the terminals can be installed. Leading payment service providers have proposed offline payment solutions that rely on algorithmically generated payment tokens. Existing payment token solutions, however, require complex mechanisms for authentication, transaction management, and most importantly, security risk management. In this paper, we present VolgaPay, a blockchain-based system that allows merchants to deploy secure offline payment terminal infrastructure that does not require collection and storage of any sensitive data. We design a novel payment protocol which mitigates security threats for all the participants of VolgaPay, such that the maximum loss from gaining full access to any component by an adversary incurs only a limited scope of harm. We achieve significant enhancements in security, operation efficiency, and cost reduction via a combination of polynomial multi-hash chain micropayment channels and blockchain grafting for off-chain channel state transition. We implement the VolgaPay payment system, and with thorough evaluation and security analysis, we demonstrate that VolgaPay is capable of delivering a fast, secure, and cost-efficient solution for offline payment terminals.
△ Less
Submitted 18 July, 2021;
originally announced July 2021.
-
AIM 2020 Challenge on Efficient Super-Resolution: Methods and Results
Authors:
Kai Zhang,
Martin Danelljan,
Yawei Li,
Radu Timofte,
Jie Liu,
Jie Tang,
Gangshan Wu,
Yu Zhu,
Xiangyu He,
Wenjie Xu,
Chenghua Li,
Cong Leng,
Jian Cheng,
Guangyang Wu,
Wenyi Wang,
Xiaohong Liu,
Hengyuan Zhao,
Xiangtao Kong,
Jingwen He,
Yu Qiao,
Chao Dong,
Xiaotong Luo,
Liang Chen,
Jiangtao Zhang,
Maitreya Suin
, et al. (60 additional authors not shown)
Abstract:
This paper reviews the AIM 2020 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor x4 based on a set of prior examples of low and corresponding high resolution images. The goal is to devise a network that reduces one or several aspects such as runtime, parameter co…
▽ More
This paper reviews the AIM 2020 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor x4 based on a set of prior examples of low and corresponding high resolution images. The goal is to devise a network that reduces one or several aspects such as runtime, parameter count, FLOPs, activations, and memory consumption while at least maintaining PSNR of MSRResNet. The track had 150 registered participants, and 25 teams submitted the final results. They gauge the state-of-the-art in efficient single image super-resolution.
△ Less
Submitted 15 September, 2020;
originally announced September 2020.
-
Guided Collaborative Training for Pixel-wise Semi-Supervised Learning
Authors:
Zhanghan Ke,
Di Qiu,
Kaican Li,
Qiong Yan,
Rynson W. H. Lau
Abstract:
We investigate the generalization of semi-supervised learning (SSL) to diverse pixel-wise tasks. Although SSL methods have achieved impressive results in image classification, the performances of applying them to pixel-wise tasks are unsatisfactory due to their need for dense outputs. In addition, existing pixel-wise SSL approaches are only suitable for certain tasks as they usually require to use…
▽ More
We investigate the generalization of semi-supervised learning (SSL) to diverse pixel-wise tasks. Although SSL methods have achieved impressive results in image classification, the performances of applying them to pixel-wise tasks are unsatisfactory due to their need for dense outputs. In addition, existing pixel-wise SSL approaches are only suitable for certain tasks as they usually require to use task-specific properties. In this paper, we present a new SSL framework, named Guided Collaborative Training (GCT), for pixel-wise tasks, with two main technical contributions. First, GCT addresses the issues caused by the dense outputs through a novel flaw detector. Second, the modules in GCT learn from unlabeled data collaboratively through two newly proposed constraints that are independent of task-specific properties. As a result, GCT can be applied to a wide range of pixel-wise tasks without structural adaptation. Our extensive experiments on four challenging vision tasks, including semantic segmentation, real image denoising, portrait image matting, and night image enhancement, show that GCT outperforms state-of-the-art SSL methods by a large margin. Our code available at: https://github.com/ZHKKKe/PixelSSL.
△ Less
Submitted 12 August, 2020;
originally announced August 2020.
-
Attention-based network for low-light image enhancement
Authors:
Cheng Zhang,
Qingsen Yan,
Yu zhu,
Xianjun Li,
Jinqiu Sun,
Yanning Zhang
Abstract:
The captured images under low light conditions often suffer insufficient brightness and notorious noise. Hence, low-light image enhancement is a key challenging task in computer vision. A variety of methods have been proposed for this task, but these methods often failed in an extreme low-light environment and amplified the underlying noise in the input image. To address such a difficult problem,…
▽ More
The captured images under low light conditions often suffer insufficient brightness and notorious noise. Hence, low-light image enhancement is a key challenging task in computer vision. A variety of methods have been proposed for this task, but these methods often failed in an extreme low-light environment and amplified the underlying noise in the input image. To address such a difficult problem, this paper presents a novel attention-based neural network to generate high-quality enhanced low-light images from the raw sensor data. Specifically, we first employ attention strategy (i.e. channel attention and spatial attention modules) to suppress undesired chromatic aberration and noise. The channel attention module guides the network to refine redundant colour features. The spatial attention module focuses on denoising by taking advantage of the non-local correlation in the image. Furthermore, we propose a new pooling layer, called inverted shuffle layer, which adaptively selects useful information from previous features. Extensive experiments demonstrate the superiority of the proposed network in terms of suppressing the chromatic aberration and noise artifacts in enhancement, especially when the low-light image has severe noise.
△ Less
Submitted 20 May, 2020; v1 submitted 19 May, 2020;
originally announced May 2020.
-
Hierarchical Regression Network for Spectral Reconstruction from RGB Images
Authors:
Yuzhi Zhao,
Lai-Man Po,
Qiong Yan,
Wei Liu,
Tingyu Lin
Abstract:
Capturing visual image with a hyperspectral camera has been successfully applied to many areas due to its narrow-band imaging technology. Hyperspectral reconstruction from RGB images denotes a reverse process of hyperspectral imaging by discovering an inverse response function. Current works mainly map RGB images directly to corresponding spectrum but do not consider context information explicitly…
▽ More
Capturing visual image with a hyperspectral camera has been successfully applied to many areas due to its narrow-band imaging technology. Hyperspectral reconstruction from RGB images denotes a reverse process of hyperspectral imaging by discovering an inverse response function. Current works mainly map RGB images directly to corresponding spectrum but do not consider context information explicitly. Moreover, the use of encoder-decoder pair in current algorithms leads to loss of information. To address these problems, we propose a 4-level Hierarchical Regression Network (HRNet) with PixelShuffle layer as inter-level interaction. Furthermore, we adopt a residual dense block to remove artifacts of real world RGB images and a residual global block to build attention mechanism for enlarging perceptive field. We evaluate proposed HRNet with other architectures and techniques by participating in NTIRE 2020 Challenge on Spectral Reconstruction from RGB Images. The HRNet is the winning method of track 2 - real world images and ranks 3rd on track 1 - clean images. Please visit the project web page https://github.com/zhaoyuzhi/Hierarchical-Regression-Network-for-Spectral-Reconstruction-from-RGB-Images to try our codes and pre-trained models.
△ Less
Submitted 10 May, 2020;
originally announced May 2020.
-
NTIRE 2020 Challenge on Real Image Denoising: Dataset, Methods and Results
Authors:
Abdelrahman Abdelhamed,
Mahmoud Afifi,
Radu Timofte,
Michael S. Brown,
Yue Cao,
Zhilu Zhang,
Wangmeng Zuo,
Xiaoling Zhang,
Jiye Liu,
Wendong Chen,
Changyuan Wen,
Meng Liu,
Shuailin Lv,
Yunchao Zhang,
Zhihong Pan,
Baopu Li,
Teng Xi,
Yanwen Fan,
Xiyu Yu,
Gang Zhang,
Jingtuo Liu,
Junyu Han,
Errui Ding,
Songhyun Yu,
Bumjun Park
, et al. (65 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2020 challenge on real image denoising with focus on the newly introduced dataset, the proposed methods and their results. The challenge is a new version of the previous NTIRE 2019 challenge on real image denoising that was based on the SIDD benchmark. This challenge is based on a newly collected validation and testing image datasets, and hence, named SIDD+. This chall…
▽ More
This paper reviews the NTIRE 2020 challenge on real image denoising with focus on the newly introduced dataset, the proposed methods and their results. The challenge is a new version of the previous NTIRE 2019 challenge on real image denoising that was based on the SIDD benchmark. This challenge is based on a newly collected validation and testing image datasets, and hence, named SIDD+. This challenge has two tracks for quantitatively evaluating image denoising performance in (1) the Bayer-pattern rawRGB and (2) the standard RGB (sRGB) color spaces. Each track ~250 registered participants. A total of 22 teams, proposing 24 methods, competed in the final phase of the challenge. The proposed methods by the participating teams represent the current state-of-the-art performance in image denoising targeting real noisy images. The newly collected SIDD+ datasets are publicly available at: https://bit.ly/siddplus_data.
△ Less
Submitted 8 May, 2020;
originally announced May 2020.
-
COVID-19 Chest CT Image Segmentation -- A Deep Convolutional Neural Network Solution
Authors:
Qingsen Yan,
Bo Wang,
Dong Gong,
Chuan Luo,
Wei Zhao,
Jianhu Shen,
Qinfeng Shi,
Shuo Jin,
Liang Zhang,
Zheng You
Abstract:
A novel coronavirus disease 2019 (COVID-19) was detected and has spread rapidly across various countries around the world since the end of the year 2019, Computed Tomography (CT) images have been used as a crucial alternative to the time-consuming RT-PCR test. However, pure manual segmentation of CT images faces a serious challenge with the increase of suspected cases, resulting in urgent requirem…
▽ More
A novel coronavirus disease 2019 (COVID-19) was detected and has spread rapidly across various countries around the world since the end of the year 2019, Computed Tomography (CT) images have been used as a crucial alternative to the time-consuming RT-PCR test. However, pure manual segmentation of CT images faces a serious challenge with the increase of suspected cases, resulting in urgent requirements for accurate and automatic segmentation of COVID-19 infections. Unfortunately, since the imaging characteristics of the COVID-19 infection are diverse and similar to the backgrounds, existing medical image segmentation methods cannot achieve satisfactory performance. In this work, we try to establish a new deep convolutional neural network tailored for segmenting the chest CT images with COVID-19 infections. We firstly maintain a large and new chest CT image dataset consisting of 165,667 annotated chest CT images from 861 patients with confirmed COVID-19. Inspired by the observation that the boundary of the infected lung can be enhanced by adjusting the global intensity, in the proposed deep CNN, we introduce a feature variation block which adaptively adjusts the global properties of the features for segmenting COVID-19 infection. The proposed FV block can enhance the capability of feature representation effectively and adaptively for diverse cases. We fuse features at different scales by proposing Progressive Atrous Spatial Pyramid Pooling to handle the sophisticated infection areas with diverse appearance and shapes. We conducted experiments on the data collected in China and Germany and show that the proposed deep CNN can produce impressive performance effectively.
△ Less
Submitted 25 April, 2020; v1 submitted 23 April, 2020;
originally announced April 2020.